Saturday, May 18, 2013

Perceptrons in Haskell

I was talking to some friends about artificial intelligence and wanting to learn something a bit more complicated than rigid pattern matching, so someone linked me an article about perceptrons. As far as I've learned, a perceptron can learn linearly separable patterns. The learning part is what makes it cool; the linear part makes it a bit restricted without some more advanced techniques, but I'm not that far into it yet. So let's just dive into how a perceptron works.

A perceptron has a set of "weights" that it uses to compute a value from the inputs using a dot product. It compares that value to a threshold and gives a binary output. The output is then compared to an expected output; if it's wrong, the perceptron attempts to "learn" by adjusting the weights.

So the first thing we need to know is what a dot product is. A dot product takes two equal-length lists of numbers and returns a single value: the sum of the pairwise products.

-- weights (dot) inputValues
dotProduct :: Fractional a => [a] -> [a] -> a
dotProduct (w:ws) (i:is) =
    w * i + dotProduct ws is

-- Both lists exhausted: nothing left to add
dotProduct [] [] =
    0.0
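
A quick sanity check in GHCi:

ghci> dotProduct [0.5, 0.3, 0.2] [1, 0, 1]
0.7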

Mathematically, the dot product is the summation of weight_n × input_n over all n. So far so good. Next we want a datatype to represent the perceptron. The way I did it was to have a list of weights plus the threshold; keeping the threshold in there made sense to me, since the threshold is how the perceptron knows when to fire. This one is nice and easy to understand.

data Perceptron a = Perceptron {
    threshold   :: a,   -- When it should fire
    weights     :: [a]  -- The weight each value is worth
} deriving Show

I didn't do any enforcement of types beyond this because the functions handle all of that. I actually kept error handling and defensive pattern matching to a minimum, because if you're not going to use it right, there's really no recovering from that point anyway.

The next thing we want is a function to create a perceptron, because writing out record syntax in Haskell is a bit of a nuisance when we can make it more compact and a little easier to read.

initPerceptrons :: Fractional a => a -> [a] -> Perceptron a
initPerceptrons t w =
    Perceptron {
        threshold = t,
        weights = w
    }
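
For example, in GHCi:

ghci> initPerceptrons 0.5 [0, 0, 0]
Perceptron {threshold = 0.5, weights = [0.0,0.0,0.0]}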

Okay, so now that we have a function to create perceptrons, we want a function to teach them. To teach a perceptron, we alter its weights. Teaching changes every weight in the correct direction by a certain rate, which means it may take multiple iterations to reach the correct weights. The equation is simply this:

weight_n = weight_n + learning_rate × (expected_output − actual_output) × input_n

Converting this to code is a bit of a two-step because I'm using the record syntax.

teach :: Fractional a => Perceptron a -> [a] -> a -> a -> Perceptron a
teach (Perceptron { threshold = t, weights = w }) inputs err rate =
    Perceptron {
        threshold = t,
        weights = getWeights w inputs rate err
    }
    where
        -- Apply the learning rule to each weight/input pair
        getWeights :: Fractional a => [a] -> [a] -> a -> a -> [a]
        getWeights (w:ws) (i:is) rate err =
            w + (rate * err * i) : getWeights ws is rate err

        getWeights [] [] _ _ = []

If you have been following closely, you may notice that err is passed in in place of (expected_output − actual_output). That's because the difference never changes within a single teaching step, and the value is also used elsewhere outside of this function, so I decided to compute it once and pass it in. This won't become apparent until the full example shown below.
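
To see the rule in action, here is a single teaching step in GHCi, with an error of 1 and a rate of 0.1:

ghci> teach (initPerceptrons 0.5 [0, 0, 0]) [1, -1, 1] 1 0.1
Perceptron {threshold = 0.5, weights = [0.1,-0.1,0.1]}

Each weight moved by rate × err × input, so the weight on the -1 input went down while the others went up.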

Now there's one more thing I want to cover that I haven't seen really discussed in anything I've read, though it was mentioned on the Wikipedia page for perceptrons: when working with binary inputs, a zero represents a negative input (-1). In some testing a friend and I did, replacing the 0 inputs with -1 for the teaching process greatly reduced the number of iterations needed to learn. This makes sense to me, because a zero input should still have significance as far as the weights are concerned when working with binary. So that's how I do it, and I made a quick little function to handle it with a list comprehension, because why not.

-- Map binary inputs to bipolar ones: 0 becomes -1
setBinInputs :: (Eq a, Fractional a) => [a] -> [a]
setBinInputs inputs =
    [if i == 0 then (-1) else 1 | i <- inputs]
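
For example:

ghci> setBinInputs [1, 0, 1]
[1.0,-1.0,1.0]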

Now this is all I used to implement a test case. My test case teaches OR logic starting from weights of all zero. Read all the way through to see why there are 3 inputs (summary in the comments).

testCaseOR = testCaseOR1 $ initPerceptrons 0.5 [0, 0, 0]

testCaseOR1 ps =
    testCaseOR2 0 is os ps
    where
        is =
            [ [1, 0, 0]  -- All the inputs
            , [1, 0, 1]  -- First input remains constant
            , [1, 1, 0]  -- This takes place of a bias
            , [1, 1, 1]] -- It's the zeroth order term

        os = [ 0, 1, 1, 1] -- Boring expected outputs

testCaseOR2 errCount (i:is) (o:os) ps
    -- No error, no need to teach
    -- Can try to if you want, weights wouldn't change
    | err == 0 = testCaseOR2 errCount is os ps
    -- This means there was an error, time to get some learning done
    | otherwise =
        testCaseOR2 (errCount + 1) is os (teach ps (setBinInputs i) err rate)
    where
        rate = 0.1 -- Too large or small can cause a problem

        out =
            if ((weights ps) `dotProduct` i) > (threshold ps)
                then 1
                else 0

        err = o - out

testCaseOR2 errCount [] [] ps
    -- There were no errors, everything works
    | errCount == 0 = ps
    -- There were errors, need to iterate again
    | otherwise =
        testCaseOR1 ps

You'll have to excuse the imperative undertone, but when doing this it made sense to keep some state information around. It's also a bit messy, but that's what happens when you're tired and decide to hack on something instead of opting for some sleep.

Now, as to why there are three inputs: this is a bit strange and I don't fully understand it myself, but here's my best attempt at explaining what I know. The extra input is a bias, a term that never changes. The bias can be introduced in one of two ways: added directly to the dot product of the inputs, or as a weight on a constant input (the example treats it as a weight). The constant input can be any non-zero number, though traditionally people use one. The usual explanation is that the bias gives the separating line (or plane) room to shift away from a fixed position; without it, some patterns can never be separated, due to a property of planes and lines that I don't fully understand.

Either way, the bottom line is if you remove that, you get an infinite loop and nothing works.
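
If you want to convince yourself the training worked, here's a little check (a hypothetical helper, not part of the code above) that runs every OR input row through the trained perceptron; if training converged, it should come back as [0.0,1.0,1.0,1.0]:

-- Hypothetical helper: feed each OR input row to the trained perceptron
checkOR :: [Double]
checkOR =
    [ if (weights ps `dotProduct` i) > threshold ps then 1 else 0
    | i <- [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]] ]
    where
        ps = testCaseOR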

Hope this helps someone. If you see any errors or something I overlooked, let me know so I can fix it.

Thursday, May 2, 2013

The cogs of functools.partial

(SOURCE CODE AT BOTTOM)

I noticed people ending up at my site looking for how functools.partial in Python works. I didn't know the answer, but I figured it would be worth looking into, since I think partial function application has some good potential. My findings were a bit deeper and more complicated than I anticipated, so feel free to correct me if I get something wrong.

Now the rough pure-Python example given on the documentation page is actually pretty easy to pick apart. The design is a function that creates a function that calls a function. It might seem hard to follow at first, but it's similar to a decorator.

def partial(func, *args, **keywords):
    def newfunc(*fargs, **fkeywords):
        newkeywords = keywords.copy()
        newkeywords.update(fkeywords)
        return func(*(args + fargs), **newkeywords)
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc

First off, partial returns a function. The returned function stores the original function along with its arguments and keyword arguments. When the returned function is called, it appends the new arguments to the stored ones, merges the keyword arguments (meaning a caller's keyword argument can overwrite a stored one), calls the stored function with all of that, and returns the result.
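
For example, using the pure-Python version above:

>>> from operator import add
>>> add3 = partial(add, 3)
>>> add3(4)
7
>>> add3.args
(3,)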

As simple as that pure-Python version is, it's only a rough approximation of how the real partial actually works. One of the first differences is that the guts of partial are written in C, my guess being that this is for speed. Another difference is that partial returns an object of type partial, not a function. While it can be called like a function, it's a distinct type.

Now as to how it works, this is a bit confusing. From what I gather, it creates a partial object that stores the arguments (args), keyword arguments (keywords), and the function itself (func), all of which are actually rather easy to access.

>>> part = functools.partial(dict, a=1, b=2)
>>> dir(part)
['__call__', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', 'args', 'func', 'keywords']
>>> part.keywords
{'a': 1, 'b': 2}
>>> type(part)
<type 'functools.partial'>
>>> part.keywords = {}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: readonly attribute

The last part is the kicker: unlike a normal object, the attributes of a partial object cannot be modified.

When the partial object is called, it goes through the same process: appending the positional arguments, merging (updating) the keywords, then calling the stored function with those arguments and keyword arguments. So the same concept and pattern occurs. The advantages of using C are that it's faster, and the read-only attributes are a nice touch, as partial applications arguably should be read-only. I'm personally not a fan of being able to update keyword arguments; it seems inconsistent, like it could lead to misunderstandings and unnecessary side effects (from a functional programming standpoint, where I grew accustomed to partial application and currying).
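
Here's a quick illustration of that keyword-merging behavior, using a made-up greet function:

>>> import functools
>>> def greet(greeting, name="world"):
...     return "%s, %s!" % (greeting, name)
...
>>> hello = functools.partial(greet, "Hello", name="there")
>>> hello()
'Hello, there!'
>>> hello(name="Python")  # the caller's keyword overwrites the stored one
'Hello, Python!'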

Now for the source code:
http://hg.python.org/cpython/file/c6880edaf6f3/Modules/_functoolsmodule.c
