I was talking to some friends about artificial intelligence and mentioned wanting to learn something a bit more complicated than rigid pattern matching. So I was linked an article about perceptrons. As far as I've learned, a perceptron can learn linear patterns. The learning part is what makes it cool; the linear part makes it a bit restricted without some more advanced stuff, but I'm not that far into it yet. So let's just dive into how a perceptron works.


So a perceptron has a set of "weights" that it combines with the inputs using a dot product, compares the result to a threshold, and gives a binary output. That output is then compared to an expected output; if it's wrong, the perceptron attempts to "learn" by adjusting its weights.

So the first thing we need to know is what a dot product is. A dot product takes two lists of equal length and returns a single value: the sum of the products of corresponding elements.

```haskell
-- weights (dot) inputValues
dotProduct :: Fractional a => [a] -> [a] -> a
dotProduct (w:ws) (i:is) = w * i + dotProduct ws is
dotProduct [] []         = 0.0
```
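As a quick sanity check, here's the function run on some made-up numbers (the weights and inputs are purely illustrative):

```haskell
-- Definition repeated from above so this snippet runs on its own.
dotProduct :: Fractional a => [a] -> [a] -> a
dotProduct (w:ws) (i:is) = w * i + dotProduct ws is
dotProduct [] []         = 0.0

main :: IO ()
main = print (dotProduct [0.5, 0.25, 0.1] [1, 1, 0] :: Double)
-- 0.5*1 + 0.25*1 + 0.1*0 = 0.75
```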

Mathematically, it is the summation of weight_n × input_n over all n. So far so good; however, we want a datatype to represent the perceptron. The way I did it was to have a list of weights plus the threshold. I figured it was good to keep the threshold in there, since the threshold is how the perceptron knows when to fire. This one is nice and easy to understand:

```haskell
data Perceptron a = Perceptron
    { threshold :: a   -- When it should fire
    , weights   :: [a] -- The weight each value is worth
    } deriving Show
```

I didn't add any extra type enforcement because the type signatures handle all of that. I also kept error handling and pattern matching to a minimum, because if you're not going to use it right, there's really no recovery from that point anyway.

Now the next thing we want to do is make a function to create a perceptron, because writing out record syntax in Haskell is a bit of a nuisance when we can make it more compact and easier to read.

```haskell
initPerceptrons :: Fractional a => a -> [a] -> Perceptron a
initPerceptrons t w = Perceptron { threshold = t, weights = w }
```
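For instance, creating a perceptron with a threshold of 0.5 and three zeroed weights (the same starting point the test case below uses) looks like this, with `deriving Show` giving us the printout for free:

```haskell
-- Definitions repeated from above so this snippet runs on its own.
data Perceptron a = Perceptron
    { threshold :: a
    , weights   :: [a]
    } deriving Show

initPerceptrons :: Fractional a => a -> [a] -> Perceptron a
initPerceptrons t w = Perceptron { threshold = t, weights = w }

main :: IO ()
main = print (initPerceptrons 0.5 [0, 0, 0] :: Perceptron Double)
-- Perceptron {threshold = 0.5, weights = [0.0,0.0,0.0]}
```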

Okay, so now that we have a function to create perceptrons, we want a function to teach them. To teach a perceptron, we alter its weights. Teaching nudges every weight in the correct direction by a certain rate, which means it may take multiple iterations to reach the correct weights. The equation is simply this:

weight_n = weight_n + learning_rate × (expected_output − actual_output) × input_n

Converting this to code is a bit of a two-step because I'm using the record syntax.

```haskell
teach :: Fractional a => Perceptron a -> [a] -> a -> a -> Perceptron a
teach (Perceptron { threshold = t, weights = w }) inputs err rate =
    Perceptron { threshold = t, weights = getWeights w inputs rate err }
  where
    getWeights :: Fractional a => [a] -> [a] -> a -> a -> [a]
    getWeights (w:ws) (i:is) rate err = w + (rate * err * i) : getWeights ws is rate err
    getWeights [] [] _ _ = []
```
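To see a single teaching step in action, here's a hypothetical example (the starting weights, inputs, error, and rate are made up for illustration): with err = 1 and rate = 0.1, each weight moves by 0.1 × its input.

```haskell
-- Definitions repeated from above so this snippet runs on its own.
data Perceptron a = Perceptron
    { threshold :: a
    , weights   :: [a]
    } deriving Show

teach :: Fractional a => Perceptron a -> [a] -> a -> a -> Perceptron a
teach (Perceptron { threshold = t, weights = w }) inputs err rate =
    Perceptron { threshold = t, weights = getWeights w inputs rate err }
  where
    getWeights (w:ws) (i:is) rate err = w + (rate * err * i) : getWeights ws is rate err
    getWeights [] [] _ _ = []

main :: IO ()
main = print (weights (teach p [1, -1, 1] 1 0.1))
  where
    p :: Perceptron Double
    p = Perceptron { threshold = 0.5, weights = [0, 0, 0] }
-- err = 1, so every weight moves by 0.1 * input: [0.1,-0.1,0.1]
```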

If you have been following closely, you may notice that `err` is in place of (expected_output − actual_output). That's because the difference never changes within a single call, and the value is also used elsewhere outside this function, so I compute it once and pass it in. This won't become fully apparent until the complete example is shown.

Now there's one more thing I want to cover that I haven't seen anyone really get into in anything I've read, though it was mentioned on the Wikipedia page for perceptrons: when working with binary inputs, a zero represents a negative input (−1). In some testing done by a friend and myself, replacing the 0 inputs with −1 for the teaching process greatly reduced the number of iterations needed to learn. This also makes sense to me, because a zero input should still carry significance in the weights when working with binary values. So that's how I do it, and I made a quick little function with a list comprehension to do it for me, because why not:

```haskell
setBinInputs :: (Eq a, Num a) => [a] -> [a]
setBinInputs inputs = [if i == 0 then (-1) else 1 | i <- inputs]
```
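For example, the zeros flip to −1 while everything else becomes 1:

```haskell
-- Definition repeated from above so this snippet runs on its own.
setBinInputs :: (Eq a, Num a) => [a] -> [a]
setBinInputs inputs = [if i == 0 then (-1) else 1 | i <- inputs]

main :: IO ()
main = print (setBinInputs [1, 0, 0, 1] :: [Int])
-- zeros become -1: [1,-1,-1,1]
```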

Now here is everything put together into a test case. My test case teaches OR logic to a perceptron whose starting weights are all zero. Read all the way through to see why there are 3 inputs (there's a summary in the comments).

```haskell
testCaseOR = testCaseOR1 $ initPerceptrons 0.5 [0, 0, 0]

testCaseOR1 ps = testCaseOR2 0 is os ps
  where
    is = [ [1, 0, 0]   -- All the inputs
         , [1, 0, 1]   -- First input remains constant
         , [1, 1, 0]   -- This takes the place of a bias
         , [1, 1, 1] ] -- It's the zeroth-order term
    os = [ 0, 1, 1, 1 ] -- Boring expected outputs

testCaseOR2 errCount (i:is) (o:os) ps
    -- No error, no need to teach
    -- (You can try anyway; the weights wouldn't change)
    | err == 0  = testCaseOR2 errCount is os ps
    -- This means there was an error, time to get some learning done
    | otherwise = testCaseOR2 (errCount + 1) is os (teach ps (setBinInputs i) err rate)
  where
    rate = 0.1 -- Too large or small can cause a problem
    out  = if (weights ps `dotProduct` i) > threshold ps then 1 else 0
    err  = o - out
testCaseOR2 errCount [] [] ps
    -- There were no errors, everything works
    | errCount == 0 = ps
    -- There were errors, need to iterate again
    | otherwise     = testCaseOR1 ps
```
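To check that the loop actually lands on working weights, here's the whole article assembled into one runnable file, plus a small `fire` helper (my own addition, mirroring the output logic in `testCaseOR2`) that runs the learned perceptron over the OR truth table:

```haskell
-- Everything from the article in one file, plus a small check at the end.
dotProduct :: Fractional a => [a] -> [a] -> a
dotProduct (w:ws) (i:is) = w * i + dotProduct ws is
dotProduct [] []         = 0.0

data Perceptron a = Perceptron
    { threshold :: a
    , weights   :: [a]
    } deriving Show

initPerceptrons :: Fractional a => a -> [a] -> Perceptron a
initPerceptrons t w = Perceptron { threshold = t, weights = w }

teach :: Fractional a => Perceptron a -> [a] -> a -> a -> Perceptron a
teach (Perceptron { threshold = t, weights = w }) inputs err rate =
    Perceptron { threshold = t, weights = getWeights w inputs rate err }
  where
    getWeights (w:ws) (i:is) rate err = w + (rate * err * i) : getWeights ws is rate err
    getWeights [] [] _ _ = []

setBinInputs :: (Eq a, Num a) => [a] -> [a]
setBinInputs inputs = [if i == 0 then (-1) else 1 | i <- inputs]

testCaseOR :: Perceptron Double
testCaseOR = testCaseOR1 $ initPerceptrons 0.5 [0, 0, 0]

testCaseOR1 :: Perceptron Double -> Perceptron Double
testCaseOR1 ps = testCaseOR2 0 is os ps
  where
    is = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
    os = [0, 1, 1, 1]

testCaseOR2 :: Int -> [[Double]] -> [Double] -> Perceptron Double -> Perceptron Double
testCaseOR2 errCount (i:is) (o:os) ps
    | err == 0  = testCaseOR2 errCount is os ps
    | otherwise = testCaseOR2 (errCount + 1) is os (teach ps (setBinInputs i) err rate)
  where
    rate = 0.1
    out  = if (weights ps `dotProduct` i) > threshold ps then 1 else 0
    err  = o - out
testCaseOR2 errCount [] [] ps
    | errCount == 0 = ps
    | otherwise     = testCaseOR1 ps

-- Run the learned perceptron on one input vector (first element is the bias).
fire :: Perceptron Double -> [Double] -> Double
fire ps i = if (weights ps `dotProduct` i) > threshold ps then 1 else 0

main :: IO ()
main = print [fire testCaseOR [1, a, b] | a <- [0, 1], b <- [0, 1]]
-- [0.0,1.0,1.0,1.0]: false only when both real inputs are 0, i.e. OR
```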

You'll have to excuse the imperative undertone; when doing this, it made sense to keep some state information around. It's also a bit messy, but that's what happens when you're tired and decide to code instead of opting for some sleep.

Now, as to why there are three inputs: this is a bit strange and I don't fully understand it myself, but here's my best attempt at explaining what I know. The first input is a bias, a term that never changes. The bias can be represented in one of two ways: added directly to the dot product of the inputs, or treated as an extra input with its own weight (the example does the latter). When set up as in the example, the bias input can be any constant non-zero number, but traditionally people use one. Without the bias, the perceptron can't learn this pattern because of some property of planes and lines that I don't quite understand (the short version: without a bias term, the separating line is forced through the origin).

Either way, the bottom line is that if you remove it, you get an infinite loop and nothing works.

Hope this helps someone. If you see any errors or something I overlooked, let me know so I can fix it.