Wednesday, July 30, 2014

Haskell Binary Serialization of Basic Math Data

I was reading up on binary serialization and decided to try it out. The end result was a bit of research, a long headache and, obviously, some forgotten parentheses. Despite Haskell's mathematical appeal, it seems to lack a built-in binary notation to work with, even though it supports octal and hexadecimal. This meant hunting down datatypes and modules that would cooperate. That brought me to Data.Bits, which has the generic bitwise operators, and Data.Word for the Word8 datatype. The idea was to take some information, serialize it into binary and be able to unserialize it when needed. That would be easy enough on its own, but I had to complicate things: each piece of information only needs 4 bits, so I decided to cram 2 pieces of information into one byte.
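To make the packing idea concrete, here is a minimal sketch of cramming two 4-bit values into one byte. The name packNibbles is made up for this illustration and is not part of the code further down; it assumes the caller wants any bits above the low 4 discarded.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Word (Word8)

-- Hypothetical helper: put `hi` in the upper 4 bits and `lo` in the lower 4.
-- Both arguments are masked to 4 bits so stray high bits cannot leak in.
packNibbles :: Word8 -> Word8 -> Word8
packNibbles hi lo = ((hi .&. 0x0F) `shiftL` 4) .|. (lo .&. 0x0F)
```

For example, packNibbles 1 2 gives 0x12 (18 in decimal), and packNibbles 10 5 gives 0xA5 (165).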

So the first goal is to give each value a named constant, so I can be lazy and have a point of reference. It is basically just a table of contents. That comes, of course, after the imports.

import Data.Bits
import Data.Word

zero    = 0 :: Word8
one     = 1 :: Word8
two     = 2 :: Word8
three   = 3 :: Word8
four    = 4 :: Word8
five    = 5 :: Word8
six     = 6 :: Word8
seven   = 7 :: Word8
eight   = 8 :: Word8
nine    = 9 :: Word8
plus    = 10 :: Word8
minus   = 11 :: Word8
times   = 12 :: Word8
divide  = 13 :: Word8
ignore  = 15 :: Word8

Easy enough, simple math info. The ignore piece is just to have a filler should something accidentally slip into the stream. Okay, now that that is all out of the way, it's time to look at serializing a String.

serialize :: String -> [Word8]

serialize [] =
    []

serialize (x:y:xs) =
    sHelper y ((sHelper x 0) `shiftL` 4) : serialize xs

serialize [x] =
    -- odd character out: pad the unused low 4 bits with ones
    [(sHelper x 0 `shiftL` 4) .|. ignore]

sHelper :: Char -> Word8 -> Word8

sHelper c i
    | c == '0' = i
    | c == '1' = i .|. one
    | c == '2' = i .|. two
    | c == '3' = i .|. three
    | c == '4' = i .|. four
    | c == '5' = i .|. five
    | c == '6' = i .|. six
    | c == '7' = i .|. seven
    | c == '8' = i .|. eight
    | c == '9' = i .|. nine
    | c == '+' = i .|. plus
    | c == '-' = i .|. minus
    | c == '*' = i .|. times
    | c == '/' = i .|. divide
    | otherwise= i .|. ignore

The serialize function breaks the input down into characters to be consumed by the helper function. The first character goes to the helper along with a Word8 that may or may not already contain information; passing a null byte seemed easiest. Since only 4 bits are used, I then shift the bits to the left by 4 and OR the next character's information on. If there is an odd number of characters, the last byte is simply padded with ones so the padding is not mistaken for anything else (since all zeroes is a zero).
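As a sanity check on the packing order, the sketch below re-implements the same idea in compact form so it can run on its own. The names nibbleOf and pack are invented for this example, and the lookup-table encoding is a stand-in for the guard-based sHelper, not the code above. It follows the padding described in the text, filling the unused low 4 bits with ones.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- Hypothetical table-driven encoder: digits map to 0..9, the operators
-- + - * / map to 10..13, and anything else maps to 15 (the "ignore" value).
nibbleOf :: Char -> Word8
nibbleOf c = fromMaybe 15 (lookup c (zip "0123456789+-*/" [0 ..]))

-- Pack characters two per byte, first character in the high 4 bits.
-- A trailing odd character gets its low 4 bits padded with ones (0x0F).
pack :: String -> [Word8]
pack (x : y : rest) = ((nibbleOf x `shiftL` 4) .|. nibbleOf y) : pack rest
pack [x]            = [(nibbleOf x `shiftL` 4) .|. 0x0F]
pack []             = []
```

With these definitions, pack "12+" evaluates to [0x12, 0xAF]: the first byte holds '1' and '2', and the second holds '+' (10) in the high half with the padding in the low half.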

Now that we have that sorted, the next thing to do is reverse the process. Just as the serialize function maps unknown characters to the ignore value, the unserialize function handles erroneous data simply by emitting a pound sign (#).

unserialize :: [Word8] -> String

unserialize [] =
    []

unserialize (x:xs) =
    usHelper (x `shiftR` 4) : usHelper x : unserialize xs

usHelper :: Word8 -> Char

usHelper i
    | test == divide = '/'
    | test == times  = '*'
    | test == minus  = '-'
    | test == plus   = '+'
    | test == nine   = '9'
    | test == eight  = '8'
    | test == seven  = '7'
    | test == six    = '6'
    | test == five   = '5'
    | test == four   = '4'
    | test == three  = '3'
    | test == two    = '2'
    | test == one    = '1'
    | test == zero   = '0'
    | otherwise      = '#'
    where
        test = i .&. ignore

Since we are only using 4 bits per value, I took the same approach of shifting the bits to work with them. Shifting aside, extracting the bits we need is as simple as ANDing against a mask with ones in the positions we wish to check, then comparing the result to the expected value. This may be a bit more extravagant than it needs to be, but it works and it seems efficient enough (tested in GHCi by serializing and unserializing 1000000 bytes).
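The mask-and-shift step can be tried in isolation. In this sketch, splitByte is a hypothetical name; it pulls the two 4-bit halves out of a byte, using a right shift for the high half and an AND against 0x0F (the same value as ignore) for the low half.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- Hypothetical helper: return (high 4 bits, low 4 bits) of a byte.
-- shiftR on Word8 fills with zeroes, so the high half needs no extra mask.
splitByte :: Word8 -> (Word8, Word8)
splitByte b = (b `shiftR` 4, b .&. 0x0F)
```

For example, splitByte 0x12 gives (1, 2), and splitByte 0xAF gives (10, 15), which is the plus value followed by the all-ones filler.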
