Sunday, April 17, 2011

Implementing scanl for Iteratees

In looking over Data.Enumerator.List, I noticed that scanl was missing, So I set out to write it. I'm posting it as a literate Haskell file. I intend to create a patch for the maintainer.

Start with a standard module declaration


>module Data.Enumerator.Scanl where
>
>import Prelude hiding (scanl, scanl1)
>import Data.Enumerator
>import qualified Data.Enumerator.List as EL


The code seems strangely devoid of an operator for repeatedly chaining enumeratees together, so I created one.


>infixl 0 $|
>($|) :: Monad m
>      => Enumerator ao m (Step ai m b)
>      -> Enumeratee ao ai m b
>      -> Enumerator ai m b
>($|) = joinE


scanl takes a function and a seed value, and combines the seed and any input to get a new seed. The seeds (including the first) are fed along the pipe.


>scanl :: (Monad m) => (ai -> ao -> ai) -> ai -> Enumeratee ao ai m b


I borrowed the structure of Data.Enumerator.List.map for most of this. CheckDone is a common construction for Enumeratees, if the inner iteratee yields, the outer one yields as well.


>scanl op ai0 = checkDone (continue . step ai0) where


The final step here is important. For map it is:


  step k EOF = yield (Continue k) EOF


For scanl, we have one more piece of data (the last seed) that needs to be fed into the iteratee.


>  step ai0 k EOF = k (Chunks [ai0]) >>== \step -> yield step EOF
>  step ai0 k (Chunks xs) = loop ai0 k xs
>
>  loop ai0 k [] = continue (step ai0 k)


Note that we compute the next seed and store it, but send the old seed to the iteratee.


>  loop ai0 k (x:xs) = case ai0 `op` x of
>    ai1 -> k (Chunks [ai0]) >>==
>      checkDoneEx (Chunks xs) (\k' -> loop ai1 k' xs)


And now, some test code:


>cubes :: Monad m => Enumerator Integer m b
>cubes = EL.replicate 7 6 $| scanl (+) 12 $| scanl (+) 7 $| scanl (+) 1


The important thing about the test code is that there should be exactly 10 cubes in the list. Each scanl adds exactly one element, and we start with 7 repetitions of 6, so we should (and do) have 10 cubes.
and to test:

>main = do
>  list <- run_ $ cubes $$ EL.take 10
>  print list

Friday, April 15, 2011

Iteratees part II

I'd like to go over the basics of the enumerator package for Haskell, covering the types, and re-solving the cube-permutation problem from the last post. I'm not the first to cover this, see this explanation of the enumerator package.

The first datatype defined in the enumerator package is Stream:



data Stream a
= Chunks [a]
| EOF
deriving (Show, Eq)

Which is fairly straightforward. Next is a Step:



data Step a m b
= Continue (Stream a -> Iteratee a m b)
| Yield b (Stream a)
| Error Exc.SomeException



I'll explain what an Iteratee is next. A step is one of 3 states: Continue, it can accept more data. Yield, it can't accept any more data, and Error, an error occured. Note that unlike EarlyFoldable, a Step doesn't have a type for the state, instead it passes a continuation that holds the state.


So what is an Iteratee? It's simply a Step wrapped in a Monad.



newtype Iteratee a m b = Iteratee
{ runIteratee :: m (Step a m b)
}

Let's see if we can solve the cube problem using an iteratee.




module Main where


import Data.List
import Data.Enumerator hiding (head, foldl', repeat)
import qualified Data.Enumerator.List as EL (head)
import qualified Data.Map as M


digitize :: (Integral a) => a -> [a]
digitize 0 = [0]
digitize n = (reverse . (unfoldr strip_digit)) n
    where 
        strip_digit 0 = Nothing
        strip_digit a =
            case (a `divMod` 10) of
                (a,b)  -> Just (b,a)


undigitize :: Integral a => [a] -> a
undigitize = foldl' (\x y -> x*10+y) 0


--digitize and undigitize are for number processing


cubes :: [Integer]
cubes = scanl (+) 1 $ scanl (+) 7 $ scanl (+) 12 $ repeat 6


make_key :: Integer -> Integer
make_key = undigitize . sortBy (flip compare) . digitize


update_map :: Integer -> M.Map Integer (Integer, Int) -> M.Map Integer (Integer, Int)
update_map n = M.insertWith
  (\(n2, count2) (n1, count1) -> (n1, (count1+count2)))
  (make_key n) (n, 1)


--cubes, make_key and update_map are the same as from last time.


cube_perm_iteratee :: (Monad m) => Iteratee Integer m Integer
cube_perm_iteratee = go M.empty
  where
    go m = do -- Iteratee's are a Monad, to be covered later.
      (Just n) <- EL.head --EL.head reads at most one element.
      case update_map n m of
        m' | (snd $ m' M.! (make_key n)) == 5 ->
          yield (fst $ m' M.! (make_key n)) (Chunks [])
        m' -> go m'


main = do
  -- The driver code will be covered on another article on Iteratees
  result <- run_ $ (enumList 100 cubes) $$ cube_perm_iteratee
  print result



Wednesday, April 13, 2011

Putting an early terminating left fold to work.

I'd like to go over a piece of code that I wrote a while ago to solve a simple problem: Find the first cube to have 5 permutations of its digits that are also cubes. It's a project euler problem. My solution:

class EarlyFoldable f where
    earlyFoldlc :: (a -> b -> Either c a) -> a -> c -> f b -> c
    earlyFoldl :: (a -> b -> Either a a) -> a -> f b -> a
    earlyFoldl h a f = earlyFoldlc h a a f
    {-# INLINE earlyFoldl #-}

instance EarlyFoldable [] where
    earlyFoldlc f a c [] = c
    earlyFoldlc f a c (b:bs) =
        case (f a b) of 
            Left c' -> c'
            Right a' -> earlyFoldlc f a' c bs

p62_cubes = scanl (+) 1 $ scanl (+) 7 $ scanl (+) 12 $ repeat 6
p62_make_key = undigitize . sortBy (flip compare) . digitize
p62_update_map n = M.insertWith (+) (p62_make_key n) 1
p62_early_update m n =
    case p62_update_map n m of
        m' | m' M.! (p62_make_key n) == 5 -> Left n
        m' -> Right m'

p62 = earlyFoldlc p62_early_update M.empty 0 p62_cubes.

Now, some explanation. I defined a type class for an early terminating fold, with the obvious instance for list. earlyFoldlc returns Either c a, where c is the return type, and a is an intermediate state: an accumulator. The definition of p62_cubes should be familiar from a previous post. p62_make_key simply makes an integer key out of an integer. (It sorts the values in descending order so that zeros aren't lost.) p62_update_map is a simple helper. This is a case where we can't use a standard left or right fold. We can't use a standard left fold because the list is infinitely long, and we can't use a right fold because we need to look at the accumulated value before deciding to return.

Sunday, April 10, 2011

Computing Integer powers without multiplication.

An interesting fact about the squares is that if you're computing a list of them, you can do it without multiplication. A quick examination of the squares and their differences shows how:

 3 5 7  9  11  13  15  17  19
1 4 9 16 25  36  49  64  81  100


Cursory examination shows that the differences of the differences is 2. We could exploit this to create an infinite list of the squares in Haskell:


squares = scanl (+) 1 $ scanl (+) 3 $ repeat 2


scanl is a cousin to foldl, but it computes all the intermediate results as well. (If this is unfamiliar, scans and folds are fairly simple ways to learn a little Haskell.)


For a programming task, I needed a list of all the cubes. It turns out that you can do something similar with the cubes.

    6   6   6   6    6
  12  18  24  30   36    42 
 7  19  37  61   91   127   169
1 8   27  64  125  216   343  512  729  1000


cubes = scanl (+) 1 $ scanl (+) 7 $ scanl (+) 12 $ repeat 6


In order to do the cubes, you have to have one more level of indirection. Being curious, I wanted to figure out where the inital values came from, and If we could generalize to any integer power of integers. I calculated out the values for n^4, and the starting values were 1,15,50,60,24. This, combined with the initial values for cubes, made me think of the Stirling numbers. It looks like the starting values are:

{n} * k!
{k}


Where n is the power, and k is the row, bottom first. You can prove this is true for any n. Consider the operator ∆: ∆f(x) = f(x+1) - f(x). (∆ is the h-derivative for h = 1). ∆(xn) is hard to compute, but the function x to the n falling: xn = x*(x-1)*(x-2)*...*(x-n) is much nicer. ∆xn = nxn-1. The Stirling numbers relate ordinary powers to falling powers. xn = ∑{n k}xk


Applying ∆ to the above equation, we get:
1xn = ∑k>1k{n k}xn-1 + 1!{n 1}.
Repeating, we get
2xn = ∑k>2k(k-1){n k}xn-2 + 2!{n 2}.
By extension,
j(xn) = ∑k>jk!/(k-j)!{n k}xn-l + j!{n j}.
n(xn) = n!{n n} = n!.

A quick test on n = 5 helps to verify this. The stirling numbers for n = 5 are: 1, 31, 90, 65, 15, 1. So the starting coefficients for n = 5 are 1*0! = 1, 31*1! = 31, 90 * 2! = 180, 65 * 3! = 390, 15 * 4! = 360, 1 * 5! = 120.


            120
         360   480
      390   750    1330
   180   570   1320    2550
 31   211   781    2101    4651
1   32   243   1024    3125    7776


Yep, everything looks like 5th powers.

Saturday, April 9, 2011

On Iteratee's, part I

There's a concept that's been explored recently in Haskell, the idea of enumerated IO. I'd like to go over the idea in detail.

Folds.

The basic idea is that of a fold. In Haskell, there are two basic folds over lists.

foldr :: (b->a->a) -> a -> [b] -> a
foldr f y0 [] = y0 -- folding an empty list returns the base case.
foldr f y0 (x:xs) = f x (foldr f y0 xs)


-- Note that outermost is f. This is important in Haskell.
-- You can right fold over an infinite list, but not a left fold.


foldl :: (a->b->a) -> a -> [b] -> a
foldl f y0 [] = y0 -- folding an empty list returns the base case.
foldl f y0 (x:xs) = foldl (f x y0) xs


-- There's also a strict version foldl'.


Read "foldr ::" as "foldr has the type". foldr takes a function, a value of type a, a list of b's, and returns an a.
foldl is similar. For this post, we'll mainly deal with left folds. A good example to start with is the function sum:


sum :: [Int] -> Int  --sum maps a list of Ints to an Int
sum xs = foldl (+) 0 xs


reverse :: [a] -> a
reverse as = foldl (flip (:)) [] as


A left fold has the advantage that it uses an accumulating parameter. It gets compiled down to a tight loop. You would write sum as a strict left fold. A right fold has the advantage that it can skip part of the input. You would write and as a right fold.

and :: [Bool] -> Bool
and as = foldr (&&) True as.


This has the nice property that it returns as soon as it sees a False (short circuit evaluation).
There's a way to get both properties in a left fold. It's called an early terminating left fold.

etfoldl :: (a->b->(a, Bool)) -> a -> [b] -> a
etfoldl f y0 [] = y0
etfoldl f y0 (x:xs) =
  case (f y0 x) of
    (y, True) -> y  -- True means we're done.
    (y, False) -> etfoldl f y xs


I used it to solve some problems where I was looking for the smallest value that satisfied some condition, but I had to chain some data structure along. Early terminating folds are a good place to stop.


Next up, Iteratees, A generalization of early terminating folds.
                          

Friday, April 8, 2011

Blog Title

A few of my readers may be wondering: "What is a zeppelin bend?" It's a knot.
It gets more nerdy. The reason I chose a knot, is because one of the coolest things you'll encounter if you learn Haskell is Knot Tying, which is essentially passing the result of a computation into the computation, (Think like a credit card, buy now, pay later). It works as long as you don't peek at the result. You can use it to create loops in data structures in a pure way. In most other languages, you would just use mutable state.

My work on RestyGWT

For those that aren't in the know about Java and Google, GWT is the Google Web Toolkit. It's a magical framework that turns Java into JavaScript, and wishes into rainbows. One cool feature is that you can write server rpc's that have the same interface in Java and JavaScript.  This works great if you only intend to write your server in Java, or your clients in GWT.  I didn't want that constraint, so I looked around and found RestyGWT, which does similar magic to allow you to call REST interfaces just like they were java methods.

I added the ability to send and receive overlay types, which get compiled to native javascript objects. Basically you use the JSON data returned via rest as is, rather than processing into a Java Object, which is a little more heavy weight. It's also useful in cases where the server and client don't share a codebase, or even the same language. Since you'd have to model the data in Java anyway, it might as well be an overlay type.