2009-05-13

Okasaki's Random Access Lists

To close the chapter on Okasaki's Random Access Lists, I must record that there is an errata on the bottom of page 145 of his Purely Functional Data Structures. The argument to the recursive call to update it must read Zero (update (i div 2, p, ps)) end. It took me a while to figure it out, thinking that I had made a mistake.

For completeness, here's the translation for the chapter's pseudo-SML into functional, typesafe OCaml:

module Vec : sig
  type 'a t
  val nil : 'a t
  val cons : 'a -> 'a t -> 'a t
  val head : 'a t -> 'a
  val tail : 'a t -> 'a t
  val length : 'a t -> int
  val index : int -> 'a t -> 'a
  val update : int -> 'a -> 'a t -> 'a t
end = struct
  type 'a t = Nil | Zero of ('a * 'a) t | One  of 'a * ('a * 'a) t

  let nil = Nil

  type cons = { cons : 'a . 'a -> 'a t -> 'a t }
  let cons =
    let rec cons = { cons = fun x l -> match l with
    | Nil         -> One (x, Nil)
    | Zero    ps  -> One (x, ps)
    | One (y, ps) -> Zero (cons.cons (x, y) ps)
    } in cons.cons

  type uncons = { uncons : 'a . 'a t -> 'a * 'a t }
  let uncons =
    let rec uncons = { uncons = function
    | Nil          -> failwith "uncons"
    | One (x, Nil) -> (x, Nil)
    | One (x, ps ) -> (x, Zero ps)
    | Zero    ps   ->
      let ((x, y), ps) = uncons.uncons ps in (x, One (y, ps))
    } in uncons.uncons

  let head l = fst (uncons l)
  and tail l = snd (uncons l)

  type length = { length : 'a . 'a t -> int }
  let length v =
    let rec length = { length = function
    | Nil         -> 0
    | One (_, ps) -> 1 + length.length (Zero ps)
    | Zero    ps  -> 2 * length.length ps
    } in length.length v

  type index = { index : 'a . int -> 'a t -> 'a }
  let index n =
    let rec index = { index = fun n l -> match l with
    | Nil                    -> failwith "index"
    | One (x, ps) when n = 0 -> x
    | One (x, ps)            -> index.index (n - 1) (Zero ps)
    | Zero    ps             ->
      let (l, r) = index.index (n / 2) ps in
      if n mod 2 = 0 then l else r
    } in index.index n

  type update = { update : 'a . int -> ('a -> 'a) -> 'a t -> 'a t }
  let update n e =
    let rec update = { update = fun n f l -> match l with
    | Nil                    -> failwith "update"
    | One (x, ps) when n = 0 -> One (f x, ps)
    | One (x, ps)            -> cons x (update.update (pred n) f (Zero ps))
    | Zero ps                ->
      let g (x, y) = if n mod 2 = 0 then (f x, y) else (x, f y) in
      Zero (update.update (n / 2) g ps)
    } in update.update n (fun _ -> e)
end

Enjoy!

3 comments:

SpiceGuid said...

Thanks, that's a nice nested datatype full-featured example.
Does a fast append exist or just the naive one ?

As i understand it the skewBinaryList allows a fast append yet it's not a nested datatype. I guess your module is exactly what Mike Furr mentions as the missing array module in OCaml-Reins. OCaml-Reins is so much inspired by Okasaki that i would describe it as Okasaki for OCaml.

Matías Giovannini said...

I guess (I haven't tried) that you could write a O(log n) append by emulating a full adder.

The problem with this datatype is that both cons and uncons are O(log n) worst-case. Skew binary random-access lists have two advantages: O(1) cons and uncons, and no special type discipline required to implement the datatype.

As it is, this is just to (1) record the erratum I found in the book, and (2) as a simple example of how to encode structural polymorphism, perhaps to implement more advanced data structures (I'll see how far I can get with regular quadtrees).

SpiceGuid said...

I have some more comments, sorry for the late reply.

1) no special type discipline is required to implement this datatype, nested datatypes allow to statically enforce some invariants, i don't think they really contribute in the performance department.

type 'a tree =
| Leaf of 'a
| Node of 'a tree * 'a tree

type 'a seq =
| Nil
| Zero of 'a seq
| One of 'a tree * 'a seq

let cons x l =
let rec loop x = function
| Nil -> One (x,Nil)
| Zero ps -> One (x,ps)
| One (y,ps) -> Zero (loop (Node(x,y)) ps)
in loop (Leaf x) l


2) i have implemented the full adder operation as you suggested. the motivation is that if i provide a fold then a slow uncons is no longer such a big deal, and the slow cons could be compensated by a fast append plus an unfold.

let rec add a b carry =
match carry with
| None ->
( match a,b with
| _,Nil -> a
| Nil,_ -> b
| One(x,p),Zero q | Zero p,One(x,q) -> One(x,add p q None)
| Zero p,Zero q -> Zero(add p q None)
| One(x,p),One(y,q) -> Zero(add p q (Some(Node(x,y)))) )
| Some c ->
( match a,b with
| Nil,Nil -> One(c,Nil)
| Zero p,Nil | Nil,Zero p -> One(c,p)
| One(x,p),Nil | Nil,One(x,p) -> Zero(add p Nil (Some(Node(x,c))))
| One(x,p),Zero q | Zero p,One(x,q) -> Zero(add p q (Some(Node(x,c))))
| Zero p,Zero q -> One(c,add p q None)
| One(x,p),One(y,q) -> One(c,add p q (Some(Node(x,y)))) )

let merge a b = add a b None

unfortunately, as the name suggests, my merge operation is not what expected, all the items are chained but they are not in the append order.