2011-12-30

(One by) Four by Nine


The four nines puzzle asks which positive numbers can be obtained from arithmetic expressions involving four "9" digits and an assortment of operations, minimally "+", "-" (binary and unary), "×" and "/", also frequently "√" and sometimes "!" (factorial). I'll show how to find them by brute force. In this case, I'll forgo using factorials; this means that not every number under 100 can be so obtained. As it is, at 130-odd lines, this is a longish program.


Expressions are labeled trees, where type α is the type of labels and type β is the type of leaves:


type ('a, 'b) expr =
| Con of 'a * 'b
| Neg of 'a * ('a, 'b) expr
| Sqr of 'a * ('a, 'b) expr
| Add of 'a * ('a, 'b) expr * ('a, 'b) expr
| Sub of 'a * ('a, 'b) expr * ('a, 'b) expr
| Mul of 'a * ('a, 'b) expr * ('a, 'b) expr
| Div of 'a * ('a, 'b) expr * ('a, 'b) expr

Later I'll make clear what labels are useful for. For now, the simplest operation on an expression is extracting its label:


let label = function
| Con (l, _    )
| Neg (l, _    )
| Sqr (l, _    )
| Add (l, _, _ )
| Sub (l, _, _ )
| Mul (l, _, _ )
| Div (l, _, _ ) -> l

Note that this is not a recursive function; it just extracts the label of the root. Converting expression trees into algebraic syntax is a tedious exercise in unparsing a precedence operator grammar. In this case, since the leaves are of an arbitrary type, format needs a conversion function cnv:


let format cnv e =
  let buf = Buffer.create 10 in
  let rec go level e =
    let prf op prec e =
      Buffer.add_string buf op;
      if prec < level then Buffer.add_char buf '(';
      go prec e;
      if prec < level then Buffer.add_char buf ')'
    in
    let bin op prec e e' =
      if prec < level then Buffer.add_char buf '(';
      go prec e;
      Buffer.add_char   buf ' ';
      Buffer.add_string buf op;
      Buffer.add_char   buf ' ';
      go prec e';
      if prec < level then Buffer.add_char buf ')'
    in match e with
    | Con (_, x    ) -> Buffer.add_string buf (cnv x)
    | Neg (_, e    ) -> prf "-" 10 e
    | Sqr (_, e    ) -> prf "\xe2\x88\x9a" 10 e
    | Add (_, e, e') -> bin "+"  1 e e'
    | Sub (_, e, e') -> bin "-"  2 e e'
    | Mul (_, e, e') -> bin "*"  5 e e'
    | Div (_, e, e') -> bin "/"  6 e e'
  in go 0 e; Buffer.contents buf

The inner function prf formats prefix operators with precedence prec, while bin formats binary operators. In both cases, if op's binding power prec is less than the current precedence level, the whole expression is surrounded by parentheses. Note that I use the UTF-8 representation of the radical sign U+221A; who said that OCaml doesn't do Unicode?


If expressions are labeled with their values, they can be evaluated in constant time with label; to avoid losing precision, I use rational labels. If any expression is out of range, I use the "fraction" 1/0, infinity_ratio, as a sentinel. For this to work with OCaml ratios, I must turn off error checking on null denominators:


let () = Arith_status.set_error_when_null_denominator false

let infinity_ratio = Ratio.(inverse_ratio (ratio_of_int 0))

How to compute the square root of a fraction? If both the numerator and denominator are positive perfect squares, the answer is clear. In any other case, I signal failure via infinity_ratio:


let sqrt_ratio r =
  let open Ratio in
  let open Big_int in
  match sign_ratio r with
  | -1 -> infinity_ratio
  |  0 -> r
  |  1 ->
    let r1, r2 = numerator_ratio r, denominator_ratio r in
    let q1, q2 = sqrt_big_int   r1, sqrt_big_int     r2 in
    let s1, s2 = square_big_int q1, square_big_int   q2 in
    if eq_big_int r1 s1 && eq_big_int r2 s2
      then div_big_int_ratio q1 (ratio_of_big_int q2)
      else infinity_ratio
  |  _ -> assert false

Low-level but straightforward. Now smart constructors make sure that every expression is correctly labeled with its value:


let con n    = Con (Ratio.ratio_of_int        n            , n    )
and neg e    = Neg (Ratio.minus_ratio  (label e)           , e    )
and sqr e    = Sqr (      sqrt_ratio   (label e)           , e    )
and add e e' = Add (Ratio.add_ratio    (label e) (label e'), e, e')
and sub e e' = Sub (Ratio.sub_ratio    (label e) (label e'), e, e')
and mul e e' = Mul (Ratio.mult_ratio   (label e) (label e'), e, e')
and div e e' = Div (Ratio.div_ratio    (label e) (label e'), e, e')
and abs e    =
  let r = label e in
  if Ratio.sign_ratio r != -1 then e else
  Neg (Ratio.minus_ratio r, e)

The stage is set for generating all the expressions. If nines size generates all the expressions using size nines, the recursion schema it obeys is something like this:


  if size == 1 then [   9] else
  if size == 2 then [  99] @ mix (nines 1) (nines 1) else
  if size == 3 then [ 999] @ mix (nines 2) (nines 1) @ mix (nines 1) (nines 2) else
  if size == 4 then [9999] @ mix (nines 3) (nines 1) @ mix (nines 2) (nines 2) @ mix (nines 1) (nines 3) else...

where mix is a hypothetical function that merges two lists of expressions using all the possible binary operators. In each case, the constant 10size - 1 = 99…9 is the base of the recursion, which proceeds by building all binary trees of a given size by partitioning size in two summands. OK, maybe it is simpler to show the actual code than trying to explain it in English. A small function generates the number having its n digits equal to d:


let rec rep d n = if n == 0 then 0 else d + 10 * (rep d (pred n))

(Yes, the same code can solve the four fours puzzle, or the nine nines puzzle, or…) Another basic function generates a list of integers in a given range:


let range i j =
  let rec go l i j =
    if i > j then l else go (j :: l) i (pred j)
  in go [] i j

(This one is tail recursive just because.) Now, given an expression e, I will count it as valid by adding it to the list es of expressions if it is finite and positive. Furthermore, if it is valid and it has a rational square root, I will count the latter as valid too:


let with_root e es =
  let open Ratio in
  let e = abs e in
  if null_denominator (label e)
  || sign_ratio (label e) == 0 then es else
  let q = sqr e in
  if null_denominator (label q) then e :: es else q :: e :: es

Finally, the workhorse:


let rec puzzle digit size =
  List.fold_right (fun i ->
    let j = size - i in
    List.fold_right (fun e0 ->
      List.fold_right (fun e1 ->
        List.fold_right (fun op ->
          with_root (op e0 e1)
        ) [add; sub; mul; div]
      ) (puzzle digit j)
    ) (puzzle digit i)
  ) (range 1 (size - 1))
  (with_root (con (rep digit size)) [])

It looks more complicated than it is, really; just a list comprehension in a different shape that forces one to read it from the outside in, alas. It all starts with the constant "dd… d", together with_(its)root if it is valid, as a seed for the list of expressions. Then for each i in the range from 1 to size - 1, it recursively builds solutions e0 of size i and e1 of size j = size - i. For each binary operation op, it builds the expression op e0 e1 and adds it to the resulting list of solutions, together with_(its)root if they are valid.


Almost done! The problem is that there could be many possible expressions for a given number; it would be best to find just one exemplar for it. I've written about grouping lists in another era:


let rec group ?(by=(=)) =
  let rec filter e l = match l with
  | [] -> [], []
  | x :: xs ->
    if not (by e x) then [], l else
    let gs, ys = filter e xs in
    x :: gs, ys
  in function
  | [] -> []
  | x :: xs ->
    let gs, ys = filter x xs in
    (x :: gs) :: group ~by ys

A bit of syntax will let me build a filtering pipeline to select the best candidates:


let (|>) x f = f x

Best in this case means that, out of all the expressions evaluating to a given integer, I prefer the one having the shortest representation. Now from all expressions I filter those that have integral value, decorate the expression as a string with its value, sort them and group them by value and select the first:


let fournines =
  let cmp (n, s) (n', s') =
    let c = Pervasives.compare n n' in
    if c != 0 then c else
    let c = Pervasives.compare (String.length s) (String.length s') in
    if c != 0 then c else
    Pervasives.compare s s'
  in puzzle 9 4
  |> List.filter (fun e ->  Ratio.is_integer_ratio (label e))
  |> List.map    (fun e -> (Ratio.int_of_ratio (label e), format string_of_int e))
  |> List.sort cmp
  |> group ~by:(fun (n, _) (n', _) -> n == n')
  |> List.map List.hd

Very operational. It only remains to format the list to standard output:

let () = List.iter (fun (n, s) -> Printf.printf "%4d = %s\n" n s) fournines

Behold, in all its glory, the solution to the puzzle:


11 =99 / 99
22 =99 / 9 - 9
33 =(9 + 9 + 9) / 9
44 =9 / 9 + 9 / √9
55 =9 - 9 / 9 - √9
66 =9 * 9 / 9 - √9
77 =9 - (9 + 9) / 9
88 =99 / 9 - √9
99 =√(99 - 9 - 9)
1010 =(99 - 9) / 9
1111 =(9 + 9) / 9 + 9
1212 =(9 + 99) / 9
1313 =9 + 9 / 9 + √9
1414 =99 / 9 + √9
1515 =9 + 9 - 9 / √9
1617 =9 + 9 - 9 / 9
1718 =99 - 9 * 9
1819 =9 + 9 + 9 / 9
1920 =9 + 99 / 9
2021 =9 + 9 + 9 / √9
2124 =99 / √9 - 9
2226 =9 * √9 - 9 / 9
2327 =9 * 9 * √9 / 9
2428 =9 * √9 + 9 / 9
2530 =(99 - 9) / √9
2632 =(99 - √9) / √9
2733 =99 * √9 / 9
2834 =(99 + √9) / √9
2936 =9 + 9 + 9 + 9
3039 =9 * √9 + 9 + √9
3142 =9 + 99 / √9
3245 =9 * √9 + 9 + 9
3351 =(9 + 9) * √9 - √9
3454 =9 * 9 - 9 * √9
3557 =(9 + 9) * √9 + √9
3663 =9 * 9 - 9 - 9
3769 =9 * 9 - 9 - √9
3872 =99 - 9 * √9
3975 =9 * 9 - 9 + √9
4078 =9 * 9 - 9 / √9
4180 =9 * 9 - 9 / 9
4281 =99 - 9 - 9
4382 =9 * 9 + 9 / 9
4484 =9 * 9 + 9 / √9
4587 =99 - 9 - √9
4690 =(9 + 9 / 9) * 9
4793 =99 - 9 + √9
4896 =99 - 9 / √9
4998 =99 - 9 / 9
5099 =9 * 99 / 9
51100 =9 / 9 + 99
52102 =9 / √9 + 99
53105 =9 + 99 - √9
54108 =99 + √(9 * 9)
55111 =999 / 9
56117 =9 + 9 + 99
57126 =9 * √9 + 99
58135 =(9 + 9 - √9) * 9
59144 =(9 + √9) * (9 + √9)
60153 =(9 + 9) * 9 - 9
61159 =(9 + 9) * 9 - √9
62162 =9 * 9 + 9 * 9
63165 =(9 + 9) * 9 + √9
64171 =(9 + 9) * 9 + 9
65180 =9 * 9 + 99
66189 =(9 + 9 + √9) * 9
67198 =99 + 99
68216 =(9 * 9 - 9) * √9
69234 =9 * 9 * √9 - 9
70240 =9 * 9 * √9 - √9
71243 =(9 + 9 + 9) * 9
72246 =9 * 9 * √9 + √9
73252 =9 * 9 * √9 + 9
74270 =(99 - 9) * √9
75288 =99 * √9 - 9
76294 =99 * √9 - √9
77297 =9 * 99 / √9
78300 =99 * √9 + √9
79306 =9 + 99 * √9
80324 =(9 + 99) * √9
81333 =999 / √9
82486 =(9 + 9) * 9 * √9
83594 =(9 - √9) * 99
84648 =(9 * 9 - 9) * 9
85702 =(9 * 9 - √9) * 9
86720 =9 * 9 * 9 - 9
87726 =9 * 9 * 9 - √9
88729 =9 * 9 * √(9 * 9)
89732 =9 * 9 * 9 + √9
90738 =9 * 9 * 9 + 9
91756 =(9 * 9 + √9) * 9
92810 =(99 - 9) * 9
93864 =(99 - √9) * 9
94882 =9 * 99 - 9
95888 =9 * 99 - √9
96891 =99 * √(9 * 9)
97894 =9 * 99 + √9
98900 =9 * 99 + 9
99918 =(99 + √9) * 9
100972 =(9 + 99) * 9
101990 =999 - 9
102996 =999 - √9
1031002 =999 + √9
1041008 =9 + 999
1051188 =(9 + √9) * 99
1061458 =(9 + 9) * 9 * 9
1071782 =(9 + 9) * 99
1082187 =9 * 9 * 9 * √9
1092673 =9 * 99 * √9
1102997 =999 * √9
1116561 =9 * 9 * 9 * 9
1128019 =9 * 9 * 99
1138991 =9 * 999
1149801 =99 * 99
1159999 =9999

(The table is built out of the actual output to Mac OS Terminal. The Unicode characters are printed perfectly.) Using factorials to fill in the gaps is left as an exercise to the reader (it is not simple).

2011-12-29

Vose's Alias Method

This is a note regarding the implementation of Vose's Method to construct the alias tables for non-uniform discrete probability sampling presented by Keith Schwartz (as of 2011-12-29). Mr. Schwartz otherwise very informative write-up contains an error in the presentation of the Method. Step 5 of the algorithm fails if the Small list is nonempty but the Large list is. This can happen either:


  1. if the initial probabilities sum to less than 1, or
  2. if updating in step 5.5 the Large probability results in cancellation so that pg results in an approximation less than its true value

The first case can be mitigated by scaling by n / ∑ pj in step 3. The second case can be mitigated by replacing 5.5 by "Set pg := (pg + pl) - 1." In any case, there is no proof that the algorithm as shown is correct, which is not the case for Vose's presentation. In the interest of correctness and completeness, here is an OCaml implementation of Vose's Method:


type alias = { n : int; prob : float array; alias : int array; }

let alias pa =
  let rec split pa n j (small, large as part) =
    if j == n then part else
    if pa.(j) > 1.
      then split pa n (succ j) (     small, j :: large)
      else split pa n (succ j) (j :: small,      large)
  in
  let rec init r pa part = match part with
  | j :: small, k :: large ->
    r.prob.(j)  <- pa.(j);
    r.alias.(j) <- k;
    pa.(k)      <- (pa.(k) +. pa.(j)) -. 1.;
    if pa.(k) > 1.
      then init r pa (small, k :: large)
      else init r pa (k :: small, large)
  | j :: small, []         ->
    r.prob.(j) <- 1.;
    init r pa (small, [])
  | []        , k :: large ->
    r.prob.(k) <- 1.;
    init r pa ([], large)
  | []        , []         -> r
  in
  let n  = Array.length pa in
  if n == 0 then invalid_arg "alias" else
  let sc = float n /. Array.fold_left (fun s p ->
    if p < 0. then invalid_arg "alias" else
    s +. p) 0. pa in
  let sa = Array.map (( *. ) sc) pa in
  let r  = { n; prob = Array.make n 0.; alias = Array.make n (-1); } in
  init r sa (split sa n 0 ([], []))

let choose r =
  let p, e = modf (Random.float (float r.n)) in
  let j = truncate e in
  if p <= r.prob.(j) then j else r.alias.(j)

Since this algorithm is recursive, it might not be immediately obvious how to translate it to more mainstream languages. A perhaps more faithful rendition in rather low-level Java is the following:


import java.util.Random;

public class Vose {
  private final Random random;
  private final int limit;
  private final double[] prob;
  private final int[] alias;

  public Vose(final double[] pa) {
    this(pa, new Random());
  }

  public int getLimit() {
    return limit;
  }

  public Vose(final double[] pa, final Random random) {
    final int limit = pa.length;
    if (limit == 0)
      throw new IllegalArgumentException("Vose");
    double sum = 0;
    for (int j = 0; j != limit; j++) {
      if (pa[j] < 0)
        throw new IllegalArgumentException("Vose");
      sum += pa[j];
    }
    final double scale = (double) limit / sum;
    final double[] sa = new double[limit];
    for (int j = 0; j != limit; j++)
      sa[j] = pa[j] * scale;

    this.random = random;
    this.limit  = limit;
    this.prob   = new double[limit];
    this.alias  = new int[limit];
    init(sa);
  }

  private void init(final double[] sa) {
    final int[] small = new int[limit];
    final int[] large = new int[limit];
    int ns = 0;
    int nl = 0;
    for (int j = 0; j != limit; j++)
      if (sa[j] > 1)
        large[nl++] = j;
      else
        small[ns++] = j;
    while (ns != 0 && nl != 0) {
      final int j = small[--ns];
      final int k = large[--nl];
      prob[j]  = sa[j];
      alias[j] = k;
      sa[k]    = (sa[k] + sa[j]) - 1; // sic
      if (sa[k] > 1)
        large[nl++] = k;
      else
        small[ns++] = k;
    }
    while (ns != 0)
      prob[small[--ns]] = 1;
    while (nl != 0)
      prob[large[--nl]] = 1;
  }

  public int choose() {
    final double u = limit * random.nextDouble();
    final int    j = (int) Math.floor(u);
    final double p = u - (double) j;
    return p <= prob[j] ? j : alias[j];
  }
}

In all generality, it is sufficient that the "probabilities" be non-negative, since dividing the values by the mean normalizes them so that the conditions of the Method are satisfied.


References


  1. Michael D. Vose. A Linear Algorithm For Generating Random Numbers With a Given Distribution, IEEE TRANSACTIONS ON SOFTWARE ENGINEERING VOL. 17, NO. 9 SEPTEMBER 1991. Online.

2011-12-21

A Better (Gauss) Error Function

If you do statistics you know of erf and erfc; if you work in OCaml you surely miss them. It is not very difficult to port the canonical implementation given by Numerical Recipes (which I won't show and not just for licensing reasons); if you Google for a coefficient you'll see that this approximation is, indeed, ubiquitous. There exists a better approximation in the literature, one that is more than 40 years old. Since I find it inconceivable that it is not more widely used (again, Google for a coefficient), I present a straightforward implementation of it.


This approximation boasts a relative error of 10-19.5 in the range |x| ≤ 0.46875 = 15/32, of 10-18.59 in the range 0.46875 ≤ |x| ≤ 4, and of 10-18.24 in the range 4 ≤ |x|. The only difficulty is to accurately transcribe the coefficient tables into code; I've double-checked them and ran some tests to assess the proper functioning of the code:


let r0 = [|
3.20937_75891_38469_47256_2e+03, 2.84423_68334_39170_62227_3e+03;
3.77485_23768_53020_20813_7e+02, 1.28261_65260_77372_27564_5e+03;
1.13864_15415_10501_55649_5e+02, 2.44024_63793_44441_73305_6e+02;
3.16112_37438_70565_59694_7e+00, 2.36012_90952_34412_09349_9e+01;
1.85777_70618_46031_52673_0e-01, 1.00000_00000_00000_00000_0e+00;
|]

let r1 = [|
1.23033_93547_97997_25272e+03, 1.23033_93548_03749_42043e+03;
2.05107_83778_26071_46532e+03, 3.43936_76741_43721_63696e+03;
1.71204_76126_34070_58314e+03, 4.36261_90901_43247_15820e+03;
8.81952_22124_17690_90411e+02, 3.29079_92357_33459_62678e+03;
2.98635_13819_74001_31132e+02, 1.62138_95745_66690_18874e+03;
6.61191_90637_14162_94775e+01, 5.37181_10186_20098_57509e+02;
8.88314_97943_88375_94118e+00, 1.17693_95089_13124_99305e+02;
5.64188_49698_86700_89180e-01, 1.57449_26110_70983_47253e+01;
2.15311_53547_44038_46343e-08, 1.00000_00000_00000_00000e+00;
|]

let r2 = [|
-6.58749_16152_98378_03157e-04, 2.33520_49762_68691_85443e-03;
-1.60837_85148_74227_66278e-02, 6.05183_41312_44131_91178e-02;
-1.25781_72611_12292_46204e-01, 5.27905_10295_14284_12248e-01;
-3.60344_89994_98044_39429e-01, 1.87295_28499_23460_47209e+00;
-3.05326_63496_12323_44035e-01, 2.56852_01922_89822_42072e+00;
-1.63153_87137_30209_78498e-02, 1.00000_00000_00000_00000e+00;
|]

The tables are laid out as pairs of coefficients (pi, qi) for the numerator and denominator polynomials, respectively, in ascending order of monomial power (that is, the independent coefficients are in the first position of the arrays). The rational functions can be evaluated with a suitable variation of Horner's schema:


let horner2 r x =
  let n = Array.length r in
  let s = ref 0.
  and t = ref 0. in
  for i = n - 1 downto 0 do
    let p, q = Array.unsafe_get r i in
    s := !s *. x +. p;
    t := !t *. x +. q
  done;
  !s /. !t

For clarity I haven't inlined Horner's recursion in the following code; this is not very difficult to do, so I've left it as an exercise for the implementor:


let iqpi = 5.64189_58354_77562_86948_1e-01

let erfc x =
  let z  = abs_float x in
  let z2 = z *. z in
  let y =
    if z < 0.46875 then   1. -.         z   *. horner2 r0 z2 else
    if z < 4.      then         exp (-. z2) *. horner2 r1 z  else
    let z'  = 1. /. z  in
    let z'2 = z' *. z' in       exp (-. z2) *. z' *. (iqpi +. z'2 *. horner2 r2 z'2)
  in if x < 0. then 2. -. y else y

As explained in the paper, erfc x must be everywhere equal to 1. -. erf x:


let erf x = 1. -. erfc x

If you have Mathematica™ lying around, you might want to output a list of points to test the accuracy of the approximation:


let mma f x0 x1 n =
  let buf = Buffer.create 256 in
  Buffer.add_string buf "l = Map[N[\
(1 - 2 BitAnd[1, BitShiftRight[#, 63]])\
2^(BitAnd[16^^7ff, BitShiftRight[#, 52]] - 1023)\
(1 +  2^-52 BitAnd[16^^fffffffffffff, #]),\
$MachinePrecision]&, #, {2}]&@{";
  for i = 0 to n do
    let x = (x0 *. float (n - i) +. x1 *. float i) /. float n in
    let y = f x in
    if i != 0 then Buffer.add_char buf ',';
    Printf.bprintf buf "{16^^%Lx,16^^%Lx}"
      (Int64.bits_of_float x)
      (Int64.bits_of_float y)
  done;
  Buffer.add_string buf "};";
  Buffer.contents buf

In order to exactly marshal the IEEE 754 double values I convert them to rational numbers equivalent to machine floats on Mathematica's end. My informal tests show that around the cut-off points Mathematica evaluates erf to the same exact values this code does (that is, the absolute error is zero). When evaluated to sufficient precision, the maximum absolute error among 5,000 equispaced points in the interval [3.99, 4.01] is 5.5437×10-17.


To mathematical libraries implementors everywhere, I urge you: please steal this code!


References


  1. W. J. Cody. Rational Chebyshev Approximations for the Error Function, Mathematics of Computation Vol. 23, No. 107 (Jul., 1969), pp. 631-63. Online.

2011-11-17

Brainfuck in Java

(I don't feel like writing a punning title.) Last time I showed how the use of functors allows for modular and type-safe specification of semantics. I wrote I can't imagine this flexibility in object-oriented languages like Java or Python; fighting words for which I was justly taken to task. Here's my penance: a morally equivalent reimplementation in Java of the Brainfuck interpreter. The translation was mechanic, but I realized with some surprise that the result is more flexible than the original OCaml: whereas the latter only allows static composition of semantics (unless you use first-class modules), Java allows fully dynamic, run-time stacking of semantics.

Even though the Java is better decoupled than the original (which uses the same memory representation for every interpreter variant), this is more by necessity than design. In the process, the interpreter suffered more than a 300% code expansion by line count at 520 lines. Bear with me! This is a single-file, multiple-class program; feel free to package and refactor it to your heart's content. The external requirements are minimal:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

As advertised, the top-level Brainfuck class parses, prints and evaluates Brainfuck programs (a List of Instructions) using fully compositional semantics:

public final class Brainfuck {
    public static void main(String[] args) {
        String hello =
">+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.>>>++++++++[<++++>-]"+
"<.>>>++++++++++[<+++++++++>-]<---.<<<<.+++.------.--------.>>+.";
        Semantics semantics = new S3_(5000, new S2(new S1_(new S0())));
        List<Instruction> program = Brainfuck.parse(hello);
        System.out.print(Brainfuck.print(program));
        Brainfuck.evaluate(semantics, program);
    }

The class is completely static and reentrant; the main operations delegate to appropriate visitors:

private Brainfuck() { }

    public static void evaluate(Semantics semantics, List<Instruction> program) {
        new Evaluator(semantics).evaluate(program);
    }

    public static String print(List<Instruction> program) {
        return new Printer().print(program);
    }

The parser is standard, recursive-descent but with explicitly-managed recursion. For simple (concrete) instructions it just adds the corresponding Instruction to the current program. When it sees an opening bracket it uses a Stack to hold the partially-parsed List of Instructions and starts fresh. When it encounters a closing bracket, it builds a Repeat instruction from the just-parsed program, pops the context and adds the former to the latter:

public static List<Instruction> parse(String text) {
        final Stack<List<Instruction> > stack = new Stack<List<Instruction> >();
        List<Instruction> program = new ArrayList<Instruction>();

        for (int i = 0; i != text.length(); i++)
            switch (text.charAt(i)) {
            case '>': program.add(Instruction.FWD); break;
            case '<': program.add(Instruction.BAK); break;
            case '+': program.add(Instruction.INC); break;
            case '-': program.add(Instruction.DEC); break;
            case '.': program.add(Instruction.OUT); break;
            case ',': program.add(Instruction.INP); break;
            case '[':
                stack.push(program);
                program = new ArrayList<Instruction>();
                break;
            case ']':
                if (stack.empty())
                    throw new IllegalArgumentException("unmatched `]'");
                final Instruction rep = Instruction.REP(program);
                program = stack.pop();
                program.add(rep);
                break;
            }
        if (!stack.empty())
            throw new IllegalArgumentException("unmatched `['");
        return program;
    }
}

A textbook example that completes the class. Now for the real program. An Instruction is an abstract class implementing the case class pattern à la Scala (cf Wadler's expression problem). Every instruction supports two operations, evaluation and printing, by using the visitor pattern:

abstract class Instruction {
    private Instruction() { }

    public abstract void evaluate(Evaluator evaluator);
    public abstract void print(Printer printer);

Each concrete Instruction is tucked away inside this class. They simply forward the call to the passed-in visitor. The first six are mind-numbingly straightforward:

private static final class Forward extends Instruction {
        private Forward() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateForward();
        }

        @Override
        public void print(Printer printer) {
            printer.printForward();
        }

    private static final class Backward extends Instruction {
        private Backward() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateBackward();
        }

        @Override
        public void print(Printer printer) {
            printer.printBackward();
        }

    private static final class Increment extends Instruction {
        private Increment() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateIncrement();
        }

        @Override
        public void print(Printer printer) {
            printer.printIncrement();
        }

    private static final class Decrement extends Instruction {
        private Decrement() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateDecrement();
        }

        @Override
        public void print(Printer printer) {
            printer.printDecrement();
        }

    private static final class Output extends Instruction {
        private Output() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateOutput();
        }

        @Override
        public void print(Printer printer) {
            printer.printOutput();
        }

    private static final class Input extends Instruction {
        private Input() { }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateInput();
        }

        @Override
        public void print(Printer printer) {
            printer.printInput();
        }

The last Instruction is fractionally more interesting in that it has a program as a parameter:

private static final class Repeat extends Instruction {
        private final List<Instruction> program;

        private Repeat(List<Instruction> program) {
            this.program = program;
        }

        @Override
        public void evaluate(Evaluator evaluator) {
            evaluator.evaluateRepeat(program);
        }

        @Override
        public void print(Printer printer) {
            printer.printRepeat(program);
        }

These classes are fully private to the enclosing case class Instruction; the Brainfuck interpreter uses static members to refer to each:

public static final Instruction FWD = new Forward();
    public static final Instruction BAK = new Backward();
    public static final Instruction INC = new Increment();
    public static final Instruction DEC = new Decrement();
    public static final Instruction OUT = new Output();
    public static final Instruction INP = new Input();

    public static Instruction REP(List<Instruction> program) {
        return new Repeat(program);
    }
}

This is all the (essentially boilerplate) code necessary just to reproduce an algebraic data type over which OCaml code would pattern match. The visitors implement each arm of this match, so more boilerplate follows. The first one is the Evaluator:

class Evaluator {
    private final Semantics semantics;
    private final Memory memory;

It takes a Semantics and sets itself to operate upon an initial Memory:

public Evaluator(Semantics semantics) {
        this.semantics = semantics;
        this.memory = semantics.initial();
    }

This is all a bit abstract because both Semantics and Memory are interfaces that can be plugged-in at will. To evaluate a program, it visits each Instruction in turn:

public void evaluate(List<Instruction> program) {
        for (Instruction instruction : program)
            instruction.evaluate(this);
    }

Again, this is a bog-standard visitor. Each visiting method simply delegates to the chosen Semantics with the current state of the Memory:

public void evaluateForward() {
        semantics.forward(this.memory);
    }

    public void evaluateBackward() {
        semantics.backward(this.memory);
    }

    public void evaluateIncrement() {
        semantics.increment(this.memory);
    }

    public void evaluateDecrement() {
        semantics.decrement(this.memory);
    }

    public void evaluateOutput() {
        semantics.output(this.memory);
    }

    public void evaluateInput() {
        semantics.input(this.memory);
    }

The Repeat instruction is an Action over a Memory that corresponds to evaluate-ing the repeated program:

public void evaluateRepeat(final List<Instruction> program) {
        semantics.repeat(new Action<Memory>() {
            @Override
            public void apply(Memory _unused) {
                evaluate(program);
            }
        }, this.memory);
    }
}

(Why is the Memory _unused? Because evaluate already has it in scope, but the Semantics are fully parametric with respect to the store. If you find an elegant solution to this that doesn't involve each Instruction passing the Memory around, let me know.) This is the moral equivalent of a higher-order closure; someday you might be able to write a lambda instead of that. The second visitor is the Printer:

class Printer {
    private final StringBuffer buffer = new StringBuffer(72);
    private int linelen;

It formats the Brainfuck program in neat lines of 72 characters:

public Printer() {
        linelen = 0;
    }

    private void putc(char c)  {
        if (linelen == 72 || c == '\n') {
            buffer.append('\n');
            linelen = 0;
        }
        if (c != '\n') {
            buffer.append(c);
            linelen++;
        }
    }

The visitor sets the buffer up, prints each instruction by visiting them in turn and closes the buffer:

public String print(List<Instruction> program) {
        buffer.setLength(0);
        for (Instruction instruction : program)
            instruction.print(this);
        putc('\n');
        return buffer.toString();
    }

Each visited Instruction accumulates its representation in the buffer:

public void printForward()   { putc('>'); }
    public void printBackward()  { putc('<'); }
    public void printIncrement() { putc('+'); }
    public void printDecrement() { putc('-'); }
    public void printOutput()    { putc('.'); }
    public void printInput()     { putc(','); }

Aside: It is not the responsibility of the Instruction to know its external representation since it denotes Brainfuck's abstract syntax. Do not be misled by phony invocations to the Law of Demeter.

Repeat just emits its list of Instructions between brackets by mutual recursion between it and this visitor:

public void printRepeat(List<Instruction> program) {
        putc('[');
        for (Instruction instruction : program)
            instruction.print(this);
        putc(']');
    }
}

Neat. Now for the raison d'être of the exercise: A Semantics encapsulates the effects of each Brainfuck instruction upon the Memory:

interface Semantics {
    Memory initial();
    void forward(Memory memory);
    void backward(Memory memory);
    void increment(Memory memory);
    void decrement(Memory memory);
    void output(Memory memory);
    void input(Memory memory);
    void repeat(Action<Memory> program, Memory memory);
}

The effect of a repetition depends on the program being repeated; this is represented by an Action taking the Memory as a parameter:

interface Action<T> {
    void apply(T argument);
}

Aside: Don't confuse a program as a List of Instructions with its meaning as an Action or effect. It wouldn't be appropriate for repeat to take anything else.

In object-oriented languages there are two reuse mechanisms: inheritance and composition. The first is static, compile-time and determined by the providing (i.e. library) code; the second is dynamic, run-time and controlled by the consuming (i.e. client) code. Coplien's envelope pattern allows to combine both in a sort of dynamic inheritance:

abstract class DelegatingSemantics implements Semantics {
    private final Semantics delegate;

    public DelegatingSemantics(Semantics delegate) {
        this.delegate = delegate;
    }

    @Override
    public Memory initial()              { return delegate.initial(); }

    @Override
    public void forward(Memory memory)   { delegate.forward(memory); }

    @Override
    public void backward(Memory memory)  { delegate.backward(memory); }

    @Override
    public void increment(Memory memory) { delegate.increment(memory); }

    @Override
    public void decrement(Memory memory) { delegate.decrement(memory); }

    @Override
    public void output(Memory memory)    { delegate.output(memory); }

    @Override
    public void input(Memory memory)     { delegate.input(memory); }

    @Override
    public void repeat(Action<Memory> program, Memory memory) {
        delegate.repeat(program, memory);
    }
}

Envelopes take a delegating letter to which to forward by default the implemented methods; subclasses need only override the methods of interest and let the envelope handle the rest. The initial semantics S0 is, however, fully concrete. It operates on an unbounded memory of machine integers:

class S0 implements Semantics {
    @Override
    public Memory initial()              { return new UnboundedMemory(); }

    @Override
    public void forward(Memory memory)   { memory.next(); }

    @Override
    public void backward(Memory memory)  { memory.prev(); }

    @Override
    public void increment(Memory memory) { memory.set(memory.get() + 1); }

    @Override
    public void decrement(Memory memory) { memory.set(memory.get() - 1); }

    @Override
    public void output(Memory memory) {
        System.out.print((char) (memory.get() & 255));
    }

The input behavior on EOF is to leave the memory untouched:

@Override
    public void input(Memory memory) {
        try {
            final int c = System.in.read();
            if (c != -1) memory.set(c);
        } catch (IOException e) {
            System.err.println(e);
        }
    }

The meaning of repeat is to execute the enclosed program until the current memory cell is zero:

@Override
    public void repeat(Action<Memory> program, Memory memory) {
        int c;
        while ( (c = memory.get()) != 0 )
            program.apply(memory);
    }
}

This interpretation of repeat will probably never change (but consider a non-Turing-complete version of Brainfuck with bounded repetition, e.g. for server-side Core Wars competitions). The first difference, however, is on input:

class S1 extends DelegatingSemantics {
    public S1(Semantics delegate) { super(delegate); }

    @Override
    public void input(Memory memory) {
        try {
            final int c = System.in.read();
            memory.set(c == -1 ? 0 : c);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

The S1 interpretation overrides a Semantics so that it sets the current cell to zero on EOF. The other variant is to set it to -1:

class S1_ extends DelegatingSemantics {
    public S1_(Semantics delegate) { super(delegate); }

    @Override
    public void input(Memory memory) {
        try {
            final int c = System.in.read();
            memory.set(c);
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}

The second difference is in the memory cell size, in this case 8-bit-wide:

class S2 extends DelegatingSemantics {
    public S2(Semantics delegate) { super(delegate); }

    @Override
    public void increment(Memory memory) {
        memory.set((memory.get() + 1) & 255);
    }

    @Override
    public void decrement(Memory memory) {
        memory.set((memory.get() - 1) & 255);
    }
}

Although it uses the delegate Semantics's memory, it truncates it on mutation. The third difference is in the organization of the Memory itself:

class S3 extends DelegatingSemantics {
    protected final int length;

    public S3(int length, Semantics delegate) {
        super(delegate);
        this.length = length;
    }

    @Override
    public Memory initial() { return new BoundedMemory(length); }
}

It uses a bounded memory of the size provided, which disallows moving past its bounds. A variation is to wrap around the current cell when reaching either end:

class S3_ extends DelegatingSemantics {
    protected final int length;

    public S3_(int length, Semantics delegate) {
        super(delegate);
        this.length = length;
    }

    @Override
    public Memory initial() { return new CircularMemory(length); }
}

The same code except for the underlying store selected. The Memory interface itself is simple:

interface Memory {
    int get();
    void set(int value);
    void next();
    void prev();
}

The only operations on it are getting and setting the current cell's value and moving it one position in each direction. The implementation of an UnboundedMemory is, however, a bit sophisticated. It uses the standard Turing Machine trick of simulating a doubly-infinite tape with two semi-infinite ones:

class UnboundedMemory implements Memory {
    private final List<Integer> left  = new ArrayList<Integer>();
    private final List<Integer> right = new ArrayList<Integer>();
    private int cursor;

The right tape contains cells with index 0, 1, 2… while the left tape contains cells indexed by -1, -2, -3… Initially only the zero-th cell exists:

public UnboundedMemory() {
        right.add(0);
        cursor = 0;
    }

The class maintains the invariant that the current cell, to which cursor points, always exists:

@Override
    public int get() {
        if (cursor < 0)
            return left.get(-1 - cursor);
        else
            return right.get(cursor);
    }

    @Override
    public void set(int value) {
        if (cursor < 0)
            left.set(-1 - cursor, value);
        else
            right.set(cursor, value);
    }

New cells are created on demand on the appropriate tape upon cursor movement:

@Override
    public void next() {
        cursor++;
        if (cursor >= 0 && right.size() == cursor)
            right.add(0);
    }

    @Override
    public void prev() {
        --cursor;
        if (cursor < 0 && left.size() == -1 - cursor)
            left.add(0);
    }
}

A BoundedMemory is, by contrast, much simpler:

class BoundedMemory implements Memory {
    protected final int[] tape;
    protected int cursor;

    public BoundedMemory(int length) {
        this.tape = new int[length];
        cursor = 0;
    }

    @Override
    public int get() {
        return tape[cursor];
    }

    @Override
    public void set(int value) {
        tape[cursor] = value;
    }

The only intelligence required is in avoiding moving the cursor outside its bounds:

@Override
    public void next() {
        if (cursor != tape.length - 1)
            cursor++;
    }

    @Override
    public void prev() {
        if (cursor != 0)
            --cursor;
    }
}

A CircularMemory is just a BoundedMemory with a different out-of-bounds behavior:

class CircularMemory extends BoundedMemory {
    public CircularMemory(int length) {
        super(length);
    }

    @Override
    public void next() {
        cursor++;
        if (cursor == tape.length)
            cursor = 0;
    }

    @Override
    public void prev() {
        if (cursor == 0)
            cursor = tape.length;
        --cursor;
    }
}

tl; dr: writing interpreters in Java is tedious. Interestingly enough, I find the object-oriented expression of the Semantics over the different types of Memory very natural, more so even than what modular programming allows. The downsides I find, at least for Java, are two:

  1. The sheer amount of boilerplate necessary to express the problem. Even allowing for the lack of closures, algebraic data types and pattern matching, it lacks an automatic delegation mechanism. This is actually a shortcoming of every mainstream object-oriented language (and no, metaprogramming is not a substitute for a language construct that can be checked and verified statically for a number of properties that run-time reflection would not)
  2. The need to name everything. Nominal type disciplines not only suck on principle, they actually encourage naming trivial program constructs (classes, interfaces, methods) that therefore take an artificial existence of their own, bloating (module) interfaces and opening new ports for coupling

Still with me? Let me thank you and express my admiration for your fortitude.

2011-11-11

Modular Semantics for Brainfuck

The problem with Brainfuck is that its semantics are given by its different implementations but not all its implementations agree so that writing an interpreter is straightforward but writing portable Brainfuck programs is not. OCaml functors allows playing with pluggable semantics in a way that would be very difficult laborious with other languages. I can't imagine this flexibility in object-oriented languages like Java or Python, but in OCaml the mix-and-match approach with module inclusion is straightforward.

Update: OK, so reddit justly challenged my grandstanding. The moral equivalent of this typesafe, modular, extensible interpreter requires more than 520 tight lines, of which 270 go to the parser, pretty-printer and the evaluator (two visitors over 7 case classes), and 250 go to the plug-in semantics. The end result is more flexible than the following OCaml code, since I can choose the composition dynamically, at run-time:

public static void main(String[] args) {
  String hello =
">+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.>>>++++++++[<++++>-]"+
"<.>>>++++++++++[<+++++++++>-]<---.<<<<.+++.------.--------.>>+.";

  Semantics s = new S3_(5000, new S2(new S1_(new S0())));
  Brainfuck bf = new Brainfuck(s);
  bf.evaluate(bf.parse(hello), s.initial());
}

The key is using delegation instead of inheritance to override behavior and spending six hours typing code like a madman.

The language is defined by the effect that its eight instructions have on a memory:

module type SEMANTICS = sig
  type mem
  val mem       : mem
  val forward   : mem -> mem
  val backward  : mem -> mem
  val increment : mem -> mem
  val decrement : mem -> mem
  val output    : mem -> mem
  val input     : mem -> mem
  val repeat    : (mem -> mem) -> (mem -> mem)
end

The initial state of the memory is given by mem. The next six functions correspond directly to the six instructions > < + - . , while the operational semantics of each of the brackets is consolidated into a single small-step function, repeat, so that for a properly balanced Brainfuck program [P] its effect on mem is repeat P mem.

Given a SEMANTICS, the interpretation of a Brainfuck program is straightforward:

module Brainfuck (S : SEMANTICS) = struct

The program is represented by an abstract syntax comprising of a sequence of instructions mapping one-to-one to the surface syntax of the language, except for the pair of brackets that represents a single unit of repetition:

type instr = Fwd | Bak | Inc | Dec | Out | Inp | Rep of program
  and program = instr list

Note: The fact that properly balanced [ and ] denote a repeating computation can be taken as a definition. The translation from concrete syntax to abstract syntax ensures that the mapping is faithful and the operational semantics given for [ and ] as separate instructions corresponds to the one for Rep.

Since the semantics of each instruction is a function from memory to memory, the semantics of a program is the left-to-right composition of the semantics of the individual instructions comprising it:

let rec eval p t = List.fold_left (fun t -> function
  | Fwd   -> S.forward   t
  | Bak   -> S.backward  t
  | Inc   -> S.increment t
  | Dec   -> S.decrement t
  | Out   -> S.output    t
  | Inp   -> S.input     t
  | Rep p -> S.repeat (eval p) t) t p

The translation from concrete to abstract syntax is given by a parser:

let parse str =
    let syntax_err fmt = Printf.kprintf failwith fmt in
    let res = ref []
    and stk = ref [] in
    let emit  i  = res := i :: !res
    and enter () = stk := !res :: !stk; res := []
    and leave () = match !stk with
    | r :: s -> stk := s; res := Rep (List.rev !res) :: r
    | []     -> syntax_err "unmatched `]'"
    in String.iter (function
    | '>' -> emit Fwd
    | '<' -> emit Bak
    | '+' -> emit Inc
    | '-' -> emit Dec
    | '.' -> emit Out
    | ',' -> emit Inp
    | '[' -> enter ()
    | ']' -> leave ()
    |  _  -> ()) str;
    if !stk != [] then syntax_err "unmatched `['" else
    List.rev !res

Only the concrete instructions > < + - . , [ and ] have meaning in a program; all other possible characters are ignored and [ and ] must be properly balanced. An adjoint pretty-printer or formatter translates abstract syntax back to concrete, such that for all abstract programs P, parse (format P) ≡ P:

let format prog =
    let buf = Buffer.create 72
    and len = ref 0 in
    let putc c =
      if !len == 72 || c == '\n' then begin
        Buffer.add_char buf '\n';
        len := 0
      end;
      if c != '\n' then begin
        Buffer.add_char buf c;
        incr len
      end
    in
    let rec go prog = List.iter (function
    | Fwd   -> putc '>'
    | Bak   -> putc '<'
    | Inc   -> putc '+'
    | Dec   -> putc '-'
    | Out   -> putc '.'
    | Inp   -> putc ','
    | Rep p -> putc '['; go p; putc ']') prog
    in go prog; putc '\n'; Buffer.contents buf
end

(The proof of this last property would go by mutual induction on format and parse if it were not that they are imperative; the conversion to pure recursive form is straightforward and mechanical but left as an exercise to the reader.) It is time to give concrete semantics for each of the instructions.

The simplest possible store is a tape of memory cells which can be conveniently represented by a zipper:

type tape = Tape of int list * int * int list

The initial memory has an infinite number of zeros:

module S0 = struct
  type mem = tape

  let mem =
    let rec zeros = 0 :: zeros in
    Tape (zeros, 0, zeros)

Advancing and retreating the cell pointer simply moves the zipper, since both ends are infinite:

let forward  (Tape (ls, c, r :: rs)) = Tape (c :: ls, r, rs)
  let backward (Tape (l :: ls, c, rs)) = Tape (ls, l, c :: rs)

(The cyclic lists at both ends can be considered as sentinel nodes.) The simplest representation for a cell is as the underlying machine integer:

let increment (Mem (ls, c, rs)) = Mem (ls, c + 1, rs)
  let decrement (Mem (ls, c, rs)) = Mem (ls, c - 1, rs)

Character output is consistently among implementations just the output to console of the cell value as an ASCII code:

let output (Mem (_, c, _) as t) =
    output_char stdout (char_of_int (c land 255));
    t

Character input differs on the treatment of end-of-file; probably the most general behavior is to not modify the cell on EOF:

let input (Mem (ls, c, rs) as t) =
    let c = try int_of_char (input_char stdin) with End_of_file -> c in
    Tape (ls, c, rs)

Repetition is the effect of p whenever the current cell is not zero:

let rec repeat p t = match t with
  | Mem (_, 0, _) -> t
  | _             -> repeat p (p t)
end

One variation between interpreters is the EOF behavior. Some of them set the current cell to 0 on EOF:

module S1 (S : SEMANTICS with type mem = tape) = struct
  include S

  let input (Tape (ls, c, rs)) =
    let c = try int_of_char (input_char stdin) with End_of_file -> 0 in
    Tape (ls, c, rs)
end

Others set it to -1:

module S1' (S : SEMANTICS with type mem = tape) = struct
  include S

  let input (Tape (ls, c, rs)) =
    let c = try int_of_char (input_char stdin) with End_of_file -> -1 in
    Tape (ls, c, rs)
end

Another variation between interpreters is the cell width in bytes. The standard reference interpreter uses eight-bit cells:

module S2 (S : SEMANTICS with type mem = tape) = struct
  include S

  let increment (Tape (ls, c, rs)) = Tape (ls, (c + 1) land 255, rs)
  let decrement (Tape (ls, c, rs)) = Tape (ls, (c - 1) land 255, rs)
end

(To avoid redefining all functions I simply mask off excess bits from the native integer representation.) Yet another possible variation is to have a fixed-size tape. What to do when trying to move the cell pointer past the ends is a free choice; I could opt for simply ignoring the attempted movement:

let fill v =
  let rec go l n =
    if n == 0 then l else
    go (v :: l) (pred n)
  in go []

module S3 (N : sig val size : int end)
          (S : SEMANTICS with type mem = tape) = struct
  include S

  let mem = Tape ([], 0, fill 0 N.size)

  let forward  t = match t with
  | Tape (ls, c, r :: rs) -> Tape (c :: ls, r, rs)
  | Tape (_ , _, []     ) -> t

  let backward t = match t with
  | Tape (l :: ls, c, rs) -> Tape (ls, l, c :: rs)
  | Tape ([]     , _, _ ) -> t
end

The other alternative is to wrap the cell pointer at either end:

module S3' (N : sig val size : int end)
           (S : SEMANTICS with type mem = tape) = struct
  include S3 (N) (S)

  let forward  t = match t with
  | Tape (ls, c, r :: rs) -> Tape (c :: ls, r, rs)
  | Tape (ls, c, []     ) ->
    let r :: rs = List.rev (c :: ls) in
    Tape ([], r, rs)

  let backward t = match t with
  | Tape (l :: ls, c, rs) -> Tape (ls, l, c :: rs)
  | Tape ([]     , c, rs) ->
    let l :: ls = List.rev (c :: rs) in
    Tape (ls, l, [])
end

Now the choice of semantics is a matter of stacking the desired functors:

let hello = "
>+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.>>>++++++++[<++++>-]
<.>>>++++++++++[<+++++++++>-]<---.<<<<.+++.------.--------.>>+."

let _ =
  let module S  = S3'(struct let size = 5000 end)(S2(S1(S0))) in
  let module BF = Brainfuck(S) in
  BF.(eval (parse hello)) S.mem

This makes BF a Brainfuck interpreter running on a bounded memory of 5.000 byte-sized cells with 0 on EOF.

2011-10-28

A First-Principles GIF Encoder

Also, quasicrystals. Have you ever found yourself wanting to do generated animations but didn't know how to visualize them? Since I couldn't find a self-contained GIF encoder, I made myself one. Even though it is not fast, I think its 220 lines are clear enough to be a good reference implementation to build upon. The encoder conforms to the following interface:


type palette
val palette : ?background:int -> color array -> palette
type t
val make    : ?ct:palette
           -> width:int -> height:int
           -> (t -> unit) -> out_channel -> unit
val image   : ?ct:palette -> ?transparent:int -> ?fps:int
           -> ?x:int -> ?y:int -> ?width:int -> ?height:int -> string
           -> t -> unit
val comment : string -> t -> unit
val repeat  : ?count:int -> t -> unit

The idea is to build the animation via repeat, comment and image inside the higher-order argument to make to ensure a well-formed output file.


GIF is a block-oriented format specifically allowing decoders to quickly skip over unknown blocks. GIF data is written as length-prefixed blocks, with the block size occupying a byte, hence a data block can be no larger than 255 bytes. Variable-length data is written as a succession of nonempty blocks terminated by an empty block, that is, a zero byte.


For this reason I need a buffering layer upon channel output. This is where I find objects a nice programming construct: manipulating stateful abstractions in an imperative way, even when the overall interface is functional. The output stream can be in immediate mode, where bytes are passed directly to the output channel, or in block mode, where bytes are accumulated in the buffer to be written in block:


class output otch = object (self)
  val         buffer  =  String.create 255
  val mutable current = -1

The member current is -1 in the first case, and in the second it marks the next empty position in buffer. When the buffer is full or at the end of the block any accumulated and unwritten bytes must be flushed out:


  method private flush =
    if current > 0 then begin
      output_byte otch current;
      output      otch buffer 0 current;
      current <- 0
    end

The output stream is byte-oriented:


  method byte n =
    let b = n land 255 in
    if current == -1 then output_byte otch b else begin
      if current == 255 then self#flush;
      buffer.[current] <- char_of_int b;
      current <- current + 1
    end

If in immediate mode the byte is written to the underlying channel, or else it is accumulated. This primitive is the building block for the other methods:


  method le16 n =
    self#byte  n;
    self#byte (n lsr 8)

  method string str =
    String.iter (fun c -> self#byte (int_of_char c)) str

(I admitted this is not particularly fast.) The switch into and out of block mode is performed via the following pair of methods:


  method begin_block =
    if current != -1 then failwith "begin_block";
    current <- 0

  method end_block =
    if current == -1 then failwith "end_block" else
    self#flush;
    current <- -1;
    self#byte 0

It doesn't make sense to nest block modes, so as a sanity check I verify before changing mode. Finally, a bit of clean-up:


  method close =
    if current != -1 then self#end_block;
    flush otch
end

Pretty simple except maybe for the fact that the output stream must be careful not to commit a fence error when writing a block of exactly 255 characters. This class is functional enough to support all the data required to build a GIF file, including the LZW-compressed image data. For that I also use a class, again to encapsulate a blob of mutable data. That doesn't mean that I will forgo functional style, though:


let fold_string f e s =
  let res = ref e in
  String.iter (fun c -> res := f !res c) s;
  !res

The LZW compressor used by GIF starts with a given code size, in principle the pixel bit-width, and expands it as it compresses data. It uses two special codes for signaling a stream reset and the end of the stream:


class lzw ?(bits=8) out =
  let count       = 1 lsl bits in
  let clear_code  = count
  and end_of_info = count + 1 in
object (self)

Since the stream is bit-oriented, the compressor keeps a bit buffer on which it appends data until there are enough bits to make whole bytes:


  val mutable code_size = 0
  val mutable buffer    = 0
  val mutable length    = 0

  method append n =
    buffer <- buffer lor ((n land pred (1 lsl code_size)) lsl length);
    length <- length + code_size;
    while length >= 8 do
      out#byte (buffer land 0xff);
      buffer <- buffer lsr 8;
      length <- length - 8
    done

Bits accumulate from least significative to most significative, that is, right to left. Of course, any left-over bits must be flushed out too:


  method finish =
    if length > 0 then
      out#byte (buffer land pred (1 lsl length))

The basic idea behind the algorithm is to build a table of those strings repeatedly seen in the input stream and to replace them with a code whenever they are detected. A good (but frustratingly sketchy in crucial parts) reference is that of Steve Blackstock, who recommends using a hash table to quickly find substrings:


  val table : (int * int, int) Hashtbl.t = Hashtbl.create 5003
  val mutable next_code = 0

The table maps substrings of the form <prefix>c to codes; the <prefix> is represented as the corresponding code, in effect treating substrings as linked lists of characters. The initial table is populated with the roots, the 2bits possible input values, plus the two special codes:


  method private reset =
    Hashtbl.clear table;
    for i = 0 to count - 1 do
      Hashtbl.add table (-1, i) i
    done;
    code_size <- bits  + 1;
    next_code <- count + 2

Since the total number of roots is 2bits + 2, the code_size is bits + 1. Whenever a new string is added to the table, the encoder must check that the code_size is sufficient, and adjust it if not. The GIF specification allows for a maximum of 12-bit codes, so if that size is reached the stream is reset:


  method private add_string s =
    if next_code == 1 lsl code_size then
      code_size <- code_size + 1;
    Hashtbl.add table s next_code;
    next_code <- next_code + 1;
    if next_code == 0x1000 then begin
      self#append clear_code;
      self#reset;
    end

The compressor follows essentially the description given in the reference:


  method compress data =
    self#reset;
    self#append clear_code;
    let last = fold_string (fun prefix c ->
      let  k = int_of_char c in
      try  Hashtbl.find table (prefix, k)
      with Not_found ->
        self#append prefix;
        self#add_string (prefix, k);
        k) (-1) data in
    self#append last;
    self#append end_of_info;
    self#finish
end

It took me a number of tries to discover the correct expand/reset sequence; all in all, I find the LZW algorithm really clever.


GIF is an indexed-color graphics format. Pixels are indices into a Color Table or palette:


type color = { red : int; green : int; blue : int; }

type palette = { background : int; bits : int; table : color array; }

Creating a palette is a matter of validating the input (GIF is restricted to power-of-two color tables) and making sure that the array of colors can't be mutated outside the palette:


let palette ?(background=0) table =
  let size = Array.length table in
  if size < 2
  || size > 256
  || size land (size - 1) != 0
  || background < 0
  || background >= size
  then invalid_arg "color_table" else
  let bits = truncate (log (float size) /. log 2.) in
  { background; bits; table = Array.copy table; }

The GIF type records the screen dimensions, the Global Color Table and the output stream:


type t = {
  width  : int;
  height : int;
  ct     : palette option;
  out    : output;
}

The GIF is built part by part, beginning with the Header and ending with the Trailer:


let header { out; _ } = out#string "GIF89a"

let trailer { out; _ } = out#byte 0x3b

(Note the concision that I gain by using a record as the data type instead of an object.) Color Tables are laid out sequentially:


let color_table { table; _ } { out; _ } =
  Array.iter (fun { red; green; blue } ->
    out#byte red  ;
    out#byte green;
    out#byte blue) table

A GIF file consists of one or more images laid out in a graphics space represented by a Logical Screen Descriptor:


let logical_screen ({ width; height; ct; out } as gif) =
  out#le16   width;
  out#le16   height;
  match ct with
  | None ->
    out#byte 0;
    out#byte 0;
    out#byte 0 (* default aspect ratio (1:1 = 49) *)
  | Some ({ background; bits; _ } as ct) ->
    out#byte (0xf0 lor pred bits);
    out#byte background;
    out#byte 0;
    color_table ct gif

If the GIF includes a global color table, it follows the Logical Screen Descriptor. In order for make to be safe it must close the output stream even in the presence of a client error:


let unwind ~(protect : 'a -> unit) f x =
  try let y = f x in protect x; y
  with e -> protect x; raise e

Then it is a matter of creating the data record, writing the header and the logical screen description, calling the client function to generate the content, and wrapping up the file with the trailer:


let make ?ct ~width ~height proc otch =
  unwind ~protect:(fun gif -> trailer gif; gif.out#close)
    (fun gif -> header gif; logical_screen gif; proc gif)
    { width; height; ct; out = new output otch; }

GIF87a already allows a GIF file to contain more than one image to be displayed in the logical screen, for instance by tiling. GIF89a includes Graphic Control Extension to make cell animations out of those images, by specifying transparency and frame delay:


let graphics_control_extension transparent fps { out; _ } =
  let delay = if fps = 0 then 0 else (200 + fps) / (fps + fps) in
  out#byte 0x21; (* GIF Extension Code *)
  out#byte 0xf9; (* Graphic Control Label *)
  out#byte 0x04; (* Length of Data Sub-Block *)
  out#byte (match transparent with None -> 0 | Some _ -> 9);
  out#le16 delay;
  out#byte (match transparent with None -> 0 | Some c -> c);
  out#byte 0x00 (* Data Sub-Block Terminator *)

An image itself is introduced by an Image Descriptor giving its position and size relative to the logical screen, and optionally a Local Color Table:


let image_descriptor ct x y width height ({ out; _ } as gif) =
  out#byte 0x2c;
  out#le16 x;
  out#le16 y;
  out#le16 width;
  out#le16 height;
  match ct with
  | None ->
    out#byte 0x00
  | Some ({ bits; _ } as ct) ->
    out#byte (0x80 lor pred bits);
    color_table ct gif

The image is laid out as a sequence of byte-sized pixels, in height rows of width bytes, represented as a string for compactness:


let image ?ct ?transparent ?(fps=0) ?(x=0) ?(y=0) ?width ?height bytes ({ out; _ } as gif) =
  let w = match width  with None -> gif.width  | Some w -> w
  and h = match height with None -> gif.height | Some h -> h in
  if x < 0 || x + w  > gif.width
  || y < 0 || x + h > gif.height
  || String.length bytes != w * h
    then invalid_arg "image";
  let bits = match ct, gif.ct with
  | Some ct, _
  | None   , Some ct -> max 2 ct.bits
  | None   , None    -> invalid_arg "image"
  in
  (match transparent, fps with
  | None, 0 -> ()
  | _   , _ -> graphics_control_extension transparent fps gif);
  image_descriptor ct x y w h gif;
  out#byte bits;
  out#begin_block;
  (new lzw ~bits out)#compress bytes;
  out#end_block

I validate the image dimensions and make sure that there is at least a color table applicable to this image to compute the pixel size in bits (GIF requires a minimum of 2-bit pixels even for bilevel images). If the image has a transparent color or an animation speed, it must be modified by a Graphics Control Extension block. Then I write the Image Descriptor and the Table-Based Image Data.


Another extension block allows the GIF file to include Comments:


let comment text { out; _ } =
  out#byte 0x21;         (* GIF Extension Code *)
  out#byte 0xfe;         (* Comment Label *)
  out#begin_block;
  out#string text;
  out#end_block

These can be useful for debugging, for instance. Looping is specified by a specific Application Extension Block introduced by Netscape:


let repeat ?(count=0) { out; _ } =
  out#byte   0x21;       (* GIF Extension code *)
  out#byte   0xff;       (* Application Extension Label *)
  out#byte   0x0b;       (* Length of application block *)
  out#string "NETSCAPE"; (* Application Identifier *)
  out#string "2.0";      (* Appl. Authentication Code *)
  out#byte   0x03;       (* Length of Data Sub-Block *)
  out#byte   0x01;       (* Loop sub-block ID *)
  out#le16   count;      (* Loop count (0 = forever) *)
  out#byte   0x00;       (* Data Sub-Block Terminator *)

That is it. As an application, let me show you the Quasicrystal generator. First I need a grayscale palette:


let grayscale = GIF.palette (Array.init 256 (fun i ->
  { red = 255 - i; green = 255 - i; blue = 255 - i }))

and a way to write to a named file:

let with_output_file proc fname =
  unwind ~protect:close_out proc (open_out_bin fname)

The code itself is a direct port from the original:


let quasicrystal ?(nwaves=7) ?(freq=27) ?(nsteps=30) ~width ~height fname =
  let pi     = 3.14159265358979312 in
  let dp     = 2. *. pi /. float nsteps
  and dt     = pi /. float nwaves
  and ds     = 1.0 /. float (max width height)
  and omega  = 2. *. pi *. float freq in
  let frame phase pixels =
    let o = ref 0 in
    let y = ref (-0.5 *. ds *. float height) in
    for i = 0 to height - 1 do
      let x = ref (-0.5 *. ds *. float width) in
      for j = 0 to width - 1 do
        let s = ref 0. in
        for k = 0 to nwaves - 1 do
          let t = dt *. float k in
          let b = !y *. cos t -. !x *. sin t in
          s := !s +. 0.5 *. (cos (b *. omega +. phase) +. 1.)
        done;
        s := !s -. 2. *. floor (0.5 *. !s);
        if !s > 1.0 then s := 2. -. !s;
        pixels.[!o] <- char_of_int (truncate (255. *. !s));
        incr o;
        x := !x +. ds
      done;
      y := !y +. ds
    done
  in
  let pixels = String.create (width * height) in
  with_output_file (GIF.make ~ct:grayscale ~width ~height (fun gif ->
    GIF.repeat gif;
    for p = 0 to nsteps - 1 do
      GIF.comment (Printf.sprintf "Frame %d/%d" (p+1) nsteps) gif;
      frame (float p *. dp) pixels;
      GIF.image ~fps:25 pixels gif;
    done) ) fname

The last 8 lines are the ones actually dealing with the GIF output. For instance, the call:


quasicrystal ~width:256 ~height:256 ~nwaves:7 ~freq:13 ~nsteps:25 "quasicrystal.gif" ;;

creates the following image:


plane-wave quasicrystal

I hope you find the code useful.

2011-09-26

Yield, Continue

Roshan P. James and Amr Sabry show in "Yield: Mainstream Delimited Continuations" the interdefinability of yield-style generators and delimited continuations. Their encoding is at the same time simple and general, and even if the examples given in the paper are in Haskell, their translation into OCaml is straightforward. So much so that the result is essentially equivalent to ASAI Kenichi's OchaCaml (Edit: this claim of mine is certainly unsubstantiated and quite possibly wrong. See Zerny's comment).


James and Sabry generalize the mechanics of yield to a three-ported construct represented by the type (ι, ο, ρ) Yield:


The type of yield

This type encapsulates the communication between an iterator and its calling context, where the iterator yields values of type ο, receives inputs of type ι and terminates (or returns) with a final result of type ρ. This communication is mediated by a delimited context that can be activated with run which … marks the boundary of an iterator and delimits the action of yield. This communication is effected by a reified continuation given by a concrete data type with which the calling context can interact:


type ('i, 'o, 'r) iterator =
| Result of 'r
| Susp of 'o * ('i -> ('i, 'o, 'r) iterator)

In effect, run converts a monadic producer that uses yield into a CPS-transformed consumer that invokes the continuation given by an iterator. These patterns of interaction can be abstracted somewhat. The most general consumer is given by foreach:


let rec foreach (m : ('i, 'o, 'r) iterator) (f : 'o -> 'i) : 'r = match m with
| Susp (v, k) -> foreach (k (f v)) f
| Result r    -> r

It applies f to each value yielded by the iterator, feeding the result back to it. If the consumer is interested just in the yielded values and not in the result of the iteration, it can fold over them:


let rec fold (m : ('i, 'i, 'r) iterator) (f : 'i -> 'j -> 'j) (e : 'j) : 'j = match m with
| Susp (v, k) -> f v (fold (k v) f e)
| Result _    -> e

The essence of the iterator is given by an abstract signature:


module type YIELD = sig
  type ('i, 'o, 'r) yield
  val return : 'r -> ('i, 'o, 'r) yield
  val (>>=) : ('i, 'o, 'r) yield -> ('r -> ('i, 'o, 's) yield) -> ('i, 'o, 's) yield
  val yield : 'o -> ('i, 'o, 'i) yield
  val run : ('i, 'o, 'r) yield -> ('i, 'o, 'r) iterator
end

which gives a multi-parameter monad together with a pair of operations: yield, that returns a computation returning the yielded value (note the difference with return); and run, that captures the computation's context and reifies it into an iterator. The paper gives two possible implementations. The first "grabs" each invocation frame turning it directly into an iterator:


module FrameGrabbingYield : YIELD = struct
  type ('i, 'o, 'r) yield = ('i, 'o, 'r) iterator

  let return e = Result e

  let rec (>>=) m f = match m with
  | Result v    -> f v
  | Susp (v, k) -> Susp (v, fun x -> k x >>= f)

  let yield v = Susp (v, return)

  let run e = e
end

The seconds uses the CPS-encoded delimited continuation monad directly:


module CPSYield : YIELD = struct
  type ('i, 'o, 'r) yield =
    { cont : 'b . ('r -> ('i, 'o, 'b) iterator) -> ('i, 'o, 'b) iterator }

  let return x = { cont = fun k -> k x }

  let (>>=) m f = { cont = fun k -> m.cont (fun x -> (f x).cont k) }

  let yield v = { cont = fun k -> Susp (v, k) }

  let run e = e.cont (fun r -> Result r)
end

This is the standard CPS monad with answer type polymorphism, as given by Kiselyov. Now yield e is shift $ return . Susp e, and run e is equivalent to reset $ e >>= return . Result). This is sufficient but a bit bare-bones. Let's build from here:


module YieldT (Y : YIELD) = struct
  include Y

In the simplest case, generators simply yield successive values. The result of the computation is the value itself, that can be updated for the next cycle:


let rec repeat x = yield x >>= repeat
  let rec from i = yield i >>= fun j -> from (succ j)

Transformers are a bit more involved in that they must consume the iterator and yield new values, in effect delimiting the control of the iterator they consume. The simplest transformer is obviously map:


let rec map f y =
    let rec go = function
    | Result r    -> return r
    | Susp (v, k) -> yield (f v) >>= fun _ -> go (k v)
    in go (run y)

(Note that the monadic fmap would only act on the result and not on the generated values.) In this case, the result of the computation is the mapped value of the original iterator, that must be continued with the original value. Truncating an iterator is straightforward:


let rec take n y =
    let rec go n = function
    | Result r -> return (Some r)
    | Susp (_, _) when n = 0 -> return None
    | Susp (v, k) -> yield v >>= fun x -> go (n - 1) (k x)
    in go n (run y)

Combining two generators is also straightforward:


let zip y1 y2 =
    let rec go = function
    | Result r1, Result r2 -> return (r1, r2)
    | Susp (v1, k1), Susp (v2, k2) ->
      yield (v1, v2) >>= fun (x1, x2) -> go (k1 x1, k2 x2)
    | _ -> failwith "zip"
    in go (run y1, run y2)
end

(Iterators that return early must be dealt with in a somewhat arbitrary way.) With this it is relatively straightforward to use iterators:


let ex1 y =
  let module Y = YieldT( (val y : YIELD) ) in
  foreach Y.(run (take 10 (map succ (from 0)))) (Printf.printf "%d ")

And both implementations give equivalent results:


# let _ = ex1 (module FrameGrabbingYield : YIELD) ;;
1 2 3 4 5 6 7 8 9 10 - : 'a option = None
# let _ = ex1 (module CPSYield : YIELD) ;;
1 2 3 4 5 6 7 8 9 10 - : 'a option = None

Furthermore, Asai's examples (as given in this Reddit thread) can be easily duplicated as well:


module Tree (Y : YIELD) = struct
  type 'a tree = E | N of 'a tree * 'a * 'a tree

  open Y

  let rec depth_walk : 'a tree -> ('b, 'a, 'b tree) yield = function
  | N (l, n, r) ->
    depth_walk l >>= fun l' ->
    yield n      >>= fun n' ->
    depth_walk r >>= fun r' ->
    return (N (l', n', r'))
  | E -> return E

  let to_list t = fold (run (depth_walk t)) (fun x xs -> x :: xs) []

  let map f t = foreach (run (depth_walk t)) f

  let samefringe l r =
    let rec visit l r = match l, r with
    | Result _, Result _ -> true
    | Susp (a, ka), Susp (b, kb)
      when a = b ->
      visit (ka a) (kb b)
    | _ -> false
    in visit (run (depth_walk l)) (run (depth_walk r))

  let swap l r =
    let rec visit l r = match l, r with
    | Susp (a, ka), Susp (b, kb) -> visit (ka b) (kb a)
    | Result t1, Result t2 -> (t1, t2)
    | _ -> failwith "Unequal number of leaves"
    in visit (run (depth_walk l)) (run (depth_walk r))
end

Note that, except for the return type polymorphism, these versions are exactly the same. To prove that all works properly, here are a number of tests:


# module T = Tree(CPSYield) ;;
# open T ;;

# let t1 = N (N (E, 10, E), 20, N (E, 30, N (E, 40, E)))
and t2 = N (N (E, 10, N (E, 20, E)), 30, N (E, 40, E))
and t3 = N (N (E, 'a', N (E, 'b', E)), 'c', N (E, 'd', E))
;;

(I omit the output of these for clarity.)


# let _ = map succ t1 ;;
- : int T.tree = N (N (E, 11, E), 21, N (E, 31, N (E, 41, E)))
# let _ = to_list t1 ;;
- : int list = [10; 20; 30; 40]
# let _ = samefringe t1 t2 ;;
- : bool = true
# let _ = swap t1 t3 ;;
- : char T.tree * int T.tree =
(N (N (E, 'a', E), 'b', N (E, 'c', N (E, 'd', E))),
 N (N (E, 10, N (E, 20, E)), 30, N (E, 40, E)))

Note that in the last example the trees retain their respective shapes but interchange the values of their leaves.

2011-09-16

Higher Order Fun

(the serious title of this post is "First-Class Modules Encode Type-Safe Reification of Types" or some such) After reading Kiselyov and Yallop's "First-class modules: hidden power and tantalizing promises", I wanted to understand how first-class modules relate to higher order types. They state that [f]irst-class modules — first-class functors — permit type constructor abstraction and polymorphism. They show one such use of constructor parametricity; I wanted to try another by encoding Haskell's canonical type class. Yes, I'm talking about Monad! Here it is:

module type MONAD = sig
  type 'a t
  val return : 'a -> 'a t
  val (>>=) : 'a t -> ('a -> 'b t) -> 'b t
  val fail : 'a t
end

The idea Oleg and Jeremy outline in "Generics for the OCaml masses" is to reify a type constructor α t as a module that wraps it together with its type parameter. Since a type constructor bound in a module type (in this case α MONAD.t) "cannot escape" for soundness' sake, it must be packed and unpacked. In effect this constitutes a higher-order existential type:

module type Monad = sig
  type a
  module Repr (M : MONAD) : sig
    val extract : a M.t
  end
end

Here it is crucial to see that the extraction is entirely dependent on the actual "interpretation" (the term originally used by Oleg and Jeremy) of the type given by its definition in the module parameterizing it. Uses of these packed representations of first-class constructors must ensure that the interpretation is consistent (something that in Haskell is assured by the use of a type class parameter, as in Monad m ⇒ …). The use of a type synonym restores polymorphism:

type 'a monad = (module Monad with type a = 'a)

In order to enable concrete instances of MONAD to "display their wares" so to speak, it is convenient to let them unpack these representations by themselves:

module Make (M : MONAD) = struct
  include M
  let run (type s) (mx : s monad) : s t =
    let module MX = (val mx : Monad with type a = s) in
    let module RX = MX.Repr(M) in
    RX.extract
end

Given a concrete constructor M.t, the σ monad is let-bound as a module, its representation under M is obtained and the corresponding value extracted. The same pattern recurs in the following genuinely higher-kinded functions, of which return is the simplest:

let return : 'a . 'a -> 'a monad =
  fun (type s) x ->
  (module struct
    type a = s
    module Repr (M : MONAD) = struct
      let extract = M.return x
    end
  end : Monad with type a = s)

Note that the result type is α monad, that is, a bona-fide module. Note also that the representation of value x under the monad M is exactly M.return x. So far, no complications. Now for something completely different:

let (>>=) : 'a . 'a monad -> ('a -> 'b monad) -> 'b monad =
  fun (type s) (type t) mx f ->
  (module struct
    type a = t
    type res = t
    module Repr (M : MONAD) = struct
      let extract =
        let module MX = (val mx : Monad with type a = s) in
        let module RX = MX.Repr(M) in
        M.(RX.extract >>= fun x ->
          let my = f x in
          let module MY = (val my : Monad with type a = res) in
          let module RY = MY.Repr(M) in
          RY.extract
        )
    end
  end : Monad with type a = t)

Given mx of module type Monad with type a = σ, its representation under M is extracted and monadically bound to f, which produces a value my of module type Monad with type a = τ, under exactly the same monadic interpretation given by M. A technicality: since the modules are abstract, they are generative (every name is fresh); hence to force the sharing of type τ between my and the result value I need to rebind it with a unique name res that can be shared across scopes. Now the next thing is a bit disturbing:

let fail : 'a . 'a monad =
  fun (type s) ->
  (module struct
    type a = s
    module Repr (M : MONAD) = struct
      let extract = M.fail
    end
  end : Monad with type a = s)

Really, is that a type-level function‽ Actually no, the syntax fun (type σ) → … binds the type σ locally in the body of the function (I'd find it clearer if the syntax were let type s = … in …, provided that no constructor escapes. Maybe in OCaml 3.14?) That's it! All that remains is to write monadic functions as if they were given the necessary type class:

let liftM f mx = mx >>= fun x -> return (f x)

let (>>) mx my = mx >>= fun _ -> my

let guard p = if p then return () else fail

(exercise: define the obvious join and verify that it really has type α monad monad → α monad. Bonus points: expand its definition). The usage with concrete instances of MONAD is surprisingly straightforward. For instance, given the standard:

module OptionM = Make(struct
  type 'a t = 'a option
  let return x = Some x
  let (>>=) mx f = match mx with None -> None | Some x -> f x
  let fail = None
end)

module ListM = Make(struct
  type 'a t = 'a list
  let return x = [x]
  let (>>=) mx f = List.(concat (map f mx))
  let fail = []
end)

The following deceptively simple code just works:

# let v = liftM succ (return 3) >>= fun x -> return (succ x);;
val v : int monad = <module>
# OptionM.run v ;;
- : int OptionM.t = Some 5
# ListM.run v ;;
- : int ListM.t = [5]

The monadic computation v exists completely independently of the computational interpretation under which it is evaluated! Note that this interpretation is selected at runtime, not at compile-time; if it weren't for the differing types, the relevant run could be a higher-order parameter to a generic function.