2007-12-01

The Genuine Sieve of Eratosthenes

The Sieve of Eratosthenes is a staple demo of the expressive power of lazy functional languages. It is usually encountered in its tersest Haskell form:

sieve [] = []
sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0]

primes = sieve [1..]

However, in a feisty paper, Melissa O'Neill clearly shows that this is not the implementation of a sieving process, much less of Eratosthenes's algorithm (see also her rather forceful Haskell Café message). As is well-known, Eratosthenes' Sieve begins with an unmarked "grid" (or rather, a list) of integer numbers, starting from 2. Then, taking in turn the least unmarked number d, it "crosses out", or marks every multiple 2⋅d, 3⋅d…, of d. The unmarked numbers are not multiple of anything else and so, by definition, prime.

As O'Neill explains, in order to replicate this process with a computer program, there must be somehow a record of all "crossed-out" numbers. Her idea is to use a priority queue to readily access the next number to eliminate from the list. The key aspect of using a priority queue is that we can access the least element (according to some total order between them) in constant time. Then, for each prime p found, the queue is updated to record its multiples kp needing crossing out. I'll show an implementation of this idea, using this time object-oriented imperative queues, in OCaml.

Suppose an object q of class queue possessing, at least, an operation q#peek to look at the least element, an operation q#drop to eliminate it, and an operation q#put to add an element to it. In order to test if a number n is prime, we need to see if the queue holds some prime multiple equal to it or not. The queue will store every known prime p together with its least current multiple d as a pair (p, d). When n is tested against d and either found prime or marked, the code eliminates d from the queue and updates it with p's next multiple:

let is_prime q n =
  let rec test pass =
    let (p, d) = q#peek in
    if d > n then begin
      if pass then q#put (n, n + n);
      pass
    end else begin
      q#drop;
      q#put (p, d + p);
      test (d < n)
    end
  in test true

Either n passes the test, that is, is prime, or not. In any case, the queue's elements are inspected in order to update them with their next multiple; if any is equal to n we know from that mark that it is composite; otherwise, it is added as a formerly unknown prime. The type of is_prime is:

val is_prime : < drop : 'a; peek : int * int; put : int * int -> unit; .. > -> int -> bool

that is, as explained above, the only requirements on the queue class is that it has the methods drop, peek, put of the right type. A naïve but suitable implementation using lists is:

class ['a] queue (cmp : 'a -> 'a -> int) = object
  val mutable elts : 'a list = []
  method elements = elts
  method clear = elts <- []
  method is_empty = elts == []
  method peek = match elts with
  | [] -> failwith "peek"
  | x :: _ -> x
  method drop = match elts with
  | [] -> failwith "drop"
  | _ :: xs -> elts <- xs
  method put (a : 'a) =
    elts <- List.merge cmp [a] elts
end

The queue is generic on the type of the arguments, and the constructor takes the comparison function needed to make the elements totally ordered. It keeps them in a list, and its core operation is put, which simply inserts the new element into the existing list of elements while keeping it sorted. The extra methods will come handy later. With this, it is simple to find out all the primes up to a given limit lim:

let sieve q lim =
  let rec loop i =
    if i > lim then [] else
    if is_prime q i
      then i :: loop (succ i)
      else loop (succ i)
  in
  q#clear;
  q#put (2, 4);
  2 :: loop 3

The downside of using an imperative queue is that we must be mindful of state, and make sure that we start from a known point: we have to prime (!) the sieve with a queue containing just the first known candidate, namely two; from there the process is straightforward. So much so that we can easily do better. First of all, note that there is no need whatsoever to keep track of the found primes in a list, as the queue itself already does that for us. Also, O'Neill's program uses a flywheel, a recurring list of increments to skip over known composite numbers. The paper uses this large flywheel to eliminate multiples of 2, 3, 5 and 7; I'll use an infinite (circular) list:

let sieve q lim =
  let rec loop i (h :: t) =
    if i > lim then () else
    let _ = is_prime q i in loop (i + h) t
  in
  let rec flywheel =
    2 :: 4 :: 2 :: 4 :: 6 :: 2 :: 6 :: 4 ::
    2 :: 4 :: 6 :: 6 :: 2 :: 6 :: 4 :: 2 ::
    6 :: 4 :: 6 :: 8 :: 4 :: 2 :: 4 :: 2 ::
    4 :: 8 :: 6 :: 4 :: 6 :: 2 :: 4 :: 6 ::
    2 :: 6 :: 6 :: 4 :: 2 :: 4 :: 6 :: 2 ::
    6 :: 4 :: 2 :: 4 :: 2 ::10 :: 2 ::10 :: flywheel
  in
  q#clear;
  q#put (11, 22);
  loop 13 (List.tl flywheel);
  2 :: 3 :: 5 :: 7 :: List.sort compare (List.map fst q#elements)

This is faster, but only by a constant amount; the time is dominated by the queue operations. Even though removing the minimum is done in constant time (that is, is O(1)), the cost Clist of inserting a new element is linear in the size of the heap, or O(π(n)). This π(n) is the number of primes less than n, which is O(n/log n), and must be done once for every element in the queue, for every integer up to n, for a total cost of n⋅π(n)⋅Clist = O(n³/log² n).

By lowering the cost of an insertion, even by offsetting it with an increased cost per removal we can do asymptotically better. The best imperative implementation of a priority queue is a heap: a totally balanced binary tree with the property that every non-leaf node's children are greater than the parent. I'll use an array to store the nodes, so that the node stored at index i has its children stored at indexes 2⋅i and 2⋅i + 1. Since the binary tree is totally balanced, it has at most ⌊lg n⌋ + 1 levels.

At any instant, the heap condition must be kept; that is, in a heap every parent is always less than or equal to its left child, which in turn is always less than or equal to its sibling. This, given the layout of nodes in the array, means that the minimum element is the root, which is found at the very beginning of the array:

I will use a class with a member array that will be dynamically resized to make room for new elements. Initially, it holds no elements; however, the initial array is constructed by "magic"-ing dummy values of the correct type. The elements of the heap are returned as a list in an arbitrary order:

class ['a] heap (cmp : 'a -> 'a -> int) = object (self)
  val mutable elts : 'a array = Array.make 8 (Obj.magic 0 : 'a)
  val mutable count = 0

  method elements =
    let res = ref [] in
    for i = 0 to count - 1 do res := elts.(i) :: !res done;
    !res

Now clear, is_empty and peek are trivial:

  method clear = count <- 0
  method is_empty = count == 0

  method peek =
    if self#is_empty then failwith "peek";
    elts.(0)

In order to remove the least element, we move the last element in the heap to the root, replacing it. After that, it must "sift down" the tree to its proper resting place; the method siftdown restores the heap condition:

  method drop =
    if self#is_empty then failwith "drop";
    count <- count - 1;
    elts.(0) <- elts.(count);
    self#siftdown

A simple variant avoids duplicating the effort needed to keep the heap condition between a removal of the minimum and the insertion of a new element:

  method replace (a : 'a) =
    if self#is_empty then failwith "replace";
    elts.(0) <- a;
    self#siftdown

In order to insert a new element, the code first checks that there is enough space for it and inserts it as the last node. Then siftup is called to bubble it up the tree until the heap condition is restored:

  method put (a : 'a) =
    self#check;
    elts.(count) <- a;
    count <- count + 1;
    self#siftup

Space is made for a further element by doubling the array and copying its items to the new array:

  method private check =
    if count == Array.length elts then begin
      let sz = 2 * count in
      let ar = Array.make sz elts.(0) in
      Array.blit elts 0 ar 0 count;
      elts <- ar
    end

siftup is the simpler of the two restoring procedures. It repeatedly compares the unsorted child with its parent, exchanging both if they are out of order and recurring:

  method private siftup =
    let rec sift i =
      if i > 1 then begin
        let p = i lsr 1 in
        if cmp elts.(p - 1) elts.(i - 1) > 0 then begin
          let t = elts.(i - 1) in
          elts.(i - 1) <- elts.(p - 1);
          elts.(p - 1) <- t;
          sift p
        end
      end
    in sift count

(this is essentially the iteration for an insertion sort). siftdown is complicated by the fact that it must compare and exchange each node with the greatest of its two children:

  method private siftdown =
    let rec sift i =
      let c = ref (i lsl 1) in
      if !c <= count then begin
        if !c + 1 <= count
        && cmp elts.(!c - 1) elts.(!c) > 0 then incr c;
        if cmp elts.(!c - 1) elts.(i - 1) < 0 then begin
          let t = elts.(i - 1) in
          elts.(i - 1) <- elts.(!c - 1);
          elts.(!c - 1) <- t;
          sift !c
        end
      end
    in sift 1
end

Both siftup and siftdown dominate the complexity of drop and put, respectively. Both must traverse every level of the tree, so they have a cost of Cheap = ⌊lg n⌋ + 1 = O(log n). By using this data structure, the sieving cost drops from O(n³/log² n) to n⋅π(n)⋅Cheap = O(n²).

Given this class heap, the code for sieve remains unchanged; however, I will take advantage of replace in is_prime to avoid a siftup after a drop and a put:

let is_prime q n =
  let rec test pass =
    let (p, d) = q#peek in
    if d > n then begin
      if pass then q#put (n, n + n);
      pass
    end else begin
      q#replace (p, d + p);
      test (d < n)
    end
  in test true

All this, of course, is very far from O'Neill's functional implementation, as the present code is decidedly imperative. In fact, I'm not sure that this performs better than the naïve imperative sieve using an array of booleans (or a bit vector) to record which number is crossed. It has, however, the advantage of using memory proportional to the number π(n) of primes found, not to the range n, and since the heap grows dynamically, can be used to find primes less than an arbitrary limit without the need to have to hard-code it.

No comments: