2012-06-28

2D Interpolation, Part 4: ARGB Interpolation

After linearizing all array accesses, interpolating ARGB values is easy from the algorithmic side of things; so easy things first. I'll handle the pixel interpolation proper by abstracting them away in their own little functions. Processing an ARGB source array is just a matter of changing the variable declarations:

final int[] srcpix = {
0xff0051ff, 0xff00fffb, 0xff9cff00, 0xff0051ff,
0xffff0000, 0xff00ff08, 0xffffdb00, 0xff00fffb,
0xff9cff00, 0xff00fffb, 0xff0051ff, 0xffffdb00,
0xffffdb00, 0xff9cff00, 0xff00fffb, 0xff00ff08,
};
final int nrows = 4;
final int ncols = 4;

void interp0() {
  final int[] p = new int[4*ncols+8];
  p[1*ncols+2] = srcpix[0*ncols]; arrayCopy(srcpix, 0*ncols, p, 1*ncols+3, ncols); p[2*ncols+3] = srcpix[1*ncols-1];
  p[2*ncols+4] = srcpix[1*ncols]; arrayCopy(srcpix, 1*ncols, p, 2*ncols+5, ncols); p[3*ncols+5] = srcpix[2*ncols-1];
  p[3*ncols+6] = srcpix[2*ncols]; arrayCopy(srcpix, 2*ncols, p, 3*ncols+7, ncols); p[4*ncols+7] = srcpix[3*ncols-1];
  arrayCopy(p, 1*ncols+2, p, 0, ncols+2);
  int yr = 0;
  int yi = 2;
  int offset = 0;
  for (int i = 0; i < height; i++) {
    final float yf = (float) yr / (float) height;
    int c00 = p[0*ncols+0], c01 = p[0*ncols+1], c02 = p[0*ncols+2], c03 = p[0*ncols+3];
    int c10 = p[1*ncols+2], c11 = p[1*ncols+3], c12 = p[1*ncols+4], c13 = p[1*ncols+5];
    int c20 = p[2*ncols+4], c21 = p[2*ncols+5], c22 = p[2*ncols+6], c23 = p[2*ncols+7];
    int c30 = p[3*ncols+6], c31 = p[3*ncols+7], c32 = p[3*ncols+8], c33 = p[3*ncols+9];
    int xr = 0;
    int xi = 3;
    for (int j = 0; j < width; j++) {
      final float xf = (float) xr / (float) width;
      int c = 0;
      switch (method) {
      case NEAREST:
        c = yf < 0.5 ? xf < 0.5 ? c11 : c12 : xf < 0.5 ? c21 : c22;
        break;
      case BILINEAR:
        c = bilinear(xf, yf, c11, c12, c21, c22);
        break;
      case BICUBIC:
        c = bicubic(xf, yf,
                    c00, c01, c02, c03, c10, c11, c12, c13,
                    c20, c21, c22, c23, c30, c31, c32, c33);
        break;
      }
      pixels[offset++] = c;
      xr += ncols - 1;
      if (xr >= width) {
        xr -= width;
        xi += 1;
        if (xi < ncols+2) {
          c00 = c01; c01 = c02; c02 = c03; c03 = p[0*ncols+xi+0];
          c10 = c11; c11 = c12; c12 = c13; c13 = p[1*ncols+xi+2];
          c20 = c21; c21 = c22; c22 = c23; c23 = p[2*ncols+xi+4];
          c30 = c31; c31 = c32; c32 = c33; c33 = p[3*ncols+xi+6];
        }
      }
    }
    yr += nrows - 1;
    if (yr >= height) {
      yr -= height;
      yi += 1;
      if (yi <= nrows) {
        arrayCopy(p, (ncols+2), p, 0, 3*(ncols+2));
        if (yi < nrows) {
          p[3*ncols+6] = srcpix[yi*ncols];
          arrayCopy(srcpix, yi*ncols, p, 3*ncols+7, ncols);
          p[4*ncols+7] = srcpix[(yi+1)*ncols-1];
        }
      }
    }
  }
}

The pixel array is the result of obtaining the hue() for each value in the original float array data. As you can see, I simplified the inner switch by extracting the interpolation schemes off to named routines. These routines will have to unpack the four components in each pixel, interpolate each component and pack back the result pixel. The bilinear interpolator is straightforward:

static int bilinear(final float t, final float u,
                    final int c00, final int c01,
                    final int c10, final int c11) {
  final int a00=(c00>>24)&0xff, r00=(c00>>16)&0xff, g00=(c00>>8)&0xff, b00=c00&0xff;
  final int a01=(c01>>24)&0xff, r01=(c01>>16)&0xff, g01=(c01>>8)&0xff, b01=c01&0xff;
  final int a10=(c10>>24)&0xff, r10=(c10>>16)&0xff, g10=(c10>>8)&0xff, b10=c10&0xff;
  final int a11=(c11>>24)&0xff, r11=(c11>>16)&0xff, g11=(c11>>8)&0xff, b11=c11&0xff;
  final float a = linear(u, linear(t, a00, a01), linear(t, a10, a11));
  final float r = linear(u, linear(t, r00, r01), linear(t, r10, r11));
  final float g = linear(u, linear(t, g00, g01), linear(t, g10, g11));
  final float b = linear(u, linear(t, b00, b01), linear(t, b10, b11));
  return (int) a << 24 | (int) r << 16 | (int) g << 8 | (int) b;

Note that the pixels are unpacked in "row" (ARGB) order but interpolated in "column" order; these transpositions are typical of vectorized and GPU code. The result is very different to the previous version of the linear interpolator, since I'm interpolating vectors here, not scalars mapped to a color ramp:

bilinear interpolation

You can see very noticeable Mach banding in the form of color ridges and creases. hue() mitigated this effect by smootherstep'ping the parameter, and I can do the same here, by converting a float channel into a saturated (that is, range limited) int channel:

static int bsat(float x) {
  if (x < 0)
    return 0;
  else if (x > 255)
    return 255;
  x /= 255f;
  return (int) ((10 - (15 - 6 * x) * x) * x * x * x * 255);
}

Changing the last line in bilinear() to:

  return bsat(a) << 24 | bsat(r) << 16 | bsat(g) << 8 | bsat(b);

results in an image with the high-frequency components smoothed out or filtered:

bilinear interpolation with smoothing

Much nicer. In fact I now understand why there are so many interpolation options to resize an image in Adobe Photoshop®. In the case of the bicubic interpolation I quickly learned that some sort of clipping and/or rescaling is vital, since the "tension" given by the derivatives can make the interpolated values overshoot:

bicubic interpolation with channel overflow errors

This is easily seen by masking out every channel but one:

bicubic interpolation with channel overflow errors, red channel

In fact, a little experimentation shows that interpolated values can go as low as -72 or as high as 327! The worst outliers are, of course, the result of interpolating high-frequency matrices of the form [[0 1 1 0] [1 0 0 1] [1 0 0 1] [0 1 1 0]]. Local rescaling is not a good choice, in my opinion, since it would wash out all channels equally; in particular, it would make images more transparent than they really are. The obvious choice is clipping, although doing that without filtering introduces artifacts:

bicubic interpolation with clipping

Using the bsat() above produces the best results:

bicubic interpolation with clipping and smoothing

The final code for the ARGB bicubic interpolator is:

static int bicubic(final float t, final float u,
                   final int c00, final int c01, final int c02, final int c03,
                   final int c10, final int c11, final int c12, final int c13,
                   final int c20, final int c21, final int c22, final int c23,
                   final int c30, final int c31, final int c32, final int c33) {
  final int a00=(c00>>24)&0xff, r00=(c00>>16)&0xff, g00=(c00>>8)&0xff, b00=c00&0xff;
  final int a01=(c01>>24)&0xff, r01=(c01>>16)&0xff, g01=(c01>>8)&0xff, b01=c01&0xff;
  final int a02=(c02>>24)&0xff, r02=(c02>>16)&0xff, g02=(c02>>8)&0xff, b02=c02&0xff;
  final int a03=(c03>>24)&0xff, r03=(c03>>16)&0xff, g03=(c03>>8)&0xff, b03=c03&0xff;
  final int a10=(c10>>24)&0xff, r10=(c10>>16)&0xff, g10=(c10>>8)&0xff, b10=c10&0xff;
  final int a11=(c11>>24)&0xff, r11=(c11>>16)&0xff, g11=(c11>>8)&0xff, b11=c11&0xff;
  final int a12=(c12>>24)&0xff, r12=(c12>>16)&0xff, g12=(c12>>8)&0xff, b12=c12&0xff;
  final int a13=(c13>>24)&0xff, r13=(c13>>16)&0xff, g13=(c13>>8)&0xff, b13=c13&0xff;
  final int a20=(c20>>24)&0xff, r20=(c20>>16)&0xff, g20=(c20>>8)&0xff, b20=c20&0xff;
  final int a21=(c21>>24)&0xff, r21=(c21>>16)&0xff, g21=(c21>>8)&0xff, b21=c21&0xff;
  final int a22=(c22>>24)&0xff, r22=(c22>>16)&0xff, g22=(c22>>8)&0xff, b22=c22&0xff;
  final int a23=(c23>>24)&0xff, r23=(c23>>16)&0xff, g23=(c23>>8)&0xff, b23=c23&0xff;
  final int a30=(c30>>24)&0xff, r30=(c30>>16)&0xff, g30=(c30>>8)&0xff, b30=c30&0xff;
  final int a31=(c31>>24)&0xff, r31=(c31>>16)&0xff, g31=(c31>>8)&0xff, b31=c31&0xff;
  final int a32=(c32>>24)&0xff, r32=(c32>>16)&0xff, g32=(c32>>8)&0xff, b32=c32&0xff;
  final int a33=(c33>>24)&0xff, r33=(c33>>16)&0xff, g33=(c33>>8)&0xff, b33=c33&0xff;
  final float a = cubic(u, cubic(t, a00, a01, a02, a03), cubic(t, a10, a11, a12, a13),
                           cubic(t, a20, a21, a22, a23), cubic(t, a30, a31, a32, a33));
  final float r = cubic(u, cubic(t, r00, r01, r02, r03), cubic(t, r10, r11, r12, r13),
                           cubic(t, r20, r21, r22, r23), cubic(t, r30, r31, r32, r33));
  final float g = cubic(u, cubic(t, g00, g01, g02, g03), cubic(t, g10, g11, g12, g13),
                           cubic(t, g20, g21, g22, g23), cubic(t, g30, g31, g32, g33));
  final float b = cubic(u, cubic(t, b00, b01, b02, b03), cubic(t, b10, b11, b12, b13),
                           cubic(t, b20, b21, b22, b23), cubic(t, b30, b31, b32, b33));
  return bsat(a) << 24 | bsat(r) << 16 | bsat(g) << 8 | bsat(b);
}

Quite a mouthful, but I arrived at it by methodic and meticulous columnar editing. Now everything is ready to tackle the problem of hoisting the conditional in the inner loop and deriving three different, fully streamlined interpolators. Until next time.

No comments: