SyntaxHighlighter

5 Dec 2015

CompletableFuture as a Trampoline for Tail Recursion Elimination

A function is said to be tail recursive when the result of the recursive call is what we return from the function. In other words, the recursive call is the last thing we do, and in particular, when the recursive call returns, we do not need any of the local variables and parameters of the current stack frame.

In many languages, the compiler will recognize this situation and re-use the current stack frame for the recursive call. This is called tail recursion elimination (or tail call optimization, in the more general case that the final function call is not recursive). But it doesn't happen in Java.

There are standard techniques to implement tail call optimization when the language does not have it. One possibility is to transform the recursive function to a loop manually, which is easy to do, typically easier than transforming a non-tail-recursive function into a tail recursive one. Another such method is called trampolining. A trampoline is simply a loop that performs the function calls iteratively, one after the other. In order to achieve that, the recursive function is rewritten so that it is no longer actually recursive, but  instead returns immediately with a higher-order function that will call the original function when evaluated inside the loop. These higher-order wrappers are usually called thunks. The term "trampoline" derives from the visual image of control bouncing up and down between the loop and the function, without ever spiralling downwards as in true recursion.

Information on trampolining is not hard to find. For example, here and here are two nice posts, with illustrations. As for Java, Pierre-Yves Saumont presents a trampoline implementation in this post (without actually mentioning the term).

I have noticed that Java actually contains a built-in class that implements trampolining, namely CompletableFuture. You get the desired behavior by making asynchronous tail calls (Viktor Klang's term). We won't need any custom classes or explicit loops. Let's use the Fibonacci function as an example. Here's a tail recursive formulation of it:

public static BigInteger fibTailRecursive(int n) {
    return n <= 2 ? ONE : fibTailRecursiveAux(n, ZERO, ONE);
}

private static BigInteger fibTailRecursiveAux(int n, BigInteger a, BigInteger b) {
    return n <= 0 ? a : fibTailRecursiveAux(n - 1, b, a.add(b));
}

To get the trampoline, we delay the recursive calls by wrapping them in CompletableFuture. The terminal value will be wrapped in an already completed future. Here's the corresponding thunked version of the above function definition:

public static BigInteger fibCF(int n) {
    return n <= 2 ? ONE : fibCF(n, ZERO, ONE).join();
}

private static CompletableFuture<BigInteger> fibCF(int n, BigInteger a, BigInteger b) {
    return n <= 0 ? terminate(a) : tailcall(() -> fibCF(n - 1, b, a.add(b)));
}

The complete "framework" consists only of the two utility methods terminate and tailcall. Plus we also should provide a dedicated thread to run the async calls in. (Adding more threads, or using the common Fork-Join pool actually slows things down in my environment.)

public static <T> CompletableFuture<T> terminate(T t) {
    return CompletableFuture.completedFuture(t);
}

public static <T> CompletableFuture<T> tailcall(Supplier<CompletableFuture<T>> s) {
    return CompletableFuture.supplyAsync(s, POOL).thenCompose(identity()); 
}

private static final ExecutorService POOL = Executors.newSingleThreadExecutor(new ThreadFactoryBuilder().setDaemon(true).build());

The class ThreadFactoryBuilder is from Guava. Composing with the identity function will unwrap the nested CompletableFuture that comes out of the call to supplyAsync.

Note that it is essential to use supplyAsync. Making synchronous calls, or using an Executor that runs tasks immediately in the caller thread (for example, Executor e = Runnable::run), would lead to a stackoverflow for large inputs.The trampoline loop is realised inside CompletableFuture by taking tasks from the queue associated with the Executor. This feature is not really documented. Although Doug Lea has pointed out to me that there is an implementation comment at the top of CompletableFuture that points in that direction
      * Method postComplete is called upon completion unless the target
      * is guaranteed not to be observable (i.e., not yet returned or
      * linked). Multiple threads can call postComplete, which
      * atomically pops each dependent action, and tries to trigger it
      * via method tryFire, in NESTED mode.  Triggering can propagate
      * recursively, so NESTED mode returns its completed dependent (if
      * one exists) for further processing by its caller (see method
      * postFire).

The bad news is performance. I have benchmarked this solution against a manually optimized iterative version and against Saumont's TailCall class. The upshot is that TailCall performs as well as the manually coded loop. Using CompletableFuture is three times as slow. Here's a representative measurement for computing the 5000th Fibonacci number:

# Warmup: 5 iterations, 1 s each
# Measurement: 25 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op

Benchmark                                   Mode  Cnt  Score   Error  Units
TrampolineBenchmark.measureLoop  avgt   25  0.403 ± 0.004  ms/op
TrampolineBenchmark.measureTC    avgt   25  0.415 ± 0.002  ms/op
TrampolineBenchmark.measureCF    avgt   25  1.258 ± 0.009  ms/op


Nevertheless, I like the simplicity of it.