Driver Architecture

The actual parsing of input is handled by a so called “driver” represented as the class LL::Driver. This class is written in either C or Java depending on the Ruby platform that’s being used. The rationale for this is simple: performance. While Ruby is a great language it’s sadly not fast enough to handle parsing of large inputs in a way that doesn’t either require lots of memory, time or both.

Both the C and Java drivers try to use native data structures as much as possible instead of using Ruby structures. For example, their internal parsing stacks are native stacks. In case of Java this is an ArrayDeque, in case of C this is a vector created using the kvec library as C doesn’t have a native vector structure.

The driver operates by iterating over every token supplied by the each_token method (this method must be defined by a parser itself). For every input token a callback function in C/Java is executed that determines what to parse and how to parse it.

The parsing process largely operates on integers, only using Ruby objects where absolutely required. For example, all steps of a rule’s branch are represented as integers. Lookup tables are also simply arrays of integers with terminals being mapped directly to the indexes of these arrays. See ruby-ll’s own parser for examples. Note that the integers for the rules Array are in reverse order, so everything that comes first is processed last.

For more information on the internals its best to refer to the C driver code located in ext/c/driver.c. The Java code is largely based on this code safe for some code comments here and there.