Datatoad
Datatoad is an interactive Datalog engine, written in Rust.
It is built around a few ideas that, taken together, produce something meaningfully different from other Datalog engines:
- **Columnar dispatch.** Data lives in columns rather than rows. Operations like permutation, filtering, and intersection are bulk transformations on contiguous arrays. This makes it cheap to rearrange data on the fly, which is what enables everything that follows.
- **Sorted columns as tries.** Once columns are sorted lexicographically, prefix-sharing emerges for free. A column of values plus a column of bounds is a trie level, with no separate index structure.
- **Joins as trie intersection.** Binary joins align two tries by shared prefix. Worst-case optimal joins generalize this: for each fact, pick whichever trie offers the fewest extensions, propose from that, and validate against the others. Both are instances of the same operation.
- **Relations as logic.** A relation doesn’t have to be a stored table. Anything that can count extensions and propose values can participate in a join: arithmetic, ranges, constraints. These “logic relations” are bidirectional and first-class.
These ideas compose. Columnar dispatch makes worst-case optimal joins practical (not just theoretical) by making per-fact adaptive atom selection a bulk operation. Sorted columns give you tries without building them. The join framework’s generality lets logic relations slot in alongside stored data.
This book walks through each idea, how they are realized in datatoad, and what the consequences are.
Introduction
This chapter introduces Datalog, datatoad, and gets you running a first example.
Datalog is a declarative logic programming language: you state rules about what should be true, and the engine figures out everything that follows. Datatoad is an interactive runtime for evaluating Datalog programs, with a focus on robust performance. It has a few “big ideas”, revolving around unlocks from columnar data layouts, which are discussed in the next chapter.
Although a lot of what follows is in the framing of Datalog, relatively little of datatoad is specific to Datalog. It just happens that Datalog is also a minimal shell of requirements for streaming, incremental, relational joins. Don’t get attached to the logic programming part of the language on my account, at least.
What is Datalog?
Datalog is a declarative language for expressing recursive queries over relations.
A Datalog program consists of facts (things that are true) and rules (ways to derive new facts from existing ones).
For example, given edges in a graph:
```
edge(1, 2).
edge(2, 3).
edge(3, 4).
```
We can define reachability:
```
reach(x, y) :- edge(x, y).
reach(x, y) :- edge(x, z), reach(z, y).
```
The first rule says: if there is an edge from x to y, then y is reachable from x.
The second says: if there is an edge from x to z, and y is reachable from z, then y is reachable from x.
The engine computes the fixpoint: it repeatedly applies the rules until no new facts are derived.
The result is every (x, y) pair where y is reachable from x.
Incremental Datalog
The core loop of (semi-naive, bottom-up) Datalog evaluation is:
- Starting from some novel facts,
- Use the rules to determine candidate facts that may be novel,
- Remove any pre-existing facts from the candidates, then call the remainder novel,
- Repeat as long as there are any novel facts.
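As a concrete (if naive) sketch, here is that loop in plain Rust for the reachability example, with hash sets standing in for datatoad's columnar structures:

```rust
use std::collections::HashSet;

/// Semi-naive reachability: repeatedly extend only the *novel* facts.
fn reach(edges: &[(u32, u32)]) -> HashSet<(u32, u32)> {
    // Rule 1: reach(x, y) :- edge(x, y).
    let mut known: HashSet<(u32, u32)> = edges.iter().copied().collect();
    let mut novel: Vec<(u32, u32)> = known.iter().copied().collect();
    while !novel.is_empty() {
        // Rule 2: reach(x, y) :- edge(x, z), reach(z, y), joined only
        // against the novel reach facts (the semi-naive delta).
        let mut candidates = Vec::new();
        for &(z, y) in &novel {
            for &(x, z2) in edges {
                if z2 == z {
                    candidates.push((x, y));
                }
            }
        }
        // Remove pre-existing facts; the remainder is the next delta.
        novel = candidates.into_iter().filter(|f| known.insert(*f)).collect();
    }
    known
}

fn main() {
    let closure = reach(&[(1, 2), (2, 3), (3, 4)]);
    assert_eq!(closure.len(), 6); // (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)
    assert!(closure.contains(&(1, 4)));
}
```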
This loop is very similar to the bones of streaming, incremental, relational computation. Our connection to Datalog is through a shared interest in these properties.
What is Datatoad?
Datatoad is an interactive, interpreted Datalog engine written in Rust.
Datatoad exists to exercise a few “big ideas” in the context of streaming, incremental, relational computation. No one of the ideas is fundamentally new, but they come together especially well in datatoad, and complement each other. The foundation is datatoad’s columnar data layout, which allows it to efficiently interpret and dispatch at runtime. This flexibility unlocks a variety of independently developed concepts, which now compose in ways that were unclear (to me) before.
Some call-outs, each of which we will develop in greater detail:
- **Columnar from the ground up.** Data is stored and processed in columns, not rows. All algorithms and primitives work natively on columns; they do not have the ability to form complete rows. This reduces the memory footprint, can simplify data alignment, and makes other features practical.
- **Interpreted, not compiled.** You type rules at a REPL and get results immediately. There is no compilation step, no generated code, no JIT. The engine interprets your program directly.
- **Lean multi-way join processing.** Multi-way joins are handled using indexes on input relations, but do not maintain intermediate state. Updates flow through the join in multiple directions, using per-atom query update plans.
- **Worst-case optimal.** Datatoad implements multi-way joins by an instance of the GenericJoin algorithm, which provides worst-case optimal guarantees. Following work presented at VLDB 2018, this gives rise to a worst-case optimal bound for the whole iterative computation. Query patterns like triangle enumeration take asymptotically (and empirically) less time than in other systems.
- **Logic relations.** Relations like `:plus`, `:range`, and `:times` participate in worst-case optimal joins as first-class atoms. They are bidirectional: `:plus(x, y, z)` can compute `z` from `x` and `y`, or `x` from `y` and `z`. The bidirectionality is especially important for multi-way join processing.
- **Distributed execution.** Datatoad can shard data across multiple workers and even multiple machines. The columnar layout makes sharing and serialization easy and efficient.
A first example
Running datatoad (`cargo run`), you are dropped into a REPL.
If you would like it to go faster, you can use cargo’s `--release` flag.
Let’s compute transitive closure of a small graph:
```
> edge(1, 2) :- .
> edge(2, 3) :- .
> edge(3, 1) :- .
> reach(x, y) :- edge(x, y).
> reach(x, y) :- edge(x, z), reach(z, y).
```
Datatoad evaluates the rules to a fixpoint, and then returns a prompt.
The `.list` command prints all relations and their sizes.
You should see something like
```
> .list
3 edge [forms: Identity, Permute-1-0]
9 reach [forms: Identity]
```
To inspect the results you’ll want to use the built-in `:print` relation:

```
> temp(x,y) :- reach(x,y), :print(x,y).
```
This poses a query that consults the `:print` relation, which contains all facts but prints each fact it is queried about.
```
> temp(x,y) :- reach(x,y), :print(x,y).
0x00000001 0x00000001
0x00000001 0x00000002
0x00000001 0x00000003
0x00000002 0x00000001
0x00000002 0x00000002
0x00000002 0x00000003
0x00000003 0x00000001
0x00000003 0x00000002
0x00000003 0x00000003
```
The output is hexadecimal, because the underlying data model is `[u8]`, and integer literals are parsed as 32-bit integers.
Showing off
The previous example was deliberately simple. Let’s look at some things that make datatoad interesting.
Triangle queries
A triangle query finds all triples (a, b, c) where each pair of nodes is connected by an edge.
It’s a standard benchmark because it is deceptively hard for conventional binary join engines.
First, let’s build a graph with a million edges designed to make binary joins suffer:
```
arc(0, x) :- :range(1, x, 1000001).
arc(x, 0) :- :range(1, x, 1000001).
arc(x, y) :- :range(1, x, 1000001), :plus(x, 1, y).
```
This creates a star around node 0 (a million spokes in each direction) plus a long path from one to one million.
Any binary join plan for triangles on this graph must consider roughly a trillion intermediate results, because joining any two copies of arc produces every pair of edges through node 0: one million, squared.
The triangle query itself is one line:
```
tri(a, b, c) :- arc(a,b), arc(b,c), arc(c,a).
```
Datatoad handles this in about a second.
A worst-case optimal join avoids the intermediate blowup entirely: for each `(a, b)` it finds the values of `c` by efficiently intersecting the extensions offered by `arc(b, c)` and `arc(c, a)`, and similarly for the other edge inputs.
For comparison, the equivalent SQL
```sql
select count(*)
from arc ab, arc bc, arc ca
where ab.y = bc.x
  and bc.y = ca.x
  and ca.y = ab.x;
```
will keep PostgreSQL busy for the better part of an hour (at least; I haven’t waited long enough to find out). Try it out!
Logic relations
You may have noticed `:range` and `:plus` in the example above.
These are logic relations: relations whose contents are defined by logic rather than by stored data.
- `:range(a, x, b)` contains all triples where `a <= x < b`.
- `:plus(x, y, z)` contains all triples where `x + y = z`.
These relations are unbounded in size and are never stored explicitly. Instead, they sit behind an abstraction that lets them participate directly in worst-case optimal joins — answering “which values extend this prefix?” on the fly.
Imagine a set of sensors s with integer readings r, and monitoring intervals [lo, hi) for each sensor:
```
hits(s, r) :- asks(s, lo, hi), data(s, r), :range(lo, r, hi).
```
The right join strategy depends on the sensor: on the relative sizes of `[lo, hi)` and the sensor's set of readings.
Noisy sensors with many readings and a tight range should enumerate the range and intersect with the readings.
Quiet sensors with few readings and a wide range should enumerate their readings and validate each with the range.
Datatoad picks the right strategy sensor-by-sensor, rather than choosing one join plan for all sensors as in SQL.
Logic relations are also bidirectional: `:plus(x, y, z)` can compute `z` from bound `x` and `y`, or solve for `x` given `y` and `z`.
This means the join planner can use them flexibly, choosing whichever direction produces fewer candidates.
When presented with a query like
```
result(x, y, z) :- in1(x, y), in2(y, z), in3(x, z), :plus(x, y, z).
```
the introduction of a new tuple through any input can, via :plus, immediately form one candidate result, which then only needs to be validated with the other inputs.
If :plus could only convert (x, y) to z, and not other directions, then the rule would only be efficient for facts produced through in1(x, y); the other inputs would need to join to propose x before the predicate could test it.
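To make the bidirectionality concrete, here is a hypothetical sketch of the decision a plus-like relation makes. The `plus_propose` helper and its `Option`-based calling convention are illustrative, not datatoad's actual interface:

```rust
/// Sketch of a bidirectional plus(x, y, z) relation (x + y = z):
/// given any two bound arguments, the third is determined.
/// `None` marks an unbound argument.
fn plus_propose(x: Option<i64>, y: Option<i64>, z: Option<i64>) -> Option<(i64, i64, i64)> {
    match (x, y, z) {
        (Some(x), Some(y), _) => Some((x, y, x + y)), // forward: z = x + y
        (Some(x), _, Some(z)) => Some((x, z - x, z)), // solve for y
        (_, Some(y), Some(z)) => Some((z - y, y, z)), // solve for x
        // Fewer than two bound arguments: infinitely many extensions,
        // so the relation declines to propose and the planner starts elsewhere.
        _ => None,
    }
}

fn main() {
    assert_eq!(plus_propose(Some(2), Some(3), None), Some((2, 3, 5)));
    assert_eq!(plus_propose(None, Some(3), Some(5)), Some((2, 3, 5)));
    assert_eq!(plus_propose(None, None, Some(5)), None);
}
```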
Distributed execution
Datatoad can shard work across multiple workers and machines. To run with four workers on one machine:
```
cargo run --release -- -w 4
```
Each worker maintains its own shard of the data, exchanging facts by hash as rules produce them. The same REPL, the same rules — just more workers.
To run across multiple machines, provide a common hostfile listing their addresses (as `addr:port`, one per line) in order:

```
cargo run --release -- -n 4 -p 0 --hostfile hostfile.txt
```
where each machine runs with its own -p index, starting at zero.
This uses timely-communication’s networking layer under the hood, so the coordination model is the same whether workers are local threads or remote processes.
Planting face
The showing off notwithstanding, datatoad is very much a work in progress. There are many things it should do that it doesn’t yet do, for example not crashing in surprising ways. Some of the examples above are cherry-picked, chosen to make a point more than to convince you to use datatoad.
And yet, it is all meant to work. Reach out when you encounter problems, and we can add them to the list.
Columnar Orientation
The foundational choice in datatoad is to represent data in columns rather than rows.
A relation with three columns is not stored as a vector of 3-tuples. Instead, it is three parallel vectors: one per column. Operations act on columns in bulk: permuting, filtering, sorting, and intersecting columns are array operations, not per-row decisions.
These bulk operations matter for more than raw performance; they make it easier to adapt our behavior.
- Interpreted execution comes easily (and efficiently) to columnar models. The interpretation overhead is amortized over the bulk operations.
- Isolating each column conceals details, which makes it easier for each column to specialize its behavior. We can sort a column of fixed-width integers faster than a sequence of mixed-width integers.
- Abstracting the relations in joins lets them be backed by different implementations. In addition to conventional stored relations, both antijoins and logic relations are applied as if they were columnar relations.
This chapter makes the columnar layout concrete. We start with the trie representation — how sorted relations decompose into layers of values and bounds — and show how datatoad’s core operations (union, antijoin, intersection) work layer by layer on this structure. We then cover sorting, which is the main cost of getting data into this representation.
Columnar tries
The chapter introduction described how columnar layouts let us cheaply rearrange which columns the engine looks at. This section makes the representation concrete: the actual data structures, how they compose, and how runtime specialization works.
The representation
A relation with three columns, sorted and deduplicated, looks like this:
```
(A, 1, x),
(A, 1, y),
(A, 1, z),
(A, 2, x),
(A, 3, x),
(A, 3, y),
(B, 2, z),
(C, 1, x),
(C, 2, x).
```
A naive columnar layout stores three parallel arrays, one per column:
```
c0 c1 c2
-- -- --
A  1  x
A  1  y
A  1  z
A  2  x
A  3  x
A  3  y
B  2  z
C  1  x
C  2  x
```
This is already useful — each column can be stored in a type-appropriate container — but it doesn’t exploit the sortedness.
Prefix compression
Because the data are sorted, the first column has runs of identical values. We can deduplicate it, but we need to remember where each value’s entries begin and end in the next column. Peeling off the last column and deduplicating the prefix gives:
```
c0 c1 bounds  c2
-- -- ------  --
A  1  [0, 3)  x
A  2  [3, 4)  y
A  3  [4, 6)  z
B  2  [6, 7)  x
C  1  [7, 8)  x
C  2  [8, 9)  y
              z
              x
              x
```
The bounds array translates from the second column into the third: each entry in c1 owns a range of entries in c2.
Since bounds are always contiguous (one range’s upper bound is the next’s lower bound), we only need to store the upper bounds:
```
c0 c1 b2 c2
-- -- -- --
A  1  3  x
A  2  4  y
A  3  6  z
B  2  7  x
C  1  8  x
C  2  9  y
         z
         x
         x
```
Repeating this for the first two columns:
```
c0 b1 c1 b2 c2
-- -- -- -- --
A  3  1  3  x
B  4  2  4  y
C  6  3  6  z
      2  7  x
      1  8  x
      2  9  y
            z
            x
            x
```
Each column is now a sequence of sorted lists.
Column 0 has one list ([A, B, C]) — the distinct values extending the empty prefix.
Column 1 has three lists ([1,2,3], [2], [1,2]) — one per distinct value in column 0.
Column 2 has six lists — one per distinct value in column 1.
The bounds between columns describe which lists in the next column correspond to which values in the current one. Importantly, the bounds belong to the next column, not the current one: they translate forward. This keeps the representation clean when multiple independent extensions share the same prefix column, a pattern that shows up in forward-looking join strategies.
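A short sketch can reproduce the layers and bounds of the running example. The `layer` and `bounds` helpers below are illustrative (strings rather than datatoad's byte columns), but compute exactly the values and upper bounds shown in the tables above:

```rust
/// Layer k's values: the last element of each distinct (k+1)-prefix
/// of the sorted, deduplicated rows.
fn layer(rows: &[Vec<&str>], len: usize) -> Vec<String> {
    let mut prefixes: Vec<Vec<&str>> = Vec::new();
    for row in rows {
        let p = row[..len].to_vec();
        if prefixes.last() != Some(&p) {
            prefixes.push(p);
        }
    }
    prefixes.iter().map(|p| p.last().unwrap().to_string()).collect()
}

/// Upper bounds translating the (len-1)-prefix layer into the len-prefix
/// layer: for each distinct (len-1)-prefix, the running count of distinct
/// len-prefixes once that prefix ends.
fn bounds(rows: &[Vec<&str>], len: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev: Option<Vec<&str>> = None;
    let mut seen: Option<Vec<&str>> = None;
    let mut count = 0;
    for row in rows {
        let long = row[..len].to_vec();
        if seen.as_ref() == Some(&long) {
            continue; // duplicate len-prefix; already counted
        }
        let short = row[..len - 1].to_vec();
        if let Some(p) = &prev {
            if *p != short {
                out.push(count); // the previous (len-1)-prefix ends here
            }
        }
        prev = Some(short);
        seen = Some(long);
        count += 1;
    }
    out.push(count);
    out
}

fn main() {
    let rows: Vec<Vec<&str>> = vec![
        vec!["A", "1", "x"], vec!["A", "1", "y"], vec!["A", "1", "z"],
        vec!["A", "2", "x"], vec!["A", "3", "x"], vec!["A", "3", "y"],
        vec!["B", "2", "z"], vec!["C", "1", "x"], vec!["C", "2", "x"],
    ];
    assert_eq!(layer(&rows, 1), ["A", "B", "C"]);
    assert_eq!(bounds(&rows, 2), [3, 4, 6]);             // b1
    assert_eq!(layer(&rows, 2), ["1", "2", "3", "2", "1", "2"]);
    assert_eq!(bounds(&rows, 3), [3, 4, 6, 7, 8, 9]);    // b2
}
```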
Layers and forests
In code, each column becomes a Layer:
```rust
pub struct Layer<C> {
    pub list: Lists<C>,
}
```
where Lists<C> is a columnar container holding the values and bounds together — the sequence of sorted lists described above.
A complete trie is a Forest: a sequence of layers, one per column.
```rust
pub struct Forest<C> {
    layers: Vec<Rc<Layer<C>>>,
}
```
The number of layers equals the arity of the relation. Each layer’s values are typed (in datatoad’s case, byte sequences whose widths can be introspected at runtime), and the bounds translate between adjacent layers.
The name is “forest” because there is no requirement there be a single root, and generally we will work with intermediate runs of layers that have multiple “roots”.
Runtime specialization
A key property of the layer representation is that the boundary between layers carries no type information — just index ranges (bounds) translating from one layer to the next. This is what enables per-column specialization at runtime.
When datatoad encounters a layer whose byte sequences all have width 4 (or 8, or 12), it can upgrade to a fixed-width representation and dispatch to code that compares and copies fixed-size values. When the widths vary, it falls back to variable-length byte comparison. The choice is made per layer, per operation, at runtime — and because the interface between layers is type-erased, mixing specialized and generic layers in the same trie is free.
This is where the columnar trie representation pays off for an interpreter. A row-oriented engine would need to specialize on the full row type (the product of all column types), which explodes combinatorially. A columnar engine specializes each column independently, and the decisions compose without interference.
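A sketch of the width check, with illustrative names (datatoad's actual dispatch differs in detail):

```rust
/// If every value in a layer has the same byte width, report it; otherwise
/// signal the variable-length fallback. A per-layer, per-operation decision.
fn common_width(values: &[Vec<u8>]) -> Option<usize> {
    let w = values.first()?.len();
    values.iter().all(|v| v.len() == w).then_some(w)
}

/// Dispatch a per-layer operation (here: sorting) by detected width.
fn sort_layer(values: &mut Vec<Vec<u8>>) -> &'static str {
    match common_width(values) {
        // With a known fixed width we could reinterpret values as [u8; 4]
        // and sort those directly; a lexicographic Vec sort stands in here.
        Some(4) => { values.sort(); "fixed-width (4-byte) path" }
        Some(_) => { values.sort(); "fixed-width path" }
        None => { values.sort(); "variable-length path" }
    }
}

fn main() {
    let mut fixed = vec![vec![0, 0, 0, 2], vec![0, 0, 0, 1]];
    assert_eq!(sort_layer(&mut fixed), "fixed-width (4-byte) path");
    assert_eq!(fixed[0], vec![0, 0, 0, 1]);
    let mut varied = vec![vec![1, 2, 3], vec![9]];
    assert_eq!(sort_layer(&mut varied), "variable-length path");
}
```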
Columnar sorting
Our data rarely starts out as a columnar trie, and we’ll need to get it into that shape. Moreover, we’ll often find we do have a columnar trie, but would like it in a different order. This is the generally challenging problem of “columnar sorting”, which we describe here.
The problem is “generally challenging” because it is usually slow. Top sorting algorithms move data as they sort to localize the work, and columns smear that data across multiple locations. We are not going to out-perform row-oriented sorting at what they do best, but we’ll see there is a place for columnar sorting.
Columnar sorting starts from a sequence of columns (or even tries), and sorts each column in turn, each column passing notes to the next about how to continue the sort.
Each column sort takes as input a list of integers that order and group its rows, and it sorts them by (group, value).
It produces an output list of integers that order and group its rows, to pass to the next column.
This sort respects the ordering requirement of the preceding columns, without introducing the complexity of the types of those columns.
In an interpreted setting this is paramount, because we cannot pre-compile the row sort for each of the column orders a user might create.
In datatoad each type is `[u8]`, but in each column sort we notice if it is exactly `[u8; K]` for some `K` and dispatch to an optimized `(usize, [u8; K])` sort in each case.
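One column's step might look like the following sketch, which orders rows by (group, value) and emits refined group numbers for the next column. A plain comparison sort stands in for the radix sort datatoad uses, and `u32` values stand in for byte columns:

```rust
/// One column's step of a columnar sort: given per-row group ids from the
/// previous columns, order rows by (group, value) and emit refined group ids.
/// Returns (permutation, sorted values, new group ids); the permutation is
/// what the next column applies before its own pass.
fn sort_column(groups: &[usize], column: &[u32]) -> (Vec<usize>, Vec<u32>, Vec<usize>) {
    let mut order: Vec<usize> = (0..column.len()).collect();
    order.sort_by_key(|&i| (groups[i], column[i]));
    // Refine groups: a new group starts whenever (group, value) changes.
    let mut new_groups = Vec::with_capacity(order.len());
    let mut next = 0;
    for (pos, &i) in order.iter().enumerate() {
        if pos > 0 {
            let j = order[pos - 1];
            if (groups[j], column[j]) != (groups[i], column[i]) {
                next += 1;
            }
        }
        new_groups.push(next);
    }
    let sorted: Vec<u32> = order.iter().map(|&i| column[i]).collect();
    (order, sorted, new_groups)
}

fn main() {
    // All rows start in one group; sort the first column.
    let (order, sorted, groups) = sort_column(&[0, 0, 0, 0], &[3, 1, 3, 2]);
    assert_eq!(order, [1, 3, 0, 2]);
    assert_eq!(sorted, [1, 2, 3, 3]);
    assert_eq!(groups, [0, 1, 2, 2]); // the two 3s share a group
}
```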
Columnar trie sorting
One of datatoad’s core actions is reshaping columnar tries, and it wants to take as much advantage as it can.
The columnar sorting story becomes stronger when we start from a columnar trie.
The layout is already columnar, but it is also helpfully compressed.
Imagine we want to re-sort a trie from the column order (A, B, C) to the order (B, C, A).
- The `B` column may be much shorter than the `C` column, and so there is less to sort.
- When `B` is more complicated to sort, text for example, the reduction is especially well felt.
- We can avoid ever materializing the uncompressed representation.
The columnar trie reveals structure that can make sorting substantially more efficient than flattening and re-forming.
Paged radix sorting
Datatoad uses a paged radix sort that is worth discussing.
We use LSB radix sort because we often sort `(usize, [u8; K])` data, which are narrow and byte-ordered.
Radix sort excels at this sort of problem.
Rather than work with one large contiguous array, as LSB radix sort usually does, we’ll work with a sequence of smaller “pages”. The pages are small enough that an extra 256 of them are not problematic, but large enough to spend time working through rather than working around.
One benefit of the paged approach is that we are rarely sorting in-place. We have read the data from somewhere (a file, a trie) and want to write it back to somewhere else (a trie). The paged approach acquires memory as we fill it (like demand paging), but also releases memory as we finish with it (unlike most allocators). The pages allow us to maintain a smaller footprint as we sort.
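For reference, here is a minimal LSB byte-radix sort over `u32` keys, without the paging. Each of the 256 buckets stands in for a run of pages that a paged implementation could acquire as it fills and release as it drains:

```rust
/// LSB radix sort: one stable pass per byte, least significant first.
fn radix_sort(mut data: Vec<u32>) -> Vec<u32> {
    for shift in [0, 8, 16, 24] {
        let mut buckets: Vec<Vec<u32>> = (0..256).map(|_| Vec::new()).collect();
        for x in data.drain(..) {
            buckets[((x >> shift) & 0xFF) as usize].push(x);
        }
        // Concatenating bucket by bucket preserves the order of earlier
        // passes (stability), which is what makes LSB radix sort correct.
        data = buckets.into_iter().flatten().collect();
    }
    data
}

fn main() {
    let sorted = radix_sort(vec![513, 2, 70000, 1, 256]);
    assert_eq!(sorted, [1, 2, 256, 513, 70000]);
}
```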
Operations: union, intersection, and join
The columnar trie supports column-by-column operation for standard relational operations. The key observation, like with columnar sorting, is that each column can provide the information the next needs to continue, without revealing the details of the column itself. In sorting this was the ordering and grouping information. For these operators it will be alignment information.
Reports
The alignment between two layers is a sequence of Report values:
```rust
enum Report {
    This(usize, usize), // values exclusive to the first input
    That(usize, usize), // values exclusive to the second input
    Both(usize, usize), // a matching value in both inputs
}
```
A `This(lower, upper)` says that values at indices `lower..upper` in the first input have no match in the second.
A `That(lower, upper)` says the same for the second input.
A `Both(i, j)` says that value `i` in the first input equals value `j` in the second.
The sequence of reports accounts for every value in both inputs and spells out their sorted order. It tells the next layer exactly what to do: leave exclusive ranges as-is, and continue to resolve matching values by their values in the next layer. This foundation is sufficient for us to implement union, intersection, and join.
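A sketch of producing reports for a single pair of sorted, deduplicated lists, with `u32` values standing in for a layer's byte sequences:

```rust
/// Alignment between two sorted, deduplicated lists.
#[derive(Debug, PartialEq)]
enum Report {
    This(usize, usize), // indices lower..upper exclusive to the first input
    That(usize, usize), // indices lower..upper exclusive to the second input
    Both(usize, usize), // index in first input, index in second input
}

fn align(a: &[u32], b: &[u32]) -> Vec<Report> {
    let (mut i, mut j, mut out) = (0, 0, Vec::new());
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            out.push(Report::Both(i, j));
            i += 1;
            j += 1;
        } else if a[i] < b[j] {
            let lower = i;
            while i < a.len() && a[i] < b[j] { i += 1; }
            out.push(Report::This(lower, i));
        } else {
            let lower = j;
            while j < b.len() && b[j] < a[i] { j += 1; }
            out.push(Report::That(lower, j));
        }
    }
    // Whatever remains in either input is exclusive to it.
    if i < a.len() { out.push(Report::This(i, a.len())); }
    if j < b.len() { out.push(Report::That(j, b.len())); }
    out
}

fn main() {
    let reports = align(&[1, 2, 4, 7], &[2, 3, 4]);
    assert_eq!(reports, [
        Report::This(0, 1), // 1 only in the first input
        Report::Both(1, 0), // 2 in both
        Report::That(1, 2), // 3 only in the second
        Report::Both(2, 2), // 4 in both
        Report::This(3, 4), // 7 only in the first
    ]);
}
```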
Union
Merging two tries proceeds layer by layer, starting from a single Both(0, 0) report (both inputs have one top-level list to merge).
At each layer, the merge consumes the current reports and produces new ones for the next layer:
- `This(lower, upper)`: Copy the lists from the first input. Emit `This` reports with expanded indices.
- `That(lower, upper)`: Copy the lists from the second input. Emit `That` reports with expanded indices.
- `Both(i, j)`: Merge the two lists (they’re sorted, so this is a linear merge). Emit `This`, `That`, and `Both` reports for the elements within.
The per-layer merge is generic over the value type, so each layer dispatches to type-appropriate comparison and copy logic. At the trie level, the code is compact — it walks the layers and, for each one, checks whether the data can be upgraded to a fixed-width type (e.g., 4-byte values) for a faster path:
```rust
pub fn union(&self, other: &Self) -> Self {
    let mut reports = VecDeque::default();
    reports.push_back(Report::Both(0, 0));
    for (layer0, layer1) in self.layers.iter().zip(other.layers.iter()) {
        // Merge this layer using the current reports,
        // producing new reports for the next layer.
        // Specialize by value width when possible.
    }
    ...
}
```
The This and That ranges are where the real throughput comes from: they identify contiguous runs of values and lists that can be bulk-copied rather than compared element by element.
Intersection
Intersection reuses the same layer-by-layer walk, but only tracks the Both matches — the This and That ranges are irrelevant, since we only care about what overlaps.
At each layer, pairs of matching list indices are fed forward, and within those lists, a merge identifies matching values and records them for the next layer.
The merge uses galloping (exponential search) to skip past non-matching regions efficiently:
```rust
while lower0 < upper0 && lower1 < upper1 {
    match val0.cmp(&val1) {
        Less    => { lower0 += 1; gallop(list0, &mut lower0, upper0, |x| x < val1); },
        Equal   => { reports.push_back((lower0, lower1)); lower0 += 1; lower1 += 1; },
        Greater => { lower1 += 1; gallop(list1, &mut lower1, upper1, |x| x < val0); },
    }
}
```
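The `gallop` helper can be sketched as an exponential search that advances `lower` past every element satisfying a predicate. This is an illustrative implementation, not necessarily datatoad's:

```rust
/// Advance `lower` past every element of `list[..upper]` satisfying `cond`,
/// doubling the step while the probe succeeds, then narrowing back down.
/// Costs O(log d) for a displacement of d, rather than O(d).
fn gallop<T>(list: &[T], lower: &mut usize, upper: usize, cond: impl Fn(&T) -> bool) {
    if *lower < upper && cond(&list[*lower]) {
        let mut step = 1;
        // Double the step while the probe still satisfies the condition.
        while *lower + step < upper && cond(&list[*lower + step]) {
            *lower += step;
            step <<= 1;
        }
        // Shrink the step back down to land just before the first failure.
        step >>= 1;
        while step > 0 {
            if *lower + step < upper && cond(&list[*lower + step]) {
                *lower += step;
            }
            step >>= 1;
        }
        *lower += 1;
    }
}

fn main() {
    let list = [1, 2, 3, 5, 8, 13, 21, 34];
    let mut lower = 0;
    gallop(&list, &mut lower, list.len(), |x| *x < 13);
    assert_eq!(lower, 5); // list[5] == 13, the first element not less than 13
}
```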
Intersection shows us where two input tries line up, without spending time on intervals where they do not. This ends up as the foundation for joins and antijoins.
Join (and Antijoin)
To join two collections by some shared key columns, datatoad forms columnar tries whose first columns are the shared key columns. An intersection not only finds the keys in common, but also returns with their offsets in each trie. In each trie that offset is the root of a subtree of values that must be crossed with the values of the other trie. Once intersected, a join is as easy as just reading all pairs of values out of the tails of the two tries (a “constant time enumerator”).
Antijoins are similar, in that the intersection indicates which values to avoid, rather than keep. The intersection report allows the trie to produce those elements that are exclusive to it. This is a core action in Datalog, where we want to retain only the truly novel facts each iteration.
Log-structured merge tries
Rather than maintain a single Forest per relation and modify it in place, datatoad wraps forests in a log-structured merge tree:
```rust
pub struct FactLSM<F> {
    layers: Vec<F>,
}
```
New facts arrive as small forests.
As soon as two forests have the same size, they are merged to keep the total count logarithmic — the invariant is that sizes should at least halve as you move through the list. This keeps individual forests immutable: all mutation happens by appending new forests and merging. Immutability plays well with large contiguous allocations, avoids the bookkeeping of in-place updates, and allows reference counted sharing.
The merge discipline, halving in size, keeps the memory footprint within a factor of two of the minimum possible. All tries together sum to at most twice the size of the largest, which itself contains only distinct facts.
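The merge discipline can be sketched in a few lines, with sorted, deduplicated vectors standing in for immutable forests:

```rust
/// Log-structured merge sketch: batches are kept largest-first, and a new
/// batch is merged upward until every layer is less than half its predecessor.
struct Lsm {
    layers: Vec<Vec<u32>>,
}

impl Lsm {
    fn push(&mut self, mut batch: Vec<u32>) {
        batch.sort();
        batch.dedup();
        self.layers.push(batch);
        // Merge while the newest layer fails to halve the one before it.
        while self.layers.len() > 1 {
            let n = self.layers.len();
            if self.layers[n - 1].len() * 2 <= self.layers[n - 2].len() {
                break; // invariant holds: sizes at least halve
            }
            let a = self.layers.pop().unwrap();
            let b = self.layers.pop().unwrap();
            let mut merged = [a, b].concat();
            merged.sort();
            merged.dedup();
            self.layers.push(merged);
        }
    }
}

fn main() {
    let mut lsm = Lsm { layers: Vec::new() };
    lsm.push(vec![1, 2, 3, 4]);
    lsm.push(vec![5, 6, 7, 8]); // same size as the first: merged into one
    assert_eq!(lsm.layers.len(), 1);
    lsm.push(vec![9, 10]); // half the size: the invariant holds, no merge
    assert_eq!(lsm.layers.len(), 2);
    assert_eq!(lsm.layers[0], [1, 2, 3, 4, 5, 6, 7, 8]);
}
```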
Products of sums
The LSM structure means that we’ll need to adapt many of our algorithms to work on sums of tries. This turns out to be not that hard; we do not need to flatten the LSM to one trie to union or intersect, for example. The requirement also paves the way for other moments where we want to sum relations but cannot merge them, for example when they have different representations (data vs. logic).
Joins
In Datalog each rule is translated to a multi-way equijoin. The rule
```
head(a, b, c) :- body0(a, x, y), body1(b, x, z), body2(c, y, z).
```
is a three-way join between body0, body1, and body2, with equality constraints imposed by the re-use of terms.
The third field of body0 must equal the second field of body2, for example.
Not only do we need to perform joins in Datalog, we often perform incremental joins.
These are joins that start from some pre-existing facts, and need to update in response to new facts.
For example, we might have additions to body0, written d_body0, and need to produce d_head in response.
We would like to perform this efficiently, proportional only to the new facts we are producing.
In this section we discuss how datatoad plans and executes joins.
Planning
Datatoad plans a rule by forming independent plans for each atom. Each of these plans is used when the atom presents with new facts. Unioned together, the results of these plans produce all new values of the rule head.
Each individual atom plan is a sequence of pairs of terms and atoms.
Starting from the terms of the atom, each (terms, atoms) stage adds new terms, and restricts their values by atoms.
There are two common forms these take:
- Single atom, multiple term (conventional binary join),
- Single term, multiple atom (worst-case optimal join).
The only requirement on the plan is that by the end of it, each atom has participated while all of its terms were present. This ensures that each binding of values to terms that we produce satisfies each atom in the rule.
When planning a path, all atoms present through a common trait:
```rust
pub trait PlanAtom<T: Ord> {
    /// Terms the atom references.
    fn terms(&self) -> BTreeSet<T>;
    /// For input terms, other terms the atom can ground.
    fn ground(&self, terms: &BTreeSet<T>) -> BTreeSet<T>;
}
```
This trait explains the set of terms, and helps the planner navigate the relationship among terms.
For most data-based atoms, the terms are those of the atom and every term can be ground. These atoms can be used at any moment, as they can always just enumerate their contents. Obviously we prefer to do something more clever with them than this.
For logic-based atoms, the story can be more complicated.
The operator `order(a, b, c)`, meaning `a < b < c`, could ground `b` given `(a, c)`, but cannot ground `c` given `(a, b)`.
Logic based atoms can have a “direction” to their planning hyper-edge.
The datatoad planner starts from the terms of the source atom, and repeatedly adds terms that can be ground from the present terms. The planner has some heuristic preferences to involve terms that are constrained by multiple atoms, seeking the benefits of worst-case optimal joins early. There is much room to improve the depth of thought here, and conversations are welcome.
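The planning loop itself can be sketched as a fixed point over groundable terms. The `Atom` struct below is a toy stand-in for the PlanAtom trait, with an explicit needs/grounds split instead of a `ground` method:

```rust
use std::collections::BTreeSet;

/// Toy stand-in for a plannable atom: terms that must already be bound,
/// and terms the atom can then ground.
struct Atom {
    needs: BTreeSet<&'static str>,
    grounds: BTreeSet<&'static str>,
}

/// Starting from the source atom's terms, repeatedly add any terms that
/// some atom can ground from the terms present, until nothing changes.
fn plan(start: &BTreeSet<&'static str>, atoms: &[Atom]) -> BTreeSet<&'static str> {
    let mut bound = start.clone();
    loop {
        let before = bound.len();
        for atom in atoms {
            if atom.needs.is_subset(&bound) {
                bound.extend(atom.grounds.iter().copied());
            }
        }
        if bound.len() == before {
            return bound;
        }
    }
}

fn main() {
    // order(a, b, c) with a < b < c can ground b from (a, c), but not c from (a, b).
    let order = Atom {
        needs: ["a", "c"].into_iter().collect(),
        grounds: ["b"].into_iter().collect(),
    };
    // A stored relation can always enumerate, so it needs nothing.
    let data = Atom {
        needs: BTreeSet::new(),
        grounds: ["a", "c"].into_iter().collect(),
    };
    let bound = plan(&BTreeSet::new(), &[order, data]);
    assert!(bound.contains("b")); // data grounds (a, c), then order grounds b
}
```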
Binary joins
Datatoad’s binary join stages occur when a plan has one atom and any number of terms.
Binary joins are relatively easy when the two inputs are arranged as tries each starting with the columns to equate, in order. The first step is intersection: we find all pairs of indices corresponding to matching prefixes along the equated columns. The second step is enumeration: for each pair of indices we enumerate all pairs of remaining values from each input.
Almost all of the complexity of datatoad’s binary join implementation is in performing a fused enumeration and projection. The projections, which lead to the trie layout of the next stage, are the main cost in joining. They end up very analogous to trie formation, with columnar radix sorting to form the output layers.
The spirit here is very close to factorized databases, where aligning data is cheap, and one avoids materializing results until one is compelled (by a projection, for example).
Worst-case optimal joins
Datatoad’s worst-case optimal (WCO) join stages happen when there is one term and multiple atoms.
The WCO stages build on binary joins as a primitive. In fact, these stages are just a few binary joins, with a careful switch put into place.
When a stage has one term and multiple atoms, each atom involves the term and some other set of existing terms (perhaps none). For each tentative partial fact, a list of term bindings, each atom is probed to ask “how many values of the new term would you introduce?”. Each atom is then joined with the set of tentative partial facts for which the atom proposes the fewest values of the term. Finally, all newly extended facts are semijoined with all atoms, to confirm that each term that was added works for all atoms.
The operation of counting the values of the term, rather than producing the values, is the new addition. A WCO join stage starts with the counting, and partitions the facts. The parts are then joined with each of their respective atoms. All parts are merged and semijoined with all of the atoms.
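A toy sketch of one such stage, with maps standing in for tries: each atom counts its proposals per fact, the cheapest atom proposes, and the others validate. The `extend` function and its shapes are illustrative, not datatoad's interface:

```rust
use std::collections::BTreeMap;

/// Extend each partial fact (b, c) with a new term d. Each "atom" maps a
/// bound key to its sorted extensions; `keys` says which component of the
/// fact each atom keys on.
fn extend(
    facts: &[(u32, u32)],
    atoms: &[&BTreeMap<u32, Vec<u32>>],
    keys: &[fn(&(u32, u32)) -> u32],
) -> Vec<(u32, u32, u32)> {
    let mut out = Vec::new();
    let empty = Vec::new();
    for fact in facts {
        let ext = |a: usize| atoms[a].get(&keys[a](fact)).unwrap_or(&empty);
        // Count: pick the atom proposing the fewest extensions for this fact.
        let min = (0..atoms.len()).min_by_key(|&a| ext(a).len()).unwrap();
        for &d in ext(min) {
            // Validate the proposal against every other atom.
            if (0..atoms.len()).all(|a| a == min || ext(a).contains(&d)) {
                out.push((fact.0, fact.1, d));
            }
        }
    }
    out
}

fn main() {
    // body0(b, d) and body1(c, d): extend (b, c) with d.
    let body0: BTreeMap<u32, Vec<u32>> = [(1, vec![10, 11, 12])].into_iter().collect();
    let body1: BTreeMap<u32, Vec<u32>> = [(2, vec![11])].into_iter().collect();
    let keys: [fn(&(u32, u32)) -> u32; 2] = [|f| f.0, |f| f.1];
    let result = extend(&[(1, 2)], &[&body0, &body1], &keys);
    assert_eq!(result, [(1, 2, 11)]); // body1 proposes (fewest), body0 validates
}
```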
Sequestration
When extending terms `(a, b, c)` to `d` through `body0(b, d)` and `body1(c, d)`, we will “sequester” column `a`.
We rotate the terms to be ordered `(b, c, a)` and then put the last layer to the side.
We then join `(b, c)` to get `(b, c, d)` and re-assemble with the independent `a` column.
Holding a to the side reduces the volume of data, potentially asymptotically.
Execution abstractions
Each atom involved in join execution implements a trait with two key methods:
```rust
pub trait ExecAtom<T: Ord> {
    /// For each fact in `delta`, the number of distinct values for terms in `added`.
    fn count(&self, delta: &Facts, added: &BTreeSet<T>) -> Vec<u64>;
    /// Joins `self` with `delta`, introducing terms in `added`, and projected to `after`.
    fn join(&self, delta: &mut Facts, added: &BTreeSet<T>, after: &[T]);
}
```
These methods allow us to implement binary joins, semijoins, and worst-case optimal joins.
Importantly, these methods can be implemented not only by columnar tries of data. Antijoins are implemented using this interface, but are only able to semijoin facts. Logic relations are also implemented this way, providing counts and values for any terms they have advertised they can ground.
There is a contract between PlanAtom and ExecAtom.
If an atom advertises that it can ground a term as a function of other terms, then it must both propose a count and be able to enumerate the values of the terms.
Atoms like antijoins will not advertise grounding any term, and can error if asked to count or propose terms; they should correctly semijoin when asked.
Relations as Logic
In most Datalog engines, an atom in a rule body refers to a stored relation: a table of facts. Datatoad generalizes this. An atom can be backed by anything that supports two operations: count (how many extensions would you propose?) and propose (enumerate or validate those extensions). A stored relation implements these by looking into its trie. But so can a computation.
The relation :plus(x, y, z) represents the constraint x + y = z.
Given x and y, it can propose z.
Given x and z, it can propose y.
Given only z, it cannot propose anything (infinitely many decompositions), so it reports that and the planner knows not to start there.
This bidirectionality falls out of the count/propose interface — the planner asks each atom what it can contribute given the currently bound terms, and logic relations answer honestly.
There is no special case for logic relations in the join algorithm. They are atoms, like any other.
This chapter covers how logic-backed relations are defined, how bidirectional computation works, and how they integrate with the join planner.
Abstract relations
TODO: The abstract relation framework. How relations are introduced (#21). The distinction between stored relations (backed by facts) and computed relations (backed by logic).
Bidirectional computation
TODO: The Logic trait and its bound() method. How :plus(x,y,z) can derive any one argument from the other two. How :range(lo, mid, hi) enumerates mid given lo and hi. How NotEq validates without proposing.
Logic in join planning
TODO: How the planner uses bound() to determine which terms are derivable. How logic relations participate in WCOJ alongside stored relations. How the count/propose/validate interface applies uniformly.
Scaling Out
This chapter covers datatoad’s distributed execution: how data is exchanged between workers, and what the current state of distributed scaling looks like.
Exchange and broadcast
TODO: The Comms abstraction over timely-communication. Exchange: sharding facts by hash of a column prefix. Broadcast: all-to-all for coordination. Conduit: tracking completion of a communication round.
What works, what doesn’t
TODO: Current state of distributed execution. What scales, what doesn’t. Known limitations (zero-arity multi-worker, self-sends for consolidation, single command stream). Honest assessment.
Evaluation
TODO: Benchmark results against Soufflé and other engines. Where datatoad is competitive, where it isn’t. Loading time, memory usage, join performance on various benchmarks (GALEN, etc.).