What is Datalog?
Datalog is a declarative language for expressing recursive queries over relations.
A Datalog program consists of facts (things that are true) and rules (ways to derive new facts from existing ones).
For example, given edges in a graph:
edge(1, 2).
edge(2, 3).
edge(3, 4).
We can define reachability:
reach(x, y) :- edge(x, y).
reach(x, y) :- edge(x, z), reach(z, y).
The first rule says: if there is an edge from x to y, then y is reachable from x.
The second says: if there is an edge from x to z, and y is reachable from z, then y is reachable from x.
The engine computes the fixpoint: it repeatedly applies the rules until no new facts are derived.
The result is every (x, y) pair where y is reachable from x.
Incremental Datalog
The core loop of (semi-naive, bottom-up) Datalog evaluation is
- Starting from some novel facts,
- Determine through rules candidate facts that may be novel,
- Remove any pre-existing facts from the candidates, then call them novel,
- Repeat as long as there are any novel facts.
This loop is very similar to the bones of streaming, incremental, relational computation. Our connection to Datalog is through a shared interest in these properties.