# Motivation

"I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships." — Linus Torvalds

Datopia has at its center a permissionless, immutable database — we maintain, persist and federate parallel indices to efficiently service structured queries over the chain state. Our specific approach to index implementation has involved extending Greenberg’s Hitchhiker tree, and, in datahike, providing facilities for arranging HH trees in such a way as to support interrogation with Datalog, a declarative query language.

This post is aimed at readers unfamiliar with the Hitchhiker tree — or
write-optimized B trees, more generally. We’ll begin by revising properties of
related data structures, while incrementally suggesting the design of the HH
tree — a functional implementation of
a B+ tree, employing similar
write-optimizations
to fractal tree indices or
B^{ε}
trees. Before parting, we’ll briefly touch on authentication and
replication. We can’t cover everything: details of how indices are represented
in the HH trees will be the topic of a follow-up post.

# From The Ground Up

## Binary Search Trees

The canonical (and simplest) *tree* data structure suitable for storing sorted
collections is
the binary search tree —
an arrangement in which inner nodes point to at most two subtrees (left and
right). While simple, an *unbalanced* BST is vulnerable to structural
deformation — if, say, entries are inserted in sorted order — and may
degenerate into an approximation of a linked list. Accordingly, we’ll set
unbalanced trees aside, and focus only on the balanced, or self-balancing subset of trees —
those in which all leaves are approximately equidistant from the root.

In any balanced tree, the distance between the root and a leaf scales logarithmically with the number of entries. In a balanced BST having n entries, the cost of a lookup is O(log_{2} n) comparisons, where the base of the logarithm — 2 — is the tree's branching factor. Intuitively, this makes sense: we're halving the remaining search space whenever we select a subtree for descent.
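To make the shape argument concrete, here's a minimal BST sketch in Python (all names illustrative): sorted input degenerates the tree into a chain, while a median-first insertion order keeps its height logarithmic.

```python
# A minimal, unbalanced BST: insertion order alone decides its shape.

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    return 0 if root is None else 1 + max(height(root.left), height(root.right))

def median_first(keys):
    # emit the median before either half, yielding a balanced insertion order
    if not keys:
        return []
    m = len(keys) // 2
    return [keys[m]] + median_first(keys[:m]) + median_first(keys[m + 1:])

degenerate = None
for k in range(16):                    # sorted insertion order
    degenerate = insert(degenerate, k)

balanced = None
for k in median_first(list(range(16))):
    balanced = insert(balanced, k)

print(height(degenerate))              # 16 — an approximation of a linked list
print(height(balanced))                # 5  — roughly log2(16) + 1
```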

## B+ Trees

“You need the power of the logarithm.”

If we think about balanced tree lookups as involving O(log_{b} n) comparisons — where b is the branching factor — we might consider increasing b to yield a more favourable base for the logarithm. The B tree family of structures can be understood as self-balancing generalizations of the BST, leveraging the logarithm for practical gain.

In *B+ trees*, specifically, we’ve a sorted layer of data nodes — likely
implemented as a contiguous/linked list — beneath sorted layers of *index*
nodes, through which we route key lookups. On each descent we logarithmically
search the sorted pivot values, locating the appropriate subtree — or,
terminally, the data node — for the input key.

In practice, B+ trees tend to be impressively shallow — they're realized with far greater branching factors than may coherently be depicted. We'll stick with smaller numbers, for the sake of the diagrams.

So, there’s a lot going on here. We’ve got a B+ tree containing the integer keys 1-29, with a minimum branching factor — b — of 3. Each index node has between 3 (b) and 5 (2b − 1) children — excepting the root, which may have fewer. When an operation would otherwise violate these bounds, the tree maintains balance via unsurprising internal manipulations (joining/splitting nodes, etc.). Navigating via the pivot values is straightforward: the greatest (rightmost) value in any subtree is its parent’s pivot.
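As a rough illustration of pivot-based routing, here's a toy Python lookup over a dictionary-shaped B+ tree (field names illustrative): at each index node we binary-search the sorted pivots to select a subtree, terminating at a data node.

```python
# Toy B+ tree lookup: index nodes hold sorted pivots and children;
# data (leaf) nodes hold sorted keys.
from bisect import bisect_left

def lookup(node, key):
    while not node["leaf"]:
        # the greatest key in a subtree is its parent's pivot, so the
        # first pivot >= key routes us to the correct child
        i = bisect_left(node["pivots"], key)
        node = node["children"][i]
    return key in node["keys"]

leaf1 = {"leaf": True, "keys": [1, 2, 3]}
leaf2 = {"leaf": True, "keys": [4, 5, 6]}
leaf3 = {"leaf": True, "keys": [7, 8, 9]}
root = {"leaf": False, "pivots": [3, 6],
        "children": [leaf1, leaf2, leaf3]}

print(lookup(root, 5))    # True
print(lookup(root, 10))   # False
```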

# Write Optimization

While B+ trees leave little to be desired for reading data, they’re not
write-optimal — an insert or deletion *also* costs O(log_{b} n), as we’re
required to walk to the respective leaf prior to operating on it^{1}.
The Hitchhiker tree attempts to asymptotically improve upon this, by buffering
write operations in fixed-length append logs associated with each index (inner)
node — an optimization common to fractal and B^{ε} trees. While
append log length is configurable, we’re using a maximum of two entries per index
node in the below examples.

In *Figure 2* we see such a tree, with the append logs rendered vertically
at the right edge of each index node. If we attempt to append a
write to a full log, the contents are flushed down a level. Eventually,
an element arrives at a leaf and is inserted into the data
node. Per
Greenberg’s summary
of the benefits of this approach:

- Most inserts are a single append to the root’s event log.
- Although there are a linear number of events, nodes are exponentially less likely to overflow the deeper they are in the tree.
- All data needed for a query exists along a path of nodes between the root and a specific leaf node. Since the logs are constant in size, queries still only read O(log n) nodes.

Given our ability to independently alter the log length and branching factor, we can view this data structure either as a write-optimized B+ tree, or a read-optimized append log. The latter property is one we’re interested in exploiting to efficiently replicate write-intensive event-logs in replikativ, an associated project.

^{1}See Cormen et al. for details.

## Insertion

Figure 3: A small Hitchhiker Tree.

In *Figure 3* you can see a small HH tree — specifically a *2-3* B+ tree (b = 2, so 2-3 children per index node) with append logs of
length 2 — containing the keys 0-13. Note how a few of the elements remain in
the logs (0, 11, 12, 13) — let’s walk through the insertion of further
elements, to develop our intuitions around how the append logs are flushed down
the tree.

Figure 4: We get off easily, via a vacancy in the root's log.

First we insert 14, observing that it requires a single write operation on the
root node’s log per *Figure 4* — leaving the root’s append log at capacity.
Let’s tempt fate and attempt to insert another element, -1:

Figure 5: An insert causes an overflow and flushes the elements down to the leaf nodes.

The root’s append log overflows, and elements 13 and 14 move rightwards, causing
another flush per *Figure 5* — triggering their insertion into the data layer
of the tree, and a split of the B+ tree’s index node. Critical to the
reduction in I/O costs is the fact that the newly inserted element — -1 —
only migrates to the node on the left, generating a single I/O operation until
the append log fills again.
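The walkthrough above can be sketched as a drastically simplified, single-level buffered insert in Python (class and field names are illustrative, not datahike's): writes land in the root's append log until it overflows, at which point everything is flushed down to the data layer.

```python
# Simplified Hitchhiker-style buffered insertion: one index node with
# an append log, above a row of sorted lists standing in for data nodes.
from bisect import bisect_left, insort

LOG_CAPACITY = 2  # matches the examples in this post

class BufferedRoot:
    def __init__(self, leaves, pivots):
        self.leaves = leaves   # sorted lists standing in for data nodes
        self.pivots = pivots   # routing pivots, one fewer than leaves
        self.log = []          # fixed-capacity append log

    def insert(self, key):
        if len(self.log) < LOG_CAPACITY:
            self.log.append(key)          # cheap path: a single append
        else:
            for k in self.log + [key]:    # overflow: flush downwards
                self._flush_to_leaf(k)
            self.log = []

    def _flush_to_leaf(self, key):
        i = bisect_left(self.pivots, key)  # route by pivot, as in a B+ tree
        insort(self.leaves[i], key)

root = BufferedRoot(leaves=[[1, 2], [4, 5]], pivots=[3])
root.insert(3)         # buffered in the log
root.insert(6)         # buffered; log now at capacity
print(root.log)        # [3, 6]
root.insert(0)         # overflow: 3, 6 and 0 all reach the data layer
print(root.leaves)     # [[0, 1, 2, 3], [4, 5, 6]]
print(root.log)        # []
```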

## Query

When querying a Hitchhiker tree, we have to downwardly project unincorporated
append log values — interpolating them in memory after loading the required
data nodes. In that sense, the appended values *hitchhike* with the query
operation to their appropriate position. This doesn’t require any additional
I/O, only the CPU work of sorting a few elements in the nodes. For range
queries, we selectively project downwards only the elements that belong on the
relevant path. See the asymptotic costs appendix for more
detailed information about the complexity of these operations.
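In the single-level terms of the earlier sketches, query-time projection looks something like the following (illustrative names, not datahike's API): only the buffered entries belonging on the query's path are merged, in memory, with the loaded data node.

```python
# Query-time projection: append-log entries "hitchhike" with the query,
# merged in memory with the data node -- no additional I/O, only CPU work.
from bisect import bisect_left

def query(root_log, pivots, leaves, key):
    i = bisect_left(pivots, key)
    # project downwards only the buffered entries on this path
    hitchhikers = [k for k in root_log if bisect_left(pivots, k) == i]
    effective_leaf = sorted(leaves[i] + hitchhikers)
    return key in effective_leaf

pivots = [3]
leaves = [[0, 1, 2], [4, 5]]
root_log = [3, 6]          # buffered writes, not yet flushed

print(query(root_log, pivots, leaves, 6))  # True — found via the log
print(query(root_log, pivots, leaves, 7))  # False
```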

# Replication

Clojure’s data structures are trivial
to structurally hash, and arbitrary data
structures may be authenticated
without difficulty. We’ve exploited these facts to *merkelize* (urgh) the
Hitchhiker tree, by indirecting parent-child relationships with
recursive SHA512 subtree hashes, yielding
a write-optimized Merkle B+ tree.

Consequently, index segments (addressed by hash) may be replicated peer-to-peer (e.g. Dat, Bittorrent), and selectively retrieved by light clients in response to local queries — database consumers maintain local copies of whatever subset of the indices their queries touch, without losing the ability to authenticate the entire structure, given a trusted root hash.
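A minimal sketch of the merkelization idea, in Python rather than Clojure, and with an illustrative serialization (not datahike's actual encoding): each parent references its children by the SHA-512 hash of their content, so a trusted root hash authenticates every reachable segment.

```python
# Merkelizing tree nodes: parent-child relationships are indirected
# through recursive SHA-512 subtree hashes.
import hashlib
import json

def node_hash(node):
    if node["leaf"]:
        payload = {"leaf": True, "keys": node["keys"]}
    else:
        payload = {"leaf": False,
                   "pivots": node["pivots"],
                   "children": [node_hash(c) for c in node["children"]]}
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha512(blob).hexdigest()

leaf1 = {"leaf": True, "keys": [1, 2, 3]}
leaf2 = {"leaf": True, "keys": [4, 5, 6]}
root = {"leaf": False, "pivots": [3], "children": [leaf1, leaf2]}

h1 = node_hash(root)
leaf2["keys"].append(7)    # any mutation, however deep, changes the root hash
h2 = node_hash(root)
print(h1 != h2)            # True
```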

# Conclusion

I hope we’ve explained the Hitchhiker tree’s background sufficiently to communicate its attraction to us as a building block for distributed databases. We believe authenticated, optimized B trees to be a far better choice for high-performance data storage solutions (blockchains, P2P filesystems) than direct materialization of DAGs or chains. If these ideas interest you, or you’re motivated by the composition of clearly-delineated components into surprisingly powerful systems, consider joining us!

# Appendix: Asymptotic Costs

It’s perhaps clarifying to discuss complexity in the notation of B^{ε} trees. While a B+ tree has a fanout of B — each node has at least
B children — a B^{ε} tree has B^{ε} children
(e.g. B^{1/2} children for ε = 1/2). Each node has B
elements: B^{ε} are navigational pointers, while the remainder belong to
the append log.

To calculate the amortized insertion cost informally, we can say that we have to flush an element O(log_{B^{ε}} n) = O(log_{B} n / ε) times to the leaf. On each flush we move at least B^{1−ε} elements down to each child, for an amortized cost of O(log_{B} n / (ε B^{1−ε})) per insertion. For a detailed explanation see Section 2.2 of Jannen et al.
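Plugging in concrete (and purely illustrative) numbers makes the expressions above easier to feel out:

```python
# A worked instance of the B^{ε} cost expressions, under the usual
# parameterization: node size B, fanout B^{ε}. Numbers are illustrative.
import math

B, eps, n = 1024, 0.5, 10**9

fanout = B ** eps                  # B^{ε} children per node
height = math.log(n, fanout)       # O(log_{B^ε} n) levels to flush through
batch = (B - fanout) / fanout      # elements moved per child per flush
amortized_insert = height / batch  # amortized I/Os per insert

print(round(fanout))               # 32
print(round(height))               # 6
print(amortized_insert < 1)        # True — well under one I/O per insert
```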

Cost (I/O ops) | B+ tree | HH tree
---|---|---
insert/delete | O(log_{B} n) | O(log_{B} n / (ε B^{1−ε}))
query | O(log_{B} n) | O(log_{B} n / ε)
range (k results) | O(log_{B} n + k/B) | O(log_{B} n / ε + k/B)

Table 1: Comparison of the asymptotic complexity of operations between a B+ tree and a HH tree.

Note that while the query cost increases slightly, by a factor of 1/ε, we can target larger node sizes, as they’re not rewritten as often as is the case for a B+ tree. If we consider ε a fixed constant (e.g. ε = 1/2), it’s eliminated from the asymptotic cost expressions and yields the theoretic superiority of a fractal tree.