Last updated: 2019-06-05
If you're used to the usual normalized database approach of storing each noun and relationship in a separate table, document-style storage, persistence, and retrieval take some getting used to. And how far exactly can document storage be taken? Postgres is very accommodating with its json and jsonb columns for storing large documents, but what if one wanted to spread a document across many rows, trading a smaller payload per row for extra lookup queries? Below is my attempt at showing that a single tree-like model or data structure may be megabytes, perhaps gigabytes, in size and still be serialized, stored, and retrieved with a plain old Postgres database. Let's start with the basic unit of what we're storing and retrieving:
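The embedded snippet is not reproduced here, so the following is only a minimal sketch of what such a unit might look like, not the actual schema. It assumes a record with just two fields, a caller-supplied string key and a body holding the serialized document, and it uses an in-memory Hash in place of the Postgres table. The names `ValueRecord` and `ValueTable` are hypothetical.

```ruby
# Hypothetical stand-in for a table row: a caller-supplied key plus a JSON body.
ValueRecord = Struct.new(:key, :body)

# Hypothetical in-memory substitute for the Postgres table, indexed by key
# the way a primary-key lookup would be.
class ValueTable
  def initialize
    @rows = {}
  end

  # Insert a row unless the key is already taken; existing rows are never replaced.
  def insert(record)
    @rows[record.key] ||= record
  end

  # Return the row for a key, or raise if there is none.
  def find(key)
    @rows.fetch(key) { raise KeyError, "no value record for #{key}" }
  end

  def size
    @rows.size
  end
end

table = ValueTable.new
table.insert(ValueRecord.new("greeting", '{"text":"hello"}'))
puts table.find("greeting").body
```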
Our intent is to jam as much data as possible into the "body" column so we can treat its content as a plain Ruby object when producing new value records or simply displaying the data. You may notice that we're opting out of the default numeric, auto-incrementing id and need to supply our own key. How we produce such a key will be seen later.
At its most basic level, a value record can store and look up a PORO (plain old Ruby object):
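As a rough illustration of storing and looking up a PORO, here is a sketch under a few assumptions: the standard library's JSON stands in for the serializer, a SHA-256 digest of the body is the key, and an in-memory Hash replaces the database table. `Point`, `ValueStore`, `put`, and `get` are hypothetical names.

```ruby
require "json"
require "digest"

# A plain old Ruby object; to_h/from_h are assumed serialization hooks.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def to_h
    { "x" => x, "y" => y }
  end

  def self.from_h(hash)
    new(hash.fetch("x"), hash.fetch("y"))
  end

  def ==(other)
    other.is_a?(Point) && to_h == other.to_h
  end
end

# Hypothetical store: an in-memory Hash standing in for the Postgres table.
class ValueStore
  def initialize
    @rows = {} # key => body (JSON string)
  end

  # Serialize the object, derive the key from the bytes, save, return the key.
  def put(object)
    body = JSON.generate(object.to_h)
    key = Digest::SHA256.hexdigest(body)
    @rows[key] = body
    key
  end

  # Look up a body by key and rebuild the object with the given class.
  def get(key, klass)
    klass.from_h(JSON.parse(@rows.fetch(key)))
  end
end

store = ValueStore.new
key = store.put(Point.new(1, 2))
point = store.get(key, Point)
puts point.x + point.y
```

The caller names the class when reading because plain JSON carries no type information; a serializer that records the class inside the body would remove that argument.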
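The Oj library helps us serialize Ruby objects to and from JSON. Using a "digest" for a key makes the body "content-addressable": identical content always gets the same key, so duplicate values collapse into a single row and a stored row never needs to change.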
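To make the content-addressable idea concrete, here is a small sketch assuming SHA-256 as the digest and an in-memory Hash in place of the table. The helper `content_key` is a hypothetical name. It shows two things: equal bodies land on the same key, and any stored body can be checked against its key.

```ruby
require "json"
require "digest"

# Derive a key from the serialized body; identical bodies share a key.
def content_key(body)
  Digest::SHA256.hexdigest(body)
end

rows = {} # hypothetical in-memory stand-in for the table

a = JSON.generate({ "name" => "Ada", "langs" => ["ruby"] })
b = JSON.generate({ "name" => "Ada", "langs" => ["ruby"] })

rows[content_key(a)] = a
rows[content_key(b)] = b # same key as above, so no second row appears

# Integrity check: every stored body must still hash to its own key.
intact = rows.all? { |key, body| content_key(body) == key }

puts rows.size
puts intact
```

One caveat: the digest is taken over the serialized bytes, and Ruby hashes serialize in insertion order, so the same pairs written in a different order would produce a different key. A real implementation would likely want a canonical key ordering before hashing.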
A small next step is to exercise the ability to store nested values when serializing:
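A nested value can be sketched along the same lines. This example assumes stdlib JSON, a SHA-256 key, and an in-memory Hash for the table; `Person`, `Address`, `to_doc`, and `from_doc` are hypothetical names. The nested object is serialized inline, as part of its parent's body.

```ruby
require "json"
require "digest"

# Hypothetical leaf value.
Address = Struct.new(:city, :zip) do
  def to_doc
    { "city" => city, "zip" => zip }
  end

  def self.from_doc(doc)
    new(doc.fetch("city"), doc.fetch("zip"))
  end
end

# Hypothetical root value that embeds an Address.
Person = Struct.new(:name, :address) do
  # The nested value is written inline, inside the parent's document.
  def to_doc
    { "name" => name, "address" => address.to_doc }
  end

  def self.from_doc(doc)
    new(doc.fetch("name"), Address.from_doc(doc.fetch("address")))
  end
end

rows = {} # in-memory stand-in for the table

person = Person.new("Ada", Address.new("London", "N1"))
body = JSON.generate(person.to_doc)
key = Digest::SHA256.hexdigest(body)
rows[key] = body

restored = Person.from_doc(JSON.parse(rows.fetch(key)))
puts restored.address.city
```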
Now for the main course. We'd like to save space in our root object by replacing its leaves with pointers to the rest of the value, without encumbering the interface with explicit dereferencing.
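One way this could look, as a sketch only: a field holds either an inline value or a small pointer document, and the accessor follows pointers on the reader's behalf. The pointer shape `{"$ref" => key}`, the `Value` wrapper, and the in-memory `ValueStore` are all assumptions made for illustration.

```ruby
require "json"
require "digest"

# Hypothetical in-memory stand-in for the Postgres table.
class ValueStore
  def initialize
    @rows = {}
  end

  def put(data)
    body = JSON.generate(data)
    key = Digest::SHA256.hexdigest(body)
    @rows[key] = body
    key
  end

  def get(key)
    JSON.parse(@rows.fetch(key))
  end
end

# A tree node whose fields may hold an inline value or a pointer of the
# assumed form {"$ref" => key}. Readers never see the pointer itself.
class Value
  def initialize(data, store)
    @data = data
    @store = store
  end

  # Fetch a field, following a pointer into the store when there is one.
  # Nested hashes are wrapped so that their fields dereference too.
  def [](name)
    field = @data.fetch(name)
    field = @store.get(field["$ref"]) if pointer?(field)
    field.is_a?(Hash) ? Value.new(field, @store) : field
  end

  private

  def pointer?(field)
    field.is_a?(Hash) && field.keys == ["$ref"]
  end
end

store = ValueStore.new
leaf_key = store.put({ "city" => "London", "zip" => "N1" })
root = Value.new({ "name" => "Ada", "address" => { "$ref" => leaf_key } }, store)

puts root["name"]
puts root["address"]["city"]
```

The store has to be handed to every value so that a lookup can happen lazily, on first access to a pointed-to field.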
This imposes a change on the interface of the value objects, but it gets the job done: it passes down the power to "auto-lookup" a leaf value from the database. Lastly, we add the mechanism to trigger a "compression" of sorts, wherein a value may "check in" its leaves to the database, thus reducing its own serialized size.
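The "check in" step might be sketched as below. It assumes the same hypothetical pieces as before: an in-memory `ValueStore`, SHA-256 keys, and a `{"$ref" => key}` pointer shape. `check_in` and `expand` are illustrative names, not the original interface.

```ruby
require "json"
require "digest"

# Hypothetical in-memory stand-in for the Postgres table.
class ValueStore
  def initialize
    @rows = {}
  end

  def put(data)
    body = JSON.generate(data)
    key = Digest::SHA256.hexdigest(body)
    @rows[key] = body
    key
  end

  def get(key)
    JSON.parse(@rows.fetch(key))
  end
end

# Replace every Hash-valued field with a {"$ref" => key} pointer, storing
# the nested value as its own row. Children are checked in before their
# parent; scalars and arrays stay inline.
def check_in(data, store)
  data.each_with_object({}) do |(name, field), compact|
    compact[name] =
      if field.is_a?(Hash)
        { "$ref" => store.put(check_in(field, store)) }
      else
        field
      end
  end
end

# Inverse of check_in: follow every pointer and rebuild the full document.
def expand(data, store)
  data.each_with_object({}) do |(name, field), full|
    full[name] =
      if field.is_a?(Hash) && field.keys == ["$ref"]
        expand(store.get(field["$ref"]), store)
      else
        field
      end
  end
end

store = ValueStore.new
document = {
  "title" => "Report",
  "measurements" => { "unit" => "ms", "samples" => (1..200).to_a }
}

compact = check_in(document, store)
before = JSON.generate(document).bytesize
after = JSON.generate(compact).bytesize
puts "#{before} -> #{after} bytes"
```

With this pointer shape and a hex SHA-256 key, each pointer serializes to roughly 75 bytes, so checking in a leaf only pays off when the leaf is larger than that. The example uses a 200-element array for that reason.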
I hope this technique is useful for many applications. It could be a way to push more state and logic into simple, plain objects, rather than having each object be an entity with its own timeline of updates, uncorrelated with other entities. Value-based programming is more common in immutability-heavy languages such as Clojure, Elm, and Haskell, but there's no reason we can't get the benefits of this style in Ruby. In fact, just a few years earlier, React/Redux introduced this style to the frontend. Having used ImmutableJS data structures there, I've been hooked on immutability and appreciate its effect on "time" and "state" in server-side code.
See the full GitHub gist here