When discussing blockchain development and tokenization with technically minded people, someone (sometimes the author) asks, or is asked: "why don't you just use a regular database?" Let's flesh out the question, as a means of honestly exploring the differences, and real-world trade-offs, between databases and blockchains.
The first half of the post parses the question in the context of public blockchains generally, and surveys the drawbacks and advantages of both approaches. The latter portion discusses the question in the context of blockchains which explicitly consider themselves databases, with a barely detectable emphasis on our project, Datopia: a blockchain with an inferential database at its center.
Where’s the Regular Database?
A Structured Commons
Resilience is not a Free Lunch
Major cloud service providers have the luxury of bespoke, homogeneously administered physical networks. Google's WAN operates with sufficiently predictable latency and fault tolerance (unattainable without centralization of every stripe) that at least one of its flagship storage offerings would degrade spectacularly, if not fall apart, were it deployed elsewhere. At the risk of restating the obvious, abandoning these networks entails substantial performance and economic premia[1].
While many of us are attuned to the counterparty and censorship risks of operating atop cloud providers and homogeneous networks, and to those of accepting opaque, asymmetric platform governance, we're clearly in a minority. The blockchain features most likely to engage large businesses (strong auditability, sound identity, immutability) make just as much sense without that decentralization thing[1]. Incumbent cloud providers know this, and have patented technologies which appear likely to be instantiated on their spectacularly efficient, highly centralized networks. On both cost and throughput, the resulting networks will eat the lunch of open-source, heterogeneously constituted chains, without compromising their value proposition to large enterprises.
It's not all bad news. Even if we concede likely defeat on every other front, from our perspective there exist spectacular downsides to marrying a cloud provider. Even if these are currently perceptible only to smaller development shops, and to larger ones solving specific problems, clear, repeated articulation of the risks can only raise awareness among larger players.
Traditional cloud platforms employ a management approach — a governance model — opaque to their customers. Decisions which impact the businesses operating atop these networks — service development, product lifecycles, pricing models — are arrived at via centralized, occult processes which exclude the parties they affect. Customers predicate portions of their businesses on these platforms, yet remain powerless to audit, influence or contribute to their development — assuming substantial operational risks by depending on services which may be euthanized, re-priced, or abandoned by fiat. The problem compounds in the presence of opaquely implemented services boasting features — e.g. storage redundancy, comprehensive audit logs — which can’t be proven to operate as advertised.
To those who still see a baby through the murky bathwater, an obvious mitigation of some of the above risks is to rely on the platform solely for its abstraction of compute power, atop which reputable, actively maintained and transparently governed open-source projects may be deployed, where they exist. This approach will only get more attractive as service providers standardize support for open-source deployment targets, decreasing coupling between customer and provider.
Despite the improvement, this strategy likely carries its own, lesser performance premium[2], while failing to mitigate the possibility of censorship, or the existential risks facing a centralized, commercial network (state intervention, acquisition, insolvency, etc.). It only lessens the load of the bags you'll have to haul elsewhere, if your account is terminated. See the AWS Service Terms appendix to sharpen your intuitions about how likely this may be in your problem domain.
[2] Insofar as general-purpose, open-source components cannot assume deployment atop improbably predictable networks, or rely on the presence of specialized hardware (e.g. atomic and GPS clocks, per Spanner).
Databases are not MMOs
Neither off-the-shelf databases nor cloud database services are well situated to directly service requests from end-user applications: they tend to rely on weak conceptions of identity and access controls of varying granularity, and can't charge for access or rate limit at the point of use; else we'd stop calling them databases.
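As a concrete sketch of the gap: per-caller rate limiting at the point of use is exactly the kind of logic that ends up in bespoke middleware in front of the store. A minimal token-bucket sketch in Python (the class and names are illustrative, not drawn from any particular database or service):

```python
import time

class TokenBucket:
    """Per-caller token bucket: refill `rate` tokens/sec, burst up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.buckets = {}  # caller id -> (tokens remaining, last refill time)

    def allow(self, caller: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self.buckets.get(caller, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[caller] = (tokens - 1, now)
            return True
        self.buckets[caller] = (tokens, now)
        return False
```

A caller identity has to come from somewhere (API keys, sessions), and charging for access needs yet another layer, which is precisely the middleware burden discussed below.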
Setting inconvenience aside, let's focus on what we forego when atomizing our databases and exposing them via throwaway middleware. Many applications capture volumes of data of which a subset is neither competitively sensitive nor identifying, yet are disincentivised from making it public, even if only for marginal social capital, by the unreasonable barrier to entry erected by traditional stores. We've also a profusion of organizations exclusively concerned with the collection, normalization and dissemination of similarly innocuous information, specifically for the purpose of public consumption[1]. They've no means of accomplishing this goal without making striking compromises.
DBpedia, a structured Wikipedia export initiative, may prove an illustrative case study. Its sole purpose is the surfacing of versioned, normalized public data sets derived from Wikipedia article metadata. If that lights a fire under you, you've a couple of choices: submit SPARQL queries until you encounter the API's rate limit, or download a collection of flat-file RDF datasets for import into a private data store. How many identical instances of DBpedia datasets have been redundantly imported into segregated AWS accounts, for proprietary usage? Cloud providers have notoriously poor collaboration and delegation facilities; in the absence of cryptographically strong consistency guarantees for structured data, we have no means of trustlessly collaborating on data deployment.
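To make the first of those choices concrete, here's a sketch of building a GET request against DBpedia's public SPARQL endpoint. The endpoint URL and vocabulary (`dbo:City`, `rdfs:label`) are real DBpedia terms, but the query itself is an arbitrary illustration, and nothing here shields you from the rate limit:

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint; subject to the rate limits discussed above.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

def sparql_url(query: str, fmt: str = "application/sparql-results+json") -> str:
    """Build a GET URL for a SPARQL query against DBpedia."""
    return DBPEDIA_ENDPOINT + "?" + urlencode({"query": query, "format": fmt})

# Illustrative query: English labels of ten cities, from article metadata.
query = """
SELECT ?name WHERE {
  ?city a dbo:City ; rdfs:label ?name .
  FILTER (lang(?name) = "en")
} LIMIT 10
"""
url = sparql_url(query)
# urllib.request.urlopen(url) would return JSON result bindings,
# until the endpoint's rate limit cuts you off.
```

Past the limit, you're back to downloading the flat-file RDF dumps and importing them privately, which is the redundancy the paragraph above laments.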
Appendix I: AWS Terms
AWS Service Terms (rev. Oct 5, 2018)
- 1.4 If we reasonably believe any of Your Content violates the law, infringes or misappropriates the rights of any third party or otherwise violates a material term of the Agreement...we will notify you of the Prohibited Content and may request that such content be removed from the Services or access to it be disabled. If you do not remove or disable access...within 2 business days of our notice, we may remove or disable access to the [content]... or suspend the Services... Notwithstanding the foregoing, we may remove or disable access Prohibited Content without prior notice in connection with illegal content...pursuant to the Digital Millennium Copyright Act or as required to comply with law or any judicial, regulatory or other governmental order or request...
AWS Customer Agreement (rev. Nov 1, 2018)
- 4.1 Except to the extent caused by our breach of this Agreement...you are responsible for all activities that occur under your account, regardless of whether the activities are authorized by you or undertaken by you, your employees or a third party (including your contractors, agents or End Users)...