SQL Is Not The Problem

This is adapted from a comment I left on Hacker News.

Why don't we replace SQL with some other, better query language, perhaps something "equally powerful" like get name, age -> (foos.id X bars.foo_id) |> get first(name) |> head?

Part of the point of SQL is that the database may choose to satisfy your query in a very different manner than you might expect given the procedural reading. The example query is a nice illustration; a modern relational database may move project/select steps earlier in the processing as long as it doesn't change the declarative meaning. Doing this kind of rearranging with a pipeline-looking query language is going to surprise people because you're giving the majority of people who think procedurally a false affordance.

Here's a nice example of a false affordance from a blog I can't read:


The handles on the left doors give you a clue how they are to be used. ON the right, they're mounted on the wrong side of the door so they will be awkward and surprising to use.

By the way, the false affordance is the fundamental cause of the object-relational impedance mismatch: the ORM is trying to put a low-level wrapper around a higher-level idea. OO languages are still row-by-row procedural systems where relational databases are set-oriented.

If the target language is Haskell or Ruby or another "sufficiently DSL-capable language" it will be possible to make an internal DSL that encapsulates your query. However, in that case I think users will be surprised when they have either non-obvious limits to how they can intermix query with in-language code, or else bad database performance compared to Postgres. You can see a little of both in the story of RethinkDB. If you are not using an internal DSL, you'll be stuck in the same situation as SQL where you are embedding your query language in the code somehow.

Relational databases are not just storage technology with a complicated query language. They are also integration technology. SQL's surface syntax may be unfortunate, but I'm increasingly doubtful that there is going to be a serious general-purpose database competitor that manages to hit the same notes only better. The main contender from the 90s was OODBs; they managed to have both shitty performance and lacked the ecosystem that makes SQL so useful as an integration technology: your application may be the main actor, but there are usually a large number of supporting cast members like cron jobs, reports, little scripts, ad-hoc queries, backup and restore etc, and having to do all of that with an OODB is very painful.

There are now and will continue to be niches where relational databases suffer, the prominent one today being distributed data storage. But the problem you think is a problem really isn't a problem. Industrially speaking, "I don't like SQL" is a non-problem. For people who hate it, there are ORMs that make it possible to ignore it most of the time for better or worse (mostly worse, but whatever). Syntactically, the main problem with it is that different dialects that behave slightly differently. This turns into a major engineering annoyance, but one that is mostly swallowed by our library dependencies, who have by now already basically taken care of it.

The benefit of using a modern relational database (I'm talking about Postgres but it applies to SQL Server and probably others as well) is they already have hundreds or thousands of man-years of effort put into optimizing them. I really thought RethinkDB had a brilliant design and it was going to be the next generation database. But it performed worse than Postgres and that makes it basically a non-starter. This is part of why niche databases are not based on SQL: if you handle a niche well, I will tolerate your oddities, but if you want to build a new general-purpose database today, in 2018, you can't just have a better design. Your better design has to come with better performance and the ecosystem, the tool support, maintainability, etc., or it's a non-starter. Databases are one of the hardest technical markets to break into. For most companies, the data is actually the most important thing.