I'd like to see the ANSI SQL layer open-sourced, and some benchmarks.

ansible · on May 4, 2018

If you need SQL now, you might consider CockroachDB:

https://www.cockroachlabs.com/

It is on-the-wire compatible with PostgreSQL clients.

zdw · on May 4, 2018

Except where it isn't: https://github.com/cockroachdb/cockroachdb-python/pull/14#is...

ansible · on May 4, 2018

I apologize if I was being unclear.

It is wire-compatible with PostgreSQL clients, meaning that if you have a PostgreSQL client library for your favorite programming language, you can use that to connect to CockroachDB.
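In practice, "wire-compatible" means CockroachDB speaks the PostgreSQL frontend/backend protocol (version 3.0), so the bytes a PostgreSQL driver sends are bytes CockroachDB understands. As an illustration, here is a minimal sketch of the first message a client sends after opening a TCP connection, the StartupMessage, built by hand from the PostgreSQL protocol documentation. The `user`/`database` values are hypothetical, and a real driver goes on to handle authentication and query messages; this only shows the framing.

```python
import struct
from typing import Dict

# Protocol version 3.0 is encoded as (major << 16) | minor.
PROTOCOL_3_0 = (3 << 16) | 0  # == 196608


def startup_message(params: Dict[str, str]) -> bytes:
    """Build a PostgreSQL StartupMessage (note: it has no leading type byte)."""
    body = struct.pack("!i", PROTOCOL_3_0)
    for key, value in params.items():
        # Each parameter is a null-terminated name followed by a null-terminated value.
        body += key.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"  # an extra zero byte ends the parameter list
    # The Int32 length prefix counts itself (4 bytes) plus the body.
    return struct.pack("!i", len(body) + 4) + body


# Hypothetical credentials, for illustration only.
msg = startup_message({"user": "app_user", "database": "app_db"})
print(msg)
```

Any PostgreSQL client library produces this same framing, which is why an off-the-shelf driver can open a session with either server.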

The database features and SQL dialect itself are not 100% compatible with PostgreSQL, and likely never will be.

Also, even if all the SQL you want to use is supported, it is probably a bad idea to port SQL over to CockroachDB without further investigation. Some design choices, like interleaving rows from different tables, should be examined to ensure good performance on CockroachDB.

And even setting aside issues like that, the access patterns you'd use for a multi-region distributed database are going to be a bit different from those you'd use for a high-availability PostgreSQL cluster. Data locality and all that.

The wide availability of client drivers just means you can get started quickly on popular platforms.

zzzcpan · on May 4, 2018

What would be the point of benchmarks? Benchmarks say pretty much nothing about distributed databases.

But you do want to know how such a database behaves in general and during failures with some load and some data, say 100M records and 3 nodes. What kind of latency does it have when nodes are added, replaced, moved, etc., including 90th percentile latency? How long does it take to move 1 TB worth of records, and if this process is interrupted, can it resume later or not? How does it deal with HDDs and bad sectors, especially with respect to performance, since disks like to retry bad sectors for a few seconds, completely blocking all I/O and requiring special handling? Or is it only suitable for SSDs?
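To make the 90th percentile concrete: p90 is the latency that 90% of requests stay at or under, so it exposes slow outliers that an average hides. Below is a minimal sketch using the nearest-rank definition; this is only one of several percentile definitions (some tools interpolate between samples instead), and the sample latencies are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least p% of samples are <= it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, e.g. measured during a node replacement.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900]
print(percentile_nearest_rank(latencies_ms, 90))
```

Computing this separately for a steady-state window and for the window in which a node is added or replaced is one way to quantify the behavior the question asks about.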

jfindley · on May 4, 2018

Last time I looked, the SQL layer was really slow. I know they invested a ton of effort in improving it, but I don't think they ever really cracked the problem, and I think you're likely better off with CockroachDB, as mentioned in another reply.

fizx · on May 4, 2018

I've built a toy SQL database before that actually did a good job with query planning and used existing, highly optimized storage engines (Lucene or RocksDB).

It was slow AF, because the actual query execution involves a ton of method dispatches, e.g. to check index conditions or assemble rows, and that code never quite gets fully JIT-compiled by the JVM.

Every reasonable SQL engine I'm aware of compiles queries to lower-level code. A cursory inspection of their code indicates they don't do this, so it's not surprising it's slow.
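The difference between interpreting and compiling a query can be shown with a toy WHERE clause. In this hypothetical sketch (the `Col`/`Const`/`BinOp` classes and both functions are made up for illustration, not taken from any engine), `interpret` re-walks the expression tree and dispatches on node type for every row, while `compile_predicate` translates the tree once into Python source and lets CPython compile it to bytecode. Engines that compile queries emit things like JVM bytecode or LLVM IR instead, and in Python the speedup is modest, but the shape of the idea is the same. Constants are embedded with `repr`, which is only safe for trusted literals such as ints and strings.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union

Row = Dict[str, Any]


@dataclass
class Col:
    name: str


@dataclass
class Const:
    value: Any  # assumed to be a trusted int/str literal


@dataclass
class BinOp:
    op: str  # one of "=", "<", ">", "AND"
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Const, BinOp]

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt, "AND": lambda a, b: a and b}
PY_OPS = {"=": "==", "<": "<", ">": ">", "AND": "and"}


def interpret(expr: Expr, row: Row) -> Any:
    # Tree-walking interpreter: one type check and call per node, repeated for every row.
    if isinstance(expr, Col):
        return row[expr.name]
    if isinstance(expr, Const):
        return expr.value
    return OPS[expr.op](interpret(expr.left, row), interpret(expr.right, row))


def to_source(expr: Expr) -> str:
    # Translate the tree into a single Python expression string.
    if isinstance(expr, Col):
        return f"row[{expr.name!r}]"
    if isinstance(expr, Const):
        return repr(expr.value)
    return f"({to_source(expr.left)} {PY_OPS[expr.op]} {to_source(expr.right)})"


def compile_predicate(expr: Expr) -> Callable[[Row], Any]:
    # Walk the tree once, then hand the generated source to CPython's compiler.
    code = compile(f"lambda row: {to_source(expr)}", "<query>", "eval")
    return eval(code)


# WHERE age > 30 AND city = 'Oslo'
pred = BinOp("AND", BinOp(">", Col("age"), Const(30)), BinOp("=", Col("city"), Const("Oslo")))
rows = [
    {"age": 25, "city": "Oslo"},
    {"age": 40, "city": "Oslo"},
    {"age": 50, "city": "Bergen"},
]
fast = compile_predicate(pred)
print([r for r in rows if interpret(pred, r)])
print([r for r in rows if fast(r)])
```

Both filters select the same rows; the compiled version simply skips the per-row tree walk.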

carlhjerpe · on May 15, 2018

Here's the old layer: https://github.com/jaytaylor/sql-layer :)