Why is it an awful idea? I don't understand the trade-offs well.
Bear in mind I have a large bias towards performance, and am a DBRE, so I also have strong opinions about normalization.

Separating compute and storage means that whenever you have to go to the storage layer - which is every time for writes, and, depending on your working set size, often for reads as well - you're paying a network round trip on top of the disk access, which is a massive latency hit. I'll use Amazon Aurora as an example, because they're quite open about their architecture, they're the largest player in this space, and I'm personally familiar with it.

Aurora's storage layer consists of 6 nodes spread across 3 AZs. For a write to count as durable, it has to be ack'd by 4 of the 6 nodes, which necessarily spans at least 2 of the 3 AZs. The writes go out to the nodes in parallel, which helps, but you're still looking at a minimum of about 1 msec to commit. 1 msec may not sound like much, but it's an eternity compared to a local SSD write.
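
To make that concrete, here's a rough model (Python, with made-up latency numbers, not Aurora measurements) of why the 4/6 quorum puts a floor on commit latency: with only 2 storage nodes in your own AZ, the 4th-fastest ack always has to come from another AZ, so the cross-AZ round trip is unavoidable.

    import random

    # Rough model of an Aurora-style 4/6 quorum write, with made-up latencies.
    # Two storage nodes per AZ; same-AZ acks are fast, cross-AZ acks pay a
    # bigger network round trip. Numbers are illustrative, not measured.
    SAME_AZ_RTT_MS = 0.25
    CROSS_AZ_RTT_MS = 0.8
    SSD_WRITE_MS = 0.1

    def one_commit() -> float:
        acks = []
        for az in range(3):
            for _ in range(2):  # two storage nodes per AZ
                rtt = SAME_AZ_RTT_MS if az == 0 else CROSS_AZ_RTT_MS
                jitter = random.uniform(0.0, 0.3)
                acks.append(rtt + SSD_WRITE_MS + jitter)
        # Durable once 4 of 6 nodes have acked: latency is the 4th-fastest ack.
        return sorted(acks)[3]

    samples = sorted(one_commit() for _ in range(10_000))
    print(f"p50 commit latency: {samples[len(samples) // 2]:.2f} ms")
    print(f"p99 commit latency: {samples[int(len(samples) * 0.99)]:.2f} ms")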

MySQL fares even worse on Aurora because of the change buffer. Normally, writes (including deletes) touching secondary indices result in the index changes being buffered and merged later, which avoids random I/O. Because Aurora's storage architecture is so different from vanilla MySQL's, it can't do that, and all writes to secondary indices have to happen synchronously.
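
A back-of-the-envelope sketch of what that costs per row - the latencies here are my assumptions for illustration, not measured Aurora numbers:

    # Back-of-the-envelope model of secondary-index maintenance cost per row
    # write, with and without change buffering. Latencies are assumptions.
    SYNC_INDEX_WRITE_MS = 1.0   # synchronous quorum write per affected index
    BUFFERED_WRITE_MS = 0.05    # change-buffered update, merged later in background

    def per_row_cost_ms(num_secondary_indexes: int, buffered: bool) -> float:
        base_row_write = 1.0  # the row/PK write itself still goes to the quorum
        per_index = BUFFERED_WRITE_MS if buffered else SYNC_INDEX_WRITE_MS
        return base_row_write + num_secondary_indexes * per_index

    for n in (1, 3, 5):
        print(f"{n} secondary indexes: "
              f"buffered ~{per_row_cost_ms(n, True):.2f} ms, "
              f"synchronous ~{per_row_cost_ms(n, False):.2f} ms per row")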

Given most SaaS companies' tendency to eschew RDBMS expertise in favor of full-stack teams, and those teams' tendency towards JSON[B] for everything, poor normalization, and sub-optimal queries, it all adds up to a disastrous performance experience.

I have a homelab with Dell R620s, which originally came out in 2012. Storage is via Ceph on Samsung PM983 NVMe drives, connected with Mellanox ConnectX-3 Pro NICs in a mesh. These drives are circa 2013. Despite the age of this system, it has consistently outperformed Aurora MySQL and Postgres in benchmarks I've done. The only instance classes that can match it are, unsurprisingly, those with local NVMe storage.

In fairness, it isn't _all_ awful. Aurora does have one feature that is extremely nice: the survivable page cache. If an instance restarts, in most circumstances you don't lose the buffer pool / shared buffers on that instance, so you skip the typical cold-start performance hit. That is legitimately cool tech, and quite useful.

I'm less sold on the other features, like auto-scaling. If you're planning for a peak event (e.g. a sales event for e-commerce), you know well in advance and have plenty of time to bring new instances online. If you have a surprise peak event, auto-scaling takes 30 minutes to an hour to get new instances online, which is an extremely long time to sit in a degraded state. That isn't really any faster than RDS, though again to Aurora's credit, a new instance attaches to the shared cluster volume rather than being restored from a snapshot, so there's no lazy-loading of blocks from S3.
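
To put a number on why keeping (or not having) the page cache matters, here's a toy model of query latency against a cold vs. warm buffer pool. The hit/miss latencies are assumptions; the point is the ratio, not the absolute numbers.

    import random

    # Toy model of buffer-pool cold start after a restart (or on a freshly
    # launched instance). Hit/miss latencies are illustrative assumptions.
    CACHE_HIT_MS = 0.05    # page already in the buffer pool
    CACHE_MISS_MS = 1.0    # page fetched from the remote storage layer

    def query_ms(hit_ratio: float, pages_per_query: int = 20) -> float:
        total = 0.0
        for _ in range(pages_per_query):
            total += CACHE_HIT_MS if random.random() < hit_ratio else CACHE_MISS_MS
        return total

    for label, hit_ratio in (("cold cache", 0.05), ("warm cache", 0.99)):
        samples = [query_ms(hit_ratio) for _ in range(1_000)]
        print(f"{label}: ~{sum(samples) / len(samples):.1f} ms per 20-page query")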

Finally, Aurora's other main benefit, as I alluded to, is that the shared cluster volume keeps replication lag quite low: typically 10-30 msec IME. However, also IME, devs don't design apps around that lag, and anything short of instantaneous reads is treated as too slow, so in practice it doesn't really matter.
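
For what it's worth, designing around that lag isn't hard. One common pattern is a read-your-writes window that pins a session's reads to the writer endpoint briefly after it writes. This is a hypothetical sketch (the session store and endpoint names are mine, not anything Aurora provides):

    import time

    # Hypothetical sketch: after a session writes, route its reads to the
    # primary for a short window that comfortably covers typical replica lag
    # (10-30 ms here, padded to 1 s).
    READ_YOUR_WRITES_WINDOW_S = 1.0

    last_write_at: dict[str, float] = {}  # session_id -> timestamp of last write

    def record_write(session_id: str) -> None:
        last_write_at[session_id] = time.monotonic()

    def choose_endpoint(session_id: str) -> str:
        wrote_at = last_write_at.get(session_id)
        if wrote_at is not None and time.monotonic() - wrote_at < READ_YOUR_WRITES_WINDOW_S:
            return "primary"   # read your own writes from the writer
        return "replica"       # everything else can tolerate 10-30 ms of lag

    # Usage sketch:
    record_write("session-42")
    print(choose_endpoint("session-42"))  # "primary" right after a write
    time.sleep(1.1)
    print(choose_endpoint("session-42"))  # "replica" once the window passes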
