This approach works by storing the actual SQLite binary files in Git and then using a custom "diff" configuration to dump each file as SQL and compare the result.
It's a neat trick, but storing binary files like that in Git isn't as space efficient as using a plain text format.
I built my own tooling to solve this problem: https://datasette.io/tools/sqlite-diffable - which outputs a “diffable” copy of the data in a SQLite database, precisely so you can store it in Git and look at the differences later.
I’ve been running that for a couple of years in this repo: https://github.com/simonw/simonwillisonblog-backup - which provides a backup of my blog’s PostgreSQL Django database (first converted to SQLite and then dumped out using sqlite-diffable).
I don't think there will be any huge difference between a text dump and binary formats here, git doesn't really care about the underlying data when it stores and packs it.
Maybe a text format that is sorted before saved would compress a lot better though, both with zlib compression and gits delta compression. You can't really sort a binary file and put it back together!
I'm more wondering about whether or not this Sqlite DB is actually a "source file", that is something that is not generated by something else, which is normally the base material for both make and Git.
Obviously an SQLite DB cannot be created ex nihilo, so it is not a source file. Isn't it a bit dangerous to store the database file in the repo, and not the mean to generate it? If both are included in the repo, it then seems redundant (reproducible builds concerns, maybe?).
Seeing that many people are interested in this topic, I wonder in which cases they would however want to track that file in Git.
> I’ve been running that for a couple of years in this repo: https://github.com/simonw/simonwillisonblog-backup - which provides a backup of my blog’s PostgreSQL Django database (first converted to SQLite and then dumped out using sqlite-
I'm curious, what is the reason you chose not to use pgdump, but instead opted to convert to to sqlite and then dump the DB using sqlite-diffable?
On a project I'm working on, I'd like to dump our Postgres schema into individual files for each object (i.e., one file for each table, function, stored proc, etc.), but haven't spent enough time to see if pgdump could actually do that. We're just outputting files by object type for now (one tables, function, and stored procs files). It looks like sqlite-diffable does something similar to what we're looking for (separate files for each object).
Mainly it's because all of my other projects are SQLite and this felt like a good opportunity to try out some of my own tooling against the main PostgreSQL database I still use for an active project.
pgdump can do that if you go with the directory format output. According to the docs, it will "create a directory with one file for each table and large object being dumped, plus a so-called Table of Contents file describing the dumped objects in a machine-readable format that pg_restore can read."
> It's a neat trick, but storing binary files like that in Git isn't as space efficient as using a plain text format.
Why is that? I would have expected git's delta compression to work well on sqlite files, since small table changes usually cause only a small number of page changes.
I would think so too. As a quick test I added chinook.db to a fresh repository, committed it, added a row to a table, committed again, and then did a git gc.
The resulting pack file, with two revisions of the database, is 329k. The raw database is 864k and gzips to 328k.
So it does look like git's delta encoding works with sqlite's blocks.
It's a neat trick, but storing binary files like that in Git isn't as space efficient as using a plain text format.
I built my own tooling to solve this problem: https://datasette.io/tools/sqlite-diffable - which outputs a “diffable” copy of the data in a SQLite database, precisely so you can store it in Git and look at the differences later.
I’ve been running that for a couple of years in this repo: https://github.com/simonw/simonwillisonblog-backup - which provides a backup of my blog’s PostgreSQL Django database (first converted to SQLite and then dumped out using sqlite-diffable).
Here’s an example diff: https://github.com/simonw/simonwillisonblog-backup/commit/72...