A failure mode of ULIDs and similar schemes is that they're too random to be easily compared or recognized by eye.
Being recognizable by eye is especially useful when you're using them for customer or user IDs - being able to easily spot your important or troublesome customers in logs is very helpful.
Personally I'd go with a ULID-like scheme similar to the one in the OP - but I'd aim to use the smallest number of bits I could get away with, and pick a compact encoding scheme.
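To make that concrete, here's the kind of thing I mean - a rough sketch in Go with 32 bits of second-resolution timestamp followed by 48 random bits, encoded Crockford-base32 style. All of the sizes and the alphabet are illustrative choices, not a standard:

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

const alphabet = "0123456789ABCDEFGHJKMNPQRSTVWXYZ" // Crockford base32: no I, L, O or U

// newID builds an 80-bit ID: 32 bits of unix-seconds timestamp followed by
// 48 random bits, then encodes it as 16 base32 characters so IDs sort (and
// read) roughly chronologically.
func newID() (string, error) {
	var buf [10]byte
	binary.BigEndian.PutUint32(buf[:4], uint32(time.Now().Unix()))
	if _, err := rand.Read(buf[4:]); err != nil {
		return "", err
	}
	out := make([]byte, 0, 16)
	var acc uint64
	bits := 0
	for _, b := range buf {
		acc = acc<<8 | uint64(b)
		bits += 8
		for bits >= 5 {
			bits -= 5
			out = append(out, alphabet[(acc>>uint(bits))&31])
		}
	}
	return string(out), nil
}

func main() {
	id, err := newID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id, len(id)) // 16 characters, vs. 26 for a standard ULID
}
```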
It is entirely viable to never have more than one or two open pull requests on any particular code repository, and to use continuous delivery practices to keep deploying small changes to production one at a time.
That's exactly how I've worked for the past decade or so.
Look into libeatmydata, an LD_PRELOAD library. It disables fsync and other durability syscalls, which is fabulous for CI. Materialize.com uses it in their CI - that's where I learned about it.
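If your CI harness launches the database or test processes itself, it's just an environment variable on the child process. A rough sketch in Go, where the library name/path and the `go test` command are stand-ins for whatever your pipeline actually runs (the packaged `eatmydata` wrapper script does the same thing):

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	// Run the test suite with libeatmydata preloaded so fsync() and friends
	// become no-ops for the whole process tree. The library name/path varies
	// by distro - adjust to wherever your image installs it.
	cmd := exec.Command("go", "test", "./...")
	cmd.Env = append(os.Environ(), "LD_PRELOAD=libeatmydata.so")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```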
There are a couple of passing mentions of Download Monitor, but also the timeline strongly implies that a specific source was simply guessing the URL of the PDF long before it was uploaded.
I'm not clear from the doc which of these scenarios is what they're calling the "leak"
> but also the timeline strongly implies that a specific source was simply guessing the URL of the PDF long before it was uploaded
A bunch of people were scraping commonly used URLs based on previous OBR reports, in order to report as soon as it was live, as is common with all things of this kind.
The mistake was that the URL should have been obfuscated, and only changed to the "clear" URL at publish time, but a plugin was bypassing that and aliasing the "clear" URL to the obfuscated one.
It sounds like a combination of the Download Monitor plugin plus a misconfiguration at the web server level resulted in the file being publicly accessible at that URL when the developers thought it would remain private until deliberately published.
For pure DuckDB, you can put an Arrow Flight server in front of DuckDB[0] or use the httpserver extension[1].
Where you store the .duckdb file will make a big difference in performance (e.g. S3 vs. Elastic File System).
But I'd take a good look at DuckLake as a better multiplayer option. If you store `.parquet` files in blob storage, it will be slower than `.duckdb` on EFS, but if you have largish data, EFS gets expensive.
We[2] use DuckLake in our product and we've found a few ways to mitigate the performance hit. For example, we write all data into DuckLake in blob storage, then create analytics tables and store them on faster storage (e.g. GCP Filestore). You can have multiple storage methods in the same DuckLake catalog, so this works nicely.
GizmoSQL is definitely a good option. I work at GizmoData and maintain GizmoSQL. It is an Arrow Flight SQL server with DuckDB as a back-end SQL execution engine. It can support independent thread-safe concurrent sessions, has robust security, logging, token-based authentication, and more.
It also has a growing list of adapters - including: ODBC, JDBC, ADBC, dbt, SQLAlchemy, Metabase, Apache Superset and more.
We also just introduced a PySpark drop-in adapter - letting you run your Python Spark DataFrame workloads with GizmoSQL - for dramatic savings compared to Databricks for sub-5TB workloads.
This doesn't seem accurate to me - gambling sites legally operating in the UK already have strict KYC requirements applied to them via the gambling regulator.
Visiting a gambling site isn't restricted, but signing up and gambling is.
If age restriction technology is now being introduced to prevent kids *viewing* "inappropriate" websites, then why are gambling websites being given a free pass?
They’ve already found a loophole for that: If you gamble with fake money (acquired through real money and a confusing set of currency conversions) and the prizes are jpegs of boat-girls (or horse-girls, as I hear are popular lately) or football players, you can sell to all the children you want.
The only mention I can see in this document of compression is
> Significantly smaller than JSON without complex compression
Although compression of JSON could be considered complex, in practice it's extremely simple: it's widely supported and usually performed as a distinct step, often transparently to the user. Gzip, and increasingly zstd, are the usual choices.
I'd be interested to see a comparison between compressed JSON and CBOR; I'm quite surprised that this hasn't been included.
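Running that comparison yourself is only a few lines. A sketch in Go, assuming the github.com/fxamacker/cbor/v2 library and a stand-in payload (the real numbers depend entirely on your data shape):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"

	"github.com/fxamacker/cbor/v2"
)

func main() {
	// Stand-in payload; substitute a representative sample of your own data.
	payload := map[string]any{
		"id":     "01J9ZQ4X7K2M9P4T",
		"name":   "example",
		"values": []float64{1.5, 2.5, 3.5, 4.5},
		"tags":   []string{"a", "b", "c"},
	}

	js, _ := json.Marshal(payload)

	var gz bytes.Buffer
	w := gzip.NewWriter(&gz)
	w.Write(js)
	w.Close()

	cb, _ := cbor.Marshal(payload)

	fmt.Printf("json: %d bytes, gzipped json: %d bytes, cbor: %d bytes\n",
		len(js), gz.Len(), len(cb))
}
```

Note that for tiny payloads like this one, gzip's header overhead can make the "compressed" JSON larger than the plain JSON; the comparison only gets interesting with realistically sized, repetitive data.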
> I'm quite surprised that this hasn't been included.
Why? That goes against the narrative of promoting one over the other. Nissan doesn't advertise that a Toyota has something they don't. They just pretend it doesn't exist.
It’s worth noting that if you use DisallowUnknownFields it makes it much harder to handle forward/backward compatible API changes - which is a very common and usually desirable pattern.
While this is a common view, recently I’ve begun to wonder if it may be secretly an antipattern. I’ve run into a number of cases over the years where additional fields don’t break parsing, or even necessarily the main functionality of a program, but result in subtle incorrect behavior in edge cases. Things like values that are actually distinct being treated as equal because the fields that differ are ignored. More recently, I’ve seen LLMs get confused because they hallucinated tool input fields that were ignored during the invocation of a tool.
I’m a little curious to try and build an API where parsing must be exact, and changes always result in a new version of the API. I don’t actually think it would be too difficult, but perhaps some extra tooling around downgrading responses and deprecating old versions may need to be built.
If you’re writing a client, I could see this being a problem.
If you’re writing a server, I believe the rule is that any once-valid input must stay valid forever, so you just never delete fields. The main benefit of DisallowUnknownFields is that it makes it easier for clients to know when they’ve sent something wrong or useless.
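For anyone who hasn't used it, it's one call on the decoder. A minimal sketch - the request type and the typo'd payload are made up:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type CreateUserRequest struct {
	Name  string `json:"name"`
	Email string `json:"email"`
}

func main() {
	body := `{"name": "Ada", "emial": "ada@example.com"}` // note the typo'd field

	var req CreateUserRequest
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&req); err != nil {
		// The unknown field is surfaced as a decode error instead of being dropped.
		fmt.Println("rejected:", err)
		return
	}
	// Without DisallowUnknownFields this would succeed and Email would be silently empty.
	fmt.Printf("accepted: %+v\n", req)
}
```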
No, once-valid input can be rejected after a period of deprecation.
What actually makes sense is versioning your interfaces (and actually anything you serialize at all), with the version designator being easily accessible without parsing the entire message. (An easy way to have that is to version the endpoint URLs: /api/v1, /api/v2, etc).
For some time, you support two (or more) versions. Eventually you drop the old version if it's problematic. You never have to guess, and can always reject unknown fields.
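A minimal sketch of that shape in Go - each version gets its own request type and handler, both decoded strictly, so v1 can be deleted wholesale once its deprecation window ends. The types and routes are illustrative:

```go
package main

import (
	"encoding/json"
	"net/http"
)

// Each API version owns its wire format outright.
type orderV1 struct {
	Item  string `json:"item"`
	Count int    `json:"count"`
}

type orderV2 struct {
	SKU      string `json:"sku"`
	Quantity int    `json:"quantity"`
}

// decodeStrict rejects any field the target type doesn't declare.
func decodeStrict(r *http.Request, dst any) error {
	dec := json.NewDecoder(r.Body)
	dec.DisallowUnknownFields()
	return dec.Decode(dst)
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/orders", func(w http.ResponseWriter, r *http.Request) {
		var o orderV1
		if err := decodeStrict(r, &o); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// ... translate v1 into the current internal representation
		w.WriteHeader(http.StatusAccepted)
	})
	mux.HandleFunc("/api/v2/orders", func(w http.ResponseWriter, r *http.Request) {
		var o orderV2
		if err := decodeStrict(r, &o); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	})
	http.ListenAndServe(":8080", mux)
}
```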
This, too, would only be easy if an API is designed around versioning from the beginning - often it isn't, retrofitting it requires a lot of boilerplate and duplication, and everything just ends up slapped into v1.
Especially the case in frameworks that prescribe a format for routing.
I think in some cases (like WhatsApp) the better model exists and is available, but isn’t used by the app - possibly as a nudge to get you to give it more permissions.
On iOS, Strava’s app is able to use a photo picker, and the app only gets the photos I actually pick.
Meanwhile WhatsApp insists on using the model where it tries to access all photos, and I limit it to specific ones via the OS.
The more fine-grained "only allow access to select photos" option was introduced in iOS 14; before that, your only option was to ask for permission for all photos. Not to say devs shouldn't have converted by now, but it's possible they just implemented it that way at the time and never got around to updating, rather than that they really want the broader access.
Just checked, and Android has this permission too: I can select "Allow limited access", but this requires manual configuration where you select the specific photos/videos/albums to be accessible. It's so bizarre.
You're not providing one-time access to the photo in this case, you're providing perpetual access to the URI.
If it loses access, it won't be able to display the media from your local storage. And of course, you wouldn't want it to duplicate the media because that'll take up extra storage.