
A failure mode of ULIDs and similar is that they're too random to be easily compared or recognized by eye.

Being easy to compare or recognize by eye is especially useful when you're using them for customer or user IDs - being able to easily spot your important or troublesome customers in logs is very helpful

Personally I'd go with a ULID-like scheme similar to the one in the OP - but I'd aim to use the smallest number of bits I could get away with, and pick a compact encoding scheme
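For example, here's a rough Go sketch of the sort of thing I mean (the 5-byte timestamp / 5-byte random split is arbitrary - you'd size the random part against your expected ID volume):

    package main

    import (
        "crypto/rand"
        "encoding/base32"
        "fmt"
        "time"
    )

    // Crockford's base32 alphabet: compact, case-insensitive, no ambiguous characters.
    var crockford = base32.NewEncoding("0123456789ABCDEFGHJKMNPQRSTVWXYZ").WithPadding(base32.NoPadding)

    // newID builds a 10-byte ID: 5 bytes of second-resolution timestamp
    // (big-endian, so IDs sort chronologically) followed by 5 random bytes.
    func newID() string {
        buf := make([]byte, 10)
        ts := uint64(time.Now().Unix())
        for i := 4; i >= 0; i-- {
            buf[i] = byte(ts)
            ts >>= 8
        }
        if _, err := rand.Read(buf[5:]); err != nil {
            panic(err)
        }
        return crockford.EncodeToString(buf) // 16 characters
    }

    func main() {
        fmt.Println(newID())
    }

That gets you a 16-character, lexicographically sortable ID instead of ULID's 26, at the cost of fewer random bits.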


I’m amazed that this comment is so low down

Stacked diffs seem like a solution to managing high WIP - but the best solution to high WIP is always to lower WIP

Absolutely everything gets easier when you lower your work in progress.


This seems idealistic. It's very normal to be working on a feature that depends on a not-yet-merged feature.


> It's very normal to be working on a feature that depends on a not-yet-merged feature.

Oh sure, many bad ideas and poor practices such as that one are quite "normal". It's not a recommendation.


I invite you to look into feature flagging.

It is entirely viable to never have more than 1 or 2 open pull requests on any particular code repository, and to use continuous delivery practices to keep deploying small changes to production 1 at a time.

That's exactly how I've worked for the past decade or so.


Does pglite in memory outperform “normal” postgres?

If so then supporting the network protocol so it could be run in CI for non-JS languages could be really cool


Look into the libeatmydata LD_PRELOAD library. It disables fsync and other durability syscalls - fabulous for CI. Materialize.com uses it for their CI; that's where I learned about it.


For CI you can already use PostgreSQL with the "eat-my-data" library. I don't know if there's a more official image, but in my company we're using https://github.com/allan-simon/postgres-eatmydata


You can just set fsync=off if you don't want to flush to disk and are OK with corruption in case of an OS/hardware-level crash.
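For reference, the usual trio of durability settings people turn off for a throwaway CI database looks something like this (in postgresql.conf, or passed as -c flags to the server):

    fsync = off
    synchronous_commit = off
    full_page_writes = off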


Huh, I always just mounted the data directory as tmpfs/ramdisk. Worked nicely too.


There's a couple of passing mentions of Download Monitor, but also the timeline strongly implies that a specific source was simply guessing the URL of the PDF long before it was uploaded

I'm not clear from the doc which of these scenarios is what they're calling the "leak"


> but also the timeline strongly implies that a specific source was simply guessing the URL of the PDF long before it was uploaded

A bunch of people were scraping commonly used URLs based on previous OBR reports, in order to report as soon as it was live, as is common with all things of this kind

The mistake was that the URL should have been obfuscated, and only changed to the "clear" URL at publish time, but a plugin was bypassing that and aliasing the "clear" URL to the obfuscated one


> in order to report as soon as it was live

We don't actually know that, it's just that the report did hit Reuters pretty swiftly.


https://obr.uk/docs/dlm_uploads/OBR_Economic_and_fiscal_outl... 5.pdf

Not hard to guess really. Wouldn't they know this was likely and simply choose a less obvious file name?


Turns out, no. No, they would not.



It sounds like a combination of the Download Monitor plugin plus a misconfiguration at the web server level resulted in the file being publicly accessible at that URL when the developers thought it would remain private until deliberately published.


Other than MotherDuck, is anyone aware of any good models for running multi-user, cloud-based DuckDB?

i.e. running it like a normal database, and getting to take advantage of all of its goodies


For pure duckdb, you can put an Arrow Flight server in front of duckdb[0] or use the httpserver extension[1].

Where you store the .duckdb file will make a big difference in performance (e.g. S3 vs. Elastic File System).

But I'd take a good look at ducklake as a better multiplayer option. If you store `.parquet` files in blob storage, it will be slower than `.duckdb` on EFS, but if you have largish data, EFS gets expensive.

We[2] use DuckLake in our product and we've found a few ways to mitigate the performance hit. For example, we write all data into DuckLake in blob storage, then create analytics tables and store them on faster storage (e.g. GCP Filestore). You can have multiple storage methods in the same DuckLake catalog, so this works nicely.

0 - https://www.definite.app/blog/duck-takes-flight

1 - https://github.com/Query-farm/httpserver

2 - https://www.definite.app/


I wonder if anyone has experimented with "Mountpoint for S3" + DuckDB yet

https://docs.aws.amazon.com/AmazonS3/latest/userguide/mountp...


The DuckDB httpfs extension reads S3-compatible storage directly.
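Something like this, as a sketch assuming the marcboeker/go-duckdb driver (bucket and path are made up; credentials would be configured separately, e.g. via CREATE SECRET):

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/marcboeker/go-duckdb" // registers the "duckdb" database/sql driver
    )

    func main() {
        db, err := sql.Open("duckdb", "") // empty DSN = in-memory database
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // httpfs lets DuckDB read s3:// and https:// URLs directly - no mount needed.
        for _, stmt := range []string{
            "INSTALL httpfs",
            "LOAD httpfs",
            "SET s3_region = 'eu-west-2'",
        } {
            if _, err := db.Exec(stmt); err != nil {
                log.Fatal(err)
            }
        }

        var rows int64
        err = db.QueryRow("SELECT count(*) FROM read_parquet('s3://my-bucket/events/*.parquet')").Scan(&rows)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("rows:", rows)
    }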


That looks neat - how do you handle failover/restarts?


In which one? Restarts are no problem on DuckLake (ACID transactions in the catalog).

For the others, I haven't tried handling it.


GizmoSQL is definitely a good option. I work at GizmoData and maintain GizmoSQL. It is an Arrow Flight SQL server with DuckDB as a back-end SQL execution engine. It can support independent thread-safe concurrent sessions, has robust security, logging, token-based authentication, and more.

It also has a growing list of adapters - including: ODBC, JDBC, ADBC, dbt, SQLAlchemy, Metabase, Apache Superset and more.

We also just introduced a PySpark drop-in adapter - letting you run your Python Spark Dataframe workloads with GizmoSQL - for dramatic savings compared to Databricks for sub-5TB workloads.

Check it out at: https://gizmodata.com/gizmosql

Repo: https://github.com/gizmodata/gizmosql


Oh, and GizmoData Cloud (SaaS option) is coming soon - to make it easier than ever to provision GizmoSQL instances...


Feels like I keep seeing "Duckdb in your postgres" posts here. Likely that is what you want.



This reminded me of a slide from a Dan North talk - perhaps this one https://dannorth.net/talks/#software-faster? One of those anyway.

The key quote was something like "You want your software to be like surgery - as little of it as possible to fix your problem".

Anyway, it doesn't seem like this blog post is following that vibe.


I like this quote.

Unfortunately, my predecessor at work followed a different principle - "copy paste a whole file if it saves you 5 minutes today".

Well, I am still a surgeon, I just do a lot of amputations.


This doesn't seem accurate to me - gambling sites legally operating in the UK already have strict KYC requirements applied to them via the gambling regulator.

Visiting a gambling site isn't restricted, but signing up and gambling is.


You <-------> The point

If age restriction technology is now being introduced to prevent kids *viewing* "inappropriate" websites, then why are gambling websites being given a free pass?

The answer is to follow the money:

https://www.google.co.uk/search?q=gambling%20industry%20lobb...


They’ve already found a loophole for that: If you gamble with fake money (acquired through real money and a confusing set of currency conversions) and the prizes are jpegs of boat-girls (or horse-girls, as I hear are popular lately) or football players, you can sell to all the children you want.


The only mention I can see in this document of compression is

> Significantly smaller than JSON without complex compression

Although compression of JSON could be considered complex, in practice it's extremely simple: it's widely supported and usually performed in a distinct step, often transparently to the user. Gzip, and increasingly zstd, are the common choices.

I'd be interested to see a comparison between compressed JSON and CBOR; I'm quite surprised that this hasn't been included.
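If anyone wants to check for themselves, here's a quick Go sketch using the stdlib gzip and the fxamacker/cbor package (the sample data is obviously made up - real payloads will compress differently):

    package main

    import (
        "bytes"
        "compress/gzip"
        "encoding/json"
        "fmt"
        "log"

        "github.com/fxamacker/cbor/v2" // one widely used Go CBOR implementation
    )

    // gzipSize returns the compressed size of b (writes to an in-memory buffer, so errors are ignorable here).
    func gzipSize(b []byte) int {
        var buf bytes.Buffer
        w := gzip.NewWriter(&buf)
        w.Write(b)
        w.Close()
        return buf.Len()
    }

    func main() {
        type event struct {
            ID     int     `json:"id" cbor:"id"`
            Name   string  `json:"name" cbor:"name"`
            Amount float64 `json:"amount" cbor:"amount"`
        }
        var records []event
        for i := 0; i < 1000; i++ {
            records = append(records, event{ID: i, Name: "example", Amount: float64(i) * 1.5})
        }

        j, err := json.Marshal(records)
        if err != nil {
            log.Fatal(err)
        }
        c, err := cbor.Marshal(records)
        if err != nil {
            log.Fatal(err)
        }

        fmt.Printf("json=%d json+gzip=%d cbor=%d cbor+gzip=%d\n",
            len(j), gzipSize(j), len(c), gzipSize(c))
    }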


> I'm quite surprised that this hasn't been included.

Why? That goes against the narrative of promoting one over the other. Nissan doesn't advertise that a Toyota has something they don't. They just pretend it doesn't exist.


It’s worth noting that if you DisallowUnknownFields it makes it much harder to handle forward/backward compatible API changes - which is a very common and usually desirable pattern
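For anyone unfamiliar with the Go API being discussed, the difference looks roughly like this:

    package main

    import (
        "encoding/json"
        "fmt"
        "strings"
    )

    type Widget struct {
        Name string `json:"name"`
    }

    func main() {
        // A newer client sends a field this (older) server doesn't know about.
        payload := `{"name": "foo", "colour": "red"}`

        // Default behaviour: unknown fields are silently dropped.
        var w1 Widget
        fmt.Println(json.Unmarshal([]byte(payload), &w1)) // <nil>

        // With DisallowUnknownFields the same payload is rejected outright.
        dec := json.NewDecoder(strings.NewReader(payload))
        dec.DisallowUnknownFields()
        var w2 Widget
        fmt.Println(dec.Decode(&w2)) // json: unknown field "colour"
    }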


While this is a common view, recently I’ve begun to wonder if it may be secretly an antipattern. I’ve run into a number of cases over the years where additional fields don’t break parsing, or even necessarily the main functionality of a program, but result in subtle incorrect behavior in edge cases. Things like values that are actually distinct being treated as equal because the fields that differ are ignored. More recently, I’ve seen LLMs get confused because they hallucinated tool input fields that were ignored during the invocation of a tool.

I’m a little curious to try and build an API where parsing must be exact, and changes always result in a new version of the API. I don’t actually think it would be too difficult, but perhaps some extra tooling around downgrading responses and deprecating old versions may need to be built.


It's a convenience and a labor saver, so of course it's fundamentally at odds with security. It's all trade-offs.


If you’re writing a client, I could see this being a problem.

If you’re writing a server, I believe the rule is that any once valid input must stay valid forever, so you just never delete fields. The main benefit of DisallowUnknownFields is that it makes it easier for clients to know when they’ve sent something wrong or useless.


No, once-valid input can be rejected after a period of deprecation.

What actually makes sense is versioning your interfaces (and actually anything you serialize at all), with the version designator being easily accessible without parsing the entire message. (An easy way to have that is to version the endpoint URLs: /api/v1, /api/v2, etc).

For some time, you support two (or more) versions. Eventually you drop the old version if it's problematic. You never have to guess, and can always reject unknown fields.
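A minimal sketch of that shape with Go's net/http (the handler bodies are placeholders):

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        mux := http.NewServeMux()

        // v1 keeps serving old clients during the deprecation window...
        mux.HandleFunc("/api/v1/users", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprint(w, `{"name": "Ada"}`)
        })

        // ...while v2 can change the schema and reject unknown fields strictly.
        mux.HandleFunc("/api/v2/users", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprint(w, `{"full_name": "Ada Lovelace"}`)
        })

        log.Fatal(http.ListenAndServe(":8080", mux))
    }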


This is only easy to do if an API is designed around versioning from the beginning - often it isn't, retrofitting versioning requires a lot of boilerplate and duplication, and everything just ends up slapped into v1.

Especially the case in frameworks that prescribe a format for routing.


I think in some cases (like WhatsApp) the better model exists and is available, but isn’t used by the app - possibly as a judge to get you to give it more permissions

On iOS Strava’s app is able to access a photo picker, and the app only gets the photos I actually pick

Meanwhile WhatsApp insists on using the model where it tries to access all photos, and I limit it to specific ones via the OS


The more fine-grained "only allow access to selected photos" option was introduced in iOS 14; before that, your only choice was to ask for permission to all photos. Not to say devs shouldn't have converted by now, but it's possible they just implemented it that way at the time and never got around to updating, rather than really wanting the broader access.


Just checked and Android has this permission too: I can select "Allow limited access", but this requires manual configuration where you select specific photos/videos/albums to be accessible. It's so bizarre.


You're not providing a one-time access to the photo in this case, you're providing perpetual access to the uri.

If it loses access, it won't be able to display the media from your local storage. And of course, you wouldn't want it to duplicate the media because that'll take up extra storage.


Yeah, on iOS the Facebook Messenger app can be set to only access media selected one by one by the user.


> possibly as a judge to get you to give it more permissions

judge or kludge?


oh, I can't edit now but that was supposed to say "nudge"

