
I'm not familiar with the npm ecosystem, so maybe I'm misunderstanding this, but it sounds like they removed support for local publishes (via a token) in favor of CI publishing using Trusted Publishing.

If that is correct: when Trusted Publishing was proposed for Rust, the discussion was that it was not meant to replace local publishing, only to harden CI publishing.


> If that is correct: when Trusted Publishing was proposed for Rust, the discussion was that it was not meant to replace local publishing, only to harden CI publishing.

Yes, that's right, and that's how it was implemented for both Rust and Python. NPM seems to have decided to do their own thing here.

(More precisely, I think NPM still allows local publishing with an API token, they just won't grant long-lived ones anymore.)


I think the path to dependency on closed publishers was opened wide with the introduction of both attestations and trusted publishing. People now have assigned extra qualities to such releases and it pushes the ecosystem towards more dependency on closed CI systems such as github and gitlab.

It was a good intention, but the ramifications of it I don't think are great.


> It was a good intention, but the ramifications of it I don't think are great.

as always, the road to hell is paved with good intentions

the term "Trusted Publishing" implies everyone else is untrusted

quite why anyone would think Microsoft is considered trustworthy, or competent at operating critical systems, I don't know

https://firewalltimes.com/microsoft-data-breach-timeline/


> the term "Trusted Publishing" implies everyone else is untrusted

No, it just means that you're explicitly trusting a specific party to publish for you. This is exactly the same as you'd normally do implicitly by handing a CI/CD system a long-lived API token, except without the long-lived API token.

(The technique also has nothing to do with Microsoft, and everything to do with the fact that GitHub Actions is the de facto majority user demographic that needs targeting whenever doing anything for large OSS ecosystems. If GitHub Actions was owned by McDonalds instead, nothing would be any different.)


> This is exactly the same as you'd normally do implicitly by handing a CI/CD system a long-lived API token, except without the long-lived API token.

The other difference is being subjected to a whitelisting approach. That wasn't previously the case.

It's frustrating that seemingly every time better authentication schemes get introduced they come with functionality for client and third party service attestation baked in. All we ever really needed was a standardized way to limit the scope of a given credential coupled with a standardized challenge format to prove possession of a private key.


> The other difference is being subjected to a whitelisting approach. That wasn't previously the case.

You are not being subjected to one. Again: you can always use an API token with PyPI, even on a CI/CD platform that PyPI knows how to do Trusted Publishing against. It's purely optional.

> All we ever really needed was a standardized way to limit the scope of a given credential coupled with a standardized challenge format to prove possession of a private key.

That is what OIDC is. Well, not for a private key, but for a set of claims that constitute a machine identity, which the relying party can then do whatever it wants with.

But standards and interoperability don't mean that any given service will just choose to federate with every other service out there. Federation always has up-front and long-term costs that need to be balanced with actual delivered impact/value; for a single user on their own server, the actual value of OIDC federation versus an API token is nil.


Right, I meant that the new scheme is subject to a whitelist. I didn't mean to imply that you can't use the old scheme anymore.

> Federation always has up-front and long-term costs

Not particularly? For example there's no particular cost if I accept email from outlook today but reverse that decision and ban it tomorrow. I don't immediately see a technical reason to avoid a default accept policy here.

> for a single user on their own server, the actual value of OIDC federation versus an API token is nil.

The value is that you can do away with long lived tokens that are prone to theft. You can MFA with your (self hosted) OIDC service and things should be that much more secure. Of course your (single user) OIDC service could get pwned but that's no different than any other account compromise.

I guess there's some nonzero risk that a bunch of users all decide to use the same insecure OIDC service. But you might as well worry that a bunch of them all decide to use an insecure password manager.

> Well, not for a private key, but for a set of claims that constitute a machine identity

What's the difference between "set of claims" and "private key" here?

That last paragraph in GP was more a tangential rant than directly on topic BTW. I realize that OIDC makes sense here. The issue is that as an end user I have more flexibility and ease of use with my SSH keys than I do with something like a self hosted OIDC service. I can store my SSH keys on a hardware token, or store them on my computer blinded so that I need a hardware token or TPM to unlock them, or lots of other options. The service I'm connecting to doesn't need to know anything about my workflow. Whereas self hosting something like OIDC managing and securing the service becomes an entire thing on top of which many services arbitrarily dictate "thou shalt not self host".

It's a general trend that as new authentication schemes have been introduced they have generally included undesirable features from the perspective of user freedom. Adding insult to injury those unnecessary features tend to increase the complexity of the specification. In contrast, it's interesting to think how things might work if what we had instead was a single widely accepted challenge scheme such as SSH has. You could implement all manner of services such as OIDC on top of such a primitive while end users would retain the ability to directly use the equivalent of an SSH key.


> Not particularly? For example there's no particular cost if I accept email from outlook today but reverse that decision and ban it tomorrow. I don't immediately see a technical reason to avoid a default accept policy here.

Accepting email isn't really the same thing. I've linked some resources elsewhere in this thread that explain why OIDC federation isn't trivial in the context of machine identities.

> The value is that you can do away with long lived tokens that are prone to theft. You can MFA with your (self hosted) OIDC service and things should be that much more secure. Of course your (single user) OIDC service could get pwned but that's no different than any other account compromise.

You can already do this by self-attenuating your PyPI API token, since it's a Macaroon. We designed PyPI's API tokens with exactly this in mind.

(This isn't documented particularly well, since nobody has clearly articulated a threat model in which a single user runs their own entire attenuation service only to restrict a single or small subset of credentials that they already have access to. But you could do it, I guess.)

> What's the difference between "set of claims" and "private key" here?

A private key is a cryptographic object; a "set of claims" is (very literally) a JSON object that was signed over as the payload of a JWT. You can't sign (or encrypt, or whatever) with a set of claims naively; it's just data.


Thank you again for taking the time to walk through this stuff in detail. I think what happened (is happening) with this stuff is a slight communication issue. Some of us (such as myself) are quite jaded at this point when we see a "new and improved" solution with "increased security" that appears to even maybe impinge on user freedoms.

I was unaware that macaroons could be used like that. That's really neat and that capability clears up an apparent point of confusion on my part.

Upon reflection, it makes sense that preventing self hosting would be a desirable feature of attested publishing. That way the developer, builder, and distributor are all independent entities. In that case the registry explicitly vetting CI/CD pipelines is a feature, not a bug.

The odd one out is trusted publishing. I had taken it to be an eventual replacement for API tokens (consider the npm situation for why I might have thought this) thus the restrictions on federation seemed like a problem. However if it's simply a temporary middle ground along the path to attested publishing and there's a separate mechanism for restricting self managed API tokens then the overall situation has a much better appearance (at least to my eye).


I mean, if it meant the infrastructure operated under a franchising model with distributed admin like McD, it would look quite different!

There is more than one way to interpret the term "trusted". The average dev will probably take away different implications than someone with your expertise and context.

I don't believe this double meaning is an unfortunate coincidence but part of clever marketing. A semantic or ideological sleight of hand, if you will.

In the same category: "Trusted Computing", "Zero trust" and "Passkeys are phishing-resistant"


> I don't believe this double meaning is an unfortunate coincidence but part of clever marketing. A semantic or ideological sleight of hand, if you will.

I can tell you with absolute certainty that it really is just unfortunate. We just couldn’t come up with a better short name for it at the time; it was going to be either “Trusted Publishing” or “OIDC publishing,” and we determined that the latter would be too confusing to people who don’t know (and don’t care to know) what OIDC is.

There’s nothing nefarious about it, just the assumption that people would understand “trusted” to mean “you’re putting trust in this,” not “you have to use $vendor.” Clearly that assumption was not well founded.


Maybe signed publishing or verified publishing would have been better terms?

It’s neither signed nor verified, though. There’s a signature involved, but that signature is over a JWT, not over the package.

(There’s an overlaid thing called “attestations” on PyPI, which is a form of signing. But Trusted Publishing itself isn’t signing.)


Re signed - that is a fair point, although it raises the question, why is the distributed artifact not cryptographically authenticated?

Maybe I'm misunderstanding but I thought the whole point of the exercise was to avoid token compromise. Framed another way that means the goal is authentication of the CI/CD pipeline itself, right? Wouldn't signing a fingerprint be the default solution for that?

Unless there's some reason to hide the build source from downstream users of the package?

Re verified, doesn't this qualify as verifying that the source of the artifact is the expected CI/CD pipeline? I suppose "authenticated publishing" could also work for the same reason.


> why is the distributed artifact not cryptographically authenticated?

With what key? That’s the layer that “attestations” add on top, but with Trusted Publishing there’s no user- or package-associated signature.

> Maybe I'm misunderstanding but I thought the whole point of the exercise was to avoid token compromise. Framed another way that means the goal is authentication of the CI/CD pipeline itself, right? Wouldn't signing a fingerprint be the default solution for that?

Yes, the goal is to authenticate the CI/CD pipeline (what we’d call a “machine identity”). And there is a signature involved, but it only verifies the identity of the pipeline, not the package being uploaded by that pipeline. That’s why we layer attestations on top.

(The reasons for this are unfortunately nuanced but ultimately boil down to it being hard to directly sign arbitrary inputs with just OIDC in a meaningful way. I have some slides from talks I gave in the past that might help clarify Trusted Publishing, the relationship with signatures/attestations, etc.[1][2])

> I suppose "authenticated publishing" could also work for the same reason.

I think this would imply that normal API token publishing is somehow not authenticated, which would be really confusing as well. It’s really not easy to come up with a name that doesn’t have some amount of overlap with existing concepts, unfortunately.

[1]: https://yossarian.net/res/pub/packagingcon-2023.pdf

[2]: https://yossarian.net/res/pub/scored-2023.pdf


> imply that normal API token publishing is somehow not authenticated

Fair enough, although the same reasoning would imply that API token publishing isn't trusted ... well after the recent npm attacks I suppose it might not be at that.

> With what key?

> And there is a signature involved,

So there's already a key involved. I realize its lifetime might not be suitable but presumably the pipeline itself either already possesses or could generate a long lived key to be registered with the central service.

> but it only verifies the identity of the pipeline,

I thought verifying the identity of the pipeline was the entire point? The pipeline signing a fingerprint of the package would enable anyone to verify the provenance of the complete contents (either they'd need a way to look up the key or you could do TOFU, but I digress). There's value in being able to verify the integrity of the artifacts in your local cache.

Also, the more independent layers of authentication there are the fewer options an attacker will have. A hypothetical artifact that carried signatures from the developer, the pipeline, and the registry would have a very clear chain of custody.

> it being hard to directly sign arbitrary inputs with just OIDC in a meaningful way

At the end of the day you just need to somehow end up in a situation where the pipeline holds a key that has been authenticated by the package registry. From that point on I'd think that the particular signature scheme would become a trivial implementation detail; you stuff the output into some json or something similar and get on with life.

Has some key complexity gone over my head here?

BTW please don't take this the wrong way. It's not my intent to imply that I know better. As long as the process works it isn't my intent to critique it. I was just honestly surprised to learn that the package content itself isn't signed by the pipeline to prove provenance for downstream consumers and from there I'm just responding to the reasoning you gave. But if the current process does what it set out to do then I've no grounds to object.


> So there's already a key involved. I realize its lifetime might not be suitable but presumably the pipeline itself either already possesses or could generate a long lived key to be registered with the central service.

The key involved is the OIDC IdP's key, which isn't controlled by the maintainer of the project. I think it would be pretty risky to allow this key to directly sign for packages, because this would imply that any party that can use that key for signing can sign for any package. This would mean that any GitHub Actions workflow anywhere would be one signing bug away from impersonating signatures for every PyPI project, which would be exceedingly not good. It would also make the insider risk from a compromised CI/CD provider much larger.

(Again, I really recommend taking a look at the talks I linked. Both Trusted Publishing and attestations were multi-year projects that involved multiple companies, cryptographers, and implementation engineers, and most of your - very reasonable! - questions came up for us as well while designing and planning this work.)

> I thought verifying the identity of the pipeline was the entire point? The pipeline signing a fingerprint of the package would enable anyone to verify the provenance of the complete contents (either they'd need a way to look up the key or you could do TOFU, but I digress). There's value in being able to verify the integrity of the artifacts in your local cache.

There are two things here:

1. Trusted Publishing provides a verifiable link between a CI/CD provider (the "machine identity") and a packaging index. This verifiable link is used to issue short-lived, self-scoping credentials. Under the hood, Trusted Publishing relies on a signature from the CI/CD provider (which is an OIDC IdP) to verify that link, but that signature is only over a set of claims about the machine identity, not the package identity.

2. Attestations are a separate digital signing scheme that can use a machine identity. In PyPI's case, we bootstrap trust in a given machine identity by seeing if a project is already enrolled against a Trusted Publisher that matches that identity. But other packaging ecosystems may do other things; I don't know how NPM's attestations work, for example. This digital signing scheme uses a different key, one that's short-lived and isn't managed by the IdP, so that signing events can be made transparent (in the "transparency log" sense) and are associated more meaningfully with the machine identity, not the IdP that originally asserted the machine identity.

> At the end of the day you just need to somehow end up in a situation where the pipeline holds a key that has been authenticated by the package registry. From that point on I'd think that the particular signature scheme would become a trivial implementation detail; you stuff the output into some json or something similar and get on with life.

Yep, this is what attestations do. But a key piece of nuance: the pipeline doesn't "hold" a key per se, it generates a new short-lived key on each run and binds that key to the verified identity sourced from the IdP. This achieves the best of both worlds: users don't need to maintain a long-lived key, and the IdP itself is only trusted as an identity source (and is made auditable for issuance behavior via transparency logging). The end result is that clients that verify attestations don't verify using a specific key; they verify using an identity, and ensure that any particular key matches that identity as chained through an X.509 CA. That entire process is called Sigstore[1].

And no offense taken, these are good questions. It's a very complicated system!

[1]: https://www.sigstore.dev


> I think it would be pretty risky to allow this key to directly sign for packages, because this would imply that any party that can use that key for signing can sign for any package.

There must be some misunderstanding. For trusted publishing a short lived API token is issued that can be used to upload the finished product. You could instead imagine negotiating a key (ephemeral or otherwise) and then verifying the signature on upload.

Obviously the signing key can't be shared between projects any more than the API token is. I think I see where the misunderstanding arose now. Because I said "just verify the pipeline identity" and you interpreted that as "let end users get things signed by a single global provider key" or something to that effect, right?

The only difference I had intended to communicate was the ability of the downstream consumer to verify the same claim (via signature) that the registry currently verifies via token. But it sounds like that's more or less what attestation is? (Hopefully I understood correctly.) But that leaves me wondering why Trusted Publishing exists at all. By the time you've done the OIDC dance why not just sign the package fingerprint and be done with it? ("We didn't feel like it" is of course a perfectly valid answer here. I'm just curious.)

I did see that attestation has some other stuff about sigstore and countersignatures and etc. I'm not saying that additional stuff is bad, I'm asking if Trusted Publishing wouldn't be improved by offering a signature so that downstream could verify for itself. Was there some technical blocker to doing that?

> the IdP itself is only trusted as an identity source

"Only"? Doesn't being an identity source mean it can do pretty much anything if it goes rogue? (We "only" trust AD as an identity source.)


> There must be some misunderstanding. For trusted publishing a short lived API token is issued that can be used to upload the finished product. You could instead imagine negotiating a key (ephemeral or otherwise) and then verifying the signature on upload.

From what authority? Where does that key come from, and why would a verifying party have any reason to trust it?

(I'm not trying to be tendentious, so sorry if it comes across that way. But I think you're asking good questions that lead to the design that we arrived at with attestations.)

> I did see that attestation has some other stuff about sigstore and countersignatures and etc. I'm not saying that additional stuff is bad, I'm asking if Trusted Publishing wouldn't be improved by offering a signature so that downstream could verify for itself. Was there some technical blocker to doing that?

The technical blocker is that there's no obvious way to create a user-originated key that's verifiably associated with a machine identity, as originally verified from the IdP's OIDC credential. You could do something like mash a digest into the audience claim, but this wouldn't be very auditable in practice (since there's no easy way to shoehorn transparency atop that). But some people have done some interesting exploration in that space with OpenPubKey[1], and maybe future changes to OIDC will make something like that more tractable.

> "Only"? Doesn't being an identity source mean it can do pretty much anything if it goes rogue? (We "only" trust AD as an identity source.)

Yes, but that's why PyPI (and everyone else who uses Sigstore) mediates its use of OIDC IdPs through a transparency logging mechanism. This is in effect similar to the situation with CAs on the web: a CA can always go rogue, but doing so would (1) be detectable in transparency logs, and (2) would get them immediately evicted from trust roots. If we observed rogue activity from GitHub's IdP in terms of identity issuance, the response would be similar.

[1]: https://github.com/openpubkey/openpubkey


Okay. I see the lack of benefit now but regardless I'll go ahead and respond to clear up some points of misunderstanding (and because the topic is worthwhile I think).

> From what authority?

The registry. Same as the API token right now.

> The technical blocker is that there's no obvious way to create a user-originated key

I'm not entirely clear on your meaning of "user originated" there. Essentially I was thinking something equivalent to the security of - pipeline generates ephemeral key and signs { key digest, package name, artifact digest }, registry auth server signs the digest of that signature (this is what replaces the API token), registry bulk data server publishes this alongside the package artifact.

But now I'm realizing that the only scenario where this offers additional benefit is in the event that the bulk data server for the registry is compromised but the auth server is not. I do think there's some value in that but the much simpler alternative is for the registry to tie all artifacts back to a single global key. So I guess the benefit is quite minimal. With both schemes downstream assumes that the registry auth server hasn't been compromised. So that's not great (but we already knew that).

That said, you mention IdP transparency logging. Could you not add an arbitrary residue into the log entry? An auth server compromise would still be game over but at least that way any rogue package artifacts would conspicuously be missing a matching entry in the transparency log. But that would require the IdP service to do its own transparency logging as well ... yeah this is quickly adding complexity for only very minimal gain.

Anyway. Hypothetical architectures aside, thanks for taking the time for the detailed explanations. Though it wasn't initially clear to me the rather minimal benefit is more than enough to explain why this general direction wasn't pursued.

If anything I'm left feeling like maybe the ecosystems should all just switch directly to attested publishing.


Thanks for replying.

I'm certainly not meaning to imply that you are in on some conspiracy or anything - you were already in here clarifying things and setting the record straight in a helpful way. I think you are not representative of industry here (in a good way).

Evangelists are certainly latching on to the ambiguity and using it as an opportunity. Try to pretend you are a caveman dev or pointy-hair and read the first screenful of this. What did you learn?

https://github.blog/changelog/2025-07-31-npm-trusted-publish...

https://learn.microsoft.com/en-us/nuget/nuget-org/trusted-pu...

https://www.techradar.com/pro/security/github-is-finally-tig...

These were the top three results I got when I searched online for "github trusted publishing" (without quotes like a normal person would).

Stepping back, could it be that some stakeholders have a different agenda than you do and are actually quite happy about confusion?

I have sympathy for that naming things is hard. This is Trusted Computing in repeat but marketed to a generation of laymen that don't have that context. Also similar vibes to the centralization of OpenID/OAuth from last round.

On that note, looking at past efforts, I think the only way this works out is if it's open for self-managed providers from the start, not by selective global allowlisting of blessed platform partners one by one on the platform side. Just like for email, it should be sufficient with a domain name and following the protocol.


> People now have assigned extra qualities to such releases and it pushes the ecosystem towards more dependency on closed CI systems such as github and gitlab.

I think this is unfortunately true, but it's also a tale as old as time. I think PyPI did a good job of documenting why you shouldn't treat attestations as evidence of security modulo independent trust in an identity[1], but the temptation to verify a signature and call it a day is great for a lot of people.

Still, I don't know what a better solution is -- I think there's general agreement that packaging ecosystems should have some cryptographically sound way for responsible parties to correlate identities to their packages, and that previous techniques don't have a great track record.

(Something that's noteworthy is that PyPI's implementation of attestations uses CI/CD identities because it's easy, but that's not a fundamental limitation: it could also allow email identities with a bit more work. I'd love to see more experimentation in that direction, given that it lifts the dependency on CI/CD platforms.)

[1]: https://docs.pypi.org/attestations/security-model/


Rust and Python appear to still allow long-lived ones, so it's only a matter of time until they get the same issues, it would seem?

For whatever reason, we haven't seen the same degree of self-perpetuating credential disclosure in either Rust or Python as an ecosystem. Maybe that trend won't hold forever, but that's the distinguishing feature here.

I think the switch statement design is a footgun: it defaults to fall-through when the body is empty and to break when there is a body.

https://c3-lang.org/language-overview/examples/#enum-and-swi...


This feels very natural though, in a "principle of least surprise" kinda way. This is what you'd expect a properly designed switch statement to do.

Least surprise to who? Are there any other mainstream languages that behave this way?

I think consistency is the best correlate of least surprise, so having case statements that sometimes fall through, sometimes not, seems awful.


Swift

Sadly many of us are so used to the C fall-through behavior that this would be quite a surprise.

Personally, I'd rather see a different syntax switch (perhaps something like the Java pattern switch) or no switch at all than one that looks the same as in all C-style languages but works just slightly differently.


It reads naturally but I can see people getting tripped up writing this. Worse for changing existing code. Refactor in a way that removes a body? Likely forget to add a break.

If I aimed and shot a gun at my foot and a bullet didn’t go through it, I would trash the gun.

To be fair, I would probably toss the gun away if a bullet went through my foot.

Er, Forth guy as a, would foot trash I the and bullet gun the. Word!

I agree it's not the best choice. I mean it's true that you almost always want fall-through when the body is empty and break where it isn't but maybe it would be better to at least require explicit break (or fall-through keyword) and just make it a compiler error if one is missing and the body is not empty. That would be the least surprising design imo.

I think we should provide the building blocks (display, etc like derive_more) rather than a specialized version one for errors (thiserror).

I also feel thiserror encourages a public error enum, which to me is an anti-pattern: they are usually tied to your implementation and make it hard to add context, especially if you have variants for other error types.


I don't quite understand the issue about public error enums? Distinguishing variants is very useful if some causes are recoverable or - when writing webservers - could be translated into different status codes. Often both are useful, something representing internal details for logging and a public interface.

I agree. Is he really trying to say that e.g. errors for `std::fs::read()` should not distinguish between "file not found" and "permission denied"? It's quite common to want to react to those programmatically.

IMO Rust should provide something like thiserror for libraries, and also something like anyhow for applications. Maybe we can't design a perfect error library yet, but we can do waaay better than nothing. Something that covers 99% of uses would still be very useful, and there's plenty of precedent for that in the standard library.


I doubt epage is suggesting that. And note that in that case, the thing distinguishing the cause is not `std::io::Error`, but `std::io::ErrorKind`. The latter is not the error type, but something that forms a part of the I/O error type.

It's very rare that `pub enum Error { ... }` is something I'd put into the public API of a library. epage is absolutely correct that it is an extensibility hazard. But having a sub-ordinate "kind" error enum is totally fine (assuming you mark it `#[non_exhaustive]`).


It's not uncommon to have it on the error itself, rather than a details/kind auxiliary type. AWS SDK does it, nested even [0][1], diesel[2], password_hash[3].

[0] https://docs.rs/aws-smithy-runtime-api/1.9.3/aws_smithy_runt... [1] https://docs.rs/aws-sdk-s3/1.119.0/aws_sdk_s3/operation/get_... [2] https://docs.rs/diesel/2.3.5/diesel/result/enum.Error.html [3] https://docs.rs/password-hash/0.5.0/password_hash/errors/enu...


Sure, I've done it too: https://docs.rs/ignore/latest/ignore/enum.Error.html

It's not necessarily about frequency, but about extensibility. There's lot of grey area there. If you're very certain of your error domain and what kinds of details you want to offer, then the downside of a less extensible error type may never actualize. Similarly, if you're more open to more frequent semver incompatible releases, then the downside also may never actualize.


Why is it an extensibility hazard (assuming you mark the `pub enum Error` as non-exhaustive)?

I mean I don't see the difference between having the non-exhaustive enum at the top level vs in a subordinate 'kind'.


Take the example at https://docs.rs/thiserror/latest/thiserror/

- Struct variant fields are public, limiting how you evolve the fields and types

- Struct variants need non_exhaustive

- It shows using `from` on an error. What happens if you want to include more context, or change your impl in a way that changes the source error type?

None of this is syntactically unique to errors. This becomes people's first thought of what to do and libraries like thiserror make it easy and showcase it in their docs.
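The first two points can be seen in plain Rust, no thiserror needed (hypothetical names): struct variant fields in a public enum are part of the public API, so renaming `path` or changing its type later is breaking even with `#[non_exhaustive]` on the enum itself:

```rust
#[derive(Debug)]
#[non_exhaustive]
pub enum Error {
    // These fields are public API: downstream code can destructure them,
    // so changing `path` to a `PathBuf` later would be a breaking change.
    Io { path: String },
    Parse(String),
}

fn main() {
    let errs = [
        Error::Io { path: "config.toml".into() },
        Error::Parse("unexpected token".into()),
    ];
    // Downstream code can reach directly into the variant's fields.
    if let Error::Io { path, .. } = &errs[0] {
        assert_eq!(path, "config.toml");
    }
}
```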


> Struct variant fields are public, limiting how you evolve the fields and types

But the whole point of thiserror style errors is to make the errors part of your public API. This is no different to having a normal struct (not error related) as part of your public API is it?

> Struct variants need non_exhaustive

Can't you just add that tag? I dunno, I've never actually used thiserror.

Your third point makes sense though.


> > Struct variant fields are public, limiting how you evolve the fields and types

> But the whole point of thiserror style errors is to make the errors part of your public API. This is no different to having a normal struct (not error related) as part of your public API is it?

Likely you should use private fields with public accessors so you can evolve it.
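A sketch of that pattern with hypothetical names: keep the fields private and expose accessors, so fields can be added, renamed, or retyped internally without breaking callers:

```rust
#[derive(Debug)]
pub struct ParseError {
    line: usize,
    // A `column` field could be added here later without breaking callers.
}

impl ParseError {
    pub fn new(line: usize) -> Self {
        ParseError { line }
    }

    // Accessor instead of a public field.
    pub fn line(&self) -> usize {
        self.line
    }
}

fn main() {
    let err = ParseError::new(7);
    assert_eq!(err.line(), 7);
}
```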


Having private variants and fields would be useful, yeah. Std cheats a bit with its ErrorKind::Uncategorized unstable+hidden variant to have something unmatchable.

Probably because non-exhaustive enums didn't exist when it was written?

Yeah I used the weasel-y "something like" for exactly these reasons. :-)

> It may have been nice to expose some reasonable defaults for code coverage measurements too.

Would love built in coverage support but investigation is needed on the design (https://github.com/rust-lang/cargo/issues/13040) and we likely need to redo how we handle doctests (https://blog.rust-lang.org/inside-rust/2025/10/01/this-devel...).


Thank you for the context!

> uv is fast because of what it doesn’t do, not because of what language it’s written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still.

Isn't assigning out what all made things fast presumptive without benchmarks? Yes, I imagine a lot is gained by the work of those PEPs. I'm more questioning how much weight is put on dropping of compatibility compared to the other items. There is also no coverage for decisions influenced by language choice which likely influences "Optimizations that don’t need Rust".

This also doesn't cover subtle things. Unsure if rkyv is being used to reduce the number of times that TOML is parsed but TOML parse times do show up in benchmarks in Cargo and Cargo/uv's TOML parser is much faster than Python's (note: Cargo team member, `toml` maintainer). I wish the TOML comparison page was still up and showed actual numbers to be able to point to.


> Isn't assigning out what all made things fast presumptive without benchmarks?

We also have the benchmark of "pip now vs. pip years ago". That has to be controlled for pip version and Python version, but the former hasn't seen a lot of changes that are relevant for most cases, as far as I can tell.

> This also doesn't cover subtle things. Unsure if rkyv is being used to reduce the number of times that TOML is parsed but TOML parse times do show up in benchmarks in Cargo and Cargo/uv's TOML parser is much faster than Python's (note: Cargo team member, `toml` maintainer). I wish the TOML comparison page was still up and showed actual numbers to be able to point to.

This is interesting in that I wouldn't expect that the typical resolution involves a particularly large quantity of TOML. A package installer really only needs to look at it at all when building from source, and part of what these standards have done for us is improve wheel coverage. (Other relevant PEPs here include 600 and its predecessors.) Although that has also largely been driven by education within the community, things like e.g. https://blog.ganssle.io/articles/2021/10/setup-py-deprecated... and https://pradyunsg.me/blog/2022/12/31/wheels-are-faster-pure-... .


> This is interesting in that I wouldn't expect that the typical resolution involves a particularly large quantity of TOML.

I don't know the details of Python's resolution algorithm, but for Cargo (which is where epage is coming from) a lockfile (which is encoded in TOML) can be somewhat large-ish, maybe pushing 100 kilobytes (to the point where I'm curious if epage has benchmarked to see if lockfile parsing is noticeable in the flamegraph).
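For reference, each locked package is a TOML array-of-tables entry roughly like this (illustrative values; the checksum line is elided), and a large project repeats this shape for hundreds of packages:

```toml
version = 4

[[package]]
name = "serde"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "serde_derive",
]
```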


But once you have a lock file there is no resolution needed, is there? It lists all needed libs and their versions. Given how toml is written, I imagine you can read it incrementally - once a lib section is parsed, you can download it in parallel, even if you didn't parse the whole file yet.

(not sure how uv does it, just guessing what can be done)


For Cargo,

- synchronization operations are implicit, so we need to re-resolve to confirm the lockfile is still valid. We could take some shortcuts, but it would require re-implementing some logic

- dependency resolution only uses `Cargo.toml` for local and git dependencies. Registry dependencies have a json summary of what content is relevant for dependency resolution. Cargo parses nearly every locked package's `Cargo.toml` to know how to build it.


For whatever it's worth, the toml library uv uses doesn't support streaming parsing: https://github.com/toml-rs/toml/issues/326

I'm not sure if it even makes sense for a TOML file to be "read incrementally", because of the weird feature of TOML (inherited from INI conventions) that allows tables to be defined in a piecemeal, out-of-order fashion. Here's an example that the TOML spec calls "valid, but discouraged":

    [fruit.apple]
    [animal]
    [fruit.orange]
So the only way to know that you have all the keys in a given table is to literally read the entire file. This is one of those unfortunate things in TOML that I would honestly ignore if I were writing my own TOML parser, even if it meant I wasn't "compliant".

I don't think that's worse than having to search an arbitrary distance for a matching closing bracket. There are tasks where you can start working knowing that a given array in the data might be appended to later (similarly for objects).

It's worse than having to parse a matching bracket, because any context where you have an item defined via nested brackets is going to be a subset of this use case. That doesn't mean you couldn't do some theoretical eager processing, but it's going to be context-dependent.

For example, consider a Cargo.toml file where we've processed the `features` key for a given dependency. Is it safe to begin compiling that dependency with the given set of features before we finish parsing the file? No, because there might be a `default-features = false` key that applies to this dependency later in the file. In a format where tables weren't allowed to be split, the mere act of parsing a single, self-contained dependency entry would be enough to know for certain that no such `default-features` key exists.

Not all potential keys are going to require this sort of consideration, but it could be a footgun depending on the semantics of your schema.

TOML as a format doesn't make sense for streaming

- Tables can be in any order, independent of hierarchy

- keys can be dotted, creating subtables in any order

On top of that, most use cases for the format are not benefitted by streaming.
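A small valid-TOML illustration (a contrived manifest-like file): the `dependencies` table is reopened after an unrelated table, so a streaming reader that finalized it early would miss the second entry:

```toml
[dependencies.serde]
version = "1"

[package]
name = "demo"
version = "0.1.0"

# The dependencies table is reopened here; a streaming reader that
# finalized it after the first block would miss this entry.
[dependencies.serde_json]
version = "1"
```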


Lockfiles aren't an issue. It is all the dependencies themselves.

To be fair, the whole post isn't very good IMO, regardless of ChatGPT involvement, and it's weird how some people seem to treat it like some kind of revelation.

I mean, of course it wasn't specifically Rust that made it fast, it's really a banal statement: you need only very moderate programming experience to know that rewriting a legacy system from scratch can make it faster even if you rewrite it in a "slower" language. There have been C++ systems that became faster when rewritten in Python, for god's sake. That's what makes a system a "legacy" system: it does a ton of things and nobody really knows what it does anymore.

But when listing things that made uv faster it really mentions some silly things, among others. Like, it doesn't parse pip.conf. Right, sure, the secret of uv's speed lies in not-parsing other package manager's config files. Great.

So all in all, yes, no doubt that hundreds of little things contributed to making uv faster, but listing a few dozen of them (surely a non-exhaustive list) doesn't really enable you to draw any conclusions about the relative importance of different improvements whatsoever. I suppose the mentioned talk[0] (even though it's more than a year old now) would serve as a better technical report.

[0] https://www.youtube.com/watch?v=gSKTfG1GXYQ


That's true. Probably avoiding compilation and using HTTP Range requests made a big difference, but we don't know, do we? Quantifying these differences, rather than just listing them, would make the post much more valuable.

Is there a way to check the DRM status before purchase?


This is a Rust port of the Ruby implementation which was popularized by Jekyll, so it just does whatever was done in the Ruby implementation.

For an overview of liquid itself, see https://shopify.github.io/liquid/


Looks like this does compile time templating. There are reasons to explore using those over runtime templating but they don't cover all of the use cases.


Huh, of all of my projects, I never expected this to make it to HN. Not the original author but the maintainer, so ask away.

Unfortunately, I've not had as much time to devote to this for a while. There are also a lot of weird cases that aren't too well defined, or where the Ruby implementation shines through, that have me dissatisfied with Liquid. That, and wanting to make it easier to extend with an embedded language. A way to solve both is to create a template language using an existing embedded language. I'm tempted by koto[0] because it already has something like the `|` operator (`->`) and isn't just "Rust, but embeddable" (Rust is an implementation detail; it shouldn't bleed through).

[0] https://koto.dev/


Pulling over my reddit comment

> Providing a mechanism to manually disable individual default features when specifying a dependency

We want this on the Cargo team; somebody just needs to do the design work, see https://github.com/rust-lang/cargo/issues/3126

Note that there is a related problem of feature evolution. Can you take existing functionality and move it behind a feature? No, because that would be a breaking change for `default-features = false`. On the surface, it sounds like disabling default features works around that and we can remove `default-features = false` in an Edition, but that is tied to the dependent's edition and doesn't control the dependency's edition. We need something else to go with this.
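Concretely, a downstream manifest like this (hypothetical crate name) is why the move is breaking: it opts out of all default features, so existing functionality moved behind a new default feature would silently disappear for this dependent:

```toml
[dependencies]
mylib = { version = "1", default-features = false, features = ["json"] }
```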

> Providing a less verbose way for libraries to expose the features of their direct dependencies to other packages that depend on them directly

> Providing a way to disable features from transitive dependencies

I'm tying these two together because I wonder how well the same solution would work for this: maybe not everything should be a feature but a ["global"](https://internals.rust-lang.org/t/pre-rfc-mutually-excusive-...). Our idea for solving mutually exclusive features is to provide a key-value pair that a package defines and sets a default on and then the final application can override it. Maybe we'll eventually allow toolkits / sdks to also override but that can run into unification issues and is a lower priority.

Otherwise, I think we'd need to dig into exact use cases to make sure we have a good understanding of what kinds of features in what situations that libraries need to re-export before figuring out what design is appropriate for making large scale feature management manageable.

> "Zero-config" features that allow enabling/disabling code in a library without the author having to manually define it

We have `hints.mostly-unused` (https://doc.rust-lang.org/cargo/reference/unstable.html#prof...) which defers parts of the compilation process for most of a package until we know whether it is needed. Currently, it is a mixed bag. Explicit features still provide noticeable benefits. It is mostly for packages like `windows-sys` and `aws-sdk`.
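As described in the Cargo unstable reference (nightly-only; the syntax may change), a package opts in via a `hints` table in its own Cargo.toml:

```toml
[hints]
mostly-unused = true
```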


Thanks for the thoughtful response! I also had not been aware someone posted this to reddit, so I appreciate the heads-up on that as well.

I was not aware of at least some of the potential work being considered on these things, so I'm definitely going to start reading more about each of the things you've linked over the next couple of days. Somehow I was under the false impression that there wasn't much appetite for the types of changes I've been hoping for, so all of this is awesome to hear!

