More

martinald · 2026-02-08T00:44:01 1770511441

I think it's just routing to faster hardware:

H100 SXM: 3.35 TB/s HBM3

GB200: 8 TB/s HBM3e

2.4x faster memory - which is exactly what they are saying the speedup is. I suspect they are just routing to GB200 (or TPU etc equivalents).

FWIW I did notice _sometimes_ recently Opus was very fast. I put it down to a bug in Claude Code's token counting, but perhaps it was actually just occasionally getting routed to GB200s.

sothatsit · 2026-02-08T09:01:19 1770541279

Dylan Patel did analysis that suggests lower batch size and more speculative decoding leads to 2.5x more per-user throughput for 6x the cost for open models [0]. Seems plausible this could be what they are doing. We probably won't get to know for sure any time soon.

Regardless, they don't need to be using new hardware to get speedups like this. It's possible you just hit A/B testing and not newer hardware. I'd be surprised if they were using their latest hardware for inference tbh.

[0] https://nitter.net/dylan522p/status/2020302299827171430

martinald · 2026-02-08T00:38:47 1770511127

Hmm not sure I agree with you there entirely. You're right there's queues to ensure that you max out the hardware with concurrent batches to _start_ inference, but I doubt you'd want to split up the same job into multiple bits and move them around servers if you could at all avoid it.

It requires a lot of bandwidth to do that and even at 400gbit/sec it would take a good second to move even a smaller KV cache between racks even in the same DC.

martinald · 2026-02-06T22:34:25 1770417265

You're definitely right on (2) and (3). I've used many transit systems across the world (including TransMilenio in Bogota and other latam countries "renowned" for crime) and I have never felt as unsafe as I have using transit in the SFBA. Even standing at bus stops draws a lot of attention from people suffering with serious addiction/mental health problems.

1) is a bit simplistic though. I don't know of any European system that would cover even operating costs out of fare/commercial revenue. Potentially the London Underground - but not London buses. UK National Rail had higher success rates

The better way to look at it imo is looking at the economic loss as well of congestion/abandoned commutes. To do a ridiculous hypothetical, London would collapse entirely if it didn't have transit. Perhaps 30-40% of inner london could commute by car (or walk/bike), so the economic benefit of that variable transit cost is in the hundreds of billions a year (compared to a small subsidy).

It's not the same in SFBA so I guess it's far easier to just "write off" transit like that, it is theoretically possible (though you'd probably get some quite extreme additional congestion on the freeways as even that small % moving to cars would have an outsized impact on additional congestion).

scoofy · 2026-02-07T00:41:36 1770424896

>The better way to look at it imo is looking at the economic loss as well of congestion/abandoned commutes. To do a ridiculous hypothetical, London would collapse entirely if it didn't have transit.

You're making my argument for me. Again, my concern isn't the day-to-day conveniences of funding, my point is that building a fragile system (a system where the funding is unrelated to the usability of the service) is a system that can fail catastrophically... for systems where there are obviously alternatives (say, National Rail which can be substituted for automobile, bus, and airplane service) are less to worry about, because their failure will likely not cause cascading failures. When an entire local economy is dependent on that system -- when there are not viable substitutes -- then you're really looking at a sudden economic collapse if the funding source runs dry, or if the system is ever mismanaged.

This is a big deal. When funding really actually does run out and the system fails, then if the result is an economic cascade into a full blown depression, then you would have been much better off just building the robust system in the long term. I just really don't think people appreciate how systems can just fail. Whether it's Detroit or Caracas, when the economic tides turn in a fragile system people can lose everything in a matter of a few years.

martinald · 2026-02-07T01:32:39 1770427959

But my point is that noone has a robust system according to you in Europe at least - the bar is so high to cover all operating costs with fares (or is that your point - if so I'm lost - I definitely would not recommend replacing European transit networks with nothing?).

And National Rail isn't replaceable at all with bus/cars/planes. You really underestimate the number of people which commute >1hr into London (100km+). There is just no way to do that journey by car or bus. It would turn a ~1hr commute into a 3hr _each way_ and that's not even considering the complete lack of parking OR the fact suddenly the roads would be at (even more) gridlock with many multiples of commuters.

That's not even getting into what you consider fixed vs variable costs. Are the trains themselves a fixed cost (they should last 30-40 years)? Is track maintenance a fixed cost (this has to be done more often than the trains themselves), etc etc. The 2nd point is very important - a lot of rail operators in the UK can be made profitable or not on your metric by how much the government subsidises track maintenance vs the operators paying for it in track access charges.

Equally, are signalling upgrades (for example) fixed costs? But really they are only required to run more frequent services. So you could argue they are a variable cost?

scoofy · 2026-02-07T04:13:11 1770437591

>Are the trains themselves a fixed cost (they should last 30-40 years)?

Yes

>Is track maintenance a fixed cost (this has to be done more often than the trains themselves)

Yes

>Equally, are signalling upgrades (for example) fixed costs?

Yes

Fixed costs are the costs that don't go away when the passengers go away. Variable cost, typically labor, go away when you don't actually need that additional marginal train. You still have to amortize that train even if it's not on the tracks. You still need to buy that marginal train when the service levels require it. You still have to do track maintenance even when you're not running trains (though, yes, at the very margin there could be some small rate adjustments). When you want to upgrade the signals, it's basically the definition of a fixed cost, because you do it once and it's done.

>And National Rail isn't replaceable at all with bus/cars/planes. You really underestimate the number of people which commute >1hr into London (100km+). There is just no way to do that journey by car or bus. It would turn a ~1hr commute into a 3hr _each way_ and that's not even considering the complete lack of parking OR the fact suddenly the roads would be at (even more) gridlock with many multiples of commuters.

I don't want to speak to National Rail or British Rail that preceded it. I want to stick to the transit system that I know well.

My point here isn't that money shouldn't be spent on "getting things back in shape" here is where I waffle on the "pay for fixed capital costs and mostly have the marginal variable costs covered by the marginal rider." If a system needs the occasional cash infusion, I'm fine with that, as long as it comes with new leadership.

My concern here is that, in the Bay Area, many, many people are eager to pay $25 for a Waymo to pick them up (they are NOT cheap) while Muni costs $3 (a near 10x increase in cost). When folks are willing to pay that much of a premium, then something is very wrong with the transit system. Muni has had zero enforcement of their code of conduct for decades. When you have a system that are large section of the populous actively avoid when it's perfectly convenient, then something is very wrong with the system.

When I see BART stations that look like abandoned parking lots surrounded by single family home sprawl, then it doesn't surprise me that the system is not sustainable. The stations that may get removed are all in areas that require people to drive, to then take the train, instead of the cities zoning density and retail around the train stations. When I yell at the occasional people smoking in BART stations and I go to tell the station attendant and get a shrug back -- even when we are paying for them to have their own police force -- that's why they are failing. These are political choices that BART has made in how they operate their service

These systems aren't even doing the bare minimum in providing a reliable pleasant service, so people stop using them, and that makes sense. The entire point is that these services should be relatively inexpensive to operate because of economies of scale, but when you don't actually make people pay, when you don't actually ask people to behave like responsible adults, when your running the service like a failing business then we should expect the service to fail, and when it does, when bailouts are needed, they should (and often do) come with strings attached. BART now has gates that stop most turnstile jumping... and they were forced to be installed by the state of California as part of their second bailout. The reason I'm harping on having variable costs attached to ridership is exactly because the systems needs to be forced to respond when a sizable amount of people no longer find the service valuable.

This is about sustainability, because the marginal tax dollar is better spend on something like providing people with the healthcare they need than it is providing people a bus service they're not even willing to actually use.

martinald · 2026-02-06T17:04:03 1770397443

Hi, author here. Sorry if I skipped over the "evidence". If you read/watch financial news, every single outlet was claiming that it was caused by this legal tool launch. I was commenting on that - somewhat in jest. I'll update the article to make it clearer what I was trying to get at.

Thanks for the feedback, it was front page of the FT and I need to remember not everyone reads financial news!

martinald · 2026-02-05T16:08:13 1770307693

Hey Robin, big fan of this map, congrats on getting front page on HN :).

Two suggestions/questions if you may:

1) Would be good to see how many MW each boundary can handle, not just %? Btw, I can't see the number for the south east england boundary.

2) Great job on the battery info. I'm seeing some battery storage is curtailed. How is that possible? Please don't tell me that we are paying batteries to _not_ export :/?

robhawkes · 2026-02-05T16:40:03 1770309603

Hey! And thank you

1) Absolutely agree. The current approach for the boundaries is a quick hack until I can implement something more sophisticated. Safe to say an update is already in the works that adds a MW value and more insight into the state of each boundary (and is also more accurate in general)

2) "Please don't tell me that we are paying batteries to _not_ export" – it's actually the opposite, the batteries paid to not export (at least today). You can dig into this yourself via the Detailed System Prices dataset [0] and looking at one of the batteries on the sell stack (eg. KILSB-5)

[0] https://bmrs.elexon.co.uk/detailed-system-prices

pjc50 · 2026-02-05T16:48:06 1770310086

> "Please don't tell me that we are paying batteries to _not_ export" – it's actually the opposite, the batteries paid to not export (at least today).

Unfortunately you've got at least one negative wrong in this sentence and I'm still confused, and the linked dataset is currently blank?

Sorry for complaining, this is a great website!

robhawkes · 2026-02-05T16:53:49 1770310429

No worries. The Detailed System Prices dataset is lagged by a couple hours so try going back in time.

The simplest answer I can give is that assets place bids and a volume of energy that they are willing to turn down if the system operator needs to. Those bids are either positive or negative in value and this depends a lot on the type of asset, for example wind assets usually bid negative (ie. we pay them to turn down) while gas assets usually bid positive (ie. they pay us to turn down). The reason for that is a lot to do with complexities of the market and also the cost of running that assets, the cost of fuels, etc.

martinald · 2026-02-05T18:24:00 1770315840

Ok, got it I think.

So actually the battery operator is paying _the grid_ to turn down output from whatever was previously agreed (because they think they'd get more money for it later?).

And this shows as curtailment on the map?

Let me know if I'm directionally right here. If I am, it would be good to see 'bad curtailment' vs 'good curtailment' (i assume if bids are negative/positive?)

martinald · 2026-02-02T01:59:09 1769997549

Totally agree, though ironically Claude code works way better with Excel than I expected.

I even tried telling Copilot to convert each sheet to a CSV on one attempt THEN do calculations. It just ignored it and failed miserably, ironically outputting me a list of files that it should have made, along with the broken python script. I found this very amusing.

mattmanser · 2026-02-02T07:58:58 1770019138

Think why an LLM might really struggle with csvs. You're asking a chat bot to make a weird sentence with tons of commas in it.

I've read that they're supposed to be great with XML as it's so structured, better than JSON, but haven't actually found that to be the case.

martinald · 2026-02-02T01:55:29 1769997329

It compacted at least twice but continued with no real issues.

Anyway, please try it if you find it unbelievable. I didn't expect it to work FWIW like it did. Opus 4.5 is pretty amazing at long running tasks like this.

moregrist · 2026-02-02T02:11:22 1769998282

I think the skepticism here is that without tests or a _lot_ of manual QA how would you know that it did it correctly?

Maybe you did one or the other , but “nearly one-shotted” doesn’t tend to mean that.

Claude Code more than occasionally likes to make weird assumptions, and it’s well known that it hallucinates quite a bit more near the context length, and that compaction only partially helps this issue.

skybrian · 2026-02-02T06:02:10 1770012130

If you’re porting some formulas from one language to another, “correct” can be defined as “gets the same answers as before.” Assuming you can run both easily, this is easy to write a property test for.

Sure, maybe that’s just building something that’s bug-for-bug compatible, but it’s something Claude can work with.

gregoryl · 2026-02-02T08:10:37 1770019837

For starters, Python uses IEEE 754, and Excel uses IEEE 754 (with caveats). I wonder if that's being emulated.

stavros · 2026-02-02T02:19:26 1769998766

I generally agree with you, but I tried to get it to modernize a fairly old SaaS codebase, and it couldn't. It had all the code right there, all it had to do was change a few lines, upgrade a few libraries, etc, but it kept getting lots of things wrong. The HTML was wrong, the CSS was completely missing, basic views wouldn't work, things like that.

I have no idea why it had so much trouble with this generally easy task. Bizarre.

martinald · 2026-02-02T01:43:56 1769996636

That's exactly what it did (author here).

majormajor · 2026-02-02T01:49:32 1769996972

I'm having trouble reconciling "30 sheet mind numbingly complicated Excel financial model" and "Two or three prompts got it there, using plan mode to figure out the structure of the Excel sheet, then prompting to implement it. It even added unit tests to the Python model itself, which I was impressed with!"

"1 or 2 plan mode prompts" to fully describe a 30-sheet complicated doc suggests a massively higher level of granularity than Opus initial plans on existing codebases give me or a less-than-expected level of Excel craziness.

And the tooling harnesses have been telling the models to add testing to things they make for months now, so why's that impressive or suprising?

martinald · 2026-02-02T01:53:46 1769997226

No it didn't make a giant plan of every detail. It made a plan of the core concepts and then when it was in implementation mode it kept checking the excel file to get more info. It took around ~30 mins in implementation mode to build it.

I was impressed because the prompt didn't ask it to do that. It doesn't normally add tests for me without asking, YMMV.

majormajor · 2026-02-02T01:56:56 1769997416

Ah, I see.

Did it build a test suite for the Excel side? A fuzzer or such?

It's the cross-concern interactions that still get me.

80% of what I think about these days when writing software is how to test more exhaustively without build times being absolute shit (and not necessarily actually being exhaustive anyway).

martinald · 2026-02-02T01:12:25 1769994745

They have an excel sheet next to it - they can test it against that. Plus they can ask questions if something seems off and have it explain the code.

AlotOfReading · 2026-02-02T01:34:35 1769996075

I'm not sure being able to verify that it's vaguely correct really solves the issue. Consider how many edge cases inhabit a "30 sheet, mind-numbingly complicated" Excel document. Verifying equivalence sounds nontrivial, to put it mildly.

Draiken · 2026-02-02T12:02:59 1770033779

They don't care. This is clearly someone looking to score points and impress with the AI magic trick.

The best part is that they can say the AI will get some stuff wrong, they knew that, and it's not their fault when it breaks. Or more likely, it'll break in subtle ways, nobody will ever notice and the consequences won't be traced back to this. YOLO!

Dylan16807 · 2026-02-02T05:25:35 1770009935

Consider how many edge cases it misses. Equivalence probably shouldn't be the top priority here.

Nevermark · 2026-02-02T06:17:54 1770013074

Equivalence here would definitely be the worst test, except for all the alternatives.

lmm · 2026-02-02T01:36:35 1769996195

> They have an excel sheet next to it - they can test it against that.

It used to be that we'd fix the copy-paste bugs in the excel sheet when we converted it to a proper model, good to know that we'll now preserve them forever.

karlgkk · 2026-02-02T01:14:37 1769994877

[flagged]

yomismoaqui · 2026-02-02T01:20:32 1769995232

You would be surprised at the volume of money made by businesses supported by Excel.

martinald · 2026-02-02T01:23:07 1769995387

Yes. I suspect there are thousands of Excel files that "process" >$1bn/yr out there.

irishcoffee · 2026-02-02T04:39:11 1770007151

Allow me to introduce to you: ACH. It is truly fascinating.

martinald · 2026-01-29T09:18:35 1769678315

Ollama is CLI/API "first". LM studio is a proper full blown gui with chat features etc. It's far easier to use than Ollama at least for non technical users (though they are increasingly merging in functionality, with LM studio adding CLI/API features and Ollama adding more UI).

james_marks · 2026-01-29T11:26:22 1769685982

Even as a technical person, when I wanted to play with running models locally, LM Studio turned it into a couple of button clicks.

Without much background, you’re finding models, chatting with them, have an OpenAI-compatible API w/logging. Haven’t seen the new version, but LM Studio was already pretty great.