
> Special shout out to Google who to this date seem to not support tool call streaming which is extremely Google.

Google doesn't even provide a tokenizer to count tokens locally. The result of this stupidity can be seen directly in AI Studio, which makes an API call to count_tokens every time you type in the prompt box.
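
To illustrate: as far as I know the only way to count Gemini tokens today is a network round trip, roughly like this with the @google/genai JS SDK (a sketch from memory of the SDK docs; exact method and field names may differ slightly):

    import { GoogleGenAI } from "@google/genai";

    // Sketch only: no local tokenizer is published, so every count is an
    // HTTP request. Assumes GEMINI_API_KEY is set in the environment.
    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    const res = await ai.models.countTokens({
      model: "gemini-2.0-flash",
      contents: "Draft prompt text typed into the prompt box...",
    });

    console.log(res.totalTokens); // one API call per count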


AI Studio also has a bug where it counts tokens continuously, typing or not, at 100% CPU usage.

Sometimes I wonder who is drawing more power, my laptop or the TPU cluster on the other side.


Same for Claude Code. It's constantly sending token-counting requests.

tbf neither does anthropic

This doesn't surprise me.

I have a SKILL.md for marimo notebooks, with instructions in the frontmatter to always read it before working with marimo files. But half the time Claude Code still doesn't invoke it, even when I mention marimo in the first conversation turn.

I've resorted to typing "read marimo skill" manually, and that works fine. Technically you can use skills with slash commands, but that automatically sends off the message too, which just wastes time.

But the actual concept of instructions to load in certain scenarios is very good and has been worth the time to write up the skill.
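
For context, the skill is just a markdown file with YAML frontmatter; mine looks roughly like this (a hypothetical sketch, not my actual file — the name/description fields follow the Claude Code skill convention, the rules are made-up examples):

    ---
    name: marimo-notebooks
    description: Read this before creating or editing any marimo notebook.
    ---

    # Working with marimo notebooks

    - marimo notebooks are plain Python files; cells are functions decorated with @app.cell.
    - Never convert them to Jupyter .ipynb format.
    - Keep cell dependencies explicit via function arguments/returns.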


Source? I've heard this rumour twice but never seen proof. I assume it would be based on tokeniser quirks?


K2 thinking didn't have vision which was a big drawback for my projects.


Mildly related question for the people in the thread:

How do I seek to the exact first frame of a timestamp with Mux? I've tried a few things, but it seems to always go to the nearest keyframe rather than the first frame at e.g. 00:34. This is sensible default behaviour but bad for my use case.


I don't think it's possible with players like the one Mux uses (which I assume is built on the browser's underlying video element).

One development in this space over the past few years has been the ability to interact with the actual frames of video being rendered and to output them into a canvas tag. This is under the WebCodecs API.

For a while I was working on a video review tool for eSports teams, which required frame-perfect annotations. I got around the inability to perfectly pause on the same frame by using screenshots of the video overlaid on top of it, but with the WebCodecs API you don't actually need this. It opens up all sorts of features, like being able to play videos backwards, for example.
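
For a rough idea, this is the shape of it (a sketch only; it assumes `chunks` and `codecConfig` have already been produced by a demuxer such as mp4box.js, since WebCodecs doesn't demux container files itself):

    // Paint every decoded frame to a canvas, so stepping and seeking are
    // fully under application control rather than the video element's.
    const canvas = document.querySelector("canvas") as HTMLCanvasElement;
    const ctx = canvas.getContext("2d")!;

    const decoder = new VideoDecoder({
      output: (frame: VideoFrame) => {
        ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);
        frame.close(); // frames hold GPU memory; release promptly
      },
      error: (e) => console.error(e),
    });

    decoder.configure(codecConfig);                     // e.g. { codec: "avc1.64001f", ... }
    for (const chunk of chunks) decoder.decode(chunk);  // EncodedVideoChunk[] from the demuxer
    await decoder.flush();                              // all frames delivered to `output`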


Video.js creator here, and I agree with this ^. Frame-accurate seeking isn't something the native video element does.

Check out the Omakase player: https://player.byomakase.org/


Ooh, very nice link. I've basically been waiting for something like this to come along before I pick up the tools again!


Thank you! That looks great.


Originally I thought that Gas Town was some form of high level satire like GOODY-2 but it seems that some of you people have actually lost the plot.

Ralph loops are also stupid because they don't make use of kv cache properly.

---

https://github.com/steveyegge/gastown/issues/503

Problem:

Every gt command runs bd version to verify the minimum beads version requirement. Under high concurrency (17+ agent sessions), this check times out and blocks gt commands from running.

Impact:

With 17+ concurrent sessions each running gt commands:

- Each gt command spawns bd version

- Each bd version spawns 5-7 git processes

- This creates 85-120+ git processes competing for resources

- The 2-second timeout in gt is exceeded

- gt commands fail with "bd version check timed out"


I think it is satire, and a pretty obvious one at that; is anybody taking it for real?


Why not both? I think it's pretty clearly both for fun and serious.

He's thrown out his experiments before. Maybe he'll start over one more time.


The big challenge for me so far has been setting up "breakpoints" with sufficient prompt adherence, i.e. conditions for agents to break out of the loop and request actionable feedback, rather than pumping out as many tokens as possible. Use cases where pumping tokens in an unsupervised manner is warranted are few and far between. For example, dataset-scale 1:n and n:n transformations have been super easy to set up, but the same implementation typically doesn't lend itself nicely to agent loops, as batching/KV caching suddenly becomes non-obvious and costs ramp up. Task scheduling with lockstep batching is a big, unsolved problem as of yet, and Gas Town is not inspiring confidence to that end.


> Ralph loops are also stupid because they don't make use of kv cache properly.

This is a cost/resources thing. If it's more effective and the resources are available, it's completely fine.


Gaslighting town.


This account's comment history is pure slop. 90% sure it's all AI-generated. The structure is too blatant.


Incredible guide, wow. Will definitely share with people. I wish I had something like this a year ago.


> because there's already concern that AI models are getting worse. The models are being fed on their own AI slop and synthetic data in an error-magnifying doom-loop known as "model collapse."

Model collapse is a meme that assumes zero agency on the part of the researchers.

I'm unsure how you can reach this conclusion after trying any of the new models. In the frontier size bracket we have models like Opus 4.5 that are significantly better at writing code and using tools independently. In the mid tier, Gemini 3.0 Flash is absurdly good and is crushing the previous baseline for some of my (visual) data extraction projects. And small models are much better overall than they used to be.


The big labs spend a ton of effort on dataset curation.

It goes further than just preventing poisoning: they do lots of testing on the dataset to find the incremental data that produces the best improvements in model performance, and they even train proxy models that predict whether data will improve performance or not. "Data Quality" is usually a huge division with a big budget.


The common thread from all the frontier orgs is that the datasets are too big to vet, and they're spending lots of money on lobbying to ensure they don't get punished for that. In short, the current corporate stance seems to be that they have zero agency, so which is it?


Huh? Unless you are talking about DMCA, I haven't heard about that at all. Most AI companies go to great lengths to prevent exfiltration of copyrighted material.


Even if it's a meme for the general public, actual ML researchers do have to document, understand and discuss the concept of model collapse in order to avoid it.


It's a meme even if you assume zero agency on the part of the researchers.

So far, every serious inquiry into "does AI contamination in real world scraped data hurt the AI performance" has resulted in things like: "nope", "if it does it's below measurement error" and "seems to help actually?"


Yes, this particular threat seems silly to me. Isn't it a standard thing to roll back databases? If the database gets worse, roll it back and change your data ingestion approach.


If you need a strategy to mitigate it (roll back and change approach) then it isn't really fair to describe it as "silly". If it's silly you could just ignore it altogether.


Coding and reasoning skills can be improved using machine-driven reinforcement learning.

https://arxiv.org/abs/2501.12948


Well, they seem to have zero agency. They left child pornography in the training sets. The people gathering the data committed enormous crimes, wantonly. Science is disintegrating along with public trust in it, as fake papers peer-reviewed by fake peer reviewers slop along. And from what I hear there has been no training on the open internet in recent years, as it's simply too toxic.


Hi, if the Gemini API team is reading this: can you please be more transparent about 'The specified schema produces a constraint that has too many states for serving. ...' when using Structured Outputs?

I assume it has something to do with the underlying constraint grammar/token masks becoming too long/taking too long to compute. But as end users we have no way of figuring out what the actual limits are.
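
For reference, this is the kind of request that trips it; a rough sketch with the @google/genai JS SDK (the schema here is made up, but representative — nesting, enums, and optional fields all seem to add states to the decoding constraint):

    import { GoogleGenAI, Type } from "@google/genai";

    // Hypothetical schema sketch: somewhere past an undocumented size the
    // API rejects it with "too many states for serving".
    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    const res = await ai.models.generateContent({
      model: "gemini-2.0-flash",
      contents: "Extract the line items from this invoice: ...",
      config: {
        responseMimeType: "application/json",
        responseSchema: {
          type: Type.OBJECT,
          properties: {
            items: {
              type: Type.ARRAY,
              items: {
                type: Type.OBJECT,
                properties: {
                  name: { type: Type.STRING },
                  category: { type: Type.STRING, enum: ["food", "travel", "office", "other"] },
                  price: { type: Type.NUMBER },
                },
                required: ["name", "price"],
              },
            },
          },
          required: ["items"],
        },
      },
    });

    console.log(res.text);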

OpenAI has more generous limits on the schemas and clearer docs. https://platform.openai.com/docs/guides/structured-outputs#s....

You guys closed this issue for no reason: https://github.com/googleapis/python-genai/issues/660

Other than that, good work! I love how fast the Gemini models are. The current API is significantly less of a shitshow compared to last year with property ordering etc.


Thanks for the feedback. Sorry that we closed the bug without giving you a clear indication of why. Let us look into this.

