
> Special shout out to Google who to this date seem to not support tool call streaming which is extremely Google.

Google doesn't even provide a tokenizer to count tokens locally. The result of this stupidity can be seen directly in AI Studio, which makes an API call to count_tokens every time you type in the prompt box.
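
To illustrate: as far as I know the only way to count Gemini tokens today is a network round trip, roughly like this with the @google/genai JS SDK (a sketch from memory of the SDK docs; exact method and field names may differ slightly):

    import { GoogleGenAI } from "@google/genai";

    // Sketch only: no local tokenizer is published, so every count is an
    // HTTP request. Assumes GEMINI_API_KEY is set in the environment.
    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    const res = await ai.models.countTokens({
      model: "gemini-2.0-flash",
      contents: "Draft prompt text typed into the prompt box...",
    });

    console.log(res.totalTokens); // one API call per count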


AI Studio also has a bug where it counts tokens continuously, typing or not, at 100% CPU usage.

Sometimes I wonder who is drawing more power, my laptop or the TPU cluster on the other side.


Same for Claude Code. It's constantly sending token-counting requests.

tbf neither does anthropic

This doesn't surprise me.

I have a SKILL.md for marimo notebooks, with instructions in the frontmatter to always read it before working with marimo files. But half the time Claude Code still doesn't invoke it, even when I mention marimo in the first conversation turn.

I've resorted to typing "read marimo skill" manually, and that works fine. Technically you can use skills with slash commands, but that automatically sends off the message too, which just wastes time.

But the actual concept of instructions to load in certain scenarios is very good and has been worth the time to write up the skill.
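
For context, the skill is just a markdown file with YAML frontmatter; mine looks roughly like this (a hypothetical sketch, not my actual file — the name/description fields follow the Claude Code skill convention, the rules are made-up examples):

    ---
    name: marimo-notebooks
    description: Read this before creating or editing any marimo notebook.
    ---

    # Working with marimo notebooks

    - marimo notebooks are plain Python files; cells are functions decorated with @app.cell.
    - Never convert them to Jupyter .ipynb format.
    - Keep cell dependencies explicit via function arguments/returns.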


Source? I've heard this rumour twice but never seen proof. I assume it would be based on tokeniser quirks?


K2 thinking didn't have vision which was a big drawback for my projects.


Mildly related question for the people in the thread:

How do I seek to the exact first frame of a timestamp with Mux? I've tried a few things, but it seems to always go to the nearest keyframe rather than the first frame at e.g. 00:34. This is sensible default behaviour but bad for my use case.


I don't think it's possible with players like the one Mux uses (which I assume is built on the browser's underlying video element).

One development in this space over the past few years has been the ability to interact with the actual frames of video being rendered and to output them into a canvas tag. This is under the WebCodecs API.

For a while I was working on a video review tool for eSports teams, which required frame-perfect annotations. I got around the inability to perfectly pause on the same frame by using screenshots of the video overlaid on top of it, but with the WebCodecs API you don't actually need this. It opens up all sorts of features, like being able to play videos backwards, for example.
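
For a rough idea, this is the shape of it (a sketch only; it assumes `chunks` and `codecConfig` have already been produced by a demuxer such as mp4box.js, since WebCodecs doesn't demux container files itself):

    // Paint every decoded frame to a canvas, so stepping and seeking are
    // fully under application control rather than the video element's.
    const canvas = document.querySelector("canvas") as HTMLCanvasElement;
    const ctx = canvas.getContext("2d")!;

    const decoder = new VideoDecoder({
      output: (frame: VideoFrame) => {
        ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);
        frame.close(); // frames hold GPU memory; release promptly
      },
      error: (e) => console.error(e),
    });

    decoder.configure(codecConfig);                     // e.g. { codec: "avc1.64001f", ... }
    for (const chunk of chunks) decoder.decode(chunk);  // EncodedVideoChunk[] from the demuxer
    await decoder.flush();                              // all frames delivered to `output`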


Video.js creator here, and I agree with this ^. Frame-accurate seeking isn't something the native video element does.

Check out the Omakase player: https://player.byomakase.org/


Ooh, very nice link. I've basically been waiting for something like this to come along before I pick up the tools again!


Thank you! That looks great.


Originally I thought that Gas Town was some form of high level satire like GOODY-2 but it seems that some of you people have actually lost the plot.

Ralph loops are also stupid because they don't make use of kv cache properly.

---

https://github.com/steveyegge/gastown/issues/503

Problem:

Every gt command runs bd version to verify the minimum beads version requirement. Under high concurrency (17+ agent sessions), this check times out and blocks gt commands from running.

Impact:

With 17+ concurrent sessions each running gt commands:

- Each gt command spawns bd version

- Each bd version spawns 5-7 git processes

- This creates 85-120+ git processes competing for resources

- The 2-second timeout in gt is exceeded

- gt commands fail with "bd version check timed out"


I think it is satire, and a pretty obvious one at that; is anybody taking it for real?


Why not both? I think it's pretty clearly both for fun and serious.

He's thrown out his experiments before. Maybe he'll start over one more time.


The big challenge for me so far has been setting up "breakpoints" with sufficient prompt adherence, i.e. conditions for agents to break out of the loop and request actionable feedback, rather than pumping out as many tokens as possible. Use cases where pumping tokens in an unsupervised manner is warranted are few and far between. For example, dataset-scale 1:n and n:n transformations have been super easy to set up, but the same implementation typically doesn't lend itself nicely to agent loops, as batching/KV caching suddenly becomes non-obvious and costs ramp up. Task scheduling with lockstep batching is a big, unsolved problem as of yet, and Gas Town is not inspiring confidence to that end.


> Ralph loops are also stupid because they don't make use of kv cache properly.

This is a cost/resources thing. If it's more effective and the resources are available, it's completely fine.


Gaslighting town.


This account's comment history is pure slop. 90% sure it's all AI-generated. The structure is too blatant.


Incredible guide, wow. Will definitely share with people. I wish I had something like this a year ago.


> because there's already concern that AI models are getting worse. The models are being fed on their own AI slop and synthetic data in an error-magnifying doom-loop known as "model collapse."

Model collapse is a meme that assumes zero agency on the part of the researchers.

I'm unsure how you can reach this conclusion after trying any of the new models. In the frontier size bracket we have models like Opus 4.5 that are significantly better at writing code and using tools independently. In the mid tier, Gemini 3.0 Flash is absurdly good and is crushing the previous baseline for some of my (visual) data extraction projects. And small models are much better overall than they used to be.


The big labs spend a ton of effort on dataset curation.

It goes further than just preventing poisoning: they do lots of testing on the dataset to find the incremental data that produces the best improvements in model performance, and they even train proxy models that predict whether data will improve performance or not. "Data Quality" is usually a huge division with a big budget.


The common thread from all the frontier orgs is that the datasets are too big to vet, and they're spending lots of money on lobbying to ensure they don't get punished for that. In short, the current corporate stance seems to be that they have zero agency, so which is it?


Huh? Unless you are talking about DMCA, I haven't heard about that at all. Most AI companies go to great lengths to prevent exfiltration of copyrighted material.


Even if it's a meme for the general public, actual ML researchers do have to document, understand and discuss the concept of model collapse in order to avoid it.


It's a meme even if you assume zero agency on the part of the researchers.

So far, every serious inquiry into "does AI contamination in real world scraped data hurt the AI performance" has resulted in things like: "nope", "if it does it's below measurement error" and "seems to help actually?"


Yes, this particular threat seems silly to me. Isn't it a standard thing to roll back databases? If the database gets worse, roll it back and change your data ingestion approach.


If you need a strategy to mitigate it (roll back and change approach) then it isn't really fair to describe it as "silly". If it's silly you could just ignore it altogether.


Coding and reasoning skills can be improved using machine-driven reinforcement learning.

https://arxiv.org/abs/2501.12948


Well, they seem to have zero agency. They left child pornography in the training sets. The people gathering the data committed enormous crimes, wantonly. Science is disintegrating along with public trust in it, as fake papers peer-reviewed by fake peer reviewers slop along. And from what I hear there has been no training on the open internet in recent years, as it's simply too toxic.


Hi, if the Gemini API team is reading this: can you please be more transparent about 'The specified schema produces a constraint that has too many states for serving. ...' when using Structured Outputs?

I assume it has something to do with the underlying constraint grammar/token masks becoming too long/taking too long to compute. But as end users we have no way of figuring out what the actual limits are.
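
For reference, this is the kind of request that trips it; a rough sketch with the @google/genai JS SDK (the schema here is made up, but representative — nesting, enums, and optional fields all seem to add states to the decoding constraint):

    import { GoogleGenAI, Type } from "@google/genai";

    // Hypothetical schema sketch: somewhere past an undocumented size the
    // API rejects it with "too many states for serving".
    const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

    const res = await ai.models.generateContent({
      model: "gemini-2.0-flash",
      contents: "Extract the line items from this invoice: ...",
      config: {
        responseMimeType: "application/json",
        responseSchema: {
          type: Type.OBJECT,
          properties: {
            items: {
              type: Type.ARRAY,
              items: {
                type: Type.OBJECT,
                properties: {
                  name: { type: Type.STRING },
                  category: { type: Type.STRING, enum: ["food", "travel", "office", "other"] },
                  price: { type: Type.NUMBER },
                },
                required: ["name", "price"],
              },
            },
          },
          required: ["items"],
        },
      },
    });

    console.log(res.text);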

OpenAI has more generous limits on the schemas and clearer docs. https://platform.openai.com/docs/guides/structured-outputs#s....

You guys closed this issue for no reason: https://github.com/googleapis/python-genai/issues/660

Other than that, good work! I love how fast the Gemini models are. The current API is significantly less of a shitshow compared to last year with property ordering etc.


Thanks for the feedback. Sorry that we closed the bug without giving you a clear indication of why. Let us look into this.

