With search-replace you can work on separate parts of a file independently with the LLM. With line-number edits, on the other hand, each edit shifts all the lines below it, so you end up having to provide the LLM with the whole file content again.
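Roughly what I mean, as a minimal sketch (the function name and error handling are mine, not from any particular tool): the edit is anchored by a literal snippet, so nothing shifting above or below it can invalidate it.

    def apply_edit(text: str, search: str, replace: str) -> str:
        """Apply one search-replace edit. The anchor is the literal
        `search` snippet, so edits elsewhere in the file that shift
        line numbers can't invalidate it."""
        count = text.count(search)
        if count != 1:
            # Ambiguous or missing anchor: fail loudly rather than
            # patch the wrong spot.
            raise ValueError(f"expected exactly 1 match, found {count}")
        return text.replace(search, replace, 1)

Two edits to different functions in the same file are then fully independent, and neither requires re-sending the file.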
(not the author) It works fine most of the time. I've been using it alongside an active agent and haven't run into too many noticeable problems. The token savings alone are worth it.
In my personal benchmark it's bad. So far the benchmark has been a really good indicator of instruction following and agentic behaviour in general.
For those who are curious, the benchmark is just the model's ability to follow a custom tool-calling format. I give it coding tasks using chat.md [1] + MCPs. And so far it's just not able to follow the format at all.
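To give a flavor of what "custom format" means here, it's something in this spirit (the tags and the parser below are an illustration I made up, not the actual chat.md spec):

    import re

    # A made-up inline format the model must emit verbatim, e.g.:
    #   %%tool read_file
    #   {"path": "src/main.py"}
    #   %%end
    TOOL_CALL = re.compile(
        r"^%%tool (?P<name>\w+)\n(?P<args>.*?)\n%%end$",
        re.DOTALL | re.MULTILINE,
    )

    def extract_tool_calls(reply: str) -> list[tuple[str, str]]:
        """Return (tool_name, raw_json_args) pairs; a model that
        can't reproduce the fences exactly scores zero."""
        return [(m["name"], m["args"]) for m in TOOL_CALL.finditer(reply)]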
I'm developing a personal text editor with vim keybindings and paused work because I couldn't think of a good interface that felt right. This could be it.
I think I'll update my editor to do something like this but with intelligent "collapsing" of extra text to reduce visual noise.
I couldn't decide on folding and reducing noise so I'm stuck on that front. I believe there is some elegant solution that I'm missing, hope to see your take.
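For what it's worth, the simplest collapsing scheme I've sketched is diff-style folding: keep a couple of context lines around the "interesting" lines and fold everything else into a placeholder (the heuristic below is mine and deliberately dumb):

    def fold(lines: list[str], interesting: set[int], context: int = 2) -> list[str]:
        """Collapse runs of uninteresting lines into a single
        '... N lines folded ...' marker, like a diff hunk view."""
        keep = set()
        for i in interesting:
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
        out, i = [], 0
        while i < len(lines):
            if i in keep:
                out.append(lines[i])
                i += 1
            else:
                j = i
                while j < len(lines) and j not in keep:
                    j += 1
                out.append(f"... {j - i} lines folded ...")
                i = j
        return out

What counts as "interesting" (recently edited, matching a search, holding a mark) is the part I haven't solved either.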
Custom tool calling formats are iffy in my experience. The models are all reinforcement learned to follow specific ones, so it’s always a battle and feels to me like using the tool wrong.
Have you had good results with the other frontier models?
Not the parent commenter, but in my testing, all recent Claudes (4.5 onward) and the Gemini 3 series have been pretty much flawless in custom tool call formats.
Be careful with openrouter. They routinely host quantized versions of models via their listed providers and the models just suck because of that. Use the original providers only.
I specifically do not use the CN/SG-based original provider, simply because I don't want my personal data traveling across the Pacific; I try to stay on US providers only. OpenRouter shows you the quantization of each provider, so you can choose a domestic one that's FP8 if you want.
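If I remember OpenRouter's provider-routing options correctly (double-check the field names against their docs before relying on this), you can pin this per request; the model slug and provider names below are just examples:

    import os
    import requests

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "deepseek/deepseek-chat",
            "messages": [{"role": "user", "content": "hello"}],
            "provider": {
                "order": ["Fireworks", "Together"],  # US-hosted providers
                "allow_fallbacks": False,            # don't silently reroute
                "quantizations": ["fp8"],            # reject heavier quants
            },
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])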
Not really. China doesn't share a border with us, doesn't claim any EU territory, and didn't historically rule our lands the way the USSR did. In the context of spheres of influence and security interests, its strategic goals aren't directly at odds with the EU's core interests.
> EU is not a singular country, and Germany or France don't border Russia either.
But soon they could, and that's the problem.
> Considering China is ok to supply Russia, I don't see how your second point has any standing either.
Supply? China supplies Ukraine too; Ukraine's drone sector runs heavily on Chinese supply chains. And if China really wanted to supply Russia, the war would likely be over by now and Russia would have taken all of Ukraine.
This, along with John Ousterhout's talk [1] on deep interfaces, was transformational for me. And this is coming from a guy who codes in Python, so lots of transferable lessons.
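The "deep module" idea in miniature, in Python since that's what I write (the example is mine, not from the talk): many shallow functions expose the whole implementation, one deep method hides it.

    # Shallow: the interface is as wide as the implementation, so the
    # caller still has to understand and sequence every step.
    def open_conn(host): ...
    def send_query(conn, q): ...
    def parse_rows(raw): ...
    def close_conn(conn): ...

    # Deep: one narrow entry point hiding connection lifecycle,
    # retries, and parsing behind it.
    class Database:
        def __init__(self, host: str):
            self._host = host

        def query(self, sql: str) -> list[dict]:
            """All the complexity lives below this line; callers
            only ever have to learn this one method."""
            ...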
Sonnet has the same behavior: it drops thinking blocks on each new user message. Curiously, in the latest Opus they have removed this behavior and all thinking tokens are preserved.
To those who are not deterred and feel yolo mode is worth the risk, there are a few patterns that should make your ears perk up (a crude guard sketch follows the list).
- Cleanup or deletion tasks. Be ready to hit ctrl-c at any time; this has led to disastrous nukes in two Reddit threads.
- Errors impacting the whole repo, especially ones that are difficult to solve. In such cases, if it decides to reset and redo, it may remove sensitive paths as well.
It once removed my repo because "it had multiple problems and it was better to write it from scratch".
- Any weird behavior: "this doesn't seem right", "looks like the shell isn't working correctly", anything indicative of an application bug. It might employ dangerous workarounds.
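Here's the guard sketch mentioned above: a pre-exec filter for agent-issued shell commands. The patterns are mine and far from exhaustive; treat it as a speed bump, not a sandbox.

    import re
    import subprocess

    DANGEROUS = [
        r"\brm\s+-[a-z]*r[a-z]*f",   # rm -rf and friends
        r"\bgit\s+reset\s+--hard",
        r"\bgit\s+clean\b",
        r"\bmkfs\b",
    ]

    def run_agent_command(cmd: str) -> int:
        """Block obviously destructive commands unless a human
        explicitly confirms them."""
        if any(re.search(p, cmd) for p in DANGEROUS):
            answer = input(f"agent wants to run: {cmd!r}  allow? [y/N] ")
            if answer.strip().lower() != "y":
                print("blocked")
                return 1
        return subprocess.run(cmd, shell=True).returncode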
Right, you're describing sampling a single token, which is equivalent to sampling one step of a Markov chain. When generating output you repeat this process and update your state sequentially, which is exactly a Markov chain: at each step the next token depends only on the current state (the context so far) and is conditionally independent of everything before that state.
Every response from an LLM is essentially the sampling of a Markov chain.
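Spelled out as a toy decoding loop (assuming `model` returns next-token probabilities given the current context; the state is the whole token sequence so far):

    import random

    def sample_response(model, prompt_tokens: list[int], max_len: int = 256) -> list[int]:
        """Autoregressive decoding as a Markov chain: the 'state' is
        the token sequence so far, and the next token's distribution
        depends only on that state, not on how we arrived at it."""
        state = list(prompt_tokens)
        for _ in range(max_len):
            probs = model(state)                     # P(next token | state)
            token = random.choices(range(len(probs)), weights=probs)[0]
            state.append(token)
            if token == 0:                           # assume 0 = end-of-sequence
                break
        return state[len(prompt_tokens):]

The state space is enormous (all possible contexts), but the Markov property holds at every step.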
Have you tested followup edits on the same files?