The irony is that Jane Street hires from prestigious Indian schools too, for pretty obscene salaries. Those salaries get hyped and celebrated in the newspapers.
Can someone elucidate how using a full-blown browser is an improvement over using, say, markitdown / pandoc / whatever? Given that most useful coding docs sites are static (made with Sphinx or MkDocs or whatever).
The problem I see with MCP is very simple. It uses JSON as the format, and that's nowhere near as expressive as a programming language.
Consider a Python function signature:

list_containers(show_stopped: bool = False, name_pattern: Optional[str] = None, sort: Literal["size", "name", "started_at"] = "name")

It doesn't even need docs.
Now convert this to a JSON schema and the input is already about 4x larger.
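For a rough sense of the blow-up, here is roughly what that one signature turns into as a tool definition (written out as a Python dict; the field layout follows the common OpenAI-style function-calling schema, so treat the exact shape as an assumption):

list_containers_tool = {
    "type": "function",
    "function": {
        "name": "list_containers",
        "description": "List containers.",
        "parameters": {
            "type": "object",
            "properties": {
                "show_stopped": {
                    "type": "boolean",
                    "description": "Include stopped containers.",
                    "default": False,
                },
                "name_pattern": {
                    "type": ["string", "null"],
                    "description": "Optional name filter.",
                },
                "sort": {
                    "type": "string",
                    "enum": ["size", "name", "started_at"],
                    "default": "name",
                },
            },
            "required": [],
        },
    },
}

And all of that goes into the prompt for every request that exposes the tool.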
And when generating output, the LLM will generate almost 2x more tokens too, because JSON. It's easier for it to get confused.
And consider that the flow of calling Python functions and using their output to call other tools, etc. is seen 1000x more often in their fine-tuning data, whereas JSON tool-calling flows are rare and practically only exist in the instruction-tuning phase. And I'm sure the instruction tuning also contains even more complex code examples where the model has to execute complex logic.
Then there's the whole issue of composition. To my knowledge there's no way an LLM can do this in one response:
vehicle = call_func_1()
if vehicle.type == "car":
    details = lookup_car(vehicle.reg_no)
elif vehicle.type == "motorcycle":
    details = lookup_motorcycle(vehicle.reg_no)
The reason to use the LLM is that you don't know ahead of time that the vehicle type is only a car or motorcycle, and the LLM will also figure out a way to detail bicycles and boats and airplanes, and to consider both left and right shoes separately.
The LLM can't just be given this function, because it's specialized to just the two options.
You could have it do a feedback loop of rewriting the Python script after running it, but what's the savings at that point? You're wasting tokens talking about cars in Python when you already know it's a ski, and the LLM could ask directly for the ski details without writing a script to do it in between.
But "the" problem with MCP? IMVHO (Very humble, non-expert) the half-baked or missing security aspects are more fundamental. I'd love to hear updates about that from ppl who know what they're talking about.
Wasn't there a tool-calling benchmark by the Docker guys which concluded that Qwen models are nearly as good as GPT? What is your experience with it?
Personally I am convinced JSON is a bad format for LLMs and that code orchestration in a Python-ish DSL is the future. But local models are pretty bad at code gen too.
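To make "code orchestration" concrete, here's a minimal sketch of what I have in mind, with made-up tool names and no real sandboxing: the host exposes tools as plain Python functions, the model emits a short script in one response, and the host executes it.

# Hypothetical tools, exposed to the model as ordinary Python functions.
def list_containers(show_stopped=False, name_pattern=None, sort="name"):
    # Stand-in data; a real implementation would query the container runtime.
    return [{"name": "db-1", "stopped": True}] if show_stopped else []

def restart_container(name):
    print(f"restarting {name}")

TOOLS = {"list_containers": list_containers, "restart_container": restart_container}

# What the model would emit in a single response:
model_script = """
for c in list_containers(show_stopped=True):
    restart_container(c["name"])
"""

# The host runs it; an illustration only, not a real sandbox.
exec(model_script, {"__builtins__": {}}, dict(TOOLS))

The point is that the loop and the conditional logic live in the generated code, instead of being spread across several JSON tool-call round trips.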
People are used to the `click` way, where you define args as function parameters. It's a little more verbose, but it helps that click is a very established library which also provides many other things needed by CLI tools.
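For reference, the usual click shape looks like this (the greet example itself is made up, but the decorator style mirrors click's docs):

import click

@click.command()
@click.option("--count", default=1, help="Number of greetings.")
@click.argument("name")
def greet(count, name):
    """Greet NAME the given number of times."""
    for _ in range(count):
        click.echo(f"Hello, {name}!")

if __name__ == "__main__":
    greet()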
There's also `typer`, from the creator of `fastapi`, which relies on type annotations. I have not had the opportunity to use it.
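Going by its docs, the typer version of the same thing leans entirely on the annotations; a sketch I haven't battle-tested:

import typer

def greet(name: str, count: int = 1):
    """Greet NAME the given number of times; the default value turns count into a --count option."""
    for _ in range(count):
        typer.echo(f"Hello, {name}!")

if __name__ == "__main__":
    typer.run(greet)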
At this point you're just flexing that you have a 96 GiB machine. (Average developer machines are more like 16 GiB.)
But that's not the point. If every dependency follows the same philosophy, the costs (compile time, binary size, dependency supply chain) add up very quickly.
Not to mention that in big organizations you have to track each 3rd-party and transitive dependency you add to the codebase (for very good reasons).
I can write and have written hand-tuned assembly when every byte is sacred. That’s valuable in the right context. But that’s not the common case. In most situations, I’d rather spend those resources on code ergonomics, a flexible and heavily documented command line, and a widely used standard that other devs know how to use and contribute to.
And by proportion, that library would add an extra 0.7 bytes to a Commodore 64 program. I would have cheerfully "wasted" that much space for something a hundredth as nice as Clap.
I’ve worked in big organizations and been the one responsible for tracking dependencies, their licenses, and their vulnerable versions. No one does that by hand after a certain size. Snyk is as happy to track 1000 dependencies as 10.
96? It sounds more like 64 to me, which is probably above average but not exactly crazy. I've had 64 GB in my personal desktop for years, and most laptops I've used in the past 5 years or so for work have had 32 GB. If it takes up 1/4700 of memory, I don't think it changes things much. Plus, argument parsing tends to be done right at the beginning of the program and completely unused again by the time anything else happens, so even if the parsing itself is inefficient, it seems like maybe the least worrisome place I could imagine to optimize for developer efficiency over performance.