
never_inline 7 days ago, on: Tools: Code Is All You Need

Wasn't there a tool-calling benchmark by the Docker guys that concluded Qwen models are nearly as good as GPT? What is your experience with it?

Personally, I am convinced JSON is a bad format for LLMs and that code orchestration in a Python-ish DSL is the future. But local models are pretty bad at code gen too.