How good is Gemma at structured output generation, JSON schema compliance, and tool use? Particularly the smaller versions, and particularly in foreign languages?
We will run our internal evals on it for sure, but just wanted to ask whether that's even a use case that the team considered and trained for.
Hey, I'm from the Gemma team. There are a couple of angles to your question.
We do care about prompted instructions, like JSON schema compliance, and it is something we eval for and encourage you to try. Here's an example from Gemma 2 to guide folks looking to do what it sounds like you're interested in.
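For folks who want something concrete to start from, here is a minimal sketch of the prompted approach (illustrative only, not the team's example): embed the schema in the prompt and validate the reply yourself. The endpoint and model tag are assumptions about a local Ollama setup.

```python
# Illustrative only, not the team's example: embed the schema in the prompt
# and validate the reply yourself. Endpoint and model tag are assumptions.
import json
import requests

schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
}

prompt = (
    "Reply with a single JSON object matching this JSON schema, no prose:\n"
    f"{json.dumps(schema)}\n\n"
    "Question: what is the largest city in Japan, and what is its population?"
)

resp = requests.post(
    "http://localhost:11434/api/generate",  # assumed local Ollama server
    json={"model": "gemma2:2b", "prompt": prompt, "stream": False},
    timeout=120,
)
reply = resp.json()["response"]
data = json.loads(reply)  # fails fast if the model added prose or fences
print(data["city"], data["population"])
```

Note that nothing here forces compliance; the model can still drift from the schema, which is where the constrained-decoding route below comes in.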
The Ollama feature is the old llama.cpp grammar-based sampling that constrains output tokens.
It's great, I've used it to get outputs from as small a model as 1B.
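For context, a minimal sketch of that constrained path, assuming a local Ollama server and the gemma3:1b tag: passing a JSON schema as the "format" field makes the sampler mask tokens so the reply has to parse against the schema.

```python
# A minimal sketch of the constrained path, assuming a local Ollama server
# and the gemma3:1b tag: a JSON schema in the "format" field makes the
# grammar sampler mask tokens, so the reply must parse against the schema.
import json
import requests

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "capital": {"type": "string"},
    },
    "required": ["name", "capital"],
}

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma3:1b",
        "messages": [{"role": "user", "content": "Name one country and its capital."}],
        "format": schema,  # token-level constraint, not just a hint
        "stream": False,
    },
    timeout=120,
)
print(json.loads(resp.json()["message"]["content"]))
```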
But there's a stark difference in quality compared to, say, Phi-4's native tool-calling.
If Gemma 3 is natively trained on tool-calling, i.e. y'all are benching on, say, the Berkeley Function Calling Leaderboard, that'd be great to know out here.
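As a hedged sketch of what probing native tool-calling looks like through Ollama's "tools" parameter: whether a given Gemma build accepts it is exactly the open question, and the model tag here is an assumption.

```python
# Hedged sketch of probing native tool-calling via Ollama's "tools"
# parameter; whether a given Gemma build accepts it is exactly the open
# question, and the model tag here is an assumption.
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "gemma3:4b",
        "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
        "tools": tools,
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()  # the server may reject models without tool support
msg = resp.json()["message"]
# A natively tool-trained model returns structured tool_calls; a model
# faking it tends to answer with free-text JSON in "content" instead.
print(msg.get("tool_calls") or msg["content"])
```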
Tangentially, github.com/ochafik is a Googler who landed an excellent overhaul of llama.cpp's tool-calling, might be worth reaching out to (if you're not working with him already!)
Just tried gemma3:4b for structured output and it fails with a strange error (Ollama is the latest version):
Ollama error: POST predict: Post "http://127.0.0.1:49675/completion": read tcp 127.0.0.1:49677->127.0.0.1:49675: wsarecv: An existing connection was forcibly closed by the remote host.
Not sure whether this is an Ollama problem or a gemma3:4b problem. At the same time, gemma3:12b works fine for the same API request (100% identical, the only difference being the model id).
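A minimal repro sketch of that comparison, with the endpoint and payload shape being assumptions about the original request; only the model id is swapped between runs.

```python
# Minimal repro sketch: identical payload, only the model id swapped.
# Endpoint and payload shape are assumptions about the original request.
import requests

payload = {
    "messages": [{"role": "user", "content": 'Return {"ok": true} as JSON.'}],
    "format": {"type": "object", "properties": {"ok": {"type": "boolean"}}},
    "stream": False,
}
for model in ("gemma3:4b", "gemma3:12b"):
    try:
        r = requests.post(
            "http://localhost:11434/api/chat",
            json={**payload, "model": model},
            timeout=120,
        )
        print(model, "->", r.json()["message"]["content"])
    except requests.RequestException as e:
        print(model, "->", e)  # 4b reportedly dies with a connection reset
```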