The function calling behavior of Gemini models has become completely unreliable and unpredictable:
Function calling used to work somewhat reliably, but recently it has stopped working almost entirely.
Instead of invoking the function (verified by inspecting the API response), the model merely follows the schema and instructions in prose, generating a plain-text response without ever emitting a function call.
Occasionally, with the same query, the model will randomly use function calling again, but the output is often worse: it ignores the instructions and schema even more than when it generates a text-only response (!)
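To be concrete about "verified in the API response": a genuine call comes back as a function_call part, while the broken responses contain only text parts that mimic the schema. Here is a minimal sketch of the check we do (google-genai Python SDK; our production stack is JS, but the response shape is the same, and extract_function_calls is just an illustrative helper name):

from google.genai import types

def extract_function_calls(response: types.GenerateContentResponse):
    """Return the function_call parts, if any. An empty list means the
    model answered in text instead of calling a tool."""
    parts = response.candidates[0].content.parts or []
    return [p.function_call for p in parts if p.function_call]

When the bug hits, this returns an empty list and response.text holds schema-shaped prose instead.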
gemini-2.5-flash-preview-04-17 is noticeably better than the production gemini-2.5-flash at following instructions and schemas, when function calling works at all. The fact that a preview model outperforms the production release raises concerns about stability and release practices.
This erratic behavior makes it impossible to build or trust production systems on top of these APIs.
Ongoing Bug:
The long-standing bug where the model starts its response with a ```json code block, even when explicitly instructed not to, remains unresolved after several months. This forces us to maintain unreliable workarounds.
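For example, this is a minimal Python sketch of the kind of brittle stripping we have to do before parsing (parse_model_json is an illustrative name, not our real code):

import json
import re

def parse_model_json(raw: str) -> dict:
    """Strip the unwanted leading ```json fence (and a trailing ```)
    before parsing. Brittle: it breaks whenever the model varies the fence."""
    text = raw.strip()
    text = re.sub(r"^```(?:json)?\s*", "", text)
    text = re.sub(r"\s*```$", "", text)
    return json.loads(text)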
What is the reason for this regression/change of behavior? Will it be fixed?
Will Google ensure that production models match or exceed the reliability and quality of preview versions?
Gemini models are exceptional, IMO, but we can't build any serious application if this is the kind of performance and reliability we get.
How can I share the code? It's a pretty complex assistant with a layered agent architecture plus a large schema for function calling. Should we jump on a call to share details and logs? That's probably the easiest way. IMO you need to work with design partners like us to get these issues resolved fast; it's difficult to test everything otherwise. Let me know
If you could share some part of your code, that would be helpful, as we could try to reproduce your issue. In the meantime, I have some suggestions which could help you:
Function calls are highly dependent on the prompt and the tool descriptions. So you can try to improve your tool descriptions and give more specific instructions about when each function should be called (see the declaration sketch after the config below). If you always want a function call, there is also a feature to force one (example code below).
from google import genai
from google.genai import types

# Configure the client and tools.
client = genai.Client()

# power_disco_ball, start_music, and dim_lights are function
# declarations (name, description, parameters) defined elsewhere.
house_tools = [
    types.Tool(function_declarations=[power_disco_ball, start_music, dim_lights])
]

config = types.GenerateContentConfig(
    tools=house_tools,
    automatic_function_calling=types.AutomaticFunctionCallingConfig(
        disable=True
    ),
    # Force the model to call 'any' function, instead of chatting.
    tool_config=types.ToolConfig(
        function_calling_config=types.FunctionCallingConfig(mode='ANY')
    ),
)
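And here is a sketch of what one of those declarations could look like with a deliberately specific description; the name, schema, and wording are purely illustrative, but a description that tells the model when to call the function is often what makes the difference:

dim_lights = types.FunctionDeclaration(
    name="dim_lights",
    description=(
        "Dim the living room lights. Call this whenever the user asks "
        "to lower, soften, or set the light level; do not answer "
        "lighting requests in plain text."
    ),
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "brightness": types.Schema(
                type=types.Type.NUMBER,
                description="Target brightness from 0.0 (off) to 1.0 (full).",
            ),
        },
        required=["brightness"],
    ),
)

# With the forced config above, a request like this should always
# come back with a function_call part:
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Make it cozy in here",
    config=config,
)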
When I use my previous flow, i.e. without forcing the function call, the request succeeds, but with all the issues I mentioned:
GoogleProvider: API request successful
This is the code we use to force the function call:
if (options.force_function_call) {
  requestBody.toolConfig = {
    functionCallingConfig: {
      mode: "ANY" // This forces the model to call a function.
    }
  };
  console.log(`🔧 GoogleProvider: Function calling mode: FORCED (ANY)`);
}
Again, the major issue here is that this worked without problems until a week or so ago. Now the model returns text instead of calling the function (while still following the schema that was provided). It's a bug in the API/model.
And also, the model keeps starting every response with a ```json block even when explicitly asked not to. This has been going on for a couple of months with no resolution, forcing us into brittle workarounds like the one sketched above.