Function Calling: OpenAI vs Anthropic vs Gemini 2026
TL;DR
Function calling (or "tool use") is the feature that turns LLMs from chat engines into agents — they can call your APIs, run database queries, fetch live data, and take actions in the world. OpenAI, Anthropic, and Google all support it, but with key differences in syntax, parallel execution, error handling, and how they handle tool results. In 2026, all three are production-ready, but OpenAI has the most mature ecosystem, Anthropic has the cleanest multi-tool patterns, and Gemini offers the deepest Google service integrations.
Key Takeaways
- All three providers support parallel function calls (multiple tools at once)
- OpenAI (tools + tool_choice): most widely supported, best SDK ecosystem, strict: true mode for guaranteed schema compliance
- Anthropic (tools + tool_use): cleanest API design, beta Computer Use extends to desktop automation
- Gemini (function_declarations): native Google Search grounding, best for Google service integrations
- Schema: all three use JSON Schema for parameter definitions — code is mostly portable
- Agentic loops: all three require you to build the loop yourself — or use LangChain/Vercel AI SDK
What Function Calling Actually Does
Without function calling:
User: "What's the weather in Tokyo?"
LLM: "I don't have access to real-time weather data..."
With function calling:
User: "What's the weather in Tokyo?"
LLM: → calls get_weather(location="Tokyo, Japan")
App: → fetches weather API → returns {temp: 18, condition: "cloudy"}
LLM: "It's 18°C and cloudy in Tokyo right now."
The LLM doesn't execute the function — it decides to call it and provides arguments. Your code executes it and returns results to the LLM.
OpenAI: The Most Mature Ecosystem
OpenAI introduced function calling in June 2023. In 2026, it's the most widely used implementation.
Basic Setup
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Define tools (formerly "functions"):
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a location',
      strict: true, // ← OpenAI 2024+ feature: guarantees schema compliance
      parameters: {
        type: 'object',
        properties: {
          location: {
            type: 'string',
            description: 'City and country, e.g. "Tokyo, Japan"',
          },
          unit: {
            type: 'string',
            enum: ['celsius', 'fahrenheit'],
            description: 'Temperature unit',
          },
        },
        required: ['location', 'unit'], // strict: true requires every property to be listed here
        additionalProperties: false, // Required with strict: true
      },
    },
  },
];
async function chat(userMessage: string) {
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    { role: 'user', content: userMessage },
  ];

  // First call — LLM may return tool_calls:
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    tools,
    tool_choice: 'auto', // or 'required', 'none', or {type: 'function', function: {name: 'get_weather'}}
  });

  const message = response.choices[0].message;

  // Check if the model wants to call tools:
  if (message.tool_calls && message.tool_calls.length > 0) {
    messages.push(message); // Add assistant message with tool_calls

    // Execute each tool call:
    for (const toolCall of message.tool_calls) {
      const args = JSON.parse(toolCall.function.arguments);
      const result = await executeFunction(toolCall.function.name, args);

      // Return tool result:
      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }

    // Second call — LLM generates final response:
    const finalResponse = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages,
    });
    return finalResponse.choices[0].message.content;
  }

  return message.content;
}

async function executeFunction(name: string, args: Record<string, unknown>) {
  if (name === 'get_weather') {
    // Call your actual weather API:
    return { temperature: 18, condition: 'cloudy', location: args.location };
  }
  throw new Error(`Unknown function: ${name}`);
}
OpenAI Parallel Tool Calls
OpenAI will call multiple tools in a single response when it determines they're independent:
// Query: "What's the weather in Tokyo AND Paris?"
// OpenAI returns TWO tool_calls in message.tool_calls[]:
[
  { id: 'call_abc', function: { name: 'get_weather', arguments: '{"location":"Tokyo"}' } },
  { id: 'call_xyz', function: { name: 'get_weather', arguments: '{"location":"Paris"}' } },
]

// Execute both simultaneously:
const results = await Promise.all(
  message.tool_calls.map(async (toolCall) => {
    const args = JSON.parse(toolCall.function.arguments);
    const result = await executeFunction(toolCall.function.name, args);
    return {
      role: 'tool' as const,
      tool_call_id: toolCall.id,
      content: JSON.stringify(result),
    };
  })
);
messages.push(...results);
strict: true Mode — OpenAI's Killer Feature
With strict: true, OpenAI constrains generation so tool arguments always conform to your schema:
// WITHOUT strict: the model might return:
// { "location": "Tokyo" } ← missing required 'unit'
// { "loc": "Tokyo", "unit": "c" } ← wrong property name
// WITH strict: true, a completed response always matches your exact schema
// Caveat: a response cut off by max_tokens can still be truncated JSON,
// so keeping JSON.parse() inside a try/catch remains cheap insurance
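Strict mode also changes how optional fields work: every property must appear in required, and optionality is instead expressed by allowing null. A sketch of the weather schema rewritten this way — the nullable-enum detail follows OpenAI's published structured-outputs guidance, but double-check it against the current docs:

```typescript
// Strict-mode schema sketch: every property is listed in `required`;
// optionality is expressed by allowing null, not by omitting the field.
const strictWeatherParameters = {
  type: 'object',
  properties: {
    location: {
      type: 'string',
      description: 'City and country, e.g. "Tokyo, Japan"',
    },
    unit: {
      // Nullable rather than optional — the model sends null when no unit applies
      type: ['string', 'null'],
      enum: ['celsius', 'fahrenheit', null],
      description: 'Temperature unit, or null for the default',
    },
  },
  required: ['location', 'unit'], // strict mode: all properties must appear here
  additionalProperties: false,
};
```

Your executor then treats a null unit the same way it would treat a missing one.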
Anthropic: Cleanest API Design
Anthropic calls it "tool use" rather than "function calling" — the concepts are identical but the API is cleaner.
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Define tools:
const tools: Anthropic.Messages.Tool[] = [
  {
    name: 'get_weather',
    description: 'Get current weather for a location',
    input_schema: {
      type: 'object' as const,
      properties: {
        location: {
          type: 'string',
          description: 'City and country, e.g. "Tokyo, Japan"',
        },
        unit: {
          type: 'string',
          enum: ['celsius', 'fahrenheit'],
        },
      },
      required: ['location'],
    },
  },
];
async function chat(userMessage: string) {
  const messages: Anthropic.Messages.MessageParam[] = [
    { role: 'user', content: userMessage },
  ];

  // First call:
  const response = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    tools,
    messages,
    tool_choice: { type: 'auto' }, // or {type: 'any'}, {type: 'none'}, or {type: 'tool', name: 'get_weather'}
  });

  // Anthropic uses stop_reason: 'tool_use' instead of checking tool_calls:
  if (response.stop_reason === 'tool_use') {
    // Add assistant response (may contain both text + tool_use blocks):
    messages.push({ role: 'assistant', content: response.content });

    // Find tool use blocks:
    const toolResults: Anthropic.Messages.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === 'tool_use') {
        const result = await executeFunction(block.name, block.input as Record<string, unknown>);
        toolResults.push({
          type: 'tool_result',
          tool_use_id: block.id,
          content: JSON.stringify(result),
        });
      }
    }

    // Return results in a user message (Anthropic's pattern):
    messages.push({
      role: 'user',
      content: toolResults,
    });

    // Second call:
    const finalResponse = await anthropic.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 1024,
      tools,
      messages,
    });

    // Extract text from response:
    return finalResponse.content
      .filter((b) => b.type === 'text')
      .map((b) => (b as Anthropic.Messages.TextBlock).text)
      .join('');
  }

  return response.content
    .filter((b) => b.type === 'text')
    .map((b) => (b as Anthropic.Messages.TextBlock).text)
    .join('');
}
Anthropic's Unique Feature: Mixed Content in Responses
Anthropic's API can return both text AND tool calls in the same response:
// Anthropic response.content might look like:
[
  { type: 'text', text: "Let me check the weather for both cities." },
  { type: 'tool_use', id: 'toolu_01', name: 'get_weather', input: { location: 'Tokyo' } },
  { type: 'tool_use', id: 'toolu_02', name: 'get_weather', input: { location: 'Paris' } },
]
// The text block is Claude "thinking aloud" before calling the tools
// This is useful for debugging and transparency
Tool Error Handling (Anthropic)
// If a tool fails, return the error as content:
toolResults.push({
  type: 'tool_result',
  tool_use_id: block.id,
  content: 'Error: Weather API unavailable',
  is_error: true, // ← Anthropic-specific: signals this is an error response
});
// Claude will then handle the error gracefully in its response
// Claude will then handle the error gracefully in its response
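One way to standardize this is a small wrapper that converts thrown exceptions into is_error tool results, so a single failing tool never aborts the whole turn. The runToolSafely helper and the local ToolResultBlock interface below are illustrative, not part of the Anthropic SDK:

```typescript
// Illustrative helper: run a tool and always produce a tool_result block,
// marking failures with is_error so Claude can recover gracefully.
interface ToolResultBlock {
  type: 'tool_result';
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

async function runToolSafely(
  toolUseId: string,
  fn: () => Promise<unknown>
): Promise<ToolResultBlock> {
  try {
    const result = await fn();
    return { type: 'tool_result', tool_use_id: toolUseId, content: JSON.stringify(result) };
  } catch (err) {
    return {
      type: 'tool_result',
      tool_use_id: toolUseId,
      content: `Error: ${err instanceof Error ? err.message : String(err)}`,
      is_error: true, // Claude sees the failure and can explain or retry
    };
  }
}
```

With this in place, the tool-execution loop becomes a single map over tool_use blocks, with no per-tool try/catch.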
Google Gemini: Google-Native Integrations
Gemini's function calling uses function_declarations, and the platform adds built-in Google Search grounding.
import {
  GoogleGenerativeAI,
  FunctionDeclaration,
  FunctionCallingMode,
  SchemaType,
} from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

// Define function declarations (the SDK's SchemaType enum avoids 'as any' casts):
const functionDeclarations: FunctionDeclaration[] = [
  {
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: SchemaType.OBJECT,
      properties: {
        location: {
          type: SchemaType.STRING,
          description: 'City and country name',
        },
        unit: {
          type: SchemaType.STRING,
          enum: ['celsius', 'fahrenheit'],
        },
      },
      required: ['location'],
    },
  },
];

const model = genAI.getGenerativeModel({
  model: 'gemini-2.0-flash',
  tools: [{ functionDeclarations }],
  toolConfig: {
    functionCallingConfig: {
      mode: FunctionCallingMode.AUTO, // or FunctionCallingMode.NONE, FunctionCallingMode.ANY
    },
  },
});
async function chat(userMessage: string) {
  const chatSession = model.startChat();
  const result = await chatSession.sendMessage(userMessage);
  const response = result.response;

  // Check for function calls:
  const functionCalls = response.functionCalls();
  if (functionCalls && functionCalls.length > 0) {
    const functionResponses = await Promise.all(
      functionCalls.map(async (call) => {
        const result = await executeFunction(call.name, call.args as Record<string, unknown>);
        return {
          functionResponse: {
            name: call.name,
            response: result,
          },
        };
      })
    );

    // Send function results back:
    const finalResult = await chatSession.sendMessage(functionResponses);
    return finalResult.response.text();
  }

  return response.text();
}
Gemini's Killer Feature: Google Search Grounding
Unlike OpenAI and Anthropic, Gemini can use Google Search as a built-in tool:
const model = genAI.getGenerativeModel({
  model: 'gemini-2.0-flash',
  tools: [
    { googleSearch: {} }, // Built-in Google Search tool
    { functionDeclarations }, // Your custom functions
  ],
});
// Now Gemini can search Google AND call your functions
// No web scraping, no Perplexity, just native Google Search results
Side-by-Side Comparison
| Feature | OpenAI | Anthropic | Gemini |
|---|---|---|---|
| Terminology | Function calling / Tools | Tool use | Function calling |
| Schema format | JSON Schema | JSON Schema | OpenAPI schema subset (JSON Schema-like) |
| Schema strictness | strict: true option | No equivalent | No equivalent |
| Parallel calls | ✅ Automatic | ✅ Automatic | ✅ Automatic |
| Built-in tools | Code interpreter, web search (Responses API) | Computer Use (beta) | Google Search, Code execution |
| Tool result format | role: "tool" messages | tool_result content blocks | functionResponse parts |
| Error signaling | Return error as string | is_error: true flag | Return error as string |
| Streaming tool calls | ✅ Yes | ✅ Yes | ✅ Yes |
| tool_choice control | auto/none/required/specific | auto/any/none/specific | AUTO/NONE/ANY |
| Mixed text+tools | Separate messages | ✅ Same response | Separate messages |
The Agentic Loop Pattern
For complex tasks requiring multiple tool calls, build an agentic loop:
// Universal agentic loop (works with any provider).
// callLLM, hasToolCalls, extractText, executeAllTools, assistantMessage,
// toolResultMessages, isRepeatingCalls, and the Message type are
// placeholders you implement per provider:
async function agentLoop(task: string, maxSteps = 10) {
  const messages: Message[] = [{ role: 'user', content: task }];

  for (let step = 0; step < maxSteps; step++) {
    const response = await callLLM(messages);

    if (!hasToolCalls(response)) {
      // LLM finished — no more tools needed
      return extractText(response);
    }

    // Execute tools and add results:
    const toolResults = await executeAllTools(response);
    messages.push(assistantMessage(response));
    messages.push(...toolResultMessages(toolResults));

    // Safety: check if we're stuck in a loop
    if (isRepeatingCalls(messages)) {
      throw new Error('Agent stuck in tool call loop');
    }
  }

  throw new Error(`Agent exceeded ${maxSteps} steps`);
}
Schema Portability: Writing Once, Running Everywhere
Because all three providers use JSON Schema, you can abstract tool definitions:
// Define once:
const weatherTool = {
  name: 'get_weather',
  description: 'Get current weather for a location',
  parameters: {
    type: 'object',
    properties: {
      location: { type: 'string', description: 'City and country' },
      unit: { type: 'string', enum: ['celsius', 'fahrenheit'] },
    },
    required: ['location'],
  },
};

// Convert for OpenAI:
const openAITool = {
  type: 'function' as const,
  // Note: strict mode also expects every property to appear in `required`
  function: { ...weatherTool, strict: true },
};

// Convert for Anthropic (just rename 'parameters' to 'input_schema'):
const anthropicTool = {
  name: weatherTool.name,
  description: weatherTool.description,
  input_schema: weatherTool.parameters,
};

// Convert for Gemini (same schema, different container):
const geminiFunctionDeclaration = weatherTool;
This is exactly what the Vercel AI SDK and LangChain do — define tools once, run on any provider.
When to Choose Each
Choose OpenAI tools if:
- You need strict: true for guaranteed schema compliance (critical for production)
- Your team is already using the OpenAI ecosystem
- You need the widest SDK/framework support (LangChain, Vercel AI, etc.)
Choose Anthropic tools if:
- You want mixed text + tool responses in the same message (useful for transparency)
- You need Computer Use for desktop automation
- You find Claude's reasoning around tool selection more predictable
Choose Gemini tools if:
- You need native Google Search grounding
- You're building on Google Cloud infrastructure
- Your use case requires Google Maps, Gmail, Calendar integrations
Tool Design Best Practices
How you define tools affects how reliably the LLM calls them. Well-designed tool definitions lead to correct, efficient function calls; poorly designed definitions lead to wrong tools being called, missing arguments, or unnecessary calls.
Write descriptions from the LLM's perspective. The LLM reads your tool description to decide when to call it. "Get weather data" is less useful than "Get current weather conditions including temperature, humidity, and forecast for a given city. Use when the user asks about current weather, whether they need an umbrella, or temperature in a specific location." The description should explain both what the tool does and when to use it.
Narrow the parameter surface. Every optional parameter increases the decision space the LLM navigates when deciding how to call the function. Prefer fewer, well-named parameters over many optional ones. If a parameter is almost always needed, make it required. If a parameter is only needed in rare cases, consider whether it should be a separate tool instead.
Return structured, informative results. Tool results feed back into the LLM's context. Results that are clearly structured (JSON with labeled fields) produce more accurate LLM synthesis than raw text or large blobs. Include the information the LLM needs to answer the user's question, but avoid returning large amounts of irrelevant data — every token in the tool result counts against your context window and increases cost.
Avoid tool names that overlap in purpose. When you have multiple similar tools (e.g., search_customers and find_customer_by_email), the LLM must reason about which to call in each situation. Ambiguous tool sets lead to inconsistent routing — the same user query might call different tools on different runs. Consolidate overlapping tools or make their use cases unambiguously distinct in their descriptions.
Include examples in descriptions for complex tools. For tools with complex argument schemas or subtle use cases, adding a one-line example in the description significantly improves call accuracy: "Example: {location: 'Tokyo, Japan', unit: 'celsius'} for weather in Tokyo." This is especially useful for tools that consume structured formats (SQL, JMESPath queries, regex patterns) where the LLM needs to generate a specific syntax.
Handle tool_choice: 'required' carefully. Forcing the LLM to always use a tool prevents it from answering questions it already knows the answer to. Use required only when you genuinely need the tool called on every turn — like a logging or analytics tool. For most use cases, auto gives the LLM appropriate discretion and allows it to answer simple follow-up questions without unnecessary tool overhead.
Version your tool schemas alongside your API. Tool definitions are part of your API surface — if you change a parameter name, rename a tool, or modify a required field, any system prompt or conversation history referencing the old tool name will break. Treat tool schema changes with the same versioning discipline as REST API changes: add new parameters as optional before making them required, keep deprecated tools available for a migration period, and document changes in your changelog.
Production Considerations
Deploying function calling in production requires thinking beyond the happy path. Here are several operational patterns that matter at scale:
Tool timeouts. LLM API calls have their own timeouts, but your tool functions may run longer than expected — database queries, external API calls, file processing. Implement explicit timeouts on every tool function and return a structured error when the timeout is hit. The LLM can then decide how to handle the timeout (retry, inform the user, fall back to a different approach). Without explicit tool timeouts, a single slow function call can block the entire conversation.
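A minimal timeout wrapper can be built with Promise.race; the five-second default and the structured error shape here are illustrative choices, not provider requirements:

```typescript
// Race the tool against a timer; on timeout, return a structured error
// the LLM can reason about instead of hanging the conversation.
async function withTimeout<T>(
  run: () => Promise<T>,
  ms = 5000
): Promise<T | { error: string }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<{ error: string }>((resolve) => {
    timer = setTimeout(() => resolve({ error: `Tool timed out after ${ms}ms` }), ms);
  });
  try {
    return await Promise.race([run(), timeout]);
  } finally {
    clearTimeout(timer); // clearTimeout accepts undefined
  }
}
```

Note that Promise.race does not cancel the underlying work; for tools backed by HTTP calls, pair this with an AbortController so the request itself is torn down.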
Tool call logging for debugging. In production, log every tool call: the tool name, arguments, execution time, and result (truncated if large). When a user reports that the AI gave a wrong answer, the tool call log reveals whether the AI called the wrong function, passed wrong arguments, or received incorrect data from the function. This is the equivalent of distributed tracing for AI agents — you can't debug production issues without it.
Cost attribution per tool. Tool calls consume significant tokens — the tool schema definition, the function call in the response, and the function result all contribute to token count. For applications where AI cost management matters, log token usage per tool call and aggregate by tool name. Some tools (those that return large results) may account for a disproportionate share of costs and can be optimized by summarizing results before returning them to the LLM.
Retry on tool failure. When a tool returns an error, you have two options: propagate the error to the LLM (which can try to work around it) or retry the tool call. For transient failures (network timeouts, rate-limited external APIs), retry with backoff before informing the LLM. For permanent failures (malformed input, resource not found), inform the LLM immediately so it can take a different approach or explain the limitation to the user.
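The transient-vs-permanent split can be sketched as a retry helper with exponential backoff; the isTransient predicate is where you encode your own classification (HTTP 429/503 responses, network errors, and so on), and the attempt and delay defaults are assumptions:

```typescript
// Retry transient failures with exponential backoff; rethrow permanent
// failures immediately so they reach the LLM as tool errors.
async function retryTool<T>(
  run: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 200
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      if (!isTransient(err) || attempt >= maxAttempts) throw err;
      // Delays grow as baseDelayMs, 2x, 4x, ... between attempts
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

When the final attempt also fails, the error propagates to your tool-result handling, where it becomes an error result for the LLM to explain or work around.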
Testing Function Calling
Unit testing tool functions is straightforward — test the function in isolation with representative inputs. The harder testing challenge is testing whether the LLM calls the right tool with the right arguments given various user inputs.
The most practical approach is prompt regression testing: maintain a set of representative user queries with expected tool call behavior, and run the LLM against each one to verify correct tool selection. When a new tool is added or a prompt is changed, these tests catch regressions where the LLM's routing behavior changes unexpectedly. Tools like PromptFoo and LangSmith support this pattern with eval harnesses that run assertions against LLM outputs.
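The core of such a harness is just a table of query-to-expected-tool cases checked against what the model actually emits. In this minimal sketch, the observe callback stands in for a real provider call plus tool-call extraction and is an assumption; frameworks like PromptFoo wrap the same idea in a fuller eval runner:

```typescript
// One routing expectation: given this query, which tool (if any) should fire?
interface ToolExpectation {
  query: string;
  expectedTool: string | null; // null = the model should answer directly
}

// Compare observed tool routing against expectations; returns failing queries.
function checkRouting(
  cases: ToolExpectation[],
  observe: (query: string) => string | null // name of the first tool called, or null
): string[] {
  return cases
    .filter((c) => observe(c.query) !== c.expectedTool)
    .map((c) => c.query);
}
```

In a real suite, observe would call the LLM with your production tool definitions and return the first tool name from the response, and the returned failures would fail the CI run.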
For complex agentic workflows with multiple sequential tool calls, snapshot testing of the entire conversation thread (messages + tool calls + results) provides broad coverage but breaks when the LLM's reasoning or phrasing changes slightly. A more robust approach is testing the final outcome (the right data was returned, the right action was taken) rather than the exact sequence of tool calls used to get there.
Methodology
API syntax and feature capabilities based on OpenAI API documentation (assistants and responses API), Anthropic API documentation, and Google AI Studio documentation as of March 2026. strict: true mode for OpenAI was introduced in August 2024 and is available in gpt-4o-2024-08-06 and later. Gemini Google Search grounding available in Gemini 2.0 Flash and Gemini 1.5 Pro. Parallel function calling available across all three providers for all their current-generation models. Schema portability examples verified against published SDK versions: openai@4.x, @anthropic-ai/sdk@0.30.x, @google/generative-ai@0.21.x.
Compare AI API capabilities at APIScout.
Compare OpenAI and Anthropic on APIScout.
Related: Anthropic MCP vs OpenAI Plugins vs Gemini Extensions, OpenAI vs Anthropic vs Gemini Batch API 2026, Anthropic vs Google Gemini