How-to Guide

How to use function calling with GPT-5.5.

Master GPT-5.5's function calling capabilities with parallel tool execution, structured outputs, and error handling for building robust agent workflows.

Decision: Pilot. Choose one repeated workflow with a visible owner and enough weekly volume to prove the saving.

Risk to watch: Faster mistakes. Keep a review queue and scoped credentials until the workflow has survived real production runs.

Proof to collect: Time baseline. Measure the manual run time, exception rate, approval time, and weekly hours returned.

TL;DR

GPT-5.5 (codename "Spud", $5 input / $30 output per million tokens) improves function calling with native parallel tool execution, stricter schema adherence, and better error recovery. OpenAI lists the API context window at around 1M tokens; the 400K figure circulating in some write-ups applies specifically to GPT-5.5 running inside Codex, not the general API. This guide covers the function calling API end to end: tool definitions, parallel execution, handling responses, and building agent loops.

Key takeaways

  • Model: GPT-5.5, $5 input / $30 output per million tokens; API context ~1M tokens (400K applies to Codex)
  • Parallel: Native parallel function calling built-in
  • Strict mode: Enforces schema compliance; rejects invalid parameters
  • Types: Use `response_format: { type: 'json_schema' }` for typed outputs
  • Error handling: Model can read tool errors and retry within the loop

Analysis

When OpenAI shipped GPT-5.5 on 23 April 2026 under the internal codename "Spud", the headline numbers were the usual fare: a new model, fresh pricing at $5 input and $30 output per million tokens, and the predictable round of benchmark bragging (Axios). For most business teams that reads as noise. The part that actually changes what you can build is quieter: how the model calls your tools.

Function calling is the plumbing behind nearly every useful AI agent. It is how a model stops talking and starts doing: checking a calendar, pulling a customer record, running a calculation, booking a flight. If that plumbing is flaky, your agent hallucinates parameters, calls the wrong function, or quietly fails in ways nobody notices until a customer does. GPT-5.5 tightens three things here: it can fire off several tool calls at once, it can be forced to stick to your exact data schema, and it can read back its own errors and try again.

None of that is magic, and some of it is marketing. Below is the working code for each piece, plus the caveats the launch posts skipped, including one compatibility gotcha that bites people who try to use two of these features together.

Prerequisites

  • OpenAI API key with GPT-5.5 access
  • OpenAI Python SDK (a recent version; 1.40 or later is a safe floor, though OpenAI does not pin an exact minimum for GPT-5.5)
  • Python 3.10+

Step-by-Step Framework

Step 1: Basic Function Calling

# gpt55_function_calling.py
from openai import OpenAI
import json

client = OpenAI()

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. 'London'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "default": "celsius"
                    }
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_flights",
            "description": "Search for flights between two cities",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "date": {"type": "string", "format": "date"},
                    "passengers": {"type": "integer", "minimum": 1, "maximum": 9}
                },
                "required": ["origin", "destination", "date"]
            }
        }
    }
]

# Call with tools
response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{
        "role": "user",
        "content": "What's the weather in Tokyo and are there flights from London to Tokyo on August 15th?"
    }],
    tools=tools,
    tool_choice="auto"
)

# Handle tool calls
for tool_call in response.choices[0].message.tool_calls or []:
    function_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)
    print(f"Called: {function_name}({arguments})")

The model does not run your functions. It tells you which ones to run and with what arguments. You do the running, then hand the results back. The pattern in OpenAI's function calling guide has not changed for GPT-5.5; what has changed is how reliably the model picks the right tool and fills in the right fields.
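The hand-off described above can be sketched as a small dispatch table. This is an illustration under stated assumptions rather than OpenAI's prescribed pattern: `TOOL_REGISTRY`, `run_tool_calls`, and the stub `get_weather` and `search_flights` implementations are hypothetical names, and the `SimpleNamespace` objects only mimic the `id`, `function.name`, and `function.arguments` fields that the handler loop above reads from the SDK response.

```python
import json
from types import SimpleNamespace

# Hypothetical stand-ins for your real implementations
def get_weather(location, units="celsius"):
    return {"location": location, "units": units, "temp": 21}

def search_flights(origin, destination, date, passengers=1):
    return {"route": f"{origin}-{destination}", "date": date,
            "passengers": passengers, "flights": []}

# Map the tool names the model may request to the code that serves them
TOOL_REGISTRY = {"get_weather": get_weather, "search_flights": search_flights}

def run_tool_calls(tool_calls):
    """Run each requested tool and build the 'tool' role messages to send back."""
    messages = []
    for call in tool_calls:
        func = TOOL_REGISTRY.get(call.function.name)
        if func is None:
            result = {"error": f"Unknown tool: {call.function.name}"}
        else:
            result = func(**json.loads(call.function.arguments))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # ties the result to the request
            "content": json.dumps(result),
        })
    return messages

# Fake tool calls shaped like the SDK objects, so the sketch runs offline
fake_calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="get_weather", arguments='{"location": "Tokyo"}')),
    SimpleNamespace(id="call_2", function=SimpleNamespace(
        name="search_flights",
        arguments='{"origin": "London", "destination": "Tokyo", "date": "2026-08-15"}')),
]
tool_messages = run_tool_calls(fake_calls)
```

A registry keeps the mapping from tool name to implementation in one place, so adding a tool means adding one schema and one dictionary entry instead of another elif branch.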

Step 2: Parallel Execution

Ask GPT-5.5 a question that needs two unrelated lookups and it will request both tool calls in one turn. That is parallel function calling, on by default in recent OpenAI models, and it saves you a round trip:

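# parallel_execution.py
import asyncio
import json

# Continues from Step 1: assumes client and response exist, that `messages` holds
# the list sent in that request, and that get_weather and search_flights are your
# own async implementations.

async def execute_tool(tool_call):
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)

    if name == "get_weather":
        return await get_weather(**args)
    elif name == "search_flights":
        return await search_flights(**args)
    # ... more tools
    return {"error": f"Unknown tool: {name}"}

async def run_tool_calls(tool_calls):
    # Execute all tool calls concurrently; results come back in call order
    return await asyncio.gather(*[execute_tool(tc) for tc in tool_calls])

assistant_message = response.choices[0].message
tool_calls = assistant_message.tool_calls or []
results = asyncio.run(run_tool_calls(tool_calls))

# The assistant message that requested the calls must precede the tool results
messages.append(assistant_message)

# Send results back
for tool_call, result in zip(tool_calls, results):
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result)
    })

# Get final response
final = client.chat.completions.create(
    model="gpt-5.5",
    messages=messages
)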

The model returns the calls; asyncio.gather runs your implementations at the same time. For a weather check and a flight search that touch different APIs, that is the difference between two sequential waits and one.
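To make the "two sequential waits versus one" point concrete, here is a small offline timing sketch. The two tools are hypothetical stubs in which `asyncio.sleep(0.2)` stands in for network latency; real timings depend entirely on the APIs you call.

```python
import asyncio
import time

# Hypothetical async tools; the sleeps stand in for network latency
async def get_weather(location):
    await asyncio.sleep(0.2)
    return {"location": location, "temp": 21}

async def search_flights(origin, destination):
    await asyncio.sleep(0.2)
    return {"route": f"{origin}-{destination}", "flights": []}

async def run_sequential():
    # Each await finishes before the next call starts
    weather = await get_weather("Tokyo")
    flights = await search_flights("London", "Tokyo")
    return [weather, flights]

async def run_concurrent():
    # gather starts both coroutines and returns results in argument order
    return await asyncio.gather(
        get_weather("Tokyo"),
        search_flights("London", "Tokyo"),
    )

def timed(coro_factory):
    """Run a coroutine function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_factory())
    return result, time.perf_counter() - start

_, sequential_s = timed(run_sequential)
results, concurrent_s = timed(run_concurrent)
print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

The saving only appears when the tools are genuinely async and I/O-bound. A blocking call inside an async function still holds up the event loop, so synchronous clients need a thread (for example `asyncio.to_thread`) to benefit.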

Step 3: Strict Schema Mode

Setting strict: True forces the model's tool calls to match your schema exactly. Per OpenAI's docs, it needs additionalProperties: False and every property marked required, and in return you stop getting calls with invented fields or wrong types:

# strict_mode.py
response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "Get weather in Paris"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "include_forecast": {"type": "boolean"}
                },
                "required": ["location", "include_forecast"],  # Strict mode: list every property
                "additionalProperties": False  # Reject extra fields
            },
            "strict": True  # Enforce exact schema adherence
        }
    }],
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)

In production this is the setting that stops a malformed argument from crashing your downstream code. Turn it on.
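Because strict mode rejects schemas that break its rules, it can help to check tool definitions before sending them. The helper below is a hypothetical sketch, not part of the OpenAI SDK: `strict_mode_problems` checks only the two top-level rules quoted above (no nested objects). The `strict_ready` schema also assumes the approach OpenAI's docs describe for optional fields, which is to list them as required and allow null as a value.

```python
def strict_mode_problems(parameters: dict) -> list[str]:
    """List reasons a top-level parameters schema misses the strict-mode rules."""
    problems = []
    if parameters.get("additionalProperties") is not False:
        problems.append("additionalProperties must be False")
    properties = set(parameters.get("properties", {}))
    required = set(parameters.get("required", []))
    missing = sorted(properties - required)
    if missing:
        problems.append(f"not listed in required: {', '.join(missing)}")
    return problems

# A schema that leaves an optional field out of "required"
loose = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "include_forecast": {"type": "boolean"},
    },
    "required": ["location"],
    "additionalProperties": False,
}

# The same schema with the optional field made required but nullable
strict_ready = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "include_forecast": {"type": ["boolean", "null"]},
    },
    "required": ["location", "include_forecast"],
    "additionalProperties": False,
}

print(strict_mode_problems(loose))
print(strict_mode_problems(strict_ready))
```

Running a check like this in a unit test catches schema drift before it turns into a rejected request in production.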

Step 4: Build an Agent Loop

A single tool call is rarely the whole job. An agent loop keeps the conversation going: the model calls a tool, you return the result, the model decides what to do next, and so on until it has an answer or hits a ceiling you set.

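# agent_loop.py
import json
from openai import OpenAI

# Assumes get_weather, search_flights, and the *_tool schema dicts are defined elsewhere.

class GPT55Agent: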
    def __init__(self):
        self.client = OpenAI()
        self.tools = self._register_tools()
        self.messages = []

    def _register_tools(self):
        return [get_weather_tool, search_flights_tool, calculate_tool]

    def run(self, user_input: str, max_iterations=10):
        self.messages.append({"role": "user", "content": user_input})

        for _ in range(max_iterations):
            response = self.client.chat.completions.create(
                model="gpt-5.5",
                messages=self.messages,
                tools=self.tools
            )

            message = response.choices[0].message

            # If no tool calls, return the response
            if not message.tool_calls:
                return message.content

            # Add assistant message with tool calls
            self.messages.append(message)

            # Execute tool calls
            for tool_call in message.tool_calls:
                result = self.execute_tool(tool_call)
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                })

        return "Max iterations reached"

    def execute_tool(self, tool_call):
        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)

        try:
            if name == "get_weather":
                return get_weather(**args)
            elif name == "search_flights":
                return search_flights(**args)
            elif name == "calculate":
                return {"result": eval(args["expression"])}
            else:
                return {"error": f"Unknown tool: {name}"}
        except Exception as e:
            return {"error": str(e)}  # Model will see error and retry

Two things to notice. The max_iterations cap is your circuit breaker: without it, a confused agent can loop indefinitely and burn through tokens. And the try/except returns the error text back to the model instead of swallowing it. OpenAI documents this tool-result-handling pattern, and GPT-5.5 generally reads the error and adjusts on the next pass. OpenAI describes recovery as improved in this release; treat that as a reasonable claim rather than a benchmarked one, and test it against your own tools.
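Both behaviours can be exercised without an API key. In the sketch below, `model_step` is a hypothetical stand-in for the API call: it receives the tool results so far and returns either a tool request or a final answer. The scripted "models" are assumptions made purely for the demonstration and say nothing about how GPT-5.5 itself behaves.

```python
def divide(a, b):
    return {"result": a / b}

TOOLS = {"divide": divide}

def call_tool(name, args):
    """Run a tool; turn any failure into a short error payload for the model."""
    try:
        return TOOLS[name](**args)
    except Exception as exc:
        return {"error": f"{type(exc).__name__}: {exc}"}

def run_loop(model_step, max_iterations=5):
    """Alternate between the model and the tools until an answer or the cap."""
    history = []
    for _ in range(max_iterations):
        step = model_step(history)
        if step[0] == "final":
            return step[1], history
        _, name, args = step
        history.append(call_tool(name, args))
    return "Max iterations reached", history

def scripted_model(history):
    # First attempt divides by zero; after seeing the error it corrects itself
    if not history:
        return ("call", "divide", {"a": 10, "b": 0})
    if "error" in history[-1]:
        return ("call", "divide", {"a": 10, "b": 2})
    return ("final", f"Answer: {history[-1]['result']}")

def stuck_model(history):
    # Never recovers, so only the iteration cap stops it
    return ("call", "divide", {"a": 1, "b": 0})

answer, history = run_loop(scripted_model)
capped, capped_history = run_loop(stuck_model, max_iterations=3)
print(answer, "|", capped)
```

A harness like this is also a cheap way to regression-test your own tool wrappers, since it checks the error payloads and the cap without spending tokens.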

One caution on the example: eval(args["expression"]) runs arbitrary code from whatever the model passes in. It is fine for a demo. Do not ship it. Use a real expression parser if you need a calculator tool.

Step 5: Structured Outputs

When you want the final answer back as typed data rather than prose, response_format with a JSON schema gives you output that conforms to the schema you define. OpenAI's Structured Outputs feature pairs nicely with Pydantic:

# structured_outputs.py
from pydantic import BaseModel

class FlightSearchResult(BaseModel):
    flights: list
    cheapest_price: float
    fastest_duration: int
    recommendations: list[str]

response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{
        "role": "user",
        "content": "Search flights NYC to LA tomorrow and summarise"
    }],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "flight_result",
            "schema": FlightSearchResult.model_json_schema()
        }
    }
)

result = FlightSearchResult.model_validate_json(
    response.choices[0].message.content
)
print(result.cheapest_price)  # Typed access

One constraint the launch coverage glossed over: OpenAI's docs state Structured Outputs via response_format is not compatible with parallel function calls. So you cannot have the model fire several tools at once and also force the final message into a JSON schema in the same call. Plan your agent so the parallel tool phase and the structured-output phase happen in separate steps.
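The two-phase plan might look like the sketch below, which only builds the keyword arguments for each request so it runs without network access. `tool_phase_kwargs`, `summary_phase_kwargs`, and the sample schema are hypothetical; the parameter names match the calls used earlier in this guide.

```python
def tool_phase_kwargs(messages, tools):
    """Phase 1: tools enabled and no response_format, so parallel calls are allowed."""
    return {
        "model": "gpt-5.5",
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",
    }

def summary_phase_kwargs(messages, schema_name, schema):
    """Phase 2: no tools; the final answer is constrained to the JSON schema."""
    return {
        "model": "gpt-5.5",
        "messages": messages,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": schema_name, "schema": schema},
        },
    }

# Illustrative inputs only
messages = [{"role": "user", "content": "Search flights NYC to LA tomorrow and summarise"}]
tools = [{"type": "function", "function": {"name": "search_flights", "parameters": {"type": "object"}}}]
schema = {"type": "object", "properties": {"cheapest_price": {"type": "number"}}}

phase_one = tool_phase_kwargs(messages, tools)
phase_two = summary_phase_kwargs(messages, "flight_result", schema)
```

In use, you would pass `**phase_one` to `client.chat.completions.create`, run the requested tools and append their results to `messages`, then send `**phase_two` once the tool work is finished.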

Do/Don't

Do | Don't
Define clear descriptions for each tool parameter | Leave parameters without descriptions
Handle tool errors and return them to the model | Swallow errors silently
Use strict mode for production | Skip strict mode and get schema violations
Set max_iterations to prevent infinite loops | Let agents run without iteration limits
Validate all tool outputs before sending to model | Pass raw exception traces directly
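The last row of the table can be sketched as two small helpers. Both names and the 4,000-character budget are hypothetical choices for illustration: `prepare_tool_output` makes sure a result is serialisable and bounded in size before it goes back to the model, and `describe_error` builds a short payload with no stack trace.

```python
import json

MAX_TOOL_OUTPUT_CHARS = 4000  # hypothetical budget; tune for your context limits

def prepare_tool_output(result, max_chars=MAX_TOOL_OUTPUT_CHARS):
    """Serialise a tool result and cap its size before sending it to the model."""
    try:
        # default=str converts dates, Decimals, and similar values to text
        text = json.dumps(result, default=str)
    except (TypeError, ValueError) as exc:
        return json.dumps({"error": f"Unserialisable tool output: {type(exc).__name__}"})
    if len(text) > max_chars:
        return json.dumps({"truncated": True, "preview": text[:max_chars]})
    return text

def describe_error(exc):
    """Short, trace-free error payload the model can act on."""
    return json.dumps({"error": type(exc).__name__, "message": str(exc)[:200]})

print(prepare_tool_output({"temp": 21}))
print(describe_error(ValueError("date must be YYYY-MM-DD")))
```

Keeping error payloads short and free of internal paths gives the model enough to correct its arguments without leaking implementation details into the conversation.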

Cost Comparison

Feature | GPT-5.5 | GPT-5.5 Instant | Claude Sonnet 4.6
Function calling | Native parallel | Basic | Good
Cost per 1M input | $5.00 | see note | $3.00
Context | ~1M (400K in Codex) | ~1M | 1M
Strict mode | Yes | No | See note

A few corrections to numbers that get repeated without checking. GPT-5.5 Instant is real and became the default ChatGPT model on 5 May 2026, but the often-quoted $0.50 input price does not hold up: llm-stats lists Instant at the same $5/$30 as GPT-5.5, so treat any cheaper figure as unconfirmed. On context, the 400K number is the limit when GPT-5.5 runs inside Codex; the general API window is closer to 1M (llm-stats). On strict mode, comparisons often mark Claude Sonnet 4.6 as "No", which is an oversimplification: Anthropic's tool use supports structured, schema-style output even if it does not carry the exact strict: true flag, so don't read that column as a hard capability gap. Sonnet 4.6 is confirmed at $3 input per million with a 1M-token context.
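To turn the per-million prices into a per-request figure, the arithmetic is simple enough to sketch. The prices below are the $5/$30 quoted in this article, and the token counts are made-up illustrations, not measurements. The loop example assumes each iteration resends the conversation so far, which is why the input count grows from turn to turn.

```python
INPUT_PRICE_PER_M = 5.00    # USD per 1M input tokens, as quoted in this article
OUTPUT_PRICE_PER_M = 30.00  # USD per 1M output tokens, as quoted in this article

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the quoted per-million prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

def estimate_loop_cost(turns) -> float:
    """turns is a list of (input_tokens, output_tokens), one per loop iteration."""
    return sum(estimate_cost(i, o) for i, o in turns)

# Illustrative token counts only
single = estimate_cost(12_000, 1_500)
loop = estimate_loop_cost([(2_000, 200), (2_600, 200), (3_200, 400)])
print(f"single request: ${single:.3f}, three-turn loop: ${loop:.3f}")
```

Pulling real token counts from the `usage` field on each response and feeding them through a function like this gives a per-workflow cost to set against the time baseline.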

Conclusion

GPT-5.5's function calling is genuinely strong, and for tool-heavy agent work it is a sensible default. Parallel calls cut round trips, strict mode keeps malformed arguments out of your code, and returning errors into the loop lets the model recover instead of stalling. Whether it is the single most capable model for function calling is the kind of claim every vendor makes at launch; Anthropic's Sonnet and Opus models rate highly on the same workloads, so benchmark it against your own use case before committing. The $5/$30 pricing earns its keep when tool-calling accuracy is what your users feel. Turn on strict mode, always hand errors back to the model, cap your iterations, and remember that structured outputs and parallel calls do not mix in a single request.

What to do next

  1. Pick one repeated workflow with a clear owner and weekly volume.
  2. Automate the preparation step first, then keep human approval for important actions.
  3. Measure time saved, errors reduced, and response speed for four weeks.

AI Kick Start is an Illawarra-based AI studio in Figtree, helping businesses across Wollongong, Shellharbour and Kiama and right across Australia put AI to work.