TL;DR
Migrating from GPT-4 to Claude takes about a day for a single workflow. Steps:
- Wrap your model call in a thin adapter so the rest of your code stays unchanged.
- Translate your system prompt — Claude responds best to direct, instruction-style prompts.
- Map tool definitions to Claude’s `tools` array (similar shape, different field names).
- Replace `function_call` with `tool_use` blocks in the response handler.
- Run your eval suite on both providers before shifting traffic. Aim for ≥ parity on your top 20 cases.
Why teams switch
Claude (Anthropic) is often picked over GPT-4 for: longer context windows, stronger long-form writing, more conservative refusal behavior, and lower per-token cost on similar capability tiers. GPT-4 still leads on certain code-completion benchmarks. Test, don’t assume.
Step 1: the adapter pattern
Don’t sprinkle two SDKs through your codebase. Define one interface:
from typing import Protocol

class LLMClient(Protocol):
    def chat(self, messages, tools=None, max_tokens=1024) -> "ChatResult": ...  # ChatResult is your shared result type

class OpenAIClient(LLMClient): ...      # translates to/from the OpenAI wire format
class AnthropicClient(LLMClient): ...   # translates to/from the Anthropic wire format
Make the input/output types identical. Internally, each implementation translates to/from its provider’s wire format. Now switching providers is a one-line change — and you can A/B test by routing traffic through both adapters.
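A hedged sketch of the kind of translation the Anthropic adapter does internally (the helper name `to_anthropic` is mine, not from either SDK): Anthropic's Messages API takes the system prompt as a top-level `system` parameter rather than a message with role `"system"`, so the adapter hoists it out of the message list.

```python
def to_anthropic(messages: list[dict]) -> tuple[str, list[dict]]:
    """Split OpenAI-style messages into Anthropic's (system, messages) pair."""
    system = ""
    chat = []
    for m in messages:
        if m["role"] == "system":
            system = m["content"]  # hoist into the separate top-level parameter
        else:
            chat.append({"role": m["role"], "content": m["content"]})
    return system, chat
```

The rest of your code keeps passing one uniform message list; only the adapter knows the wire format differs.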
Step 2: prompt translation
GPT-4 prompts often start with “You are a helpful assistant…”. Claude responds better to:
- A clear role statement: `"You are a senior product manager preparing release notes."`
- Explicit output format requirements at the top.
- Examples in `<example>...</example>` tags rather than free text. Claude follows XML-bracketed structure very reliably.
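Putting those three points together, a translated system prompt might look like this (the content is illustrative, not from the original workflow):

```python
# Illustrative Claude-style system prompt: role first, format
# requirements up top, examples in XML tags.
SYSTEM_PROMPT = """\
You are a senior product manager preparing release notes.

Output requirements:
- Return valid JSON with keys "title" and "highlights".
- Keep each highlight under 20 words.

<example>
Input: shipped dark mode; fixed login crash
Output: {"title": "March release", "highlights": ["Dark mode is here", "Login crash fixed"]}
</example>
"""
```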
Step 3: tool-use schema
OpenAI tool format:
{ "type": "function", "function": { "name": "x", "parameters": {...} } }
Claude tool format:
{ "name": "x", "description": "...", "input_schema": {...} }
Two practical differences:
- Claude requires a non-empty `description` on every tool; it uses the description for routing.
- Claude’s response wraps tool calls in `tool_use` content blocks with explicit IDs you must echo back in `tool_result` blocks.
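The mapping between the two formats shown above can be sketched as one function (the function name is mine; the field names come from the two schemas):

```python
def to_claude_tool(openai_tool: dict) -> dict:
    """Convert an OpenAI-style tool definition to Claude's shape."""
    fn = openai_tool["function"]
    description = fn.get("description", "")
    if not description:
        # Fail loudly rather than register a tool Claude can't route to.
        raise ValueError(f"tool {fn['name']!r} needs a description for Claude")
    return {
        "name": fn["name"],
        "description": description,
        "input_schema": fn["parameters"],  # JSON Schema carries over as-is
    }
```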
Step 4: response parsing
GPT-4 returns either `content` or `function_call`. Claude returns an array of content blocks, each typed `text`, `tool_use`, or `thinking`. Iterate the array and dispatch by type.
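A minimal dispatch sketch, assuming blocks arrive as dicts with a `type` key (the handler shape is a placeholder for whatever your app needs):

```python
def handle_response(blocks: list[dict]) -> tuple[str, list[dict]]:
    """Walk Claude's content blocks; collect text and pending tool calls."""
    text_parts, tool_calls = [], []
    for block in blocks:
        if block["type"] == "text":
            text_parts.append(block["text"])
        elif block["type"] == "tool_use":
            # Keep the id: it must be echoed back in the tool_result block.
            tool_calls.append(
                {"id": block["id"], "name": block["name"], "input": block["input"]}
            )
        # "thinking" blocks are internal reasoning; safe to skip here.
    return "".join(text_parts), tool_calls
```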
Step 5: eval before flipping traffic
Before pointing production at Claude, run your eval suite on both. Score each output on:
- Correctness (matches expected facts)
- Format compliance (JSON parses, hits the schema)
- Tone (matches brand voice)
- Cost & latency
Don’t ship until you have at least parity. Anthropic’s evals library and tools like Promptfoo make this about two hours of work.
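The comparison above can be sketched as a tiny harness (the `run_eval` function and case format are my assumptions; `client` is the adapter interface from Step 1):

```python
def run_eval(client, cases: list[dict]) -> float:
    """Score a client against eval cases; return the pass rate."""
    passed = 0
    for case in cases:
        result = client.chat(case["messages"])
        # Each check is a predicate over the output: correctness,
        # format compliance, tone, or whatever your suite scores.
        if all(check(result) for check in case["checks"]):
            passed += 1
    return passed / len(cases)
```

Run it once per adapter and compare the two scores; flip traffic only if the Claude score is at least as high on your top cases.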