TL;DR

Migrating from GPT-4 to Claude takes about a day for a single workflow. Steps:

  1. Wrap your model call in a thin adapter so the rest of your code stays unchanged.
  2. Translate your system prompt — Claude responds best to direct, instruction-style prompts.
  3. Map tool definitions to Claude’s tools array (similar shape, different field names).
  4. Replace function_call with tool_use blocks in the response handler.
  5. Run your eval suite on both providers before shifting traffic. Aim for ≥ parity on your top 20 cases.

Why teams switch

Claude (Anthropic) is often picked over GPT-4 for: longer context windows, stronger long-form writing, more conservative refusal behavior, and lower per-token cost on similar capability tiers. GPT-4 still leads on certain code-completion benchmarks. Test, don’t assume.

Step 1: the adapter pattern

Don’t sprinkle two SDKs through your codebase. Define one interface:

from typing import Protocol

class ChatResult: ...  # small normalized result type: .text, .tool_calls

class LLMClient(Protocol):
    def chat(self, messages, tools=None, max_tokens=1024) -> ChatResult: ...

class OpenAIClient(LLMClient): ...
class AnthropicClient(LLMClient): ...

Make the input/output types identical. Internally, each implementation translates to/from its provider’s wire format. Now switching providers is a one-line change — and you can A/B test by routing traffic through both adapters.
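One way to cash in on the adapter (the registry and function names here are illustrative, not from any SDK): keep a dict of clients and route by traffic weight, so a provider switch — or an A/B split — is a config change rather than a code change.

```python
import random

# Hypothetical registry: name -> LLMClient instance,
# e.g. {"openai": OpenAIClient(), "anthropic": AnthropicClient()}
CLIENTS = {}

def route(weights: dict):
    """Pick a provider for this request according to traffic weights."""
    names = list(weights)
    chosen = random.choices(names, weights=[weights[n] for n in names])[0]
    return CLIENTS[chosen]
```

Setting a weight to 0.0 is the "switch fully to the other provider" one-liner; anything in between is an A/B test.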

Step 2: prompt translation

GPT-4 prompts often start with “You are a helpful assistant…”. Claude responds better to direct, instruction-style prompts: state the task, the constraints, and the expected output format up front, and drop the persona boilerplate.
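The mechanics differ too: OpenAI carries the system prompt as a message with role "system", while the Anthropic Messages API takes it as a separate top-level system parameter. A minimal sketch of the split (helper name is ours):

```python
def to_anthropic(messages: list) -> tuple:
    """Split OpenAI-style messages into Claude's (system, messages) shape.

    OpenAI keeps the system prompt inside the messages list; Anthropic
    takes it as a top-level `system` parameter instead.
    """
    system = " ".join(m["content"] for m in messages if m["role"] == "system")
    rest = [m for m in messages if m["role"] != "system"]
    return system, rest
```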

Step 3: tool-use schema

OpenAI tool format:

{ "type": "function", "function": { "name": "x", "parameters": {...} } }

Claude tool format:

{ "name": "x", "description": "...", "input_schema": {...} }

Two practical differences:

  1. Claude expects a non-empty description on every tool — the model uses it to decide when and how to call the tool.
  2. Claude’s response wraps tool calls in tool_use content blocks with explicit IDs you must echo back in tool_result blocks.
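The field renames are mechanical enough to automate. A sketch (function name is ours) that maps an OpenAI tool definition to Claude's shape:

```python
def to_claude_tool(openai_tool: dict) -> dict:
    """Map {"type": "function", "function": {...}} to Claude's flat shape."""
    fn = openai_tool["function"]
    return {
        "name": fn["name"],
        # Claude leans on the description when choosing a tool,
        # so don't leave it empty.
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],
    }
```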

Step 4: response parsing

GPT-4 returns either content or function_call. Claude returns an array of content blocks, each typed text, tool_use, or thinking. Iterate the array and dispatch by type.
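Concretely, assuming the response has already been parsed into a plain dict, the dispatch loop looks like this (the return shape is our convention, not the SDK's):

```python
def handle_response(response: dict) -> tuple:
    """Walk Claude's content blocks; collect text and pending tool calls."""
    text_parts, tool_calls = [], []
    for block in response.get("content", []):
        if block["type"] == "text":
            text_parts.append(block["text"])
        elif block["type"] == "tool_use":
            # Echo block["id"] back in the matching tool_result block.
            tool_calls.append(
                {"id": block["id"], "name": block["name"], "input": block["input"]}
            )
        # "thinking" blocks can be logged or ignored.
    return "".join(text_parts), tool_calls
```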

Step 5: eval before flipping traffic

Before pointing production at Claude, run your eval suite against both providers and score each output with the same rubric.

Don’t ship below parity. Anthropic’s evals library and tools like Promptfoo make this two hours of work.
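The parity gate itself is a few lines. A minimal sketch, assuming each eval case carries a prompt and a scoring function — all names here are illustrative, not from any evals library:

```python
def parity_check(client_a, client_b, cases) -> bool:
    """True if client_b scores at least as well as client_a across all cases."""
    score_a = sum(case["score"](client_a(case["prompt"])) for case in cases)
    score_b = sum(case["score"](client_b(case["prompt"])) for case in cases)
    return score_b >= score_a
```

Run it with your top 20 cases from the TL;DR; only shift traffic when it returns True.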

April 24, 2026 · Musketeers Tech