Claude Agent SDK

Fit analysis for productionising the platform's agentic architecture. Covers current vs proposed architecture, tool migration to MCP, multi-provider support via LiteLLM, phased implementation, and risk assessment.

Overview

This is an analysis of how the Claude Agent SDK fits into the Condelo RAG platform — what it would replace, what it would improve, and how to implement it. The SDK provides the same tools, agent loop, and context management that power Claude Code, programmable in TypeScript.

Condelo RAG has a mature, custom-built agentic architecture with 6 distinct LLM interfaces, a hand-rolled tool loop, 13+ tools, multi-provider LLM abstraction, and streaming SSE delivery. As the platform productionises, the SDK could replace, simplify, or improve the agentic parts of this system.

Key Concepts

  • Built-in agent loop — No manual while (round < MAX_ROUNDS). The SDK handles tool dispatch, result collection, and continuation automatically.
  • Custom tools via MCP — Domain-specific tools are defined as in-process MCP servers using createSdkMcpServer() with Zod schemas. Existing tool handlers slot in directly.
  • Sessions — Persistent conversation state with continue, resume, and fork. Maps well to exploration threads.
  • Subagents — Named agents with scoped tool sets. Replaces the current manual sub-agent spawning in analyze_document.
  • Hooks — Lifecycle events (PreToolUse, PostToolUse, Stop) for logging, metrics, audit, and streaming.
  • Cost governance — maxTurns and maxBudgetUsd as built-in guardrails. Per-session token usage and USD cost tracking.
  • LiteLLM proxy pattern — By pointing ANTHROPIC_BASE_URL at a LiteLLM proxy, the SDK's Anthropic-format requests are translated to any backend (OpenAI, Mistral, Ollama, etc.).

Current Architecture

The platform has 6 distinct LLM interfaces, each with different loop complexity and tool requirements:

| Interface | File | Loop | Tools | Output |
| --- | --- | --- | --- | --- |
| Chat | apps/api/src/services/chat.ts | 5-round tool loop, streaming | 13 tools | Streamed markdown + surface blocks |
| Research Agent | apps/data-plane/src/services/research-agent.ts | 5-round tool loop, structured | search, query | InferenceOutputItem[] via zodResponseFormat |
| Exploration Converse | apps/data-plane/src/services/exploration-converse-agent.ts | Multi-turn within cluster | search, query, generate_surface | Messages + suggested paths |
| Exploration Prep | apps/data-plane/src/services/exploration-prep-agent.ts | Single LLM call | None | Cluster definitions |
| Story Pipeline | apps/data-plane/src/services/story-*.ts (5 agents) | Single call each | None | Pyramid, storyline, surfaces |
| Sub-Agent | apps/data-plane/src/services/sub-agent.ts | 3-round tool loop | search_within_document | Summary + chunks |

Current Flow

User Request
    │
    ▼
┌─────────────────────────────┐
│  Hono API / SSE Endpoint    │
│  (apps/api)                 │
└─────────┬───────────────────┘
          │
          ▼
┌─────────────────────────────┐
│  Hand-rolled Tool Loop      │
│  while (round < MAX_ROUNDS) │
│    ├─ LLM call (OpenAI SDK) │
│    ├─ Parse tool_calls      │
│    ├─ executeTool()         │
│    └─ Append results        │
└─────────┬───────────────────┘
          │
          ▼
┌─────────────────────────────┐
│  Tool Registry              │
│  registerTool() / execute() │
│  13 tools, availability     │
│  predicates, callbacks      │
└─────────────────────────────┘

Proposed Flow (Agent SDK)

User Request
    │
    ▼
┌─────────────────────────────┐
│  Hono API / SSE Endpoint    │
│  (apps/api)                 │
└─────────┬───────────────────┘
          │
          ▼
┌─────────────────────────────┐
│  Claude Agent SDK           │
│  query({                    │
│    prompt,                  │
│    options: {               │
│      mcpServers,            │
│      maxTurns,              │
│      maxBudgetUsd,          │
│      hooks                  │
│    }                        │
│  })                         │
└─────────┬───────────────────┘
          │
          ▼
┌─────────────────────────────┐
│  MCP Tool Server            │
│  createSdkMcpServer({       │
│    tools: [                 │
│      search_documents,      │
│      query_documents,       │
│      generate_surface, ...  │
│    ]                        │
│  })                         │
└─────────────────────────────┘

Fit Assessment

Chat — Good Fit

Current: ~200 lines of manual tool loop, dispatch, context budgeting, streaming assembly.

With SDK: The agent loop replaces while (round < MAX_TOOL_ROUNDS) entirely. Custom tools register as MCP tools. Streaming via async generators pipes into Hono's streamSSE.

What improves: Eliminates manual loop/dispatch code. Gains built-in streaming, cost tracking, maxTurns/maxBudgetUsd guardrails, and hooks for observability. Subagent support means analyze_document becomes native rather than hand-rolled.

Caveats: Dynamic system prompt (DB schema, feed context) needs rethinking — SDK auto-generates tool descriptions from MCP definitions. Surface block handling (nudging toward generate_surface) moves into hooks or post-processing.

Exploration Converse — Good Fit

The most "agentic" interface. Multi-turn conversation within inference clusters, with tools for search, query, and surface generation.

With SDK: Session management (continue/resume/fork) maps well to exploration threads. Subagents handle cross-cluster research. Progressive disclosure becomes natural.

Caveats: Rich exploration state (clusters, connections, deltas) stays in the DB — SDK sessions don't replace this. Suggested paths and research tasks need custom post-processing.

Research Agents — Moderate Fit

Current: BullMQ worker with 5-round tool loop and structured output via zodResponseFormat.

With SDK: Replaces tool loop. Cost tracking per agent run is a strong operational benefit. Worker wrapper stays — just swap the inner engine.

Caveats: Verify SDK's output_format handles complex nested schemas (InferenceOutputItem with evidence arrays, surfaceHints). Feed scoping needs to pass context into MCP tool implementations.

Story Pipeline — Poor Fit

5 sequential agents making single structured LLM calls with no tools. Data passed in prompts. The SDK's tool loop and session management add complexity without benefit. Keep as-is.

Exploration Prep — Poor Fit

Single structured LLM call to cluster inferences. No tools. Keep as-is.

Sub-Agent — Poor Fit

3-round tool loop with one tool. Too simple to justify SDK overhead. Keep as-is.

Tool Migration

Existing Tools → MCP

All 13 domain-specific tools would be implemented as custom MCP tools:

import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";
import { searchDocuments } from "./tools/handlers/search-documents";

const condeloTools = createSdkMcpServer({
  name: "condelo",
  tools: [
    tool("search_documents", "Semantic/keyword search across documents", {
      query: z.string(),
      feed_id: z.string().optional(),
      search_type: z.enum(["semantic", "keyword", "hybrid"]).optional(),
    }, async (args) => {
      // Existing handler logic from tools/handlers/search-documents.ts
      const results = await searchDocuments(args);
      return { content: [{ type: "text", text: JSON.stringify(results) }] };
    }),
    // ... query_documents, generate_surface, etc.
  ],
});

SDK Built-in Replacements

| Current Tool | SDK Equivalent | Notes |
| --- | --- | --- |
| web_search | WebSearch | Direct replacement |
| fetch_web_page | WebFetch | Direct replacement |
| grep_documents | Grep | Only if documents are on the filesystem; the current implementation queries the DB |

New Tools Enabled by SDK

| Tool | Purpose | Why |
| --- | --- | --- |
| create_inference | Agent creates inferences directly | Currently inferences are extracted from structured output. With SDK tools, the agent could create them iteratively as it discovers insights. |
| compare_documents | Cross-document analysis | Currently requires multiple search calls. A dedicated tool would be more efficient. |
| update_exploration_state | Agent updates exploration clusters/connections | Enables the agent to actively manage exploration topology. |
| schedule_research | Queue async research tasks | The agent identifies questions needing deeper investigation and schedules them. |
| memory_recall | Retrieve context from previous conversations | SDK sessions handle this partially, but explicit recall of past findings would enhance continuity. |

Multi-Provider Support

Model lock-in is less of a trade-off than initially assumed. While the SDK natively targets Claude models via the Anthropic API, it supports model flexibility through multiple mechanisms.

Native Provider Support

  • Anthropic API (direct) — Claude Opus, Sonnet, Haiku
  • AWS Bedrock — set CLAUDE_CODE_USE_BEDROCK=1 + AWS credentials
  • Google Vertex AI — set CLAUDE_CODE_USE_VERTEX=1 + GCP credentials
  • Azure AI Foundry — set CLAUDE_CODE_USE_FOUNDRY=1 + Azure credentials

LiteLLM Proxy Pattern (Any Model)

By pointing ANTHROPIC_BASE_URL at a LiteLLM proxy, the SDK's Anthropic-format requests are translated to any backend:

  • OpenAI (GPT-4o, o1, etc.)
  • Mistral
  • Ollama (fully local, air-gapped)
  • Any OpenAI-compatible endpoint

The current multi-provider flexibility is preserved — the mechanism changes from direct OpenAI SDK calls to LiteLLM-proxied Anthropic-format calls. Agent code stays identical regardless of backend model.

Deployment Options

| Scenario | Backend | Data Leaves Infra? |
| --- | --- | --- |
| Standard | Anthropic API | Yes (to Anthropic) |
| Cloud-regulated | Bedrock / Vertex / Azure | No (stays in cloud contract) |
| Air-gapped / offline | Ollama + LiteLLM | No (fully local) |

What This Means for Condelo

The current multi-provider system (LLM_PROVIDER env var → OpenAI/OpenRouter/Ollama/LM Studio) would be replaced by:

  • Agentic workloads: Agent SDK → ANTHROPIC_BASE_URL pointing at either Anthropic directly or a LiteLLM proxy for other models
  • Non-agentic workloads (embeddings, story pipeline, title generation): Keep raw OpenAI SDK — these don't benefit from the agent loop

One SDK for agentic flows (with model flexibility via proxy), one SDK for simple completions.

Implementation Approach

Phase 1: Proof of Concept (1–2 weeks)

┌─────────────────────────────────────────────┐
│  Standalone script                          │
│  ├─ Register 13 tools as MCP                │
│  ├─ Run chat flow via SDK agent loop        │
│  ├─ Compare: quality, latency, cost         │
│  └─ Validate structured output for schemas  │
└─────────────────────────────────────────────┘
  1. Install @anthropic-ai/claude-agent-sdk
  2. Create standalone script running chat flow via SDK
  3. Register existing tools as MCP tools
  4. Compare quality, latency, cost, tool-use patterns against current implementation
  5. Validate structured output support for inference schemas

Phase 2: Chat Migration (2–3 weeks)

  1. Create apps/api/src/services/chat-agent-sdk.ts alongside existing chat.ts
  2. Implement MCP tool server for all 13 tools
  3. Build SSE adapter (SDK async generator → Hono streamSSE)
  4. Add feature flag: CHAT_ENGINE=agent-sdk|legacy
  5. Migrate system prompt rules into tool descriptions + MCP server metadata
  6. Add hooks for LangSmith tracing and event bus integration
  7. A/B test against existing implementation

Phase 3: Research Agent Migration (2 weeks)

  1. Replace research-agent.ts tool loop with SDK agent
  2. Validate structured output (InferenceOutputItem[]) works with SDK
  3. Keep BullMQ worker wrapper — swap the inner engine
  4. Add cost tracking per agent run

Phase 4: Exploration Migration (2–3 weeks)

  1. Migrate exploration-converse to SDK with session support
  2. Implement subagent pattern for cross-cluster research
  3. Integrate SDK sessions with exploration_messages table
  4. Preserve suggested paths and research task extraction

Phase 5: New Agentic Capabilities (Ongoing)

  1. Add new tools (create_inference, compare_documents, etc.)
  2. Implement subagent patterns for complex analysis
  3. Explore SDK guardrails for production safety
Phase 1          Phase 2          Phase 3          Phase 4          Phase 5
PoC              Chat             Research         Exploration      New Tools
──────────┐  ┌──────────────┐  ┌────────────┐  ┌──────────────┐  ┌──────────
  1-2 wks │  │   2-3 wks    │  │   2 wks    │  │   2-3 wks    │  │ ongoing
  SDK +   │  │  chat-agent- │  │  research- │  │  explore +   │  │ create_
  MCP     │  │  sdk.ts +    │  │  agent.ts  │  │  sessions +  │  │ inference
  tools + │  │  feature     │  │  swap inner│  │  subagents   │  │ compare_
  bench-  │  │  flag +      │  │  engine    │  │              │  │ docs ...
  mark    │  │  A/B test    │  │            │  │              │  │
──────────┘  └──────────────┘  └────────────┘  └──────────────┘  └──────────

Risks & Mitigations

| Risk | Severity | Mitigation |
| --- | --- | --- |
| LiteLLM proxy reliability | Medium | LiteLLM is the bridge to non-Claude models. Test tool-calling fidelity across models; fall back to direct provider SDKs if needed. |
| Cost increase | Medium | Claude API pricing vs OpenAI/OpenRouter. Use maxBudgetUsd guardrails; Haiku for quick tasks, Sonnet for agents; LiteLLM proxy to cheaper models for cost-sensitive workloads. |
| Latency | Medium | SDK adds overhead (session management, tool search), and the LiteLLM proxy adds a hop. Benchmark against the current implementation. |
| Loss of fine-grained control | Medium | The current implementation has custom context budgeting (12K-char truncation), visualisation nudging, and exhaustion tracking. Some of this moves into hooks; some may need workarounds. |
| Tool-calling quality variance | Medium | Models handle tool use with varying quality. Claude excels at complex multi-tool reasoning; weaker models proxied via LiteLLM may make poor tool choices. Test per model and set minimum capability requirements. |
| Structured output compatibility | Medium | Verify the SDK's output_format supports complex Zod schemas before committing. Test with LiteLLM-proxied models too. |
| LangSmith compatibility | Low | SDK hooks can forward events to LangSmith; may need custom integration. |
| Session storage | Low | The SDK persists sessions to the local filesystem. Multi-instance deployment needs shared storage or a custom session backend. |
| Breaking changes | Low | The SDK is new and its API may evolve. Pin versions and abstract behind internal interfaces. |

Recommendation

Adopt the Claude Agent SDK for all agentic workloads. The LiteLLM proxy pattern eliminates the model lock-in concern, making full migration of agentic interfaces viable.

Do Migrate

  • Chat — biggest win. Eliminates ~200 lines of loop/dispatch code. Gains streaming, sessions, cost tracking, and guardrails.
  • Exploration Converse — natural fit for SDK sessions and subagents. The most agentic feature in the platform.
  • Research Agents — moderate win. Simplifies tool loop, adds cost tracking.

Don't Migrate

  • Story Pipeline — not agentic, just structured completions. Keep raw OpenAI SDK.
  • Exploration Prep — single LLM call, no tools. Keep raw OpenAI SDK.
  • Sub-Agent — too simple to justify SDK overhead. Keep as-is.
  • Embeddings — not applicable. Keep OpenAI SDK.
  • Title generation / quick tasks — simple completions. Keep raw OpenAI SDK.

Decision Framework

  1. Do you want a production-grade agent framework or prefer maintaining custom loops? The SDK eliminates roughly 200 lines of hand-rolled orchestration per agent type, but you trade direct control for framework conventions.

  2. Is the LiteLLM proxy acceptable for non-Claude models? It adds a translation layer but preserves model flexibility. Test with your specific models (especially tool-calling fidelity) before committing.

  3. Do you need the operational benefits now? Cost tracking, session management, guardrails, hooks, and subagents are production-grade features you'd otherwise build yourself.

  4. Are you comfortable with Anthropic SDK format as the wire protocol? Even with LiteLLM, the SDK sends Anthropic-format requests. Non-agentic workloads stay on OpenAI SDK, so you'll have two API formats.

  5. Do offline/air-gapped deployments matter? If yes, the Ollama + LiteLLM pattern is a strong advantage — same agent code runs fully local with no external calls.

Start with: A proof-of-concept on the chat interface behind a feature flag. This is the highest-traffic interface with the most complex tool loop — if the SDK works well here, it validates the approach for everything else.

Code Reference

| File | Description |
| --- | --- |
| apps/api/src/services/chat.ts | Current chat tool loop (migration target) |
| apps/api/src/tools/registry.ts | Current tool registry (registerTool, getAvailableTools, executeTool) |
| apps/api/src/tools/types.ts | ToolContext, ToolCallbacks, ToolEntry, ToolResult types |
| apps/data-plane/src/services/research-agent.ts | Research agent execution (Phase 3 target) |
| apps/data-plane/src/services/exploration-converse-agent.ts | Exploration agent (Phase 4 target) |
| apps/data-plane/src/services/sub-agent.ts | Sub-agent (keep as-is) |
| apps/data-plane/src/services/story-*.ts | Story pipeline agents (keep as-is) |

Relationships

  • Tool Registry — All 13 tools would migrate to MCP tool definitions
  • Agents — Research agents are the Phase 3 migration target
  • Surfaces & Experiences — generate_surface tool migrates to MCP; surface block handling moves to hooks
  • Inferences & Signals — Structured output for InferenceOutputItem[] needs SDK compatibility validation
  • Feeds — Feed scoping for agent tool access passes through MCP tool context

Making the unknown, known.

© 2026 Condelo. All rights reserved.