Overview
This is an analysis of how the Claude Agent SDK fits into the Condelo RAG platform — what it would replace, what it would improve, and how to implement it. The SDK provides the same tools, agent loop, and context management that power Claude Code, programmable in TypeScript.
Condelo RAG has a mature, custom-built agentic architecture with 6 distinct LLM interfaces, a hand-rolled tool loop, 13+ tools, multi-provider LLM abstraction, and streaming SSE delivery. As the platform productionises, the SDK could replace, simplify, or improve the agentic parts of this system.
Key Concepts
- Built-in agent loop — No manual `while (round < MAX_ROUNDS)`. The SDK handles tool dispatch, result collection, and continuation automatically.
- Custom tools via MCP — Domain-specific tools are defined as in-process MCP servers using `createSdkMcpServer()` with Zod schemas. Existing tool handlers slot in directly.
- Sessions — Persistent conversation state with continue, resume, and fork. Maps well to exploration threads.
- Subagents — Named agents with scoped tool sets. Replaces the current manual sub-agent spawning in `analyze_document`.
- Hooks — Lifecycle events (PreToolUse, PostToolUse, Stop) for logging, metrics, audit, and streaming.
- Cost governance — `maxTurns` and `maxBudgetUsd` as built-in guardrails. Per-session token usage and USD cost tracking.
- LiteLLM proxy pattern — By pointing `ANTHROPIC_BASE_URL` at a LiteLLM proxy, the SDK's Anthropic-format requests are translated to any backend (OpenAI, Mistral, Ollama, etc.).
Current Architecture
The platform has 6 distinct LLM interfaces, each with different loop complexity and tool requirements:
| Interface | File | Loop | Tools | Output |
|---|---|---|---|---|
| Chat | apps/api/src/services/chat.ts | 5-round tool loop, streaming | 13 tools | Streamed markdown + surface blocks |
| Research Agent | apps/data-plane/src/services/research-agent.ts | 5-round tool loop, structured | search, query | InferenceOutputItem[] via zodResponseFormat |
| Exploration Converse | apps/data-plane/src/services/exploration-converse-agent.ts | Multi-turn within cluster | search, query, generate_surface | Messages + suggested paths |
| Exploration Prep | apps/data-plane/src/services/exploration-prep-agent.ts | Single LLM call | None | Cluster definitions |
| Story Pipeline | apps/data-plane/src/services/story-*.ts (5 agents) | Single call each | None | Pyramid, storyline, surfaces |
| Sub-Agent | apps/data-plane/src/services/sub-agent.ts | 3-round tool loop | search_within_document | Summary + chunks |
Current Flow
User Request
│
▼
┌─────────────────────────────┐
│ Hono API / SSE Endpoint │
│ (apps/api) │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ Hand-rolled Tool Loop │
│ while (round < MAX_ROUNDS) │
│ ├─ LLM call (OpenAI SDK) │
│ ├─ Parse tool_calls │
│ ├─ executeTool() │
│ └─ Append results │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ Tool Registry │
│ registerTool() / execute() │
│ 13 tools, availability │
│ predicates, callbacks │
└─────────────────────────────┘
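The loop in the box above can be condensed into a sketch. This is illustrative, not the actual `chat.ts` code: names like `callLlm` and `executeTool` are stand-ins, and the real loop adds context budgeting and streaming on top.

```typescript
// Condensed sketch of the hand-rolled tool loop the SDK would replace.
const MAX_ROUNDS = 5;

interface ToolCall { name: string; args: unknown }
interface LlmReply { text: string; toolCalls: ToolCall[] }

export async function runToolLoop(
  callLlm: (history: unknown[]) => Promise<LlmReply>,
  executeTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  const history: unknown[] = [];
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const reply = await callLlm(history);                 // one LLM round trip
    if (reply.toolCalls.length === 0) return reply.text;  // no tools requested: done
    for (const call of reply.toolCalls) {
      // Execute each requested tool and append its result to the transcript.
      history.push({ role: "tool", name: call.name, result: await executeTool(call) });
    }
  }
  return "Stopped: maximum tool rounds reached."; // the MAX_ROUNDS guard
}
```

Every interface in the table above reimplements some variant of this loop; the SDK's value proposition is owning this logic once, with guardrails attached.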
Proposed Flow (Agent SDK)
User Request
│
▼
┌─────────────────────────────┐
│ Hono API / SSE Endpoint │
│ (apps/api) │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ Claude Agent SDK │
│ query({ │
│ prompt, │
│ options: { │
│ mcpServers, │
│ maxTurns, │
│ maxBudgetUsd, │
│ hooks │
│ } │
│ }) │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ MCP Tool Server │
│ createSdkMcpServer({ │
│ tools: [ │
│ search_documents, │
│ query_documents, │
│ generate_surface, ... │
│ ] │
│ }) │
└─────────────────────────────┘
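The diagram above can be sketched in TypeScript. The types here are local stand-ins rather than the SDK's real exports (in the actual integration they would come from `@anthropic-ai/claude-agent-sdk`), so treat the option shapes as assumptions drawn from the features discussed in this document.

```typescript
// Local stand-in types; the real ones ship with the Agent SDK.
type HookEvent = "PreToolUse" | "PostToolUse" | "Stop";

interface AgentOptions {
  mcpServers: Record<string, unknown>; // in-process MCP servers (see Tool Migration)
  maxTurns: number;                    // guardrail: cap on agent loop rounds
  maxBudgetUsd: number;                // guardrail: cap on per-run spend
  hooks: Partial<Record<HookEvent, (input: unknown) => void>>;
}

// Mirrors the current 5-round loop and adds a cost ceiling (value hypothetical).
export function buildChatOptions(condeloTools: unknown): AgentOptions {
  return {
    mcpServers: { condelo: condeloTools },
    maxTurns: 5,
    maxBudgetUsd: 0.5,
    hooks: {
      // e.g. forward tool results to audit logging / the event bus
      PostToolUse: (input) => console.log("tool finished", input),
    },
  };
}
```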
Fit Assessment
Chat — Good Fit
Current: ~200 lines of manual tool loop, dispatch, context budgeting, streaming assembly.
With SDK: The agent loop replaces while (round < MAX_TOOL_ROUNDS) entirely. Custom tools register as MCP tools. Streaming via async generators pipes into Hono's streamSSE.
What improves: Eliminates manual loop/dispatch code. Gains built-in streaming, cost tracking, maxTurns/maxBudgetUsd guardrails, and hooks for observability. Subagent support means analyze_document becomes native rather than hand-rolled.
Caveats: Dynamic system prompt (DB schema, feed context) needs rethinking — SDK auto-generates tool descriptions from MCP definitions. Surface block handling (nudging toward generate_surface) moves into hooks or post-processing.
Exploration Converse — Good Fit
The most "agentic" interface. Multi-turn conversation within inference clusters, with tools for search, query, and surface generation.
With SDK: Session management (continue/resume/fork) maps well to exploration threads. Subagents handle cross-cluster research. Progressive disclosure becomes natural.
Caveats: Rich exploration state (clusters, connections, deltas) stays in the DB — SDK sessions don't replace this. Suggested paths and research tasks need custom post-processing.
Research Agents — Moderate Fit
Current: BullMQ worker with 5-round tool loop and structured output via zodResponseFormat.
With SDK: Replaces tool loop. Cost tracking per agent run is a strong operational benefit. Worker wrapper stays — just swap the inner engine.
Caveats: Verify SDK's output_format handles complex nested schemas (InferenceOutputItem with evidence arrays, surfaceHints). Feed scoping needs to pass context into MCP tool implementations.
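The schema caveat can be smoke-tested before any SDK work. Below is a dependency-free sketch of the structural check; in the codebase this is a Zod schema via `zodResponseFormat`, and the field names beyond `surfaceHints` and the evidence array are assumptions.

```typescript
// Minimal structural check for the research agent's nested output shape.
// Plain predicates keep the sketch dependency-free; the real code uses Zod.
interface InferenceOutputItem {
  claim: string;                                      // assumed field name
  evidence: { documentId: string; quote: string }[];  // nested array per the caveat
  surfaceHints?: string[];                            // mentioned in the caveat above
}

export function isInferenceOutputItem(v: unknown): v is InferenceOutputItem {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.claim === "string" &&
    Array.isArray(o.evidence) &&
    o.evidence.every(
      (e) =>
        typeof e === "object" && e !== null &&
        typeof (e as any).documentId === "string" &&
        typeof (e as any).quote === "string",
    ) &&
    (o.surfaceHints === undefined ||
      (Array.isArray(o.surfaceHints) &&
        o.surfaceHints.every((h) => typeof h === "string")))
  );
}
```

Running checks like this against the SDK's structured output in Phase 1 answers the compatibility question cheaply.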
Story Pipeline — Poor Fit
5 sequential agents making single structured LLM calls with no tools. Data passed in prompts. The SDK's tool loop and session management add complexity without benefit. Keep as-is.
Exploration Prep — Poor Fit
Single structured LLM call to cluster inferences. No tools. Keep as-is.
Sub-Agent — Poor Fit
3-round tool loop with one tool. Too simple to justify SDK overhead. Keep as-is.
Tool Migration
Existing Tools → MCP
All 13 domain-specific tools would be implemented as custom MCP tools:
```typescript
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

const condeloTools = createSdkMcpServer({
  name: "condelo",
  tools: [
    tool(
      "search_documents",
      "Semantic/keyword search across documents",
      {
        query: z.string(),
        feed_id: z.string().optional(),
        search_type: z.enum(["semantic", "keyword", "hybrid"]).optional(),
      },
      async (args) => {
        // Existing handler logic slots in here, e.g. imported from
        // tools/handlers/search-documents.ts (handler name illustrative):
        const results = await searchDocumentsHandler(args);
        return { content: [{ type: "text", text: JSON.stringify(results) }] };
      },
    ),
    // ... query_documents, generate_surface, etc.
  ],
});
```
SDK Built-in Replacements
| Current Tool | SDK Equivalent | Notes |
|---|---|---|
| `web_search` | WebSearch | Direct replacement |
| `fetch_web_page` | WebFetch | Direct replacement |
| `grep_documents` | Grep | Only if documents are on the filesystem; the current implementation queries the DB |
New Tools Enabled by SDK
| Tool | Purpose | Why |
|---|---|---|
| `create_inference` | Agent creates inferences directly | Currently inferences are extracted from structured output. With SDK tools, the agent could create them iteratively as it discovers insights. |
| `compare_documents` | Cross-document analysis | Currently requires multiple search calls. A dedicated tool would be more efficient. |
| `update_exploration_state` | Agent updates exploration clusters/connections | Enables the agent to actively manage exploration topology. |
| `schedule_research` | Queue async research tasks | The agent identifies questions needing deeper investigation and schedules them. |
| `memory_recall` | Retrieve context from previous conversations | SDK sessions handle this partially, but explicit recall of past findings would enhance continuity. |
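As a sketch, the handler for the first of these, `create_inference`, might look as follows. This is hypothetical: the argument fields and the `saveInference` persistence call are stand-ins, and in the real integration the handler would be registered via `tool()` with a Zod schema, as in the Tool Migration example.

```typescript
// Hypothetical handler for the proposed create_inference MCP tool.
interface CreateInferenceArgs {
  claim: string;          // assumed fields; the real schema would be
  evidenceIds: string[];  // derived from the inference data model
}

export async function createInferenceHandler(
  args: CreateInferenceArgs,
  // Stand-in for the DB layer that persists an inference and returns its id.
  saveInference: (a: CreateInferenceArgs) => Promise<{ id: string }>,
) {
  const { id } = await saveInference(args);
  // MCP tools return content blocks; the text payload carries the new record id
  // back to the agent so it can reference the inference in later turns.
  return { content: [{ type: "text", text: JSON.stringify({ inferenceId: id }) }] };
}
```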
Multi-Provider Support
Multi-provider support is less of a trade-off than initially assumed. While the SDK natively targets Claude models via the Anthropic API, it offers model flexibility through multiple mechanisms.
Native Provider Support
- Anthropic API (direct) — Claude Opus, Sonnet, Haiku
- AWS Bedrock — set `CLAUDE_CODE_USE_BEDROCK=1` + AWS credentials
- Google Vertex AI — set `CLAUDE_CODE_USE_VERTEX=1` + GCP credentials
- Azure AI Foundry — set `CLAUDE_CODE_USE_FOUNDRY=1` + Azure credentials
LiteLLM Proxy Pattern (Any Model)
By pointing `ANTHROPIC_BASE_URL` at a LiteLLM proxy, the SDK's Anthropic-format requests are translated to any backend:
- OpenAI (GPT-4o, o1, etc.)
- Mistral
- Ollama (fully local, air-gapped)
- Any OpenAI-compatible endpoint
The current multi-provider flexibility is preserved — the mechanism changes from direct OpenAI SDK calls to LiteLLM-proxied Anthropic-format calls. Agent code stays identical regardless of backend model.
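The mechanism in miniature (the proxy address is an assumption: a LiteLLM instance on `localhost:4000`):

```typescript
// Per-backend difference is one environment variable; agent code is untouched.
type Backend = "anthropic" | "litellm-proxy";

export function baseUrlFor(backend: Backend): string {
  return backend === "anthropic"
    ? "https://api.anthropic.com"  // direct to Anthropic
    : "http://localhost:4000";     // LiteLLM translates to OpenAI/Mistral/Ollama/etc.
}

// Point the SDK at the proxy; every agent run now routes through LiteLLM.
process.env.ANTHROPIC_BASE_URL = baseUrlFor("litellm-proxy");
```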
Deployment Options
| Scenario | Backend | Data Leaves Infra? |
|---|---|---|
| Standard | Anthropic API | Yes (to Anthropic) |
| Cloud-regulated | Bedrock / Vertex / Azure | No (stays in cloud contract) |
| Air-gapped / offline | Ollama + LiteLLM | No (fully local) |
What This Means for Condelo
The current multi-provider system (LLM_PROVIDER env var → OpenAI/OpenRouter/Ollama/LM Studio) would be replaced by:
- Agentic workloads: Agent SDK → `ANTHROPIC_BASE_URL` pointing at either Anthropic directly or a LiteLLM proxy for other models
- Non-agentic workloads (embeddings, story pipeline, title generation): keep the raw OpenAI SDK — these don't benefit from the agent loop
One SDK for agentic flows (with model flexibility via proxy), one SDK for simple completions.
Implementation Approach
Phase 1: Proof of Concept (1–2 weeks)
┌─────────────────────────────────────────────┐
│ Standalone script │
│ ├─ Register 13 tools as MCP │
│ ├─ Run chat flow via SDK agent loop │
│ ├─ Compare: quality, latency, cost │
│ └─ Validate structured output for schemas │
└─────────────────────────────────────────────┘
- Install `@anthropic-ai/claude-agent-sdk`
- Create a standalone script running the chat flow via the SDK
- Register existing tools as MCP tools
- Compare quality, latency, cost, tool-use patterns against current implementation
- Validate structured output support for inference schemas
Phase 2: Chat Migration (2–3 weeks)
- Create `apps/api/src/services/chat-agent-sdk.ts` alongside the existing `chat.ts`
- Implement an MCP tool server for all 13 tools
- Build an SSE adapter (SDK async generator → Hono `streamSSE`)
- Add a feature flag: `CHAT_ENGINE=agent-sdk|legacy`
- Migrate system prompt rules into tool descriptions + MCP server metadata
- Add hooks for LangSmith tracing and event bus integration
- A/B test against existing implementation
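The SSE adapter step can be prototyped as a pure function over an async iterable, which keeps it testable without the SDK. The message shape below is a simplified assumption; the real SDK emits richer message types.

```typescript
// Sketch of the SSE adapter: drain the agent's async generator of messages
// and yield SSE-shaped frames ready for Hono's streamSSE.
interface AgentMessage {
  type: "assistant_text" | "tool_use" | "result";
  text?: string;
}

export async function* toSseFrames(
  agent: AsyncIterable<AgentMessage>,
): AsyncGenerator<{ event: string; data: string }> {
  for await (const msg of agent) {
    if (msg.type === "assistant_text" && msg.text) {
      yield { event: "token", data: JSON.stringify({ text: msg.text }) };
    } else if (msg.type === "tool_use") {
      yield { event: "tool", data: JSON.stringify(msg) };
    }
    // "result" messages carry usage/cost metadata; handled by hooks instead.
  }
  yield { event: "done", data: "{}" }; // close the stream for the client
}
```

Because the adapter only depends on the iterable protocol, the same function works whether the upstream generator is the SDK or the legacy engine behind the feature flag.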
Phase 3: Research Agent Migration (2 weeks)
- Replace the `research-agent.ts` tool loop with an SDK agent
- Validate that structured output (`InferenceOutputItem[]`) works with the SDK
- Keep the BullMQ worker wrapper — swap the inner engine
- Add cost tracking per agent run
Phase 4: Exploration Migration (2–3 weeks)
- Migrate exploration-converse to SDK with session support
- Implement subagent pattern for cross-cluster research
- Integrate SDK sessions with the `exploration_messages` table
- Preserve suggested paths and research task extraction
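One way to sketch the session-to-table wiring: persist each turn as a row keyed by both the exploration and the SDK session id, so continue/resume/fork can locate the right session later. Column names beyond the `exploration_messages` table name itself are assumptions.

```typescript
// Hypothetical row shape linking an SDK session to an exploration thread.
interface ExplorationMessageRow {
  exploration_id: string;
  sdk_session_id: string; // lets resume/fork find the SDK session for this thread
  role: "user" | "assistant";
  content: string;
  created_at: string; // ISO timestamp
}

export function toExplorationMessageRow(
  explorationId: string,
  sdkSessionId: string,
  role: "user" | "assistant",
  content: string,
): ExplorationMessageRow {
  return {
    exploration_id: explorationId,
    sdk_session_id: sdkSessionId,
    role,
    content,
    created_at: new Date().toISOString(),
  };
}
```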
Phase 5: New Agentic Capabilities (Ongoing)
- Add new tools (create_inference, compare_documents, etc.)
- Implement subagent patterns for complex analysis
- Explore SDK guardrails for production safety
Phase 1 Phase 2 Phase 3 Phase 4 Phase 5
PoC Chat Research Exploration New Tools
──────────┐ ┌──────────────┐ ┌────────────┐ ┌──────────────┐ ┌──────────
1-2 wks │ │ 2-3 wks │ │ 2 wks │ │ 2-3 wks │ │ ongoing
SDK + │ │ chat-agent- │ │ research- │ │ explore + │ │ create_
MCP │ │ sdk.ts + │ │ agent.ts │ │ sessions + │ │ inference
tools + │ │ feature │ │ swap inner │ │ subagents │ │ compare_
bench- │ │ flag + │ │ engine │ │ │ │ docs ...
mark │ │ A/B test │ │ │ │ │ │
──────────┘ └──────────────┘ └────────────┘ └──────────────┘ └──────────
Risks & Mitigations
| Risk | Severity | Mitigation |
|---|---|---|
| LiteLLM proxy reliability | Medium | LiteLLM is the bridge to non-Claude models. Test tool-calling fidelity across models. Have fallback to direct provider SDKs if needed. |
| Cost increase | Medium | Claude API pricing vs OpenAI/OpenRouter. Use maxBudgetUsd guardrails. Haiku for quick tasks, Sonnet for agents. LiteLLM proxy to cheaper models for cost-sensitive workloads. |
| Latency | Medium | SDK adds overhead (session management, tool search). LiteLLM proxy adds a hop. Benchmark against current implementation. |
| Loss of fine-grained control | Medium | Current implementation has custom context budgeting (12K-char truncation), visualisation nudging, exhaustion tracking. Some of this moves into hooks; some may need workarounds. |
| Tool-calling quality variance | Medium | Different models handle tool use with varying quality. Claude excels at complex multi-tool reasoning. Via LiteLLM, weaker models may make poor tool choices. Test per-model and set minimum capability requirements. |
| Structured output compatibility | Medium | Verify SDK's output_format supports complex Zod schemas before committing. Test with LiteLLM-proxied models too. |
| LangSmith compatibility | Low | SDK hooks can forward events to LangSmith. May need custom integration. |
| Session storage | Low | SDK persists sessions to local filesystem. For multi-instance deployment, need shared storage or custom session backend. |
| Breaking changes | Low | SDK is new — API may evolve. Pin versions, abstract behind internal interfaces. |
Recommendation
Adopt the Claude Agent SDK for all agentic workloads. The LiteLLM proxy pattern eliminates the model lock-in concern, making full migration of agentic interfaces viable.
Do Migrate
- Chat — biggest win. Eliminates ~200 lines of loop/dispatch code. Gains streaming, sessions, cost tracking, and guardrails.
- Exploration Converse — natural fit for SDK sessions and subagents. The most agentic feature in the platform.
- Research Agents — moderate win. Simplifies tool loop, adds cost tracking.
Don't Migrate
- Story Pipeline — not agentic, just structured completions. Keep raw OpenAI SDK.
- Exploration Prep — single LLM call, no tools. Keep raw OpenAI SDK.
- Sub-Agent — too simple to justify SDK overhead. Keep as-is.
- Embeddings — not applicable. Keep OpenAI SDK.
- Title generation / quick tasks — simple completions. Keep raw OpenAI SDK.
Decision Framework
- Do you want a production-grade agent framework, or do you prefer maintaining custom loops? The SDK eliminates 200+ lines of hand-rolled orchestration per agent type, but you trade direct control for framework conventions.
- Is the LiteLLM proxy acceptable for non-Claude models? It adds a translation layer but preserves model flexibility. Test with your specific models (especially tool-calling fidelity) before committing.
- Do you need the operational benefits now? Cost tracking, session management, guardrails, hooks, and subagents are production-grade features you'd otherwise build yourself.
- Are you comfortable with the Anthropic API format as the wire protocol? Even with LiteLLM, the SDK sends Anthropic-format requests. Non-agentic workloads stay on the OpenAI SDK, so you'll have two API formats.
- Do offline/air-gapped deployments matter? If yes, the Ollama + LiteLLM pattern is a strong advantage — the same agent code runs fully local with no external calls.
Start with: A proof-of-concept on the chat interface behind a feature flag. This is the highest-traffic interface with the most complex tool loop — if the SDK works well here, it validates the approach for everything else.
Code Reference
| File | Description |
|---|---|
| `apps/api/src/services/chat.ts` | Current chat tool loop (migration target) |
| `apps/api/src/tools/registry.ts` | Current tool registry (`registerTool`, `getAvailableTools`, `executeTool`) |
| `apps/api/src/tools/types.ts` | `ToolContext`, `ToolCallbacks`, `ToolEntry`, `ToolResult` types |
| `apps/data-plane/src/services/research-agent.ts` | Research agent execution (Phase 3 target) |
| `apps/data-plane/src/services/exploration-converse-agent.ts` | Exploration agent (Phase 4 target) |
| `apps/data-plane/src/services/sub-agent.ts` | Sub-agent (keep as-is) |
| `apps/data-plane/src/services/story-*.ts` | Story pipeline agents (keep as-is) |
Relationships
- Tool Registry — All 13 tools would migrate to MCP tool definitions
- Agents — Research agents are the Phase 3 migration target
- Surfaces & Experiences — `generate_surface` tool migrates to MCP; surface block handling moves to hooks
- Inferences & Signals — Structured output for `InferenceOutputItem[]` needs SDK compatibility validation
- Feeds — Feed scoping for agent tool access passes through MCP tool context