Overview
This is an analysis of how the Claude Agent SDK fits into the Condelo RAG platform — what it would replace, what it would improve, and how to implement it. The SDK provides the same tools, agent loop, and context management that power Claude Code, programmable in TypeScript.
Condelo RAG has a mature, custom-built agentic architecture with 6 distinct LLM interfaces, a hand-rolled tool loop, 13+ tools, multi-provider LLM abstraction, and streaming SSE delivery. As the platform productionises, the SDK could replace, simplify, or improve the agentic parts of this system.
Key Concepts
- Built-in agent loop — No manual `while (round < MAX_ROUNDS)`. The SDK handles tool dispatch, result collection, and continuation automatically.
- Custom tools via MCP — Domain-specific tools are defined as in-process MCP servers using `createSdkMcpServer()` with Zod schemas. Existing tool handlers slot in directly.
- Sessions — Persistent conversation state with continue, resume, and fork. Maps well to exploration threads.
- Subagents — Named agents with scoped tool sets. Replaces the current manual sub-agent spawning in `analyze_document`.
- Hooks — Lifecycle events (PreToolUse, PostToolUse, Stop) for logging, metrics, audit, and streaming.
- Cost governance — `maxTurns` and `maxBudgetUsd` as built-in guardrails. Per-session token usage and USD cost tracking.
- LiteLLM proxy pattern — By pointing `ANTHROPIC_BASE_URL` at a LiteLLM proxy, the SDK's Anthropic-format requests are translated to any backend (OpenAI, Mistral, Ollama, etc.).
Current Architecture
The platform has 6 distinct LLM interfaces, each with different loop complexity and tool requirements:
| Interface | File | Loop | Tools | Output |
|---|---|---|---|---|
| Chat | apps/api/src/services/chat.ts | 5-round tool loop, streaming | 13 tools | Streamed markdown + surface blocks |
| Research Agent | apps/data-plane/src/services/research-agent.ts | 5-round tool loop, structured | search, query | InferenceOutputItem[] via zodResponseFormat |
| Exploration Converse | apps/data-plane/src/services/exploration-converse-agent.ts | Multi-turn within cluster | search, query, generate_surface | Messages + suggested paths |
| Exploration Prep | apps/data-plane/src/services/exploration-prep-agent.ts | Single LLM call | None | Cluster definitions |
| Story Pipeline | apps/data-plane/src/services/story-*.ts (5 agents) | Single call each | None | Pyramid, storyline, surfaces |
| Sub-Agent | apps/data-plane/src/services/sub-agent.ts | 3-round tool loop | search_within_document | Summary + chunks |
Current Flow
User Request
│
▼
┌─────────────────────────────┐
│ Hono API / SSE Endpoint │
│ (apps/api) │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ Hand-rolled Tool Loop │
│ while (round < MAX_ROUNDS) │
│ ├─ LLM call (OpenAI SDK) │
│ ├─ Parse tool_calls │
│ ├─ executeTool() │
│ └─ Append results │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ Tool Registry │
│ registerTool() / execute() │
│ 13 tools, availability │
│ predicates, callbacks │
└─────────────────────────────┘
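The loop in the box above can be condensed into a sketch. This is illustrative, not the actual `chat.ts` code: names like `callLlm` and `executeTool` are stand-ins, and the real loop adds context budgeting and streaming on top.

```typescript
// Condensed sketch of the hand-rolled tool loop the SDK would replace.
const MAX_ROUNDS = 5;

interface ToolCall { name: string; args: unknown }
interface LlmReply { text: string; toolCalls: ToolCall[] }

export async function runToolLoop(
  callLlm: (history: unknown[]) => Promise<LlmReply>,
  executeTool: (call: ToolCall) => Promise<string>,
): Promise<string> {
  const history: unknown[] = [];
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const reply = await callLlm(history);                 // one LLM round trip
    if (reply.toolCalls.length === 0) return reply.text;  // no tools requested: done
    for (const call of reply.toolCalls) {
      // Execute each requested tool and append its result to the transcript.
      history.push({ role: "tool", name: call.name, result: await executeTool(call) });
    }
  }
  return "Stopped: maximum tool rounds reached."; // the MAX_ROUNDS guard
}
```

Every interface in the table above reimplements some variant of this loop; the SDK's value proposition is owning this logic once, with guardrails attached.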
Proposed Flow (Agent SDK)
User Request
│
▼
┌─────────────────────────────┐
│ Hono API / SSE Endpoint │
│ (apps/api) │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ Claude Agent SDK │
│ query({ │
│ prompt, │
│ options: { │
│ mcpServers, │
│ maxTurns, │
│ maxBudgetUsd, │
│ hooks │
│ } │
│ }) │
└─────────┬───────────────────┘
│
▼
┌─────────────────────────────┐
│ MCP Tool Server │
│ createSdkMcpServer({ │
│ tools: [ │
│ search_documents, │
│ query_documents, │
│ generate_surface, ... │
│ ] │
│ }) │
└─────────────────────────────┘
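The diagram above can be sketched in TypeScript. The types here are local stand-ins rather than the SDK's real exports (in the actual integration they would come from `@anthropic-ai/claude-agent-sdk`), so treat the option shapes as assumptions drawn from the features discussed in this document.

```typescript
// Local stand-in types; the real ones ship with the Agent SDK.
type HookEvent = "PreToolUse" | "PostToolUse" | "Stop";

interface AgentOptions {
  mcpServers: Record<string, unknown>; // in-process MCP servers (see Tool Migration)
  maxTurns: number;                    // guardrail: cap on agent loop rounds
  maxBudgetUsd: number;                // guardrail: cap on per-run spend
  hooks: Partial<Record<HookEvent, (input: unknown) => void>>;
}

// Mirrors the current 5-round loop and adds a cost ceiling (value hypothetical).
export function buildChatOptions(condeloTools: unknown): AgentOptions {
  return {
    mcpServers: { condelo: condeloTools },
    maxTurns: 5,
    maxBudgetUsd: 0.5,
    hooks: {
      // e.g. forward tool results to audit logging / the event bus
      PostToolUse: (input) => console.log("tool finished", input),
    },
  };
}
```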
Fit Assessment
Chat — Good Fit
Current: ~200 lines of manual tool loop, dispatch, context budgeting, streaming assembly.
With SDK: The agent loop replaces while (round < MAX_TOOL_ROUNDS) entirely. Custom tools register as MCP tools. Streaming via async generators pipes into Hono's streamSSE.
What improves: Eliminates manual loop/dispatch code. Gains built-in streaming, cost tracking, maxTurns/maxBudgetUsd guardrails, and hooks for observability. Subagent support means analyze_document becomes native rather than hand-rolled.
Caveats: Dynamic system prompt (DB schema, feed context) needs rethinking — SDK auto-generates tool descriptions from MCP definitions. Surface block handling (nudging toward generate_surface) moves into hooks or post-processing.
Exploration Converse — Good Fit
The most "agentic" interface. Multi-turn conversation within inference clusters, with tools for search, query, and surface generation.
With SDK: Session management (continue/resume/fork) maps well to exploration threads. Subagents handle cross-cluster research. Progressive disclosure becomes natural.
Caveats: Rich exploration state (clusters, connections, deltas) stays in the DB — SDK sessions don't replace this. Suggested paths and research tasks need custom post-processing.
Research Agents — Moderate Fit
Current: BullMQ worker with 5-round tool loop and structured output via zodResponseFormat.
With SDK: Replaces tool loop. Cost tracking per agent run is a strong operational benefit. Worker wrapper stays — just swap the inner engine.
Caveats: Verify SDK's output_format handles complex nested schemas (InferenceOutputItem with evidence arrays, surfaceHints). Feed scoping needs to pass context into MCP tool implementations.
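The schema caveat can be smoke-tested before any SDK work. Below is a dependency-free sketch of the structural check; in the codebase this is a Zod schema via `zodResponseFormat`, and the field names beyond `surfaceHints` and the evidence array are assumptions.

```typescript
// Minimal structural check for the research agent's nested output shape.
// Plain predicates keep the sketch dependency-free; the real code uses Zod.
interface InferenceOutputItem {
  claim: string;                                      // assumed field name
  evidence: { documentId: string; quote: string }[];  // nested array per the caveat
  surfaceHints?: string[];                            // mentioned in the caveat above
}

export function isInferenceOutputItem(v: unknown): v is InferenceOutputItem {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.claim === "string" &&
    Array.isArray(o.evidence) &&
    o.evidence.every(
      (e) =>
        typeof e === "object" && e !== null &&
        typeof (e as any).documentId === "string" &&
        typeof (e as any).quote === "string",
    ) &&
    (o.surfaceHints === undefined ||
      (Array.isArray(o.surfaceHints) &&
        o.surfaceHints.every((h) => typeof h === "string")))
  );
}
```

Running checks like this against the SDK's structured output in Phase 1 answers the compatibility question cheaply.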
Story Pipeline — Poor Fit
5 sequential agents making single structured LLM calls with no tools. Data passed in prompts. The SDK's tool loop and session management add complexity without benefit. Keep as-is.
Exploration Prep — Poor Fit
Single structured LLM call to cluster inferences. No tools. Keep as-is.
Sub-Agent — Poor Fit
3-round tool loop with one tool. Too simple to justify SDK overhead. Keep as-is.
Tool Migration
Existing Tools → MCP
All 13 domain-specific tools would be implemented as custom MCP tools:
```typescript
import { createSdkMcpServer, tool } from "@anthropic-ai/claude-agent-sdk";
import { z } from "zod";

const condeloTools = createSdkMcpServer({
  name: "condelo",
  tools: [
    tool(
      "search_documents",
      "Semantic/keyword search across documents",
      {
        query: z.string(),
        feed_id: z.string().optional(),
        search_type: z.enum(["semantic", "keyword", "hybrid"]).optional(),
      },
      async (args) => {
        // Existing handler logic slots in here, e.g. imported from
        // tools/handlers/search-documents.ts (handler name illustrative):
        const results = await searchDocumentsHandler(args);
        return { content: [{ type: "text", text: JSON.stringify(results) }] };
      },
    ),
    // ... query_documents, generate_surface, etc.
  ],
});
```
SDK Built-in Replacements
| Current Tool | SDK Equivalent | Notes |
|---|---|---|
| `web_search` | WebSearch | Direct replacement |
| `fetch_web_page` | WebFetch | Direct replacement |
| `grep_documents` | Grep | Only if documents are on the filesystem; the current implementation queries the DB |
New Tools Enabled by SDK
| Tool | Purpose | Why |
|---|---|---|
| `create_inference` | Agent creates inferences directly | Currently inferences are extracted from structured output. With SDK tools, the agent could create them iteratively as it discovers insights. |
| `compare_documents` | Cross-document analysis | Currently requires multiple search calls. A dedicated tool would be more efficient. |
| `update_exploration_state` | Agent updates exploration clusters/connections | Enables the agent to actively manage exploration topology. |
| `schedule_research` | Queue async research tasks | The agent identifies questions needing deeper investigation and schedules them. |
| `memory_recall` | Retrieve context from previous conversations | SDK sessions handle this partially, but explicit recall of past findings would enhance continuity. |
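As a sketch, the handler for the first of these, `create_inference`, might look as follows. This is hypothetical: the argument fields and the `saveInference` persistence call are stand-ins, and in the real integration the handler would be registered via `tool()` with a Zod schema, as in the Tool Migration example.

```typescript
// Hypothetical handler for the proposed create_inference MCP tool.
interface CreateInferenceArgs {
  claim: string;          // assumed fields; the real schema would be
  evidenceIds: string[];  // derived from the inference data model
}

export async function createInferenceHandler(
  args: CreateInferenceArgs,
  // Stand-in for the DB layer that persists an inference and returns its id.
  saveInference: (a: CreateInferenceArgs) => Promise<{ id: string }>,
) {
  const { id } = await saveInference(args);
  // MCP tools return content blocks; the text payload carries the new record id
  // back to the agent so it can reference the inference in later turns.
  return { content: [{ type: "text", text: JSON.stringify({ inferenceId: id }) }] };
}
```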
Multi-Provider Support
Multi-provider support is less of a trade-off than initially assumed. While the SDK natively targets Claude models via the Anthropic API, it offers model flexibility through multiple mechanisms.
Native Provider Support
- Anthropic API (direct) — Claude Opus, Sonnet, Haiku
- AWS Bedrock — set `CLAUDE_CODE_USE_BEDROCK=1` + AWS credentials
- Google Vertex AI — set `CLAUDE_CODE_USE_VERTEX=1` + GCP credentials
- Azure AI Foundry — set `CLAUDE_CODE_USE_FOUNDRY=1` + Azure credentials
LiteLLM Proxy Pattern (Any Model)
By pointing `ANTHROPIC_BASE_URL` at a LiteLLM proxy, the SDK's Anthropic-format requests are translated to any backend:
- OpenAI (GPT-4o, o1, etc.)
- Mistral
- Ollama (fully local, air-gapped)
- Any OpenAI-compatible endpoint
The current multi-provider flexibility is preserved — the mechanism changes from direct OpenAI SDK calls to LiteLLM-proxied Anthropic-format calls. Agent code stays identical regardless of backend model.
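The mechanism in miniature (the proxy address is an assumption: a LiteLLM instance on `localhost:4000`):

```typescript
// Per-backend difference is one environment variable; agent code is untouched.
type Backend = "anthropic" | "litellm-proxy";

export function baseUrlFor(backend: Backend): string {
  return backend === "anthropic"
    ? "https://api.anthropic.com"  // direct to Anthropic
    : "http://localhost:4000";     // LiteLLM translates to OpenAI/Mistral/Ollama/etc.
}

// Point the SDK at the proxy; every agent run now routes through LiteLLM.
process.env.ANTHROPIC_BASE_URL = baseUrlFor("litellm-proxy");
```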
Deployment Options
| Scenario | Backend | Data Leaves Infra? |
|---|---|---|
| Standard | Anthropic API | Yes (to Anthropic) |
| Cloud-regulated | Bedrock / Vertex / Azure | No (stays in cloud contract) |
| Air-gapped / offline | Ollama + LiteLLM | No (fully local) |
What This Means for Condelo
The current multi-provider system (LLM_PROVIDER env var → OpenAI/OpenRouter/Ollama/LM Studio) would be replaced by:
- Agentic workloads: Agent SDK → `ANTHROPIC_BASE_URL` pointing at either Anthropic directly or a LiteLLM proxy for other models
- Non-agentic workloads (embeddings, story pipeline, title generation): keep the raw OpenAI SDK — these don't benefit from the agent loop
One SDK for agentic flows (with model flexibility via proxy), one SDK for simple completions.
Implementation Approach
Phase 1: Proof of Concept (1–2 weeks)
┌─────────────────────────────────────────────┐
│ Standalone script │
│ ├─ Register 13 tools as MCP │
│ ├─ Run chat flow via SDK agent loop │
│ ├─ Compare: quality, latency, cost │
│ └─ Validate structured output for schemas │
└─────────────────────────────────────────────┘
- Install `@anthropic-ai/claude-agent-sdk`
- Create a standalone script running the chat flow via the SDK
- Register existing tools as MCP tools
- Compare quality, latency, cost, tool-use patterns against current implementation
- Validate structured output support for inference schemas
Phase 2: Chat Migration (2–3 weeks)
- Create `apps/api/src/services/chat-agent-sdk.ts` alongside the existing `chat.ts`
- Implement an MCP tool server for all 13 tools
- Build an SSE adapter (SDK async generator → Hono `streamSSE`)
- Add a feature flag: `CHAT_ENGINE=agent-sdk|legacy`
- Migrate system prompt rules into tool descriptions + MCP server metadata
- Add hooks for LangSmith tracing and event bus integration
- A/B test against existing implementation
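The SSE adapter step can be prototyped as a pure function over an async iterable, which keeps it testable without the SDK. The message shape below is a simplified assumption; the real SDK emits richer message types.

```typescript
// Sketch of the SSE adapter: drain the agent's async generator of messages
// and yield SSE-shaped frames ready for Hono's streamSSE.
interface AgentMessage {
  type: "assistant_text" | "tool_use" | "result";
  text?: string;
}

export async function* toSseFrames(
  agent: AsyncIterable<AgentMessage>,
): AsyncGenerator<{ event: string; data: string }> {
  for await (const msg of agent) {
    if (msg.type === "assistant_text" && msg.text) {
      yield { event: "token", data: JSON.stringify({ text: msg.text }) };
    } else if (msg.type === "tool_use") {
      yield { event: "tool", data: JSON.stringify(msg) };
    }
    // "result" messages carry usage/cost metadata; handled by hooks instead.
  }
  yield { event: "done", data: "{}" }; // close the stream for the client
}
```

Because the adapter only depends on the iterable protocol, the same function works whether the upstream generator is the SDK or the legacy engine behind the feature flag.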
Phase 3: Research Agent Migration (2 weeks)
- Replace the `research-agent.ts` tool loop with an SDK agent
- Validate that structured output (`InferenceOutputItem[]`) works with the SDK
- Keep the BullMQ worker wrapper — swap the inner engine
- Add cost tracking per agent run
Phase 4: Exploration Migration (2–3 weeks)
- Migrate exploration-converse to SDK with session support
- Implement subagent pattern for cross-cluster research
- Integrate SDK sessions with the `exploration_messages` table
- Preserve suggested paths and research task extraction
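One way to sketch the session-to-table wiring: persist each turn as a row keyed by both the exploration and the SDK session id, so continue/resume/fork can locate the right session later. Column names beyond the `exploration_messages` table name itself are assumptions.

```typescript
// Hypothetical row shape linking an SDK session to an exploration thread.
interface ExplorationMessageRow {
  exploration_id: string;
  sdk_session_id: string; // lets resume/fork find the SDK session for this thread
  role: "user" | "assistant";
  content: string;
  created_at: string; // ISO timestamp
}

export function toExplorationMessageRow(
  explorationId: string,
  sdkSessionId: string,
  role: "user" | "assistant",
  content: string,
): ExplorationMessageRow {
  return {
    exploration_id: explorationId,
    sdk_session_id: sdkSessionId,
    role,
    content,
    created_at: new Date().toISOString(),
  };
}
```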
Phase 5: New Agentic Capabilities (Ongoing)
- Add new tools (create_inference, compare_documents, etc.)
- Implement subagent patterns for complex analysis
- Explore SDK guardrails for production safety
Phase 1 Phase 2 Phase 3 Phase 4 Phase 5
PoC Chat Research Exploration New Tools
──────────┐ ┌──────────────┐ ┌────────────┐ ┌──────────────┐ ┌──────────
1-2 wks │ │ 2-3 wks │ │ 2 wks │ │ 2-3 wks │ │ ongoing
SDK + │ │ chat-agent- │ │ research- │ │ explore + │ │ create_
MCP │ │ sdk.ts + │ │ agent.ts │ │ sessions + │ │ inference
tools + │ │ feature │ │ swap inner │ │ subagents │ │ compare_
bench- │ │ flag + │ │ engine │ │ │ │ docs ...
mark │ │ A/B test │ │ │ │ │ │
──────────┘ └──────────────┘ └────────────┘ └──────────────┘ └──────────
Risks & Mitigations
| Risk | Severity | Mitigation |
|---|---|---|
| LiteLLM proxy reliability | Medium | LiteLLM is the bridge to non-Claude models. Test tool-calling fidelity across models. Have fallback to direct provider SDKs if needed. |
| Cost increase | Medium | Claude API pricing vs OpenAI/OpenRouter. Use maxBudgetUsd guardrails. Haiku for quick tasks, Sonnet for agents. LiteLLM proxy to cheaper models for cost-sensitive workloads. |
| Latency | Medium | SDK adds overhead (session management, tool search). LiteLLM proxy adds a hop. Benchmark against current implementation. |
| Loss of fine-grained control | Medium | Current implementation has custom context budgeting (12K-char truncation), visualisation nudging, exhaustion tracking. Some of this moves into hooks; some may need workarounds. |
| Tool-calling quality variance | Medium | Different models handle tool use with varying quality. Claude excels at complex multi-tool reasoning. Via LiteLLM, weaker models may make poor tool choices. Test per-model and set minimum capability requirements. |
| Structured output compatibility | Medium | Verify SDK's output_format supports complex Zod schemas before committing. Test with LiteLLM-proxied models too. |
| LangSmith compatibility | Low | SDK hooks can forward events to LangSmith. May need custom integration. |
| Session storage | Low | SDK persists sessions to local filesystem. For multi-instance deployment, need shared storage or custom session backend. |
| Breaking changes | Low | SDK is new — API may evolve. Pin versions, abstract behind internal interfaces. |
Recommendation
Adopt the Claude Agent SDK for all agentic workloads. The LiteLLM proxy pattern eliminates the model lock-in concern, making full migration of agentic interfaces viable.
Do Migrate
- Chat — biggest win. Eliminates ~200 lines of loop/dispatch code. Gains streaming, sessions, cost tracking, and guardrails.
- Exploration Converse — natural fit for SDK sessions and subagents. The most agentic feature in the platform.
- Research Agents — moderate win. Simplifies tool loop, adds cost tracking.
Don't Migrate
- Story Pipeline — not agentic, just structured completions. Keep raw OpenAI SDK.
- Exploration Prep — single LLM call, no tools. Keep raw OpenAI SDK.
- Sub-Agent — too simple to justify SDK overhead. Keep as-is.
- Embeddings — not applicable. Keep OpenAI SDK.
- Title generation / quick tasks — simple completions. Keep raw OpenAI SDK.
Decision Framework
- Do you want a production-grade agent framework, or do you prefer maintaining custom loops? The SDK eliminates 200+ lines of hand-rolled orchestration per agent type, but you trade direct control for framework conventions.
- Is the LiteLLM proxy acceptable for non-Claude models? It adds a translation layer but preserves model flexibility. Test with your specific models (especially tool-calling fidelity) before committing.
- Do you need the operational benefits now? Cost tracking, session management, guardrails, hooks, and subagents are production-grade features you'd otherwise build yourself.
- Are you comfortable with the Anthropic API format as the wire protocol? Even with LiteLLM, the SDK sends Anthropic-format requests. Non-agentic workloads stay on the OpenAI SDK, so you'll have two API formats.
- Do offline/air-gapped deployments matter? If yes, the Ollama + LiteLLM pattern is a strong advantage — the same agent code runs fully local with no external calls.
Start with: A proof-of-concept on the chat interface behind a feature flag. This is the highest-traffic interface with the most complex tool loop — if the SDK works well here, it validates the approach for everything else.
Code Reference
| File | Description |
|---|---|
| `apps/api/src/services/chat.ts` | Current chat tool loop (migration target) |
| `apps/api/src/tools/registry.ts` | Current tool registry (`registerTool`, `getAvailableTools`, `executeTool`) |
| `apps/api/src/tools/types.ts` | `ToolContext`, `ToolCallbacks`, `ToolEntry`, `ToolResult` types |
| `apps/data-plane/src/services/research-agent.ts` | Research agent execution (Phase 3 target) |
| `apps/data-plane/src/services/exploration-converse-agent.ts` | Exploration agent (Phase 4 target) |
| `apps/data-plane/src/services/sub-agent.ts` | Sub-agent (keep as-is) |
| `apps/data-plane/src/services/story-*.ts` | Story pipeline agents (keep as-is) |
Relationships
- Tool Registry — All 13 tools would migrate to MCP tool definitions
- Agents — Research agents are the Phase 3 migration target
- Surfaces & Experiences — `generate_surface` tool migrates to MCP; surface block handling moves to hooks
- Inferences & Signals — Structured output for `InferenceOutputItem[]` needs SDK compatibility validation
- Feeds — Feed scoping for agent tool access passes through MCP tool context