Platform Strategy
Platform Gaps (Must Fix Before First Paying Client)
Critical (Blocking)
| Gap | Current State | What's Needed | Effort |
|---|---|---|---|
| Multi-tenancy / Orgs | Single user owns spaces. No orgs, teams, or shared access | Organizations table, membership, role-based access, RLS rewrite | 2-3 weeks |
| Client isolation | All clients share one Postgres + Qdrant + Redis | Per-client spaces (logical isolation) or per-client databases (physical). Logical is fine initially | Already works via spaces |
| API key management | Demo apps use hardcoded Bearer tokens from env vars | API key table, key generation, rotation, scoping (per-space, per-app), rate limits | 1-2 weeks |
| Usage metering | Zero tracking of LLM tokens, API calls, storage, or processing | Usage events table, middleware to log every metered action, aggregation for billing | 2-3 weeks |
| Stripe Connect | Nothing exists | Stripe Connect for marketplace billing: end-user subscriptions, 30/70 auto-split, partner payouts, revenue dashboards | 2-3 weeks |
| Deployment | No Dockerfiles, no CI/CD, local-only | Dockerfiles for all services, docker-compose.prod.yml, CI/CD pipeline, secrets management | 2-3 weeks |
| HTTPS / Domain | Hardcoded localhost CORS | Proper domain setup, TLS, environment-aware CORS, reverse proxy | 1 week |
Important (First 3 months)
| Gap | What's Needed | Effort |
|---|---|---|
| Rate limiting | Per-key, per-endpoint rate limits (token bucket or similar) | 1 week |
| Admin dashboard | Platform operator view: all clients, usage, billing, health | 2-3 weeks |
| Client onboarding flow | Automated: create org, provision space, generate API keys, configure webhooks | 1-2 weeks |
| Backup & recovery | Automated PostgreSQL + Qdrant backups, tested recovery procedure | 1 week |
| Monitoring & alerting | Proper dashboards, PagerDuty/Slack alerts for downtime, errors, queue backlogs | 1 week |
| App template / SDK | Starter kit for building apps on top of Condelo (auth, API client, common patterns) | 1-2 weeks |
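The "token bucket or similar" approach from the rate-limiting row can be sketched in a few lines. Illustrative only — a production version would likely keep bucket state in Redis, keyed by API key, rather than in process memory:

```typescript
// Minimal token-bucket rate limiter sketch (per-key state is assumed
// to live elsewhere, e.g. Redis, in a real deployment).
export class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private capacity: number,     // burst size
    private refillPerSec: number, // sustained request rate
    now: number = Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the request is allowed, false if rate-limited.
  tryRemove(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Passing `now` explicitly keeps the class testable; middleware would simply call `bucket.tryRemove()` and return 429 on `false`.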
Nice to Have (6+ months)
| Gap | What's Needed |
|---|---|
| RBAC (fine-grained permissions) | Beyond admin/member — editor, viewer, custom roles |
| Audit logging | Who did what, when, for compliance-heavy clients |
| Multi-region deployment | For clients requiring data residency |
| White-labeling | Custom domains, branding per client app |
| Webhook delivery guarantees | Retry logic, dead letter queue, delivery status |
LLM Plane: Centralized AI Metering
The Problem
LLM calls are scattered across ~38 call sites in two apps (API + Data Plane), using two nearly-identical abstraction layers. There's zero local token tracking — only LangSmith (external). To bill clients on usage, you need to meter every LLM call and attribute costs to an org/space.
Current State
apps/api/src/lib/llm/ ← LLM abstraction (config + client factory)
apps/data-plane/src/lib/llm/ ← Nearly identical copy
Call sites:
API: 9 call sites (chat, exploration, feed/agent suggestions)
Data Plane: 29 call sites (embedding, metadata, agents, stories, wiki, etc.)
Total: 38 call sites, 0 token tracking
Three Options for the LLM Plane
Option A: Separate HTTP Service (LLM Proxy)
[API] ──HTTP──→ [LLM Plane :6316] ──HTTP──→ [OpenAI/OpenRouter]
[Data Plane] ──HTTP──→ [LLM Plane :6316] ──HTTP──→ [OpenAI/OpenRouter]
A new Hono service that proxies all LLM calls, meters them, and forwards to providers.
| Pros | Cons |
|---|---|
| Complete isolation — all LLM traffic flows through one point | Extra network hop adds latency (~5-20ms per call) |
| Can enforce rate limits, quotas, model routing centrally | New service to deploy, monitor, and keep running |
| Can swap providers without touching calling code | Streaming passthrough is complex to implement correctly |
| Could serve external clients later (API-as-a-service) | Operational overhead for a 2-person team |
Option B: Shared Package @condelo/llm (RECOMMENDED)
packages/llm/ ← New shared package
├── src/
│ ├── client.ts ← Metered OpenAI client wrapper
│ ├── config.ts ← Unified config (replaces both copies)
│ ├── metering.ts ← Token counting + cost estimation + DB logging
│ ├── providers.ts ← Provider registry (OpenAI, OpenRouter, Ollama, etc.)
│ └── index.ts
Both API and Data Plane import @condelo/llm instead of their own lib/llm/. The metering layer wraps every client.chat.completions.create() and client.embeddings.create() call, extracts response.usage, and logs to the usage_events table.
| Pros | Cons |
|---|---|
| Zero latency overhead (same process) | Both apps must share the same package version |
| Eliminates duplicated code (two lib/llm/ folders → one package) | Token tracking is in-process, not centralized |
| Simple to implement — wrap existing client factory | Can't rate-limit across services (each service limits independently) |
| No new service to deploy or monitor | |
| Metering data still goes to shared DB — same visibility | |
Option C: External LLM Proxy (LiteLLM, Helicone, etc.)
Use an existing open-source proxy like LiteLLM or a managed service like Helicone.
| Pros | Cons |
|---|---|
| Battle-tested, handles streaming, retries, fallbacks | External dependency |
| Built-in dashboards and cost tracking | Another service to deploy (LiteLLM) or monthly fee (Helicone ~$40/mo) |
| Model routing, load balancing, caching | May not integrate cleanly with your metering/billing needs |
Recommendation: Option B (Shared Package)
For a 2-person team, a shared package is the right balance:
- Eliminates duplication — the two identical `lib/llm/` folders merge into `packages/llm/`
- Zero operational overhead — no new service to deploy
- Automatic metering — wraps the OpenAI client to extract `response.usage` after every call
- Attributes costs — every call tagged with `{ orgId, spaceId, taskType, model }` and logged to `usage_events`
- Simple upgrade path — if you later need a separate service (Option A), the package becomes the client SDK for it
Detailed Implementation Design
Package Structure
packages/llm/
├── src/
│ ├── index.ts # Public API: createLLM(), types
│ ├── client.ts # MeteredOpenAI client (wraps OpenAI SDK)
│ ├── config.ts # Unified config (merges both copies)
│ ├── metering.ts # Usage logging to DB
│ ├── pricing.ts # Model pricing lookup (from models.ts)
│ ├── providers.ts # Provider registry + key resolution
│ └── types.ts # Shared types
├── package.json # Dependencies: openai, @condelo/db, @condelo/shared, langsmith
└── tsconfig.json
How Metering Works (Zero Changes to Callers)
The key insight: metering wraps the OpenAI client at creation time, so all 38 call sites are metered automatically without any code changes.
Currently callers do:
// apps/api/src/services/chat.ts (line 319-320)
const { client, model } = await getSystemLLMConfig();
const response = await client.chat.completions.create({ model, messages, ... });
// response.usage is IGNORED — tokens discarded
After the migration, callers still do the exact same thing:
const { client, model } = await getSystemLLMConfig();
const response = await client.chat.completions.create({ model, messages, ... });
// But now 'client' is a MeteredOpenAI that auto-logs usage
MeteredOpenAI Client Design
// packages/llm/src/client.ts
import OpenAI from "openai";
import { logUsageEvent } from "./metering.js";
import { lookupModelPricing } from "./pricing.js";
interface MeteringContext {
orgId?: string; // Set per-request via AsyncLocalStorage or explicit param
spaceId?: string;
taskType: string; // "chat", "embedding", "agent", "story", "quick", "metadata"
}
// Returns a Proxy around the OpenAI client that intercepts completions.create()
// and embeddings.create() to capture response.usage
export function createMeteredClient(
baseClient: OpenAI,
defaultContext: MeteringContext
): OpenAI {
// Proxy intercepts client.chat.completions.create() calls
// After the response resolves:
// 1. Extract response.usage.prompt_tokens + completion_tokens
// 2. Look up pricing: lookupModelPricing(model, provider)
// 3. Calculate cost: (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1_000_000
// 4. Fire-and-forget: logUsageEvent({ ...context, model, tokensIn, tokensOut, cost })
// For streaming responses (stream: true):
// OpenAI SDK streams include a final chunk with usage data when
// stream_options: { include_usage: true } is set.
// The proxy adds this option automatically and captures the final usage chunk.
return new Proxy(baseClient, { ... });
}
Critical detail for streaming: OpenAI's streaming API can include usage in the final chunk when you set stream_options: { include_usage: true }. The proxy automatically injects this option so streaming calls are metered too — callers don't need to change anything.
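To make the interception concrete, here is a minimal, dependency-free sketch of both wrap paths. The stub response and chunk shapes and the `demo` driver are stand-ins for the OpenAI SDK types; `stream_options: { include_usage: true }` is the real SDK option referenced above:

```typescript
// Sketch of the metering wrap, using stubs in place of the OpenAI SDK.
// Assumption: responses/chunks carry `usage` shaped like the interface below.
interface Usage { prompt_tokens: number; completion_tokens: number }

const logged: Array<{ tokensIn: number; tokensOut: number }> = [];

// Fire-and-forget logger (the real one writes to usage_events).
function logUsageEvent(e: { tokensIn: number; tokensOut: number }): void {
  logged.push(e);
}

// Non-streaming path: await the response, record usage, return it unchanged.
function meterCreate<R extends { usage?: Usage }>(
  create: (params: object) => Promise<R>
): (params: object) => Promise<R> {
  return async (params) => {
    const response = await create(params);
    if (response.usage) {
      logUsageEvent({
        tokensIn: response.usage.prompt_tokens,
        tokensOut: response.usage.completion_tokens,
      });
    }
    return response;
  };
}

type Chunk = { delta?: string; usage?: Usage };

// Streaming path: inject stream_options, watch for the final usage-bearing
// chunk, and pass every chunk through to the caller untouched.
function meterStream(
  create: (params: object) => AsyncIterable<Chunk>
): (params: object) => AsyncIterable<Chunk> {
  return (params) => {
    const stream = create({ ...params, stream_options: { include_usage: true } });
    return (async function* () {
      for await (const chunk of stream) {
        if (chunk.usage) {
          logUsageEvent({
            tokensIn: chunk.usage.prompt_tokens,
            tokensOut: chunk.usage.completion_tokens,
          });
        }
        yield chunk;
      }
    })();
  };
}

// Demo driver with stubbed provider calls.
async function demo(): Promise<void> {
  const create = meterCreate(async (_params: object) => ({
    usage: { prompt_tokens: 10, completion_tokens: 5 },
  }));
  await create({});

  async function* fakeStream(): AsyncIterable<Chunk> {
    yield { delta: "hi" };
    yield { usage: { prompt_tokens: 7, completion_tokens: 3 } };
  }
  const stream = meterStream(() => fakeStream());
  for await (const _chunk of stream({})) {
    // passthrough: the caller still sees every chunk
  }
}
```

The real `createMeteredClient` Proxy does the same thing one level up, routing `chat.completions.create` and `embeddings.create` through these wrappers.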
Usage Event Schema
// packages/db/src/schema/usage.ts
export const usageEvents = pgTable("usage_events", {
id: uuid("id").defaultRandom().primaryKey(),
orgId: uuid("org_id"), // null for internal/unattributed
spaceId: uuid("space_id"), // null for system-level calls
eventType: text("event_type").notNull(), // "llm_chat", "llm_embedding", "doc_process", "agent_run"
model: text("model"), // "gpt-4o-mini", "text-embedding-3-small"
provider: text("provider"), // "openai", "openrouter"
taskType: text("task_type"), // "chat", "agent", "story", "quick", "metadata", "embedding"
tokensIn: integer("tokens_in"), // prompt_tokens
tokensOut: integer("tokens_out"), // completion_tokens
estimatedCost: numeric("estimated_cost", { precision: 10, scale: 6 }), // in USD
metadata: jsonb("metadata"), // Additional context (thread_id, agent_run_id, etc.)
createdAt: timestamp("created_at").defaultNow().notNull(),
});
// Indexes for billing queries
// CREATE INDEX idx_usage_events_org_month ON usage_events (org_id, date_trunc('month', created_at));
// CREATE INDEX idx_usage_events_space ON usage_events (space_id, created_at);
How Context (orgId, spaceId) Flows to the LLM Layer
The challenge: LLM calls happen deep in service code. How does the metering layer know which org/space to attribute the cost to?
Solution: AsyncLocalStorage (Node.js built-in, zero dependency)
// packages/llm/src/context.ts
import { AsyncLocalStorage } from "node:async_hooks";
interface LLMContext {
orgId?: string;
spaceId?: string;
userId?: string;
}
export const llmContext = new AsyncLocalStorage<LLMContext>();
// Used in API middleware:
// app.use("*", (c, next) => {
// return llmContext.run({ orgId: c.get("orgId"), spaceId: c.get("spaceId") }, next);
// });
// The metered client reads this automatically:
// const ctx = llmContext.getStore();
// logUsageEvent({ orgId: ctx?.orgId, spaceId: ctx?.spaceId, ... });
This means: middleware sets the context once per request, and every LLM call in that request chain automatically gets attributed to the right org/space. No need to pass orgId through 5 layers of function calls.
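A self-contained illustration of the pattern — `handleRequest` and `serviceLayer` are hypothetical stand-ins for the Hono middleware and the intermediate service code:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface LLMContext {
  orgId?: string;
  spaceId?: string;
}

const llmContext = new AsyncLocalStorage<LLMContext>();

// Deep in service code: no orgId/spaceId parameters anywhere.
async function makeLLMCall(): Promise<LLMContext | undefined> {
  // The metered client reads this and passes it to logUsageEvent().
  return llmContext.getStore();
}

async function serviceLayer(): Promise<LLMContext | undefined> {
  // ...several layers of function calls later, attribution still works.
  return makeLLMCall();
}

// Middleware equivalent: set the context once per request.
function handleRequest(orgId: string, spaceId: string): Promise<LLMContext | undefined> {
  return llmContext.run({ orgId, spaceId }, () => serviceLayer());
}
```

Because `AsyncLocalStorage` follows the async call chain, the context survives every `await` between the middleware and the LLM call, and is absent outside the `run()` scope.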
Pricing Lookup
// packages/llm/src/pricing.ts
// Merges the existing OPENAI_PRICING from apps/api/src/services/models.ts
// with OpenRouter pricing (from their /models endpoint, cached)
export function lookupModelPricing(model: string, provider: string): {
promptPer1M: number; // USD per 1M prompt tokens
completionPer1M: number; // USD per 1M completion tokens
} {
// OpenAI: use hardcoded table (already exists in models.ts)
// OpenRouter: their API returns pricing per model
// Ollama/LM Studio: cost = 0 (local models)
// Unknown: return { promptPer1M: 0, completionPer1M: 0 } (don't fail, just can't estimate cost)
}
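A runnable sketch under the assumptions above — the pricing entries are illustrative placeholders, not a maintained rate table:

```typescript
// Hypothetical pricing table; real values come from models.ts and the
// OpenRouter /models endpoint.
const OPENAI_PRICING: Record<string, { promptPer1M: number; completionPer1M: number }> = {
  "gpt-4o-mini": { promptPer1M: 0.15, completionPer1M: 0.6 },
  "text-embedding-3-small": { promptPer1M: 0.02, completionPer1M: 0 },
};

export function lookupModelPricing(
  model: string,
  provider: string
): { promptPer1M: number; completionPer1M: number } {
  // Local models (Ollama, LM Studio) cost nothing.
  if (provider === "ollama" || provider === "lmstudio") {
    return { promptPer1M: 0, completionPer1M: 0 };
  }
  // Unknown models: don't fail, just can't estimate cost.
  return OPENAI_PRICING[model] ?? { promptPer1M: 0, completionPer1M: 0 };
}

// Cost formula from the design: pricing is quoted per 1M tokens.
export function estimateCost(
  model: string,
  provider: string,
  tokensIn: number,
  tokensOut: number
): number {
  const p = lookupModelPricing(model, provider);
  return (tokensIn * p.promptPer1M + tokensOut * p.completionPer1M) / 1_000_000;
}
```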
What Changes for Each App
apps/api:
- Delete `src/lib/llm/index.ts` and `src/lib/llm/config.ts`
- Import `{ getSystemLLMConfig, getTaskLLMConfig, llm }` from `@condelo/llm`
- Add AsyncLocalStorage middleware to set org/space context
- Move `models.ts` pricing data to `packages/llm/src/pricing.ts`
- No changes to any service files — they already use the same function signatures
apps/data-plane:
- Delete `src/lib/llm/index.ts` and `src/lib/llm/config.ts`
- Import from `@condelo/llm`
- Add context propagation (data-plane receives orgId/spaceId from API calls via headers → set in AsyncLocalStorage)
- Remove the `getStoryLLMConfig` backward-compat alias (use `getTaskLLMConfig("story")`)
- No changes to any service files
packages/llm dependencies:
- `openai` — OpenAI SDK
- `langsmith` — LangSmith tracing wrapper
- `@condelo/db` — for `usage_events` table writes
- `@condelo/shared` — for LLMProvider types, PROVIDER_DEFAULTS
What Stays External
- LangSmith tracing: Kept as-is. The metered client wraps LangSmith wrapping — so you get both LangSmith traces AND local usage logging. LangSmith is for debugging/observability, local metering is for billing.
- Redis caching of system settings: the config functions call `cacheGet`/`cacheSet`. The package needs access to Redis — the simplest approach is passing a cache interface at init time rather than importing directly from each app's `lib/cache.ts`.
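One way that injection could look — the `CacheLike` shape here is an assumption, not the actual `lib/cache.ts` API:

```typescript
// Hypothetical cache interface the package accepts at init time.
export interface CacheLike {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds?: number): Promise<void>;
}

let cache: CacheLike | null = null;

// Each app calls this once at startup with its own Redis-backed adapter.
export function initLLMCache(impl: CacheLike): void {
  cache = impl;
}

// Config code reads through the injected cache, falling back to the loader.
export async function getCachedSetting(
  key: string,
  load: () => Promise<string>
): Promise<string> {
  const hit = await cache?.get(key);
  if (hit !== null && hit !== undefined) return hit;
  const value = await load();
  await cache?.set(key, value, 60); // 60s TTL, matching short-lived settings
  return value;
}
```

This keeps `packages/llm` free of any direct Redis dependency, and tests can inject an in-memory Map-backed adapter.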
Migration Path
- Create `packages/llm/` with unified config + metered client
- Add `usage_events` table to `packages/db/src/schema/`
- Update `apps/api` to import from `@condelo/llm`, delete `src/lib/llm/`
- Update `apps/data-plane` to import from `@condelo/llm`, delete `src/lib/llm/`
- Add AsyncLocalStorage middleware to both apps
- Run `npm run db:push` to create the `usage_events` table
- Verify: all LLM call sites now auto-log to `usage_events` — check with a few test queries
- Build usage aggregation queries (daily/monthly rollups by org) for the billing dashboard
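The final step's rollup logic, sketched over an in-memory array — in production this is a `GROUP BY org_id, date_trunc('month', created_at)` over `usage_events`:

```typescript
// In-memory sketch of the monthly rollup the billing dashboard needs.
interface UsageEvent {
  orgId: string;
  estimatedCost: number; // USD
  tokensIn: number;
  tokensOut: number;
  createdAt: Date;
}

// Buckets events by "orgId:YYYY-MM", summing cost and total tokens.
export function monthlyRollup(
  events: UsageEvent[]
): Map<string, { cost: number; tokens: number }> {
  const buckets = new Map<string, { cost: number; tokens: number }>();
  for (const e of events) {
    const month = e.createdAt.toISOString().slice(0, 7); // "YYYY-MM" (UTC)
    const key = `${e.orgId}:${month}`;
    const b = buckets.get(key) ?? { cost: 0, tokens: 0 };
    b.cost += e.estimatedCost;
    b.tokens += e.tokensIn + e.tokensOut;
    buckets.set(key, b);
  }
  return buckets;
}
```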
Effort Estimate
| Task | Effort |
|---|---|
| Create packages/llm/ package scaffolding | 2 hours |
| Merge two lib/llm/ into one (they're 95% identical) | 3 hours |
| Build MeteredOpenAI proxy (chat + embedding + streaming) | 1 day |
| Add usage_events schema + migration | 2 hours |
| Pricing lookup (port from models.ts + OpenRouter API) | 3 hours |
| AsyncLocalStorage context propagation in both apps | 3 hours |
| Update imports in API (9 call sites, all use same function names) | 2 hours |
| Update imports in Data Plane (29 call sites) | 3 hours |
| Cache interface abstraction (so package doesn't depend on each app's Redis) | 2 hours |
| Testing + verification | 1 day |
| Total | ~4-5 days |
API Decomposition
Current State
The codebase already has a good two-service split:
- API (port 6311): 20 route files — user-facing CRUD, auth, chat, orchestration
- Data Plane (port 6312): 15 route files — heavy processing, ML ops, workers
This split is sound. The Data Plane handles the expensive async work (embedding, agents, doc processing) while the API handles user-facing requests. Don't break this up further — the question is how to organize within each service.
Proposed Logical Groups (Within Existing Services)
Rather than splitting into microservices (operational overhead too high for 2 people), organize routes into logical domains with clear boundaries. This makes future extraction easy if needed.
apps/api/src/routes/
├── core/ ← Core Platform (always needed)
│ ├── health.ts
│ ├── spaces.ts
│ ├── documents.ts
│ ├── sources.ts
│ ├── webhook.ts ← Webhook ingestion (a source type)
│ └── threads.ts + messages.ts
│
├── intelligence/ ← AI Features (the value layer)
│ ├── feeds.ts
│ ├── agents.ts
│ ├── inferences.ts
│ ├── explorations.ts
│ ├── stories.ts
│ └── surfaces.ts
│
├── engagement/ ← User-facing notifications & events
│ ├── signals.ts
│ ├── notifications.ts
│ ├── events.ts
│ └── experiences.ts
│
├── admin/ ← Operator-only
│ ├── settings.ts
│ ├── wiki.ts
│ ├── billing.ts ← NEW: Stripe Connect management
│ ├── organizations.ts ← NEW: Client/partner management
│ └── usage.ts ← NEW: Usage dashboards
│
└── index.ts ← Route registration (unchanged)
The Data Plane stays as-is — it's already well-structured as an internal service.
Why Logical Groups, Not Microservices
| Approach | Operational Cost | Right For |
|---|---|---|
| Monolith with route groups (recommended now) | 1 deploy, 1 process, 1 log stream | 2-person team, < 20 clients |
| 2 services (current: API + Data Plane) | Already done, works well | Current scale |
| 4+ microservices | 4+ deploys, service discovery, distributed tracing | 10+ person team, 100+ clients |
Key insight: the bottleneck for the next 12 months is client acquisition and app building, not system architecture. Route groups give you clean code organization with zero operational overhead. You can extract services later if a specific group needs independent scaling.
What Actually Needs to Be New Services (Eventually)
Only two things genuinely benefit from separate deployment:
- Worker processes (already separate — Data Plane workers run via `dev:worker` and `dev:agent-worker`)
- Client apps (already separate — each is its own React Router 7 app)
Everything else stays in the monolith for now.
Target Architecture (Updated)
┌───────────────────────────────────────────────────────────────────┐
│ CLIENT APPS (separate stacks) │
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
│ │ CRM App │ │ Legal App │ │ Research App │ ... │
│ │ React + BFF │ │ React + BFF │ │ React + BFF │ │
│ │ + own DB │ │ + own DB │ │ + own DB │ │
│ └──────┬────────┘ └──────┬────────┘ └──────┬────────┘ │
│ └──────────────────┼──────────────────┘ │
│ │ API Key + X-Space-Id │
└───────────────────────────┼───────────────────────────────────────┘
│
┌───────────────────────────┼───────────────────────────────────────┐
│ CONDELO PLATFORM ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ API (Hono, port 6311) │ │
│ │ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Core │ │ Intelligence │ │ Admin │ │ │
│ │ │ spaces │ │ feeds,agents │ │ billing,orgs │ │ │
│ │ │ docs,src│ │ explorations │ │ settings │ │ │
│ │ │ threads │ │ stories │ │ usage │ │ │
│ │ │ webhooks│ └──────────────┘ └──────────────┘ │ │
│ │ └──────────┘ ┌──────────────┐ │ │
│ │ │ Engagement │ │ │
│ │ │ signals │ │ │
│ │ │ events,notif│ │ │
│ │ └──────────────┘ │ │
│ └─────────────────────────┼───────────────────────────────────┘ │
│ │ │
│ ┌─────────────────────────┼───────────────────────────────────┐ │
│ │ DATA PLANE (port 6312) │ │
│ │ ┌─────────────┐ ┌──────────────┐ ┌─────────────┐ │ │
│ │ │ Ingestion │ │ AI Agents │ │ Search & │ │ │
│ │ │ upload, │ │ research, │ │ Retrieval │ │ │
│ │ │ convert, │ │ stories, │ │ vector, │ │ │
│ │ │ chunk,embed│ │ wiki, etc. │ │ keyword │ │ │
│ │ └─────────────┘ └──────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────┼──────────────────────────────────┐ │
│ │ @condelo/llm (shared package) │ │
│ │ ┌─────────────┐ ┌──────────────┐ ┌─────────────┐ │ │
│ │ │ Client │ │ Metering │ │ Config │ │ │
│ │ │ factory │ │ token count │ │ unified │ │ │
│ │ │ (OpenAI) │ │ cost track │ │ DB + env │ │ │
│ │ └─────────────┘ └──────┬───────┘ └─────────────┘ │ │
│ └──────────────────────────┼──────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────┼──────────────────────────────────┐ │
│ │ DATA LAYER (shared, RLS-isolated) │ │
│ │ ┌──────────┐ ┌──────┐ ┌───────┐ ┌───────┐ │ │
│ │ │ Postgres │ │Qdrant│ │ Redis │ │ MinIO │ │ │
│ │ │+usage_ │ │ │ │ │ │ │ │ │
│ │ │ events │ │ │ │ │ │ │ │ │
│ │ └──────────┘ └──────┘ └───────┘ └───────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────┘
Key Architectural Decision: Logical vs Physical Isolation
Recommendation: Logical isolation (shared infra, RLS boundaries) for now.
Reasons:
- Already works — spaces + RLS provide strong data isolation
- Lower operational cost (one Postgres, one Qdrant)
- Simpler to manage with a 2-person team
- Scale to ~20 clients before needing dedicated databases
Physical isolation (per-client databases) becomes necessary when:
- Client demands it contractually (regulated industries)
- Performance isolation required (noisy neighbor problems)
- Data residency requirements (different regions)