Platform Strategy

What needs to change before the first paying client: multi-tenancy, metering, deployment, and the target architecture.

Platform Gaps

Critical blockers and timeline for multi-tenancy, API keys, billing, and deployment.

LLM Plane

Centralized AI metering via a shared package — zero latency overhead, automatic token tracking.

API Decomposition

Logical route groups within the existing two-service split — no microservices overhead.

Target Architecture

Client apps (own stack + BFF) consuming the Condelo platform via API keys.

Platform Strategy

Platform Gaps (Must Fix Before First Paying Client)

Critical (Blocking)

Gap | Current State | What's Needed | Effort
Multi-tenancy / Orgs | Single user owns spaces. No orgs, teams, or shared access | Organizations table, membership, role-based access, RLS rewrite | 2-3 weeks
Client isolation | All clients share one Postgres + Qdrant + Redis | Per-client spaces (logical isolation) or per-client databases (physical). Logical is fine initially | Already works via spaces
API key management | Demo apps use hardcoded Bearer tokens from env vars | API key table, key generation, rotation, scoping (per-space, per-app), rate limits | 1-2 weeks
Usage metering | Zero tracking of LLM tokens, API calls, storage, or processing | Usage events table, middleware to log every metered action, aggregation for billing | 2-3 weeks
Stripe Connect | Nothing exists | Stripe Connect for marketplace billing: end-user subscriptions, 30/70 auto-split, partner payouts, revenue dashboards | 2-3 weeks
Deployment | No Dockerfiles, no CI/CD, local-only | Dockerfiles for all services, docker-compose.prod.yml, CI/CD pipeline, secrets management | 2-3 weeks
HTTPS / Domain | Hardcoded localhost CORS | Proper domain setup, TLS, environment-aware CORS, reverse proxy | 1 week
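The API key row above implies a standard pattern: generate a high-entropy key, show it to the client once, and store only a hash for verification. A minimal sketch of that pattern using Node's crypto module (the `cnd_live` prefix and helper names are illustrative, not an existing Condelo convention):

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Generate a key like "cnd_live_<random>"; return the plaintext (shown once)
// and the SHA-256 hash (the only thing persisted in the api_keys table).
export function generateApiKey(prefix = "cnd_live"): { key: string; hash: string } {
  const key = `${prefix}_${randomBytes(24).toString("base64url")}`;
  return { key, hash: hashApiKey(key) };
}

export function hashApiKey(key: string): string {
  return createHash("sha256").update(key).digest("hex");
}

// Constant-time comparison of the presented key's hash against the stored hash.
export function verifyApiKey(presented: string, storedHash: string): boolean {
  const a = Buffer.from(hashApiKey(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Scoping (per-space, per-app) and rate limits would hang off the key's database row; the hash is the lookup column.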

Important (First 3 months)

Gap | What's Needed | Effort
Rate limiting | Per-key, per-endpoint rate limits (token bucket or similar) | 1 week
Admin dashboard | Platform operator view: all clients, usage, billing, health | 2-3 weeks
Client onboarding flow | Automated: create org, provision space, generate API keys, configure webhooks | 1-2 weeks
Backup & recovery | Automated PostgreSQL + Qdrant backups, tested recovery procedure | 1 week
Monitoring & alerting | Proper dashboards, PagerDuty/Slack alerts for downtime, errors, queue backlogs | 1 week
App template / SDK | Starter kit for building apps on top of Condelo (auth, API client, common patterns) | 1-2 weeks

Nice to Have (6+ months)

Gap | What's Needed
RBAC (fine-grained permissions) | Beyond admin/member — editor, viewer, custom roles
Audit logging | Who did what, when, for compliance-heavy clients
Multi-region deployment | For clients requiring data residency
White-labeling | Custom domains, branding per client app
Webhook delivery guarantees | Retry logic, dead letter queue, delivery status
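The retry half of those webhook delivery guarantees is a small, well-understood building block. A sketch of exponential backoff with jitter (function and option names are illustrative, not an existing Condelo API):

```typescript
// Retry a delivery attempt with exponential backoff plus jitter.
// A real implementation would move the event to a dead letter queue
// after exhausting retries instead of throwing.
export async function deliverWithRetry<T>(
  attempt: () => Promise<T>,
  opts: { maxAttempts?: number; baseDelayMs?: number } = {}
): Promise<T> {
  const { maxAttempts = 5, baseDelayMs = 500 } = opts;
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      if (i < maxAttempts - 1) {
        // 500ms, 1s, 2s, 4s, ... with jitter to avoid thundering herds
        const delay = baseDelayMs * 2 ** i + Math.random() * 100;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```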

LLM Plane: Centralized AI Metering

The Problem

LLM calls are scattered across ~38 call sites in two apps (API + Data Plane), using two nearly-identical abstraction layers. There's zero local token tracking — only LangSmith (external). To bill clients on usage, you need to meter every LLM call and attribute costs to an org/space.

Current State

apps/api/src/lib/llm/           ← LLM abstraction (config + client factory)
apps/data-plane/src/lib/llm/    ← Nearly identical copy

Call sites:
  API:         9 call sites (chat, exploration, feed/agent suggestions)
  Data Plane: 29 call sites (embedding, metadata, agents, stories, wiki, etc.)
  Total:      38 call sites, 0 token tracking

Three Options for the LLM Plane

Option A: Separate HTTP Service (LLM Proxy)

[API] ──HTTP──→ [LLM Plane :6316] ──HTTP──→ [OpenAI/OpenRouter]
[Data Plane] ──HTTP──→ [LLM Plane :6316] ──HTTP──→ [OpenAI/OpenRouter]

A new Hono service that proxies all LLM calls, meters them, and forwards to providers.

Pros | Cons
Complete isolation — all LLM traffic flows through one point | Extra network hop adds latency (~5-20ms per call)
Can enforce rate limits, quotas, model routing centrally | New service to deploy, monitor, and keep running
Can swap providers without touching calling code | Streaming passthrough is complex to implement correctly
Could serve external clients later (API-as-a-service) | Operational overhead for a 2-person team
Option B: Shared Package (In-Process Metering)

packages/llm/                   ← New shared package
├── src/
│   ├── client.ts              ← Metered OpenAI client wrapper
│   ├── config.ts              ← Unified config (replaces both copies)
│   ├── metering.ts            ← Token counting + cost estimation + DB logging
│   ├── providers.ts           ← Provider registry (OpenAI, OpenRouter, Ollama, etc.)
│   └── index.ts

Both API and Data Plane import @condelo/llm instead of their own lib/llm/. The metering layer wraps every client.chat.completions.create() and client.embeddings.create() call, extracts response.usage, and logs to the usage_events table.

Pros | Cons
Zero latency overhead (same process) | Both apps must share the same package version
Eliminates duplicated code (two lib/llm/ folders → one package) | Token tracking is in-process, not centralized
Simple to implement — wrap existing client factory | Can't rate-limit across services (each service limits independently)
No new service to deploy or monitor |
Metering data still goes to shared DB — same visibility |

Option C: External LLM Proxy (LiteLLM, Helicone, etc.)

Use an existing open-source proxy like LiteLLM or a managed service like Helicone.

Pros | Cons
Battle-tested, handles streaming, retries, fallbacks | External dependency
Built-in dashboards and cost tracking | Another service to deploy (LiteLLM) or monthly fee (Helicone ~$40/mo)
Model routing, load balancing, caching | May not integrate cleanly with your metering/billing needs

Recommendation: Option B (Shared Package)

For a 2-person team, a shared package is the right balance:

  1. Eliminates duplication — the two near-identical lib/llm/ folders merge into packages/llm/
  2. Zero operational overhead — no new service to deploy
  3. Automatic metering — wraps the OpenAI client to extract response.usage after every call
  4. Attributes costs — every call tagged with { orgId, spaceId, taskType, model } and logged to usage_events
  5. Simple upgrade path — if you later need a separate service (Option A), the package becomes the client SDK for it

Detailed Implementation Design

Package Structure

packages/llm/
├── src/
│   ├── index.ts                 # Public API: createLLM(), types
│   ├── client.ts                # MeteredOpenAI client (wraps OpenAI SDK)
│   ├── config.ts                # Unified config (merges both copies)
│   ├── metering.ts              # Usage logging to DB
│   ├── pricing.ts               # Model pricing lookup (from models.ts)
│   ├── providers.ts             # Provider registry + key resolution
│   └── types.ts                 # Shared types
├── package.json                 # Dependencies: openai, @condelo/db, @condelo/shared, langsmith
└── tsconfig.json

How Metering Works (Zero Changes to Callers)

The key insight: metering wraps the OpenAI client at creation time, so all 38 call sites are metered automatically without any code changes.

Currently callers do:

// apps/api/src/services/chat.ts (line 319-320)
const { client, model } = await getSystemLLMConfig();
const response = await client.chat.completions.create({ model, messages, ... });
// response.usage is IGNORED — tokens discarded

After the migration, callers still do the exact same thing:

const { client, model } = await getSystemLLMConfig();
const response = await client.chat.completions.create({ model, messages, ... });
// But now 'client' is a MeteredOpenAI that auto-logs usage

MeteredOpenAI Client Design

// packages/llm/src/client.ts

import OpenAI from "openai";
import { logUsageEvent } from "./metering.js";
import { lookupModelPricing } from "./pricing.js";

interface MeteringContext {
  orgId?: string;     // Set per-request via AsyncLocalStorage or explicit param
  spaceId?: string;
  taskType: string;   // "chat", "embedding", "agent", "story", "quick", "metadata"
}

// Method paths whose responses carry a usage object
const METERED = new Set(["chat.completions.create", "embeddings.create"]);

// Returns a Proxy around the OpenAI client that intercepts chat.completions.create()
// and embeddings.create() to capture response.usage. After each response resolves:
//   1. Extract response.usage.prompt_tokens + completion_tokens
//   2. Look up pricing: lookupModelPricing(model, provider)
//   3. Calculate cost: (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1_000_000
//   4. Fire-and-forget: logUsageEvent({ ...context, model, tokensIn, tokensOut, cost })
export function createMeteredClient(
  baseClient: OpenAI,
  defaultContext: MeteringContext
): OpenAI {
  const wrap = (target: object, path: string[]): any =>
    new Proxy(target, {
      get(obj: any, prop) {
        const value = obj[prop];
        if (typeof prop === "symbol") return value;
        const fullPath = [...path, String(prop)].join(".");

        if (typeof value === "function" && METERED.has(fullPath)) {
          return async (...args: any[]) => {
            const response = await value.apply(obj, args);
            if (response?.usage) {
              const { prompt_tokens = 0, completion_tokens = 0 } = response.usage;
              const model = args[0]?.model ?? "unknown";
              const { promptPer1M, completionPer1M } = lookupModelPricing(model, "openai");
              const cost = (prompt_tokens * promptPer1M + completion_tokens * completionPer1M) / 1_000_000;
              // Fire-and-forget: metering must never fail or slow the caller
              void Promise.resolve(
                logUsageEvent({ ...defaultContext, model, tokensIn: prompt_tokens, tokensOut: completion_tokens, cost })
              ).catch(() => {});
            }
            return response;
          };
        }

        // Recurse into namespaces on the way down (client.chat, chat.completions)
        if (typeof value === "object" && value !== null && path.length < 2) {
          return wrap(value, [...path, String(prop)]);
        }
        return typeof value === "function" ? value.bind(obj) : value;
      },
    });

  // Streaming (stream: true) additionally needs stream_options: { include_usage: true }
  // injected and the final usage chunk captured; see the streaming note below.
  return wrap(baseClient, []) as OpenAI;
}

Critical detail for streaming: OpenAI's streaming API can include usage in the final chunk when you set stream_options: { include_usage: true }. The proxy automatically injects this option so streaming calls are metered too — callers don't need to change anything.
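A concrete sketch of that final-chunk capture: the proxy can wrap the SDK's stream in a pass-through async generator that reports usage when it appears. The chunk type here is a simplified stand-in for the OpenAI SDK's, and `meterStream` is an illustrative name, not an existing API:

```typescript
// Simplified stand-in for the OpenAI SDK's streamed chunk type
interface StreamChunk {
  choices: unknown[];
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

// Yield every chunk unchanged; when the final chunk carries usage
// (present when stream_options: { include_usage: true } is set),
// report it to the metering callback.
export async function* meterStream(
  stream: AsyncIterable<StreamChunk>,
  onUsage: (usage: { prompt_tokens: number; completion_tokens: number }) => void
): AsyncGenerator<StreamChunk> {
  for await (const chunk of stream) {
    if (chunk.usage) onUsage(chunk.usage); // only the final chunk has usage
    yield chunk;
  }
}
```

Because the generator yields chunks as they arrive, the caller's streaming experience is unchanged; metering happens as a side effect of consumption.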

Usage Event Schema

// packages/db/src/schema/usage.ts
export const usageEvents = pgTable("usage_events", {
  id: uuid("id").defaultRandom().primaryKey(),
  orgId: uuid("org_id"),                              // null for internal/unattributed
  spaceId: uuid("space_id"),                           // null for system-level calls
  eventType: text("event_type").notNull(),             // "llm_chat", "llm_embedding", "doc_process", "agent_run"
  model: text("model"),                                // "gpt-4o-mini", "text-embedding-3-small"
  provider: text("provider"),                          // "openai", "openrouter"
  taskType: text("task_type"),                         // "chat", "agent", "story", "quick", "metadata", "embedding"
  tokensIn: integer("tokens_in"),                      // prompt_tokens
  tokensOut: integer("tokens_out"),                    // completion_tokens
  estimatedCost: numeric("estimated_cost", { precision: 10, scale: 6 }), // in USD
  metadata: jsonb("metadata"),                         // Additional context (thread_id, agent_run_id, etc.)
  createdAt: timestamp("created_at").defaultNow().notNull(),
});

// Indexes for billing queries
// CREATE INDEX idx_usage_events_org_month ON usage_events (org_id, date_trunc('month', created_at));
// CREATE INDEX idx_usage_events_space ON usage_events (space_id, created_at);

How Context (orgId, spaceId) Flows to the LLM Layer

The challenge: LLM calls happen deep in service code. How does the metering layer know which org/space to attribute the cost to?

Solution: AsyncLocalStorage (Node.js built-in, zero dependency)

// packages/llm/src/context.ts
import { AsyncLocalStorage } from "node:async_hooks";

interface LLMContext {
  orgId?: string;
  spaceId?: string;
  userId?: string;
}

export const llmContext = new AsyncLocalStorage<LLMContext>();

// Used in API middleware:
// app.use("*", (c, next) => {
//   return llmContext.run({ orgId: c.get("orgId"), spaceId: c.get("spaceId") }, next);
// });

// The metered client reads this automatically:
// const ctx = llmContext.getStore();
// logUsageEvent({ orgId: ctx?.orgId, spaceId: ctx?.spaceId, ... });

This means: middleware sets the context once per request, and every LLM call in that request chain automatically gets attributed to the right org/space. No need to pass orgId through 5 layers of function calls.
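A self-contained demonstration of that property, mirroring the context.ts sketch above (the `simulatedLLMCall` / `handleRequest` names are illustrative): context set once at the boundary survives await points and stays isolated between concurrent requests.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface LLMContext {
  orgId?: string;
  spaceId?: string;
}

const llmContext = new AsyncLocalStorage<LLMContext>();

// Imagine this sits three service layers below the middleware —
// no orgId parameter is threaded through.
async function simulatedLLMCall(): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 1)); // cross an async boundary
  return llmContext.getStore()?.orgId ?? "unattributed";
}

// Stand-in for the per-request middleware
export async function handleRequest(orgId: string): Promise<string> {
  return llmContext.run({ orgId }, () => simulatedLLMCall());
}

export { simulatedLLMCall };
```

Two requests running concurrently each see only their own context, which is exactly the guarantee the metering layer relies on.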

Pricing Lookup

// packages/llm/src/pricing.ts
// Merges the existing OPENAI_PRICING from apps/api/src/services/models.ts
// with OpenRouter pricing (from their /models endpoint, cached)

export function lookupModelPricing(model: string, provider: string): {
  promptPer1M: number;   // USD per 1M prompt tokens
  completionPer1M: number; // USD per 1M completion tokens
} {
  // OpenAI: use hardcoded table (already exists in models.ts)
  // OpenRouter: their API returns pricing per model
  // Ollama/LM Studio: cost = 0 (local models)
  // Unknown: return { promptPer1M: 0, completionPer1M: 0 } (don't fail, just can't estimate cost)
}
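The cost formula itself is a one-liner worth pinning down. A worked sketch (the per-million prices in the example are illustrative placeholders, not live rates):

```typescript
// Cost in USD for one call, given per-1M-token pricing.
export function estimateCost(
  tokensIn: number,
  tokensOut: number,
  pricing: { promptPer1M: number; completionPer1M: number }
): number {
  return (tokensIn * pricing.promptPer1M + tokensOut * pricing.completionPer1M) / 1_000_000;
}

// e.g. 1,200 prompt + 300 completion tokens at $0.15 / $0.60 per 1M
// ≈ (1200 * 0.15 + 300 * 0.60) / 1_000_000 ≈ 0.00036 USD
```

Local models (Ollama, LM Studio) fall out naturally: both prices are 0, so the cost is 0.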

What Changes for Each App

apps/api:

  • Delete src/lib/llm/index.ts and src/lib/llm/config.ts
  • Import { getSystemLLMConfig, getTaskLLMConfig, llm } from @condelo/llm
  • Add AsyncLocalStorage middleware to set org/space context
  • Move models.ts pricing data to packages/llm/src/pricing.ts
  • No changes to any service files — they already use the same function signatures

apps/data-plane:

  • Delete src/lib/llm/index.ts and src/lib/llm/config.ts
  • Import from @condelo/llm
  • Add context propagation (data-plane receives orgId/spaceId from API calls via headers → set in AsyncLocalStorage)
  • Remove getStoryLLMConfig backward-compat alias (use getTaskLLMConfig("story"))
  • No changes to any service files

packages/llm dependencies:

  • openai — OpenAI SDK
  • langsmith — LangSmith tracing wrapper
  • @condelo/db — For usage_events table writes
  • @condelo/shared — For LLMProvider types, PROVIDER_DEFAULTS

What Stays External

  • LangSmith tracing: Kept as-is. The metered client wraps LangSmith wrapping — so you get both LangSmith traces AND local usage logging. LangSmith is for debugging/observability, local metering is for billing.
  • Redis caching of system settings: The config functions call cacheGet/cacheSet. The package needs access to Redis — simplest approach is passing a cache interface at init time rather than importing directly from each app's lib/cache.ts.
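The cache-injection idea above can be sketched as a small interface passed at init time. The `LLMCache` / `initLLM` / `cachedConfig` names are hypothetical, not existing Condelo APIs; each app would adapt its own Redis helpers (cacheGet/cacheSet) to this shape at startup:

```typescript
// Minimal cache interface the package accepts at init time, so it never
// imports an app's Redis client directly.
export interface LLMCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds?: number): Promise<void>;
}

let cache: LLMCache | undefined;

export function initLLM(opts: { cache: LLMCache }): void {
  cache = opts.cache;
}

// Config loaders use the injected cache; if none is configured,
// they just load fresh every time.
export async function cachedConfig(
  key: string,
  load: () => Promise<string>
): Promise<string> {
  const hit = await cache?.get(key);
  if (hit !== null && hit !== undefined) return hit;
  const value = await load();
  await cache?.set(key, value, 60);
  return value;
}
```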

Migration Path

  1. Create packages/llm/ with unified config + metered client
  2. Add usage_events table to packages/db/src/schema/
  3. Update apps/api to import from @condelo/llm, delete src/lib/llm/
  4. Update apps/data-plane to import from @condelo/llm, delete src/lib/llm/
  5. Add AsyncLocalStorage middleware to both apps
  6. Run npm run db:push to create usage_events table
  7. Verify: all LLM call sites now auto-log to usage_events — check with a few test queries
  8. Build usage aggregation queries (daily/monthly rollups by org) for the billing dashboard
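The rollup in step 8 would be a SQL GROUP BY over usage_events in production; an in-memory sketch shows the shape of the query and its output (types simplified from the schema above, `rollupByOrgMonth` is an illustrative name):

```typescript
interface UsageEvent {
  orgId: string;
  tokensIn: number;
  tokensOut: number;
  estimatedCost: number;
  createdAt: Date;
}

interface MonthlyRollup {
  orgId: string;
  month: string; // "2026-01" — the date_trunc('month', ...) bucket
  tokensIn: number;
  tokensOut: number;
  estimatedCost: number;
}

// Group events by (org, calendar month), summing tokens and cost.
export function rollupByOrgMonth(events: UsageEvent[]): MonthlyRollup[] {
  const buckets = new Map<string, MonthlyRollup>();
  for (const e of events) {
    const month = e.createdAt.toISOString().slice(0, 7);
    const key = `${e.orgId}:${month}`;
    const b = buckets.get(key) ?? { orgId: e.orgId, month, tokensIn: 0, tokensOut: 0, estimatedCost: 0 };
    b.tokensIn += e.tokensIn;
    b.tokensOut += e.tokensOut;
    b.estimatedCost += e.estimatedCost;
    buckets.set(key, b);
  }
  return [...buckets.values()];
}
```

The SQL equivalent groups on `(org_id, date_trunc('month', created_at))`, which is what the idx_usage_events_org_month index above is for.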

Effort Estimate

Task | Effort
Create packages/llm/ package scaffolding | 2 hours
Merge two lib/llm/ into one (they're 95% identical) | 3 hours
Build MeteredOpenAI proxy (chat + embedding + streaming) | 1 day
Add usage_events schema + migration | 2 hours
Pricing lookup (port from models.ts + OpenRouter API) | 3 hours
AsyncLocalStorage context propagation in both apps | 3 hours
Update imports in API (9 call sites, all use same function names) | 2 hours
Update imports in Data Plane (29 call sites) | 3 hours
Cache interface abstraction (so package doesn't depend on each app's Redis) | 2 hours
Testing + verification | 1 day
Total | ~4-5 days

API Decomposition

Current State

The codebase already has a good two-service split:

  • API (port 6311): 20 route files — user-facing CRUD, auth, chat, orchestration
  • Data Plane (port 6312): 15 route files — heavy processing, ML ops, workers

This split is sound. The Data Plane handles the expensive async work (embedding, agents, doc processing) while the API handles user-facing requests. Don't break this up further — the question is how to organize within each service.

Proposed Logical Groups (Within Existing Services)

Rather than splitting into microservices (operational overhead too high for 2 people), organize routes into logical domains with clear boundaries. This makes future extraction easy if needed.

apps/api/src/routes/
├── core/                       ← Core Platform (always needed)
│   ├── health.ts
│   ├── spaces.ts
│   ├── documents.ts
│   ├── sources.ts
│   ├── webhook.ts              ← Webhook ingestion (a source type)
│   └── threads.ts + messages.ts
│
├── intelligence/               ← AI Features (the value layer)
│   ├── feeds.ts
│   ├── agents.ts
│   ├── inferences.ts
│   ├── explorations.ts
│   ├── stories.ts
│   └── surfaces.ts
│
├── engagement/                 ← User-facing notifications & events
│   ├── signals.ts
│   ├── notifications.ts
│   ├── events.ts
│   └── experiences.ts
│
├── admin/                      ← Operator-only
│   ├── settings.ts
│   ├── wiki.ts
│   ├── billing.ts              ← NEW: Stripe Connect management
│   ├── organizations.ts        ← NEW: Client/partner management
│   └── usage.ts                ← NEW: Usage dashboards
│
└── index.ts                    ← Route registration (unchanged)

The Data Plane stays as-is — it's already well-structured as an internal service.

Why Logical Groups, Not Microservices

Approach | Operational Cost | Right For
Monolith with route groups (recommended now) | 1 deploy, 1 process, 1 log stream | 2-person team, < 20 clients
2 services (current: API + Data Plane) | Already done, works well | Current scale
4+ microservices | 4+ deploys, service discovery, distributed tracing | 10+ person team, 100+ clients

Key insight: the bottleneck for the next 12 months is client acquisition and app building, not system architecture. Route groups give you clean code organization with zero operational overhead. You can extract services later if a specific group needs independent scaling.

What Actually Needs to Be New Services (Eventually)

Only two things genuinely benefit from separate deployment:

  1. Worker processes (already separate — Data Plane workers run via dev:worker and dev:agent-worker)
  2. Client apps (already separate — each is its own React Router 7 app)

Everything else stays in the monolith for now.


Target Architecture (Updated)

┌───────────────────────────────────────────────────────────────────┐
│                  CLIENT APPS (separate stacks)                    │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐          │
│  │   CRM App     │  │  Legal App    │  │  Research App │  ...     │
│  │ React + BFF   │  │ React + BFF   │  │ React + BFF   │          │
│  │ + own DB      │  │ + own DB      │  │ + own DB      │          │
│  └──────┬────────┘  └──────┬────────┘  └──────┬────────┘          │
│         └──────────────────┼──────────────────┘                   │
│                            │ API Key + X-Space-Id                 │
└───────────────────────────┼───────────────────────────────────────┘
                            │
┌───────────────────────────┼───────────────────────────────────────┐
│  CONDELO PLATFORM          ▼                                      │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                  API  (Hono, port 6311)                     │  │
│  │  ┌──────────┐  ┌──────────────┐  ┌──────────────┐           │  │
│  │  │  Core    │  │ Intelligence │  │    Admin     │           │  │
│  │  │  spaces  │  │ feeds,agents │  │ billing,orgs │           │  │
│  │  │  docs,src│  │ explorations │  │ settings     │           │  │
│  │  │  threads │  │ stories      │  │ usage        │           │  │
│  │  │  webhooks│  └──────────────┘  └──────────────┘           │  │
│  │  └──────────┘  ┌──────────────┐                             │  │
│  │                │  Engagement  │                             │  │
│  │                │  signals     │                             │  │
│  │                │  events,notif│                             │  │
│  │                └──────────────┘                             │  │
│  └─────────────────────────┼───────────────────────────────────┘  │
│                            │                                      │
│  ┌─────────────────────────┼───────────────────────────────────┐  │
│  │            DATA PLANE (port 6312)                           │  │
│  │  ┌─────────────┐  ┌──────────────┐  ┌─────────────┐         │  │
│  │  │  Ingestion  │  │  AI Agents   │  │  Search &   │         │  │
│  │  │  upload,    │  │  research,   │  │  Retrieval  │         │  │
│  │  │  convert,   │  │  stories,    │  │  vector,    │         │  │
│  │  │  chunk,embed│  │  wiki, etc.  │  │  keyword    │         │  │
│  │  └─────────────┘  └──────────────┘  └─────────────┘         │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                            │                                      │
│  ┌──────────────────────────┼──────────────────────────────────┐  │
│  │        @condelo/llm  (shared package)                       │  │
│  │  ┌─────────────┐  ┌──────────────┐  ┌─────────────┐         │  │
│  │  │   Client    │  │   Metering   │  │   Config    │         │  │
│  │  │  factory    │  │ token count  │  │  unified    │         │  │
│  │  │  (OpenAI)   │  │ cost track   │  │  DB + env   │         │  │
│  │  └─────────────┘  └──────┬───────┘  └─────────────┘         │  │
│  └──────────────────────────┼──────────────────────────────────┘  │
│                             │                                     │
│  ┌──────────────────────────┼──────────────────────────────────┐  │
│  │         DATA LAYER (shared, RLS-isolated)                   │  │
│  │  ┌──────────┐  ┌──────┐  ┌───────┐  ┌───────┐               │  │
│  │  │ Postgres │  │Qdrant│  │ Redis │  │ MinIO │               │  │
│  │  │+usage_   │  │      │  │       │  │       │               │  │
│  │  │ events   │  │      │  │       │  │       │               │  │
│  │  └──────────┘  └──────┘  └───────┘  └───────┘               │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘

Key Architectural Decision: Logical vs Physical Isolation

Recommendation: Logical isolation (shared infra, RLS boundaries) for now.

Reasons:

  • Already works — spaces + RLS provide strong data isolation
  • Lower operational cost (one Postgres, one Qdrant)
  • Simpler to manage with a 2-person team
  • Scale to ~20 clients before needing dedicated databases

Physical isolation (per-client databases) becomes necessary when:

  • Client demands it contractually (regulated industries)
  • Performance isolation required (noisy neighbor problems)
  • Data residency requirements (different regions)

Making the unknown, known.

© 2026 Condelo. All rights reserved.