Platform Strategy

What needs to change before the first paying client: multi-tenancy, metering, deployment, and the target architecture.

Platform Gaps

Critical blockers and timeline for multi-tenancy, API keys, billing, and deployment.

LLM Plane

Centralized AI metering via a shared package — zero latency overhead, automatic token tracking.

API Decomposition

Logical route groups within the existing two-service split — no microservices overhead.

Target Architecture

Client apps (own stack + BFF) consuming the Condelo platform via API keys.

Platform Strategy

Platform Gaps (Must Fix Before First Paying Client)

Critical (Blocking)

Gap | Current State | What's Needed | Effort
Multi-tenancy / Orgs | Single user owns spaces. No orgs, teams, or shared access | Organizations table, membership, role-based access, RLS rewrite | 2-3 weeks
Client isolation | All clients share one Postgres + Qdrant + Redis | Per-client spaces (logical isolation) or per-client databases (physical). Logical is fine initially | Already works via spaces
API key management | Demo apps use hardcoded Bearer tokens from env vars | API key table, key generation, rotation, scoping (per-space, per-app), rate limits | 1-2 weeks
Usage metering | Zero tracking of LLM tokens, API calls, storage, or processing | Usage events table, middleware to log every metered action, aggregation for billing | 2-3 weeks
Stripe Connect | Nothing exists | Stripe Connect for marketplace billing: end-user subscriptions, 30/70 auto-split, partner payouts, revenue dashboards | 2-3 weeks
Deployment | No Dockerfiles, no CI/CD, local-only | Dockerfiles for all services, docker-compose.prod.yml, CI/CD pipeline, secrets management | 2-3 weeks
HTTPS / Domain | Hardcoded localhost CORS | Proper domain setup, TLS, environment-aware CORS, reverse proxy | 1 week
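The API key row above implies a standard pattern: generate a high-entropy key, show it to the client once, and store only a hash for verification. A minimal sketch of that pattern using Node's crypto module (the `cnd_live` prefix and helper names are illustrative, not an existing Condelo convention):

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Generate a key like "cnd_live_<random>"; return the plaintext (shown once)
// and the SHA-256 hash (the only thing persisted in the api_keys table).
export function generateApiKey(prefix = "cnd_live"): { key: string; hash: string } {
  const key = `${prefix}_${randomBytes(24).toString("base64url")}`;
  return { key, hash: hashApiKey(key) };
}

export function hashApiKey(key: string): string {
  return createHash("sha256").update(key).digest("hex");
}

// Constant-time comparison of the presented key's hash against the stored hash.
export function verifyApiKey(presented: string, storedHash: string): boolean {
  const a = Buffer.from(hashApiKey(presented), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Scoping (per-space, per-app) and rate limits would hang off the key's database row; the hash is the lookup column.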

Important (First 3 months)

Gap | What's Needed | Effort
Rate limiting | Per-key, per-endpoint rate limits (token bucket or similar) | 1 week
Admin dashboard | Platform operator view: all clients, usage, billing, health | 2-3 weeks
Client onboarding flow | Automated: create org, provision space, generate API keys, configure webhooks | 1-2 weeks
Backup & recovery | Automated PostgreSQL + Qdrant backups, tested recovery procedure | 1 week
Monitoring & alerting | Proper dashboards, PagerDuty/Slack alerts for downtime, errors, queue backlogs | 1 week
App template / SDK | Starter kit for building apps on top of Condelo (auth, API client, common patterns) | 1-2 weeks

Nice to Have (6+ months)

Gap | What's Needed
RBAC (fine-grained permissions) | Beyond admin/member — editor, viewer, custom roles
Audit logging | Who did what, when, for compliance-heavy clients
Multi-region deployment | For clients requiring data residency
White-labeling | Custom domains, branding per client app
Webhook delivery guarantees | Retry logic, dead letter queue, delivery status
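The retry half of those webhook delivery guarantees is a small, well-understood building block. A sketch of exponential backoff with jitter (function and option names are illustrative, not an existing Condelo API):

```typescript
// Retry a delivery attempt with exponential backoff plus jitter.
// A real implementation would move the event to a dead letter queue
// after exhausting retries instead of throwing.
export async function deliverWithRetry<T>(
  attempt: () => Promise<T>,
  opts: { maxAttempts?: number; baseDelayMs?: number } = {}
): Promise<T> {
  const { maxAttempts = 5, baseDelayMs = 500 } = opts;
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      if (i < maxAttempts - 1) {
        // 500ms, 1s, 2s, 4s, ... with jitter to avoid thundering herds
        const delay = baseDelayMs * 2 ** i + Math.random() * 100;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```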

LLM Plane: Centralized AI Metering

The Problem

LLM calls are scattered across ~38 call sites in two apps (API + Data Plane), using two nearly-identical abstraction layers. There's zero local token tracking — only LangSmith (external). To bill clients on usage, you need to meter every LLM call and attribute costs to an org/space.

Current State

apps/api/src/lib/llm/           ← LLM abstraction (config + client factory)
apps/data-plane/src/lib/llm/    ← Nearly identical copy

Call sites:
  API:         9 call sites (chat, exploration, feed/agent suggestions)
  Data Plane: 29 call sites (embedding, metadata, agents, stories, wiki, etc.)
  Total:      38 call sites, 0 token tracking

Three Options for the LLM Plane

Option A: Separate HTTP Service (LLM Proxy)

[API] ──HTTP──→ [LLM Plane :6316] ──HTTP──→ [OpenAI/OpenRouter]
[Data Plane] ──HTTP──→ [LLM Plane :6316] ──HTTP──→ [OpenAI/OpenRouter]

A new Hono service that proxies all LLM calls, meters them, and forwards to providers.

Pros | Cons
Complete isolation — all LLM traffic flows through one point | Extra network hop adds latency (~5-20ms per call)
Can enforce rate limits, quotas, model routing centrally | New service to deploy, monitor, and keep running
Can swap providers without touching calling code | Streaming passthrough is complex to implement correctly
Could serve external clients later (API-as-a-service) | Operational overhead for a 2-person team
Option B: Shared Package (In-Process Metering)

packages/llm/                   ← New shared package
├── src/
│   ├── client.ts              ← Metered OpenAI client wrapper
│   ├── config.ts              ← Unified config (replaces both copies)
│   ├── metering.ts            ← Token counting + cost estimation + DB logging
│   ├── providers.ts           ← Provider registry (OpenAI, OpenRouter, Ollama, etc.)
│   └── index.ts

Both API and Data Plane import @condelo/llm instead of their own lib/llm/. The metering layer wraps every client.chat.completions.create() and client.embeddings.create() call, extracts response.usage, and logs to the usage_events table.

Pros | Cons
Zero latency overhead (same process) | Both apps must share the same package version
Eliminates duplicated code (two lib/llm/ folders → one package) | Token tracking is in-process, not centralized
Simple to implement — wrap existing client factory | Can't rate-limit across services (each service limits independently)
No new service to deploy or monitor |
Metering data still goes to shared DB — same visibility |

Option C: External LLM Proxy (LiteLLM, Helicone, etc.)

Use an existing open-source proxy like LiteLLM or a managed service like Helicone.

Pros | Cons
Battle-tested, handles streaming, retries, fallbacks | External dependency
Built-in dashboards and cost tracking | Another service to deploy (LiteLLM) or monthly fee (Helicone ~$40/mo)
Model routing, load balancing, caching | May not integrate cleanly with your metering/billing needs

Recommendation: Option B (Shared Package)

For a 2-person team, a shared package is the right balance:

  1. Eliminates duplication — the two near-identical lib/llm/ folders merge into packages/llm/
  2. Zero operational overhead — no new service to deploy
  3. Automatic metering — wraps the OpenAI client to extract response.usage after every call
  4. Attributes costs — every call tagged with { orgId, spaceId, taskType, model } and logged to usage_events
  5. Simple upgrade path — if you later need a separate service (Option A), the package becomes the client SDK for it

Detailed Implementation Design

Package Structure

packages/llm/
├── src/
│   ├── index.ts                 # Public API: createLLM(), types
│   ├── client.ts                # MeteredOpenAI client (wraps OpenAI SDK)
│   ├── config.ts                # Unified config (merges both copies)
│   ├── metering.ts              # Usage logging to DB
│   ├── pricing.ts               # Model pricing lookup (from models.ts)
│   ├── providers.ts             # Provider registry + key resolution
│   └── types.ts                 # Shared types
├── package.json                 # Dependencies: openai, @condelo/db, @condelo/shared, langsmith
└── tsconfig.json

How Metering Works (Zero Changes to Callers)

The key insight: metering wraps the OpenAI client at creation time, so all 38 call sites are metered automatically without any code changes.

Currently callers do:

// apps/api/src/services/chat.ts (line 319-320)
const { client, model } = await getSystemLLMConfig();
const response = await client.chat.completions.create({ model, messages, ... });
// response.usage is IGNORED — tokens discarded

After the migration, callers still do the exact same thing:

const { client, model } = await getSystemLLMConfig();
const response = await client.chat.completions.create({ model, messages, ... });
// But now 'client' is a MeteredOpenAI that auto-logs usage

MeteredOpenAI Client Design

// packages/llm/src/client.ts

import OpenAI from "openai";
import { logUsageEvent } from "./metering.js";
import { lookupModelPricing } from "./pricing.js";

interface MeteringContext {
  orgId?: string;     // Set per-request via AsyncLocalStorage or explicit param
  spaceId?: string;
  taskType: string;   // "chat", "embedding", "agent", "story", "quick", "metadata"
}

// Method paths whose responses carry a usage object
const METERED = new Set(["chat.completions.create", "embeddings.create"]);

// Returns a Proxy around the OpenAI client that intercepts chat.completions.create()
// and embeddings.create() to capture response.usage. After each response resolves:
//   1. Extract response.usage.prompt_tokens + completion_tokens
//   2. Look up pricing: lookupModelPricing(model, provider)
//   3. Calculate cost: (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1_000_000
//   4. Fire-and-forget: logUsageEvent({ ...context, model, tokensIn, tokensOut, cost })
export function createMeteredClient(
  baseClient: OpenAI,
  defaultContext: MeteringContext
): OpenAI {
  const wrap = (target: object, path: string[]): any =>
    new Proxy(target, {
      get(obj: any, prop) {
        const value = obj[prop];
        if (typeof prop === "symbol") return value;
        const fullPath = [...path, String(prop)].join(".");

        if (typeof value === "function" && METERED.has(fullPath)) {
          return async (...args: any[]) => {
            const response = await value.apply(obj, args);
            if (response?.usage) {
              const { prompt_tokens = 0, completion_tokens = 0 } = response.usage;
              const model = args[0]?.model ?? "unknown";
              const { promptPer1M, completionPer1M } = lookupModelPricing(model, "openai");
              const cost = (prompt_tokens * promptPer1M + completion_tokens * completionPer1M) / 1_000_000;
              // Fire-and-forget: metering must never fail or slow the caller
              void Promise.resolve(
                logUsageEvent({ ...defaultContext, model, tokensIn: prompt_tokens, tokensOut: completion_tokens, cost })
              ).catch(() => {});
            }
            return response;
          };
        }

        // Recurse into namespaces on the way down (client.chat, chat.completions)
        if (typeof value === "object" && value !== null && path.length < 2) {
          return wrap(value, [...path, String(prop)]);
        }
        return typeof value === "function" ? value.bind(obj) : value;
      },
    });

  // Streaming (stream: true) additionally needs stream_options: { include_usage: true }
  // injected and the final usage chunk captured; see the streaming note below.
  return wrap(baseClient, []) as OpenAI;
}

Critical detail for streaming: OpenAI's streaming API can include usage in the final chunk when you set stream_options: { include_usage: true }. The proxy automatically injects this option so streaming calls are metered too — callers don't need to change anything.
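A concrete sketch of that final-chunk capture: the proxy can wrap the SDK's stream in a pass-through async generator that reports usage when it appears. The chunk type here is a simplified stand-in for the OpenAI SDK's, and `meterStream` is an illustrative name, not an existing API:

```typescript
// Simplified stand-in for the OpenAI SDK's streamed chunk type
interface StreamChunk {
  choices: unknown[];
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

// Yield every chunk unchanged; when the final chunk carries usage
// (present when stream_options: { include_usage: true } is set),
// report it to the metering callback.
export async function* meterStream(
  stream: AsyncIterable<StreamChunk>,
  onUsage: (usage: { prompt_tokens: number; completion_tokens: number }) => void
): AsyncGenerator<StreamChunk> {
  for await (const chunk of stream) {
    if (chunk.usage) onUsage(chunk.usage); // only the final chunk has usage
    yield chunk;
  }
}
```

Because the generator yields chunks as they arrive, the caller's streaming experience is unchanged; metering happens as a side effect of consumption.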

Usage Event Schema

// packages/db/src/schema/usage.ts
export const usageEvents = pgTable("usage_events", {
  id: uuid("id").defaultRandom().primaryKey(),
  orgId: uuid("org_id"),                              // null for internal/unattributed
  spaceId: uuid("space_id"),                           // null for system-level calls
  eventType: text("event_type").notNull(),             // "llm_chat", "llm_embedding", "doc_process", "agent_run"
  model: text("model"),                                // "gpt-4o-mini", "text-embedding-3-small"
  provider: text("provider"),                          // "openai", "openrouter"
  taskType: text("task_type"),                         // "chat", "agent", "story", "quick", "metadata", "embedding"
  tokensIn: integer("tokens_in"),                      // prompt_tokens
  tokensOut: integer("tokens_out"),                    // completion_tokens
  estimatedCost: numeric("estimated_cost", { precision: 10, scale: 6 }), // in USD
  metadata: jsonb("metadata"),                         // Additional context (thread_id, agent_run_id, etc.)
  createdAt: timestamp("created_at").defaultNow().notNull(),
});

// Indexes for billing queries
// CREATE INDEX idx_usage_events_org_month ON usage_events (org_id, date_trunc('month', created_at));
// CREATE INDEX idx_usage_events_space ON usage_events (space_id, created_at);

How Context (orgId, spaceId) Flows to the LLM Layer

The challenge: LLM calls happen deep in service code. How does the metering layer know which org/space to attribute the cost to?

Solution: AsyncLocalStorage (Node.js built-in, zero dependency)

// packages/llm/src/context.ts
import { AsyncLocalStorage } from "node:async_hooks";

interface LLMContext {
  orgId?: string;
  spaceId?: string;
  userId?: string;
}

export const llmContext = new AsyncLocalStorage<LLMContext>();

// Used in API middleware:
// app.use("*", (c, next) => {
//   return llmContext.run({ orgId: c.get("orgId"), spaceId: c.get("spaceId") }, next);
// });

// The metered client reads this automatically:
// const ctx = llmContext.getStore();
// logUsageEvent({ orgId: ctx?.orgId, spaceId: ctx?.spaceId, ... });

This means: middleware sets the context once per request, and every LLM call in that request chain automatically gets attributed to the right org/space. No need to pass orgId through 5 layers of function calls.
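A self-contained demonstration of that property, mirroring the context.ts sketch above (the `simulatedLLMCall` / `handleRequest` names are illustrative): context set once at the boundary survives await points and stays isolated between concurrent requests.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface LLMContext {
  orgId?: string;
  spaceId?: string;
}

const llmContext = new AsyncLocalStorage<LLMContext>();

// Imagine this sits three service layers below the middleware —
// no orgId parameter is threaded through.
async function simulatedLLMCall(): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 1)); // cross an async boundary
  return llmContext.getStore()?.orgId ?? "unattributed";
}

// Stand-in for the per-request middleware
export async function handleRequest(orgId: string): Promise<string> {
  return llmContext.run({ orgId }, () => simulatedLLMCall());
}

export { simulatedLLMCall };
```

Two requests running concurrently each see only their own context, which is exactly the guarantee the metering layer relies on.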

Pricing Lookup

// packages/llm/src/pricing.ts
// Merges the existing OPENAI_PRICING from apps/api/src/services/models.ts
// with OpenRouter pricing (from their /models endpoint, cached)

export function lookupModelPricing(model: string, provider: string): {
  promptPer1M: number;   // USD per 1M prompt tokens
  completionPer1M: number; // USD per 1M completion tokens
} {
  // OpenAI: use hardcoded table (already exists in models.ts)
  // OpenRouter: their API returns pricing per model
  // Ollama/LM Studio: cost = 0 (local models)
  // Unknown: return { promptPer1M: 0, completionPer1M: 0 } (don't fail, just can't estimate cost)
}
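The cost formula itself is a one-liner worth pinning down. A worked sketch (the per-million prices in the example are illustrative placeholders, not live rates):

```typescript
// Cost in USD for one call, given per-1M-token pricing.
export function estimateCost(
  tokensIn: number,
  tokensOut: number,
  pricing: { promptPer1M: number; completionPer1M: number }
): number {
  return (tokensIn * pricing.promptPer1M + tokensOut * pricing.completionPer1M) / 1_000_000;
}

// e.g. 1,200 prompt + 300 completion tokens at $0.15 / $0.60 per 1M
// ≈ (1200 * 0.15 + 300 * 0.60) / 1_000_000 ≈ 0.00036 USD
```

Local models (Ollama, LM Studio) fall out naturally: both prices are 0, so the cost is 0.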

What Changes for Each App

apps/api:

  • Delete src/lib/llm/index.ts and src/lib/llm/config.ts
  • Import { getSystemLLMConfig, getTaskLLMConfig, llm } from @condelo/llm
  • Add AsyncLocalStorage middleware to set org/space context
  • Move models.ts pricing data to packages/llm/src/pricing.ts
  • No changes to any service files — they already use the same function signatures

apps/data-plane:

  • Delete src/lib/llm/index.ts and src/lib/llm/config.ts
  • Import from @condelo/llm
  • Add context propagation (data-plane receives orgId/spaceId from API calls via headers → set in AsyncLocalStorage)
  • Remove getStoryLLMConfig backward-compat alias (use getTaskLLMConfig("story"))
  • No changes to any service files

packages/llm dependencies:

  • openai — OpenAI SDK
  • langsmith — LangSmith tracing wrapper
  • @condelo/db — For usage_events table writes
  • @condelo/shared — For LLMProvider types, PROVIDER_DEFAULTS

What Stays External

  • LangSmith tracing: Kept as-is. The metered client wraps LangSmith wrapping — so you get both LangSmith traces AND local usage logging. LangSmith is for debugging/observability, local metering is for billing.
  • Redis caching of system settings: The config functions call cacheGet/cacheSet. The package needs access to Redis — simplest approach is passing a cache interface at init time rather than importing directly from each app's lib/cache.ts.
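The cache-injection idea above can be sketched as a small interface passed at init time. The `LLMCache` / `initLLM` / `cachedConfig` names are hypothetical, not existing Condelo APIs; each app would adapt its own Redis helpers (cacheGet/cacheSet) to this shape at startup:

```typescript
// Minimal cache interface the package accepts at init time, so it never
// imports an app's Redis client directly.
export interface LLMCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds?: number): Promise<void>;
}

let cache: LLMCache | undefined;

export function initLLM(opts: { cache: LLMCache }): void {
  cache = opts.cache;
}

// Config loaders use the injected cache; if none is configured,
// they just load fresh every time.
export async function cachedConfig(
  key: string,
  load: () => Promise<string>
): Promise<string> {
  const hit = await cache?.get(key);
  if (hit !== null && hit !== undefined) return hit;
  const value = await load();
  await cache?.set(key, value, 60);
  return value;
}
```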

Migration Path

  1. Create packages/llm/ with unified config + metered client
  2. Add usage_events table to packages/db/src/schema/
  3. Update apps/api to import from @condelo/llm, delete src/lib/llm/
  4. Update apps/data-plane to import from @condelo/llm, delete src/lib/llm/
  5. Add AsyncLocalStorage middleware to both apps
  6. Run npm run db:push to create usage_events table
  7. Verify: all LLM call sites now auto-log to usage_events — check with a few test queries
  8. Build usage aggregation queries (daily/monthly rollups by org) for the billing dashboard
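The rollup in step 8 would be a SQL GROUP BY over usage_events in production; an in-memory sketch shows the shape of the query and its output (types simplified from the schema above, `rollupByOrgMonth` is an illustrative name):

```typescript
interface UsageEvent {
  orgId: string;
  tokensIn: number;
  tokensOut: number;
  estimatedCost: number;
  createdAt: Date;
}

interface MonthlyRollup {
  orgId: string;
  month: string; // "2026-01" — the date_trunc('month', ...) bucket
  tokensIn: number;
  tokensOut: number;
  estimatedCost: number;
}

// Group events by (org, calendar month), summing tokens and cost.
export function rollupByOrgMonth(events: UsageEvent[]): MonthlyRollup[] {
  const buckets = new Map<string, MonthlyRollup>();
  for (const e of events) {
    const month = e.createdAt.toISOString().slice(0, 7);
    const key = `${e.orgId}:${month}`;
    const b = buckets.get(key) ?? { orgId: e.orgId, month, tokensIn: 0, tokensOut: 0, estimatedCost: 0 };
    b.tokensIn += e.tokensIn;
    b.tokensOut += e.tokensOut;
    b.estimatedCost += e.estimatedCost;
    buckets.set(key, b);
  }
  return [...buckets.values()];
}
```

The SQL equivalent groups on `(org_id, date_trunc('month', created_at))`, which is what the idx_usage_events_org_month index above is for.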

Effort Estimate

Task | Effort
Create packages/llm/ package scaffolding | 2 hours
Merge two lib/llm/ into one (they're 95% identical) | 3 hours
Build MeteredOpenAI proxy (chat + embedding + streaming) | 1 day
Add usage_events schema + migration | 2 hours
Pricing lookup (port from models.ts + OpenRouter API) | 3 hours
AsyncLocalStorage context propagation in both apps | 3 hours
Update imports in API (9 call sites, all use same function names) | 2 hours
Update imports in Data Plane (29 call sites) | 3 hours
Cache interface abstraction (so package doesn't depend on each app's Redis) | 2 hours
Testing + verification | 1 day
Total | ~4-5 days

API Decomposition

Current State

The codebase already has a good two-service split:

  • API (port 6311): 20 route files — user-facing CRUD, auth, chat, orchestration
  • Data Plane (port 6312): 15 route files — heavy processing, ML ops, workers

This split is sound. The Data Plane handles the expensive async work (embedding, agents, doc processing) while the API handles user-facing requests. Don't break this up further — the question is how to organize within each service.

Proposed Logical Groups (Within Existing Services)

Rather than splitting into microservices (operational overhead too high for 2 people), organize routes into logical domains with clear boundaries. This makes future extraction easy if needed.

apps/api/src/routes/
├── core/                       ← Core Platform (always needed)
│   ├── health.ts
│   ├── spaces.ts
│   ├── documents.ts
│   ├── sources.ts
│   ├── webhook.ts              ← Webhook ingestion (a source type)
│   └── threads.ts + messages.ts
│
├── intelligence/               ← AI Features (the value layer)
│   ├── feeds.ts
│   ├── agents.ts
│   ├── inferences.ts
│   ├── explorations.ts
│   ├── stories.ts
│   └── surfaces.ts
│
├── engagement/                 ← User-facing notifications & events
│   ├── signals.ts
│   ├── notifications.ts
│   ├── events.ts
│   └── experiences.ts
│
├── admin/                      ← Operator-only
│   ├── settings.ts
│   ├── wiki.ts
│   ├── billing.ts              ← NEW: Stripe Connect management
│   ├── organizations.ts        ← NEW: Client/partner management
│   └── usage.ts                ← NEW: Usage dashboards
│
└── index.ts                    ← Route registration (unchanged)

The Data Plane stays as-is — it's already well-structured as an internal service.

Why Logical Groups, Not Microservices

Approach | Operational Cost | Right For
Monolith with route groups (recommended now) | 1 deploy, 1 process, 1 log stream | 2-person team, < 20 clients
2 services (current: API + Data Plane) | Already done, works well | Current scale
4+ microservices | 4+ deploys, service discovery, distributed tracing | 10+ person team, 100+ clients

Key insight: the bottleneck for the next 12 months is client acquisition and app building, not system architecture. Route groups give you clean code organization with zero operational overhead. You can extract services later if a specific group needs independent scaling.

What Actually Needs to Be New Services (Eventually)

Only two things genuinely benefit from separate deployment:

  1. Worker processes (already separate — Data Plane workers run via dev:worker and dev:agent-worker)
  2. Client apps (already separate — each is its own React Router 7 app)

Everything else stays in the monolith for now.


Target Architecture (Updated)

┌───────────────────────────────────────────────────────────────────┐
│                  CLIENT APPS (separate stacks)                    │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐          │
│  │   CRM App     │  │  Legal App    │  │  Research App │  ...     │
│  │ React + BFF   │  │ React + BFF   │  │ React + BFF   │          │
│  │ + own DB      │  │ + own DB      │  │ + own DB      │          │
│  └──────┬────────┘  └──────┬────────┘  └──────┬────────┘          │
│         └──────────────────┼──────────────────┘                   │
│                            │ API Key + X-Space-Id                 │
└───────────────────────────┼───────────────────────────────────────┘
                            │
┌───────────────────────────┼───────────────────────────────────────┐
│  CONDELO PLATFORM          ▼                                      │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                  API  (Hono, port 6311)                     │  │
│  │  ┌──────────┐  ┌──────────────┐  ┌──────────────┐           │  │
│  │  │  Core    │  │ Intelligence │  │    Admin     │           │  │
│  │  │  spaces  │  │ feeds,agents │  │ billing,orgs │           │  │
│  │  │  docs,src│  │ explorations │  │ settings     │           │  │
│  │  │  threads │  │ stories      │  │ usage        │           │  │
│  │  │  webhooks│  └──────────────┘  └──────────────┘           │  │
│  │  └──────────┘  ┌──────────────┐                             │  │
│  │                │  Engagement  │                             │  │
│  │                │  signals     │                             │  │
│  │                │  events,notif│                             │  │
│  │                └──────────────┘                             │  │
│  └─────────────────────────┼───────────────────────────────────┘  │
│                            │                                      │
│  ┌─────────────────────────┼───────────────────────────────────┐  │
│  │            DATA PLANE (port 6312)                           │  │
│  │  ┌─────────────┐  ┌──────────────┐  ┌─────────────┐         │  │
│  │  │  Ingestion  │  │  AI Agents   │  │  Search &   │         │  │
│  │  │  upload,    │  │  research,   │  │  Retrieval  │         │  │
│  │  │  convert,   │  │  stories,    │  │  vector,    │         │  │
│  │  │  chunk,embed│  │  wiki, etc.  │  │  keyword    │         │  │
│  │  └─────────────┘  └──────────────┘  └─────────────┘         │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                            │                                      │
│  ┌──────────────────────────┼──────────────────────────────────┐  │
│  │        @condelo/llm  (shared package)                       │  │
│  │  ┌─────────────┐  ┌──────────────┐  ┌─────────────┐         │  │
│  │  │   Client    │  │   Metering   │  │   Config    │         │  │
│  │  │  factory    │  │ token count  │  │  unified    │         │  │
│  │  │  (OpenAI)   │  │ cost track   │  │  DB + env   │         │  │
│  │  └─────────────┘  └──────┬───────┘  └─────────────┘         │  │
│  └──────────────────────────┼──────────────────────────────────┘  │
│                             │                                     │
│  ┌──────────────────────────┼──────────────────────────────────┐  │
│  │         DATA LAYER (shared, RLS-isolated)                   │  │
│  │  ┌──────────┐  ┌──────┐  ┌───────┐  ┌───────┐               │  │
│  │  │ Postgres │  │Qdrant│  │ Redis │  │ MinIO │               │  │
│  │  │+usage_   │  │      │  │       │  │       │               │  │
│  │  │ events   │  │      │  │       │  │       │               │  │
│  │  └──────────┘  └──────┘  └───────┘  └───────┘               │  │
│  └─────────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────────┘

Key Architectural Decision: Logical vs Physical Isolation

Recommendation: Logical isolation (shared infra, RLS boundaries) for now.

Reasons:

  • Already works — spaces + RLS provide strong data isolation
  • Lower operational cost (one Postgres, one Qdrant)
  • Simpler to manage with a 2-person team
  • Scale to ~20 clients before needing dedicated databases

Physical isolation (per-client databases) becomes necessary when:

  • Client demands it contractually (regulated industries)
  • Performance isolation required (noisy neighbor problems)
  • Data residency requirements (different regions)

Making the unknown, known.

© 2026 Condelo. All rights reserved.