Overview
Chunking and embedding is the process of converting raw document text into searchable vector representations. Text is split into overlapping chunks using a smart break point algorithm that preserves semantic coherence, then each chunk is embedded via an external API and stored in the vector store for retrieval.
The system handles both batch embedding (for document ingestion) and cached query embedding (for search), with separate code paths optimised for each use case.
Key Concepts
- Smart break points — Chunks split at natural language boundaries (paragraph, sentence, line, word) rather than arbitrary character positions, preserving meaning within each chunk.
- Overlap — Adjacent chunks share 200 characters of overlap so that sentences spanning a boundary appear in full in at least one chunk.
- Deterministic point IDs — Each chunk gets a UUID v5 derived from `${documentId}:${chunkIndex}`, making upserts idempotent.
- Batch processing — Embeddings are generated in batches that respect both an item-count limit (max 2048) and a token budget (default 100,000 tokens).
- Cached query embeddings — Search queries are embedded once and cached for 24 hours using a SHA-256 hash of the model and text as the cache key.
- Multi-model support — Documents track their embedding provider, model, and dimensions, so the platform can run multiple embedding models simultaneously.
How It Works
Text Splitting
- Input text is normalised (CRLF to LF).
- The splitter walks the text in windows of `CHUNK_SIZE` (default 1000 characters).
- For each window, it searches for the best break point using a priority order: paragraph break (double newline), sentence break (`.`, `!`, or `?` followed by a space or newline), line break, word break (space).
- A break point is only accepted if it falls past 30-50% of the chunk size, preventing tiny fragments.
- Each resulting chunk records: `content`, `index`, `startChar`, `endChar`.
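The splitting loop above can be sketched as follows. This is a minimal illustration: `splitText` and `findBreakPoint` mirror the names in `text-splitter.ts`, but the single 30% minimum-break threshold and the exact regexes are assumptions, not the real implementation.

```typescript
const CHUNK_SIZE = 1000;
const OVERLAP = 200;

interface Chunk {
  content: string;
  index: number;
  startChar: number;
  endChar: number;
}

// Try break patterns in priority order; return the latest acceptable break,
// or the full window length if no natural boundary is found.
function findBreakPoint(window: string): number {
  const minPos = Math.floor(window.length * 0.3); // reject tiny fragments (assumed threshold)
  const candidates = [
    /\n\n/g,       // paragraph break
    /[.!?][ \n]/g, // sentence break
    /\n/g,         // line break
    / /g,          // word break
  ];
  for (const re of candidates) {
    let best = -1;
    let m: RegExpExecArray | null;
    while ((m = re.exec(window)) !== null) {
      const end = m.index + m[0].length;
      if (end > minPos) best = Math.max(best, end);
    }
    if (best !== -1) return best;
  }
  return window.length; // no natural boundary: hard split
}

function splitText(text: string, chunkSize = CHUNK_SIZE, overlap = OVERLAP): Chunk[] {
  const normalised = text.replace(/\r\n/g, "\n"); // CRLF -> LF
  const chunks: Chunk[] = [];
  let start = 0;
  let index = 0;
  while (start < normalised.length) {
    const window = normalised.slice(start, start + chunkSize);
    // The final window is kept whole; earlier windows split at a break point.
    const breakAt =
      start + chunkSize >= normalised.length ? window.length : findBreakPoint(window);
    const end = start + breakAt;
    chunks.push({
      content: normalised.slice(start, end),
      index: index++,
      startChar: start,
      endChar: end,
    });
    if (end >= normalised.length) break;
    start = Math.max(end - overlap, start + 1); // step back to create the overlap
  }
  return chunks;
}
```

Note the design point: because `start` steps back by `overlap` after each chunk, a sentence that straddles one chunk's end reappears whole at the start of the next.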
Embedding Generation
- Chunks are grouped into batches. Each batch stays within `MAX_ITEMS_PER_BATCH` (2048) and `DEFAULT_MAX_TOKENS_PER_BATCH` (100,000 tokens, estimated at ~4 characters per token).
- Batches are sent to the embedding API with concurrency control (default 2 concurrent requests, configurable via `EMBEDDING_API_CONCURRENCY`).
- Transient errors trigger retry logic; permanent failures propagate immediately.
- Embedded chunks are upserted into the appropriate model-specific Qdrant collection using deterministic point IDs.
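A minimal sketch of the batching step, assuming the `buildBatches` name from `embedding.ts`, this signature, and the ~4-characters-per-token estimate; retry logic and concurrency control are omitted:

```typescript
const MAX_ITEMS_PER_BATCH = 2048;
const DEFAULT_MAX_TOKENS_PER_BATCH = 100_000;

// Rough token estimate: ~4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function buildBatches(
  texts: string[],
  maxItems = MAX_ITEMS_PER_BATCH,
  maxTokens = DEFAULT_MAX_TOKENS_PER_BATCH,
): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentTokens = 0;
  for (const text of texts) {
    const tokens = estimateTokens(text);
    // Start a new batch when adding this text would exceed either limit.
    // A single oversized text still gets its own batch (current is empty).
    if (
      current.length > 0 &&
      (current.length + 1 > maxItems || currentTokens + tokens > maxTokens)
    ) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(text);
    currentTokens += tokens;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```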
Cached Query Embedding
- A SHA-256 hash is computed from the model name and query text.
- If a cached embedding exists and is less than 24 hours old, it is returned immediately.
- Otherwise, the embedding is generated, cached, and returned.
- Cached entries are immutable — the same model + text always produces the same embedding, so no invalidation logic is needed.
Why It Works This Way
Overlap Prevents Lost Context at Boundaries
The 200-character overlap between adjacent chunks ensures that sentences straddling a chunk boundary appear in full in at least one chunk. Without overlap, a retrieval query matching a split sentence would return a partial, misleading result.
Smart Break Points Preserve Semantic Coherence
Splitting at paragraph, sentence, or line boundaries rather than arbitrary character positions keeps complete thoughts within a single chunk. This directly improves retrieval relevance because the embedding captures a coherent meaning rather than a fragment.
Deterministic Point IDs Enable Idempotent Upserts
UUID v5 generated from documentId + chunkIndex means re-processing a document replaces its existing vectors in Qdrant rather than creating duplicates. This makes the ingestion pipeline safely re-runnable without cleanup steps.
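A sketch of the deterministic ID generation. Node.js has no built-in UUID v5 helper, so this hand-rolls the RFC 4122 construction (SHA-1 of namespace bytes + name, with version and variant bits set); the namespace UUID here is illustrative, not necessarily the one `vector-store.ts` uses:

```typescript
import { createHash } from "node:crypto";

// Example namespace only: this is the RFC 4122 DNS namespace UUID.
const NAMESPACE = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";

function uuidBytes(uuid: string): Buffer {
  return Buffer.from(uuid.replace(/-/g, ""), "hex");
}

function uuidV5(name: string, namespace: string = NAMESPACE): string {
  const hash = createHash("sha1")
    .update(uuidBytes(namespace))
    .update(name, "utf8")
    .digest()
    .subarray(0, 16);
  hash[6] = (hash[6] & 0x0f) | 0x50; // version 5
  hash[8] = (hash[8] & 0x3f) | 0x80; // RFC 4122 variant
  const hex = hash.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Same document + chunk index always maps to the same point ID,
// so re-ingesting a document overwrites its points instead of appending.
function pointId(documentId: string, chunkIndex: number): string {
  return uuidV5(`${documentId}:${chunkIndex}`);
}
```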
Cached Query Embeddings Avoid Redundant API Calls
Repeated or similar search queries (common in agent tool loops) hit the cache instead of the embedding API. The 24-hour TTL is generous because embeddings for the same model and text are deterministic — the cache never returns stale data.
Configuration
| Env Var | Description |
|---|---|
| `EMBEDDING_API_CONCURRENCY` | Max concurrent embedding API requests (default 2) |
| `EMBEDDING_MAX_TOKENS_PER_BATCH` | Max tokens per embedding batch (default 100000) |
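For illustration, these variables might be read with fallbacks along these lines; `intFromEnv` is a hypothetical helper, not the actual config code:

```typescript
// Parse an integer env var, falling back to the default when unset or invalid.
const intFromEnv = (name: string, fallback: number): number => {
  const raw = process.env[name];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
};

const concurrency = intFromEnv("EMBEDDING_API_CONCURRENCY", 2);
const maxTokensPerBatch = intFromEnv("EMBEDDING_MAX_TOKENS_PER_BATCH", 100_000);
```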
Code Reference
| File | Description |
|---|---|
| `apps/data-plane/src/lib/text-splitter.ts` | `splitText()`, `findBreakPoint()`, `DEFAULT_CHUNK_SIZE=1000`, `DEFAULT_OVERLAP=200` |
| `apps/data-plane/src/services/embedding.ts` | `generateEmbeddings()`, `generateCachedQueryEmbedding()`, `buildBatches()` |
| `apps/data-plane/src/lib/vector-store.ts` | `pointId()` for deterministic UUID v5 generation |
| `apps/data-plane/src/lib/cache.ts` | Caching layer for query embeddings |
Relationships
- Ingestion Pipeline — Chunking and embedding are stages 1 and 2 of the ingestion pipeline
- Vector Store & Search — Embedded chunks are stored in Qdrant and retrieved during search
- Metadata Extraction — Runs after embedding as stage 3, enriching documents with structured metadata
- Spaces — Each chunk carries a `space_id` for RLS-enforced isolation