Overview
The tool registry is a central system that manages 12 LLM-callable tools available to agents during conversations. Each tool has a definition (OpenAI function calling schema), an async handler, and an availability predicate that controls whether the tool appears in the LLM's context for a given request.
Tools are grouped into four categories: navigation (folder and source browsing), search (vector, keyword, and filename lookup), document (reading, analysis, and structured queries), and web (external search and page fetching). The registry ensures agents only see tools they can actually use, reducing hallucinated calls to unavailable functionality.
Key Concepts
- Availability predicates: Each tool defines an available(context) function that determines whether the tool appears in the LLM's tool list for the current request.
- ToolContext: Every tool call receives { userId, spaceId, requestId }, providing identity and scope.
- ToolCallbacks: Streaming callbacks enable intermediate results to be sent to the client during tool execution, including sub-agent progress.
- Single registry: All tools register at startup via registerTool(), making the registry the single source of truth for tool existence and definitions.
- Sub-agent streaming: Tools like analyze_document spawn sub-agents whose intermediate output (chunks, tool calls, results) streams back to the user via callbacks.
Data Model
ToolEntry
| Field | Type | Description |
|---|---|---|
| definition | OpenAI function tool schema | Name, description, and JSON Schema parameters |
| handler | async (args, context, callbacks) => ToolResult | Execution logic |
| available | (context: ToolContext) => boolean | Visibility predicate |
ToolContext
| Field | Type | Description |
|---|---|---|
| userId | string | Authenticated user ID |
| spaceId | string or null | Current space (null for cross-space queries) |
| requestId | string | Request correlation ID |
ToolCallbacks
| Callback | Purpose |
|---|---|
| onToolCall | Tool invocation started |
| onToolResult | Tool returned a result |
| onSubAgentStart | Sub-agent spawned (e.g., for analyze_document) |
| onSubAgentChunk | Sub-agent produced a text chunk |
| onSubAgentToolCall | Sub-agent called a tool |
| onSubAgentToolResult | Sub-agent tool returned |
| onSubAgentEnd | Sub-agent finished |
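The three tables above map naturally onto TypeScript types. The following is a minimal sketch and not the contents of types.ts: the shape of ToolResult, the argument lists of every callback, the optionality of the callbacks, and the FunctionToolDefinition name are all assumptions made for illustration. Only the field names and the handler and predicate roles come from the tables.

```typescript
// Sketch of the data model. Field names come from the tables above;
// ToolResult's shape and every callback signature are assumptions.
interface FunctionToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>; // JSON Schema object
  };
}

interface ToolContext {
  userId: string;
  spaceId: string | null; // null for cross-space queries
  requestId: string;
}

// Assumed shape: the source names ToolResult but does not define it.
interface ToolResult {
  content: string;
  isError?: boolean;
}

// Callbacks are optional here so callers can subscribe selectively (assumption).
interface ToolCallbacks {
  onToolCall?: (name: string, args: unknown) => void;
  onToolResult?: (name: string, result: ToolResult) => void;
  onSubAgentStart?: (agentId: string) => void;
  onSubAgentChunk?: (agentId: string, text: string) => void;
  onSubAgentToolCall?: (agentId: string, name: string, args: unknown) => void;
  onSubAgentToolResult?: (agentId: string, name: string, result: ToolResult) => void;
  onSubAgentEnd?: (agentId: string) => void;
}

interface ToolEntry {
  definition: FunctionToolDefinition;
  handler: (
    args: Record<string, unknown>,
    context: ToolContext,
    callbacks: ToolCallbacks,
  ) => Promise<ToolResult>;
  available: (context: ToolContext) => boolean;
}

// Example entry: a navigation-style tool that is visible only inside a space.
const exampleEntry: ToolEntry = {
  definition: {
    type: "function",
    function: {
      name: "list_sources",
      description: "List all sources in the current space",
      parameters: { type: "object", properties: {}, required: [] },
    },
  },
  handler: async (_args, context) => ({
    content: `sources for space ${context.spaceId}`,
  }),
  available: (context) => context.spaceId !== null,
};
```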
How It Works
Registration
At startup, each tool calls registerTool() with its definition, handler, and availability predicate. The registry stores entries in a Map<string, ToolEntry> and throws on duplicate names.
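A minimal sketch of the registration step, assuming registerTool() takes a single entry object and keys the map by the definition's function name. The actual signature, the error message, and the trimmed types are assumptions; the Map<string, ToolEntry> storage and the throw on duplicate names are from the description above.

```typescript
// Types are trimmed to what registration needs (assumed shapes).
interface ToolContext {
  userId: string;
  spaceId: string | null;
  requestId: string;
}

interface ToolEntry {
  definition: {
    type: "function";
    function: { name: string; description: string; parameters: object };
  };
  handler: (args: unknown, context: ToolContext) => Promise<unknown>;
  available: (context: ToolContext) => boolean;
}

const registry = new Map<string, ToolEntry>();

function registerTool(entry: ToolEntry): void {
  const name = entry.definition.function.name;
  if (registry.has(name)) {
    // Fail at startup instead of silently overwriting an existing tool.
    throw new Error(`Tool already registered: ${name}`);
  }
  registry.set(name, entry);
}

// Example registration of an always-available search tool (handler is a stub).
const grepDocuments: ToolEntry = {
  definition: {
    type: "function",
    function: {
      name: "grep_documents",
      description: "Exact text matching across raw document content",
      parameters: { type: "object", properties: {} },
    },
  },
  handler: async () => ({ matches: [] }),
  available: () => true,
};

registerTool(grepDocuments);
```

Throwing on a duplicate name surfaces a copy-paste mistake when the process boots, rather than letting one tool quietly shadow another.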
Tool Selection
When the LLM needs its tool list, getAvailableTools(context) iterates the registry and returns only tools where available(context) returns true. This means the LLM never sees tools it cannot use.
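The filtering step can be sketched as follows. This assumes getAvailableTools() returns the definitions (the part sent to the LLM) rather than whole entries; the entry() helper and the two sample predicates are hypothetical and exist only to populate the map.

```typescript
interface ToolContext {
  userId: string;
  spaceId: string | null;
  requestId: string;
}

interface ToolDefinition {
  type: "function";
  function: { name: string; description: string; parameters: object };
}

// Handler omitted: selection only needs the definition and the predicate.
interface ToolEntry {
  definition: ToolDefinition;
  available: (context: ToolContext) => boolean;
}

// Hypothetical helper that builds a [name, entry] pair for the demo registry.
function entry(
  name: string,
  available: (context: ToolContext) => boolean,
): [string, ToolEntry] {
  return [
    name,
    {
      definition: {
        type: "function",
        function: { name, description: "", parameters: { type: "object", properties: {} } },
      },
      available,
    },
  ];
}

const registry = new Map<string, ToolEntry>([
  entry("list_sources", (context) => context.spaceId !== null), // navigation
  entry("search_documents", () => true), // always available
]);

function getAvailableTools(context: ToolContext): ToolDefinition[] {
  return Array.from(registry.values())
    .filter((e) => e.available(context))
    .map((e) => e.definition);
}
```

With this registry, a context whose spaceId is null yields only search_documents, while a context with a space yields both tools in registration order.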
Execution
executeTool(name, args, context, callbacks) looks up the handler and invokes it with the parsed arguments, context, and streaming callbacks. Results are returned as ToolResult objects.
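A sketch of the dispatch path under stated assumptions: the source does not say what happens for an unknown tool name (an error is thrown here), nor whether executeTool() itself fires onToolCall and onToolResult (it does here). The echo tool and the ToolResult shape are hypothetical.

```typescript
interface ToolContext {
  userId: string;
  spaceId: string | null;
  requestId: string;
}

// Assumed shape: the source names ToolResult but does not define it.
interface ToolResult {
  content: string;
}

interface ToolCallbacks {
  onToolCall?: (name: string, args: unknown) => void;
  onToolResult?: (name: string, result: ToolResult) => void;
}

type ToolHandler = (
  args: Record<string, unknown>,
  context: ToolContext,
  callbacks: ToolCallbacks,
) => Promise<ToolResult>;

// Only handlers are stored here; a full registry would hold whole entries.
const handlers = new Map<string, ToolHandler>();

// Hypothetical tool used only to exercise the dispatch path.
handlers.set("echo", async (args, context) => ({
  content: `${context.requestId}: ${String(args["text"])}`,
}));

async function executeTool(
  name: string,
  args: Record<string, unknown>,
  context: ToolContext,
  callbacks: ToolCallbacks = {},
): Promise<ToolResult> {
  const handler = handlers.get(name);
  if (!handler) {
    throw new Error(`Unknown tool: ${name}`); // assumed behavior
  }
  callbacks.onToolCall?.(name, args);
  const result = await handler(args, context, callbacks);
  callbacks.onToolResult?.(name, result);
  return result;
}
```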
The 12 Tools
Navigation Tools (require spaceId)
| Tool | Description |
|---|---|
| list_sources | List all sources in the current space |
| list_folder_contents | List contents of a specific folder within a source |
| folder_tree | Get the hierarchical folder structure of a source |
These tools are only available when spaceId is set, since navigation is meaningless without a space context.
Search Tools (always available)
| Tool | Description |
|---|---|
| search_documents | Hybrid, vector, or keyword search across document chunks via the vector store |
| grep_documents | Exact text matching across raw document content |
| find_by_name | Find documents by filename pattern (glob-style matching) |
Document Tools (always available)
| Tool | Description |
|---|---|
| document_info | Get extracted metadata for a specific document |
| read_document | Read full or partial document content by character range |
| analyze_document | Deep analysis of a document using a sub-agent with its own tool access |
| query_documents | Text-to-SQL queries against tabular data stored as JSONB |
Web Tools (conditional)
| Tool | Description |
|---|---|
| web_search | Search the web via API (only available when web search is enabled in system settings) |
| fetch_web_page | Fetch and extract readable content from a URL |
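Because ToolContext carries only userId, spaceId, and requestId, a settings-based predicate has to read the flag from somewhere else. One possible shape, sketched with a hypothetical in-memory SystemSettings object and a hypothetical webSearchEnabled field (the source only says the tool depends on system settings):

```typescript
interface ToolContext {
  userId: string;
  spaceId: string | null;
  requestId: string;
}

// Hypothetical settings shape; the real settings store is not described.
interface SystemSettings {
  webSearchEnabled: boolean;
}

let settings: SystemSettings = { webSearchEnabled: false };

// Predicate factory: the returned predicate ignores the context and consults
// the settings reader on every call, so a settings change takes effect on the
// next request without re-registering the tool.
function whenWebSearchEnabled(
  readSettings: () => SystemSettings,
): (context: ToolContext) => boolean {
  return (_context) => readSettings().webSearchEnabled;
}

const webSearchAvailable = whenWebSearchEnabled(() => settings);
```

Reading the flag lazily, rather than capturing its value at registration time, is the design choice worth noting: registration happens once at startup, while settings may change afterwards.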
Why It Works This Way
Availability Predicates Reduce Hallucinated Calls
When an LLM sees a tool in its context, it may try to call it regardless of whether it will succeed. By filtering tools with availability predicates before sending the tool list, agents never see list_sources when there is no space context, or web_search when the feature is disabled. This eliminates an entire class of failed tool calls.
Streaming Callbacks Enable Visible Intermediate Work
Tools like analyze_document spawn a sub-agent that may run for several seconds, making its own tool calls and producing intermediate text. The callback system streams this progress to the user in real time, so they see the agent working rather than waiting for a final answer. This builds trust and lets users interrupt if the agent goes off track.
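The forwarding of sub-agent progress can be sketched like this. The callback names come from the ToolCallbacks table; their argument lists, the SubAgentEvent union, the emit-based SubAgent type, and the documentId argument are assumptions, and demoAgent is a scripted stand-in for a real LLM-driven sub-agent.

```typescript
interface SubAgentCallbacks {
  onSubAgentStart?: (agentId: string) => void;
  onSubAgentChunk?: (agentId: string, text: string) => void;
  onSubAgentToolCall?: (agentId: string, name: string, args: unknown) => void;
  onSubAgentToolResult?: (agentId: string, name: string, result: unknown) => void;
  onSubAgentEnd?: (agentId: string) => void;
}

// Hypothetical event shapes emitted by a sub-agent while it works.
type SubAgentEvent =
  | { kind: "chunk"; text: string }
  | { kind: "tool_call"; name: string; args: unknown }
  | { kind: "tool_result"; name: string; result: unknown };

type SubAgent = (emit: (event: SubAgentEvent) => void) => Promise<void>;

async function runSubAgent(
  agentId: string,
  agent: SubAgent,
  callbacks: SubAgentCallbacks,
): Promise<string> {
  let text = "";
  callbacks.onSubAgentStart?.(agentId);
  try {
    await agent((event) => {
      switch (event.kind) {
        case "chunk":
          text += event.text; // accumulate the final answer
          callbacks.onSubAgentChunk?.(agentId, event.text);
          break;
        case "tool_call":
          callbacks.onSubAgentToolCall?.(agentId, event.name, event.args);
          break;
        case "tool_result":
          callbacks.onSubAgentToolResult?.(agentId, event.name, event.result);
          break;
      }
    });
  } finally {
    // Always signal the end so the client can close its progress indicator.
    callbacks.onSubAgentEnd?.(agentId);
  }
  return text;
}

// Scripted stand-in: one tool call, its result, then two text chunks.
const demoAgent: SubAgent = async (emit) => {
  emit({ kind: "tool_call", name: "read_document", args: { documentId: "doc-1" } });
  emit({ kind: "tool_result", name: "read_document", result: "raw text" });
  emit({ kind: "chunk", text: "The document " });
  emit({ kind: "chunk", text: "covers Q3 revenue." });
};
```

Each event reaches the caller as soon as the sub-agent emits it, while the concatenated chunks are still returned as the tool's final text.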
Text-to-SQL Gives Agents Structured Data Access
The query_documents tool translates natural language into SQL queries against tabular data stored in JSONB columns. Without this, agents would need to retrieve raw CSV/spreadsheet content via vector search and parse it themselves — unreliable for numerical queries, aggregations, and filtering.
Separate Definitions From Handlers
Tool definitions (OpenAI function schemas) live in /definitions/ and handlers in /handlers/. This separation means the registry file is a clean mapping, definitions can be tested against the OpenAI schema spec independently, and handlers can be unit tested with mock contexts.
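To illustrate what testing a definition independently of its handler might look like, here is a sketch of a definition in the OpenAI function tool format together with a few structural checks. The pattern parameter and the checkDefinition helper are hypothetical, and the checks cover only a small part of what the schema spec requires.

```typescript
interface FunctionToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description?: string }>;
      required?: string[];
    };
  };
}

// The parameter name and description below are illustrative assumptions.
const findByNameDefinition: FunctionToolDefinition = {
  type: "function",
  function: {
    name: "find_by_name",
    description: "Find documents by filename pattern (glob-style matching)",
    parameters: {
      type: "object",
      properties: {
        pattern: { type: "string", description: "Glob pattern, e.g. *.pdf" },
      },
      required: ["pattern"],
    },
  },
};

// Returns a list of problems; an empty list means the checks passed.
function checkDefinition(def: FunctionToolDefinition): string[] {
  const problems: string[] = [];
  const { name, description, parameters } = def.function;
  // Function names: letters, digits, underscores, dashes; at most 64 characters.
  if (!/^[a-zA-Z0-9_-]{1,64}$/.test(name)) {
    problems.push(`invalid name: ${name}`);
  }
  if (description.trim() === "") {
    problems.push("description is empty");
  }
  // Every required parameter must be declared under properties.
  for (const key of parameters.required ?? []) {
    if (!(key in parameters.properties)) {
      problems.push(`required parameter not declared: ${key}`);
    }
  }
  return problems;
}
```

Because the check takes only the definition object, it runs without a registry, a handler, or a ToolContext.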
Code Reference
| File | Description |
|---|---|
| apps/data-plane/src/tools/registry.ts | registerTool(), getAvailableTools(), executeTool() |
| apps/data-plane/src/tools/types.ts | ToolContext, ToolCallbacks, ToolEntry, ToolResult types |
| apps/data-plane/src/tools/index.ts | All 12 tool registrations with availability predicates |
| apps/data-plane/src/tools/definitions/ | Individual tool OpenAI function definitions |
| apps/data-plane/src/tools/handlers/ | Individual tool handler implementations |
Relationships
- Vector Store & Search: The search_documents tool delegates to the vector store's hybrid search
- Metadata Extraction: The document_info tool returns extracted metadata fields
- Chunking & Embedding: Search tools rely on chunks and embeddings created during ingestion
- Feeds: Search tools can scope queries to specific feeds
- Spaces: Navigation tools require a space context; all tool calls are RLS-scoped