Tool Registry

Overview

The tool registry is a central system that manages 12 LLM-callable tools available to agents during conversations. Each tool has a definition (OpenAI function calling schema), an async handler, and an availability predicate that controls whether the tool appears in the LLM's context for a given request.

Tools are grouped into four categories: navigation (folder and source browsing), search (vector, keyword, and filename lookup), document (reading, analysis, and structured queries), and web (external search and page fetching). The registry ensures agents only see tools they can actually use, reducing hallucinated calls to unavailable functionality.

Key Concepts

Availability predicates — Each tool defines an available(context) function that determines whether the tool appears in the LLM's tool list for the current request.
ToolContext — Every tool call receives { userId, spaceId, requestId }, providing identity and scope.
ToolCallbacks — Streaming callbacks enable intermediate results to be sent to the client during tool execution, including sub-agent progress.
Single registry — All tools register at startup via registerTool(), making the registry the single source of truth for tool existence and definitions.
Sub-agent streaming — Tools like analyze_document spawn sub-agents whose intermediate output (chunks, tool calls, results) streams back to the user via callbacks.

Data Model

ToolEntry

Field	Type	Description
`definition`	OpenAI function tool schema	Name, description, and JSON Schema parameters
`handler`	`async (args, context, callbacks) => ToolResult`	Execution logic
`available`	`(context: ToolContext) => boolean`	Visibility predicate

ToolContext

Field	Type	Description
`userId`	string	Authenticated user ID
`spaceId`	string or null	Current space (null for cross-space queries)
`requestId`	string	Request correlation ID

ToolCallbacks

Callback	Purpose
`onToolCall`	Tool invocation started
`onToolResult`	Tool returned a result
`onSubAgentStart`	Sub-agent spawned (e.g., for `analyze_document`)
`onSubAgentChunk`	Sub-agent produced a text chunk
`onSubAgentToolCall`	Sub-agent called a tool
`onSubAgentToolResult`	Sub-agent tool returned
`onSubAgentEnd`	Sub-agent finished
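
The three tables above can be read as TypeScript shapes. The following is a minimal sketch, not the contents of types.ts: the field names come from the tables, while the callback parameter lists, the `ToolResult` fields, and the `type: "function"` wrapper around the definition are assumptions made for illustration.

```typescript
interface ToolContext {
  userId: string;          // authenticated user ID
  spaceId: string | null;  // null for cross-space queries
  requestId: string;       // request correlation ID
}

// Assumed result shape; the source does not list ToolResult's fields.
interface ToolResult {
  content: string;
  isError?: boolean;
}

// Assumed signatures: the source names the callbacks but not their parameters.
interface ToolCallbacks {
  onToolCall?: (toolName: string, args: unknown) => void;
  onToolResult?: (toolName: string, result: ToolResult) => void;
  onSubAgentStart?: (agentId: string) => void;
  onSubAgentChunk?: (agentId: string, text: string) => void;
  onSubAgentToolCall?: (agentId: string, toolName: string, args: unknown) => void;
  onSubAgentToolResult?: (agentId: string, toolName: string, result: ToolResult) => void;
  onSubAgentEnd?: (agentId: string) => void;
}

// OpenAI function tool schema: name, description, JSON Schema parameters.
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

interface ToolEntry {
  definition: ToolDefinition;
  handler: (
    args: Record<string, unknown>,
    context: ToolContext,
    callbacks: ToolCallbacks,
  ) => Promise<ToolResult>;
  available: (context: ToolContext) => boolean;
}

// Example entry: a navigation tool that is visible only inside a space.
// The handler body is a placeholder, not the real implementation.
const listSourcesEntry: ToolEntry = {
  definition: {
    type: "function",
    function: {
      name: "list_sources",
      description: "List all sources in the current space",
      parameters: { type: "object", properties: {}, required: [] },
    },
  },
  handler: async (_args, context) => ({ content: `sources for ${context.spaceId}` }),
  available: (context) => context.spaceId !== null,
};
```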

How It Works

Registration

At startup, each tool calls registerTool() with its definition, handler, and availability predicate. The registry stores entries in a Map<string, ToolEntry> and throws on duplicate names.
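
A minimal sketch of this registration step follows. The Map and the throw on duplicate names are as described above; passing a single entry object to `registerTool()`, the error message, and the trimmed-down types (callbacks omitted) are assumptions.

```typescript
type ToolContext = { userId: string; spaceId: string | null; requestId: string };
type ToolResult = { content: string }; // assumed shape
type ToolEntry = {
  definition: {
    type: "function";
    function: { name: string; description: string; parameters: Record<string, unknown> };
  };
  handler: (args: Record<string, unknown>, context: ToolContext) => Promise<ToolResult>;
  available: (context: ToolContext) => boolean;
};

// Single source of truth for which tools exist, keyed by tool name.
const registry = new Map<string, ToolEntry>();

function registerTool(entry: ToolEntry): void {
  const toolName = entry.definition.function.name;
  if (registry.has(toolName)) {
    // Duplicate names are a programming error, so fail at startup.
    throw new Error(`Tool already registered: ${toolName}`);
  }
  registry.set(toolName, entry);
}

// Placeholder entry for illustration; the real handler lives in /handlers/.
const grepEntry: ToolEntry = {
  definition: {
    type: "function",
    function: {
      name: "grep_documents",
      description: "Exact text matching across raw document content",
      parameters: { type: "object", properties: {} },
    },
  },
  handler: async () => ({ content: "" }),
  available: () => true,
};

registerTool(grepEntry);
```

Registering `grepEntry` a second time would throw, which surfaces a name collision at startup rather than during a conversation.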

Tool Selection

When the LLM needs its tool list, getAvailableTools(context) iterates the registry and returns only tools where available(context) returns true. This means the LLM never sees tools it cannot use.
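
The selection step can be sketched as a filter over the registry. This is an illustration under assumptions: the `stubEntry` helper is hypothetical, handlers are left out because selection only reads `definition` and `available`, and returning the definitions themselves is an assumption about what the LLM client receives.

```typescript
type ToolContext = { userId: string; spaceId: string | null; requestId: string };
type ToolDefinition = {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
};
// Handler omitted: tool selection never invokes it.
type ToolEntry = { definition: ToolDefinition; available: (context: ToolContext) => boolean };

const registry = new Map<string, ToolEntry>();

// Hypothetical helper that builds a placeholder entry for the example.
function stubEntry(
  toolName: string,
  description: string,
  available: ToolEntry["available"],
): ToolEntry {
  return {
    definition: {
      type: "function",
      function: { name: toolName, description, parameters: { type: "object", properties: {} } },
    },
    available,
  };
}

registry.set(
  "list_sources",
  stubEntry("list_sources", "List all sources in the current space", (c) => c.spaceId !== null),
);
registry.set(
  "search_documents",
  stubEntry("search_documents", "Search across document chunks", () => true),
);

// Return only the definitions whose predicate accepts this request's context.
function getAvailableTools(context: ToolContext): ToolDefinition[] {
  return Array.from(registry.values())
    .filter((entry) => entry.available(context))
    .map((entry) => entry.definition);
}
```

With `spaceId` set to null, only `search_documents` is returned; inside a space, both definitions are returned in registration order.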

Execution

executeTool(name, args, context, callbacks) looks up the handler and invokes it with the parsed arguments, context, and streaming callbacks. Results are returned as ToolResult objects.
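
A sketch of the execution path, under stated assumptions: the source does not say what happens for an unknown tool name (this sketch throws), nor whether `executeTool()` itself fires `onToolCall` and `onToolResult` (this sketch does). The `pattern` argument and the stub handler are hypothetical.

```typescript
type ToolContext = { userId: string; spaceId: string | null; requestId: string };
type ToolResult = { content: string }; // assumed shape
type ToolCallbacks = {
  onToolCall?: (toolName: string, args: Record<string, unknown>) => void;
  onToolResult?: (toolName: string, result: ToolResult) => void;
};
type ToolHandler = (
  args: Record<string, unknown>,
  context: ToolContext,
  callbacks: ToolCallbacks,
) => Promise<ToolResult>;

// Only handlers matter for execution, so this registry stores them directly.
const registry = new Map<string, ToolHandler>();

// Stub handler standing in for the real find_by_name implementation.
registry.set("find_by_name", async (args) => ({
  content: `matched ${String(args.pattern)}`,
}));

async function executeTool(
  toolName: string,
  args: Record<string, unknown>,
  context: ToolContext,
  callbacks: ToolCallbacks,
): Promise<ToolResult> {
  const handler = registry.get(toolName);
  if (!handler) {
    // Assumption: unknown names are rejected rather than silently ignored.
    throw new Error(`Unknown tool: ${toolName}`);
  }
  callbacks.onToolCall?.(toolName, args);
  const result = await handler(args, context, callbacks);
  callbacks.onToolResult?.(toolName, result);
  return result;
}
```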

The 12 Tools

Navigation Tools (available only when spaceId is set)

Tool	Description
`list_sources`	List all sources in the current space
`list_folder_contents`	List contents of a specific folder within a source
`folder_tree`	Get the hierarchical folder structure of a source

These tools are only available when spaceId is set, since navigation is meaningless without a space context.

Search Tools (always available)

Tool	Description
`search_documents`	Hybrid, vector, or keyword search across document chunks via the vector store
`grep_documents`	Exact text matching across raw document content
`find_by_name`	Find documents by filename pattern (glob-style matching)

Document Tools (always available)

Tool	Description
`document_info`	Get extracted metadata for a specific document
`read_document`	Read full or partial document content by character range
`analyze_document`	Deep analysis of a document using a sub-agent with its own tool access
`query_documents`	Text-to-SQL queries against tabular data stored as JSONB

Web Tools (conditional)

Tool	Description
`web_search`	Search the web via API (only available when web search is enabled in system settings)
`fetch_web_page`	Fetch and extract readable content from a URL

Why It Works This Way

Availability Predicates Reduce Hallucinated Calls

When an LLM sees a tool in its context, it may try to call it regardless of whether the call can succeed. Because the registry filters tools through their availability predicates before sending the tool list, agents never see list_sources when there is no space context, or web_search when the feature is disabled. This removes an entire class of failed tool calls: calls to tools that exist but cannot run for the current request.
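
The three kinds of predicate implied by the tool tables can be sketched as follows. The `systemSettings` object and the predicate names are hypothetical; the source only says that web search is toggled in system settings. `fetch_web_page` is left out because its condition is not stated.

```typescript
type ToolContext = { userId: string; spaceId: string | null; requestId: string };
type Predicate = (context: ToolContext) => boolean;

// Hypothetical settings holder standing in for the real system settings.
const systemSettings = { webSearchEnabled: false };

const always: Predicate = () => true;
const requiresSpace: Predicate = (context) => context.spaceId !== null;
const webSearchEnabled: Predicate = () => systemSettings.webSearchEnabled;

const toolPredicates: Array<{ toolName: string; available: Predicate }> = [
  // Navigation: meaningless without a space.
  { toolName: "list_sources", available: requiresSpace },
  { toolName: "list_folder_contents", available: requiresSpace },
  { toolName: "folder_tree", available: requiresSpace },
  // Search and document tools: always available.
  { toolName: "search_documents", available: always },
  { toolName: "grep_documents", available: always },
  { toolName: "find_by_name", available: always },
  { toolName: "document_info", available: always },
  { toolName: "read_document", available: always },
  { toolName: "analyze_document", available: always },
  { toolName: "query_documents", available: always },
  // Web: depends on a system-level switch rather than on the request.
  { toolName: "web_search", available: webSearchEnabled },
];

// Names of the tools the LLM would be shown for this request.
function visibleToolNames(context: ToolContext): string[] {
  return toolPredicates
    .filter((tool) => tool.available(context))
    .map((tool) => tool.toolName);
}
```

For a cross-space request with web search disabled, this sketch exposes only the seven search and document tools, so the model has no opportunity to call `list_sources` or `web_search`.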

Streaming Callbacks Enable Visible Intermediate Work

Tools like analyze_document spawn a sub-agent that may run for several seconds, making its own tool calls and producing intermediate text. The callback system streams this progress to the user in real time, so they see the agent working rather than waiting for a final answer. This builds trust and lets users interrupt if the agent goes off track.
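
One way a handler could forward sub-agent progress through the callbacks is sketched below. The callback names match the ToolCallbacks table; everything else is assumed for illustration, including the callback parameters, the event shape, the `agentId` format, and `runFakeSubAgent`, which stands in for a real sub-agent run.

```typescript
type SubAgentCallbacks = {
  onSubAgentStart?: (agentId: string) => void;
  onSubAgentChunk?: (agentId: string, text: string) => void;
  onSubAgentToolCall?: (agentId: string, toolName: string, args: unknown) => void;
  onSubAgentToolResult?: (agentId: string, toolName: string, result: string) => void;
  onSubAgentEnd?: (agentId: string) => void;
};

// Hypothetical event shape emitted by the sub-agent while it works.
type SubAgentEvent =
  | { kind: "chunk"; text: string }
  | { kind: "toolCall"; toolName: string; args: unknown }
  | { kind: "toolResult"; toolName: string; result: string };

// Stand-in for a real sub-agent: reads the document, then writes an answer.
async function runFakeSubAgent(
  documentId: string,
  emit: (event: SubAgentEvent) => void,
): Promise<string> {
  emit({ kind: "toolCall", toolName: "read_document", args: { documentId } });
  emit({ kind: "toolResult", toolName: "read_document", result: "(document text)" });
  emit({ kind: "chunk", text: "Summary " });
  emit({ kind: "chunk", text: "ready." });
  return "Summary ready.";
}

async function analyzeDocumentHandler(
  args: { documentId: string },
  callbacks: SubAgentCallbacks,
): Promise<{ content: string }> {
  const agentId = `analyze:${args.documentId}`;
  callbacks.onSubAgentStart?.(agentId);
  try {
    // Translate each sub-agent event into the matching streaming callback.
    const answer = await runFakeSubAgent(args.documentId, (event) => {
      switch (event.kind) {
        case "chunk":
          callbacks.onSubAgentChunk?.(agentId, event.text);
          break;
        case "toolCall":
          callbacks.onSubAgentToolCall?.(agentId, event.toolName, event.args);
          break;
        case "toolResult":
          callbacks.onSubAgentToolResult?.(agentId, event.toolName, event.result);
          break;
      }
    });
    return { content: answer };
  } finally {
    // Always signal the end so the client can close the sub-agent view.
    callbacks.onSubAgentEnd?.(agentId);
  }
}
```

Emitting `onSubAgentEnd` in a `finally` block is a design choice of this sketch: the client is told the sub-agent has stopped even when the run fails.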

Text-to-SQL Gives Agents Structured Data Access

The query_documents tool translates natural language into SQL queries against tabular data stored in JSONB columns. Without this, agents would need to retrieve raw CSV/spreadsheet content via vector search and parse it themselves — unreliable for numerical queries, aggregations, and filtering.

Separate Definitions From Handlers

Tool definitions (OpenAI function schemas) live in /definitions/ and handlers in /handlers/. This separation means the registry file is a clean mapping, definitions can be tested against the OpenAI schema spec independently, and handlers can be unit tested with mock contexts.
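
The benefit of the split can be illustrated with a sketch in which the definition and the handler are checked separately. All specifics here are assumptions: the parameter names (`documentId`, `start`, `end`), the injected `loadDocument` function, and the `checkDefinition` helper. The name pattern reflects OpenAI's documented constraint on function names (letters, digits, underscores, and dashes, up to 64 characters).

```typescript
type ToolContext = { userId: string; spaceId: string | null; requestId: string };

// Would live in /definitions/: pure data, no dependencies.
const readDocumentDefinition = {
  type: "function" as const,
  function: {
    name: "read_document",
    description: "Read full or partial document content by character range",
    parameters: {
      type: "object",
      properties: {
        documentId: { type: "string" },
        start: { type: "integer", minimum: 0 },
        end: { type: "integer", minimum: 0 },
      },
      required: ["documentId"],
    },
  },
};

// Would live in /handlers/: storage access is injected so tests can stub it.
type LoadDocument = (documentId: string, context: ToolContext) => Promise<string>;

function makeReadDocumentHandler(loadDocument: LoadDocument) {
  return async (
    args: { documentId: string; start?: number; end?: number },
    context: ToolContext,
  ): Promise<{ content: string }> => {
    const text = await loadDocument(args.documentId, context);
    // Missing bounds default to the whole document.
    return { content: text.slice(args.start ?? 0, args.end ?? text.length) };
  };
}

// Structural check on a definition, independent of any handler.
function checkDefinition(def: typeof readDocumentDefinition): string[] {
  const problems: string[] = [];
  const params = def.function.parameters;
  if (!/^[a-zA-Z0-9_-]{1,64}$/.test(def.function.name)) {
    problems.push("invalid function name");
  }
  if (params.type !== "object") {
    problems.push("parameters must be an object schema");
  }
  for (const key of params.required) {
    if (!(key in params.properties)) {
      problems.push(`required key not declared: ${key}`);
    }
  }
  return problems;
}
```

A handler test then needs only a mock context and a stub loader, for example `makeReadDocumentHandler(async () => "hello world")`, with no registry or LLM involved.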

Code Reference

File	Description
`apps/data-plane/src/tools/registry.ts`	`registerTool()`, `getAvailableTools()`, `executeTool()`
`apps/data-plane/src/tools/types.ts`	`ToolContext`, `ToolCallbacks`, `ToolEntry`, `ToolResult` types
`apps/data-plane/src/tools/index.ts`	All 12 tool registrations with availability predicates
`apps/data-plane/src/tools/definitions/`	Individual tool OpenAI function definitions
`apps/data-plane/src/tools/handlers/`	Individual tool handler implementations

Relationships

Vector Store & Search — The search_documents tool delegates to the vector store's hybrid search
Metadata Extraction — The document_info tool returns extracted metadata fields
Chunking & Embedding — Search tools rely on chunks and embeddings created during ingestion
Feeds — Search tools can scope queries to specific feeds
Spaces — Navigation tools require a space context; all tool calls are scoped by row-level security (RLS)