Tool Registry

All 12 LLM-callable tools, the registry architecture, availability predicates, tool context, and the callback system for streaming. Tools are grouped into four categories: navigation, search, document, and web.

Overview

The tool registry is a central system that manages 12 LLM-callable tools available to agents during conversations. Each tool has a definition (OpenAI function calling schema), an async handler, and an availability predicate that controls whether the tool appears in the LLM's context for a given request.

Tools are grouped into four categories: navigation (folder and source browsing), search (vector, keyword, and filename lookup), document (reading, analysis, and structured queries), and web (external search and page fetching). The registry ensures agents only see tools they can actually use, reducing hallucinated calls to unavailable functionality.

Key Concepts

  • Availability predicates — Each tool defines an available(context) function that determines whether the tool appears in the LLM's tool list for the current request.
  • ToolContext — Every tool call receives { userId, spaceId, requestId }, providing identity and scope.
  • ToolCallbacks — Streaming callbacks enable intermediate results to be sent to the client during tool execution, including sub-agent progress.
  • Single registry — All tools register at startup via registerTool(), making the registry the single source of truth for tool existence and definitions.
  • Sub-agent streaming — Tools like analyze_document spawn sub-agents whose intermediate output (chunks, tool calls, results) streams back to the user via callbacks.

Data Model

ToolEntry

| Field | Type | Description |
|---|---|---|
| definition | OpenAI function tool schema | Name, description, and JSON Schema parameters |
| handler | async (args, context, callbacks) => ToolResult | Execution logic |
| available | (context: ToolContext) => boolean | Visibility predicate |

ToolContext

| Field | Type | Description |
|---|---|---|
| userId | string | Authenticated user ID |
| spaceId | string or null | Current space (null for cross-space queries) |
| requestId | string | Request correlation ID |

ToolCallbacks

| Callback | Purpose |
|---|---|
| onToolCall | Tool invocation started |
| onToolResult | Tool returned a result |
| onSubAgentStart | Sub-agent spawned (e.g., for analyze_document) |
| onSubAgentChunk | Sub-agent produced a text chunk |
| onSubAgentToolCall | Sub-agent called a tool |
| onSubAgentToolResult | Sub-agent tool returned |
| onSubAgentEnd | Sub-agent finished |
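The three tables above can be read as TypeScript types. The following is a minimal sketch, not the contents of types.ts: the fields of ToolResult, the payloads of every callback, and the ToolDefinition helper type are assumptions, since the source gives only names and purposes.

```typescript
// Shapes inferred from the tables above; the real types.ts may differ.
interface ToolContext {
  userId: string;
  spaceId: string | null; // null for cross-space queries
  requestId: string;
}

// Assumed result shape: the source names ToolResult but not its fields.
interface ToolResult {
  content: string;
}

// Callback names come from the table; every payload type is an assumption.
interface ToolCallbacks {
  onToolCall?: (name: string, args: unknown) => void;
  onToolResult?: (name: string, result: ToolResult) => void;
  onSubAgentStart?: () => void;
  onSubAgentChunk?: (text: string) => void;
  onSubAgentToolCall?: (name: string, args: unknown) => void;
  onSubAgentToolResult?: (name: string, result: ToolResult) => void;
  onSubAgentEnd?: () => void;
}

// OpenAI function tool schema: name, description, JSON Schema parameters.
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

interface ToolEntry {
  definition: ToolDefinition;
  handler: (
    args: Record<string, unknown>,
    context: ToolContext,
    callbacks: ToolCallbacks,
  ) => Promise<ToolResult>;
  available: (context: ToolContext) => boolean;
}

// Example entry for list_sources; the handler body is a placeholder.
const listSourcesEntry: ToolEntry = {
  definition: {
    type: "function",
    function: {
      name: "list_sources",
      description: "List all sources in the current space",
      parameters: { type: "object", properties: {}, required: [] },
    },
  },
  handler: async (_args, context) => ({
    content: `sources for space ${context.spaceId}`,
  }),
  available: (context) => context.spaceId !== null,
};
```

Making every callback optional is a choice in this sketch so that callers can subscribe only to the events they render.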

How It Works

Registration

At startup, each tool calls registerTool() with its definition, handler, and availability predicate. The registry stores entries in a Map<string, ToolEntry> and throws on duplicate names.
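A minimal sketch of that registration step, assuming registerTool() takes a whole ToolEntry and keys it by the definition's function name. The source does not show the actual signature, and stubEntry is a hypothetical helper used only to keep the example short.

```typescript
interface ToolContext {
  userId: string;
  spaceId: string | null;
  requestId: string;
}

// Trimmed ToolEntry: the callbacks argument of the handler is left out here.
interface ToolEntry {
  definition: {
    type: "function";
    function: { name: string; description: string; parameters: Record<string, unknown> };
  };
  handler: (args: Record<string, unknown>, context: ToolContext) => Promise<unknown>;
  available: (context: ToolContext) => boolean;
}

const registry = new Map<string, ToolEntry>();

// Assumed signature: one entry per call, keyed by the tool's function name.
function registerTool(entry: ToolEntry): void {
  const name = entry.definition.function.name;
  if (registry.has(name)) {
    // Failing at startup surfaces a naming collision immediately.
    throw new Error(`Tool "${name}" is already registered`);
  }
  registry.set(name, entry);
}

// Hypothetical helper that builds a stub entry for this example only.
function stubEntry(name: string): ToolEntry {
  return {
    definition: {
      type: "function",
      function: { name, description: `stub for ${name}`, parameters: { type: "object", properties: {} } },
    },
    handler: async () => null,
    available: () => true,
  };
}

registerTool(stubEntry("find_by_name"));
```

Throwing on a duplicate, rather than overwriting, keeps the registry a reliable single source of truth: a second tool can never silently replace the first.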

Tool Selection

When the LLM needs its tool list, getAvailableTools(context) iterates the registry and returns only tools where available(context) returns true. This means the LLM never sees tools it cannot use.
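The filtering step can be sketched as follows. This assumes getAvailableTools() returns the tool definitions that are sent to the LLM; the source does not state the return type, and the two registry entries are stubs.

```typescript
interface ToolContext {
  userId: string;
  spaceId: string | null;
  requestId: string;
}

interface ToolDefinition {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

// Trimmed ToolEntry: the handler is omitted because selection never calls it.
interface ToolEntry {
  definition: ToolDefinition;
  available: (context: ToolContext) => boolean;
}

const registry = new Map<string, ToolEntry>();

// Assumed return type: the definitions of every tool whose predicate passes.
function getAvailableTools(context: ToolContext): ToolDefinition[] {
  return Array.from(registry.values())
    .filter((entry) => entry.available(context))
    .map((entry) => entry.definition);
}

// Hypothetical helper for building stub definitions.
function define(name: string, description: string): ToolDefinition {
  return { type: "function", function: { name, description, parameters: { type: "object", properties: {} } } };
}

registry.set("list_sources", {
  definition: define("list_sources", "List all sources in the current space"),
  available: (context) => context.spaceId !== null, // navigation needs a space
});
registry.set("search_documents", {
  definition: define("search_documents", "Search across document chunks"),
  available: () => true, // search tools are always available
});

const crossSpace: ToolContext = { userId: "u1", spaceId: null, requestId: "r1" };
const visibleNames = getAvailableTools(crossSpace).map((tool) => tool.function.name);
// visibleNames is ["search_documents"]: list_sources is filtered out.
```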

Execution

executeTool(name, args, context, callbacks) looks up the handler and invokes it with the parsed arguments, context, and streaming callbacks. Results are returned as ToolResult objects.
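A minimal sketch of the execution path. Several details here are assumptions rather than documented behaviour: that an unknown tool name throws, that executeTool() itself fires onToolCall and onToolResult, and that ToolResult carries a content string. The echo tool is a stub, not one of the 12 tools.

```typescript
interface ToolContext {
  userId: string;
  spaceId: string | null;
  requestId: string;
}

// Assumed result shape.
interface ToolResult {
  content: string;
}

// Only the two top-level callbacks are modelled; payloads are assumptions.
interface ToolCallbacks {
  onToolCall?: (name: string, args: Record<string, unknown>) => void;
  onToolResult?: (name: string, result: ToolResult) => void;
}

// Trimmed ToolEntry: definition and availability predicate are omitted.
interface ToolEntry {
  handler: (
    args: Record<string, unknown>,
    context: ToolContext,
    callbacks: ToolCallbacks,
  ) => Promise<ToolResult>;
}

const registry = new Map<string, ToolEntry>();

async function executeTool(
  name: string,
  args: Record<string, unknown>,
  context: ToolContext,
  callbacks: ToolCallbacks,
): Promise<ToolResult> {
  const entry = registry.get(name);
  if (!entry) {
    throw new Error(`Unknown tool: ${name}`); // assumed failure mode
  }
  callbacks.onToolCall?.(name, args);
  // The handler receives the same callbacks so it can stream its own progress.
  const result = await entry.handler(args, context, callbacks);
  callbacks.onToolResult?.(name, result);
  return result;
}

// Stub tool used only to exercise the sketch.
registry.set("echo", {
  handler: async (args) => ({ content: String(args["text"]) }),
});
```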

The 12 Tools

Navigation Tools (space context required)

| Tool | Description |
|---|---|
| list_sources | List all sources in the current space |
| list_folder_contents | List contents of a specific folder within a source |
| folder_tree | Get the hierarchical folder structure of a source |

These tools are only available when spaceId is set, since navigation is meaningless without a space context.

Search Tools (always available)

| Tool | Description |
|---|---|
| search_documents | Hybrid, vector, or keyword search across document chunks via the vector store |
| grep_documents | Exact text matching across raw document content |
| find_by_name | Find documents by filename pattern (glob-style matching) |

Document Tools (always available)

| Tool | Description |
|---|---|
| document_info | Get extracted metadata for a specific document |
| read_document | Read full or partial document content by character range |
| analyze_document | Deep analysis of a document using a sub-agent with its own tool access |
| query_documents | Text-to-SQL queries against tabular data stored as JSONB |

Web Tools (conditional)

| Tool | Description |
|---|---|
| web_search | Search the web via API (only available when web search is enabled in system settings) |
| fetch_web_page | Fetch and extract readable content from a URL |

Why It Works This Way

Availability Predicates Reduce Hallucinated Calls

When an LLM sees a tool in its context, it may try to call it regardless of whether it will succeed. By filtering tools with availability predicates before sending the tool list, agents never see list_sources when there is no space context, or web_search when the feature is disabled. This eliminates an entire class of failed tool calls.
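The two cases named above, a missing space context and a disabled web search feature, can be expressed as small predicate functions. This is a sketch under stated assumptions: the SystemSettings shape and its webSearchEnabled flag are hypothetical, as are the helper names buildToolTable and visibleToolNames.

```typescript
interface ToolContext {
  userId: string;
  spaceId: string | null;
  requestId: string;
}

// Hypothetical settings object: the source only says that web search
// can be enabled in system settings.
interface SystemSettings {
  webSearchEnabled: boolean;
}

type Predicate = (context: ToolContext) => boolean;

// Navigation tools: visible only when a space is selected.
const requiresSpace: Predicate = (context) => context.spaceId !== null;

// Search and document tools: always visible.
const always: Predicate = () => true;

// web_search: visible only when the feature flag is on.
const whenWebSearchEnabled = (settings: SystemSettings): Predicate =>
  () => settings.webSearchEnabled;

interface NamedPredicate {
  name: string;
  available: Predicate;
}

// One representative tool per kind of predicate.
function buildToolTable(settings: SystemSettings): NamedPredicate[] {
  return [
    { name: "list_sources", available: requiresSpace },
    { name: "search_documents", available: always },
    { name: "web_search", available: whenWebSearchEnabled(settings) },
  ];
}

function visibleToolNames(table: NamedPredicate[], context: ToolContext): string[] {
  return table.filter((tool) => tool.available(context)).map((tool) => tool.name);
}

const crossSpace: ToolContext = { userId: "u1", spaceId: null, requestId: "r1" };
const names = visibleToolNames(buildToolTable({ webSearchEnabled: false }), crossSpace);
// names is ["search_documents"]: the other two never reach the LLM.
```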

Streaming Callbacks Enable Visible Intermediate Work

Tools like analyze_document spawn a sub-agent that may run for several seconds, making its own tool calls and producing intermediate text. The callback system streams this progress to the user in real time, so they see the agent working rather than waiting for a final answer. This builds trust and lets users interrupt if the agent goes off track.
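A sketch of how a handler might forward sub-agent progress through the callbacks. The sub-agent here is a scripted stand-in that emits a fixed sequence of events; the SubAgentEvent type, the emit parameter, and the callback payloads are all assumptions made for illustration.

```typescript
// Hypothetical event shape for the stand-in sub-agent.
type SubAgentEvent =
  | { kind: "chunk"; text: string }
  | { kind: "toolCall"; name: string }
  | { kind: "toolResult"; name: string };

// Callback names come from the source; payloads are assumptions.
interface ToolCallbacks {
  onSubAgentStart?: () => void;
  onSubAgentChunk?: (text: string) => void;
  onSubAgentToolCall?: (name: string) => void;
  onSubAgentToolResult?: (name: string) => void;
  onSubAgentEnd?: () => void;
}

// Stand-in for a real sub-agent run: emits a fixed script of events.
async function runScriptedSubAgent(emit: (event: SubAgentEvent) => void): Promise<string> {
  emit({ kind: "chunk", text: "Reading the document. " });
  emit({ kind: "toolCall", name: "read_document" });
  await Promise.resolve(); // simulates waiting on the tool
  emit({ kind: "toolResult", name: "read_document" });
  emit({ kind: "chunk", text: "Summary ready." });
  return "Summary ready.";
}

// Sketch of an analyze_document-style handler body.
async function analyzeDocument(callbacks: ToolCallbacks): Promise<string> {
  callbacks.onSubAgentStart?.();
  try {
    // Each event is forwarded as it happens instead of being buffered.
    return await runScriptedSubAgent((event) => {
      switch (event.kind) {
        case "chunk":
          callbacks.onSubAgentChunk?.(event.text);
          break;
        case "toolCall":
          callbacks.onSubAgentToolCall?.(event.name);
          break;
        case "toolResult":
          callbacks.onSubAgentToolResult?.(event.name);
          break;
      }
    });
  } finally {
    // Fires even if the sub-agent throws, so the client can close its view.
    callbacks.onSubAgentEnd?.();
  }
}
```

Wrapping the run in try/finally is a choice in this sketch: it guarantees that every onSubAgentStart is paired with an onSubAgentEnd.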

Text-to-SQL Gives Agents Structured Data Access

The query_documents tool translates natural language into SQL queries against tabular data stored in JSONB columns. Without this, agents would need to retrieve raw CSV/spreadsheet content via vector search and parse it themselves — unreliable for numerical queries, aggregations, and filtering.

Separate Definitions From Handlers

Tool definitions (OpenAI function schemas) live in /definitions/ and handlers in /handlers/. This separation means the registry file is a clean mapping, definitions can be tested against the OpenAI schema spec independently, and handlers can be unit tested with mock contexts.
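To illustrate the split, here is a sketch of a definition and a handler for read_document kept as separate values. The parameter names (documentId, start, end), the DocumentLoader type, and the makeReadDocumentHandler factory are hypothetical; the source does not show the real files, and injecting the loader is just one way to make a handler testable with a mock context.

```typescript
// --- Would live under definitions/ ---
const readDocumentDefinition = {
  type: "function" as const,
  function: {
    name: "read_document",
    description: "Read full or partial document content by character range",
    // Parameter names are assumptions; the source does not list them.
    parameters: {
      type: "object",
      properties: {
        documentId: { type: "string" },
        start: { type: "integer", minimum: 0 },
        end: { type: "integer", minimum: 0 },
      },
      required: ["documentId"],
    },
  },
};

// --- Would live under handlers/ ---
interface ToolContext {
  userId: string;
  spaceId: string | null;
  requestId: string;
}

// Hypothetical storage dependency, injected so tests can supply a mock.
type DocumentLoader = (documentId: string, context: ToolContext) => Promise<string>;

interface ReadDocumentArgs {
  documentId: string;
  start?: number;
  end?: number;
}

function makeReadDocumentHandler(loadDocument: DocumentLoader) {
  return async (args: ReadDocumentArgs, context: ToolContext): Promise<{ content: string }> => {
    const text = await loadDocument(args.documentId, context);
    // Default to the whole document when no range is given.
    return { content: text.slice(args.start ?? 0, args.end ?? text.length) };
  };
}

// Unit-test style usage with a mock loader and a mock context.
const mockContext: ToolContext = { userId: "u1", spaceId: "s1", requestId: "r1" };
const readDocument = makeReadDocumentHandler(async () => "Hello, registry!");
const firstFive = readDocument({ documentId: "d1", start: 0, end: 5 }, mockContext);
// firstFive resolves to { content: "Hello" }.
```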

Code Reference

| File | Description |
|---|---|
| apps/data-plane/src/tools/registry.ts | registerTool(), getAvailableTools(), executeTool() |
| apps/data-plane/src/tools/types.ts | ToolContext, ToolCallbacks, ToolEntry, ToolResult types |
| apps/data-plane/src/tools/index.ts | All 12 tool registrations with availability predicates |
| apps/data-plane/src/tools/definitions/ | Individual tool OpenAI function definitions |
| apps/data-plane/src/tools/handlers/ | Individual tool handler implementations |

Relationships

  • Vector Store & Search — The search_documents tool delegates to the vector store's hybrid search
  • Metadata Extraction — The document_info tool returns extracted metadata fields
  • Chunking & Embedding — Search tools rely on chunks and embeddings created during ingestion
  • Feeds — Search tools can scope queries to specific feeds
  • Spaces — Navigation tools require a space context; all tool calls are RLS-scoped

© 2026 Condelo. All rights reserved.