Research - Worlds Docs

Abstract

Large Language Models (LLMs) demonstrate remarkable capabilities in natural language understanding, but they have a fundamental limitation: capability is not equivalent to knowledge. Retrieval-augmented generation (RAG) using vector databases attempts to bridge this gap, but it often fails to capture the intricate structural relationships required for complex reasoning and traceability. Worlds is a managed infrastructure layer, a “world engine,” that acts as a detachable hippocampus for AI agents. By combining a SPARQL-compatible RDF store with edge-distributed SQLite for persistence, Worlds enables automatic memory, or auto-memory, for agents to maintain mutable, structured knowledge graphs. This system fuses vector search for semantic understanding with deterministic statements for precise data retrieval, empowering agents to navigate a persistent, interoperable map of reality rather than predicting the next token. The industry remains constrained by a model-centric scaling race. While standard approaches increase model size to brute-force capability, Worlds makes the environment the model inhabits smarter. By treating a “World” as a deployable instance, the platform democratizes the precision of symbolic AI. Worlds provides the structural scaffolding to guarantee reliable, auditable, and explainable results even within probabilistic agentic workflows. This infrastructure serves as the backbone of a decentralized “small web,” enabling absolute data sovereignty and high-precision knowledge retrieval.

Introduction

LLM ephemeral nature

Transformer-based models provide agents with fluent communication skills and broad world knowledge frozen in their weights. However, these models are stateless. Once a context window closes, the thought is lost. For an AI agent to operate autonomously over long periods, it requires persistent memory that is both accessible and mutable.

Reasoning gap

The industry is undergoing a fundamental shift: AI agents do not need human-facing dashboards; they need “Context Engineering.” Traditional data systems record decisions but fail to capture the reasoning behind them, creating a Context Gap. Current industry standards rely on vector databases to provide long-term memory, creating a quantifiable “Hallucination Gap.” Recent 2026 benchmarks demonstrate that naive semantic RAG over raw tables hits a rigid ~80% accuracy ceiling, often collapsing to 20% on complex, multi-hop queries. Moving from a metrics-first to a knowledge-first architecture using deterministic context graphs consistently achieves 90–99% accuracy. The reasoning gap occurs because relying on vector search alone exacerbates several critical AI failure modes:

Sycophancy: LLMs natively attempt to appease the user, agreeing with fundamentally incorrect logical structures unless constrained by an ontology.
Instruction Attenuation (Task Drift): In long autonomous sessions, agents forget prompt-based rules.
Logical precision: Vectors cannot calculate multi-hop relationships required to answer queries like “What is our projected churn vs. actuals for Q3?”.
Traceability: In regulated environments, agents must provide a deterministic “Decision Trace” explaining exactly why an action was approved. Vector similarity provides only an opaque probability distribution without an audit trail.

Solution

Worlds provides malleable knowledge within an AI agent’s reach. Unlike static knowledge bases, “Worlds” are dynamic, graph-based environments that agents can query, update, and reason over in real-time. It acts as a “digital garden” for the next generation of software: a private world where an assistant knows your relationships, history, and preferences with 100% accuracy, acting as an extension of your own mind.

Cognitive architecture

The Worlds Platform mirrors human cognitive systems, such as memory, to provide a structured “memory stack” for autonomous agents, implementing what is increasingly recognized as auto-memory: a system that self-organizes and recalls context without manual engineering.

Memory type	Agent perspective	Worlds Platform implementation
Semantic	What it knows	RDF Store: Structured statements and SPARQL reasoning.
Episodic	What it did	Append-only Log: Temporal history of events and metadata.
Working	What it is processing	Scratchpad: Live distillation of knowledge into prompts.
Procedural	What it can do	Tools: Automated skills for graph operations, tools, and agents.
Sensory	What it perceives	Ingestion: Raw data streams and vector indexing.

Methods

The Worlds Platform utilizes a dual-process neuro-symbolic methodology to bridge the semantic understanding of neural networks with the deterministic logic of symbolic systems.

1. Neuro-symbolic pipeline

While traditional semantic layers (like LookML or dbt) use YAML to standardize metric calculations for human dashboards, they cannot encode complex business logic. AI requires ontologies, built on standards like OWL and W3C RDF, which define the hierarchies and relationships necessary for machines to autonomously reason. The historical “PhD bottleneck” of manual knowledge engineering is solved using Large Ontology Models (LOMs). Worlds employs a Construct-Align-Reason (CAR) pipeline where autonomous processors achieve SOTA accuracy in ontology synthesis. By creating a living record of business logic, Worlds provides Low-ETL interoperability where existing legacy databases comply natively. Ingestion follows a multi-stage transformation process:

Segmentation: Unstructured text is decomposed into semantically coherent chunks.
Triple Extraction (Construct): An LLM-based extraction layer identifies resources and predicates, converting narrative flow into formalized triples (RDF statements).
Relational Mapping (Align): Extracted resources are mapped to a strict ontology, ensuring structural consistency across the global graph.
Semantic Indexing: Each chunk and triple is indexed simultaneously via high-dimensional vector embeddings and full-text search keys.
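
The segmentation stage listed above can be illustrated with a minimal sketch. It swaps in a plain sliding window over words with a fixed overlap, which is simpler than the semantically coherent chunking the pipeline describes; the names `segmentText`, `windowSize`, and `overlap` are hypothetical and not part of the Worlds SDK.

```typescript
// Hypothetical helper (not a Worlds API): split text into overlapping
// word windows. A production pipeline would more likely split on
// semantic boundaries such as sentences or sections.
interface Chunk {
  index: number;
  text: string;
}

function segmentText(text: string, windowSize = 50, overlap = 10): Chunk[] {
  if (windowSize <= 0 || overlap < 0 || overlap >= windowSize) {
    throw new RangeError("require windowSize > overlap >= 0");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = windowSize - overlap; // how far the window advances each time
  const chunks: Chunk[] = [];
  for (let start = 0; start < words.length; start += step) {
    const slice = words.slice(start, start + windowSize);
    chunks.push({ index: chunks.length, text: slice.join(" ") });
    // Stop once the current window already covers the end of the text.
    if (start + windowSize >= words.length) break;
  }
  return chunks;
}
```

Each resulting chunk would then be handed to the extraction and indexing stages; the overlap keeps statements that straddle a window boundary visible in at least one chunk.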

2. State management and Policy-as-Code

The proliferation of AI agents exposes the flaw in “malleable software.” Highly customizable platforms expand the context an agent must parse, directly increasing latency, token costs, and failure rates. Effective neuro-symbolic systems are strictly opinionated at the core. By defining clear objects and explicit workflows through a rigid ontology, the system carries the computational burden, allowing agents to execute efficiently without requiring users to architect the tool itself. Unlike stateless RAG systems, Worlds treats memory as a dynamic, mutable state. Crucially, this strict ontology acts as a hard guardrail. Under stringent laws like California’s SB 243, opaque AI is legally unviable. Worlds enables Policy-as-Code through axiom-based enforcement. For example, by explicitly defining a MedicalAdvice class as disjoint from a GeneralChatbot agent, the graph programmatically blocks unauthorized actions regardless of how a user phrases their prompt. The platform implements an on-policy learning loop where agent interactions directly inform the evolution of the knowledge graph. This is achieved through rdf-patch operations that allow for atomic updates, deletions, and forks of specific knowledge sub-graphs without re-indexing the entire dataset.
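
The disjointness example above can be sketched as a small guard function. This is an illustration under stated assumptions, not the platform's enforcement code: the `Ontology` shape, the `isActionAllowed` function, and the `ex:` class names are hypothetical, and the check reduces OWL-style disjointness to a simple pair lookup.

```typescript
// Hypothetical, simplified policy check (not a Worlds API).
type ClassIri = string;

interface Ontology {
  // Pairs of classes declared mutually disjoint (cf. owl:disjointWith).
  disjoint: Array<[ClassIri, ClassIri]>;
}

// Disjointness is symmetric, so check both orderings of each pair.
function areDisjoint(ont: Ontology, a: ClassIri, b: ClassIri): boolean {
  return ont.disjoint.some(
    ([x, y]) => (x === a && y === b) || (x === b && y === a),
  );
}

// An action is allowed only if its class is not disjoint with any class
// the agent belongs to. The decision never inspects the user's prompt.
function isActionAllowed(
  ont: Ontology,
  agentClasses: ClassIri[],
  actionClass: ClassIri,
): boolean {
  return !agentClasses.some((c) => areDisjoint(ont, c, actionClass));
}
```

Because the decision depends only on declared classes, rephrasing a request cannot change the outcome; only an edit to the ontology can.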

Architecture

Overview

The system follows a segregated Client-Server architecture designed for edge deployment. It pairs a management-oriented Worlds Console with a high-performance Worlds API.

Organization

Wazoo Technologies: AI R&D lab focused on neuro-symbolic research and the development of Worlds.

Components

The SDK: A canonical TypeScript client that handles authentication and type-safe API requests. It acts as the bridge between “neural” code (LLMs) and “symbolic” data.
The Server: A minimal Deno-based HTTP server handling SPARQL execution and graph management.
Forward-sync search store: A proprietary mechanism that replicates RDF data patches into optimized search stores, enabling full-text and semantic search over structured triples.
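
To make the forward-sync idea concrete, the sketch below applies a patch of add (`A`) and delete (`D`) operations, the two row types RDF Patch uses for data changes, to an in-memory graph and then mirrors the same patch into a toy search index. The actual mechanism is proprietary, so every name here (`applyPatch`, `syncSearchIndex`, the `Map`-based stores) is an assumption made for illustration.

```typescript
// Illustrative only: in-memory stand-ins for the RDF store and search store.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// "A" = add, "D" = delete, following RDF Patch row naming.
type PatchOp = { op: "A" | "D"; triple: Triple };

const keyOf = (t: Triple): string =>
  JSON.stringify([t.subject, t.predicate, t.object]);

// Apply the patch to a copy so that a failing operation leaves the
// original graph untouched (all-or-nothing behaviour). Rejecting a
// delete of a missing triple is a choice made by this sketch.
function applyPatch(graph: Map<string, Triple>, patch: PatchOp[]): Map<string, Triple> {
  const next = new Map(graph);
  for (const { op, triple } of patch) {
    const k = keyOf(triple);
    if (op === "A") {
      next.set(k, triple);
    } else if (!next.delete(k)) {
      throw new Error(`cannot delete missing triple ${k}`);
    }
  }
  return next;
}

// Forward-sync: replay the same patch against a search index that maps
// each triple key to the text that should be searchable (here, the object).
function syncSearchIndex(index: Map<string, string>, patch: PatchOp[]): void {
  for (const { op, triple } of patch) {
    const k = keyOf(triple);
    if (op === "A") index.set(k, triple.object);
    else index.delete(k);
  }
}
```

Replaying only the patch, rather than rebuilding the index from the full graph, is what lets the search store stay current without re-indexing the entire dataset.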

Storage engine

To achieve both semantic flexibility and structural precision, the platform employs a hybrid storage strategy.

n3 (hot memory)

The platform utilizes an in-memory, WASM-compiled RDF store that supports SPARQL. The infrastructure is designed to support any RDF store (including Apache Jena Fuseki or a local file system) that implements rdf-patch forward synchronization. n3 is the preferred store because it runs entirely within the JavaScript runtime, providing isolated, high-performance in-memory state.

Pre-loading: WASM modules are pre-loaded to ensure “warm” isolates.
Hydration: The SQLite “system of record” hydrates the graph state upon initialization.
Edge cache: Hot state persists in the edge cache between requests for millisecond read latency.

SQLite storage

Persistence utilizes a hybrid schema to avoid the overhead of general-purpose SPARQL engines on disk while maintaining semantic integrity.

triples table: Stores the structural data of statements (Subject, Predicate, Object) as an append-only chronological ledger.
chunks table: Stores overlapping text segments with vector embeddings targeting string literals, and ranks derived from triple data.
entity_types table: An optimized table for mapping resources to their rdf:type IRIs, enabling rapid structural filtering.
blobs table: Handles large-scale RDF data and file-based state.

Efficient indexing

To ensure O(log N) performance for graph queries and millisecond responses for semantic search, the engine implements a multi-index strategy inspired by Hexastore index research:

Graph indexing: Standard B-tree indices on subject and predicate enable rapid pattern matching for search filters.
Vector indexing: Use of libsql_vector_idx for 1536-dimensional embeddings, enabling semantic similarity search at the edge.
FTS5 indexing: Native SQLite full-text search for fast keyword matching and ranking.
Resource type indexing: Composite indexing on the entity_types table (PRIMARY KEY (subject, type) WITHOUT ROWID) for high-speed class-based filtering.

Hybrid search

The system utilizes Reciprocal Rank Fusion (RRF) to combine results from distinct indices into a single, unified relevance ranking:

Semantic search: Captures conceptual meaning using a vector index and high-dimensional embeddings.
Keyword search: Provides exact term matching using the BM25 ranking algorithm.
Graph context: Restricts search results based on structural RDF relationships using subject or predicate filters.

The fusion algorithm follows the industry-standard RRF formula:

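\mathrm{score}(d) = \sum_{r \in R} \frac{1}{60 + \mathrm{rank}_r(d)}

Here R is the set of rankings being fused (the vector and full-text rankings), and \mathrm{rank}_r(d) is the 1-based position of chunk d in ranking r. A chunk that is absent from a ranking contributes nothing for that ranking.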

The following SQL snippet demonstrates this logic implemented within the SQLite engine:

-- Vector candidates: the rank relies on vector_top_k returning nearest neighbours first.
WITH vec_matches AS (
  SELECT id AS rowid, row_number() OVER (PARTITION BY NULL) AS rank_number
  FROM vector_top_k('idx_chunks_vector', vector32(?), ?)
  WHERE ? != ''
),
-- Keyword candidates: the FTS5 rank column is BM25-based, so ascending order is best first.
fts_matches AS (
  SELECT rowid, row_number() OVER (ORDER BY rank) AS rank_number
  FROM chunks_fts WHERE ? != '' AND chunks_fts MATCH ? LIMIT ?
),
-- Fuse with RRF (k = 60); a chunk missing from one list contributes 0.0 for that list.
final AS (
  SELECT
    chunks.id,
    (COALESCE(1.0 / (60 + fts_matches.rank_number), 0.0) +
     COALESCE(1.0 / (60 + vec_matches.rank_number), 0.0)) AS combined_rank
  FROM chunks
  LEFT JOIN fts_matches ON fts_matches.rowid = chunks.rowid
  LEFT JOIN vec_matches ON vec_matches.rowid = chunks.rowid
  WHERE (? = '' OR fts_matches.rowid IS NOT NULL OR vec_matches.rowid IS NOT NULL)
  ORDER BY combined_rank DESC LIMIT ?
)
SELECT * FROM final;

The logic for the Reciprocal Rank Fusion algorithm is implemented within the core storage engine to ensure high-performance execution. This approach allows agents to answer complex, high-precision queries like “Find resources located in New York via the graph that are ‘cozy’ via vector or FTS search”.
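
For readers who prefer application code to SQL, the same fusion can be sketched in TypeScript. This is a minimal illustration rather than the engine's implementation: `reciprocalRankFusion` is a hypothetical name, each input ranking is assumed to list unique ids best first, and the constant 60 matches the SQL above.

```typescript
// Reciprocal Rank Fusion over any number of ranked id lists (best first).
// Hypothetical helper for illustration; assumes ids are unique per list.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      const rank = i + 1; // 1-based, like row_number() in the SQL
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

An id that appears in both lists accumulates two terms, so moderate agreement between the keyword and vector rankings can outrank a top position in only one of them.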

Disambiguation

RRF provides a strong initial ranking, but complex knowledge graphs often contain ambiguous resources or near-identical triples. To ensure 100% reasoning integrity, the platform supports two downstream refinement strategies:

Reranking

Higher-latency cross-encoder models can rerank the top-K results from the hybrid search, providing a more nuanced semantic alignment before data reaches the agent’s context.
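
The reranking step can be sketched as follows. The `PairScorer` callback stands in for a cross-encoder and is purely hypothetical; a real model call would normally be asynchronous, and it is kept synchronous here only to keep the sketch short.

```typescript
interface Candidate {
  id: string;
  text: string;
}

// Hypothetical stand-in for a cross-encoder: scores a (query, text) pair,
// higher meaning more relevant.
type PairScorer = (query: string, text: string) => number;

// Rescore only the first topK candidates (the expensive part) and leave
// the remaining tail in its original hybrid-search order.
function rerankTopK(
  query: string,
  candidates: Candidate[],
  score: PairScorer,
  topK = 10,
): Candidate[] {
  const head = candidates.slice(0, topK);
  const tail = candidates.slice(topK);
  const scored = head.map((candidate) => ({
    candidate,
    relevance: score(query, candidate.text),
  }));
  scored.sort((a, b) => b.relevance - a.relevance);
  return scored.map((s) => s.candidate).concat(tail);
}
```

Limiting the rescoring to the top K bounds the added latency, since the cost of a cross-encoder grows with the number of pairs it evaluates.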

Human-in-the-loop (HITL)

While LOMs accelerate auto-ontology generation, LLMs alone remain unreliable ontology engineers. When the system identifies low-confidence mappings, contradictory axioms, or multiple conflicting resources, the malleable nature of Worlds allows the UI to present disambiguation prompts to the user via a Human-in-the-loop workflow. This absolute verification guarantees structural integrity.

Outcome-based determinism

Utilizing reification in context graphs makes relationships first-class resources. If a structural anomaly occurs during traversal, the system triggers an intervention. This shifts the focus of trust from eliminating uncertainty to managing it through rigorous, auditable verification.

SDK and agents

The World Engine is available to AI agents without requiring developers to write raw SPARQL.

Detachable hippocampus

The SDK provides drop-in tools for the Vercel AI SDK and other agent frameworks:

discover-schema: Identifies the structure and predicates present in a world to guide agent reasoning.
execute-sparql: Allows agents to run precise symbolic queries and updates.
search-entities: Performs semantic and keyword search to find relevant knowledge.
generate-iri: Creates stable, predictable identifiers for new resources.
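
As an illustration of what “stable, predictable identifiers” can mean in practice, the sketch below derives an IRI deterministically from a namespace, a type, and a label. It is an assumption about one reasonable scheme, not the behaviour of the actual generate-iri tool; the function name and the `https://example.org/` namespace are hypothetical.

```typescript
// Hypothetical scheme: <base>/<type-slug>/<label-slug>.
// The same inputs always yield the same IRI, so repeated ingestion of the
// same entity does not mint duplicates.
function generateIri(base: string, type: string, label: string): string {
  const slug = (s: string): string =>
    s
      .normalize("NFKD")
      .replace(/[\u0300-\u036f]/g, "") // drop combining accents
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-") // collapse everything else to hyphens
      .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  const ns = base.endsWith("/") ? base : base + "/";
  return `${ns}${slug(type)}/${slug(label)}`;
}
```

A purely label-based scheme like this one would map two different entities with the same type and label to the same IRI, so a real implementation would need a disambiguation strategy on top of it.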

Interoperability

Worlds is agent-ready from the first request. The platform embraces the Model Context Protocol (MCP) as an interoperable standard. By relying on open RDF structures, agents from entirely separate ecosystems—such as a Claude coding agent and a Gemini researcher—can share statements natively. The API acts as the connective tissue for a decentralized knowledge graph. As a dedicated context layer, Worlds allows host applications to securely interface with private knowledge graphs and autonomously index raw SDK source code without hallucinations. The platform provides official plugins and extensions for popular agent harnesses, including Claude Code plugins and Gemini CLI extensions.

SPARQL agent

A sophisticated translator agent sits between the developer’s natural language request and the database. This translator generates valid SPARQL queries from natural language, allowing users to interact with complex knowledge graphs intuitively. This abstraction preserves the power of symbolic reasoning (including traceability and precision) while maintaining the ease of use of a chat interface.

API and control

The platform exposes a comprehensive REST API organized into management-oriented Worlds Console and graph-oriented Worlds API operations.

Capabilities

World management: Create, read, update, and delete Worlds. Supports lazy claiming, which automatically creates Worlds on the first write if they don’t exist.
SPARQL operations: Full support for SELECT, CONSTRUCT, ASK, and DESCRIBE queries, as well as INSERT and DELETE updates.
Search: Dedicated endpoints for searching statements and text chunks via full-text or semantic query parameters.
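
A SPARQL call against such an API might be assembled as in the sketch below. The media types follow the W3C SPARQL 1.1 Protocol; everything else is an assumption for illustration, including the `/worlds/{id}/sparql` path, the bearer-token header, and the `buildSparqlRequest` name. Consult the Worlds API reference for the real routes.

```typescript
interface SparqlRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a request descriptor. The URL layout and the
// Authorization scheme are hypothetical, not taken from the Worlds API.
function buildSparqlRequest(
  baseUrl: string,
  worldId: string,
  apiKey: string,
  sparql: string,
  kind: "query" | "update" = "query",
): SparqlRequest {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate trailing slashes
  return {
    url: `${root}/worlds/${encodeURIComponent(worldId)}/sparql`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      // SPARQL 1.1 Protocol: queries and updates use distinct media types.
      "Content-Type":
        kind === "query" ? "application/sparql-query" : "application/sparql-update",
      Accept: "application/sparql-results+json",
    },
    body: sparql,
  };
}
```

The descriptor can be passed to `fetch(req.url, req)` in any runtime that provides the Fetch API.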

Access control

Dynamic access: Runtime enforcement of plan limits, such as Free vs. Pro tiers, without code deployment.
Metering: Asynchronous usage tracking aggregated by API key and time bucket, supporting finer-grained “pay-as-you-go” billing.
Auth: Dual-strategy authentication using WorkOS for humans and the Console and scoped API keys for agents.
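
The metering described above, usage aggregated by API key and time bucket, can be sketched as a pure function. The `UsageEvent` shape, the one-minute default bucket, and the composite string key are assumptions chosen for illustration.

```typescript
// Hypothetical event shape; "units" could be requests, tokens, or triples.
interface UsageEvent {
  apiKey: string;
  timestampMs: number;
  units: number;
}

// Sum usage per (apiKey, bucket). Buckets are aligned to multiples of
// bucketMs since the epoch, so every event maps to exactly one bucket.
function aggregateUsage(events: UsageEvent[], bucketMs = 60000): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    const bucketStart = Math.floor(e.timestampMs / bucketMs) * bucketMs;
    const key = `${e.apiKey}|${bucketStart}`;
    totals.set(key, (totals.get(key) ?? 0) + e.units);
  }
  return totals;
}
```

Because the aggregation is order-independent, events can be buffered and folded in asynchronously without affecting the totals, which fits the asynchronous tracking the platform describes.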

Worlds Console

Manage your agent’s memory through a dedicated interface. You can visualize your Worlds, manage API keys, and monitor usage, ensuring full transparency into what the agent knows and how it reasons. The Console presents Worlds as a grid of animated procedural planets, from which a user can navigate to a specific World.

Benchmarks

MemoryBench (Tsinghua University)

To validate the effectiveness of the Worlds architecture, we utilize the MemoryBench framework. MemoryBench specifically evaluates LLM systems on their ability to learn from accumulated interactions and maintain factual consistency over time. The framework comprises three benchmarks:

LongMemEval: Tests long-term conversational memory using temporal reasoning.
LoCoMo: Tests multimodal conversational memory using session structures and event summaries.
RAG-template: Tests question answering using document arrays and expected answers.

Metric	Traditional RAG	Worlds (Neuro-Symbolic)	Delta
Declarative Recall	68.4%	89.2%	+20.8 pp
Procedural Memory	42.1%	76.5%	+34.4 pp
On-Policy Learning	Low	High	N/A
Efficiency (latency, ms)	120	45 (Edge)	-62.5%

BEAM

The BEAM benchmark evaluates memory systems using a 10-million token context window. Previous benchmarks like LongMemEval use 1.5-million token limits that fit entirely within modern model contexts. Worlds uses BEAM to guarantee evaluations exceed standard context capacities.

Journey to SOTA

The pursuit of state-of-the-art (SOTA) performance has required a move away from the opaque nature of pure vector retrieval.

Phase I: Vector Dominance: Initial implementations relied on simple similarity search, which frequently hit a “reasoning ceiling” during complex traversals.
Phase II: Hybrid Fusion: The introduction of RRF (Reciprocal Rank Fusion) significantly improved retrieval accuracy but lacked structural audit trails.
Phase III: Symbolic Grounding: The current Worlds architecture achieves SOTA by grounding every neural retrieval in a deterministic RDF structure. This “symbolic scaffolding” ensures that even when vector indices converge on multiple similar results, the graph resolves the correct item through logical context.

Glossary

Term	Definition
World	An isolated Knowledge Graph instance (RDF Dataset), acting as a memory store for an agent.
RDF statement	A unit of fact, structurally stored as a Triple (Subject, Predicate, Object).
Chunk	A text segment derived from a Statement, optimized for hybrid search.
RRF	Reciprocal Rank Fusion. An algorithm fusing Keyword (FTS) and Vector search rankings.
RDF	Resource Description Framework. The W3C standard for graph data interchange.
SPARQL	The W3C standard query language for RDF graphs.
Neuro-symbolic	An AI approach that combines neural networks with symbolic, structured knowledge and deterministic logic.

RDF Statements are to RDF Terms as molecules are to atoms.

References

ARC Prize Foundation. (2026). ARC-AGI-3: Measuring Fluid Intelligence in Dynamic Environments. https://arcprize.org/arc-agi-3
Anthropic. (2024). Model Context Protocol (MCP) Specification. https://modelcontextprotocol.io
TrustGraph. (2025). The Context Graph Manifesto: A New Era of Determinism. https://trustgraph.ai/manifesto
Willison, S. (2024). Hybrid full-text search and vector search with SQLite. https://simonwillison.net/2024/Oct/4/hybrid-full-text-search-and-vector-search-with-sqlite/
W3C. (2013). SPARQL 1.1 Query Language. W3C Recommendation. https://www.w3.org/TR/sparql11-query/
RDF.js. (n.d.). N3Store.js Documentation. https://rdf.js.org/N3.js/docs/N3Store.html
Tsinghua University. (2025). MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems. https://github.com/supermemoryai/memorybench
Sankar, P. (2026). Ontologies, Context Graphs, and Semantic Layers. Metadata Weekly. https://metadataweekly.substack.com/p/ontologies-context-graphs-and-semantic
Saarinen, K. (2025). The Malleable Software That Never Was. X. https://x.com/karrisaarinen/status/2034845387488731585
