Fact Extraction

Large documents and decisions contain many distinct pieces of information, but standard search treats each one as a single unit. A five-paragraph architecture document might match queries about database choice, deployment strategy, and team ownership equally well, because all three topics share a single embedding. Fact extraction addresses this by decomposing rich entities into atomic facts, each embedded and searchable on its own.

What Are Atomic Facts?

An atomic fact is a single, self-contained statement that can be understood without the original document. Each fact captures one piece of information with enough context to be useful in isolation.

Given a DOCUMENT entity like “Q1 Architecture Review Notes”, fact extraction produces individual facts such as:

“The team decided to migrate from MySQL to PostgreSQL for better JSON support in Q1 2025.”
“Sarah Chen proposed using read replicas to handle the reporting workload.”
“The estimated migration timeline is 6 weeks starting March 15, 2025.”
“gRPC was considered for the internal API but ruled out due to limited browser support.”

Each of these becomes its own FACT entity with its own embedding vector, linked back to the parent document via a PART_OF relationship.

Why Facts Matter for Search

Without fact extraction, searching for “database migration timeline” returns the entire architecture review document. The relevance score reflects the overall document, not the specific sentence about the timeline. With fact extraction, the search returns the precise fact: “The estimated migration timeline is 6 weeks starting March 15, 2025” — ranked higher because its embedding closely matches the query.
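
The dilution effect can be reproduced with a toy model. The sketch below is only an illustration: it uses bag-of-words cosine similarity as a crude stand-in for real embeddings (TeamLoop's actual embedding model and scoring are not described here), and the helper names `bag_of_words` and `cosine` are made up for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for a real embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


facts = [
    "The team decided to migrate from MySQL to PostgreSQL for better JSON support in Q1 2025.",
    "Sarah Chen proposed using read replicas to handle the reporting workload.",
    "The estimated migration timeline is 6 weeks starting March 15, 2025.",
    "gRPC was considered for the internal API but ruled out due to limited browser support.",
]
document = " ".join(facts)  # the whole document scored as one unit
query = bag_of_words("database migration timeline")

document_score = cosine(query, bag_of_words(document))
fact_scores = [cosine(query, bag_of_words(fact)) for fact in facts]
best_fact = facts[fact_scores.index(max(fact_scores))]

print(f"document: {document_score:.3f}")
print(f"best fact: {max(fact_scores):.3f} -> {best_fact}")
```

In this toy model the timeline fact outscores the full document because the unrelated sentences enlarge the document vector without adding any query terms.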

Facts participate in both sides of TeamLoop’s hybrid search:

Vector search — Each fact has its own embedding, so semantic similarity is measured against a focused statement rather than a diluted document embedding.
Full-text search — Fact text is indexed for keyword matching, increasing the chance of matching specific terms.

Because facts are stored as regular entities with type FACT, they flow through the same search pipeline as all other entity types. No special search configuration is needed.

How Facts Are Created

Facts reach the knowledge graph through three paths, each designed for a different stage of the workflow.

1. Auto-extraction in `teamloop_save_knowledge`

When you save entities via teamloop_save_knowledge, TeamLoop automatically identifies candidates for fact extraction. Any DOCUMENT or DECISION entity with a description longer than 200 characters is eligible.
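
As a minimal sketch, the eligibility rule above amounts to a type check plus a length check. The `Entity` class and the function name are hypothetical and do not reflect TeamLoop's internal schema; only the two entity types and the 200-character threshold come from the description above.

```python
from dataclasses import dataclass

# Values taken from the rule described above.
ELIGIBLE_TYPES = {"DOCUMENT", "DECISION"}
MIN_DESCRIPTION_LENGTH = 200


@dataclass
class Entity:
    """Hypothetical entity shape used only for this illustration."""
    entity_type: str
    description: str


def is_extraction_candidate(entity: Entity) -> bool:
    """True when the entity is a DOCUMENT or DECISION with a description longer than 200 characters."""
    return (
        entity.entity_type in ELIGIBLE_TYPES
        and len(entity.description) > MIN_DESCRIPTION_LENGTH
    )
```

Note that the comparison is strict: a description of exactly 200 characters does not qualify under the wording "longer than 200 characters".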

Two extraction mechanisms run in parallel:

Server-side extraction — If an LLM client is configured (Bedrock), TeamLoop extracts facts in the background using the built-in extraction prompt. Facts are saved asynchronously without blocking the response.
Host-LLM instructions — The save_knowledge response includes extraction instructions and a list of candidate entities. The host AI assistant can then call teamloop_save_facts to provide its own extracted facts.

This dual approach means facts are extracted even when only one LLM path is available, and the host assistant can apply domain knowledge the server-side model lacks.

2. Auto-extraction in `teamloop_remember`

The teamloop_remember tool also triggers automatic fact extraction. When the remember flow creates DOCUMENT or DECISION entities with descriptions longer than 200 characters, server-side extraction runs asynchronously in the background. This means casual “remember this” interactions still produce searchable atomic facts without any extra steps.

3. Direct extraction with `teamloop_save_facts`

The teamloop_save_facts MCP tool gives the host assistant explicit control over fact extraction. This is useful when:

The host LLM receives extraction instructions from save_knowledge and processes them
You want to manually decompose a specific entity into facts
You want to add facts that the automatic extraction missed

The `teamloop_save_facts` Tool

Parameters

Parameter	Type	Required	Description
`parent_entity_id`	string	Yes	UUID of the parent entity these facts were extracted from
`facts`	array	Yes	Array of fact objects to save

Each fact object in the array:

Field	Type	Required	Description
`text`	string	Yes	The complete atomic fact as a standalone sentence
`attribution`	string	No	Person name if the fact can be attributed to someone
`date`	string	No	Date in YYYY-MM-DD format if the fact references a specific date

Example

{
  "parent_entity_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "facts": [
    {
      "text": "The payments service was migrated from Stripe v2 to Stripe v3 API in February 2025.",
      "date": "2025-02-01"
    },
    {
      "text": "Jake Martinez led the Stripe migration and estimated 3 weeks of work.",
      "attribution": "Jake Martinez"
    },
    {
      "text": "Webhook reliability improved from 94% to 99.7% after the Stripe v3 migration."
    }
  ]
}

Response

Extracted 3 facts from entity "Payments Service Migration Plan".

If some facts already exist (deduplication by name match), the response reports how many were skipped:

Extracted 2 facts from entity "Payments Service Migration Plan". 1 facts skipped (already exist).
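
If you assemble the tool arguments programmatically, a small builder can catch malformed input before the call is made. This is a sketch under stated assumptions: `make_fact` and `build_save_facts_arguments` are hypothetical helpers, not part of TeamLoop, and the validation reflects only the parameter tables above. How the arguments are actually sent depends on your MCP client.

```python
import json
import re
import uuid
from datetime import datetime
from typing import Optional

DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def make_fact(text: str, attribution: Optional[str] = None,
              fact_date: Optional[str] = None) -> dict:
    """Build one fact object; optional fields are omitted when not provided."""
    text = text.strip()
    if not text:
        raise ValueError("text is required")
    fact = {"text": text}
    if attribution:
        fact["attribution"] = attribution
    if fact_date:
        if not DATE_PATTERN.fullmatch(fact_date):
            raise ValueError("date must use the YYYY-MM-DD format")
        # Rejects impossible calendar dates such as 2025-02-30.
        datetime.strptime(fact_date, "%Y-%m-%d")
        fact["date"] = fact_date
    return fact


def build_save_facts_arguments(parent_entity_id: str, facts: list) -> dict:
    """Return the arguments object for teamloop_save_facts."""
    uuid.UUID(parent_entity_id)  # raises ValueError if the string is not a UUID
    return {"parent_entity_id": parent_entity_id, "facts": facts}


arguments = build_save_facts_arguments(
    "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    [
        make_fact(
            "The payments service was migrated from Stripe v2 to Stripe v3 API in February 2025.",
            fact_date="2025-02-01",
        ),
        make_fact(
            "Jake Martinez led the Stripe migration and estimated 3 weeks of work.",
            attribution="Jake Martinez",
        ),
    ],
)
print(json.dumps(arguments, indent=2))
```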

What Makes a Good Atomic Fact

The extraction prompt enforces these principles, and you should follow them when calling teamloop_save_facts directly:

Self-contained — Each fact must be understandable without the original document. Use full names and context instead of pronouns.

Bad	Good
“It was decided to use PostgreSQL.”	“The backend team decided to use PostgreSQL for the user service database.”
“They chose option B.”	“The architecture review chose gRPC over REST for internal service communication.”

Specific — Include names, dates, versions, and metrics. Vague facts do not help search.

Bad	Good
“Performance improved after the change.”	“API response latency dropped from 340ms to 85ms after enabling connection pooling.”
“The deadline is next month.”	“The v2.0 release deadline is March 30, 2025.”

Factual — Extract only stated facts, not opinions or speculation. If someone made a prediction, attribute it.

Attributed — When the source mentions who said, decided, or authored something, include the attribution. This makes facts searchable by person.

Non-trivial — Skip boilerplate like “This is a meeting document” or “The project is ongoing.” Every fact should carry information worth retrieving.
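
Some of these principles can be spot-checked mechanically before facts are submitted. The function below is a rough heuristic written for this page, not something TeamLoop provides: `lint_fact` and its word lists are assumptions, and it covers only the pronoun and relative-date cases from the tables above.

```python
import re
from typing import List

# Openers that usually signal a missing subject (heuristic, not exhaustive).
VAGUE_OPENERS = ("it ", "they ", "he ", "she ", "we ", "this ", "that ")

# Relative time expressions that lose meaning outside the source document.
RELATIVE_DATES = re.compile(
    r"\b(?:next|last) (?:week|month|quarter|year)\b|\b(?:today|tomorrow|yesterday)\b",
    re.IGNORECASE,
)


def lint_fact(text: str) -> List[str]:
    """Return a list of warnings for a candidate fact; empty means no issue found."""
    warnings = []
    if text.strip().lower().startswith(VAGUE_OPENERS):
        warnings.append("starts with a pronoun; name the subject explicitly")
    if RELATIVE_DATES.search(text):
        warnings.append("uses a relative date; give an absolute date")
    return warnings
```

An empty result does not mean the fact is good. The check cannot judge whether a fact is specific, factual, or non-trivial, so treat it as a first filter only.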

How Facts Are Stored

Each fact is stored as an entity with type FACT:

Name — The fact text, truncated to 200 characters for display
Description — The full fact text
Embedding — Generated from the fact text for vector search
Event date — Inherited from the parent entity, or overridden if the fact specifies a date
Source metadata — Inherited from the parent entity (integration, external ID)
Properties — Attribution stored as a property when provided

A PART_OF relationship links each fact back to its parent entity, preserving the connection to the original source document.
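
The field mapping above can be sketched as a plain function. The dictionary keys, the `ParentEntity` class, and `build_fact_record` are hypothetical names chosen for illustration; TeamLoop's real storage schema is not shown here, and embedding generation is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

NAME_LIMIT = 200  # display name is the fact text truncated to 200 characters


@dataclass
class ParentEntity:
    """Hypothetical parent shape; only the fields needed for the mapping."""
    id: str
    event_date: Optional[str] = None
    source: dict = field(default_factory=dict)


def build_fact_record(parent: ParentEntity, text: str,
                      attribution: Optional[str] = None,
                      fact_date: Optional[str] = None) -> dict:
    """Map one extracted fact onto a FACT entity record linked to its parent."""
    return {
        "type": "FACT",
        "name": text[:NAME_LIMIT],
        "description": text,
        # A date on the fact overrides the date inherited from the parent.
        "event_date": fact_date or parent.event_date,
        # Source metadata is copied from the parent entity.
        "source": dict(parent.source),
        "properties": {"attribution": attribution} if attribution else {},
        "relationships": [{"type": "PART_OF", "target_id": parent.id}],
        # The embedding would be generated from `text`; omitted in this sketch.
    }
```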

Deduplication

Before creating a fact, TeamLoop checks for an existing FACT entity with the same name (truncated text) for the current user. Duplicate facts are skipped without raising an error, and the response reports how many were skipped. This prevents redundant facts when the same entity is processed multiple times or when both server-side and host-LLM extraction run on the same content.
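
A minimal sketch of this check, assuming an exact match on the pair of user and truncated name. The in-memory set and the `save_unique_facts` function are stand-ins for illustration; the real lookup runs against stored FACT entities, and whether matching is case-sensitive is not stated here.

```python
from typing import Iterable, Set, Tuple

NAME_LIMIT = 200  # facts are compared by their truncated name


def save_unique_facts(existing: Set[Tuple[str, str]], user_id: str,
                      texts: Iterable[str]) -> Tuple[int, int]:
    """Add facts not already present for this user; return (saved, skipped)."""
    saved = skipped = 0
    for text in texts:
        key = (user_id, text[:NAME_LIMIT])
        if key in existing:
            skipped += 1  # duplicate: counted, not treated as an error
            continue
        existing.add(key)
        saved += 1
    return saved, skipped
```

One consequence of matching on the truncated name: under this sketch, two long facts that share their first 200 characters count as duplicates even if they differ later in the text.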

Tips

Let auto-extraction work first — In most cases, the automatic extraction from save_knowledge and remember is sufficient. Use save_facts directly only when you need to add facts the automatic extraction missed.
Aim for 3-10 facts per entity — Too few means the document was not decomposed enough. Too many suggests facts are too granular to be useful.
Include dates for temporal search — Facts with dates participate in timeline queries and point-in-time search, making temporal retrieval more precise.
Use attribution for people search — When a fact is attributed to a person, searching for that person’s name will surface their specific contributions and decisions.