Fact Extraction
Large documents and decisions contain many distinct pieces of information, but standard search treats them as single units. A five-paragraph architecture document might match a query about database choice, deployment strategy, and team ownership equally — because they all live in the same embedding. Fact extraction solves this by decomposing rich entities into atomic facts, each embedded and searchable on its own.
What Are Atomic Facts?
An atomic fact is a single, self-contained statement that can be understood without the original document. Each fact captures one piece of information with enough context to be useful in isolation.
Given a DOCUMENT entity like “Q1 Architecture Review Notes”, fact extraction produces individual facts such as:
- “The team decided to migrate from MySQL to PostgreSQL for better JSON support in Q1 2025.”
- “Sarah Chen proposed using read replicas to handle the reporting workload.”
- “The estimated migration timeline is 6 weeks starting March 15, 2025.”
- “gRPC was considered for the internal API but ruled out due to limited browser support.”
Each of these becomes its own FACT entity with its own embedding vector, linked back to the parent document via a PART_OF relationship.
Why Facts Matter for Search
Without fact extraction, searching for “database migration timeline” returns the entire architecture review document. The relevance score reflects the overall document, not the specific sentence about the timeline. With fact extraction, the search returns the precise fact, “The estimated migration timeline is 6 weeks starting March 15, 2025”, which ranks higher because its embedding closely matches the query.
Facts participate in both sides of TeamLoop’s hybrid search:
- Vector search — Each fact has its own embedding, so semantic similarity is measured against a focused statement rather than a diluted document embedding.
- Full-text search — Fact text is indexed for keyword matching, increasing the chance of matching specific terms.
Because facts are stored as regular entities with type FACT, they flow through the same search pipeline as all other entity types. No special search configuration is needed.
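The dilution effect described above can be illustrated with a toy model. The sketch below is not TeamLoop's embedding or ranking code: it assumes a bag-of-words vector as a stand-in for a real embedding and uses cosine similarity, only to show why a focused fact scores higher than the document that contains it.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


facts = [
    "The team decided to migrate from MySQL to PostgreSQL for better JSON support in Q1 2025.",
    "Sarah Chen proposed using read replicas to handle the reporting workload.",
    "The estimated migration timeline is 6 weeks starting March 15, 2025.",
    "gRPC was considered for the internal API but ruled out due to limited browser support.",
]
document = " ".join(facts)  # the undecomposed parent document
query = bag_of_words("database migration timeline")

document_score = cosine(query, bag_of_words(document))
fact_scores = [cosine(query, bag_of_words(f)) for f in facts]
best_fact = facts[fact_scores.index(max(fact_scores))]

print(f"document: {document_score:.3f}")
print(f"best fact: {max(fact_scores):.3f} -> {best_fact}")
```

The timeline fact shares the same matching terms as the whole document but has far fewer unrelated terms, so its similarity to the query is higher. Real embeddings behave differently in detail, but the same intuition applies.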
How Facts Are Created
Facts reach the knowledge graph through three paths, each designed for a different stage of the workflow.
1. Auto-extraction in teamloop_save_knowledge
When you save entities via teamloop_save_knowledge, TeamLoop automatically identifies candidates for fact extraction. Any DOCUMENT or DECISION entity with a description longer than 200 characters is eligible.
Two extraction mechanisms run in parallel:
- Server-side extraction: If an LLM client is configured (Bedrock), TeamLoop extracts facts in the background using the built-in extraction prompt. Facts are saved asynchronously without blocking the response.
- Host-LLM instructions: The save_knowledge response includes extraction instructions and a list of candidate entities. The host AI assistant can then call teamloop_save_facts to provide its own extracted facts.
This dual approach means facts are extracted even when only one LLM path is available, and the host assistant can apply domain knowledge the server-side model lacks.
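The eligibility rule for auto-extraction is simple enough to express directly. The following is a minimal sketch, not TeamLoop's actual code: the `Entity` class and the function names are hypothetical, and only the entity types and the 200-character threshold come from the description above.

```python
from dataclasses import dataclass

# Values taken from the rule described above.
ELIGIBLE_TYPES = {"DOCUMENT", "DECISION"}
MIN_DESCRIPTION_LENGTH = 200


@dataclass
class Entity:
    """Hypothetical, simplified entity shape used only for this sketch."""
    id: str
    type: str
    name: str
    description: str = ""


def is_extraction_candidate(entity: Entity) -> bool:
    """True when the entity is a DOCUMENT or DECISION with a description
    strictly longer than 200 characters."""
    return (
        entity.type in ELIGIBLE_TYPES
        and len(entity.description) > MIN_DESCRIPTION_LENGTH
    )


def select_candidates(entities: list[Entity]) -> list[Entity]:
    """Filter a batch of saved entities down to extraction candidates."""
    return [e for e in entities if is_extraction_candidate(e)]


batch = [
    Entity("1", "DOCUMENT", "Q1 Architecture Review Notes", "x" * 450),
    Entity("2", "DECISION", "Use PostgreSQL", "Short note."),
    Entity("3", "PERSON", "Sarah Chen", "y" * 450),
]
print([e.name for e in select_candidates(batch)])
```

Only the first entity qualifies here: the second is too short and the third has an ineligible type.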
2. Auto-extraction in teamloop_remember
The teamloop_remember tool also triggers automatic fact extraction. When the remember flow creates DOCUMENT or DECISION entities with descriptions longer than 200 characters, server-side extraction runs asynchronously in the background. This means casual “remember this” interactions still produce searchable atomic facts without any extra steps.
3. Direct extraction with teamloop_save_facts
The teamloop_save_facts MCP tool gives the host assistant explicit control over fact extraction. This is useful when:
- The host LLM receives extraction instructions from save_knowledge and processes them
- You want to manually decompose a specific entity into facts
- You want to add facts that the automatic extraction missed
The teamloop_save_facts Tool
Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| parent_entity_id | string | Yes | UUID of the parent entity these facts were extracted from |
| facts | array | Yes | Array of fact objects to save |

Each fact object in the array:
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | The complete atomic fact as a standalone sentence |
| attribution | string | No | Person name if the fact can be attributed to someone |
| date | string | No | Date in YYYY-MM-DD format if the fact references a specific date |
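
Before calling the tool, a client may want to check a payload against the two tables above. This is a hedged sketch of such a check: `validate_facts_payload` is a hypothetical helper, not part of TeamLoop, and the server may apply different or stricter rules.

```python
import uuid
from datetime import datetime


def validate_facts_payload(payload: dict) -> list[str]:
    """Return a list of problems found in a teamloop_save_facts payload.
    An empty list means the payload matches the documented shape."""
    errors: list[str] = []

    parent_id = payload.get("parent_entity_id")
    if not isinstance(parent_id, str):
        errors.append("parent_entity_id is required and must be a string")
    else:
        try:
            uuid.UUID(parent_id)
        except ValueError:
            errors.append("parent_entity_id must be a UUID")

    facts = payload.get("facts")
    if not isinstance(facts, list):
        errors.append("facts is required and must be an array")
        return errors

    for i, fact in enumerate(facts):
        if not isinstance(fact, dict):
            errors.append(f"facts[{i}] must be an object")
            continue
        text = fact.get("text")
        if not isinstance(text, str) or not text.strip():
            errors.append(f"facts[{i}].text is required")
        attribution = fact.get("attribution")
        if attribution is not None and not isinstance(attribution, str):
            errors.append(f"facts[{i}].attribution must be a string")
        date = fact.get("date")
        if date is not None:
            try:
                datetime.strptime(date, "%Y-%m-%d")
            except (TypeError, ValueError):
                errors.append(f"facts[{i}].date must use YYYY-MM-DD format")
    return errors


payload = {
    "parent_entity_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "facts": [
        {"text": "The v2.0 release deadline is March 30, 2025.", "date": "2025-03-30"},
        {"text": "", "date": "03/30/2025"},
    ],
}
print(validate_facts_payload(payload))
```

The second fact in this sample is rejected twice: once for its empty text and once for a date that is not in YYYY-MM-DD format.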
Example

    {
      "parent_entity_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "facts": [
        {
          "text": "The payments service was migrated from Stripe v2 to Stripe v3 API in February 2025.",
          "date": "2025-02-01"
        },
        {
          "text": "Jake Martinez led the Stripe migration and estimated 3 weeks of work.",
          "attribution": "Jake Martinez"
        },
        {
          "text": "Webhook reliability improved from 94% to 99.7% after the Stripe v3 migration."
        }
      ]
    }

Response

    Extracted 3 facts from entity "Payments Service Migration Plan".

If some facts already exist (deduplication by name match), the response reports how many were skipped:

    Extracted 2 facts from entity "Payments Service Migration Plan". 1 facts skipped (already exist).

What Makes a Good Atomic Fact
The extraction prompt enforces these principles, and you should follow them when calling teamloop_save_facts directly:
Self-contained — Each fact must be understandable without the original document. Use full names and context instead of pronouns.
| Bad | Good |
|---|---|
| “It was decided to use PostgreSQL.” | “The backend team decided to use PostgreSQL for the user service database.” |
| “They chose option B.” | “The architecture review chose gRPC over REST for internal service communication.” |
Specific — Include names, dates, versions, and metrics. Vague facts do not help search.
| Bad | Good |
|---|---|
| “Performance improved after the change.” | “API response latency dropped from 340ms to 85ms after enabling connection pooling.” |
| “The deadline is next month.” | “The v2.0 release deadline is March 30, 2025.” |
Factual — Extract only stated facts, not opinions or speculation. If someone made a prediction, attribute it.
Attributed — When the source mentions who said, decided, or authored something, include the attribution. This makes facts searchable by person.
Non-trivial — Skip boilerplate like “This is a meeting document” or “The project is ongoing.” Every fact should carry information worth retrieving.
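Some of these principles can be checked mechanically before facts are submitted. The sketch below is a rough heuristic, not the extraction prompt's logic: `lint_fact`, the pronoun list, the relative-date pattern, and the five-word minimum are all assumptions chosen for illustration, and they will miss many cases.

```python
import re

# Assumed word lists and thresholds; tune them for your own content.
LEADING_PRONOUNS = {"it", "they", "he", "she", "we", "this", "that", "these", "those"}
RELATIVE_DATES = re.compile(
    r"\b(?:next|last|this)\s+(?:week|month|quarter|year)\b"
    r"|\b(?:yesterday|today|tomorrow)\b",
    re.IGNORECASE,
)
MIN_WORDS = 5


def lint_fact(text: str) -> list[str]:
    """Return heuristic warnings for a candidate fact (empty list = no warnings)."""
    warnings: list[str] = []
    words = re.findall(r"[A-Za-z']+", text)
    if words and words[0].lower() in LEADING_PRONOUNS:
        warnings.append("starts with a pronoun; name the subject explicitly")
    if RELATIVE_DATES.search(text):
        warnings.append("uses a relative date; state the absolute date")
    if len(words) < MIN_WORDS:
        warnings.append("very short; may lack context")
    return warnings


for candidate in [
    "They chose option B.",
    "The deadline is next month.",
    "API response latency dropped from 340ms to 85ms after enabling connection pooling.",
]:
    print(candidate, "->", lint_fact(candidate) or "ok")
```

A check like this only catches surface problems. Whether a fact is factual, attributed, and non-trivial still needs judgment from the host LLM or a reviewer.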
How Facts Are Stored
Each fact is stored as an entity with type FACT:
- Name — The fact text, truncated to 200 characters for display
- Description — The full fact text
- Embedding — Generated from the fact text for vector search
- Event date — Inherited from the parent entity, or overridden if the fact specifies a date
- Source metadata — Inherited from the parent entity (integration, external ID)
- Properties — Attribution stored as a property when provided
A PART_OF relationship links each fact back to its parent entity, preserving the connection to the original source document.
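The field mapping above can be sketched as a small function. This is an illustration under stated assumptions, not TeamLoop's storage code: the `ParentEntity` and `FactEntity` classes, the relationship dictionary, and the sample source values are hypothetical, and embedding generation is left out.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

NAME_MAX_LENGTH = 200  # display-name limit described above


@dataclass
class ParentEntity:
    """Hypothetical parent shape with only the fields a fact inherits."""
    id: str
    event_date: Optional[str] = None
    source: dict = field(default_factory=dict)


@dataclass
class FactEntity:
    """Hypothetical FACT entity shape (embedding omitted)."""
    name: str
    description: str
    event_date: Optional[str]
    source: dict
    properties: dict
    type: str = "FACT"
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def build_fact_entity(parent: ParentEntity, fact: dict) -> tuple[FactEntity, dict]:
    """Map one fact object onto a FACT entity plus its PART_OF link."""
    text = fact["text"]
    properties = {}
    if fact.get("attribution"):
        properties["attribution"] = fact["attribution"]
    entity = FactEntity(
        name=text[:NAME_MAX_LENGTH],                       # truncated for display
        description=text,                                  # full fact text
        event_date=fact.get("date") or parent.event_date,  # fact date overrides parent
        source=dict(parent.source),                        # inherited source metadata
        properties=properties,
    )
    relationship = {"from": entity.id, "type": "PART_OF", "to": parent.id}
    return entity, relationship


parent = ParentEntity(
    id="a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    event_date="2025-01-10",
    source={"integration": "example-integration", "external_id": "doc-1"},  # made-up values
)
dated, link = build_fact_entity(
    parent,
    {"text": "The estimated migration timeline is 6 weeks starting March 15, 2025.", "date": "2025-03-15"},
)
undated, _ = build_fact_entity(
    parent,
    {"text": "Sarah Chen proposed using read replicas to handle the reporting workload.", "attribution": "Sarah Chen"},
)
print(dated.event_date, undated.event_date, link["type"])
```

The first fact keeps its own date, while the second falls back to the parent's event date and carries the attribution as a property.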
Deduplication
Before creating a fact, TeamLoop checks for an existing FACT entity with the same name (the truncated fact text) for the current user. Duplicate facts are skipped without raising an error, and the response reports how many were skipped. This prevents redundant facts when the same entity is processed multiple times or when both server-side and host-LLM extraction run on the same content.
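The deduplication check can be sketched as a lookup keyed on user and truncated name. This is a simplified model, not TeamLoop's implementation: the in-memory `store` set and the `save_facts` function are hypothetical, and exact, case-sensitive matching is an assumption. The response strings follow the format of the tool's documented responses.

```python
NAME_MAX_LENGTH = 200  # fact names are the fact text truncated to 200 characters


def save_facts(store: set, user_id: str, parent_name: str, fact_texts: list[str]) -> str:
    """Save facts that are not already present for this user and return a
    response message. `store` stands in for the real entity store."""
    saved = 0
    skipped = 0
    for text in fact_texts:
        key = (user_id, text[:NAME_MAX_LENGTH])  # dedup key: user + truncated name
        if key in store:
            skipped += 1
            continue
        store.add(key)
        saved += 1
    message = f'Extracted {saved} facts from entity "{parent_name}".'
    if skipped:
        message += f" {skipped} facts skipped (already exist)."
    return message


store: set = set()
first = save_facts(
    store, "user-1", "Payments Service Migration Plan",
    ["Jake Martinez led the Stripe migration and estimated 3 weeks of work."],
)
second = save_facts(
    store, "user-1", "Payments Service Migration Plan",
    [
        "The payments service was migrated from Stripe v2 to Stripe v3 API in February 2025.",
        "Jake Martinez led the Stripe migration and estimated 3 weeks of work.",
        "Webhook reliability improved from 94% to 99.7% after the Stripe v3 migration.",
    ],
)
print(first)
print(second)
```

One consequence of keying on the truncated name, under this sketch's assumptions, is that two long facts sharing the same first 200 characters would be treated as duplicates.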
- Let auto-extraction work first: In most cases, the automatic extraction from save_knowledge and remember is sufficient. Use save_facts directly only when you need to add facts the automatic extraction missed.
- Aim for 3-10 facts per entity: Too few means the document was not decomposed enough. Too many suggests facts are too granular to be useful.
- Include dates for temporal search: Facts with dates participate in timeline queries and point-in-time search, making temporal retrieval more precise.
- Use attribution for people search: When a fact is attributed to a person, searching for that person’s name will surface their specific contributions and decisions.