Agentic Retrieval
Standard retrieval runs a single pass and returns whatever it finds. If your query is ambiguous or broad, you get partial results and don’t know what you’re missing. Agentic retrieval adds an LLM-driven sufficiency loop on top of hybrid search that evaluates whether results adequately cover your query — and if not, generates targeted follow-up queries to fill the gaps.
Why Agentic Retrieval?
| Single-pass retrieval | Agentic retrieval |
|---|---|
| “Show me the auth decisions” → returns 2 results about JWT | Same query → Round 1 finds JWT results, the LLM identifies missing OAuth and session management coverage, and Round 2 fills the gaps |
| You don’t know what’s missing | Sufficiency check explicitly identifies gaps |
| Works well for precise queries | Works well for both precise and broad queries |
How It Works
When you run a query with agentic retrieval enabled (the default), TeamLoop executes the following flow:
Round 1: Hybrid search
The query runs through the standard hybrid search pipeline (vector search, BM25 text search, RRF fusion, and reranking) and returns the top 5 results.
Threshold gate
Before invoking the LLM, a fast check determines whether a sufficiency evaluation is needed:
- If Round 1 returns at least 3 entities AND the top score is at least 0.4 → results are likely sufficient, skip the LLM call
- Otherwise → proceed to sufficiency check
This avoids unnecessary LLM latency for queries that already have strong results.
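The gate can be sketched in a few lines. This is a minimal illustration rather than TeamLoop's implementation: the `Result` shape and the function name are assumptions, and only the two thresholds (3 entities, top score 0.4) come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical result shape: an entity ID plus a relevance score."""
    entity_id: str
    score: float


def passes_threshold_gate(results, min_entities=3, min_top_score=0.4):
    """Return True when Round 1 looks strong enough to skip the LLM call."""
    if len(results) < min_entities:
        return False
    top_score = max(r.score for r in results)
    return top_score >= min_top_score


strong = [Result("e1", 0.72), Result("e2", 0.55), Result("e3", 0.41)]
sparse = [Result("e1", 0.91), Result("e2", 0.60)]
weak = [Result("e1", 0.31), Result("e2", 0.22), Result("e3", 0.10)]

print(passes_threshold_gate(strong))  # True: 3 results, top score 0.72
print(passes_threshold_gate(sparse))  # False: only 2 results
print(passes_threshold_gate(weak))    # False: top score 0.31 is below 0.4
```

Both conditions must hold, so a single high-scoring hit is not enough to skip the sufficiency check.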
Sufficiency check
An LLM evaluates the query against the Round 1 results and returns a structured assessment:
- sufficient — whether the results adequately cover the query
- reasoning — explanation of the assessment
- missing — aspects of the query not covered
- suggested_queries — up to 3 reformulated queries to fill gaps
The sufficiency check runs with a 1.5s timeout. If the LLM is unavailable or times out, Round 1 results are returned unchanged.
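One way to picture the assessment and its timeout fallback is the sketch below. The `Assessment` class mirrors the four fields listed above. `check_sufficiency` and the two stand-in LLM coroutines are hypothetical names used only for illustration; the real LLM call is not shown.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """Mirrors the four fields of the structured assessment."""
    sufficient: bool
    reasoning: str
    missing: list = field(default_factory=list)
    suggested_queries: list = field(default_factory=list)


async def check_sufficiency(llm_check, query, results, timeout_s=1.5):
    """Await the (hypothetical) llm_check coroutine with a timeout.

    Returns None on timeout or error, which signals the caller to
    return Round 1 results unchanged.
    """
    try:
        assessment = await asyncio.wait_for(llm_check(query, results), timeout_s)
    except Exception:
        # Covers asyncio timeouts as well as LLM client failures.
        return None
    # Keep at most 3 reformulated queries.
    assessment.suggested_queries = assessment.suggested_queries[:3]
    return assessment


async def slow_llm(query, results):
    """Stand-in for an LLM that answers too late."""
    await asyncio.sleep(0.2)
    return Assessment(True, "never returned in time")


async def gap_finding_llm(query, results):
    """Stand-in for an LLM that finds gaps and over-suggests."""
    return Assessment(
        sufficient=False,
        reasoning="Only JWT decisions are covered",
        missing=["OAuth", "session management"],
        suggested_queries=["q1", "q2", "q3", "q4"],
    )


timed_out = asyncio.run(check_sufficiency(slow_llm, "auth decisions", [], timeout_s=0.05))
assessed = asyncio.run(check_sufficiency(gap_finding_llm, "auth decisions", []))
print(timed_out)                   # None: fall back to Round 1 results
print(assessed.suggested_queries)  # ['q1', 'q2', 'q3']
```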
Round 2: Reformulated queries
If the sufficiency check identifies gaps, TeamLoop runs all suggested queries in parallel through the hybrid search pipeline, respecting the remaining time budget. Results are deduplicated by entity ID to avoid returning the same entity twice.
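A rough sketch of this round, assuming a hypothetical `search` coroutine that maps a query string to a list of `(entity_id, score)` tuples. Treating Round 1 entity IDs as already seen is also an assumption about where deduplication applies.

```python
import asyncio
import time


async def run_round_two(search, queries, seen_ids, deadline):
    """Run reformulated queries concurrently and return only new entities.

    search   -- hypothetical coroutine: query string -> list of (entity_id, score)
    seen_ids -- entity IDs already returned by Round 1
    deadline -- time.monotonic() value at which the overall budget expires
    """
    remaining = deadline - time.monotonic()
    if remaining <= 0 or not queries:
        return []

    async def guarded(query):
        try:
            return await asyncio.wait_for(search(query), remaining)
        except Exception:
            return []  # a failed or timed-out sub-query is skipped

    # gather() preserves the order of the input queries.
    batches = await asyncio.gather(*(guarded(q) for q in queries))

    seen = set(seen_ids)
    fresh = []
    for batch in batches:
        for entity_id, score in batch:
            if entity_id not in seen:
                seen.add(entity_id)
                fresh.append((entity_id, score))
    return fresh


async def fake_search(query):
    """Stand-in for the hybrid search pipeline."""
    if query == "broken":
        raise RuntimeError("index unavailable")
    index = {
        "oauth": [("e2", 0.8), ("e1", 0.5)],
        "sessions": [("e3", 0.7), ("e2", 0.6)],
    }
    return index.get(query, [])


new_results = asyncio.run(
    run_round_two(fake_search, ["oauth", "broken", "sessions"],
                  seen_ids={"e1"}, deadline=time.monotonic() + 2.5)
)
print(new_results)  # [('e2', 0.8), ('e3', 0.7)]
```

The failing sub-query contributes nothing, and `e1` and the second `e2` hit are dropped as duplicates.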
Cross-round fusion
Round 1 and Round 2 results are combined using RRF fusion, then reranked together to produce the final result set. This ensures the best results from both rounds surface at the top.
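For readers unfamiliar with reciprocal rank fusion, here is a minimal sketch of the general technique. The constant `k=60` is a common convention and an assumption here, not a documented TeamLoop setting, and the reranking step is omitted. Whether an entity can appear in more than one input list depends on where deduplication happens, so the sketch handles both cases.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=5):
    """Fuse ranked lists of entity IDs with reciprocal rank fusion.

    Each entity scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, entity_id in enumerate(ranking, start=1):
            scores[entity_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by entity ID for stability.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [entity_id for entity_id, _ in ordered[:top_k]]


list_a = ["jwt-1", "jwt-2"]
list_b = ["oauth-1", "jwt-1", "session-1"]
print(rrf_fuse([list_a, list_b]))
# ['jwt-1', 'oauth-1', 'jwt-2', 'session-1']
```

An entity ranked in both lists accumulates score from each, which is why `jwt-1` ends up first.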
Graceful Degradation
Agentic retrieval is designed to never block or degrade the query experience:
| Scenario | Behavior |
|---|---|
| LLM client not configured | Agentic layer disabled, returns hybrid search results directly |
| Sufficiency check times out (>1.5s) | Returns Round 1 results unchanged |
| Sufficiency check errors | Logs warning, returns Round 1 results |
| Round 2 sub-query fails | Skips that query, continues with remaining |
| Time budget exceeded (>2.5s) | Stops issuing Round 2 queries, fuses what’s available |
| Results pass threshold gate | Skips LLM call entirely, returns Round 1 results |
MCP Tool
Agentic retrieval is controlled via the agentic parameter on teamloop_query.
teamloop_query
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | The search query |
| agentic | boolean | true | Enable agentic retrieval with sufficiency checking |
| sources | string | all | Comma-separated list of sources to search |
| mode | string | current | Query mode: current, as_of |
| as_of | string | - | Date in YYYY-MM-DD format for temporal queries |
| retrieval | string | hybrid | Retrieval strategy: hybrid or standard |
Default (agentic enabled):

    Tool: teamloop_query
    Input: { "query": "What decisions have been made about authentication?" }

Disable agentic retrieval:

    Tool: teamloop_query
    Input: { "query": "PROJ-1234", "agentic": false }

Disabling agentic retrieval is useful for precise queries where you know exactly what you’re looking for and want the fastest possible response.
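MCP clients normally build the request envelope for you. For reference, the sketch below shows how the inputs above could be wrapped in a JSON-RPC 2.0 `tools/call` request, the shape the Model Context Protocol defines for tool invocation. The helper name `build_query_call` is hypothetical, and transport details for reaching a TeamLoop server are not covered.

```python
import json


def build_query_call(query, agentic=True, request_id=1, **extra):
    """Build a JSON-RPC 2.0 tools/call request for teamloop_query."""
    arguments = {"query": query, "agentic": agentic, **extra}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "teamloop_query", "arguments": arguments},
    }


request = build_query_call("PROJ-1234", agentic=False)
print(json.dumps(request, indent=2))
```

Other parameters from the table, such as `mode` and `as_of`, can be passed as extra keyword arguments.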
Retrieval metadata
When agentic retrieval is active, the response includes a ## Retrieval Metadata section:

    ## Retrieval Metadata
    - agentic_enabled: true
    - rounds: 2
    - sufficiency_checked: true
    - sufficient: false
    - round1_count: 2
    - round2_count: 4
    - total_latency_ms: 1850
    - reformulated_queries: [auth token management, OAuth provider decisions, session storage architecture]

| Field | Description |
|---|---|
| agentic_enabled | Whether agentic retrieval was active |
| rounds | Number of retrieval rounds (1 or 2) |
| sufficiency_checked | Whether the LLM was consulted |
| sufficient | Whether Round 1 was deemed sufficient |
| round1_count | Number of results from Round 1 |
| round2_count | Number of new results from Round 2 |
| total_latency_ms | End-to-end latency including all rounds |
| reformulated_queries | Queries generated by the sufficiency check |
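If you consume the tool response as text, the metadata section can be turned into a dictionary with a small parser. This sketch assumes the section is laid out as one `- key: value` line per field under a `## Retrieval Metadata` heading, and that reformulated queries contain no commas. Both function names are hypothetical.

```python
def _coerce(value):
    """Convert a raw string into bool, int, list, or leave it as a string."""
    if value in ("true", "false"):
        return value == "true"
    if value.isdigit():
        return int(value)
    if value.startswith("[") and value.endswith("]"):
        inner = value[1:-1].strip()
        return [item.strip() for item in inner.split(",")] if inner else []
    return value


def parse_retrieval_metadata(text):
    """Parse '- key: value' lines under '## Retrieval Metadata' into a dict."""
    metadata = {}
    in_section = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("## "):
            in_section = line == "## Retrieval Metadata"
            continue
        if not in_section or not line.startswith("- ") or ":" not in line:
            continue
        key, value = line[2:].split(":", 1)
        metadata[key.strip()] = _coerce(value.strip())
    return metadata


sample = """\
## Retrieval Metadata
- agentic_enabled: true
- rounds: 2
- sufficiency_checked: true
- sufficient: false
- round1_count: 2
- round2_count: 4
- total_latency_ms: 1850
- reformulated_queries: [auth token management, OAuth provider decisions, session storage architecture]
"""

meta = parse_retrieval_metadata(sample)
print(meta["rounds"], meta["sufficient"], meta["total_latency_ms"])
# 2 False 1850
```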
Dashboard
The dashboard query endpoint returns agentic metadata in the retrieval_metadata field of the JSON response. This includes all fields from the table above, allowing the UI to display retrieval diagnostics.
Configuration Defaults
| Parameter | Default | Description |
|---|---|---|
| Min entity threshold | 3 | Min results to skip sufficiency check |
| Confidence threshold | 0.4 | Min top score to skip sufficiency check |
| Max reformulated queries | 3 | Max Round 2 sub-queries |
| Sufficiency timeout | 1500ms | Max time for the LLM sufficiency check |
| Round 2 limit | 5 | Max results per reformulated query |
| Final top-k | 5 | Final number of results returned |
| Time budget | 2500ms | Total time budget for all rounds |
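To see how these defaults interact, the sketch below collects them in a dataclass and works through two bits of arithmetic. The class and field names are hypothetical; only the values come from the table, and the table does not say whether any of them can be overridden.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgenticDefaults:
    """Hypothetical container for the defaults listed in the table."""
    min_entity_threshold: int = 3
    confidence_threshold: float = 0.4
    max_reformulated_queries: int = 3
    sufficiency_timeout_ms: int = 1500
    round2_limit: int = 5
    final_top_k: int = 5
    time_budget_ms: int = 2500

    def remaining_budget_ms(self, elapsed_ms):
        """Time left for Round 2 once elapsed_ms has been spent."""
        return max(0, self.time_budget_ms - elapsed_ms)

    def max_round2_candidates(self):
        """Upper bound on Round 2 results before deduplication."""
        return self.max_reformulated_queries * self.round2_limit


defaults = AgenticDefaults()
# If the sufficiency check uses its full timeout, Round 2 has at most this long:
print(defaults.remaining_budget_ms(defaults.sufficiency_timeout_ms))  # 1000
print(defaults.max_round2_candidates())                               # 15
```

At most 15 Round 2 candidates are then fused with the Round 1 results and cut down to the final top 5.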
- Agentic retrieval adds modest latency: Round 2 queries run in parallel, so expect only the latency of the slowest sub-query rather than their sum, and the whole flow is capped by the 2500ms time budget. For latency-sensitive use cases, disable it with "agentic": false.
- LLM required: The sufficiency check requires an LLM client (Bedrock Claude). Without one, agentic retrieval is automatically disabled and hybrid search runs directly.
- Works with temporal modes: Agentic retrieval works with as_of mode, so temporal queries benefit from the same gap-filling behavior.
- Check the metadata: The retrieval metadata tells you exactly what happened. If it shows rounds: 1 and sufficiency_checked: false, the threshold gate determined results were good enough without consulting the LLM.
- Broad queries benefit most: Queries like “what’s happening with the infrastructure migration?” benefit significantly from agentic retrieval. Precise queries like “PROJ-1234” typically pass the threshold gate and skip the LLM entirely.
Next Steps
- Hybrid Search: The underlying search pipeline that agentic retrieval builds on
- Query Playground: Try temporal query modes with agentic retrieval
- Agent Memory: Natural language remember/recall powered by the retrieval stack