Agentic Retrieval

Standard retrieval runs a single pass and returns whatever it finds. If your query is ambiguous or broad, you get partial results and don’t know what you’re missing. Agentic retrieval adds an LLM-driven sufficiency loop on top of hybrid search that evaluates whether results adequately cover your query — and if not, generates targeted follow-up queries to fill the gaps.

Why Agentic Retrieval?

Single-pass retrieval	Agentic retrieval
“Show me the auth decisions” → returns 2 results about JWT	Same query → Round 1 finds JWT results, LLM identifies missing OAuth and session management coverage, Round 2 fills gaps
You don’t know what’s missing	Sufficiency check explicitly identifies gaps
Works well for precise queries	Works well for both precise and broad queries

How It Works

When you run a query with agentic retrieval enabled (the default), TeamLoop executes the following flow:

Round 1: Hybrid search

The query runs through the standard hybrid search pipeline — vector search, BM25 text search, RRF fusion, and reranking — returning the top 5 results.

Threshold gate

Before invoking the LLM, a fast check determines whether a sufficiency evaluation is needed:

If Round 1 returns at least 3 entities AND the top score is at least 0.4 → results are likely sufficient, skip the LLM call
Otherwise → proceed to sufficiency check

This avoids unnecessary LLM latency for queries that already have strong results.
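The gate is a cheap comparison, as the minimal sketch below illustrates. The `Result` type and the `passes_threshold_gate` name are hypothetical, and the sketch assumes higher scores are better; only the two thresholds (3 entities, 0.4 top score) come from this page.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical Round 1 hit; only the fields the gate needs."""
    entity_id: str
    score: float  # assumption: higher means more relevant


def passes_threshold_gate(
    results: list[Result],
    min_entities: int = 3,
    min_top_score: float = 0.4,
) -> bool:
    """Return True when Round 1 looks strong enough to skip the LLM call."""
    if len(results) < min_entities:
        return False
    top_score = max((r.score for r in results), default=0.0)
    return top_score >= min_top_score
```

Both conditions must hold: three weak results, or two strong ones, still go on to the sufficiency check.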

Sufficiency check

An LLM evaluates the query against the Round 1 results and returns a structured assessment:

sufficient — whether the results adequately cover the query
reasoning — explanation of the assessment
missing — aspects of the query not covered
suggested_queries — up to 3 reformulated queries to fill gaps

The sufficiency check runs with a 1.5s timeout. If the LLM is unavailable or times out, Round 1 results are returned unchanged.
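One way to model the structured assessment on the client side is sketched below. It assumes the LLM reply arrives as a JSON object whose keys match the four field names above; the `SufficiencyAssessment` class and `parse_assessment` helper are illustrative names, not part of TeamLoop.

```python
import json
from dataclasses import dataclass, field

MAX_SUGGESTED_QUERIES = 3  # "up to 3 reformulated queries"


@dataclass
class SufficiencyAssessment:
    sufficient: bool
    reasoning: str = ""
    missing: list[str] = field(default_factory=list)
    suggested_queries: list[str] = field(default_factory=list)


def parse_assessment(raw: str) -> SufficiencyAssessment:
    """Parse an LLM reply (assumed to be a JSON object) into an assessment."""
    data = json.loads(raw)
    # Drop blank suggestions, then cap the list at the documented maximum.
    queries = [q.strip() for q in data.get("suggested_queries", []) if q.strip()]
    return SufficiencyAssessment(
        sufficient=bool(data["sufficient"]),
        reasoning=str(data.get("reasoning", "")),
        missing=list(data.get("missing", [])),
        suggested_queries=queries[:MAX_SUGGESTED_QUERIES],
    )
```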

Round 2: Reformulated queries

If the sufficiency check identifies gaps, TeamLoop runs all suggested queries in parallel through the hybrid search pipeline, respecting the remaining time budget. Results are deduplicated by entity ID to avoid returning the same entity twice.
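The parallel fan-out with a time budget can be sketched with `asyncio`. This is an illustration under stated assumptions, not TeamLoop's implementation: `search` stands in for the hybrid search pipeline and is assumed to return `(entity_id, score)` pairs, and deduplication here is applied against Round 1 IDs as well as within Round 2.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical search signature: query -> list of (entity_id, score) pairs.
SearchFn = Callable[[str], Awaitable[list[tuple[str, float]]]]


async def run_round2(
    queries: list[str],
    search: SearchFn,
    seen_ids: set[str],
    remaining_budget_s: float,
) -> list[tuple[str, float]]:
    """Run sub-queries concurrently and keep entities not already seen."""
    if not queries or remaining_budget_s <= 0:
        return []
    tasks = [asyncio.ensure_future(search(q)) for q in queries]
    done, pending = await asyncio.wait(tasks, timeout=remaining_budget_s)
    for task in pending:
        task.cancel()  # out of budget: drop unfinished sub-queries
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)

    fresh: list[tuple[str, float]] = []
    seen = set(seen_ids)
    for task in tasks:  # iterate in query order for stable output
        if task not in done or task.exception() is not None:
            continue  # a failed or unfinished sub-query is skipped
        for entity_id, score in task.result():
            if entity_id not in seen:
                seen.add(entity_id)
                fresh.append((entity_id, score))
    return fresh
```

Because the sub-queries run concurrently, the wall-clock cost of this step is roughly that of the slowest sub-query, capped by the remaining budget.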

Cross-round fusion

Round 1 and Round 2 results are combined using RRF fusion, then reranked together to produce the final result set. This ensures the best results from both rounds surface at the top.
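For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the general technique: each entity earns `1 / (k + rank)` from every list it appears in, and the sums are sorted. The constant `k = 60` is the value commonly used in the literature and is an assumption here; this page does not state which constant TeamLoop uses, and the reranking step that follows fusion is not shown.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60, top_k: int = 5) -> list[str]:
    """Fuse ranked lists of entity IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, entity_id in enumerate(ranking, start=1):
            scores[entity_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by entity ID for determinism.
    ordered = sorted(scores, key=lambda e: (-scores[e], e))
    return ordered[:top_k]


round1 = ["jwt-1", "jwt-2"]
round2 = ["oauth-1", "jwt-1", "session-1"]
fused = rrf_fuse([round1, round2])
# fused == ["jwt-1", "oauth-1", "jwt-2", "session-1"]
```

An entity that ranks well in both rounds (`jwt-1` above) accumulates score from each list, which is why it rises to the top.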

Graceful Degradation

Agentic retrieval is designed to never block or degrade the query experience:

Scenario	Behavior
LLM client not configured	Agentic layer disabled, returns hybrid search results directly
Sufficiency check times out (>1.5s)	Returns Round 1 results unchanged
Sufficiency check errors	Logs warning, returns Round 1 results
Round 2 sub-query fails	Skips that query, continues with remaining
Time budget exceeded (>2.5s)	Stops issuing Round 2 queries, fuses what’s available
Results pass threshold gate	Skips LLM call entirely, returns Round 1 results
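The first three rows of the table amount to one rule: any problem with the sufficiency check falls back to Round 1. A hedged sketch of that rule follows; `check` is a hypothetical async callable standing in for the LLM client, and a `None` return signals that the caller should return Round 1 results unchanged.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Optional

logger = logging.getLogger("agentic_retrieval")

# Hypothetical LLM call: (query, round1_results) -> assessment dict.
CheckFn = Callable[[str, list[Any]], Awaitable[dict]]


async def check_or_fall_back(
    query: str,
    round1: list[Any],
    check: Optional[CheckFn],
    timeout_s: float = 1.5,
) -> Optional[dict]:
    """Return the assessment, or None when Round 1 should be returned as-is."""
    if check is None:
        return None  # LLM client not configured: agentic layer disabled
    try:
        return await asyncio.wait_for(check(query, round1), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # timed out: Round 1 results go back unchanged
    except Exception as exc:
        logger.warning("sufficiency check failed: %s", exc)
        return None
```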

MCP Tool

Agentic retrieval is controlled via the agentic parameter on teamloop_query.

teamloop_query

Parameter	Type	Default	Description
`query`	string	required	The search query
`agentic`	boolean	`true`	Enable agentic retrieval with sufficiency checking
`sources`	string	all	Comma-separated list of sources to search
`mode`	string	`current`	Query mode: `current`, `as_of`
`as_of`	string	—	Date in YYYY-MM-DD format for temporal queries
`retrieval`	string	`hybrid`	Retrieval strategy: `hybrid` or `standard`

Default (agentic enabled):

Tool: teamloop_query
Input: {
  "query": "What decisions have been made about authentication?"
}

Disable agentic retrieval:

Tool: teamloop_query
Input: {
  "query": "PROJ-1234",
  "agentic": false
}

Disabling agentic retrieval is useful for precise queries where you know exactly what you’re looking for and want the fastest possible response.

Retrieval metadata

When agentic retrieval is active, the response includes a ## Retrieval Metadata section:

## Retrieval Metadata
- agentic_enabled: true
- rounds: 2
- sufficiency_checked: true
- sufficient: false
- round1_count: 2
- round2_count: 4
- total_latency_ms: 1850
- reformulated_queries: [auth token management, OAuth provider decisions, session storage architecture]

Field	Description
`agentic_enabled`	Whether agentic retrieval was active
`rounds`	Number of retrieval rounds (1 or 2)
`sufficiency_checked`	Whether the LLM was consulted
`sufficient`	Whether Round 1 was deemed sufficient
`round1_count`	Number of results from Round 1
`round2_count`	Number of new results from Round 2
`total_latency_ms`	End-to-end latency including all rounds
`reformulated_queries`	Queries generated by the sufficiency check
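If you consume the tool response as text, the metadata section can be pulled into a dictionary with a small parser. The sketch below assumes the section is laid out exactly as in the example above (a `## Retrieval Metadata` heading followed by `- key: value` bullets); the function names are illustrative.

```python
def _coerce(value: str):
    """Convert a metadata value to bool, list, or int where it clearly is one."""
    if value in ("true", "false"):
        return value == "true"
    if value.startswith("[") and value.endswith("]"):
        return [item.strip() for item in value[1:-1].split(",") if item.strip()]
    try:
        return int(value)
    except ValueError:
        return value


def parse_retrieval_metadata(response_text: str) -> dict:
    """Extract the '## Retrieval Metadata' bullet list into a dict."""
    metadata: dict = {}
    in_section = False
    for line in response_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Enter the metadata section; any other heading ends it.
            in_section = stripped == "## Retrieval Metadata"
            continue
        if not in_section or not stripped.startswith("- ") or ":" not in stripped:
            continue
        key, _, value = stripped[2:].partition(":")
        metadata[key.strip()] = _coerce(value.strip())
    return metadata
```

Note that this splits `reformulated_queries` on commas, so a reformulated query that itself contains a comma would be split in two.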

Dashboard

The dashboard query endpoint returns agentic metadata in the retrieval_metadata field of the JSON response. This includes all fields from the table above, allowing the UI to display retrieval diagnostics.

Configuration Defaults

Parameter	Default	Description
Min entity threshold	3	Min results to skip sufficiency check
Confidence threshold	0.4	Min top score to skip sufficiency check
Max reformulated queries	3	Max Round 2 sub-queries
Sufficiency timeout	1500ms	Max time for the LLM sufficiency check
Round 2 limit	5	Max results per reformulated query
Final top-k	5	Final number of results returned
Time budget	2500ms	Total time budget for all rounds
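To reason about how these defaults interact, it can help to hold them in one place. The dataclass below is only an illustration: the field names are hypothetical and are not TeamLoop configuration keys; the values are the defaults from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgenticConfig:
    """Defaults from the table above; field names are illustrative."""
    min_entity_threshold: int = 3
    confidence_threshold: float = 0.4
    max_reformulated_queries: int = 3
    sufficiency_timeout_ms: int = 1500
    round2_limit: int = 5
    final_top_k: int = 5
    time_budget_ms: int = 2500

    def remaining_budget_ms(self, elapsed_ms: int) -> int:
        """Time left for Round 2 after `elapsed_ms` has been spent."""
        return max(0, self.time_budget_ms - elapsed_ms)
```

With these defaults, a sufficiency check that uses its full 1500ms leaves at most 1000ms of the 2500ms budget for Round 1 and Round 2 combined.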

Tips

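Agentic retrieval adds modest latency — When Round 2 runs, the added time is the sufficiency check (capped at 1.5s) plus the slowest Round 2 sub-query rather than the sum of all sub-queries, because they run in parallel; the whole flow is subject to the 2.5s time budget. For latency-sensitive use cases, disable it with "agentic": false.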
LLM required — The sufficiency check requires an LLM client (Bedrock Claude). Without one, agentic retrieval is automatically disabled and hybrid search runs directly.
Works with temporal modes — Agentic retrieval works with as_of mode, so temporal queries benefit from the same gap-filling behavior.
Check the metadata — The retrieval metadata tells you exactly what happened. If rounds: 1 and sufficiency_checked: false, the threshold gate determined results were good enough without consulting the LLM.
Broad queries benefit most — Queries like “what’s happening with the infrastructure migration?” benefit significantly from agentic retrieval. Precise queries like “PROJ-1234” typically pass the threshold gate and skip the LLM entirely.

Next Steps

Hybrid Search — The underlying search pipeline that agentic retrieval builds on
Query Playground — Try temporal query modes with agentic retrieval
Agent Memory — Natural language remember/recall powered by the retrieval stack