Agentic Retrieval

Standard retrieval runs a single pass and returns whatever it finds. If your query is ambiguous or broad, you get partial results and don’t know what you’re missing. Agentic retrieval adds an LLM-driven sufficiency loop on top of hybrid search that evaluates whether results adequately cover your query — and if not, generates targeted follow-up queries to fill the gaps.

| Single-pass retrieval | Agentic retrieval |
| --- | --- |
| “Show me the auth decisions” → returns 2 results about JWT | Same query → Round 1 finds JWT results, the LLM identifies missing OAuth and session management coverage, Round 2 fills the gaps |
| You don’t know what’s missing | Sufficiency check explicitly identifies gaps |
| Works well for precise queries | Works well for both precise and broad queries |

When you run a query with agentic retrieval enabled (the default), TeamLoop executes the following flow:

The query runs through the standard hybrid search pipeline — vector search, BM25 text search, RRF fusion, and reranking — returning the top 5 results.

Before invoking the LLM, a fast check determines whether a sufficiency evaluation is needed:

  • If results >= 3 entities AND the top score >= 0.4 → results are likely sufficient, skip the LLM call
  • Otherwise → proceed to sufficiency check

This avoids unnecessary LLM latency for queries that already have strong results.
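The gate can be sketched in a few lines of Python. This is a minimal illustration rather than TeamLoop's actual code: the `Result` type and the function name are hypothetical, and only the two thresholds (3 entities, top score 0.4) come from this page.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical shape of one Round 1 hit."""
    entity_id: str
    score: float


# Thresholds as documented on this page.
MIN_ENTITIES = 3
CONFIDENCE_THRESHOLD = 0.4


def needs_sufficiency_check(results: list[Result]) -> bool:
    """Return True when the LLM sufficiency check should run."""
    if not results:
        return True  # nothing found, so the gate cannot pass
    top_score = max(r.score for r in results)
    passes_gate = len(results) >= MIN_ENTITIES and top_score >= CONFIDENCE_THRESHOLD
    return not passes_gate
```

Both conditions must hold to skip the LLM: many weak results or a single strong one still go to the sufficiency check.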

An LLM evaluates the query against the Round 1 results and returns a structured assessment:

  • sufficient — whether the results adequately cover the query
  • reasoning — explanation of the assessment
  • missing — aspects of the query not covered
  • suggested_queries — up to 3 reformulated queries to fill gaps

The sufficiency check runs with a 1.5s timeout. If the LLM is unavailable or times out, Round 1 results are returned unchanged.
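One way to picture the structured assessment and the timeout fallback is the sketch below. The four field names come from the list above; the `Judge` callable, the function name, and the use of `asyncio.wait_for` are assumptions made for illustration, not a description of TeamLoop internals.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional

SUFFICIENCY_TIMEOUT_S = 1.5   # documented timeout
MAX_SUGGESTED_QUERIES = 3     # documented cap on reformulated queries


@dataclass
class Assessment:
    sufficient: bool
    reasoning: str
    missing: list[str] = field(default_factory=list)
    suggested_queries: list[str] = field(default_factory=list)


# Hypothetical signature: a coroutine that asks an LLM to judge the results.
Judge = Callable[[str, list[str]], Awaitable[Assessment]]


async def check_sufficiency(
    query: str,
    round1: list[str],
    judge: Judge,
    timeout_s: float = SUFFICIENCY_TIMEOUT_S,
) -> Optional[Assessment]:
    """Return the assessment, or None if the LLM timed out or failed."""
    try:
        assessment = await asyncio.wait_for(judge(query, round1), timeout=timeout_s)
    except Exception:
        # Timeout or LLM error: the caller returns Round 1 results unchanged.
        return None
    assessment.suggested_queries = assessment.suggested_queries[:MAX_SUGGESTED_QUERIES]
    return assessment
```

A `None` return signals the caller to hand back Round 1 as is, which matches the fallback described above.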

If the sufficiency check identifies gaps, TeamLoop runs all suggested queries in parallel through the hybrid search pipeline, respecting the remaining time budget. Results are deduplicated by entity ID to avoid returning the same entity twice.
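The following sketch shows one way to run follow-up queries concurrently, skip failures, and deduplicate by entity ID. The `Search` signature and `run_round2` name are hypothetical. Applying the remaining budget as a per-query timeout, and deduplicating against Round 1 IDs as well as within Round 2, are assumptions; the page only says that the remaining budget is respected and that the same entity is not returned twice.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical search signature: query -> list of (entity_id, score) pairs.
Search = Callable[[str], Awaitable[list[tuple[str, float]]]]


async def run_round2(
    queries: list[str],
    search: Search,
    seen_ids: set[str],
    deadline: float,
) -> list[tuple[str, float]]:
    """Run follow-up queries concurrently; keep only entities not seen before."""
    remaining = deadline - time.monotonic()
    if remaining <= 0 or not queries:
        return []  # budget already spent: issue no Round 2 queries

    async def bounded(q: str) -> list[tuple[str, float]]:
        return await asyncio.wait_for(search(q), timeout=remaining)

    # return_exceptions=True keeps one failing sub-query from cancelling the rest.
    outcomes = await asyncio.gather(
        *(bounded(q) for q in queries), return_exceptions=True
    )

    seen = set(seen_ids)
    fresh: list[tuple[str, float]] = []
    for outcome in outcomes:
        if isinstance(outcome, BaseException):
            continue  # failed or timed-out sub-query: skip it
        for entity_id, score in outcome:
            if entity_id not in seen:
                seen.add(entity_id)
                fresh.append((entity_id, score))
    return fresh
```

Passing the Round 1 entity IDs as `seen_ids` means the returned list contains only new entities, which is what `round2_count` reports in the retrieval metadata.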

Round 1 and Round 2 results are combined using RRF fusion, then reranked together to produce the final result set. This ensures the best results from both rounds surface at the top.
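For readers unfamiliar with reciprocal rank fusion, here is a generic sketch. The constant `k = 60` is a commonly used default and an assumption here, since this page does not state the value TeamLoop uses; the reranking step that follows fusion is not shown.

```python
from collections import defaultdict

RRF_K = 60  # assumed constant; not specified on this page


def rrf_fuse(rankings: list[list[str]], k: int = RRF_K, top_k: int = 5) -> list[str]:
    """Fuse ranked lists of entity IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, entity_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every entity it contains.
            scores[entity_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by entity ID for determinism.
    ordered = sorted(scores, key=lambda e: (-scores[e], e))
    return ordered[:top_k]
```

An entity that ranks well in more than one list accumulates score from each, so it rises above entities that appear in only one.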

Agentic retrieval is designed to never block or degrade the query experience:

| Scenario | Behavior |
| --- | --- |
| LLM client not configured | Agentic layer disabled, returns hybrid search results directly |
| Sufficiency check times out (>1.5s) | Returns Round 1 results unchanged |
| Sufficiency check errors | Logs warning, returns Round 1 results |
| Round 2 sub-query fails | Skips that query, continues with remaining |
| Time budget exceeded (>2.5s) | Stops issuing Round 2 queries, fuses what’s available |
| Results pass threshold gate | Skips LLM call entirely, returns Round 1 results |

Agentic retrieval is controlled via the agentic parameter on teamloop_query.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | required | The search query |
| agentic | boolean | true | Enable agentic retrieval with sufficiency checking |
| sources | string | all | Comma-separated list of sources to search |
| mode | string | current | Query mode: current or as_of |
| as_of | string | | Date in YYYY-MM-DD format for temporal queries |
| retrieval | string | hybrid | Retrieval strategy: hybrid or standard |

Default (agentic enabled):

Tool: teamloop_query
Input: {
  "query": "What decisions have been made about authentication?"
}

Disable agentic retrieval:

Tool: teamloop_query
Input: {
  "query": "PROJ-1234",
  "agentic": false
}

Disabling agentic retrieval is useful for precise queries where you know exactly what you’re looking for and want the fastest possible response.
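If you construct tool inputs programmatically, a small helper can validate them against the parameter table. The sketch below is illustrative only: `build_query_input` is a hypothetical name, and it assumes that omitting a parameter is equivalent to sending its documented default.

```python
from datetime import datetime
from typing import Optional


def build_query_input(
    query: str,
    agentic: bool = True,
    sources: Optional[str] = None,
    mode: str = "current",
    as_of: Optional[str] = None,
    retrieval: str = "hybrid",
) -> dict:
    """Build a teamloop_query input, sending only non-default values."""
    if not query.strip():
        raise ValueError("query is required")
    if mode not in ("current", "as_of"):
        raise ValueError(f"unknown mode: {mode!r}")
    if retrieval not in ("hybrid", "standard"):
        raise ValueError(f"unknown retrieval strategy: {retrieval!r}")
    if as_of is not None:
        # Raises ValueError if the string is not a date in YYYY-MM-DD form.
        datetime.strptime(as_of, "%Y-%m-%d")

    payload: dict = {"query": query}
    if not agentic:
        payload["agentic"] = False
    if sources is not None:
        payload["sources"] = sources
    if mode != "current":
        payload["mode"] = mode
    if as_of is not None:
        payload["as_of"] = as_of
    if retrieval != "hybrid":
        payload["retrieval"] = retrieval
    return payload
```

Calling `build_query_input("PROJ-1234", agentic=False)` produces the same input as the second example above.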

When agentic retrieval is active, the response includes a ## Retrieval Metadata section:

## Retrieval Metadata
- agentic_enabled: true
- rounds: 2
- sufficiency_checked: true
- sufficient: false
- round1_count: 2
- round2_count: 4
- total_latency_ms: 1850
- reformulated_queries: [auth token management, OAuth provider decisions, session storage architecture]

| Field | Description |
| --- | --- |
| agentic_enabled | Whether agentic retrieval was active |
| rounds | Number of retrieval rounds (1 or 2) |
| sufficiency_checked | Whether the LLM was consulted |
| sufficient | Whether Round 1 was deemed sufficient |
| round1_count | Number of results from Round 1 |
| round2_count | Number of new results from Round 2 |
| total_latency_ms | End-to-end latency including all rounds |
| reformulated_queries | Queries generated by the sufficiency check |

The dashboard query endpoint returns agentic metadata in the retrieval_metadata field of the JSON response. This includes all fields from the table above, allowing the UI to display retrieval diagnostics.
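Clients that receive the text response rather than the dashboard JSON can pull the same fields out of the ## Retrieval Metadata section. The parser below is a sketch that assumes the section is formatted exactly like the sample on this page (one `- key: value` bullet per field); both function names are hypothetical.

```python
def parse_retrieval_metadata(response_text: str) -> dict:
    """Extract the '## Retrieval Metadata' bullet list into a dict."""
    meta: dict = {}
    in_section = False
    for line in response_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Track whether we are inside the metadata section.
            in_section = stripped == "## Retrieval Metadata"
            continue
        if not in_section or not stripped.startswith("- ") or ":" not in stripped:
            continue
        key, _, raw = stripped[2:].partition(":")
        raw = raw.strip()
        if raw in ("true", "false"):
            value = raw == "true"
        elif raw.isdigit():
            value = int(raw)
        elif raw.startswith("[") and raw.endswith("]"):
            value = [item.strip() for item in raw[1:-1].split(",") if item.strip()]
        else:
            value = raw
        meta[key.strip()] = value
    return meta


def gate_skipped_llm(meta: dict) -> bool:
    """True when Round 1 was returned without consulting the LLM."""
    return meta.get("rounds") == 1 and meta.get("sufficiency_checked") is False
```

`gate_skipped_llm` encodes the `rounds: 1` plus `sufficiency_checked: false` combination, which indicates that the threshold gate accepted the Round 1 results.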

| Parameter | Default | Description |
| --- | --- | --- |
| Min entity threshold | 3 | Min results to skip sufficiency check |
| Confidence threshold | 0.4 | Min top score to skip sufficiency check |
| Max reformulated queries | 3 | Max Round 2 sub-queries |
| Sufficiency timeout | 1500ms | Max time for the LLM sufficiency check |
| Round 2 limit | 5 | Max results per reformulated query |
| Final top-k | 5 | Final number of results returned |
| Time budget | 2500ms | Total time budget for all rounds |

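To see how these defaults interact, it can help to model them as a small config object. The class and field names below are hypothetical (the page does not say how the settings are exposed); only the default values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgenticConfig:
    """Hypothetical container for the documented defaults."""
    min_entity_threshold: int = 3
    confidence_threshold: float = 0.4
    max_reformulated_queries: int = 3
    sufficiency_timeout_ms: int = 1500
    round2_limit: int = 5
    final_top_k: int = 5
    time_budget_ms: int = 2500

    def round2_budget_ms(self, elapsed_ms: int) -> int:
        """Time left for Round 2 once Round 1 and the LLM check have run."""
        return max(0, self.time_budget_ms - elapsed_ms)

    def max_round2_candidates(self) -> int:
        """Upper bound on Round 2 results before deduplication and fusion."""
        return self.max_reformulated_queries * self.round2_limit
```

With the defaults, Round 2 can contribute at most 15 candidates, which fusion and reranking then reduce to the final top 5. If Round 1 and a sufficiency check that runs to its full timeout together take 2000ms, 500ms of the budget remains for Round 2.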
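  • Agentic retrieval adds modest latency. When Round 2 runs, the added cost is the sufficiency check (capped at 1.5s) plus the slowest Round 2 sub-query, not the sum of all sub-queries, because they run in parallel. No new Round 2 queries are issued once the 2.5s time budget is spent. For latency-sensitive use cases, disable it with "agentic": false.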
  • LLM required — The sufficiency check requires an LLM client (Bedrock Claude). Without one, agentic retrieval is automatically disabled and hybrid search runs directly.
  • Works with temporal modes — Agentic retrieval works with as_of mode, so temporal queries benefit from the same gap-filling behavior.
  • Check the metadata — The retrieval metadata tells you exactly what happened. If rounds: 1 and sufficiency_checked: false, the threshold gate determined results were good enough without consulting the LLM.
  • Broad queries benefit most — Queries like “what’s happening with the infrastructure migration?” benefit significantly from agentic retrieval. Precise queries like “PROJ-1234” typically pass the threshold gate and skip the LLM entirely.
  • Hybrid Search — The underlying search pipeline that agentic retrieval builds on
  • Query Playground — Try temporal query modes with agentic retrieval
  • Agent Memory — Natural language remember/recall powered by the retrieval stack