The Big Picture
Making tool search part of an agent's thinking lets agents pick the right tool far more often; evolutionary refinement boosts results but only when the agent is already a capable semantic thinker.
Key Findings
FitText turns retrieval into an active, iterative part of an agent's reasoning: the agent generates short, natural-language probes describing what it needs, uses retrieval feedback to refine those probes, and explores randomized variants. Adding an evolutionary-style selection step (memetic retrieval) keeps the best probes and avoids repeating wasted searches. On very large API sets, this approach moves the correct tool much higher in the search results and raises task success rates substantially. If the base agent model is weak, the evolutionary search can amplify noise instead of refining useful probes, so base model ability matters.
By the Numbers
1. Average retrieval rank improved from 8.81 to 2.78 on ToolRet (43,000 tools).
2. Achieved a 0.73 average pass rate on StableToolBench (16,464 APIs), a 24 percentage-point absolute gain over static retrieval.
3. Method evaluated at scale on libraries with 43,000 tools and 16,464 APIs, showing gains hold on very large ecosystems.
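To put these figures in context, the short calculation below derives two numbers the summary does not state directly: the static-retrieval pass rate implied by the 24-point gain, and the relative reduction in average rank. Both are back-of-the-envelope values computed from the rounded figures above, so treat them as approximate; the variable names are illustrative only.

```python
# Figures quoted in the summary above.
rank_before, rank_after = 8.81, 2.78   # average retrieval rank on ToolRet
pass_rate, gain_points = 0.73, 0.24    # StableToolBench pass rate and absolute gain

# Derived values (not reported in the summary; subject to rounding).
implied_static_pass_rate = pass_rate - gain_points
rank_reduction = (rank_before - rank_after) / rank_before

print(f"implied static-retrieval pass rate: {implied_static_pass_rate:.2f}")
print(f"relative reduction in average rank: {rank_reduction:.1%}")
```

Under these assumptions the implied static baseline is roughly 0.49, and the average rank falls by about 68%.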
Why It Matters
Engineers building agent platforms and tool marketplaces can get more reliable tool selection by making retrieval dynamic rather than fixing it to the initial query. Platform and product leads responsible for agent reliability or delegation should consider adding iterative probe generation and selection to improve success rates, while verifying that their base model is capable of semantic reasoning.
Yes, But...
FitText needs a reasonably capable base model to act as a semantic operator; weaker models may make the evolutionary search worse. The iterative and stochastic process increases compute and retrieval traffic compared to a single static query. Results were shown on large API catalogs and may vary for smaller, highly specialized toolsets or different retrieval backends.
Deep Dive
FitText reframes tool discovery as an evolving conversation inside the agent: instead of issuing one retrieval query and acting on the top hit, the agent generates short, human-style descriptions (pseudo-tool descriptions) that capture what it thinks it needs at that moment. Each generated probe is used to retrieve candidate tools, and the retrieval results are fed back to the agent to refine the next probe. To avoid getting stuck on small variations, FitText also produces diverse probe variants via randomized generation and then applies memetic retrieval, an evolutionary selection process that favors probes which lead to useful, non-redundant candidates. A lightweight tool memory records explored areas to prevent wasted repetition.
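The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tool catalog is a made-up toy, `similarity` uses simple word overlap in place of a real retriever, `mutate` stands in for the language model rewriting a probe, and the `fitness` weighting is arbitrary. All function and parameter names are hypothetical.

```python
import random

# Toy catalog standing in for a large tool library (names and descriptions are made up).
CATALOG = {
    "weather_now": "get current weather conditions for a city",
    "weather_forecast": "get multi day weather forecast for a city",
    "geo_lookup": "convert a city name to latitude and longitude",
    "flight_search": "search flights between two airports by date",
    "currency_convert": "convert an amount between two currencies",
}

def similarity(a, b):
    """Jaccard word overlap; an assumed stand-in for a real retriever's score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def retrieve(probe, k=3):
    """Return the k tool names whose descriptions best match the probe."""
    ranked = sorted(CATALOG, key=lambda n: similarity(probe, CATALOG[n]), reverse=True)
    return ranked[:k]

def mutate(probe, hits, rng):
    """Hypothetical stand-in for the agent refining a probe from retrieval feedback.

    A real system would ask the language model; here we either borrow one word
    from a retrieved description or drop one word at random.
    """
    words = probe.split()
    if hits and rng.random() < 0.7:
        words.append(rng.choice(CATALOG[rng.choice(hits)].split()))
    elif len(words) > 2:
        words.pop(rng.randrange(len(words)))
    return " ".join(words)

def fitness(task, hits, memory):
    """Favor probes whose hits look useful for the task and are not yet explored."""
    if not hits:
        return 0.0
    usefulness = max(similarity(task, CATALOG[h]) for h in hits)
    novelty = sum(h not in memory for h in hits) / len(hits)
    return usefulness + 0.25 * novelty  # 0.25 is an arbitrary illustrative weight

def search_tools(task, generations=4, population=6, keep=2, seed=0):
    rng = random.Random(seed)
    memory = set()   # lightweight record of tools already surfaced
    probes = [task]  # the first probe is the task itself
    for _ in range(generations):
        scored = []
        for probe in probes:
            hits = retrieve(probe)
            scored.append((fitness(task, hits, memory), probe, hits))
        scored.sort(key=lambda item: item[0], reverse=True)
        survivors = scored[:keep]          # selection step: keep the best probes
        for _, _, hits in scored:
            memory.update(hits)            # remember what has been explored
        probes = [probe for _, probe, _ in survivors]
        while len(probes) < population:    # refill with randomized variants
            _, parent, hits = rng.choice(survivors)
            probes.append(mutate(parent, hits, rng))
    # Simplification: pick the explored tool closest to the task; a real agent would decide.
    best = max(memory, key=lambda n: similarity(task, CATALOG[n]))
    return best, memory

best, memory = search_tools("what is the current weather in a city")
print(best, sorted(memory))
```

The novelty term in `fitness` is one simple way to express the preference for non-redundant candidates: a probe that only returns tools already in memory scores lower than one that surfaces something new. In practice the scoring, the mutation step, and the final choice would all be driven by the agent model rather than by word overlap.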
Credibility Assessment:
The authors have very low h-indexes (1–3) and list no institutional affiliations, and the work is available only as an arXiv preprint with no citations, so the evidence for its credibility is still emerging and limited.