
The goal is to help developers build more accurate and context-aware AI systems while reducing the complexity of integrating web search, retrieval and grounding capabilities into enterprise applications, the company wrote in a blog post.
The APIs already underpin grounding for Microsoft Copilot and ChatGPT and, unlike traditional search APIs, are designed to retrieve highly relevant information while minimizing token consumption, helping reduce both inference costs and response latency, Microsoft said.
Reducing the cost and complexity of web grounding
That focus on reducing inference costs and response latency in delivering Web IQ’s search capabilities will be valuable for CIOs and developers, said Phil Fersht, chief analyst at HFS Research.
“Developers have typically stitched this together themselves using search APIs, web scraping, retrieval-augmented generation, vector databases, custom ranking logic, crawling tools and separate orchestration layers. That works, but it is messy, brittle and expensive to maintain,” he said.

