
How to Add an NLWeb Endpoint to Your Existing Website

NLWeb turns your Schema.org markup into a conversational endpoint and MCP server in a few hours, but one config decision determines whether it scales.

AnIntent Editorial

June 3, 2026




By the end of this walkthrough, your existing site will expose two new routes, /ask and /mcp, that answer natural-language questions over your own content and let AI agents query it directly. If you want to implement the NLWeb protocol on a production site without rewriting your CMS, this is the path that works with the Python reference build as of June 2026.

The shortcut: clone the Microsoft repo, point it at the RSS or Schema.org feed you already publish, pick a vector store, wire in an LLM key, and proxy the resulting service behind your existing domain. The interesting part is what happens after step one.

What You're Actually Installing

NLWeb is not a chatbot framework. According to Microsoft's announcement post, the project was conceived and built by R.V. Guha, the same person who created RSS, RDF, and Schema.org, and it treats your existing structured data as the index rather than asking you to build a new one. The same Microsoft post confirms that every NLWeb instance is also a Model Context Protocol (MCP) server, making website content discoverable and accessible to AI agents in the MCP ecosystem.

That dual nature is the whole point. The same deployment serves a human typing a question into a search box and an autonomous agent calling your site as a tool. As itnext's deep dive lays out, the two primary endpoints are /ask for human and agent natural-language queries and /mcp for agent-facing tool exposure as an MCP server.

The stack is deliberately boring. Microsoft describes NLWeb as leveraging semi-structured formats already published by websites, Schema.org, RSS, and JSON-LD, combined with LLM-powered tools to create natural language interfaces usable by both humans and AI agents. If your site already emits product, article, recipe, or event markup for Google, you have most of what NLWeb needs.

The Five Steps That Get You to a Live /ask Route

The reference quickstart is fixed and short. A walkthrough on dev.to documents the flow: clone the repo, set up a Python virtual environment, install the requirements with pip, copy .env.template to .env and add API keys, run db_load against an RSS feed URL, then launch app-file.py to serve on localhost:8000.

The condensed checklist:

1. Run git clone https://github.com/microsoft/NLWeb and create a Python virtual environment inside it.
2. Run pip install -r requirements.txt to pull the agent, retrieval, and server modules.
3. Copy .env.template to .env and add at least one LLM provider key and one vector store credential set.
4. Run python -m data_loading.db_load <your-feed-url> to embed and index your existing RSS or Schema.org feed.
5. Run python app-file.py and hit http://localhost:8000/ask?query=... to confirm the agent answers from your content.
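The last checklist step can be scripted instead of pasted into a browser. The sketch below is a minimal example that assumes the service is listening on localhost:8000 and accepts the question in the query parameter, as in the URL above. The helper names build_ask_url and ask are made up for illustration, and the body is returned as raw text because the response format depends on how your instance is configured.

```python
import urllib.parse
import urllib.request


def build_ask_url(base_url: str, query: str) -> str:
    """Return the /ask URL for one natural-language question."""
    params = urllib.parse.urlencode({"query": query})
    return f"{base_url.rstrip('/')}/ask?{params}"


def ask(base_url: str, query: str, timeout: float = 60.0) -> str:
    """Send one question to the service and return the raw response body.

    No assumption is made about the response shape (plain JSON or a
    stream), so the caller decides how to parse it.
    """
    url = build_ask_url(base_url, query)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling ask("http://localhost:8000", "red running shoes") after the service starts should return something drawn from your own feed. An error or an empty result at this point usually means the load step did not index anything.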

That gets you a working service on a laptop. It does not get you a production endpoint, and the gap between those two states is where most of the real work lives.

Pick the Vector Store Before You Pick the Model

The Python service is genuinely portable, but the storage decision is sticky. According to the Microsoft NLWeb GitHub repo, supported vector stores include Qdrant, Snowflake, Milvus, Azure AI Search, Elasticsearch, Postgres, and Cloudflare AutoRAG. Pick the one that already lives in your infrastructure rather than the one with the prettiest benchmark chart, because re-embedding a large catalog is the most expensive operation in this entire system.

For teams that do not want to operate retrieval at all, there is a managed path. TechRadar reports that Cloudflare added native NLWeb support via its AutoRAG infrastructure in early 2026, offering a managed deployment path for teams that don't want to handle infrastructure themselves. That gives the protocol a second infrastructure-level backer beyond Azure, which matters for anyone allergic to single-vendor commitments.

LLM choice is the easier knob. The GitHub repo lists supported LLM providers including OpenAI, DeepSeek, Gemini, Anthropic, Inception, and HuggingFace, and Microsoft describes NLWeb as technology-agnostic across operating systems, models, and vector databases, with no vendor lock-in. Swap providers by editing .env, not by rewriting code.
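Because both the provider and the store are selected through .env, a quick pre-flight check before loading data can save a failed run. The following is a rough sketch only: the variable names in LLM_KEYS and VECTOR_STORE_KEYS are hypothetical placeholders, so replace them with the names that actually appear in your .env.template.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def has_any(env: dict[str, str], names: list[str]) -> bool:
    """True if at least one named variable is set to a non-empty value."""
    return any(env.get(name) for name in names)


# Hypothetical variable names; use the ones from your .env.template.
LLM_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]
VECTOR_STORE_KEYS = ["QDRANT_URL", "POSTGRES_CONNECTION_STRING"]


def preflight(env_text: str) -> list[str]:
    """Return a list of problems; an empty list means both halves are set."""
    env = parse_env(env_text)
    problems = []
    if not has_any(env, LLM_KEYS):
        problems.append("no LLM provider key set")
    if not has_any(env, VECTOR_STORE_KEYS):
        problems.append("no vector store credential set")
    return problems
```

Feed it the contents of your .env file and refuse to start the load job while the returned list is non-empty.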

The Cost Trap Nobody Puts in the Quickstart

Here is the detail that does not appear on the project's landing page. As dev.to documents, a single NLWeb query can trigger 50+ targeted LLM calls for sub-tasks including query decontextualization, relevancy checking, memory detection, and a fast-track path for simple queries. Per-query LLM costs can be substantially higher than a naive RAG chatbot.

Fifty model calls per user question is not a bug. It is how the agent disambiguates intent, scores candidates, and avoids hallucinations on structured data. It is also how your monthly bill balloons if you route every call to a frontier model.

The mitigation is built in. The Microsoft NLWeb repo documents architecture modules including AskAgent for the core query agent, AgentFinder for agent discovery and routing, DataFinder for natural language to SQL against HubSpot, Dynamics 365, and Jira, ModelRouter for cost-optimized LLM routing, and NLWebScorer for neural result ranking. Configure ModelRouter to send decontextualization and relevancy checks to a small, cheap model and reserve the expensive one for the final synthesis. Skipping this step is the single most common reason teams abandon NLWeb after their first invoice.
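To see why tiering matters, it helps to run the arithmetic. The sketch below is back-of-the-envelope only: the 50-call fan-out comes from the figure cited above, while the token counts and per-million-token prices are invented placeholders, not real provider pricing. It does not model ModelRouter itself, only the effect of splitting calls between a cheap and an expensive model.

```python
def call_cost(calls: int, tokens_per_call: int, price_per_million: float) -> float:
    """Cost in dollars of a batch of calls at one per-million-token price."""
    return calls * tokens_per_call * price_per_million / 1_000_000


def tiered_cost(
    total_calls: int,
    synthesis_calls: int,
    tokens_per_call: int,
    cheap_price: float,
    frontier_price: float,
) -> float:
    """Cost when only the synthesis calls go to the frontier model."""
    subtask_calls = total_calls - synthesis_calls
    return (
        call_cost(subtask_calls, tokens_per_call, cheap_price)
        + call_cost(synthesis_calls, tokens_per_call, frontier_price)
    )


# Placeholder inputs: 50 calls, 1,000 tokens each, $10 vs $0.50 per million tokens.
all_frontier = call_cost(50, 1_000, 10.0)
tiered = tiered_cost(50, 1, 1_000, cheap_price=0.5, frontier_price=10.0)
```

With those placeholder numbers, all_frontier works out to about $0.50 per query and tiered to about $0.035. The absolute figures are meaningless, but the ratio shows why sub-task routing dominates the bill.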

For context on why this kind of per-token accounting matters across modern AI tooling, the same arithmetic applies to coding assistants. See our breakdown of GitHub Copilot AI Credits and token billing for a parallel example.

Make Schema.org Carry Its Weight

The quality of the answers correlates directly with the quality of the markup. TechRadar's NLWeb explainer notes that NLWeb performs best with content organized as lists of items such as products, events, recipes, and reviews. Sites with poor or missing Schema.org annotations will get noticeably weaker results. This is what Schema.org integration really means in NLWeb: the protocol does not invent structure, it amplifies whatever you already publish.

Before loading data, audit one representative URL with Google's Rich Results Test. If your ItemList, Product, Recipe, or Article markup is missing required fields, fix them in the source templates. The fix is permanent, benefits search ranking independently, and makes every downstream NLWeb query cheaper and more accurate.
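The same audit can be roughed out locally before you reach for an external tool. The sketch below pulls JSON-LD blocks out of a page's HTML and flags missing fields. The EXPECTED_FIELDS table is an illustrative assumption, not Google's or NLWeb's official requirement list, so adjust it to the types you publish.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


# Illustrative field lists only; not an official requirement set.
EXPECTED_FIELDS = {
    "Product": ["name", "description", "offers"],
    "Recipe": ["name", "recipeIngredient", "recipeInstructions"],
    "Article": ["headline", "datePublished", "author"],
    "ItemList": ["itemListElement"],
}


def iter_items(node):
    """Yield every dict that carries an @type, including nested ones."""
    if isinstance(node, list):
        for child in node:
            yield from iter_items(child)
    elif isinstance(node, dict):
        if "@type" in node:
            yield node
        for value in node.values():
            yield from iter_items(value)


def audit_html(html: str) -> list[str]:
    """Return one message per expected field missing from the page's JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(html)
    problems = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            problems.append("unparseable JSON-LD block")
            continue
        for item in iter_items(data):
            declared = item["@type"]
            types = declared if isinstance(declared, list) else [declared]
            for type_name in types:
                for field in EXPECTED_FIELDS.get(type_name, []):
                    if field not in item:
                        problems.append(f"{type_name} is missing {field}")
    return problems
```

Run audit_html over the saved HTML of one product page and one article page. An empty list does not prove the markup is valid, only that the fields you listed are present.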

The scale argument for doing this work at all is straightforward. The Microsoft repo notes that Schema.org is used by over 100 million websites, meaning most publishers already have the semantic layer NLWeb depends on. If you are in that majority, the marginal effort to expose a conversational endpoint is hours, not weeks.

Wire the Endpoint Into Your Existing Domain

Running app-file.py on port 8000 is fine for a demo. To deploy NLWeb as an MCP server on a real website, you need three things the reference repo will not give you.

First, a reverse proxy. Put Nginx, Caddy, or your cloud's load balancer in front of the Python process and map https://yourdomain.com/ask and https://yourdomain.com/mcp to the upstream service. Terminate TLS at the proxy, never at the Python app.

Second, an indexing job. The db_load command is a one-shot importer. Schedule it as a cron or a CI task that re-embeds changed items, because nothing in the repo does this for you. The Microsoft repo is explicit that CI/CD pipelines are not yet included, a concrete production-readiness gap developers must account for.
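One way to keep a scheduled job cheap is to skip the loader entirely when nothing has changed. The sketch below assumes you have already parsed your feed into a dict of item ID to item content (that parsing is not shown), stores content hashes in a local JSON file, and re-runs the same db_load command from the checklist only when at least one item is new or modified. It reloads the whole feed instead of individual items, because the one-shot importer takes a feed URL.

```python
import hashlib
import json
import subprocess
import sys
from pathlib import Path


def fingerprint(item: dict) -> str:
    """Stable hash of one feed item, independent of key order."""
    canonical = json.dumps(item, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_ids(items: dict[str, dict], previous: dict[str, str]) -> list[str]:
    """Return IDs that are new or whose hash differs from the last run."""
    return sorted(
        item_id
        for item_id, item in items.items()
        if previous.get(item_id) != fingerprint(item)
    )


def load_state(path: Path) -> dict[str, str]:
    """Read the hashes saved by the previous run, if any."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_state(path: Path, items: dict[str, dict]) -> None:
    """Persist the current hashes for comparison on the next run."""
    state = {item_id: fingerprint(item) for item_id, item in items.items()}
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def reindex_if_changed(feed_url: str, items: dict[str, dict], state_path: Path) -> bool:
    """Re-run the loader only when something changed; report whether it ran."""
    if not changed_ids(items, load_state(state_path)):
        return False
    # Same command as the checklist; run it from the directory where it works for you.
    subprocess.run([sys.executable, "-m", "data_loading.db_load", feed_url], check=True)
    save_state(state_path, items)
    return True
```

Wire reindex_if_changed into cron or a CI schedule. The state file is only written after the loader exits successfully, so a failed run is retried on the next tick.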

Third, query mode discipline. According to itnext's deep dive, NLWeb supports three explicit query modes: list, which returns ranked items, summarize, which produces an LLM-generated summary over results, and generate, which produces a full RAG response with the highest hallucination risk and the most restricted use. Default the public /ask endpoint to list or summarize. Reserve generate for authenticated contexts where you have reviewed the prompts.
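That policy is small enough to enforce in whatever layer sits between your proxy and the Python service. The sketch below is a hypothetical guard: the three mode names come from the description above, but resolve_mode is made up for illustration, and how the mode is actually passed to NLWeb is deliberately left out.

```python
PUBLIC_MODES = {"list", "summarize"}
ALL_MODES = PUBLIC_MODES | {"generate"}
DEFAULT_MODE = "list"


def resolve_mode(requested, authenticated: bool) -> str:
    """Return the mode to forward upstream for this request.

    Unknown or missing modes fall back to the default, and anonymous
    callers asking for generate are downgraded to summarize.
    """
    mode = (requested or DEFAULT_MODE).strip().lower()
    if mode not in ALL_MODES:
        return DEFAULT_MODE
    if mode == "generate" and not authenticated:
        return "summarize"
    return mode
```

Downgrading instead of rejecting keeps anonymous clients working. Return an error instead if you would rather make the restriction visible.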

For MCP clients, the contract is already standardized. itnext reports that Cloudflare's NLWeb docs confirm the /ask and /mcp endpoint contract, giving the protocol a second infrastructure-level backer alongside Microsoft Azure AI Search. Any MCP-aware client, including Claude Desktop and a growing list of agent frameworks, can call your site once /mcp is reachable.
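For a sense of what an agent sends, MCP messages are JSON-RPC 2.0 envelopes, and tools/list is the method a client uses to ask a server what it offers. The sketch below only builds that envelope. Real clients perform an initialization handshake first, and how your /mcp route expects messages to be delivered is not covered here, so treat this as an illustration of the wire format, not a working client.

```python
import itertools
import json

# Monotonic request IDs so responses can be matched to requests.
_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params=None) -> str:
    """Serialize one JSON-RPC 2.0 request message."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def list_tools_request() -> str:
    """Build the request an MCP client uses to discover a server's tools."""
    return jsonrpc_request("tools/list")
```

In practice you would let an MCP SDK or client application handle this. The value of seeing the envelope is knowing what to look for in your proxy logs.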

The One Stack Choice That Will Bite Production Teams

If you read the announcements and assume there is a stable .NET path, stop. itnext reports that the .NET 9 official implementation, NLWebNet, is labeled experimental and not intended for production use, meaning Python remains the only reference implementation considered deployment-ready as of June 2026.

That is the single most under-reported fact about the project right now. Teams on a Windows or .NET-only mandate are choosing between running a Python sidecar service or waiting on an implementation that is not ready. There is no clean third option.

The upside of the Python service is its portability. The Microsoft GitHub repo describes it as designed to run on everything from data center clusters to laptops and, eventually, mobile devices. A small VM or a single container is enough to start, and you can scale horizontally behind your load balancer once traffic warrants it.

Why This Is Not Just Another Chatbot

The NLWeb vs chatbot comparison matters because it changes what you build. A traditional chatbot is a UI bolted onto a model with retrieval glue, owned end-to-end by your team. NLWeb inverts that: the protocol is the product, the UI is optional, and other agents can call your site without you shipping any client at all.

There is one user-visible capability that conventional search interfaces cannot replicate. Dev.to's analysis explains that multi-turn conversation context is preserved natively: each query builds on the earlier ones in the session. A visitor can ask for red running shoes, then ask which of those are under $120, then ask which ship to Berlin, without restating the original query.
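From the client side, a follow-up question has to arrive with some trace of what came before. The sketch below shows one way a thin client might carry that history. It assumes earlier questions are sent in a parameter named prev as a comma-separated list, which is an assumption to verify against the NLWeb REST API documentation, and the Conversation class is hypothetical.

```python
import urllib.parse


class Conversation:
    """Track earlier questions so each new request can carry them along."""

    def __init__(self, base_url: str, history_param: str = "prev"):
        self.base_url = base_url.rstrip("/")
        self.history_param = history_param  # assumed name; verify before use
        self.history = []

    def next_url(self, query: str) -> str:
        """Return the /ask URL for this turn and record the question."""
        params = {"query": query}
        if self.history:
            # Assumed encoding: earlier questions joined with commas.
            params[self.history_param] = ",".join(self.history)
        self.history.append(query)
        return f"{self.base_url}/ask?{urllib.parse.urlencode(params)}"
```

The shoe example would then be three calls to next_url on the same Conversation object, with the second and third URLs carrying the earlier questions.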

The protocol is also positioned to add a conversational interface to a website without giving up the discoverability and indexability of the underlying pages. Your HTML still ranks. Your Schema.org still feeds Google. The /ask endpoint is additive.

The licensing is permissive enough to matter for commercial use. The Microsoft NLWeb GitHub repo confirms NLWeb is licensed under the MIT License, free to use commercially with no royalty obligations. There is no per-seat or per-query fee from Microsoft, only the model and vector store costs you choose.

The One Check That Catches Most Deployment Failures

After you start the service, ask the same question twice, phrased two different ways, against /ask and compare the result sets. If the two responses share fewer than half their top items, your embeddings are under-indexed, your Schema.org is too thin, or your ModelRouter is routing decontextualization to a model that is too small. Fix that before exposing the endpoint to traffic, because a chat surface that gives different answers to the same question destroys trust faster than a slow one.
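The comparison is easy to automate once you have the two ranked lists in hand. The sketch below assumes you have already extracted an identifier (a URL, for instance) for each returned item, since the response shape depends on your configuration. The cutoff k and the one-half threshold mirror the rule of thumb above, and the function names are illustrative.

```python
def top_overlap(first: list[str], second: list[str], k: int = 10) -> float:
    """Fraction of the top-k identifiers that appear in both result lists."""
    top_a, top_b = set(first[:k]), set(second[:k])
    # Normalize by the longer of the two truncated lists, capped at k.
    denominator = min(k, max(len(top_a), len(top_b)))
    if denominator == 0:
        return 0.0
    return len(top_a & top_b) / denominator


def is_consistent(
    first: list[str], second: list[str], k: int = 10, threshold: float = 0.5
) -> bool:
    """True when the two phrasings share at least `threshold` of their top items."""
    return top_overlap(first, second, k) >= threshold
```

Run it over a handful of paraphrase pairs, not just one, and treat any pair that fails as a prompt to inspect the markup for those items first.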

The protocol's ambitions are explicit. Microsoft's announcement frames the goal this way: just as HTML made it easy for almost anyone to create a website, NLWeb should make it easy for any web publisher to create an intelligent natural language experience. Whether it delivers on that depends on adoption that is still early. TechRadar notes that NLWeb was announced at Build 2025, and Build 2026, running June 2-3, 2026 at Fort Mason Center in San Francisco, is the first conference where it can be evaluated against real-world deployments rather than just potential.

If the endpoint is live, your Schema.org is clean, and your model routing is tiered, the next move is to register your /mcp endpoint with the MCP clients your audience already uses. For broader context on how agent infrastructure is taking shape across vendors, see our coverage in AI Infrastructure and the Developer Tools section.

Frequently Asked Questions

Does NLWeb replace my existing site search?

No. NLWeb adds an /ask endpoint for natural-language queries and an /mcp endpoint for AI agents, but your HTML pages still rank in Google and your Schema.org markup keeps powering rich results. It is additive, not a replacement.

Can I run NLWeb without sending data to OpenAI or Microsoft?

Yes. Several of the supported LLM providers (OpenAI, DeepSeek, Gemini, Anthropic, Inception, and HuggingFace) are neither OpenAI nor Microsoft, and vector stores such as Qdrant, Milvus, Postgres, and Elasticsearch can be self-hosted, so you can assemble a deployment that avoids both vendors.

What does a single NLWeb query actually cost in LLM calls?

A single query can fan out into 50+ targeted LLM calls covering decontextualization, relevancy checking, and memory detection. Using the ModelRouter module to send those subtasks to a cheaper model and reserve a frontier model for final synthesis is the main cost-control lever.

Is there a .NET version of NLWeb ready for production?

Not yet. The official .NET 9 implementation, NLWebNet, is labeled experimental and not intended for production use, so the Python reference implementation remains the only deployment-ready option as of June 2026.

How do AI agents discover my NLWeb endpoint?

Every NLWeb instance is also an MCP server, so any MCP-compatible client can call your /mcp route as a tool. Microsoft joined the MCP Steering Committee at Build 2025 and contributed an updated authorization spec and a server registry service design to help with discovery.
