Over the last year, a lot of people have gone from “What is MCP?” to wrestling with how to plug it into their real, messy company data. If that sounds familiar, this piece is for you. We’ll build things up from first principles—what LLMs and RAG actually do, then layer in MCP, follow a single query all the way through the stack, and finally show where Glean MCP servers fit when you want your favourite MCP client to speak fluently to your enterprise context.
We'll cover exactly that here, but before we dive in, let's start with the basics.
How LLMs (and RAG) actually work
A large language model (LLM) is essentially a highly capable autocomplete engine. It has been trained on a massive corpus of text. When you give it a prompt, it predicts which tokens are most likely to come next. It does not, by default, have a live connection into your Jira, GitHub, Slack, Confluence, or internal RFCs.
That’s why you see two behaviours at once: it can write convincing emails, documents, or code, and it can also completely make things up when the answer depends on private or fast‑changing information it never saw during training. Think of it as a very smart person sitting an exam with no access to your company’s systems or documentation.
Retrieval‑Augmented Generation (RAG) is how we fix that. Instead of asking the model to answer from pure “memory”, we insert a retrieval step in front:
First you retrieve potentially relevant content from your own systems—documents, tickets, wiki pages, code, chat messages. Then you augment the model’s prompt with those snippets. Finally, you let the model generate the answer, now grounded in that external context.
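Those three steps—retrieve, augment, generate—can be sketched in a few lines. This is a toy illustration, not a production pipeline: the corpus, the keyword-overlap "ranking", and the stubbed `generate` function are all hypothetical stand-ins for a real search backend and a real LLM call.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    text: str

# A toy corpus standing in for your real systems (hypothetical data).
CORPUS = [
    Doc("RFC-42: MCP auth", "The hosted MCP server uses SSO-backed OAuth."),
    Doc("Lunch menu", "Tacos on Tuesday."),
]

def retrieve(query: str, corpus: list[Doc], k: int = 1) -> list[Doc]:
    """Step 1: retrieve. Naive keyword overlap stands in for a real search engine."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: -len(q & set(d.text.lower().split())))
    return ranked[:k]

def augment(query: str, docs: list[Doc]) -> str:
    """Step 2: augment. Prepend the retrieved snippets to the model's prompt."""
    context = "\n".join(f"[{d.title}] {d.text}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

def generate(prompt: str) -> str:
    """Step 3: generate. Placeholder for the actual LLM API call."""
    return f"(an LLM would answer here, grounded in the prompt below)\n{prompt}"

prompt = augment("How does MCP server auth work?", retrieve("MCP server auth", CORPUS))
print(generate(prompt))
```

The point is the shape of the loop: the model never needs live access to your systems, because the retrieval step hands it exactly the snippets it should ground on.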
Why context matters
Even with RAG wired in, you can still fail badly if the retrieval layer is weak or fragmented. If the system cannot see the right mix of content, you end up with either hallucinations or incomplete answers.
In practice, that means you need a few things:
- You need a clear picture of your content: which documents, tickets, repos, and messages exist, and how they relate.
- You need retrieval that brings back a small, high‑signal subset of context instead of dumping everything into the prompt.
- You need permission checks so sensitive or regulated content does not quietly leak into prompts for the wrong users.
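The last two requirements—high-signal retrieval and permission checks—fit together naturally: filter first, then rank. Here is a minimal sketch, assuming a deliberately simplified group-based ACL model (real systems use far richer permission graphs):

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    title: str
    text: str
    allowed_groups: set[str] = field(default_factory=set)  # toy ACL model

# Hypothetical corpus with per-document permissions.
CORPUS = [
    Doc("Payroll report", "salary data for Q3", {"finance"}),
    Doc("Eng onboarding", "how to set up the repo and tooling", {"eng", "finance"}),
]

def retrieve_for_user(query: str, corpus: list[Doc], user_groups: set[str], k: int = 3) -> list[Doc]:
    # 1. Permission check first: drop anything this user can't see,
    #    so sensitive content never reaches the prompt at all.
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    # 2. Rank what's left and keep a small, high-signal subset.
    q = set(query.lower().split())
    ranked = sorted(visible, key=lambda d: -len(q & set(d.text.lower().split())))
    return ranked[:k]

print([d.title for d in retrieve_for_user("set up the repo", CORPUS, {"eng"})])
# → ['Eng onboarding']
```

Note the ordering: permissions are enforced before ranking, not bolted on afterwards. An engineer never sees the payroll report in their context window, however well it might score.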
That is exactly the gap MCP addresses. It does not try to define how you index or rank your data. Instead, it standardizes how LLM‑based applications call out to tools and data sources, so you are not re‑implementing bespoke APIs and plugins for every app‑to‑backend combination. [1][2]
Enter MCP: “APIs for LLMs”
The Model Context Protocol (MCP) is an open standard that specifies how AI applications (called “hosts”) talk to external “servers” that expose tools, resources, and prompts.[1][2] The core idea is simple: give LLM hosts a consistent, schema‑driven way to discover and invoke tools, no matter who built them.
A host is the app where the user types. That could be an IDE like Cursor or VS Code, a chat app like Claude Desktop or ChatGPT, or a custom agent frontend. The host includes an LLM plus an MCP client library. The MCP server is a process, local or remote, that exposes tools the LLM can call: “search this knowledge base”, “query this database”, “create a ticket”, “run this workflow”, and so on.
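The "schema-driven discovery" idea is easiest to see in code. The sketch below is not the real MCP SDK—it is a simplified registry illustrating the contract: each tool is published with a name, a description, and a JSON-Schema-style input description, so any host can list and invoke tools it has never seen before.

```python
import json

# Toy tool registry (simplified sketch of the MCP idea, not the real SDK).
TOOLS = {}

def tool(name, description, input_schema):
    """Register a function as a discoverable, schema-described tool."""
    def register(fn):
        TOOLS[name] = {"description": description,
                       "inputSchema": input_schema,
                       "handler": fn}
        return fn
    return register

@tool("search_knowledge_base",
      "Search internal docs for a query",
      {"type": "object", "properties": {"query": {"type": "string"}}})
def search_knowledge_base(query: str):
    return [f"result for: {query}"]  # placeholder for a real backend

def list_tools():
    """What a host sees when it asks the server which tools exist."""
    return [{"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
            for n, t in TOOLS.items()]

def call_tool(name, arguments):
    """What runs when the LLM emits a tool call."""
    return TOOLS[name]["handler"](**arguments)

print(json.dumps(list_tools(), indent=2))
print(call_tool("search_knowledge_base", {"query": "MCP auth RFC"}))
```

Because the host only ever sees `list_tools` and `call_tool`, the same client code works against any server—which is exactly why MCP is often described as "APIs for LLMs".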
The life of an MCP query
Assume you are in an MCP‑enabled IDE, connected to a Glean MCP server, and you ask:
“Use Glean to find the latest RFC about our hosted MCP server authentication and summarize the key architectural decisions.”
Step 0 – The IDE collects your message, plus any relevant conversation history, and sends it to its configured LLM backend.
Step 1 – The LLM sees that it needs internal context to answer your question, so instead of replying directly it plans a call to Glean’s search tool via MCP.
Step 2 – The host takes that tool call, hands it to its MCP client, and the client sends a search request to the Glean MCP server over HTTPS/SSE.
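On the wire, MCP messages are JSON-RPC 2.0. A tool-call request at this step might look roughly like the following—the tool name and arguments here are illustrative, not Glean's actual schema:

```python
import json

# Illustrative MCP tool-call request (JSON-RPC 2.0 envelope;
# tool name and arguments are hypothetical).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search",
        "arguments": {"query": "latest RFC hosted MCP server authentication"},
    },
}
print(json.dumps(request, indent=2))
```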
Step 3 – The Glean MCP server checks who you are via SSO, applies your permissions, and then forwards the request to the right Glean backend service.
Step 4 – Glean Search runs a permission‑aware query across your enterprise graph, picks the most relevant docs and snippets, and streams them back through the MCP server.
Step 5 – The host passes those search results back into the LLM in a second call, alongside your original question and tool‑usage metadata.
Step 6 – The LLM uses that retrieved context to write a grounded answer, which the host then streams back into your IDE, without the model ever talking directly to your internal systems.
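Steps 0 through 6 collapse into a classic two-call tool loop: the first LLM call plans a tool call, the host executes it via MCP, and a second LLM call produces the grounded answer. This sketch simulates that loop with stubs—both the `llm` and `mcp_server` functions are hypothetical stand-ins, not real APIs:

```python
def llm(messages):
    """Stub LLM: plans a tool call until it has seen tool results (simulated behaviour)."""
    if not any(m["role"] == "tool" for m in messages):
        # Step 1: the model decides it needs internal context.
        return {"tool_call": {"name": "search",
                              "arguments": {"query": "hosted MCP server auth RFC"}}}
    # Step 6: second call - answer grounded in the retrieved context.
    snippets = [m["content"] for m in messages if m["role"] == "tool"]
    return {"content": f"Summary based on {len(snippets)} retrieved snippet(s)."}

def mcp_server(name, arguments):
    """Steps 3-5: auth + permission-aware search, simulated."""
    return [f"[doc] {arguments['query']}: key decisions ..."]

def host(user_message):
    messages = [{"role": "user", "content": user_message}]    # step 0
    reply = llm(messages)                                     # step 1
    while "tool_call" in reply:
        call = reply["tool_call"]
        results = mcp_server(call["name"], call["arguments"])  # steps 2-5
        for r in results:
            messages.append({"role": "tool", "content": r})
        reply = llm(messages)                                  # step 6
    return reply["content"]

print(host("Summarize the latest RFC about our hosted MCP server authentication."))
```

Notice that the model only ever exchanges messages with the host; everything touching your internal systems happens behind the MCP server boundary.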
Final thoughts
If you zoom all the way out, the split of responsibilities looks like this:
LLMs give you the reasoning engine. RAG gives that engine the right pages from the book at the right time. MCP gives LLM hosts a clean, standard way to call tools and fetch that context. Glean provides the enterprise graph and security model underneath all of this. Glean MCP servers are simply the glue between those worlds.
If you are already experimenting with MCP in an IDE, chat app, or custom agent, the next logical step is straightforward: point one of those hosts at a Glean MCP server and see how it feels when the assistant suddenly knows your company as well as you do.
Stay tuned for the next part of the discussion, where we will dive deeper into authentication, transports, Glean's MCP offerings, and on-prem connectivity.
References (for further reading)
- Model Context Protocol – official site and specification
- “Model Context Protocol” – overview and background on Wikipedia
- “A Deep Dive Into MCP and the Future of AI Tooling” – Andreessen Horowitz
- Cloudflare “Model Context Protocol (MCP)” docs – remote vs local MCP and transports
- IBM “What is Model Context Protocol (MCP)?” – architectural breakdown and analogy