Dear Team,
Please review the problem statement below and the proposed resolutions to address it.
Why GA Chats Die — And Why Nobody Tells You
The Technical "Why"
| Aspect | What Happens | Why It's Not Surfaced to You |
|---|---|---|
Context Window Limit | LLMs have a hard token ceiling — GPT-5.1 (current GA primary model) caps at 256K tokens, Claude Sonnet 4.5 at 1M, GPT-4o at 128K. A heavy chat with dense Encompass content (SDs, UC mappings, OOTB catalogues) can burn through this fast. Once exceeded, the model either silently truncates older context or errors out entirely. | GA doesn't expose a "context health" meter. Glean's own docs confirm it "automatically truncates tokens behind the scenes to stay within these limits" — but there's zero user-visible indication this is happening. You don't know you're losing context until answers start degrading or it crashes. |
Tool Output Bloat | Every time GA reads a Solution Description, Use Case Mappings, or OOTB Catalogue, those outputs consume massive context chunks. Glean docs confirm: "Read document steps always read the entire contents of a document into the memory. For very long documents, this may use a lot of tokens." A chat with 10+ large doc reads can exhaust the window quickly. | Tool outputs are "invisible" to you — you see a summary, but the model ingests the full raw output. There's no feedback loop showing how much "budget" each read consumed. No way to know that one SD read just ate 15% of your remaining context. |
Silent Truncation (Not Graceful Degradation) | Glean does truncate behind the scenes to stay within limits, and will "read only a portion of documents when the token limit is being approached." But it doesn't tell you this is happening. It doesn't say "I'm running low on context, let me summarise what we have so far" or "please start a new chat." It either silently drops older context (degrading answer quality) or just fails. | This is the biggest gap. Competing products (Cursor, Claude Projects) at least warn you or auto-summarise. GA currently lacks any proactive context management UX. The error message is a generic "Something went wrong on my end" rather than "your conversation has exceeded the context limit — please start a new chat." |
References: All of this is documented in Glean's own Memory docs, which list the exact token capacities per model and confirm the silent truncation behaviour.
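To make the "tool output bloat" row concrete, here is a back-of-envelope sketch of the token math. The ~4-characters-per-token heuristic and the document size are my assumptions for illustration; real tokenizers vary by model and content, and Glean's internal accounting may differ.

```python
# Back-of-envelope token math (illustrative only): the ~4 chars/token
# heuristic is an assumption; real tokenizers vary by model and content.

def estimate_tokens(text: str) -> int:
    """Approximate token count using the rough 4-characters-per-token rule."""
    return max(1, len(text) // 4)

def context_share(doc_chars: int, window_tokens: int) -> float:
    """Fraction of a model's context window one document read would consume."""
    return estimate_tokens("x" * doc_chars) / window_tokens

# One hypothetical ~150,000-character Solution Description read against
# a 256K-token window (the GPT-5.1 cap listed above):
print(f"{context_share(150_000, 256_000):.1%}")  # → 14.6%
```

So a single large document read can plausibly eat roughly 15% of the window in one step, with no user-visible signal that it happened.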
What I'm doing to work around this: I've started capturing my GA learnings and training notes in Confluence, so that even when a chat becomes unresponsive the knowledge is preserved and a fresh GA session can pick it up. I also use GA's Memory feature ("remember that...") to persist project context across sessions, but this doesn't solve the mid-conversation death problem.
What I'd like to see addressed (if possible):
- A context health indicator — even a simple "you've used X% of available context" would be transformative
- Better error messaging — tell me why it failed, not just "something went wrong"
- Proactive degradation handling — summarise and suggest a new chat before crashing
- Visibility into whether thumbs-down feedback on these failures is being tracked and triaged
I hit this error daily and have been giving a thumbs-down each time to report it:

> "Something went wrong on my end, so I couldn't generate a response. Let's try that again, and if it keeps happening, reach out to your admin for help."
Regards,
Sultan Shahabuddin