Dear Team,
I wanted to share some observations on Glean Canvas performance that I hope will be helpful as the feature matures. We've been using Canvas heavily for iterative drafting and refinement of longer documents, and it's genuinely a great fit for that workflow — the inline editing, version history, and export capabilities are excellent.
That said, we've noticed a few things during heavier sessions that I wanted to flag constructively.
What We're Experiencing
Progressive slowness during longer sessions: Canvas interactions remain snappy in the first few turns of a conversation, but once the session involves several tool calls (searches, activity lookups, email retrieval, etc.) the response times increase noticeably. By turn 8–10 in a research-heavy session, the lag becomes quite pronounced compared to fresh conversations.
"Done" but Canvas doesn't refresh: On longer runs — particularly bigger edits or multi-section rewrites — the assistant occasionally reports that the operation is complete, but the Canvas content doesn't update, or only partially updates. It looks like the model output may be getting truncated, and the UI doesn't clearly surface that the write-back didn't fully complete.
This matters because Canvas is genuinely the right tool for our use case. The current experience just makes it hard to fully trust for longer working sessions, and we sometimes end up copying content out to other tools as a safety net, which reduces some of the value.
What I Tried to Understand — The Model's Own Perspective
To better understand what's happening, I asked the assistant to describe its own mechanics during a Canvas session. Here's what it reported (lightly edited for clarity):
How Canvas Creation Works
No separate tool is invoked by the model to create a Canvas — it generates structured XML as part of its text response. The Glean platform then parses and renders this in the Canvas panel. So from the model's perspective, a Canvas is just structured text output.
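To make this concrete, here is a rough sketch of what platform-side extraction might look like. The `<artifact>` tag name and its `name` attribute are my assumptions for illustration; the real schema isn't visible to users:

```python
import re

# Hypothetical sketch: the platform pulls a Canvas artifact out of the
# model's plain-text output. Tag and attribute names are assumed.
ARTIFACT_RE = re.compile(
    r'<artifact\s+name="(?P<name>[^"]+)">(?P<body>.*?)</artifact>',
    re.DOTALL,
)

def extract_artifacts(model_output: str) -> dict[str, str]:
    """Return {artifact_name: content} for every artifact tag found."""
    return {m["name"]: m["body"].strip() for m in ARTIFACT_RE.finditer(model_output)}

output = 'Here is the draft.\n<artifact name="report">Q3 summary...</artifact>'
print(extract_artifacts(output))  # {'report': 'Q3 summary...'}
```

The key point: from the model's side this is ordinary text generation, and only the rendering layer treats it specially.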
How Canvas Editing Works
The model invokes the artifact_edit tool, which takes three inputs: the artifact name, the exact text to find, and the replacement text. It performs a literal find-and-replace — one replacement per call. If multiple edits are needed, that's multiple sequential tool calls, each requiring a full LLM inference cycle to determine what to change.
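A minimal sketch of that reported behavior. The function name mirrors the tool, but the exact semantics (error handling, first-occurrence-only replacement) are assumptions on my part:

```python
def artifact_edit(artifacts: dict[str, str], name: str, find: str, replace: str) -> None:
    """Literal find-and-replace, one occurrence per call, as the model described it.
    The exact error behavior is an assumption."""
    content = artifacts[name]
    if find not in content:
        raise ValueError(f"text not found in artifact {name!r}")
    artifacts[name] = content.replace(find, replace, 1)  # count=1: single replacement

artifacts = {"report": "Revenue grew 5% in Q3."}
artifact_edit(artifacts, "report", "5%", "7%")
print(artifacts["report"])  # Revenue grew 7% in Q3.
```

This is why a multi-section rewrite is expensive: each of N edits is a separate tool call, and each call is preceded by a full inference pass over the whole context to decide what the find/replace pair should be.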
How Context Accumulates
From the model's perspective, there is no distinct "re-read" of the Canvas — the canvas content is part of the conversation history that gets re-processed each turn. The Glean platform may manage this differently internally, but from the model's side, the entire conversation — every message, every tool call input and output, every artifact — appears to be re-sent on every single turn. This is consistent with how transformer-based LLMs generally operate in chat.
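A toy illustration of why this makes turns progressively slower, assuming the full history really is re-sent every turn (the data shapes here are invented for the example):

```python
# Minimal sketch: the payload the model must re-process grows with every
# turn because the entire history is included each time.
history: list[dict] = []

def send_turn(user_msg: str, tool_results: list[str]) -> int:
    history.append({"role": "user", "content": user_msg})
    for r in tool_results:
        history.append({"role": "tool", "content": r})
    # The model sees everything accumulated so far, on every single turn.
    payload = "\n".join(m["content"] for m in history)
    return len(payload.split())  # rough word count the model re-processes

print(send_turn("search for Q3 numbers", ["result " * 500]))  # 504
print(send_turn("summarize", []))                             # 505: all prior context plus one word
```

Even a one-word follow-up costs nearly as much as the heaviest prior turn, because the cost is driven by cumulative context, not by the new message.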
What a Heavy Session Looks Like (Example)
In one session, the model reported the following approximate context accumulation:
| Turn | Action | Estimated Context Added (approx.) |
|---|---|---|
| 1 | Image generation request | ~2,000 words |
| 2–3 | Enterprise search queries | ~8,000 words in tool results |
| 4 | Email + calendar lookups | ~3,000 words across tool results |
| 5 | User activity retrieval (3-month range) | Very large payload — tool warned results were truncated |
| 6 | Canvas artifact generated | ~2,000 words of output |
| 7 | Canvas review and analysis | ~2,500 words of output |
| 8 | Additional search query | Very large payload (estimated tens of thousands of words from a single result) |
| 9 | Follow-up question | All of the above re-sent, plus the new message |
Note: The sizes above are the model's own estimates based on the volume it processed — not measured values. Actual token counts are not visible to the user or the model.
By turn 9, the model estimated it was processing well over 100,000 words of cumulative context. The progressive slowness directly correlated with this accumulation.
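For rough arithmetic behind that figure, treating each "very large" payload as at least ~45,000 words (an assumed lower bound I'm supplying, since the tool only reported that results were truncated):

```python
# Rough arithmetic behind the "well over 100,000 words" estimate.
# The two 45,000-word entries are assumed lower bounds, not reported values.
turn_estimates = {
    "image generation": 2_000,
    "enterprise search": 8_000,
    "email + calendar": 3_000,
    "activity retrieval (truncated)": 45_000,  # assumption
    "canvas generation": 2_000,
    "canvas review": 2_500,
    "late search (truncated)": 45_000,         # assumption
}
cumulative = sum(turn_estimates.values())
print(f"~{cumulative:,} words re-processed on every subsequent turn")  # ~107,500
```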
Where the Slowness Comes From
| Layer | What's Happening | LLM Involved? |
|---|---|---|
| Input assembly | Platform collects conversation history + system prompt + tool definitions and sends to LLM | No — platform-side |
| LLM inference | Model processes the full context and generates a response | Yes — this is the expensive part. More context = slower inference |
| Tool execution | Tool calls go to backend services; results are appended to context | Partially — model decides to call, but execution is independent |
| Canvas rendering | Frontend parses XML output and renders in Canvas panel | No — client-side |
| Canvas editing | artifact_edit does a string find-and-replace | The operation itself is lightweight, but requires full LLM inference to determine the edit |
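As a toy model of where the time goes, with every constant invented purely for illustration, only the inference layer scales with context:

```python
def turn_latency_ms(context_tokens: int) -> dict[str, float]:
    """Toy latency model: all constants are made up; only inference
    scales with context size."""
    return {
        "input_assembly": 50.0,                  # platform-side, roughly constant
        "llm_inference": 0.05 * context_tokens,  # prefill cost grows with context
        "tool_execution": 800.0,                 # backend call, context-independent
        "canvas_rendering": 30.0,                # client-side, context-independent
    }

for tokens in (5_000, 150_000):
    total = sum(turn_latency_ms(tokens).values())
    print(f"{tokens:>7} tokens -> ~{total:,.0f} ms")
```

Under this toy model, a thirtyfold increase in context turns a ~1-second turn into a ~8-second one, which is roughly the shape of degradation we observe.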
Constructive Suggestions
These are offered as ideas for discussion — I understand the platform is evolving quickly and some of these may already be on the roadmap.
A. Context Compaction for Tool Results
Tool results appear to be stored with minimal compression in the conversation context. In one case, a search returned an estimated tens of thousands of words, of which the model used perhaps a few hundred in its response. The unused portion remains in context for all subsequent turns.
Idea: After the LLM has processed a tool result, summarize or truncate the stored version. Keep the relevant excerpts, discard the raw payload. This is sometimes called context compaction or memory summarization.
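A naive sketch of what compaction could look like, with simple truncation standing in for model-based summarization (the word budget is arbitrary):

```python
def compact_tool_result(raw: str, keep_words: int = 200) -> str:
    """Naive compaction sketch: after a turn completes, replace the stored
    tool result with a truncated excerpt plus a marker. A real system would
    summarize with a model; truncation stands in for that here."""
    words = raw.split()
    if len(words) <= keep_words:
        return raw
    return " ".join(words[:keep_words]) + f" ... [compacted: {len(words)} words originally]"

raw = "result " * 10_000
stored = compact_tool_result(raw)
print(len(stored.split()))  # 200 kept words plus a 5-word compaction marker
```

Applied after each turn, this would cap the per-tool-call contribution to context at a constant instead of letting it grow unbounded.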
B. Canvas Pinning in Context
The Canvas artifact and tool results appear to share the same context space. As tool results accumulate, the Canvas content itself may be at risk of silent truncation — meaning the document the user cares about most could be the thing that gets partially evicted.
Idea: Treat the active Canvas as a pinned context block that's always preserved at full fidelity. If context needs to be trimmed, prioritize evicting older tool results before touching the artifact.
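A sketch of pinned-block eviction under an assumed word budget; the field names and eviction policy (oldest unpinned first) are illustrative, not a description of how Glean works:

```python
def trim_context(blocks: list[dict], budget_words: int) -> list[dict]:
    """Evict oldest unpinned blocks first; pinned blocks (the active Canvas)
    are always kept at full fidelity. Field names are illustrative."""
    total = sum(len(b["text"].split()) for b in blocks)
    kept = list(blocks)
    for b in blocks:  # oldest first
        if total <= budget_words:
            break
        if not b.get("pinned"):
            total -= len(b["text"].split())
            kept.remove(b)
    return kept

ctx = [
    {"text": "old search result " * 100},             # 300 words, evictable
    {"text": "canvas draft " * 100, "pinned": True},  # 200 words, always kept
    {"text": "recent tool output " * 50},             # 150 words
]
trimmed = trim_context(ctx, budget_words=400)
print([b.get("pinned", False) for b in trimmed])  # [True, False]: Canvas survives
```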
C. User Visibility into Context State
Currently there's no indication of how much context capacity has been consumed, whether earlier content has been truncated, or why a response might be slower than usual. The only signal is when things feel laggy or output quality degrades.
Idea: A simple visual indicator for context utilization — even a traffic light (🟢 fresh / 🟡 getting heavy / 🔴 near capacity) — would help users make informed decisions about when to start a new conversation.
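Even something this simple would help; the thresholds below are arbitrary placeholders:

```python
def context_indicator(used_tokens: int, capacity: int) -> str:
    """Hypothetical traffic-light indicator; thresholds are arbitrary."""
    ratio = used_tokens / capacity
    if ratio < 0.5:
        return "🟢 fresh"
    if ratio < 0.85:
        return "🟡 getting heavy"
    return "🔴 near capacity"

print(context_indicator(30_000, 200_000))   # 🟢 fresh
print(context_indicator(150_000, 200_000))  # 🟡 getting heavy
print(context_indicator(190_000, 200_000))  # 🔴 near capacity
```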
D. Adaptive Tool Result Sizing
Tools appear to return their full result set regardless of how much context is already consumed. A search early in a conversation and a search at turn 9 return the same volume, even though the available context budget is very different.
Idea: When context utilization is already high, return shorter or summarized tool results automatically. Or provide a mechanism for the model to request "summary mode" from tools.
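A sketch of context-aware result sizing; the limits and thresholds are invented, and the truncation stands in for real summarization:

```python
def tool_result_limit(context_used: float) -> int:
    """Sketch: scale the allowed tool-result size down as context fills.
    Numbers are illustrative, not platform values."""
    if context_used < 0.5:
        return 10_000  # early in the conversation: full results
    if context_used < 0.8:
        return 2_000   # getting heavy: summarized results
    return 500         # near capacity: headline-only results

def run_search(query: str, context_used: float, full_result: str) -> str:
    limit = tool_result_limit(context_used)
    words = full_result.split()
    return full_result if len(words) <= limit else " ".join(words[:limit]) + " ..."

big = "hit " * 5_000
print(len(run_search("q", 0.2, big).split()))  # 5000: full result fits
print(len(run_search("q", 0.9, big).split()))  # 501: truncated plus ellipsis
```

The same search at turn 2 and turn 9 would then cost very different amounts of context, matching the budget that actually remains.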
E. Canvas Continuity Across Conversations
The main workaround for context bloat is to start a new conversation — but the Canvas doesn't carry over. This creates a tension: you want a fresh context for performance, but you need the existing Canvas for continuity.
Idea: A "Continue this Canvas in a new conversation" option that brings only the artifact into a fresh context, leaving behind all the accumulated tool history.
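Conceptually, something like the following, where the shape of the conversation object is entirely my assumption:

```python
def continue_canvas(old_conversation: dict) -> dict:
    """Sketch of 'Continue this Canvas in a new conversation': carry only
    the artifact forward at full fidelity, drop all accumulated history.
    The conversation structure here is assumed for illustration."""
    return {
        "messages": [],  # fresh context: no tool results, no old turns
        "artifacts": dict(old_conversation["artifacts"]),  # full-fidelity copy
    }

old = {
    "messages": ["...dozens of turns of tool results..."],
    "artifacts": {"report": "final draft text"},
}
fresh = continue_canvas(old)
print(fresh)  # {'messages': [], 'artifacts': {'report': 'final draft text'}}
```

This would give users the performance of a fresh conversation without losing the document they've been building.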
Summary
Canvas is a genuinely useful feature and we'd love to use it more heavily. The core challenge isn't the Canvas itself — it's that the cumulative context growth in research-heavy sessions creates progressive performance degradation. The suggestions above are focused on managing that context lifecycle more transparently and efficiently.
Side observation: Canvas feels noticeably more responsive on weekends compared to weekday business hours. Could be coincidence, but it's been consistent enough to notice. If the LLM inference layer is shared infrastructure, this might point to load-related latency rather than a pure context-size issue.
Regards,
Sultan