GA vs Specialized Agents – Behaviour on Translation

Question

Hi all,

I’m seeing a difference between how Glean Assistant (GA) behaves in chat vs a specialized workflow agent using File Analyst, and I’d like to know if this is expected and how to design this better.

Use case

I have large tables (CSV/XLSX/JSON, ~3–4k rows) with:

* English labels
* Product names
* Help/tooltip text

Goal: a workflow agent where users upload a file and receive:

* One main Arabic translation column
* Several Arabic suggestion columns (3–5 variants)
* Technical placeholders (e.g. {equipmentNumber}, ICU plural patterns) preserved exactly
* A downloadable translated CSV/XLSX
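
To make "preserved exactly" checkable, here is a minimal sketch of a placeholder comparison between a source string and its translation. The function names and regexes are my own assumptions, not part of Glean. It only looks at simple `{name}` arguments and the heads of ICU `plural`/`select` arguments, so the Arabic side can legitimately add plural branches (zero, two, few, many) without failing the check. It is not a full ICU parser: a sub-message that is a single ASCII word in braces would be miscounted as an argument.

```python
import re
from collections import Counter

# Simple named arguments such as {equipmentNumber}.
SIMPLE_ARG = re.compile(r"\{\s*([A-Za-z_]\w*)\s*\}")
# Heads of ICU complex arguments such as "{count, plural, ...".
ICU_ARG = re.compile(r"\{\s*([A-Za-z_]\w*)\s*,\s*(plural|select|selectordinal)\s*,")


def placeholder_signature(text: str) -> Counter:
    """Count every simple argument and every ICU argument head found in text."""
    sig = Counter(("arg", name) for name in SIMPLE_ARG.findall(text))
    sig.update(("icu", name, kind) for name, kind in ICU_ARG.findall(text))
    return sig


def placeholders_preserved(source: str, translation: str) -> bool:
    """True when both strings carry the same placeholders the same number of times."""
    return placeholder_signature(source) == placeholder_signature(translation)
```

Running `placeholders_preserved(english, arabic)` on every row of the output gives a quick list of rows where a placeholder was dropped, renamed, or translated.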

What GA chat can do

If I work directly in GA chat on the same kind of content, GA can:

* Produce fluent Arabic for both short labels and longer help text
* Preserve {...} placeholders and ICU plural patterns correctly
* Generate multiple natural Arabic variants when asked

So the underlying model clearly supports full, high‑quality translation for this domain.

What happens in the specialized agent

I built a workflow:

* Document Reader – load the whole CSV/XLSX/JSON.
* File Analyst – detect the English column, translate to Arabic, create suggestion columns, export CSV/XLSX.
* Respond – show summary + download links.

In the File Analyst step, I instruct it to:

* Translate every non‑empty English cell to Arabic
* Preserve placeholders/ICU patterns literally
* Avoid wrapping English in markers like [AR] ..., [Arabic] ... (n), [English] ...
* Avoid Notes = "Needs manual translation" / "Needs review" as the default

However, on real‑size datasets I consistently see:

* “Arabic” columns that are actually English with wrappers, e.g. [AR] The date and time.
* Rows flagged en masse as “Needs manual translation” or “Needs review” instead of real Arabic text
* In some runs, reversed English strings with an Arabic prefix, which looks like fallback behaviour

So GA chat fully translates the content, but the File Analyst‑based agent mostly produces placeholders and review flags at scale.
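
To put a number on this, the three failure patterns above can be detected with a rough heuristic like the sketch below. Everything here (function name, reason labels, the Latin-vs-Arabic letter comparison) is my own assumption rather than anything File Analyst exposes. Legitimate Latin product names and rows that are mostly ICU plural syntax can be misflagged, so treat the result as a triage signal only.

```python
import re
from typing import Optional

ARABIC_LETTER = re.compile(r"[\u0600-\u06FF]")
LATIN_LETTER = re.compile(r"[A-Za-z]")
# Wrappers such as "[AR] ...", "[Arabic] ...", "[English] ..." at the start of a cell.
WRAPPER = re.compile(r"^\s*\[(AR|Arabic|English)\]", re.IGNORECASE)
# Innermost {...} groups, removed so that {equipmentNumber} does not count as English.
PLACEHOLDER = re.compile(r"\{[^{}]*\}")
REVIEW_FLAGS = ("needs manual translation", "needs review")


def fallback_reason(cell: str) -> Optional[str]:
    """Return a label for a suspicious 'Arabic' cell, or None if it looks translated."""
    text = cell.strip()
    if not text:
        return None  # empty cells are not judged here
    if WRAPPER.match(text):
        return "wrapper"
    if any(flag in text.lower() for flag in REVIEW_FLAGS):
        return "review_flag"
    body = PLACEHOLDER.sub("", text)
    latin = len(LATIN_LETTER.findall(body))
    arabic = len(ARABIC_LETTER.findall(body))
    if latin > arabic:
        return "mostly_latin"  # covers plain or reversed English with a short Arabic prefix
    return None
```

Counting the non-`None` results per run makes it easy to compare the File Analyst output against the GA chat output on the same rows.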

Questions for the community

* Is this behaviour expected for File Analyst?
  * Is it designed to favour mappings/placeholders and conservative “Needs review” statuses instead of free‑form MT on large tables?

* Can File Analyst be configured for a “full translation mode”?
  * i.e., for a given workflow, allow it to:
    * Use the LLM for free‑form translation on every row
    * Disable placeholder patterns like [AR] ..., [Arabic] ... (n), [English] ...
    * Avoid defaulting most rows to “Needs manual translation/Needs review”

* If not, what is the recommended pattern for this?
  * A dedicated bulk translate tool for workflows?
  * A hybrid where File Analyst only handles file I/O/structuring and another step does the actual MT (similar to GA chat behaviour)?

I’m basically trying to get GA‑level translation quality and completeness inside a reusable workflow agent, without drowning the output in placeholders. Any guidance or best practices would be greatly appreciated.

Thanks!

Sultan Shahabuddin · Answer

Thank you for your support. I tried the approach you suggested and have a few pieces of feedback.

* Dataset: a CSV with 441 rows and one column of English labels.
* Goal: translate to Arabic and add three additional columns of Arabic suggestions, producing:

English | Arabic | Arabic Suggestion 1 | Arabic Suggestion 2 | Arabic Suggestion 3

Observations (using my Agent):

* LLM translation path: The model translates accurately, but the chat view won’t display all 441 rows. It only shows a subset and reports that it can’t return the entire output at once.
* File Analyst path: It returns a full, paginated data frame and offers a downloadable file, but it doesn’t perform the translations—instead it inserts “[AR]” placeholders.

I’m therefore blocked: I can’t get both complete output and actual translations in a single flow yet.

mcomolli · Answer

* I don't have any examples on hand, but I think a prompt very similar to what you use for Glean Assistant should work well!
* There is no hard limit, but performance also depends on the amount of text per row, so it's hard to say. If the rows are text-intensive, 3–4k rows might be close to the upper bound of what works well.
* Bulk translation is not currently a committed roadmap item, so I would recommend proceeding with Plan and Execute!

Sultan Shahabuddin · Answer

Hi @mcomolli ,

Thanks a lot for the clarification — that makes sense.

If I’m understanding correctly:

* The Analyze Data / File Analyst node is primarily intended to generate code over structured data, not to do row‑by‑row semantic generation like translation.
* For this translation use case, I should either keep using Glean Assistant directly, or switch the workflow to a Plan and Execute node that:
  * Uses Document Reader to ingest the file
  * Uses an LLM step to translate each row (preserving placeholders/ICU patterns)
  * Then writes out a translated CSV/XLSX.
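
Outside of Glean, the ingest / translate / write-out shape I have in mind looks roughly like the sketch below. It is only an illustration of the pattern: `translate_batch` is a hypothetical callable standing in for whatever LLM step does the translation (it is not a Glean API), and the column names and batch size are assumptions. Working in small batches is what keeps each model response short enough to come back complete, instead of asking for every row in one reply.

```python
import csv
import io
from typing import Callable, Iterator, List

# Hypothetical translator: takes English strings, returns Arabic strings in the same order.
Translator = Callable[[List[str]], List[str]]


def batches(items: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_csv(csv_text: str, translate_batch: Translator,
                  column: str = "English", batch_size: int = 50) -> str:
    """Return csv_text with an extra 'Arabic' column filled in batch by batch."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = list(reader.fieldnames or []) + ["Arabic"]

    # Only non-empty source cells are sent to the translator.
    todo = [i for i, row in enumerate(rows) if (row.get(column) or "").strip()]
    for chunk in batches(todo, batch_size):
        results = translate_batch([rows[i][column] for i in chunk])
        if len(results) != len(chunk):
            raise ValueError("translator returned a different number of rows")
        for i, arabic in zip(chunk, results):
            rows[i]["Arabic"] = arabic

    out = io.StringIO()
    # restval="" leaves the Arabic cell blank for rows that had no source text.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The suggestion columns would follow the same pattern, with the translator returning several variants per row instead of one.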

A couple of follow‑up questions:

* Do you have any example workflows or templates that show Plan and Execute being used for bulk translation or similar row‑wise generation tasks?
* Are there any recommended limits or best practices for file size / row count (e.g., 3–4k+ rows) so a Plan and Execute–based translation flow stays reliable and doesn’t time out or degrade into placeholders?
* Longer term, for this kind of “bulk MT with placeholder preservation”, would you recommend:
  * Using Plan and Execute as the standard pattern, or
  * Waiting for / relying on a more dedicated bulk‑translation style tool (if that’s on the roadmap)?

In the meantime I’ll experiment with a Plan and Execute–based workflow where File Analyst is only used for I/O/structuring (if at all) and the LLM does the actual translation, and see how close I can get to GA‑chat quality and behavior.

Really appreciate the guidance!

Sultan
