Hi all,
I’m looking for some help and design advice around a query agent that needs to work with data stored in an Excel/CSV file.
Context
The agent follows a simple three-step pattern:
- Trigger – user submits a query with some parameters.
- Analyse – the agent looks up values in a tabular dataset (originally an .xlsx, now a .csv).
- Respond – it combines those lookups with its reasoning and returns an answer.
I’m trying to make this pattern reliable for structured lookups against a relatively large dataset.
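For concreteness, the kind of deterministic, full-file lookup I want the Analyse step to perform is roughly this (a minimal sketch; `data.csv` and the key column are placeholders for my actual dataset):

```python
import csv

def lookup(path, key_column, key_value):
    """Scan every row of the CSV (not just the first 100) and
    return all rows whose key_column exactly matches key_value."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f) if row[key_column] == key_value]
```

The point is that the lookup is exact and covers the whole file, which is what the agent currently fails to do on its own.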
The main problem
The agent is struggling to reliably read and query the file-based data source. Concretely:
- It often defaults to only looking at the first 100 rows of the file (this is visible in its own “thinking/running” commentary, where it explicitly says it will consider only the first 100 rows).
- I initially suspected file size limits, so I transformed the original .xlsx into a .csv that is now under 1 MB, but the agent behavior (stopping at ~100 rows) persists.
- Earlier, with the .xlsx, I also saw:
  - Inconsistent or failed reads from the file
  - Apparent size/complexity limits when the workbook got larger
  - Difficulty ensuring that the agent uses the latest version of the file without manual intervention
I’m currently using a fast LLM (Sonnet 4.5) for the agent, in case model choice or context limits are relevant to this behavior.
Overall, I don’t yet have a robust pattern for “agent reads structured tabular data from a file and uses it for deterministic lookups beyond the first 100 rows.”
What I’ve tried so far
So far, I’ve experimented with:
- Pointing the agent directly at the .xlsx stored in a document repository
- Converting the .xlsx to a smaller .csv (< 1 MB) to avoid size/complexity issues
- Reducing the number of columns/sheets and simplifying the original workbook
- Treating the file as a generic reference document (unstructured) rather than a structured table, but this doesn’t give the deterministic, row-level lookups I need
These approaches haven’t given me a stable, scalable solution. The agent still tends to cap itself at ~100 rows of data.
What I’m considering next
I’m exploring whether there’s a better architectural pattern for this, for example:
- Power Automate (or similar) as a middle layer
- Use a flow to:
- Periodically read the .xlsx/.csv file
- Transform it into a more agent-friendly format (e.g., normalized CSV/JSON or a tiny API endpoint)
- Potentially expose a small API or drop a processed file somewhere the agent can reliably access
- Then have the agent call that processed data source in the Analyse step, instead of directly parsing the original file.
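As a sketch of what I mean by the transform step (whether it runs in a Power Automate flow, a script, or something else), the idea is to reshape the raw CSV into a JSON document keyed by the lookup field, so the agent fetches one record instead of scanning rows. Function and field names here are invented for illustration:

```python
import csv
import json

def csv_to_keyed_json(csv_path, json_path, key_column):
    """Reshape a CSV into a {key: row} JSON document so a consumer
    can do a direct lookup by key instead of scanning file rows."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        records = {row[key_column]: row for row in csv.DictReader(f)}
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    return len(records)  # row count, useful for sanity-checking the sync
```

A scheduled flow could rerun this whenever the source file changes, so the agent always reads the latest processed copy.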
- Moving the data into a SQL database
- One-time or scheduled ingestion of the file into a SQL table
- Have the agent:
- Call out to a lightweight query service / API on top of the database, or
- Use any built-in SQL connector pattern (if one exists) so that the agent can run parameterized queries (e.g., SELECT ... WHERE key = ?) instead of scanning file rows.
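For the SQL option, even something as small as SQLite shows the shape I have in mind: one-time ingestion, then parameterized lookups with no implicit row cap. Table and column names are hypothetical:

```python
import csv
import sqlite3

def ingest_csv(conn, csv_path):
    """One-time ingestion: load every CSV row into a 'records' table."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        cols = reader.fieldnames
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS records ({', '.join(c + ' TEXT' for c in cols)})"
        )
        conn.executemany(
            f"INSERT INTO records VALUES ({', '.join('?' for _ in cols)})",
            [tuple(row[c] for c in cols) for row in reader],
        )
    conn.commit()

def lookup(conn, key_value):
    # Parameterized query: deterministic, full-table, no string interpolation.
    return conn.execute("SELECT * FROM records WHERE id = ?", (key_value,)).fetchall()
```

A query service or connector in front of a real database would follow the same pattern, just with the agent calling it in the Analyse step.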
- Other recommended patterns
- Any best practices around:
- “Registering” structured datasets (from Excel/CSV or otherwise) so agents can query them reliably
- Handling updates when the underlying file data changes
- Managing size/row limits and performance so the agent isn’t implicitly capped at the first N rows
What I’m asking the community
- Is there a recommended pattern in Glean for agents that need to query structured tabular data originally stored in Excel/CSV?
- Has anyone successfully:
- Used Power Automate (or another automation/orchestration tool) as a bridge between Excel/CSV and an agent?
- Connected an agent to a SQL database (directly or via a small service) for this kind of lookup logic?
- Any examples, patterns, or architectural diagrams showing how you:
- Keep the data “live” but reliable
- Avoid size/format limitations of .xlsx/.csv
- Let the agent’s Analyse step query beyond the first 100 rows in a deterministic way
I’d really appreciate any guidance, patterns, or war stories from others who have solved similar “agent + Excel/CSV/SQL data” challenges.
Thanks in advance!