Hi all,
I’m looking for some help and design advice around a query agent that needs to work with data stored in an Excel/CSV file.
Context
The agent follows a simple three-step pattern:
- Trigger – user submits a query with some parameters.
- Analyse – the agent looks up values in a tabular dataset (originally an .xlsx, now a .csv).
- Respond – it combines those lookups with its reasoning and returns an answer.
I’m trying to make this pattern reliable for structured lookups against a relatively large dataset.
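For concreteness, the kind of deterministic, full-file lookup I want the Analyse step to perform is roughly this (a minimal sketch; `data.csv` and the key column are placeholders for my actual dataset):

```python
import csv

def lookup(path, key_column, key_value):
    """Scan every row of the CSV (not just the first 100) and
    return all rows whose key_column exactly matches key_value."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f) if row[key_column] == key_value]
```

The point is that the lookup is exact and covers the whole file, which is what the agent currently fails to do on its own.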
The main problem
The agent is struggling to reliably read and query the file-based data source. Concretely:
- It often defaults to only looking at the first 100 rows of the file (this is visible in its own “thinking/running” commentary, where it explicitly says it will consider only the first 100 rows).
- I initially suspected file size limits, so I transformed the original .xlsx into a .csv that is now under 1 MB, but the agent behavior (stopping at ~100 rows) persists.
- Earlier, with the .xlsx, I also saw:
  - Inconsistent or failed reads from the file
  - Apparent size/complexity limits when the workbook got larger
  - Difficulty ensuring that the agent uses the latest version of the file without manual intervention
I’m currently using a fast LLM (Sonnet 4.5) for the agent, in case model choice or context limits are relevant to this behavior.
Overall, I don’t yet have a robust pattern for “agent reads structured tabular data from a file and uses it for deterministic lookups beyond the first 100 rows.”
What I’ve tried so far
So far, I’ve experimented with:
- Pointing the agent directly at the .xlsx stored in a document repository
- Converting the .xlsx to a smaller .csv (< 1 MB) to avoid size/complexity issues
- Reducing the number of columns/sheets and simplifying the original workbook
- Treating the file as a generic reference document (unstructured) rather than a structured table, but this doesn’t give the deterministic, row-level lookups I need
These approaches haven’t given me a stable, scalable solution. The agent still tends to cap itself at ~100 rows of data.
What I’m considering next
I’m exploring whether there’s a better architectural pattern for this, for example:
- Power Automate (or similar) as a middle layer
- Use a flow to:
- Periodically read the .xlsx/.csv file
- Transform it into a more agent-friendly format (e.g., normalized CSV/JSON or a tiny API endpoint)
- Potentially expose a small API or drop a processed file somewhere the agent can reliably access
- Then have the agent call that processed data source in the Analyse step, instead of directly parsing the original file.
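As a sketch of what I mean by the transform step (whether it runs in a Power Automate flow, a script, or something else), the idea is to reshape the raw CSV into a JSON document keyed by the lookup field, so the agent fetches one record instead of scanning rows. Function and field names here are invented for illustration:

```python
import csv
import json

def csv_to_keyed_json(csv_path, json_path, key_column):
    """Reshape a CSV into a {key: row} JSON document so a consumer
    can do a direct lookup by key instead of scanning file rows."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        records = {row[key_column]: row for row in csv.DictReader(f)}
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    return len(records)  # row count, useful for sanity-checking the sync
```

A scheduled flow could rerun this whenever the source file changes, so the agent always reads the latest processed copy.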
- Moving the data into a SQL database
- One-time or scheduled ingestion of the file into a SQL table
- Have the agent:
- Call out to a lightweight query service / API on top of the database, or
- Use any built-in SQL connector pattern (if one exists) so that the agent can run parameterized queries (e.g., SELECT ... WHERE key = ?) instead of scanning file rows.
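For the SQL option, even something as small as SQLite shows the shape I have in mind: one-time ingestion, then parameterized lookups with no implicit row cap. Table and column names are hypothetical:

```python
import csv
import sqlite3

def ingest_csv(conn, csv_path):
    """One-time ingestion: load every CSV row into a 'records' table."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        cols = reader.fieldnames
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS records ({', '.join(c + ' TEXT' for c in cols)})"
        )
        conn.executemany(
            f"INSERT INTO records VALUES ({', '.join('?' for _ in cols)})",
            [tuple(row[c] for c in cols) for row in reader],
        )
    conn.commit()

def lookup(conn, key_value):
    # Parameterized query: deterministic, full-table, no string interpolation.
    return conn.execute("SELECT * FROM records WHERE id = ?", (key_value,)).fetchall()
```

A query service or connector in front of a real database would follow the same pattern, just with the agent calling it in the Analyse step.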
- Other recommended patterns
- Any best practices around:
- “Registering” structured datasets (from Excel/CSV or otherwise) so agents can query them reliably
- Handling updates when the underlying file data changes
- Managing size/row limits and performance so the agent isn’t implicitly capped at the first N rows
What I’m asking the community
- Is there a recommended pattern in Glean for agents that need to query structured tabular data originally stored in Excel/CSV?
- Has anyone successfully:
- Used Power Automate (or another automation/orchestration tool) as a bridge between Excel/CSV and an agent?
- Connected an agent to a SQL database (directly or via a small service) for this kind of lookup logic?
- Any examples, patterns, or architectural diagrams showing how you:
- Keep the data “live” but reliable
- Avoid size/format limitations of .xlsx/.csv
- Let the agent’s Analyse step query beyond the first 100 rows in a deterministic way
I’d really appreciate any guidance, patterns, or war stories from others who have solved similar “agent + Excel/CSV/SQL data” challenges.
Thanks in advance!