How to Search Spreadsheets With AI on Your Own Server

February 6, 2026 · Rade Petrovic

Someone on your team built a spreadsheet two years ago. It has the answer to the question your manager just asked. You know this because you vaguely remember seeing it. You don't know which file it's in, which tab, or which column. You have 40 minutes before the meeting.

So you start opening files. Budget_2025_final_v3.xlsx. Headcount_Q3_updated.xlsx. That one with no useful name that Sarah shared in January. You scan tab names, scroll through columns, realize you picked the wrong sheet, close it, try another.

Cloud AI tools can help - upload the spreadsheet to ChatGPT or Google's AI, type your question, get an answer. But those spreadsheets contain salary data, financial projections, client contracts, HR records. For many organizations, uploading them to a third-party server is the same problem as uploading scanned contracts - the data leaves your control.

So the question becomes: can you search spreadsheets with AI without uploading them anywhere? Including the messy ones with six tabs and no documentation?

Why Spreadsheets Are Hard to Search

Spreadsheets aren't documents. A PDF or Word file is mostly text: you can chunk it into passages, index it, and search it. A spreadsheet is structured data: rows, columns, types, relationships between sheets. That structure is exactly what makes spreadsheets useful, and exactly what makes them difficult for search systems to handle.

Here's what makes this hard:

Context lives in column headers, not in the data itself - A cell containing "42000" means nothing until you know it's in the "Annual Salary" column, in the "Engineering" department row, on the "2025 Headcount" sheet. Search systems that treat spreadsheet content as flat text lose this structure.

Multi-sheet workbooks multiply the complexity - A real-world HR file doesn't have one sheet. It has "Employees," "Departments," "Benefits," "Performance Reviews," and maybe a "Summary" tab someone created for a meeting. When you ask "What's the average salary in the Marketing department?", the system needs to figure out that the answer requires the "Employees" sheet - not "Departments," not "Summary," not "Benefits."

Sheets are often related but not explicitly linked - The "Employees" sheet has a Department column. The "Departments" sheet has Department and Budget columns. A human knows these connect through the Department field. A naive search system treats each sheet as a separate, unrelated document.
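Here is a minimal Python sketch using pandas that makes the header problem concrete. It serializes the same row two ways: as bare cell values, and with the sheet name and column headers attached. The `row_to_text` helper and the sample data are hypothetical illustrations, not part of any particular product.

```python
import pandas as pd


def row_to_text(sheet_name: str, row: pd.Series) -> str:
    """Serialize one row with its sheet name and column headers attached (hypothetical helper)."""
    fields = " | ".join(f"{col}: {val}" for col, val in row.items())
    return f"Sheet: {sheet_name} | {fields}"


# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "Name": ["Ana", "Ben"],
    "Department": ["Engineering", "Marketing"],
    "Annual Salary": [42000, 51000],
})

flat = " ".join(str(v) for v in df.iloc[0])  # what naive flattening keeps
rich = row_to_text("2025 Headcount", df.iloc[0])
print(flat)  # Ana Engineering 42000
print(rich)  # Sheet: 2025 Headcount | Name: Ana | Department: Engineering | Annual Salary: 42000
```

Only the second string still tells a search system that 42000 is an annual salary on the 2025 Headcount sheet.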

How Local AI Searches Spreadsheets

First, Understand What's Actually in There

When you upload a spreadsheet, the system doesn't just dump cell values into a search index and hope for the best. It reads each sheet and builds a profile:

- What columns exist, and what type of data they contain (numbers, text, dates)

- How many rows each sheet has

- What the unique values look like in categorical columns (departments, statuses, types)

- Basic statistics for numeric columns (averages, ranges, totals)

- Which categories appear together and how often

This analysis creates a profile for every sheet. Think of it as the system reading each tab and understanding: "This sheet is about employees - it has names, departments, salaries, and hire dates. That sheet is about departments - it has department names and budgets."
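A sheet profile along these lines can be sketched with pandas. This illustrates the idea under stated assumptions and is not the implementation any product uses. The `profile_sheet` name, the `max_categories` cutoff for treating a text column as categorical, and the dictionary layout are all hypothetical.

```python
import pandas as pd


def profile_sheet(df: pd.DataFrame, max_categories: int = 20) -> dict:
    """Summarize one sheet: row count, per-column types/stats, category co-occurrence."""
    profile = {"row_count": len(df), "columns": {}, "co_occurrence": {}}
    for col in df.columns:
        s = df[col]
        info = {"dtype": str(s.dtype)}
        if pd.api.types.is_numeric_dtype(s):
            info.update(mean=float(s.mean()), min=float(s.min()),
                        max=float(s.max()), total=float(s.sum()))
        elif pd.api.types.is_datetime64_any_dtype(s):
            info.update(min=str(s.min()), max=str(s.max()))
        elif s.nunique() <= max_categories:
            # Low-cardinality text columns are treated as categorical (assumed cutoff).
            info["categories"] = {k: int(v) for k, v in s.value_counts().items()}
        profile["columns"][col] = info

    # Count how often values from each pair of categorical columns appear together.
    cat_cols = [c for c, info in profile["columns"].items() if "categories" in info]
    for i, a in enumerate(cat_cols):
        for b in cat_cols[i + 1:]:
            pairs = df.groupby([a, b]).size()
            profile["co_occurrence"][(a, b)] = {k: int(v) for k, v in pairs.items()}
    return profile


df = pd.DataFrame({
    "Department": ["Engineering", "Engineering", "Marketing"],
    "Status": ["Active", "Active", "On leave"],
    "Annual Salary": [42000, 48000, 51000],
})
profile = profile_sheet(df)
print(profile["columns"]["Annual Salary"]["mean"])  # 47000.0
print(profile["co_occurrence"][("Department", "Status")])
```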

The profiles get chunked and indexed the same way document text does - so later, when you search, the system can match your question against what each sheet actually contains.
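One way to make a profile searchable is to render it as a short text passage and attach sheet-level metadata (sheet name, index, columns, row count), so that every search hit can be traced back to its tab. The sketch below assumes pandas. The `sheet_to_chunk` function and its output layout are hypothetical, and the embedding and indexing steps are left out.

```python
import pandas as pd


def sheet_to_chunk(sheet_name: str, sheet_index: int, df: pd.DataFrame) -> dict:
    """Render a sheet summary as indexable text plus sheet-level metadata (hypothetical)."""
    lines = [f"Sheet '{sheet_name}' has {len(df)} rows."]
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_numeric_dtype(s):
            lines.append(f"Column '{col}' (numeric): mean {s.mean():.2f}, range {s.min()} to {s.max()}.")
        else:
            # Non-numeric columns (text, dates) list up to five sample values.
            sample = ", ".join(str(v) for v in s.dropna().unique()[:5])
            lines.append(f"Column '{col}': sample values {sample}.")
    return {
        "text": "\n".join(lines),
        "metadata": {"sheet_name": sheet_name, "sheet_index": sheet_index,
                     "columns": list(df.columns), "row_count": len(df)},
    }


df = pd.DataFrame({"Department": ["Engineering", "Marketing"], "Annual Salary": [42000, 51000]})
chunk = sheet_to_chunk("Employees", 0, df)
print(chunk["text"])
print(chunk["metadata"])
```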

Then, Pick the Right Sheet

This is where things get interesting - and where most attempts at spreadsheet search fall apart.

You ask "What's the average salary in Marketing?" The workbook has six tabs. The system picks "Benefits" instead of "Employees." Now you get a confident, well-formatted, completely wrong answer. That's worse than no answer, because you might not catch it.

So the sheet selection has to actually work:

If you name the sheet, it uses that sheet - Ask about "the Employees sheet" and there's no ambiguity.

If you say "all sheets" or "across all tabs," it combines them - The system recognizes this intent in both English ("all sheets," "every tab," "across all") and other languages. When combining, it doesn't blindly stack everything together - it filters out tiny metadata sheets (like a "Notes" or "Summary" tab with two rows) and tries to intelligently join related sheets through shared columns.

If you don't specify, it figures it out - The system scores each sheet against your question using three methods: how well the sheet's column names match the words in your query, how semantically similar your question is to the sheet's content profile, and whether any column names appear directly in your question. The scores get combined, and the highest-scoring sheet wins.

There's a safety net for close calls - If two sheets score nearly the same, the system breaks the tie by picking the one with more data rows - on the logic that the larger sheet is more likely to be the primary data source, not a lookup table or summary.
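The selection rules above can be sketched in plain Python. Everything here is an assumption made for illustration. The regular expressions cover English phrasings only. A bag-of-words cosine similarity stands in for a real embedding model's semantic similarity. The weights (0.4, 0.4, 0.2) and the 0.05 tie margin are made-up values, not tuned parameters from any product.

```python
import math
import re
from collections import Counter

# Hypothetical English-only patterns for the "all sheets" intent.
ALL_SHEETS = re.compile(r"\b(?:all sheets|all tabs|every (?:sheet|tab)|across all)\b", re.IGNORECASE)


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: list[str], b: list[str]) -> float:
    """Bag-of-words cosine similarity (a stand-in for embedding similarity)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def select_sheets(query: str, sheets: dict, weights=(0.4, 0.4, 0.2), tie_margin=0.05) -> list[str]:
    """sheets maps name -> {"columns": [...], "profile_text": str, "row_count": int}."""
    q = query.lower()
    # 1. A sheet named explicitly ("the Employees sheet", "tab Benefits") wins outright.
    for name in sheets:
        n = re.escape(name.lower())
        if re.search(rf"\b{n}\s+(?:sheet|tab)\b|\b(?:sheet|tab)\s+{n}\b", q):
            return [name]
    # 2. "All sheets" intent: hand back every sheet for combining.
    if ALL_SHEETS.search(query):
        return list(sheets)
    # 3. Otherwise score each sheet on three signals and combine them.
    q_tokens = set(tokens(query))
    w_kw, w_sem, w_exact = weights
    scored = []
    for name, info in sheets.items():
        col_tokens = {t for col in info["columns"] for t in tokens(col)}
        keyword = len(q_tokens & col_tokens) / len(q_tokens) if q_tokens else 0.0
        semantic = cosine(tokens(query), tokens(info["profile_text"]))
        exact = float(any(re.search(rf"\b{re.escape(c.lower())}\b", q) for c in info["columns"]))
        scored.append((w_kw * keyword + w_sem * semantic + w_exact * exact, info["row_count"], name))
    scored.sort(reverse=True)
    # 4. Tie-break: among near-equal scores, prefer the sheet with more rows.
    close = [s for s in scored if scored[0][0] - s[0] <= tie_margin]
    return [max(close, key=lambda s: s[1])[2]]


sheets = {
    "Employees": {"columns": ["Name", "Department", "Annual Salary", "Hire Date"],
                  "profile_text": "employees names departments salaries hire dates marketing engineering",
                  "row_count": 480},
    "Benefits": {"columns": ["Plan", "Monthly Cost", "Eligible"],
                 "profile_text": "benefit plans health dental monthly cost", "row_count": 6},
    "Departments": {"columns": ["Department", "Budget"],
                    "profile_text": "department budget totals per department fiscal year", "row_count": 40},
    "Budget Summary": {"columns": ["Department", "Budget"],
                       "profile_text": "department budget summary", "row_count": 2},
}
print(select_sheets("What's the average salary in Marketing?", sheets))  # ['Employees']
print(select_sheets("Show the Benefits sheet", sheets))                 # ['Benefits']
print(select_sheets("Total headcount across all tabs", sheets))
print(select_sheets("Department budget", sheets))                       # ['Departments']
```

The last query shows the tie-break at work. "Budget Summary" scores slightly higher on its made-up profile text, but it falls within the tie margin, so the larger "Departments" sheet wins.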

Finally, Compute - Don't Guess

Once the system has the right sheet, it doesn't ask the AI to eyeball the numbers and write a paragraph. It generates actual code - a filtered average, a group-by, a count with conditions - and runs it against the real data. "Average salary in Marketing" becomes a computation on the actual salary column for rows where the department is Marketing. The number you get back is calculated, not estimated.
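Here is a rough sketch of that step with pandas. The model's output is a snippet of code, the snippet runs against the real DataFrame, and the value it assigns to `result` is the answer. The snippets and the `run_generated` helper are hypothetical. Emptying `__builtins__` only keeps honest mistakes contained and is not a security boundary. A real deployment would run generated code in an isolated process with time and resource limits.

```python
import pandas as pd


def run_generated(code: str, df: pd.DataFrame):
    """Run model-generated pandas code against a copy of the sheet and return `result`.

    Emptying __builtins__ only limits accidents; it is NOT a sandbox. Real
    deployments should isolate execution (separate process, timeouts, no
    file or network access).
    """
    namespace = {"__builtins__": {}, "pd": pd, "df": df.copy()}
    exec(code, namespace)
    return namespace["result"]


df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cleo", "Dev"],
    "Department": ["Marketing", "Engineering", "Marketing", "Sales"],
    "Annual Salary": [52000, 61000, 48000, 45000],
})

# Snippets of the kind a model might emit for three questions (hypothetical).
avg_marketing = 'result = df.loc[df["Department"] == "Marketing", "Annual Salary"].mean()'
by_department = 'result = df.groupby("Department")["Annual Salary"].mean()'
count_over_50k = 'result = (df["Annual Salary"] > 50000).sum()'

print(run_generated(avg_marketing, df))   # 50000.0
print(run_generated(by_department, df))
print(run_generated(count_over_50k, df))  # 2
```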

And all of it (the spreadsheet data, the generated code, the execution) stays on your server.

The Tricky Parts

Two details that seem minor but significantly affect real-world accuracy:

Smart joining across sheets - When you ask a question that spans multiple sheets, such as "What's the training budget for departments with more than 50 employees?", one sheet has employee counts and another has budgets. The system looks for columns that appear in both sheets with similar values (like a "Department" column where at least 60% of the values overlap between the two sheets) and joins the sheets automatically with a left join. If no reliable join key exists, it keeps the sheets separate rather than producing garbage by forcing a merge.

Filtering out noise sheets - Real workbooks have metadata tabs: "Instructions," "Schema," "Test Cases," a "Notes" sheet with three rows. The system identifies these by checking for very few rows, single-column layouts, or known metadata sheet names, and excludes them from analysis. They're still indexed for search - but they don't pollute the data when the system picks a sheet for computation.
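Both safeguards can be sketched with pandas. The thresholds and names here are assumptions. `min_rows=3` and the `METADATA_NAMES` set are made up for the example. The overlap measure (shared distinct values divided by the smaller column's distinct-value count) is one plausible reading of "at least 60% of the values overlap", not a documented formula.

```python
import pandas as pd

# Hypothetical list of sheet names treated as metadata.
METADATA_NAMES = {"instructions", "schema", "test cases", "notes", "readme"}


def is_noise_sheet(name: str, df: pd.DataFrame, min_rows: int = 3) -> bool:
    """Flag tiny, single-column, or metadata-named sheets."""
    return len(df) < min_rows or df.shape[1] < 2 or name.strip().lower() in METADATA_NAMES


def find_join_key(left: pd.DataFrame, right: pd.DataFrame, min_overlap: float = 0.6):
    """Return the shared column whose distinct values overlap most, if >= min_overlap."""
    best, best_overlap = None, 0.0
    for col in sorted(set(left.columns) & set(right.columns)):
        lv, rv = set(left[col].dropna()), set(right[col].dropna())
        if not lv or not rv:
            continue
        overlap = len(lv & rv) / min(len(lv), len(rv))
        if overlap >= min_overlap and overlap > best_overlap:
            best, best_overlap = col, overlap
    return best


def join_if_related(left: pd.DataFrame, right: pd.DataFrame, min_overlap: float = 0.6):
    """Left-join on a reliable key, or return None to keep the sheets separate."""
    key = find_join_key(left, right, min_overlap)
    return None if key is None else left.merge(right, on=key, how="left")


employees = pd.DataFrame({"Name": ["Ana", "Ben", "Cleo", "Dev", "Eli", "Fay"],
                          "Department": ["Eng", "Eng", "Marketing", "Sales", "Sales", "Marketing"]})
departments = pd.DataFrame({"Department": ["Eng", "Marketing", "Sales", "Legal"],
                            "Training Budget": [20000, 8000, 5000, 3000]})
projects = pd.DataFrame({"Department": ["R&D", "Ops", "Legal", "Eng"], "Project": ["A", "B", "C", "D"]})
notes = pd.DataFrame({"Note": ["Updated in January"]})

workbook = {"Employees": employees, "Departments": departments, "Projects": projects, "Notes": notes}
kept = [name for name, df in workbook.items() if not is_noise_sheet(name, df)]
merged = join_if_related(employees, departments)
print(kept)                                 # ['Employees', 'Departments', 'Projects']
print(merged)
print(join_if_related(employees, projects))  # None: only 1 of 3 departments overlaps
```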

What Hardware Do You Need?

The spreadsheet processing pipeline is lightweight compared to document search. Parsing Excel files, analyzing columns, and running pandas computations are all CPU operations that complete in seconds, even for workbooks with tens of thousands of rows.

The heavier component is the AI model that understands your question and generates the computation. That benefits from a GPU but runs on CPU as well - the answers just take longer without one.

Minimum: 32 GB of memory, 8 or more processor cores. An NVIDIA GPU helps with response speed but isn't required for the spreadsheet pipeline to work.

The main constraint isn't hardware - it's workbook size. An Excel file with 100,000 rows across multiple sheets processes fine. A file with millions of rows might need more memory, but that's an unusual case for most organizations.

How Selvo Lens Handles This

Selvo Lens runs the full retrieval pipeline locally, and spreadsheet handling is built in - not bolted on. You upload an Excel file or CSV the same way you'd upload any document, and the system handles the rest.

What this means in practice

Every sheet gets analyzed separately - Upload a workbook with 6 tabs and each one gets its own profile - column types, statistics, category breakdowns, row-level data. These profiles are stored with sheet-level metadata (sheet name, index, columns, row count) so the system knows exactly which sheet each piece of information came from.

Sheet selection happens automatically - You don't need to tell the system which tab to look at. It figures out the right sheet based on your question. If you do specify - "from the Employees sheet" - it respects that directly.

Multi-sheet questions work - Ask something that spans tabs, and the system attempts to join them through shared columns. It checks for at least 60% value overlap before joining, so it won't merge unrelated data.

CSV files work too - Encoding is detected automatically. Files saved in UTF-8, Latin-1, Windows-1252, or ISO-8859-2 are read correctly without manual encoding selection.

Limitations worth knowing - The system works best with structured tabular data - rows and columns with consistent types. Heavily formatted spreadsheets (merged cells, embedded charts, color-coded data without column labels) lose their visual structure during processing. Pivot tables get flattened. And very large workbooks (hundreds of thousands of rows) process correctly but take longer for the initial analysis.
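The automatic encoding detection mentioned above can be approximated by trying decoders from strictest to most permissive. This is a simplified sketch, not necessarily how any product does it. `utf-8-sig` handles UTF-8 with or without a byte-order mark. Windows-1252 rejects a handful of undefined bytes. Latin-1 accepts any byte sequence, so it has to come last. The sketch cannot tell ISO-8859-2 from Windows-1252, because both decode most of the same bytes without error but to different characters (0xE8 is "è" in Windows-1252 and "č" in ISO-8859-2). Separating them takes a statistical detector such as the charset-normalizer or chardet libraries.

```python
import io

import pandas as pd

# Tried in order: strict first, permissive last. latin-1 maps every byte to a
# character, so it never raises and must stay at the end of the list.
CANDIDATE_ENCODINGS = ["utf-8-sig", "cp1252", "latin-1"]


def read_csv_any_encoding(raw: bytes) -> tuple[pd.DataFrame, str]:
    """Decode CSV bytes with the first encoding that succeeds, then parse them."""
    for enc in CANDIDATE_ENCODINGS:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return pd.read_csv(io.StringIO(text)), enc
    raise ValueError("no candidate encoding could decode the file")


utf8_bytes = "Name,City\nZoë,Kraków\n".encode("utf-8")
cp1252_bytes = "Name,Amount\nRenée,€10\n".encode("cp1252")

df1, enc1 = read_csv_any_encoding(utf8_bytes)
df2, enc2 = read_csv_any_encoding(cp1252_bytes)
print(enc1, df1.loc[0, "City"])    # utf-8-sig Kraków
print(enc2, df2.loc[0, "Amount"])  # cp1252 €10
```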

What This Actually Changes

The honest pitch for AI spreadsheet search isn't "never open Excel again." You're still going to open Excel. You're still going to build pivot tables and conditional formatting and VLOOKUPs. Spreadsheets aren't going anywhere.

What changes is the retrieval problem. Right now, when someone on your team needs a number that's buried in one of 40 workbooks on a shared drive, they either know which file it's in, or they don't get the answer. They open three files, check the wrong tabs, give up, and ask the person who originally built the spreadsheet. That person is in a meeting.

A local AI that can read your spreadsheets, pick the right sheet, and compute the answer from actual data, without uploading anything to a third party, cuts that loop short. It doesn't replace the spreadsheet. It makes the spreadsheet accessible to people who didn't build it.