SELVO
Technology
How SELVO is built and what it runs on. This page is for IT, infrastructure, and procurement teams that need to evaluate the platform before deployment.
Overview
SELVO is a self-hosted AI platform. The entire stack runs inside your private network on a single GPU host via Docker Compose. There is no cluster, no Kubernetes, and no cloud dependency at runtime.
The platform routes each request to the appropriate engine: content questions go through hybrid retrieval, analytical questions through a sandboxed code path, exact lookups straight to the data store, and generative tasks directly to the LLM.
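As a rough illustration of what that dispatch looks like in code, here is a minimal sketch; the Route labels and keyword rules below are hypothetical stand-ins for SELVO's actual classifier, which is not keyword-based:

from enum import Enum

class Route(Enum):
    RETRIEVAL = "retrieval"    # content questions: hybrid search + LLM synthesis
    ANALYTICS = "analytics"    # math and aggregation: sandboxed Pandas path
    LOOKUP = "lookup"          # exact record filters: direct data-store query
    GENERATIVE = "generative"  # drafting and rewriting: straight to the LLM

def dispatch(question: str) -> Route:
    # Crude keyword rules for illustration only; the real classifier
    # would be a trained or LLM-based component.
    q = question.lower()
    if any(k in q for k in ("average", "sum", "total", "per region")):
        return Route.ANALYTICS
    if q.startswith(("show", "list", "find")):
        return Route.LOOKUP
    if q.startswith(("draft", "write", "rewrite")):
        return Route.GENERATIVE
    return Route.RETRIEVAL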
Hardware requirements are modest. The recommended configuration fits in a single small server rack and uses commodity components your IT team can procure from any supplier.
Architecture
Four containers run on a single GPU host. All inter-service communication stays inside the internal Docker network. No external calls are made during normal operation.
A browser-based UI for document upload, querying, and the admin dashboard.
A backend service handling query routing and classification, hybrid search, the analytics engine, and GDPR and governance tooling.
A vector embedding store, reconstructible from the upload ledger if corrupted.
A local LLM inference engine. Model-agnostic; the model is swapped per deployment via .env configuration (see the sketch after this list).
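To illustrate the model swap, a minimal sketch of env-driven model selection inside the inference container; the LLM_MODEL and LLM_MODEL_PATH names below are assumptions, not SELVO's documented .env keys:

import os

# Hypothetical .env keys; SELVO's actual variable names may differ.
MODEL_NAME = os.environ.get("LLM_MODEL", "llama-3.1-8b-instruct")
MODEL_DIR = os.environ.get("LLM_MODEL_PATH", "/models")

def model_path() -> str:
    """Resolve the model per deployment from environment config,
    keeping the container image itself model-agnostic."""
    return os.path.join(MODEL_DIR, MODEL_NAME)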
Query types
Every request is classified and routed to the right engine. Answering an analytical question through retrieval, or an exact lookup through free-form generation, gives wrong answers, so SELVO picks the route automatically based on what the question actually needs.
Reading documents
"What does the contract say about liability?"
Hybrid vector and BM25 search with cross-encoder reranking, followed by LLM synthesis. Every answer cites the exact source page.
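A condensed sketch of that pipeline, assuming common open-source building blocks (rank_bm25 and sentence-transformers); the specific model names here are illustrative defaults, not SELVO's shipped models:

import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

def hybrid_search(query: str, docs: list[str], top_k: int = 5) -> list[str]:
    # Lexical leg: BM25 over whitespace-tokenized documents.
    lex = BM25Okapi([d.split() for d in docs]).get_scores(query.split())
    # Semantic leg: cosine similarity of normalized dense embeddings.
    enc = SentenceTransformer("all-MiniLM-L6-v2")
    sem = enc.encode(docs, normalize_embeddings=True) @ enc.encode(
        query, normalize_embeddings=True)
    # Fuse both signals, shortlist candidates, then let a cross-encoder
    # rerank query/passage pairs for the final order.
    fused = lex / (lex.max() + 1e-9) + sem
    cand = np.argsort(fused)[::-1][: top_k * 4]
    ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = ce.predict([(query, docs[i]) for i in cand])
    return [docs[cand[i]] for i in np.argsort(scores)[::-1][:top_k]]

LLM synthesis over the reranked passages, with page-level citations, would sit downstream of a function like this.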
Doing math
"Average revenue by region for Q4"
LLM generates Pandas code against your data schema. Sandboxed execution returns deterministic results from your actual files.
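A stripped-down sketch of that execution step; the result-variable convention and namespace restriction below are assumptions, and a production sandbox would add process isolation plus time and memory limits:

import pandas as pd

def run_generated(code: str, df: pd.DataFrame):
    # Restrict the namespace the generated code can see. This alone is
    # not a security boundary; real sandboxing isolates the process too.
    scope = {"pd": pd, "df": df, "__builtins__": {}}
    exec(code, scope)  # convention here: generated code assigns to `result`
    return scope.get("result")

# Example: code an LLM might emit for "average revenue by region".
df = pd.DataFrame({"region": ["EU", "EU", "US"], "revenue": [10, 20, 40]})
print(run_generated('result = df.groupby("region")["revenue"].mean()', df))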
Finding records
"Show all rows where status is Active"
Direct DataFrame filtering and targeted record search. No LLM hallucination on structured data.
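In Pandas terms, the lookup path reduces to a deterministic filter, roughly:

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "status": ["Active", "Closed", "Active"]})
active = df[df["status"] == "Active"]  # exact match; no generation involved
print(active)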
Summarizing
"Give me an executive summary"
Multi-sheet LLM synthesis across entire documents. Produces structured overviews with key findings.
Counting and listing
"How many documents are uploaded?"
Collection-level metadata queries answered directly from the document ledger.
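Such questions reduce to metadata reads. Sketched against an illustrative ledger shape (the field names are hypothetical):

ledger = [
    {"doc_id": "a1", "filename": "contract.pdf", "pages": 12},
    {"doc_id": "b2", "filename": "q4_report.xlsx", "pages": 4},
]
doc_count = len(ledger)  # "How many documents are uploaded?" -> 2
filenames = [d["filename"] for d in ledger]  # listing uses the same ledger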
Working across languages
Queries in any language
LLM-based cross-language reranking when embeddings cannot handle the query language.
Drafting and writing
"Draft a report, summarize findings, rewrite a clause"
Open-ended generative tasks routed directly to the LLM. No retrieval overhead when the task is purely generative.
Hardware requirements
The entire stack runs on a single GPU host; no cluster is required. Two reference configurations are given below: a minimum configuration that runs the full stack at smaller scale, and a recommended configuration sized for production use in the 200 to 500 user range.