SELVO

Technology

How SELVO is built and what it runs on. This page is for IT, infrastructure, and procurement teams that need to evaluate the platform before deployment.

Overview

SELVO is a self-hosted AI platform. The entire stack runs inside your private network on a single GPU host via Docker Compose. There is no cluster, no Kubernetes, and no cloud dependency at runtime.

The platform routes each request to the appropriate engine: content questions go through hybrid retrieval, analytical questions through a sandboxed code path, exact lookups straight to the data store. Generative tasks route directly to the LLM.

Hardware requirements are modest. The recommended configuration fits in a single small server rack and uses commodity components your IT team can procure from any supplier.

Architecture

Four containers run on a single GPU host. All inter-service communication stays inside the internal Docker network. No external calls are made during normal operation.

Docker Compose · Internal Network
Frontend
Next.js · :3000

Browser-based UI for document upload, querying, and admin dashboard.

Backend
FastAPI · :8001

Query routing, classification, hybrid search, analytics engine, GDPR, governance.

ChromaDB
Vectors · :8000

Vector embeddings store. Reconstructible from upload ledger if corrupted.

vLLM Inference
GPU · OpenAI-compatible

Local LLM inference. Model-agnostic, swap via .env config per deployment.

Air-gapped · Self-healing · GPU-accelerated · 4 uvicorn workers
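The four services above can be wired together in one Compose file. The sketch below is illustrative only: service names, image tags, and the exposed ports are assumptions, not SELVO's actual manifest. Only the frontend is published to the LAN; everything else stays on the internal network.

```yaml
# Hypothetical docker-compose.yml sketch -- names and images are assumptions.
services:
  frontend:
    image: selvo/frontend            # Next.js UI
    ports: ["3000:3000"]             # the only port published outside Docker
    networks: [internal]
  backend:
    image: selvo/backend             # FastAPI, 4 uvicorn workers
    expose: ["8001"]                 # internal-only
    env_file: .env                   # model selection lives here
    networks: [internal]
  chromadb:
    image: chromadb/chroma           # vector store
    expose: ["8000"]
    networks: [internal]
  vllm:
    image: vllm/vllm-openai          # OpenAI-compatible local inference
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    networks: [internal]
networks:
  internal:
    driver: bridge
```

With this layout, swapping the model is a one-line `.env` change and a restart of the `vllm` service; no other container is touched.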

Query types

Every request is classified and routed to the engine best suited to it. The wrong approach gives wrong answers, so SELVO selects the route automatically based on what each question actually needs.

Reading documents

"What does the contract say about liability?"

Hybrid vector and BM25 search with cross-encoder reranking, followed by LLM synthesis. Every answer cites the exact source page.
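The fusion step of hybrid retrieval can be sketched in a few lines. The snippet below uses reciprocal rank fusion (RRF) to merge a vector ranking with a BM25 ranking; it is a simplified illustration, not SELVO's implementation, and it omits the cross-encoder reranking pass that runs afterward. Document IDs are made up.

```python
# Minimal sketch of hybrid-retrieval fusion: merge a vector ranking and a
# BM25 ranking with reciprocal rank fusion (RRF). The real pipeline adds a
# cross-encoder reranking pass on top of the fused list.

def rrf_fuse(vector_ranked, bm25_ranked, k=60):
    """Combine two ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in (vector_ranked, bm25_ranked):
        for rank, doc_id in enumerate(ranking):
            # Documents near the top of either list accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_contract_p12", "doc_policy_p3", "doc_memo_p1"]
bm25_hits = ["doc_policy_p3", "doc_contract_p12", "doc_faq_p7"]

fused = rrf_fuse(vector_hits, bm25_hits)
# Documents found by both retrievers rise to the top of the fused list.
```

RRF is a common choice here because it needs no score normalization between the two retrievers: only ranks matter.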

Doing math

"Average revenue by region for Q4"

LLM generates Pandas code against your data schema. Sandboxed execution returns deterministic results from your actual files.
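The shape of this path can be sketched as follows. The snippet is illustrative only: real sandboxing requires process and OS-level isolation, and the "generated" string is a stand-in for actual LLM output. It shows the data flow, namely that the answer is computed deterministically from the user's data rather than generated by the model.

```python
# Illustrative sketch of the "doing math" path: an LLM-generated Pandas
# snippet runs in a restricted namespace against the user's data.
# NOTE: a bare exec() is NOT a real sandbox; production use needs process
# isolation. The data and the generated code are made up.
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "revenue": [100, 120, 80, 100],
})

# Pretend this string came back from the LLM for the query
# "Average revenue by region for Q4".
generated = "result = df.groupby('region')['revenue'].mean()"

namespace = {"df": df, "pd": pd}      # only the data and pandas are visible
exec(generated, {"__builtins__": {}}, namespace)
result = namespace["result"]
# The result is computed from the actual rows -- same input, same answer.
```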

Finding records

"Show all rows where status is Active"

Direct DataFrame filtering and targeted record search. No LLM hallucination on structured data.
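In code, this path is a plain DataFrame filter. The column names and rows below are illustrative, but the point stands: no language model sits between the query and the data, so nothing can be invented.

```python
# Sketch of the record-lookup path: a structured query becomes a plain
# DataFrame filter, with no LLM in the loop. Example data is made up.
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "status": ["Active", "Closed", "Active", "Pending"],
})

# "Show all rows where status is Active"
active = df[df["status"] == "Active"]
```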

Summarizing

"Give me an executive summary"

Multi-sheet LLM synthesis across entire documents. Produces structured overviews with key findings.

Counting and listing

"How many documents are uploaded?"

Collection-level metadata queries answered directly from the document ledger.

Working across languages

Queries in any language

LLM-based cross-language reranking when embeddings cannot handle the query language.

Drafting and writing

"Draft a report, summarize findings, rewrite a clause"

Open-ended generative tasks routed directly to the LLM. No retrieval overhead when the task is purely generative.
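The routing decision behind all of the query types above can be sketched as a classifier followed by a dispatch. The toy below uses keyword rules purely for illustration; SELVO's actual classifier is not specified on this page, and the engine names are assumptions.

```python
# Toy illustration of the routing step: classify a query, then dispatch it
# to one of the engines described above. The keyword rules and engine names
# are illustrative, not SELVO's real classifier.
import re

ROUTES = {
    "analytics": re.compile(r"\b(average|sum|mean|revenue)\b", re.I),
    "lookup":    re.compile(r"\b(show all|rows where|find record)\b", re.I),
    "summary":   re.compile(r"\b(summary|summarize|overview)\b", re.I),
    "generate":  re.compile(r"\b(draft|rewrite|write)\b", re.I),
}

def route(query: str) -> str:
    """Return the engine name for a query; default to hybrid retrieval."""
    for engine, pattern in ROUTES.items():
        if pattern.search(query):
            return engine
    return "retrieval"        # content questions: hybrid vector + BM25 path
```

In the real system the classifier's output decides whether a request ever touches the LLM at all: exact lookups and metadata counts are answered without inference.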

Hardware requirements

The entire stack runs on a single GPU host. No cluster required. Two reference configurations below: a minimum that runs the full stack at smaller scale, and a recommended configuration sized for production use in the 200 to 500 user range.

Minimum requirements
CPU: 8 cores / 16 threads
RAM: 32 GB
GPU: 24 GB VRAM (NVIDIA)
Storage: 256 GB NVMe SSD

Recommended
CPU: 8 cores / 16 threads
RAM: 32 GB DDR4/DDR5
GPU: RTX 5090 / RTX 6000
Storage: 512 GB NVMe SSD