Sustainable AI: Efficient Knowledge Access with Agentic RAG
Links: Try it | GitHub | Demo Video

Less Compute. More Impact. A secure, production-ready multi-SLM RAG system with intelligent routing and role-based access control. Domain-aware model selection cuts AI costs by 90% and energy use by 87%.
The Problem
LLMs are expensive, slow, and often overkill. Organizations need AI that’s secure, efficient, and domain-aware without the massive compute costs and access control headaches.
Key challenges:
- 💰 High GPU/TPU costs and unpredictable expenses
- 🔒 Data access governance across clearance levels
- 🎓 Generic models lack domain expertise
- ⚠️ Vendor lock-in and compliance gaps
Our Solution
A multi-SLM agentic RAG system that intelligently routes queries to specialized models based on domain and clearance level.
How It Works
Smart Planner → analyzes request & clearance → Domain-Specific SLM → retrieves from authorized indexes → Grounded Response
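Below is a minimal sketch of the planner's routing step, assuming a simple domain-to-model registry; the model names, clearance levels, and index naming are illustrative, not the production configuration.

```python
# Hypothetical sketch of the planner's routing step: classify the query's
# domain, check the caller's clearance, and pick the smallest model that fits.
from dataclasses import dataclass

# Illustrative registry: domain -> (model name, minimum clearance level)
MODEL_REGISTRY = {
    "hr":      ("gemma-2b-hr",      2),
    "finance": ("gemma-2b-finance", 3),
    "general": ("gemma-2b-general", 1),
}

@dataclass
class RoutingDecision:
    model: str
    index: str      # vector index the model may retrieve from
    allowed: bool

def route(query_domain: str, user_clearance: int) -> RoutingDecision:
    model, min_clearance = MODEL_REGISTRY.get(query_domain, MODEL_REGISTRY["general"])
    if user_clearance < min_clearance:
        # Deny rather than fall back to a broader index: access control is
        # enforced before retrieval, not after generation.
        return RoutingDecision(model=model, index="", allowed=False)
    return RoutingDecision(
        model=model,
        index=f"{query_domain}-clearance-{user_clearance}",
        allowed=True,
    )
```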

Key Features
🔐 Security First
- Role-based access control at the data layer (see the retrieval sketch below)
- Index segmentation by clearance level
- PII redaction and audit logging
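The sketch below illustrates the first two points with a tiny in-memory store standing in for Vertex AI Vector Search; the document IDs, metadata fields, and audit print are illustrative only.

```python
# Minimal, self-contained sketch of clearance-aware retrieval; the real system
# uses Vertex AI Vector Search, so the in-memory store here is illustrative.
import math
from typing import Dict, List

DOCS = [
    {"id": "hr-001", "domain": "hr", "clearance": 1, "text": "Leave policy", "vec": [0.9, 0.1]},
    {"id": "hr-007", "domain": "hr", "clearance": 3, "text": "Exec compensation", "vec": [0.8, 0.2]},
]

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vec: List[float], domain: str, user_clearance: int, top_k: int = 3) -> List[Dict]:
    # Access control happens at the data layer: documents above the caller's
    # clearance are dropped before ranking, so the SLM never sees them.
    allowed = [d for d in DOCS if d["domain"] == domain and d["clearance"] <= user_clearance]
    hits = sorted(allowed, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:top_k]
    for d in hits:  # stand-in for the audit log sink
        print(f"AUDIT retrieve domain={domain} doc={d['id']} clearance<={user_clearance}")
    return hits

# A clearance-1 analyst querying the HR index sees only hr-001.
print([d["id"] for d in retrieve([0.9, 0.1], "hr", user_clearance=1)])
```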
💰 Cost Optimized
- Right-sized models for each domain
- Cheap planner + specialized SLMs
- 90% cost reduction vs traditional LLMs
⚡ Performance
- 81% faster response times
- Domain-specific accuracy
- Intelligent caching and fallbacks (see the sketch below)
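A hedged sketch of that caching-and-fallback path, with all model calls stubbed out (the real endpoints and grounding check are not shown here):

```python
# Illustrative caching-and-fallback path: repeated questions are served from a
# cache, the domain SLM is tried first, and a larger model is used only when
# the SLM cannot ground its answer. Both model calls are stubs.
from dataclasses import dataclass
from functools import lru_cache

@dataclass
class ModelResponse:
    text: str
    grounded: bool  # True when the answer cites retrieved passages

def call_slm(domain: str, question: str) -> ModelResponse:
    # Stub for the domain-tuned small model (e.g. a Gemma 2B endpoint).
    return ModelResponse(text=f"[{domain}-slm] answer to: {question}", grounded=True)

def call_llm(question: str) -> ModelResponse:
    # Stub for the expensive fallback (e.g. Gemini Pro); used rarely.
    return ModelResponse(text=f"[llm] answer to: {question}", grounded=True)

@lru_cache(maxsize=4096)
def answer(domain: str, question: str) -> str:
    slm = call_slm(domain, question)
    if slm.grounded:
        return slm.text
    return call_llm(question).text  # fallback only when grounding fails

print(answer("finance", "What was Q3 spend on GPUs?"))
print(answer("finance", "What was Q3 spend on GPUs?"))  # second call hits the cache
```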
Impact

| Metric | Gemma 2B (SLM) | Gemini Pro (LLM) | Improvement |
|---|---|---|---|
| Cost/1M tokens | ~$0.50 | ~$5.00 | 90% less |
| Response Time | 150ms | 800ms | 81% faster |
| Energy/Query | 0.02 kWh | 0.15 kWh | 87% reduction |
| CO₂ Emissions | 8g | 60g | 87% lower |
At scale, the per-query figures above work out to roughly 52 tonnes of CO₂ avoided per million queries compared with an LLM-only approach.
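Using the table's per-query figures, the scale-up follows directly (a back-of-the-envelope check, not a new measurement):

```python
# Back-of-the-envelope check using the per-query figures from the table above.
queries = 1_000_000
co2_saved_tonnes = (60 - 8) * queries / 1_000_000   # grams -> tonnes
energy_saved_kwh = (0.15 - 0.02) * queries
print(co2_saved_tonnes, "tonnes CO2 avoided")        # 52.0
print(round(energy_saved_kwh), "kWh avoided")        # 130000
```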
Tech Stack
FastAPI • Vertex AI Vector Search • Gemini & Gemma • GCP IAM • BigQuery
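As a rough illustration of how these pieces could sit behind FastAPI, the sketch below reuses route(), retrieve(), and answer() from the earlier sketches; the endpoint path, clearance header, and embed() stub are hypothetical, not the project's actual API.

```python
# Hypothetical FastAPI surface: the caller's clearance comes from the identity
# layer (GCP IAM in the real stack) and is checked by the planner before any
# retrieval happens. Requires route(), retrieve(), and answer() defined above.
from typing import List

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()

def embed(text: str) -> List[float]:
    # Placeholder embedding; the real system would call an embedding model.
    return [0.9, 0.1]

class Query(BaseModel):
    question: str
    domain: str

@app.post("/ask")
def ask(query: Query, x_user_clearance: int = Header(default=1)):
    decision = route(query.domain, x_user_clearance)  # planner step
    if not decision.allowed:
        raise HTTPException(status_code=403, detail="Insufficient clearance for this domain")
    docs = retrieve(embed(query.question), query.domain, x_user_clearance)  # data-layer filter
    return {
        "model": decision.model,
        "answer": answer(query.domain, query.question),
        "sources": [d["id"] for d in docs],
    }
```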
What’s Next
Planned work includes multi-tenant support, policy-as-code governance, continuous evaluation pipelines, and edge deployment for offline scenarios.