Sustainable AI: Efficient Knowledge Access with Agentic RAG

Secure multi-SLM RAG system with intelligent routing and role-based access control. Reduces AI costs by 90% and energy use by 87% through domain-aware model selection.

Less Compute. More Impact. A production-ready agentic RAG system for secure, domain-aware knowledge access.

🚀 Try Live Demo 💻 GitHub

The Problem

LLMs are expensive, slow, and often overkill. Organizations need AI that’s secure, efficient, and domain-aware without the massive compute costs and access control headaches.

Key challenges:

  • 💰 High GPU/TPU costs and unpredictable expenses
  • 🔒 Data access governance across clearance levels
  • 🎓 Generic models lack domain expertise
  • ⚠️ Vendor lock-in and compliance gaps

Our Solution

A multi-SLM agentic RAG system that intelligently routes queries to specialized models based on domain and clearance level.

How It Works

Smart Planner → analyzes request & clearance → Domain-Specific SLM → retrieves from authorized indexes → Grounded Response
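The routing step can be sketched as follows. This is a minimal illustration, not the project's actual planner: the domain names, model identifiers, clearance levels, and index naming scheme are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical domain -> SLM mapping; a real deployment would load this from config.
DOMAIN_MODELS = {
    "finance": "gemma-2b-finance",
    "legal": "gemma-2b-legal",
}
FALLBACK_MODEL = "gemma-2b-general"

@dataclass
class Query:
    text: str
    user_clearance: int  # e.g. 0 = public, 1 = internal, 2 = restricted

def plan(query: Query, domain: str) -> dict:
    """Pick a domain-specific SLM and restrict retrieval to authorized indexes."""
    model = DOMAIN_MODELS.get(domain, FALLBACK_MODEL)
    # Index segmentation: only indexes at or below the user's clearance level.
    indexes = [f"{domain}-idx-l{lvl}" for lvl in range(query.user_clearance + 1)]
    return {"model": model, "indexes": indexes}

plan(Query("Q3 revenue drivers?", user_clearance=1), "finance")
# -> {'model': 'gemma-2b-finance', 'indexes': ['finance-idx-l0', 'finance-idx-l1']}
```

The key design point is that access control happens at planning time, before retrieval: an unauthorized index is never even queried, rather than filtered after the fact.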

System Architecture

Key Features

🔐 Security First

  • Role-based access control at the data layer
  • Index segmentation by clearance level
  • PII redaction and audit logging
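A minimal sketch of the redaction-plus-audit idea, assuming two illustrative PII patterns (SSN-style numbers and emails) and a JSON-lines audit record; the real system's patterns and log schema are not shown here.

```python
import json
import re
import time

# Illustrative patterns only; production systems use broader PII detection.
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]

def redact(text: str) -> str:
    """Replace each PII match with its placeholder label."""
    for pattern, label in PII_PATTERNS:
        text = pattern.sub(label, text)
    return text

def audit_log(user: str, query: str) -> str:
    """Build an append-only audit record; redact BEFORE logging so PII never lands on disk."""
    return json.dumps({"ts": time.time(), "user": user, "query": redact(query)})

redact("Contact jane@corp.com, SSN 123-45-6789")
# -> 'Contact [EMAIL], SSN [SSN]'
```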

💰 Cost Optimized

  • Right-sized models for each domain
  • Cheap planner + specialized SLMs
  • 90% cost reduction vs traditional LLMs

⚡ Performance

  • 81% faster response times
  • Domain-specific accuracy
  • Intelligent caching and fallbacks
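Caching and fallback can be combined in a few lines. This sketch uses `functools.lru_cache` and a stubbed inference call; the model names and the "complexity" failure condition are placeholders, not the project's actual escalation logic.

```python
from functools import lru_cache

# Hypothetical tier order: try the cheap SLM first, escalate to the LLM on failure.
MODEL_CHAIN = ["gemma-2b", "gemini-pro"]

def call_model(query: str, model: str) -> str:
    """Stand-in for a real inference call; the SLM 'fails' on long queries here."""
    if model == "gemma-2b" and len(query) > 200:
        raise RuntimeError("query too complex for SLM")
    return f"[{model}] answer"

@lru_cache(maxsize=1024)
def answer(query: str) -> str:
    """Serve repeated queries from cache; walk the fallback chain on errors."""
    last_err = None
    for model in MODEL_CHAIN:
        try:
            return call_model(query, model)
        except RuntimeError as err:
            last_err = err
    raise last_err

answer("What is our PTO policy?")  # -> '[gemma-2b] answer'
```

Because the cache sits above the fallback chain, a query that escalated to the expensive model is still answered from cache the second time around.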

Impact

Sustainability Comparison

| Metric | Gemma 2B (SLM) | Gemini Pro (LLM) | Improvement |
|---|---|---|---|
| Cost / 1M tokens | ~$0.50 | ~$5.00 | 90% less |
| Response time | 150 ms | 800 ms | 81% faster |
| Energy / query | 0.02 kWh | 0.15 kWh | 87% reduction |
| CO₂ / query | 8 g | 60 g | 87% lower |

At scale: with 52 g saved per query (60 g − 8 g), 1M queries ≈ 52 tonnes of CO₂ saved compared to traditional LLM approaches.
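The at-scale figures follow directly from the per-query numbers in the table:

```python
# Per-query figures from the comparison table above.
queries = 1_000_000
co2_saved_t = (60 - 8) * queries / 1_000_000       # grams -> tonnes
energy_saved_kwh = round((0.15 - 0.02) * queries)  # kWh saved at 1M queries

print(f"{co2_saved_t} t CO2 and {energy_saved_kwh} kWh saved per {queries:,} queries")
```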


Tech Stack


FastAPI • Vertex AI Vector Search • Gemini & Gemma • GCP IAM • BigQuery


What’s Next

Expanding with multi-tenant capabilities, policy-as-code governance, continuous evaluation pipelines, and edge deployment for offline scenarios.

