
AI & RAG • Security First
Enterprise Knowledge Retriever
Stop searching, start knowing. A secure, session-isolated RAG engine that brings all your scattered documentation into one chat interface with exact citations.
The Challenge
Legal and HR teams at TechCorp Global were drowning in documentation. Reviewing a single Master Services Agreement (MSA) took upwards of 4 hours. Critical data was fragmented across SharePoint, Google Drive, and local servers, making keyword search ineffective. Furthermore, strict privacy regulations prohibited the use of public AI models like ChatGPT for sensitive company data.
The Solution
We built Novus Engine, a secure RAG (Retrieval-Augmented Generation) system designed for enterprise compliance. It ingests documents from multiple sources, indexes them with parent-child chunking for context, and answers queries with strict citations.
Parent-Child Chunking
We embed small "child" chunks for precise retrieval and store the full "parent" sections they belong to alongside them. This ensures the LLM sees the complete context of a clause, not just an isolated fragment.
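A minimal sketch of this pattern, assuming LangChain's `ParentDocumentRetriever` with Chroma standing in for the production Pinecone index (package paths reflect a recent LangChain release; the file and collection names are illustrative):

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Small "child" chunks are embedded for precise matching; the larger "parent"
# sections they came from are what actually gets handed to the LLM.
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)

vectorstore = Chroma(
    collection_name="msa_child_chunks",    # hypothetical collection name
    embedding_function=OpenAIEmbeddings(),
)
docstore = InMemoryStore()                 # parent sections, keyed by ID

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

docs = PyPDFLoader("contract.pdf").load()  # illustrative source document
retriever.add_documents(docs)
results = retriever.invoke("What is the termination notice period?")
```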
Session Isolation
Every upload is tagged with a unique `sessionId`. Retrieval is strictly filtered by this ID, preventing data leakage between users.
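A hedged sketch of how that filter looks with the Pinecone Python client; the index name, metadata fields, and placeholder vectors are illustrative, not the production schema:

```python
import uuid
from pinecone import Pinecone

pc = Pinecone(api_key="...")            # key comes from a secret manager in practice
index = pc.Index("novus-engine")        # hypothetical index name

session_id = str(uuid.uuid4())          # minted when the user starts a chat session

# Every chunk written during the session carries that session_id.
index.upsert(vectors=[{
    "id": f"{session_id}-chunk-0",
    "values": [0.1] * 1536,             # stand-in for a real chunk embedding
    "metadata": {"session_id": session_id, "source": "msa_v3.pdf", "page": 12},
}])

# Retrieval is hard-filtered to the caller's session, so one user's documents
# can never surface in another user's answers.
matches = index.query(
    vector=[0.1] * 1536,                # stand-in for the query embedding
    top_k=5,
    filter={"session_id": {"$eq": session_id}},
    include_metadata=True,
)
```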
Exact Citations
Answers are generated with links to the exact source paragraph, allowing lawyers to verify advice instantly.
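One way this citation contract can be enforced at generation time, sketched with the OpenAI Python SDK (the chunk field names `source` and `paragraph` are assumptions for illustration):

```python
from openai import OpenAI

client = OpenAI()

def answer_with_citations(question: str, chunks: list[dict]) -> dict:
    """Ask GPT-4o to answer strictly from the retrieved chunks and cite them.

    `chunks` is a list of dicts with "text", "source", and "paragraph" keys
    (illustrative field names). The returned citation map lets the UI link
    each [n] marker back to the exact source paragraph.
    """
    context = "\n\n".join(
        f"[{i + 1}] ({c['source']}, para {c['paragraph']}) {c['text']}"
        for i, c in enumerate(chunks)
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Answer only from the numbered sources below. Cite every claim "
                "as [n]. If the sources do not contain the answer, say so.\n\n"
                + context
            )},
            {"role": "user", "content": question},
        ],
    )
    citations = {i + 1: (c["source"], c["paragraph"]) for i, c in enumerate(chunks)}
    return {"answer": response.choices[0].message.content, "citations": citations}
```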
System Architecture
A high-security RAG pipeline that enables private, accurate document interrogation.
1. Multi-Source Ingestion
Connectors for GDrive, SharePoint, and PDF uploads normalize text and extract metadata.
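As an illustration, a minimal PDF connector might look like the sketch below. It uses pypdf, and the record shape and field names are assumptions rather than the production schema; the GDrive and SharePoint connectors would emit the same shape so downstream indexing stays source-agnostic:

```python
from pathlib import Path
from pypdf import PdfReader

def ingest_pdf(path: str, source_system: str) -> list[dict]:
    """Normalize one uploaded PDF into per-page records with metadata."""
    reader = PdfReader(path)
    records = []
    for page_number, page in enumerate(reader.pages, start=1):
        text = (page.extract_text() or "").strip()
        if not text:
            continue  # skip scanned/empty pages; OCR would slot in here
        records.append({
            "text": " ".join(text.split()),          # collapse whitespace
            "metadata": {
                "source_system": source_system,      # e.g. "upload", "gdrive"
                "file_name": Path(path).name,
                "page": page_number,
            },
        })
    return records
```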
2. Hybrid Indexing
Documents are split into semantic chunks and stored in both Pinecone (dense vectors) and a keyword index; the two result sets are then merged to improve retrieval accuracy.
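The write-up does not name the fusion method, so the sketch below uses Reciprocal Rank Fusion, one common way to merge the dense and keyword result lists into a single ranking:

```python
def reciprocal_rank_fusion(dense_ids: list[str], keyword_ids: list[str], k: int = 60) -> list[str]:
    """Merge the dense (Pinecone) ranking and the keyword ranking into one list.

    Each chunk scores 1 / (k + rank) in every list it appears in, so items
    ranked highly by both retrievers rise to the top. k=60 is a conventional
    default, not a tuned value.
    """
    scores: dict[str, float] = {}
    for ranking in (dense_ids, keyword_ids):
        for rank, chunk_id in enumerate(ranking):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```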
3. RAG Orchestration
LangChain orchestrates the retrieval: we retrieve the top-k chunks, re-rank them, and inject them into the GPT-4o context window.
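A simplified sketch of the query path that ties the pieces above together; `embed`, `retrieve_chunks`, and `rerank` are hypothetical helpers standing in for the embedding call, the session-filtered Pinecone query, and the re-ranking step:

```python
def answer_query(question: str, session_id: str, top_k: int = 20, keep: int = 5) -> dict:
    """End-to-end query path: retrieve, re-rank, then generate with GPT-4o.

    `embed`, `retrieve_chunks`, and `rerank` are hypothetical helpers;
    `answer_with_citations` is the citation sketch shown earlier.
    """
    query_embedding = embed(question)

    # 1. Pull a generous candidate set from the session-scoped hybrid index.
    candidates = retrieve_chunks(query_embedding, session_id, top_k=top_k)

    # 2. Re-rank the candidates against the question and keep only the best few.
    best = rerank(question, candidates)[:keep]

    # 3. Inject the surviving chunks into the GPT-4o context and cite them.
    return answer_with_citations(question, best)
```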
Tech Stack
- LangChain
- OpenAI GPT-4o
- Claude 3.5 Sonnet
- Pinecone
- ChromaDB
- Python
- FastAPI
