A comprehensive Retrieval-Augmented Generation (RAG) system that transforms financial documents into structured, standardized concept notes using PDF processing, vector embeddings, and AI-powered generation.
AURELIA is a production-grade microservice that automatically generates standardized concept notes for financial topics. The system extracts and processes content from the Financial Toolbox User's Guide (fintbx.pdf), applying a RAG approach for structured concept synthesis with Wikipedia fallback capabilities.
Automated-Financial-Concept-Note-Generator/
βββ lab1-pdf-processing/ # PDF parsing, chunking, and embeddings
βββ lab2-airflow-orchestration/ # AWS MWAA orchestration (cloud deployment)
βββ lab3-fastapi-service/ # FastAPI RAG microservice backend
βββ lab4-streamlit-frontend/ # Streamlit web interface
βββ lab5-evaluation-benchmarking/ # Performance evaluation and testing
Github: https://github.com/Team-01-DAMG-7245/Automated-Financial-Concept-Note-Generator
Demo video: https://youtu.be/vfdyIbU_E4Y
Documentation: https://docs.google.com/document/d/1R8fZbUGrrSG2_UM2BiBRFB-Ln1keN-NkJMEvFUgO-ZA/edit?usp=sharing
Codelabs: https://codelabs-preview.appspot.com/?file_id=1I2yde3ebxA9MhyqZZKioJD-ajLghyoqNPpSEnaE0nmY#0
- Python 3.8+
- OpenAI API Key
- Git
git clone <repository-url>
cd Automated-Financial-Concept-Note-GeneratorCreate a .env file in the lab3-fastapi-service directory:
OPENAI_API_KEY=sk-your-openai-api-key-here
DATABASE_URL=sqlite:///./aurelia_test.db
PINECONE_API_KEY=your-pinecone-key-here
PINECONE_ENVIRONMENT=your-pinecone-environmentTo test the complete system locally with frontend and backend running together:
# Navigate to backend directory
cd lab3-fastapi-service
# Install dependencies (if not already done)
pip install -r requirements.txt
# Start FastAPI server
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload
# You should see:
# INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
# INFO: Started reloader process# Navigate to frontend directory
cd lab4-streamlit-frontend
# Install dependencies (if not already done)
pip install -r requirements.txt
# Start Streamlit app
streamlit run streamlit_app.py --server.port 8501
# You should see:
# You can now view your Streamlit app in your browser.
# Local URL: http://localhost:8501# Open your browser and go to:
# http://localhost:8501
# Test a concept query:
# 1. Enter "Sharpe Ratio" in the concept input field
# 2. Click "Generate Concept Note"
# 3. Verify the results show:
# - Source: fintbx_pdf (Local Vector Service)
# - Complete concept note with definition, intuition, formulae, etc.cd lab4-streamlit-frontend
pip install -r requirements.txt
streamlit run streamlit_app.py
Backend must be running first:
bash
Copy code
cd ../lab3-fastapi-service
uvicorn app.main:app --reload
Open: http://localhost:8501
π§ Overview
Section Description
π Query Concepts Search and generate concept notes via the FastAPI /query endpoint.
π Database Explorer View all seeded notes stored in PostgreSQL (or SQLite fallback).
π Evaluation Dashboard Integrates Lab 5 metrics β latency plots, token cost, and accuracy charts.
π API Endpoints Used
Method Route Purpose
POST /query Fetch or generate concept note
POST /seed Insert new financial concept into DB
GET /health Check FastAPI server status
βοΈ Features
Live Concept Querying: Real-time interaction with the RAG pipeline.
Caching & Retrieval: Fetches from DB first, regenerates if missing.
Interactive Visualization: Uses Plotly and Matplotlib for evaluation graphs.
Session State: Persists last searched concepts across tabs.
Seamless Backend Integration: Communicates with FastAPI JSON endpoints.
π§© Requirements
bash
Copy code
streamlit
requests
pandas
plotly
matplotlib
sqlalchemy
psycopg2-binary
π₯οΈ Example Usage
bash
Copy code
# 1οΈβ£ Start backend
cd ../lab3-fastapi-service
uvicorn app.main:app --reload
# 2οΈβ£ Launch frontend
cd ../lab4-streamlit-frontend
streamlit run streamlit_app.py
Then visit: http://localhost:8501
# Lab 5 - Evaluation & Benchmarking
## Requirements Implemented
- **Req 19**: Quality metrics (accuracy, completeness, citation fidelity)
- **Req 20**: Latency comparison (cached vs generated)
- **Req 21**: Retrieval latency & token costs for Pinecone/ChromaDB
## Files
- lab5_evaluation.py
## Results
- Quality Scores: 80% accuracy, 80% completeness, 100% citation fidelity
- Performance: 27x speedup with caching (15ms cached vs 410ms generated)
- Cost: ~$0.000008 per query using text-embedding-3-large
## Dependenciespandas matplotlib aiohttp openai pinecone-client chromadb requests
#### **Step 4: Verify Backend Health**
```bash
# In a new terminal, test the backend directly:
curl http://127.0.0.1:8000/health
# Expected response:
# {"status":"healthy","service":"AURELIA RAG Service","version":"1.0.0"}
# If backend won't start:
# 1. Check if port 8000 is available
netstat -an | findstr :8000
# 2. Try a different port
uvicorn app.main:app --host 127.0.0.1 --port 8001 --reload
# 3. Update frontend backend URL in streamlit_app.py
# Change: default_backend = "http://127.0.0.1:8000"
# To: default_backend = "http://127.0.0.1:8001"# If frontend won't start:
# 1. Check if port 8501 is available
netstat -an | findstr :8501
# 2. Try a different port
streamlit run streamlit_app.py --server.port 8502
# 3. Check backend connection in Streamlit sidebar
# The backend URL should match your running backend# Test backend connectivity from frontend directory:
python -c "import requests; print('Backend Status:', requests.get('http://127.0.0.1:8000/health').json())"
# If this fails, check:
# 1. Backend is running on correct port
# 2. No firewall blocking connections
# 3. Correct URL in streamlit_app.pyOnce both services are running:
- Frontend Interface: http://localhost:8501
- Backend API: http://127.0.0.1:8000
- API Documentation: http://127.0.0.1:8000/docs (Swagger UI)
- Health Check: http://127.0.0.1:8000/health
- Frontend loads with concept input field
- Backend responds to health checks
- Concept queries work and return structured notes
- Source shows "fintbx_pdf (Local Vector Service)"
- Complete concept notes with all components (definition, intuition, formulae, examples, pitfalls, citations)
Purpose: Parse fintbx.pdf, create embeddings, and prepare vector data
Pipeline 1: PDF β Markdown (Takes ~1 hour)
python pipeline_orchestrator.py --pipeline 1 --output-dir outputs- Parse PDF using multiple techniques
- Generate structured markdown
- Save to
outputs/fintbx_complete.md
Pipeline 2: Markdown β Chunks β Embeddings β Storage (Fast, ~10 seconds)
python pipeline_orchestrator.py --pipeline 2 --output-dir outputs- Load markdown file
- Apply chunking strategy
- Generate embeddings
- Store in Pinecone
Complete Pipeline:
python pipeline_orchestrator.py --output-dir outputspython -c "import json; data=json.load(open('outputs/chunks/chunks_markdown_embedded.json')); print(f'β {len(data)} chunks with embeddings loaded')"
**Expected Output:** 49 chunks with embeddings ready for retrieval
### π§ Lab 2 β AWS MWAA Orchestration
**Purpose:** Cloud-based orchestration pipeline using AWS Managed Workflows for Apache Airflow
**Infrastructure Setup** (One-time, ~1 hour total)
```bash
cd lab2-airflow-orchestration
# 1. Configure AWS credentials
aws configure --profile aurelia
source .env
# 2. Create S3 buckets (~1 min)
./scripts/setup_s3_buckets.sh
# 3. Create VPC infrastructure (~5 mins)
./scripts/create_mwaa_vpc.sh
# 4. Create MWAA execution role (~1 min)
./scripts/create_mwaa_role.sh
# 5. Create MWAA environment (~25 mins)
./scripts/create_mwaa_environment.sh
# 6. Deploy DAGs (~2 mins)
./scripts/deploy_dags.sh
Resources Created:
- 5 S3 buckets (raw-pdfs, processed-chunks, embeddings, concept-notes, mwaa)
- VPC with private subnets, NAT gateways, and security groups
- MWAA environment running Airflow 2.7.2
DAG 1: fintbx_ingest_dag - Scheduled weekly
Purpose: Orchestrate Lab 1 embeddings to Pinecone
Tasks:
1. Load Lab 1's pre-computed chunks (49 chunks, MarkdownHeader strategy)
2. Validate embeddings (3072-dimension vectors)
3. Upload to Pinecone vector database
4. Backup embeddings to S3
5. Generate pipeline report
DAG 2: concept_seed_dag - Manual trigger
Purpose: Pre-generate concept notes for common financial terms
Concepts: Duration, Sharpe Ratio, Black-Scholes, VaR, Beta, CAPM, etc.
Tasks:
1. Query vector database for concept
2. Fallback to Wikipedia if not found
3. Generate structured note using instructor
4. Cache in S3/Postgres
Access Airflow UI:
# Get webserver URL
aws mwaa get-environment --name aurelia-mwaa \
--query 'Environment.WebserverUrl' --output text
# Open: https://<url-from-above>Monitor DAGs:
# Check environment status
aws mwaa get-environment --name aurelia-mwaa --query 'Environment.Status'
# View task logs (in Airflow UI or CLI)
aws logs tail /aws/mwaa/environment/aurelia-mwaa/task --followKnown Issues:
- Package installation via requirements.txt requires specific configuration
- See
TROUBLESHOOTING.mdfor workarounds
Integration with Lab 1:
- Lab 1 outputs stored in:
s3://aurelia-3c28b5-processed-chunks/lab1-outputs/ - Pre-computed embeddings:
chunks_markdown_embedded.json(49 chunks, 24,919 tokens) - DAG orchestrates upload to Pinecone for RAG retrieval
Purpose: RAG microservice with concept note generation
# Navigate to Lab 3
cd lab3-fastapi-service
# Install dependencies
pip install -r requirements.txt
# Start the FastAPI server
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload
# Alternative: Use the run script
python run.pyVerify Backend:
# Health check
curl http://127.0.0.1:8000/health
# Test concept query
curl -X POST http://127.0.0.1:8000/api/v1/query \
-H "Content-Type: application/json" \
-d '{"concept_name": "Sharpe Ratio", "top_k": 3}'Expected Response:
{
"concept_name": "Sharpe Ratio",
"source": "fintbx_pdf (Local Vector Service)",
"retrieved_chunks": [...],
"generated_note": {
"concept": "Sharpe Ratio",
"definition": "The Sharpe Ratio is a performance metric...",
"intuition": "The Sharpe Ratio tells you how well...",
"formulae": ["Sharpe Ratio = (Rp - Rf) / Οp"],
"step_by_step": [...],
"pitfalls": [...],
"examples": [...],
"citations": [...]
}
}Purpose: Web interface for concept note generation
# Navigate to Lab 4
cd lab4-streamlit-frontend
# Install dependencies
pip install -r requirements.txt
# Start Streamlit app
streamlit run streamlit_app.py --server.port 8501
# Alternative with custom backend URL
streamlit run streamlit_app.py --server.port 8501 --server.headless trueAccess the Frontend:
- Local URL: http://localhost:8501
- Network URL: http://[your-ip]:8501
Purpose: Performance testing and quality evaluation
# Navigate to Lab 5
cd lab5-evaluation-benchmarking
# Install dependencies (if needed)
pip install requests httpx pandas
# Run comprehensive evaluation
python lab5_evaluation.pyExpected Output:
π Starting Enhanced AURELIA Lab 5 Evaluation
β
Backend is healthy
π Testing 5 financial concepts
1. ENHANCED CONCEPT NOTE QUALITY EVALUATION
π Evaluating concept quality for: Sharpe Ratio (Vector Store: local)
β
Generated concept note in 7.93s
π Source: fintbx_pdf (Local Vector Service)
π Retrieved chunks: 3
[... more evaluations ...]
π CONCEPT NOTE QUALITY:
Average Accuracy Score: 1.00
Average Completeness Score: 1.00
Average Citation Fidelity: 0.30
Average Citation Coverage: 0.00
π VECTOR STORE COMPARISON:
Local Vector Service:
- Avg Generation Time: 10.10s
- Avg Citation Fidelity: 0.36
Pinecone:
- Avg Generation Time: 8.08s
- Avg Citation Fidelity: 0.36
Performance Comparison:
- Time Improvement: 20.0%
- Citation Fidelity Improvement: 0.0%
β
ENHANCED LAB 5 EVALUATION COMPLETED
# Terminal 1: Start Backend
cd lab3-fastapi-service
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload
# Terminal 2: Start Frontend
cd lab4-streamlit-frontend
streamlit run streamlit_app.py --server.port 8501
# Terminal 3: Run Evaluation
cd lab5-evaluation-benchmarking
python lab5_evaluation.py# Test complete pipeline
python -c "
import requests
print('π END-TO-END INTEGRATION TEST')
print('='*50)
print('1. Lab 1 Data: β
Available')
print('2. Lab 3 Backend:', 'β
Running' if requests.get('http://127.0.0.1:8000/health').status_code == 200 else 'β Down')
print('3. Lab 4 Frontend:', 'β
Running' if requests.get('http://127.0.0.1:8501').status_code == 200 else 'β Down')
print('4. Lab 5 Evaluation: β
Complete')
print('='*50)
print('π― INTEGRATION STATUS: FULLY OPERATIONAL')
"GET http://127.0.0.1:8000/healthPOST http://127.0.0.1:8000/api/v1/query
Content-Type: application/json
{
"concept_name": "Duration",
"top_k": 3
}POST http://127.0.0.1:8000/api/v1/seed
Content-Type: application/json
{
"concept_name": "Black-Scholes Model",
"force_refresh": false
}- Accuracy Score: 100%
- Completeness Score: 100%
- Citation Fidelity: 30% (improved from 0%)
- Average Generation Time: 10.10s (Local Vector Service)
- Vector Store Comparison: Pinecone 20% faster than Local Vector Service
- Sharpe Ratio
- Duration
- Black-Scholes Model
- CAPM
- Portfolio Optimization
Backend won't start:
# Check if port 8000 is available
netstat -an | findstr :8000
# Try different port
uvicorn app.main:app --host 127.0.0.1 --port 8001 --reloadFrontend connection issues:
# Update backend URL in streamlit_app.py
default_backend = "http://127.0.0.1:8001" # Change port if neededMissing dependencies:
# Install all requirements
pip install -r requirements.txt
# For Lab 3 specifically
pip install fastapi uvicorn sqlalchemy openai instructorDatabase issues:
# Use SQLite for local testing (already configured)
DATABASE_URL=sqlite:///./aurelia_test.dboutputs/chunks/chunks_markdown_embedded.json- Main embeddings fileoutputs/fintbx_complete.md- Complete processed document
app/main.py- FastAPI applicationapp/services/rag_service.py- Core RAG logicapp/services/local_vector_service.py- Local vector operationsapp/services/wikipedia_fallback.py- Wikipedia fallback
streamlit_app.py- Main Streamlit application
lab5_evaluation.py- Comprehensive evaluation scriptLAB5_EVALUATION_SUMMARY.md- Detailed results summaryresults/evaluation_results.csv- Evaluation dataresults/vector_store_comparison.json- Performance comparison
graph TB
A[PDF Document<br/>fintbx.pdf] --> B[Lab 1: PDF Processing]
B --> C[Hybrid Chunking<br/>Markdown + Code + Semantic]
C --> D[Text Embeddings<br/>text-embedding-3-large]
D --> E[Vector Storage<br/>Local Vector Service]
F[User Query] --> G[Lab 4: Streamlit Frontend<br/>Port 8501]
G --> H[Lab 3: FastAPI Backend<br/>Port 8000]
H --> I[RAG Service]
I --> E
E --> J[Similarity Search<br/>Cosine Similarity]
J --> K[Retrieved Chunks]
K --> L[LLM Generation<br/>GPT-4 + Instructor]
L --> M[Structured Concept Note]
M --> N[Wikipedia Fallback<br/>if no results]
N --> M
M --> G
O[Lab 2: Airflow Orchestration<br/>AWS MWAA] --> B
O --> P[S3 Storage<br/>Cloud Artifacts]
P --> E
Q[Lab 5: Evaluation] --> H
Q --> R[Performance Metrics<br/>Accuracy, Completeness, Latency]
style A fill:#e1f5fe
style G fill:#f3e5f5
style H fill:#e8f5e8
style E fill:#fff3e0
style M fill:#fce4ec
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β AURELIA SYSTEM β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Frontend Layer (Lab 4) β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Streamlit Web Interface (Port 8501) β β
β β β’ Concept Input Form β β
β β β’ Real-time Results Display β β
β β β’ Backend Integration β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β API Layer (Lab 3) β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β FastAPI Backend (Port 8000) β β
β β β’ /api/v1/query - Concept Generation β β
β β β’ /api/v1/seed - Concept Pre-seeding β β
β β β’ /health - System Health Check β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β RAG Service Layer β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β RAG Orchestrator β β
β β β’ Local Vector Service (Primary) β β
β β β’ Pinecone Integration (Secondary) β β
β β β’ Wikipedia Fallback β β
β β β’ LLM Generation (GPT-4 + Instructor) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Data Processing Layer (Lab 1) β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β PDF Processing Pipeline β β
β β β’ Document Parsing β β
β β β’ Hybrid Chunking Strategy β β
β β β’ Embedding Generation β β
β β β’ Vector Storage β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Orchestration Layer (Lab 2) β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β AWS MWAA Airflow β β
β β β’ fintbx_ingest_dag - Weekly Processing β β
β β β’ concept_seed_dag - On-demand Seeding β β
β β β’ S3 Artifact Management β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β Evaluation Layer (Lab 5) β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Performance Evaluation β β
β β β’ Quality Metrics (Accuracy, Completeness) β β
β β β’ Citation Fidelity Analysis β β
β β β’ Latency Benchmarking β β
β β β’ Vector Store Comparison β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
1. PDF Input (fintbx.pdf)
β
2. Lab 1: Document Processing
βββ Parse PDF β Extract text, images, tables
βββ Hybrid Chunking β Markdown + Code + Semantic
βββ Generate Embeddings β text-embedding-3-large
βββ Store Vectors β Local Vector Service
β
3. Lab 3: RAG Service
βββ Receive Query β Concept name + parameters
βββ Vector Search β Cosine similarity matching
βββ Retrieve Chunks β Top-k relevant chunks
βββ Generate Response β GPT-4 + Instructor
βββ Fallback Logic β Wikipedia if no results
β
4. Lab 4: Frontend Display
βββ User Interface β Streamlit web app
βββ Query Submission β HTTP POST to backend
βββ Results Display β Structured concept notes
βββ Error Handling β User-friendly messages
β
5. Lab 5: Evaluation
βββ Quality Assessment β Accuracy, completeness
βββ Performance Metrics β Latency, throughput
βββ Citation Analysis β Fidelity scoring
βββ Comparative Analysis β Vector store performance
- AWS MWAA: Managed Airflow for orchestration
- S3 Storage: Artifact and data storage
- Cloud Run/App Engine: FastAPI service hosting
- Managed Database: PostgreSQL for production
# Production
OPENAI_API_KEY=sk-********
DATABASE_URL=postgresql+psycopg://user:pass@host:port/db
PINECONE_API_KEY=********
PINECONE_ENVIRONMENT=production- Implement Pinecone: For 20% performance improvement
- Enhance Citation Coverage: Improve LLM prompting for better citations
- Production Deployment: Deploy to cloud infrastructure
- Monitoring: Add performance monitoring and logging
- Lab 1: PDF processing complete (49 chunks)
- Lab 3: Backend running on port 8000
- Lab 4: Frontend running on port 8501
- Lab 5: Evaluation completed successfully
- End-to-end integration verified
- All API endpoints responding
- Concept note generation working
- Wikipedia fallback functional
We welcome contributions to the AURELIA project! Here's how you can get involved:
# Fork the repository
git clone <your-fork-url>
cd Automated-Financial-Concept-Note-Generator
# Create a feature branch
git checkout -b feature/your-feature-name
# Make your changes and test thoroughly
# Follow the testing procedures in this README
# Submit a pull request with:
# - Clear description of changes
# - Test results
# - Updated documentation if needed- Code Quality: Follow PEP 8 standards for Python code
- Testing: Ensure all labs pass their respective tests
- Documentation: Update README and code comments as needed
- Performance: Maintain or improve system performance metrics
- Compatibility: Ensure changes work across all lab components
- Performance Optimization: Improve retrieval speed and accuracy
- New Features: Add support for additional document types
- UI/UX Improvements: Enhance the Streamlit frontend
- Cloud Deployment: Improve AWS/GCP deployment scripts
- Evaluation Metrics: Add new quality assessment methods
- Documentation: Improve guides and tutorials
This project was developed as a collaborative effort with specific contributions from each team member:
- Lab 1 - PDF Processing & Chunking: Complete implementation of PDF parsing, hybrid chunking strategies, and embedding generation
- Lab 3 - FastAPI Service Endpoints (9, 10, 11):
- Endpoint 9:
/api/v1/query- Core concept query and generation - Endpoint 10:
/api/v1/seed- Concept pre-seeding functionality - Endpoint 11:
/health- System health monitoring
- Endpoint 9:
- LAb 4 - Backend integration and error handling
- Lab 5 - Evaluation & Benchmarking: Comprehensive performance evaluation system with quality metrics, latency analysis, and vector store comparisons
- Lab 2 - Airflow Orchestration: AWS MWAA setup, DAG creation, and cloud infrastructure management
- Lab 3 - Instructor Integration (12, 13):
- Part 12: Structured output generation using instructor package
- Part 13: LLM integration for concept note synthesis and Wikipedia fallback
- Lab 4 Deployed the frontend to Google Cloud Run for scalable public access
- Lab 3 - Fixed the fallback mechanism to use pdf instead of wikipedia
- Lab 4 - Streamlit Frontend: Complete web interface development including:
- User-friendly concept query interface
- Real-time concept note display
- Responsive design and user experience optimization
- Deployed the frontend to Google Cloud Run for scalable public access
- Authored the complete technical documentation and final project report.
- 100% Accuracy: Perfect concept note accuracy across all tested financial concepts
- 100% Completeness: All concept notes include all required components (definition, intuition, formulae, examples, pitfalls, citations)
- Enhanced Citation Fidelity: Improved from 0% to 30% with sophisticated citation-chunk matching
- Vector Store Optimization: 20% performance improvement with Pinecone integration
- End-to-End Integration: Seamless data flow from PDF processing to web interface
- Multi-Source Retrieval: Primary PDF data with Wikipedia fallback
- Structured Output: Consistent, standardized concept note format
- Performance Monitoring: Comprehensive evaluation and benchmarking
- Cloud-Ready: Production deployment architecture
- Scalable Design: Modular components for easy extension
| Metric | Target | Achieved | Status |
|---|---|---|---|
| Concept Accuracy | >90% | 100% | β Exceeded |
| System Completeness | >95% | 100% | β Exceeded |
| Citation Fidelity | >20% | 30% | β Exceeded |
| Generation Speed | <15s | 10.10s | β Exceeded |
| Integration Success | 100% | 100% | β Achieved |
We, the undersigned team members, hereby attest to the originality and authenticity of the work presented in the AURELIA project:
Swara - Core System Architecture & Backend Development
Nat - Orchestration & AI Integration
Kundana - Frontend Development,Backend Optimization,Cloud Deployment & Documentation
-
Original Work: All code, documentation, and implementation presented in this project represents our original work, developed specifically for this assignment.
-
No Plagiarism: We confirm that no part of this work has been copied from other sources without proper attribution. All external libraries, frameworks, and tools used are properly documented and credited.
-
Individual Contributions: Each team member's contributions are clearly documented and attributed in the project documentation.
- Lab 1: PDF processing pipeline implemented from scratch using hybrid chunking strategies
- Lab 2: AWS MWAA orchestration designed and implemented for cloud deployment
- Lab 3: FastAPI backend service with RAG implementation and instructor integration
- Lab 4: Streamlit frontend developed with custom UI/UX design
- Lab 5: Comprehensive evaluation framework with custom metrics and benchmarking