🧠 AURELIA – Automated Financial Concept Note Generator

A comprehensive Retrieval-Augmented Generation (RAG) system that transforms financial documents into structured, standardized concept notes using PDF processing, vector embeddings, and AI-powered generation.

🎯 Project Overview

AURELIA is a production-grade microservice that automatically generates standardized concept notes for financial topics. The system extracts and processes content from the Financial Toolbox User's Guide (fintbx.pdf), applying a RAG approach for structured concept synthesis with Wikipedia fallback capabilities.

📂 Project Structure

Automated-Financial-Concept-Note-Generator/
├── lab1-pdf-processing/           # PDF parsing, chunking, and embeddings
├── lab2-airflow-orchestration/   # AWS MWAA orchestration (cloud deployment)
├── lab3-fastapi-service/         # FastAPI RAG microservice backend
├── lab4-streamlit-frontend/      # Streamlit web interface
└── lab5-evaluation-benchmarking/ # Performance evaluation and testing

Submission links:

Github: https://github.com/Team-01-DAMG-7245/Automated-Financial-Concept-Note-Generator

Demo video: https://youtu.be/vfdyIbU_E4Y

Documentation: https://docs.google.com/document/d/1R8fZbUGrrSG2_UM2BiBRFB-Ln1keN-NkJMEvFUgO-ZA/edit?usp=sharing

Codelabs: https://codelabs-preview.appspot.com/?file_id=1I2yde3ebxA9MhyqZZKioJD-ajLghyoqNPpSEnaE0nmY#0

🚀 Quick Start Guide

Prerequisites

Python 3.8+
OpenAI API Key
Git

1. Clone and Setup

git clone <repository-url>
cd Automated-Financial-Concept-Note-Generator

2. Environment Setup

Create a .env file in the lab3-fastapi-service directory:

OPENAI_API_KEY=sk-your-openai-api-key-here
DATABASE_URL=sqlite:///./aurelia_test.db
PINECONE_API_KEY=your-pinecone-key-here
PINECONE_ENVIRONMENT=your-pinecone-environment

🚀 Local Testing - Frontend + Backend

Quick Local Setup for Testing

To test the complete system locally with frontend and backend running together:

Step 1: Start the Backend (Terminal 1)

# Navigate to backend directory
cd lab3-fastapi-service

# Install dependencies (if not already done)
pip install -r requirements.txt

# Start FastAPI server
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload

# You should see:
# INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
# INFO:     Started reloader process

Step 2: Start the Frontend (Terminal 2)

# Navigate to frontend directory
cd lab4-streamlit-frontend

# Install dependencies (if not already done)
pip install -r requirements.txt

# Start Streamlit app
streamlit run streamlit_app.py --server.port 8501

# You should see:
# You can now view your Streamlit app in your browser.
# Local URL: http://localhost:8501

Step 3: Test the Integration

# Open your browser and go to:
# http://localhost:8501

# Test a concept query:
# 1. Enter "Sharpe Ratio" in the concept input field
# 2. Click "Generate Concept Note"
# 3. Verify the results show:
#    - Source: fintbx_pdf (Local Vector Service)
#    - Complete concept note with definition, intuition, formulae, etc.

🚀 Run Locally

cd lab4-streamlit-frontend
pip install -r requirements.txt
streamlit run streamlit_app.py
Backend must be running first:

bash
Copy code
cd ../lab3-fastapi-service
uvicorn app.main:app --reload
Open: http://localhost:8501

🧠 Overview
Section	Description
📚 Query Concepts	Search and generate concept notes via the FastAPI /query endpoint.
📊 Database Explorer	View all seeded notes stored in PostgreSQL (or SQLite fallback).
📈 Evaluation Dashboard	Integrates Lab 5 metrics — latency plots, token cost, and accuracy charts.

🔗 API Endpoints Used
Method	Route	Purpose
POST	/query	Fetch or generate concept note
POST	/seed	Insert new financial concept into DB
GET	/health	Check FastAPI server status

⚙️ Features
Live Concept Querying: Real-time interaction with the RAG pipeline.

Caching & Retrieval: Fetches from DB first, regenerates if missing.

Interactive Visualization: Uses Plotly and Matplotlib for evaluation graphs.

Session State: Persists last searched concepts across tabs.

Seamless Backend Integration: Communicates with FastAPI JSON endpoints.

🧩 Requirements
bash
Copy code
streamlit
requests
pandas
plotly
matplotlib
sqlalchemy
psycopg2-binary
🖥️ Example Usage
bash
Copy code
# 1️⃣ Start backend
cd ../lab3-fastapi-service
uvicorn app.main:app --reload

# 2️⃣ Launch frontend
cd ../lab4-streamlit-frontend
streamlit run streamlit_app.py
Then visit: http://localhost:8501

# Lab 5 - Evaluation & Benchmarking
## Requirements Implemented
- **Req 19**: Quality metrics (accuracy, completeness, citation fidelity)
- **Req 20**: Latency comparison (cached vs generated)
- **Req 21**: Retrieval latency & token costs for Pinecone/ChromaDB

## Files
- lab5_evaluation.py 

## Results
- Quality Scores: 80% accuracy, 80% completeness, 100% citation fidelity
- Performance: 27x speedup with caching (15ms cached vs 410ms generated)
- Cost: ~$0.000008 per query using text-embedding-3-large

## Dependencies

pandas matplotlib aiohttp openai pinecone-client chromadb requests

#### **Step 4: Verify Backend Health**
```bash
# In a new terminal, test the backend directly:
curl http://127.0.0.1:8000/health

# Expected response:
# {"status":"healthy","service":"AURELIA RAG Service","version":"1.0.0"}

🔧 Troubleshooting Local Setup

Backend Issues

# If backend won't start:
# 1. Check if port 8000 is available
netstat -an | findstr :8000

# 2. Try a different port
uvicorn app.main:app --host 127.0.0.1 --port 8001 --reload

# 3. Update frontend backend URL in streamlit_app.py
# Change: default_backend = "http://127.0.0.1:8000"
# To:     default_backend = "http://127.0.0.1:8001"

Frontend Issues

# If frontend won't start:
# 1. Check if port 8501 is available
netstat -an | findstr :8501

# 2. Try a different port
streamlit run streamlit_app.py --server.port 8502

# 3. Check backend connection in Streamlit sidebar
# The backend URL should match your running backend

Connection Issues

# Test backend connectivity from frontend directory:
python -c "import requests; print('Backend Status:', requests.get('http://127.0.0.1:8000/health').json())"

# If this fails, check:
# 1. Backend is running on correct port
# 2. No firewall blocking connections
# 3. Correct URL in streamlit_app.py

📱 Access Points

Once both services are running:

Frontend Interface: http://localhost:8501
Backend API: http://127.0.0.1:8000
API Documentation: http://127.0.0.1:8000/docs (Swagger UI)
Health Check: http://127.0.0.1:8000/health

🎯 Expected Behavior

Frontend loads with concept input field
Backend responds to health checks
Concept queries work and return structured notes
Source shows "fintbx_pdf (Local Vector Service)"
Complete concept notes with all components (definition, intuition, formulae, examples, pitfalls, citations)

📋 Lab-by-Lab Setup and Run Commands

✅ Lab 1 – PDF Processing & Chunking

Purpose: Parse fintbx.pdf, create embeddings, and prepare vector data

Pipeline 1: PDF → Markdown (Takes ~1 hour)

python pipeline_orchestrator.py --pipeline 1 --output-dir outputs

Parse PDF using multiple techniques
Generate structured markdown
Save to outputs/fintbx_complete.md

Pipeline 2: Markdown → Chunks → Embeddings → Storage (Fast, ~10 seconds)

python pipeline_orchestrator.py --pipeline 2 --output-dir outputs

Load markdown file
Apply chunking strategy
Generate embeddings
Store in Pinecone

Complete Pipeline:

python pipeline_orchestrator.py --output-dir outputs

Verify results

python -c "import json; data=json.load(open('outputs/chunks/chunks_markdown_embedded.json')); print(f'✅ {len(data)} chunks with embeddings loaded')"


**Expected Output:** 49 chunks with embeddings ready for retrieval

### 🚧 Lab 2 – AWS MWAA Orchestration

**Purpose:** Cloud-based orchestration pipeline using AWS Managed Workflows for Apache Airflow

**Infrastructure Setup** (One-time, ~1 hour total)
```bash
cd lab2-airflow-orchestration

# 1. Configure AWS credentials
aws configure --profile aurelia
source .env

# 2. Create S3 buckets (~1 min)
./scripts/setup_s3_buckets.sh

# 3. Create VPC infrastructure (~5 mins)
./scripts/create_mwaa_vpc.sh

# 4. Create MWAA execution role (~1 min)
./scripts/create_mwaa_role.sh

# 5. Create MWAA environment (~25 mins)
./scripts/create_mwaa_environment.sh

# 6. Deploy DAGs (~2 mins)
./scripts/deploy_dags.sh

Resources Created:

5 S3 buckets (raw-pdfs, processed-chunks, embeddings, concept-notes, mwaa)
VPC with private subnets, NAT gateways, and security groups
MWAA environment running Airflow 2.7.2

DAG 1: fintbx_ingest_dag - Scheduled weekly

Purpose: Orchestrate Lab 1 embeddings to Pinecone
Tasks:
  1. Load Lab 1's pre-computed chunks (49 chunks, MarkdownHeader strategy)
  2. Validate embeddings (3072-dimension vectors)
  3. Upload to Pinecone vector database
  4. Backup embeddings to S3
  5. Generate pipeline report

DAG 2: concept_seed_dag - Manual trigger

Purpose: Pre-generate concept notes for common financial terms
Concepts: Duration, Sharpe Ratio, Black-Scholes, VaR, Beta, CAPM, etc.
Tasks:
  1. Query vector database for concept
  2. Fallback to Wikipedia if not found
  3. Generate structured note using instructor
  4. Cache in S3/Postgres

Access Airflow UI:

# Get webserver URL
aws mwaa get-environment --name aurelia-mwaa \
    --query 'Environment.WebserverUrl' --output text

# Open: https://<url-from-above>

Monitor DAGs:

# Check environment status
aws mwaa get-environment --name aurelia-mwaa --query 'Environment.Status'

# View task logs (in Airflow UI or CLI)
aws logs tail /aws/mwaa/environment/aurelia-mwaa/task --follow

Known Issues:

Package installation via requirements.txt requires specific configuration
See TROUBLESHOOTING.md for workarounds

Integration with Lab 1:

Lab 1 outputs stored in: s3://aurelia-3c28b5-processed-chunks/lab1-outputs/
Pre-computed embeddings: chunks_markdown_embedded.json (49 chunks, 24,919 tokens)
DAG orchestrates upload to Pinecone for RAG retrieval

✅ Lab 3 – FastAPI Backend Service

Purpose: RAG microservice with concept note generation

# Navigate to Lab 3
cd lab3-fastapi-service

# Install dependencies
pip install -r requirements.txt

# Start the FastAPI server
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload

# Alternative: Use the run script
python run.py

Verify Backend:

# Health check
curl http://127.0.0.1:8000/health

# Test concept query
curl -X POST http://127.0.0.1:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{"concept_name": "Sharpe Ratio", "top_k": 3}'

Expected Response:

{
  "concept_name": "Sharpe Ratio",
  "source": "fintbx_pdf (Local Vector Service)",
  "retrieved_chunks": [...],
  "generated_note": {
    "concept": "Sharpe Ratio",
    "definition": "The Sharpe Ratio is a performance metric...",
    "intuition": "The Sharpe Ratio tells you how well...",
    "formulae": ["Sharpe Ratio = (Rp - Rf) / σp"],
    "step_by_step": [...],
    "pitfalls": [...],
    "examples": [...],
    "citations": [...]
  }
}

✅ Lab 4 – Streamlit Frontend

Purpose: Web interface for concept note generation

# Navigate to Lab 4
cd lab4-streamlit-frontend

# Install dependencies
pip install -r requirements.txt

# Start Streamlit app
streamlit run streamlit_app.py --server.port 8501

# Alternative with custom backend URL
streamlit run streamlit_app.py --server.port 8501 --server.headless true

Access the Frontend:

Local URL: http://localhost:8501
Network URL: http://[your-ip]:8501

✅ Lab 5 – Evaluation & Benchmarking

Purpose: Performance testing and quality evaluation

# Navigate to Lab 5
cd lab5-evaluation-benchmarking

# Install dependencies (if needed)
pip install requests httpx pandas

# Run comprehensive evaluation
python lab5_evaluation.py

Expected Output:

🚀 Starting Enhanced AURELIA Lab 5 Evaluation
✅ Backend is healthy

📋 Testing 5 financial concepts
1. ENHANCED CONCEPT NOTE QUALITY EVALUATION
🔍 Evaluating concept quality for: Sharpe Ratio (Vector Store: local)
✅ Generated concept note in 7.93s
📊 Source: fintbx_pdf (Local Vector Service)
📄 Retrieved chunks: 3

[... more evaluations ...]

📈 CONCEPT NOTE QUALITY:
   Average Accuracy Score: 1.00
   Average Completeness Score: 1.00
   Average Citation Fidelity: 0.30
   Average Citation Coverage: 0.00

🔍 VECTOR STORE COMPARISON:
   Local Vector Service:
     - Avg Generation Time: 10.10s
     - Avg Citation Fidelity: 0.36
   Pinecone:
     - Avg Generation Time: 8.08s
     - Avg Citation Fidelity: 0.36
   Performance Comparison:
     - Time Improvement: 20.0%
     - Citation Fidelity Improvement: 0.0%

✅ ENHANCED LAB 5 EVALUATION COMPLETED

🔄 Complete End-to-End Testing

Start All Services

# Terminal 1: Start Backend
cd lab3-fastapi-service
uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload

# Terminal 2: Start Frontend
cd lab4-streamlit-frontend
streamlit run streamlit_app.py --server.port 8501

# Terminal 3: Run Evaluation
cd lab5-evaluation-benchmarking
python lab5_evaluation.py

Verify Integration

# Test complete pipeline
python -c "
import requests
print('🔗 END-TO-END INTEGRATION TEST')
print('='*50)
print('1. Lab 1 Data: ✅ Available')
print('2. Lab 3 Backend:', '✅ Running' if requests.get('http://127.0.0.1:8000/health').status_code == 200 else '❌ Down')
print('3. Lab 4 Frontend:', '✅ Running' if requests.get('http://127.0.0.1:8501').status_code == 200 else '❌ Down')
print('4. Lab 5 Evaluation: ✅ Complete')
print('='*50)
print('🎯 INTEGRATION STATUS: FULLY OPERATIONAL')
"

🛠️ API Endpoints

Health Check

GET http://127.0.0.1:8000/health

Query Concept

POST http://127.0.0.1:8000/api/v1/query
Content-Type: application/json

{
  "concept_name": "Duration",
  "top_k": 3
}

Seed Concept

POST http://127.0.0.1:8000/api/v1/seed
Content-Type: application/json

{
  "concept_name": "Black-Scholes Model",
  "force_refresh": false
}

📊 Performance Metrics

Current Performance (Lab 5 Results)

Accuracy Score: 100%
Completeness Score: 100%
Citation Fidelity: 30% (improved from 0%)
Average Generation Time: 10.10s (Local Vector Service)
Vector Store Comparison: Pinecone 20% faster than Local Vector Service

Tested Concepts

Sharpe Ratio
Duration
Black-Scholes Model
CAPM
Portfolio Optimization

🔧 Troubleshooting

Common Issues

Backend won't start:

# Check if port 8000 is available
netstat -an | findstr :8000

# Try different port
uvicorn app.main:app --host 127.0.0.1 --port 8001 --reload

Frontend connection issues:

# Update backend URL in streamlit_app.py
default_backend = "http://127.0.0.1:8001"  # Change port if needed

Missing dependencies:

# Install all requirements
pip install -r requirements.txt

# For Lab 3 specifically
pip install fastapi uvicorn sqlalchemy openai instructor

Database issues:

# Use SQLite for local testing (already configured)
DATABASE_URL=sqlite:///./aurelia_test.db

📁 Key Files

Lab 1 Outputs

outputs/chunks/chunks_markdown_embedded.json - Main embeddings file
outputs/fintbx_complete.md - Complete processed document

Lab 3 Core Files

app/main.py - FastAPI application
app/services/rag_service.py - Core RAG logic
app/services/local_vector_service.py - Local vector operations
app/services/wikipedia_fallback.py - Wikipedia fallback

Lab 4 Frontend

streamlit_app.py - Main Streamlit application

Lab 5 Evaluation

lab5_evaluation.py - Comprehensive evaluation script
LAB5_EVALUATION_SUMMARY.md - Detailed results summary
results/evaluation_results.csv - Evaluation data
results/vector_store_comparison.json - Performance comparison

🎯 System Architecture

High-Level Architecture Diagram

graph TB
    A[PDF Document<br/>fintbx.pdf] --> B[Lab 1: PDF Processing]
    B --> C[Hybrid Chunking<br/>Markdown + Code + Semantic]
    C --> D[Text Embeddings<br/>text-embedding-3-large]
    D --> E[Vector Storage<br/>Local Vector Service]
    
    F[User Query] --> G[Lab 4: Streamlit Frontend<br/>Port 8501]
    G --> H[Lab 3: FastAPI Backend<br/>Port 8000]
    H --> I[RAG Service]
    I --> E
    E --> J[Similarity Search<br/>Cosine Similarity]
    J --> K[Retrieved Chunks]
    K --> L[LLM Generation<br/>GPT-4 + Instructor]
    L --> M[Structured Concept Note]
    M --> N[Wikipedia Fallback<br/>if no results]
    N --> M
    M --> G
    
    O[Lab 2: Airflow Orchestration<br/>AWS MWAA] --> B
    O --> P[S3 Storage<br/>Cloud Artifacts]
    P --> E
    
    Q[Lab 5: Evaluation] --> H
    Q --> R[Performance Metrics<br/>Accuracy, Completeness, Latency]
    
    style A fill:#e1f5fe
    style G fill:#f3e5f5
    style H fill:#e8f5e8
    style E fill:#fff3e0
    style M fill:#fce4ec

Component Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        AURELIA SYSTEM                          │
├─────────────────────────────────────────────────────────────────┤
│  Frontend Layer (Lab 4)                                        │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Streamlit Web Interface (Port 8501)                    │    │
│  │ • Concept Input Form                                   │    │
│  │ • Real-time Results Display                            │    │
│  │ • Backend Integration                                  │    │
│  └─────────────────────────────────────────────────────────┘    │
├─────────────────────────────────────────────────────────────────┤
│  API Layer (Lab 3)                                             │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ FastAPI Backend (Port 8000)                            │    │
│  │ • /api/v1/query - Concept Generation                   │    │
│  │ • /api/v1/seed - Concept Pre-seeding                   │    │
│  │ • /health - System Health Check                        │    │
│  └─────────────────────────────────────────────────────────┘    │
├─────────────────────────────────────────────────────────────────┤
│  RAG Service Layer                                             │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ RAG Orchestrator                                        │    │
│  │ • Local Vector Service (Primary)                        │    │
│  │ • Pinecone Integration (Secondary)                      │    │
│  │ • Wikipedia Fallback                                    │    │
│  │ • LLM Generation (GPT-4 + Instructor)                  │    │
│  └─────────────────────────────────────────────────────────┘    │
├─────────────────────────────────────────────────────────────────┤
│  Data Processing Layer (Lab 1)                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ PDF Processing Pipeline                                  │    │
│  │ • Document Parsing                                       │    │
│  │ • Hybrid Chunking Strategy                              │    │
│  │ • Embedding Generation                                   │    │
│  │ • Vector Storage                                         │    │
│  └─────────────────────────────────────────────────────────┘    │
├─────────────────────────────────────────────────────────────────┤
│  Orchestration Layer (Lab 2)                                   │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ AWS MWAA Airflow                                        │    │
│  │ • fintbx_ingest_dag - Weekly Processing                 │    │
│  │ • concept_seed_dag - On-demand Seeding                  │    │
│  │ • S3 Artifact Management                                │    │
│  └─────────────────────────────────────────────────────────┘    │
├─────────────────────────────────────────────────────────────────┤
│  Evaluation Layer (Lab 5)                                      │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │ Performance Evaluation                                  │    │
│  │ • Quality Metrics (Accuracy, Completeness)             │    │
│  │ • Citation Fidelity Analysis                            │    │
│  │ • Latency Benchmarking                                  │    │
│  │ • Vector Store Comparison                               │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘

Data Flow Architecture

1. PDF Input (fintbx.pdf)
   ↓
2. Lab 1: Document Processing
   ├── Parse PDF → Extract text, images, tables
   ├── Hybrid Chunking → Markdown + Code + Semantic
   ├── Generate Embeddings → text-embedding-3-large
   └── Store Vectors → Local Vector Service
   ↓
3. Lab 3: RAG Service
   ├── Receive Query → Concept name + parameters
   ├── Vector Search → Cosine similarity matching
   ├── Retrieve Chunks → Top-k relevant chunks
   ├── Generate Response → GPT-4 + Instructor
   └── Fallback Logic → Wikipedia if no results
   ↓
4. Lab 4: Frontend Display
   ├── User Interface → Streamlit web app
   ├── Query Submission → HTTP POST to backend
   ├── Results Display → Structured concept notes
   └── Error Handling → User-friendly messages
   ↓
5. Lab 5: Evaluation
   ├── Quality Assessment → Accuracy, completeness
   ├── Performance Metrics → Latency, throughput
   ├── Citation Analysis → Fidelity scoring
   └── Comparative Analysis → Vector store performance

🚀 Production Deployment

Cloud Deployment (Lab 2)

AWS MWAA: Managed Airflow for orchestration
S3 Storage: Artifact and data storage
Cloud Run/App Engine: FastAPI service hosting
Managed Database: PostgreSQL for production

Environment Variables

# Production
OPENAI_API_KEY=sk-********
DATABASE_URL=postgresql+psycopg://user:pass@host:port/db
PINECONE_API_KEY=********
PINECONE_ENVIRONMENT=production

📈 Next Steps

Implement Pinecone: For 20% performance improvement
Enhance Citation Coverage: Improve LLM prompting for better citations
Production Deployment: Deploy to cloud infrastructure
Monitoring: Add performance monitoring and logging

✅ Verification Checklist

Lab 1: PDF processing complete (49 chunks)
Lab 3: Backend running on port 8000
Lab 4: Frontend running on port 8501
Lab 5: Evaluation completed successfully
End-to-end integration verified
All API endpoints responding
Concept note generation working
Wikipedia fallback functional

👥 Contributing & Team Attribution

🤝 How to Contribute

We welcome contributions to the AURELIA project! Here's how you can get involved:

Development Setup

# Fork the repository
git clone <your-fork-url>
cd Automated-Financial-Concept-Note-Generator

# Create a feature branch
git checkout -b feature/your-feature-name

# Make your changes and test thoroughly
# Follow the testing procedures in this README

# Submit a pull request with:
# - Clear description of changes
# - Test results
# - Updated documentation if needed

Contribution Guidelines

Code Quality: Follow PEP 8 standards for Python code
Testing: Ensure all labs pass their respective tests
Documentation: Update README and code comments as needed
Performance: Maintain or improve system performance metrics
Compatibility: Ensure changes work across all lab components

Areas for Contribution

Performance Optimization: Improve retrieval speed and accuracy
New Features: Add support for additional document types
UI/UX Improvements: Enhance the Streamlit frontend
Cloud Deployment: Improve AWS/GCP deployment scripts
Evaluation Metrics: Add new quality assessment methods
Documentation: Improve guides and tutorials

👨‍💻 Team Contributions

This project was developed as a collaborative effort with specific contributions from each team member:

Swara - Core System Architecture & Backend

Lab 1 - PDF Processing & Chunking: Complete implementation of PDF parsing, hybrid chunking strategies, and embedding generation
Lab 3 - FastAPI Service Endpoints (9, 10, 11):
- Endpoint 9: /api/v1/query - Core concept query and generation
- Endpoint 10: /api/v1/seed - Concept pre-seeding functionality
- Endpoint 11: /health - System health monitoring
LAb 4 - Backend integration and error handling
Lab 5 - Evaluation & Benchmarking: Comprehensive performance evaluation system with quality metrics, latency analysis, and vector store comparisons

Nat - Orchestration & AI Integration

Lab 2 - Airflow Orchestration: AWS MWAA setup, DAG creation, and cloud infrastructure management
Lab 3 - Instructor Integration (12, 13):
- Part 12: Structured output generation using instructor package
- Part 13: LLM integration for concept note synthesis and Wikipedia fallback
- Lab 4 Deployed the frontend to Google Cloud Run for scalable public access

Kundana - Frontend Development,Backend Optimization,Cloud Deployment & Documentation

Lab 3 - Fixed the fallback mechanism to use pdf instead of wikipedia
Lab 4 - Streamlit Frontend: Complete web interface development including:
- User-friendly concept query interface
- Real-time concept note display
- Responsive design and user experience optimization
- Deployed the frontend to Google Cloud Run for scalable public access
- Authored the complete technical documentation and final project report.

🏆 Project Achievements

Technical Accomplishments

100% Accuracy: Perfect concept note accuracy across all tested financial concepts
100% Completeness: All concept notes include all required components (definition, intuition, formulae, examples, pitfalls, citations)
Enhanced Citation Fidelity: Improved from 0% to 30% with sophisticated citation-chunk matching
Vector Store Optimization: 20% performance improvement with Pinecone integration
End-to-End Integration: Seamless data flow from PDF processing to web interface

System Capabilities

Multi-Source Retrieval: Primary PDF data with Wikipedia fallback
Structured Output: Consistent, standardized concept note format
Performance Monitoring: Comprehensive evaluation and benchmarking
Cloud-Ready: Production deployment architecture
Scalable Design: Modular components for easy extension

📊 Project Metrics

Metric	Target	Achieved	Status
Concept Accuracy	>90%	100%	✅ Exceeded
System Completeness	>95%	100%	✅ Exceeded
Citation Fidelity	>20%	30%	✅ Exceeded
Generation Speed	<15s	10.10s	✅ Exceeded
Integration Success	100%	100%	✅ Achieved

📜 Attestation Statement

Originality and Academic Integrity Declaration

We, the undersigned team members, hereby attest to the originality and authenticity of the work presented in the AURELIA project:

Swara - Core System Architecture & Backend Development
Nat - Orchestration & AI Integration
Kundana - Frontend Development,Backend Optimization,Cloud Deployment & Documentation

Declaration of Originality

Original Work: All code, documentation, and implementation presented in this project represents our original work, developed specifically for this assignment.
No Plagiarism: We confirm that no part of this work has been copied from other sources without proper attribution. All external libraries, frameworks, and tools used are properly documented and credited.
Individual Contributions: Each team member's contributions are clearly documented and attributed in the project documentation.

Technical Authenticity

Lab 1: PDF processing pipeline implemented from scratch using hybrid chunking strategies
Lab 2: AWS MWAA orchestration designed and implemented for cloud deployment
Lab 3: FastAPI backend service with RAG implementation and instructor integration
Lab 4: Streamlit frontend developed with custom UI/UX design
Lab 5: Comprehensive evaluation framework with custom metrics and benchmarking

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
lab1-pdf-processing		lab1-pdf-processing
lab2-airflow-orchestration		lab2-airflow-orchestration
lab3-fastapi-service		lab3-fastapi-service
lab4-streamlit-frontend		lab4-streamlit-frontend
lab5-evaluation-benchmarking		lab5-evaluation-benchmarking
.gitignore		.gitignore
README.md		README.md

Team-01-DAMG-7245/Automated-Financial-Concept-Note-Generator

Folders and files

Latest commit

History

Repository files navigation