A graph-based Retrieval-Augmented Generation (GraphRAG) implementation that uses LLMs served by Ollama or through the OpenAI API, together with the Neo4j graph database. This project processes documents, extracts entities and relationships using LLMs, and stores the resulting knowledge graph in Neo4j for advanced question answering.
- Document processing and chunking
- LLM-based knowledge graph extraction
- Fix for handling string target attributes during graph document preprocessing
- Neo4j integration for graph storage
- Vector embeddings for semantic search
- Entity extraction capabilities
- Progress tracking for long-running operations
- Batch processing with progress indicators
- Modern Next.js frontend for interactive visualization and management
If you don't have a Neo4j environment, you can easily set up your own using Docker:
- Create a docker-compose.yml file in the project root with the following content:
    version: '3'
    services:
      neo4j:
        image: neo4j:5.13.0
        container_name: neo4j-graphrag
        ports:
          - "7474:7474" # HTTP
          - "7687:7687" # Bolt
        volumes:
          - ./neo4j/data:/data
          - ./neo4j/logs:/logs
          - ./neo4j/import:/import
          - ./neo4j/plugins:/plugins
        environment:
          - NEO4J_AUTH=neo4j/your_password # Change this password
          - NEO4J_dbms_memory_heap_initial__size=1G
          - NEO4J_dbms_memory_heap_max__size=2G
          - NEO4J_dbms_memory_pagecache_size=1G
          # Enable vector index support
          - NEO4J_dbms_security_procedures_unrestricted=gds.*,apoc.*,vectorize.*
          - NEO4J_dbms_security_procedures_allowlist=gds.*,apoc.*,vectorize.*
          # Install Neo4j plugins (APOC, Graph Data Science, n10s/neosemantics)
          - NEO4J_PLUGINS=["apoc", "graph-data-science", "n10s"]
- Start the Neo4j container:
    docker-compose up -d
- Access the Neo4j Browser at http://localhost:7474 to verify the installation.
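As a complement to opening the Browser, the container can also be checked from Python. The following is a minimal sketch that only tests whether the two ports published in the Compose file (7474 for HTTP, 7687 for Bolt) accept TCP connections. It uses the standard library only and does not authenticate against Neo4j; the helper name port_is_open is an illustration, not part of this project.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports as published in the docker-compose.yml: 7474 (HTTP), 7687 (Bolt).
    for name, port in (("HTTP", 7474), ("Bolt", 7687)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"Neo4j {name} port {port}: {state}")
```

An open port only shows that the container is listening; logging in through the Browser remains the real check that the password and plugins are set up correctly.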
- Python 3.8+
- Ollama with models:
  - qwen2.5 (default LLM model)
  - nomic-embed-text (for embeddings)
- Models supported through the OpenAI API (as an alternative to Ollama)
- Neo4j database instance
- Required Python packages:
  - langchain and langchain_experimental
  - neo4j
  - pydantic
  - tqdm
  - fastapi
  - uvicorn
  - pypdf
We recommend using uv for package management.
- Clone this repository:
    git clone https://github.com/weijunjiang123/GraphRAG-with-Ollama.git
    cd GraphRAG-with-Ollama
- Install the required packages with uv:
    uv sync
- Set up a Neo4j database instance (local or cloud). If you do not have one yet, see the Docker setup above.
- Make sure Ollama is running with the required models:
    ollama pull qwen2.5
    ollama pull nomic-embed-text
  Alternatively, configure an API key in .env.
Copy .env.example to .env:
    cp .env.example .env
Then modify the variables in .env (copied from .env.example) to match your environment.
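Before running the pipeline, it can help to confirm that .env is filled in. The sketch below parses KEY=VALUE lines with the standard library and reports required keys that are missing or empty. The variable names NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD are assumptions made for illustration; the names this project actually reads are the ones in .env.example.

```python
from pathlib import Path
from typing import Dict, List, Sequence


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Hypothetical variable names; replace with the ones from .env.example.
REQUIRED = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_keys(env: Dict[str, str], required: Sequence[str] = REQUIRED) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print("Missing:", missing_keys(env) or "none")
```

This parser is deliberately simple: it does not handle multi-line values, inline comments, or variable expansion.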
Run the main script to process a document and build the knowledge graph:
    uv run main.py
The process includes:
- Loading and processing documents
- Converting documents to graph format
- Saving extracted graph documents
- Initializing Neo4j graph
- Adding graph documents to Neo4j
- Creating vector and fulltext indices
- Setting up entity extraction
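The "Saving extracted graph documents" step can be pictured as a JSON round trip, so that a later run can reload the graph without calling the LLM again. The sketch below is an assumption about the shape of that data, not this project's code: Node, Relationship, and GraphDoc are hypothetical stand-ins for whatever graph document type the pipeline actually produces.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Node:
    id: str
    type: str


@dataclass
class Relationship:
    source: str  # id of the source node
    target: str  # id of the target node
    type: str


@dataclass
class GraphDoc:
    source_text: str
    nodes: List[Node] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


def dump_graph_docs(docs: List[GraphDoc]) -> str:
    """Serialize graph documents to a JSON string."""
    return json.dumps([asdict(doc) for doc in docs], ensure_ascii=False, indent=2)


def load_graph_docs(payload: str) -> List[GraphDoc]:
    """Rebuild graph documents from a JSON string produced by dump_graph_docs."""
    docs = []
    for item in json.loads(payload):
        docs.append(GraphDoc(
            source_text=item["source_text"],
            nodes=[Node(**n) for n in item["nodes"]],
            relationships=[Relationship(**r) for r in item["relationships"]],
        ))
    return docs
```

Writing the string returned by dump_graph_docs to a file and reading it back with load_graph_docs gives the same objects, which makes the extraction step cacheable.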
The frontend is located in the web/ directory and built with Next.js. You can use it for interactive visualization and management of the knowledge graph.
- Enter the frontend directory:
    cd web
- Install dependencies:
    npm install
    # or
    yarn install
    # or
    pnpm install
    # or
    bun install
- Start the development server:
    npm run dev
    # or
    yarn dev
    # or
    pnpm dev
    # or
    bun dev
- Open your browser and visit http://localhost:3000
- Build the frontend static files:
    npm run build
- Start the production server:
    npm start
- Or deploy the .next or out directory to Vercel, Netlify, or any static hosting service.
This project implements a GraphRAG approach:
- Document Processing: Text documents are loaded and split into manageable chunks.
- Knowledge Graph Extraction: An LLM identifies entities and relationships from text.
- Graph Storage: The extracted knowledge is stored in Neo4j as a graph.
- Vector Embeddings: Document chunks are embedded for semantic search.
- Retrieval: When querying, the system can use both graph traversal and vector similarity.
- Entity Extraction: A separate chain extracts entities from arbitrary text.
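The retrieval step above combines two signals. As a toy illustration, the sketch below ranks chunks by cosine similarity and collects the graph edges that touch the entities found in the query. It runs on in-memory data with hand-written vectors; in the real system the vectors would come from the embedding model and the edges from Neo4j. All function names here are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: List[float], chunk_vecs: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(chunk_vecs, key=lambda cid: cosine(query_vec, chunk_vecs[cid]), reverse=True)
    return ranked[:k]


def neighbors(entity: str, edges: List[Edge]) -> List[Edge]:
    """Return every edge that has the entity as source or target."""
    return [edge for edge in edges if entity in (edge[0], edge[2])]


def hybrid_context(query_vec, query_entities, chunk_vecs, edges, k=2):
    """Combine vector hits and one-hop graph facts into a single context dict."""
    facts: List[Edge] = []
    for entity in query_entities:
        for edge in neighbors(entity, edges):
            if edge not in facts:  # keep order, avoid duplicates
                facts.append(edge)
    return {"chunks": top_k_chunks(query_vec, chunk_vecs, k), "facts": facts}
```

The returned dictionary is what would be handed to the LLM as context: the most similar chunks for wording and detail, plus explicit facts from the graph.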
    GraphRAG-with-Ollama/
    ├── .env.example                  # Example environment variables
    ├── .gitignore                    # Specifies intentionally untracked files that Git should ignore
    ├── main.py                       # Main script to run the application
    ├── README.md                     # Documentation for the project
    ├── requirements.txt              # List of Python dependencies
    ├── src/                          # Source code directory
    │   ├── config.py                 # Configuration settings
    │   ├── core/                     # Core logic and modules
    │   │   ├── document_processor.py # Handles document loading and chunking
    │   │   ├── embeddings.py         # Manages embeddings creation and vector index
    │   │   ├── entity_extraction.py  # Extracts entities from text
    │   │   ├── graph_transformer.py  # Converts documents to graph format
    │   │   ├── neo4j_manager.py      # Manages Neo4j database operations
    │   │   └── ...                   # Other core modules
    │   └── utils.py                  # Utility functions
    ├── data/                         # Directory for storing data files
    │   └── ...                       # Documents to be processed
    ├── results/                      # Directory for storing output files
    │   └── ...                       # Extracted graph documents
    ├── web/                          # Next.js frontend project directory
    │   ├── app/                      # Next.js pages and components
    │   ├── public/                   # Static assets
    │   ├── package.json              # Frontend dependencies and scripts
    │   └── ...                       # Other frontend-related files
    └── ...                           # Other directories and files
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
https://github.com/Coding-Crashkurse/GraphRAG-with-Llama-3.1


