https://github.com/HKUDS/RAG-Anything

# Create the virtual environment
conda create -n raganything python=3.10

# Activate the virtual environment
conda activate raganything

<div align="center">

<div style="margin: 20px 0;">
<img src="./assets/logo.png" width="120" height="120" alt="RAG-Anything Logo" style="border-radius: 20px; box-shadow: 0 8px 32px rgba(0, 217, 255, 0.3);">
</div>

# 🚀 RAG-Anything: All-in-One RAG System

<div align="center">
<img src="https://readme-typing-svg.herokuapp.com?font=Orbitron&size=24&duration=3000&pause=1000&color=00D9FF&center=true&vCenter=true&width=600&lines=Welcome+to+RAG-Anything;Next-Gen+Multimodal+RAG+System;Powered+by+Advanced+AI+Technology" alt="Typing Animation" />
</div>

<div align="center">
<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 15px; padding: 25px; text-align: center;">
<p>
<a href='https://github.com/HKUDS/RAG-Anything'><img src='https://img.shields.io/badge/🔥Project-Page-00d9ff?style=for-the-badge&logo=github&logoColor=white&labelColor=1a1a2e'></a>
<a href='https://arxiv.org/abs/2410.05779'><img src='https://img.shields.io/badge/📄arXiv-2410.05779-ff6b6b?style=for-the-badge&logo=arxiv&logoColor=white&labelColor=1a1a2e'></a>
<a href='https://github.com/HKUDS/LightRAG'><img src='https://img.shields.io/badge/⚡Based%20on-LightRAG-4ecdc4?style=for-the-badge&logo=lightning&logoColor=white&labelColor=1a1a2e'></a>
</p>
<p>
<a href="https://github.com/HKUDS/RAG-Anything/stargazers"><img src='https://img.shields.io/github/stars/HKUDS/RAG-Anything?color=00d9ff&style=for-the-badge&logo=star&logoColor=white&labelColor=1a1a2e' /></a>
<img src="https://img.shields.io/badge/🐍Python-3.9+-4ecdc4?style=for-the-badge&logo=python&logoColor=white&labelColor=1a1a2e">
<a href="https://pypi.org/project/raganything/"><img src="https://img.shields.io/pypi/v/raganything.svg?style=for-the-badge&logo=pypi&logoColor=white&labelColor=1a1a2e&color=ff6b6b"></a>
</p>
<p>
<a href="https://discord.gg/yF2MmDJyGJ"><img src="https://img.shields.io/badge/💬Discord-Community-7289da?style=for-the-badge&logo=discord&logoColor=white&labelColor=1a1a2e"></a>
<a href="https://github.com/HKUDS/RAG-Anything/issues/7"><img src="https://img.shields.io/badge/💬WeChat-Group-07c160?style=for-the-badge&logo=wechat&logoColor=white&labelColor=1a1a2e"></a>
</p>
<p>
<a href="README_zh.md"><img src="https://img.shields.io/badge/🇨🇳中文版-1a1a2e?style=for-the-badge"></a>
<a href="README.md"><img src="https://img.shields.io/badge/🇺🇸English-1a1a2e?style=for-the-badge"></a>
</p>
</div>
</div>

</div>

<div align="center">
<div style="width: 100%; height: 2px; margin: 20px 0; background: linear-gradient(90deg, transparent, #00d9ff, transparent);"></div>
</div>

<div align="center">
<a href="#-quick-start" style="text-decoration: none;">
<img src="https://img.shields.io/badge/Quick%20Start-Get%20Started%20Now-00d9ff?style=for-the-badge&logo=rocket&logoColor=white&labelColor=1a1a2e">
</a>
</div>

---

## 🎉 News

- [X] [2025.07.04]🎯📢 RAGAnything now supports queries with multimodal content, enabling enhanced retrieval-augmented generation with integrated processing of text, images, tables, and equations.
- [X] [2025.07.03]🎯📢 RAGAnything has reached 1K🌟 stars on GitHub! Thank you for your support and contributions.

---

## 🌟 System Overview

*Next-Generation Multimodal Intelligence*

<div style="background: linear-gradient(135deg, #1a1a2e 0%, #16213e 50%, #0f3460 100%); border-radius: 15px; padding: 25px; margin: 20px 0; border: 2px solid #00d9ff; box-shadow: 0 0 30px rgba(0, 217, 255, 0.3);">

Modern documents increasingly contain diverse multimodal content—text, images, tables, equations, charts, and multimedia—that traditional text-focused RAG systems cannot effectively process. **RAG-Anything** addresses this challenge as a comprehensive **All-in-One Multimodal Document Processing RAG system** built on [LightRAG](https://github.com/HKUDS/LightRAG).

As a unified solution, RAG-Anything **eliminates the need for multiple specialized tools**. It provides **seamless processing and querying across all content modalities** within a single integrated framework. Unlike conventional RAG approaches that struggle with non-textual elements, our all-in-one system delivers **comprehensive multimodal retrieval capabilities**.

Users can query documents containing **interleaved text**, **visual diagrams**, **structured tables**, and **mathematical formulations** through **one cohesive interface**. This consolidated approach makes RAG-Anything particularly valuable for academic research, technical documentation, financial reports, and enterprise knowledge management, where rich, mixed-content documents demand a **unified processing framework**.

<img src="assets/rag_anything_framework.png" alt="RAG-Anything" />

</div>

### 🎯 Key Features

<div style="background: linear-gradient(135deg, #1a1a2e 0%, #16213e 100%); border-radius: 15px; padding: 25px; margin: 20px 0;">

- **🔄 End-to-End Multimodal Pipeline** - Complete workflow from document ingestion and parsing to intelligent multimodal query answering
- **📄 Universal Document Support** - Seamless processing of PDFs, Office documents, images, and diverse file formats
- **🧠 Specialized Content Analysis** - Dedicated processors for images, tables, mathematical equations, and heterogeneous content types
- **🔗 Multimodal Knowledge Graph** - Automatic entity extraction and cross-modal relationship discovery for enhanced understanding
- **⚡ Adaptive Processing Modes** - Flexible MinerU-based parsing or direct multimodal content injection workflows
- **🎯 Hybrid Intelligent Retrieval** - Advanced search capabilities spanning textual and multimodal content with contextual understanding

</div>

---

## 🏗️ Algorithm & Architecture

<div style="background: linear-gradient(135deg, #0f0f23 0%, #1a1a2e 100%); border-radius: 15px; padding: 25px; margin: 20px 0; border-left: 5px solid #00d9ff;">

### Core Algorithm

**RAG-Anything** implements an effective **multi-stage multimodal pipeline** that fundamentally extends traditional RAG architectures to seamlessly handle diverse content modalities through intelligent orchestration and cross-modal understanding.

</div>

<div align="center">
<div style="width: 100%; max-width: 600px; margin: 20px auto; padding: 20px; background: linear-gradient(135deg, rgba(0, 217, 255, 0.1) 0%, rgba(0, 217, 255, 0.05) 100%); border-radius: 15px; border: 1px solid rgba(0, 217, 255, 0.2);">
<div style="display: flex; justify-content: space-around; align-items: center; flex-wrap: wrap; gap: 20px;">
<div style="text-align: center;">
<div style="font-size: 24px; margin-bottom: 10px;">📄</div>
<div style="font-size: 14px; color: #00d9ff;">Document Parsing</div>
</div>
<div style="font-size: 20px; color: #00d9ff;">→</div>
<div style="text-align: center;">
<div style="font-size: 24px; margin-bottom: 10px;">🧠</div>
<div style="font-size: 14px; color: #00d9ff;">Content Analysis</div>
</div>
<div style="font-size: 20px; color: #00d9ff;">→</div>
<div style="text-align: center;">
<div style="font-size: 24px; margin-bottom: 10px;">🔍</div>
<div style="font-size: 14px; color: #00d9ff;">Knowledge Graph</div>
</div>
<div style="font-size: 20px; color: #00d9ff;">→</div>
<div style="text-align: center;">
<div style="font-size: 24px; margin-bottom: 10px;">🎯</div>
<div style="font-size: 14px; color: #00d9ff;">Intelligent Retrieval</div>
</div>
</div>
</div>
</div>

### 1. Document Parsing Stage

<div style="background: linear-gradient(90deg, #1a1a2e 0%, #16213e 100%); border-radius: 10px; padding: 20px; margin: 15px 0; border-left: 4px solid #4ecdc4;">

The system provides high-fidelity document extraction through adaptive content decomposition. It intelligently segments heterogeneous elements while preserving contextual relationships. Universal format compatibility is achieved via specialized, format-optimized parsers.

**Key Components:**

- **⚙️ MinerU Integration**: Leverages [MinerU](https://github.com/opendatalab/MinerU) for high-fidelity document structure extraction and semantic preservation across complex layouts.

- **🧩 Adaptive Content Decomposition**: Automatically segments documents into coherent text blocks, visual elements, structured tables, mathematical equations, and specialized content types while preserving contextual relationships.

- **📁 Universal Format Support**: Provides comprehensive handling of PDFs, Office documents (DOC/DOCX/PPT/PPTX/XLS/XLSX), images, and emerging formats through specialized parsers with format-specific optimization.

</div>
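
To see what this stage boils down to, here is a minimal sketch that drives the MinerU CLI by hand on a single PDF, using the same flags documented in the [Configuration](#-configuration) section below. RAG-Anything runs this step for you inside `process_document_complete`; the helper name `parse_with_mineru` is purely illustrative.

```python
# Minimal sketch of stage 1: invoke the MinerU CLI directly.
# RAG-Anything wraps this internally; `parse_with_mineru` is illustrative.
import subprocess
from pathlib import Path

def parse_with_mineru(doc_path: str, output_dir: str = "./output") -> Path:
    """Run MinerU's automatic parsing mode on one document."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["mineru", "-p", doc_path, "-o", str(out), "-m", "auto"],
        check=True,  # raise CalledProcessError if parsing fails
    )
    # MinerU writes markdown, extracted images, and layout metadata here
    return out

parsed_dir = parse_with_mineru("path/to/your/document.pdf")
print(f"Parsed artifacts written to {parsed_dir}")
```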

### 2. Multi-Modal Content Understanding & Processing

<div style="background: linear-gradient(90deg, #16213e 0%, #0f3460 100%); border-radius: 10px; padding: 20px; margin: 15px 0; border-left: 4px solid #ff6b6b;">

The system automatically categorizes and routes content through optimized channels. It uses concurrent pipelines for parallel text and multimodal processing. Document hierarchy and relationships are preserved during transformation.

**Key Components:**

- **🎯 Autonomous Content Categorization and Routing**: Automatically identifies, categorizes, and routes different content types through optimized execution channels.

- **⚡ Concurrent Multi-Pipeline Architecture**: Implements concurrent execution of textual and multimodal content through dedicated processing pipelines. This approach maximizes throughput efficiency while preserving content integrity.

- **🏗️ Document Hierarchy Extraction**: Extracts and preserves original document hierarchy and inter-element relationships during content transformation.

</div>
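
To make the routing idea concrete, the toy sketch below dispatches parsed blocks to per-modality handlers and runs them concurrently. The dispatch table and handler names are hypothetical illustrations, not RAG-Anything's internal API.

```python
# Toy sketch of content categorization and routing; the handlers and
# ROUTES table are hypothetical, not RAG-Anything internals.
import asyncio

async def handle_text(block: dict) -> str:
    return f"text chunk indexed: {block['text'][:40]}..."

async def handle_image(block: dict) -> str:
    return f"image {block['img_path']} queued for vision analysis"

async def handle_table(block: dict) -> str:
    return f"table '{block['table_caption']}' queued for interpretation"

ROUTES = {"text": handle_text, "image": handle_image, "table": handle_table}

async def route_blocks(blocks: list) -> list:
    # Concurrent execution across modalities, mirroring the
    # multi-pipeline design described above.
    return await asyncio.gather(*(ROUTES[b["type"]](b) for b in blocks))

blocks = [
    {"type": "text", "text": "Modern documents increasingly contain..."},
    {"type": "image", "img_path": "figures/fig1.png"},
    {"type": "table", "table_caption": "Performance Comparison"},
]
print(asyncio.run(route_blocks(blocks)))
```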

### 3. Multimodal Analysis Engine

<div style="background: linear-gradient(90deg, #0f3460 0%, #1a1a2e 100%); border-radius: 10px; padding: 20px; margin: 15px 0; border-left: 4px solid #00d9ff;">

The system deploys modality-aware processing units for heterogeneous data modalities:

**Specialized Analyzers:**

- **🔍 Visual Content Analyzer**:
  - Integrates vision models for image analysis.
  - Generates context-aware descriptive captions based on visual semantics.
  - Extracts spatial relationships and hierarchical structures between visual elements.

- **📊 Structured Data Interpreter**:
  - Performs systematic interpretation of tabular and structured data formats.
  - Implements statistical pattern recognition algorithms for data trend analysis.
  - Identifies semantic relationships and dependencies across multiple tabular datasets.

- **📐 Mathematical Expression Parser**:
  - Parses complex mathematical expressions and formulas with high accuracy.
  - Provides native LaTeX format support for seamless integration with academic workflows.
  - Establishes conceptual mappings between mathematical equations and domain-specific knowledge bases.

- **🔧 Extensible Modality Handler**:
  - Provides a configurable processing framework for custom and emerging content types.
  - Enables dynamic integration of new modality processors through a plugin architecture.
  - Supports runtime configuration of processing pipelines for specialized use cases.

</div>

### 4. Multimodal Knowledge Graph Index

<div style="background: linear-gradient(90deg, #1a1a2e 0%, #16213e 100%); border-radius: 10px; padding: 20px; margin: 15px 0; border-left: 4px solid #4ecdc4;">

The multi-modal knowledge graph construction module transforms document content into structured semantic representations. It extracts multimodal entities, establishes cross-modal relationships, and preserves hierarchical organization. The system applies weighted relevance scoring for optimized knowledge retrieval.

**Core Functions:**

- **🔍 Multi-Modal Entity Extraction**: Transforms significant multimodal elements into structured knowledge graph entities. The process includes semantic annotations and metadata preservation.

- **🔗 Cross-Modal Relationship Mapping**: Establishes semantic connections and dependencies between textual entities and multimodal components. This is achieved through automated relationship inference algorithms.

- **🏗️ Hierarchical Structure Preservation**: Maintains original document organization through "belongs_to" relationship chains. These chains preserve logical content hierarchy and sectional dependencies.

- **⚖️ Weighted Relationship Scoring**: Assigns quantitative relevance scores to relationship types. Scoring is based on semantic proximity and contextual significance within the document structure.

</div>
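
For intuition, the schematic below models multimodal entities and weighted, typed relations (including the `belongs_to` chains mentioned above) as plain dataclasses. The field names are illustrative only; LightRAG's actual storage schema differs.

```python
# Schematic data model for the multimodal knowledge graph.
# Field names are illustrative; not LightRAG's real storage schema.
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    modality: str         # "text", "image", "table", "equation"
    description: str      # model-generated summary of the element
    source_file: str
    metadata: dict = field(default_factory=dict)

@dataclass
class Relation:
    source: str
    target: str
    rel_type: str         # e.g. "belongs_to", "illustrates"
    weight: float         # relevance score from semantic proximity

figure = Entity("Experimental Results Figure", "image",
                "Bar chart comparing accuracy across methods", "paper.pdf")
section = Entity("Section 4: Evaluation", "text",
                 "Evaluation setup and results discussion", "paper.pdf")

# Hierarchical chain: the figure belongs to the section it appears in.
edges = [Relation(figure.name, section.name, "belongs_to", weight=0.9)]
print(edges[0])
```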

### 5. Modality-Aware Retrieval

<div style="background: linear-gradient(90deg, #16213e 0%, #0f3460 100%); border-radius: 10px; padding: 20px; margin: 15px 0; border-left: 4px solid #ff6b6b;">

The hybrid retrieval system combines vector similarity search with graph traversal algorithms for comprehensive content retrieval. It implements modality-aware ranking mechanisms and maintains relational coherence between retrieved elements to ensure contextually integrated information delivery.

**Retrieval Mechanisms:**

- **🔀 Vector-Graph Fusion**: Integrates vector similarity search with graph traversal algorithms. This approach leverages both semantic embeddings and structural relationships for comprehensive content retrieval.

- **📊 Modality-Aware Ranking**: Implements adaptive scoring mechanisms that weight retrieval results based on content type relevance. The system adjusts rankings according to query-specific modality preferences.

- **🔗 Relational Coherence Maintenance**: Maintains semantic and structural relationships between retrieved elements. This ensures coherent information delivery and contextual integrity.

</div>
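
The fusion step can be pictured as blending two signals per candidate — embedding similarity and graph proximity — then reweighting by modality. The formula and weights below are illustrative defaults for intuition, not values used by the system.

```python
# Illustrative vector-graph fusion scoring; the weights are invented
# for demonstration and are not RAG-Anything's tuned values.
MODALITY_BOOST = {"text": 1.0, "image": 1.05, "table": 1.1, "equation": 1.1}

def fused_score(vector_sim: float, graph_proximity: float,
                modality: str, alpha: float = 0.6) -> float:
    """Blend semantic similarity with structural closeness, then apply
    a query-dependent modality preference."""
    base = alpha * vector_sim + (1 - alpha) * graph_proximity
    return base * MODALITY_BOOST.get(modality, 1.0)

candidates = [
    ("Performance Results Table", 0.82, 0.70, "table"),
    ("Experimental Results Figure", 0.78, 0.75, "image"),
]
ranked = sorted(candidates, key=lambda c: fused_score(*c[1:]), reverse=True)
for name, *_ in ranked:
    print(name)
```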

---

## 🚀 Quick Start

*Initialize Your AI Journey*

<div align="center">
<img src="https://user-images.githubusercontent.com/74038190/212284158-e840e285-664b-44d7-b79b-e264b5e54825.gif" width="400">
</div>

### Installation

#### Option 1: Install from PyPI (Recommended)

```bash
# Basic installation
pip install raganything

# With optional dependencies for extended format support:
pip install 'raganything[all]'         # All optional features
pip install 'raganything[image]'       # Image format conversion (BMP, TIFF, GIF, WebP)
pip install 'raganything[text]'        # Text file processing (TXT, MD)
pip install 'raganything[image,text]'  # Multiple features
```

#### Option 2: Install from Source

```bash
git clone https://github.com/HKUDS/RAG-Anything.git
cd RAG-Anything
pip install -e .

# With optional dependencies
pip install -e '.[all]'
```

#### Optional Dependencies

- **`[image]`** - Enables processing of BMP, TIFF, GIF, WebP image formats (requires Pillow)
- **`[text]`** - Enables processing of TXT and MD files (requires ReportLab)
- **`[all]`** - Includes all Python optional dependencies

> **⚠️ Office Document Processing Requirements:**
> - Office documents (.doc, .docx, .ppt, .pptx, .xls, .xlsx) require **LibreOffice** installation
> - Download from [LibreOffice official website](https://www.libreoffice.org/download/download/)
> - **Windows**: Download installer from official website
> - **macOS**: `brew install --cask libreoffice`
> - **Ubuntu/Debian**: `sudo apt-get install libreoffice`
> - **CentOS/RHEL**: `sudo yum install libreoffice`

**Check MinerU installation:**

```bash
# Verify installation
mineru --version

# Check if properly configured
python -c "from raganything import RAGAnything; rag = RAGAnything(); print('✅ MinerU installed properly' if rag.check_mineru_installation() else '❌ MinerU installation issue')"
```

Models are downloaded automatically on first use. For manual download, refer to [MinerU Model Source Configuration](https://github.com/opendatalab/MinerU/blob/master/README.md#22-model-source-configuration).

### Usage Examples

#### 1. End-to-End Document Processing

```python
import asyncio
from raganything import RAGAnything
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc

async def main():
    # Initialize RAGAnything
    rag = RAGAnything(
        working_dir="./rag_storage",
        llm_model_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key="your-api-key",
            **kwargs,
        ),
        vision_model_func=lambda prompt, system_prompt=None, history_messages=[], image_data=None, **kwargs: openai_complete_if_cache(
            "gpt-4o",
            "",
            system_prompt=None,
            history_messages=[],
            # drop the None placeholder so no invalid message reaches the API
            messages=[m for m in [
                {"role": "system", "content": system_prompt} if system_prompt else None,
                {"role": "user", "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}}
                ]} if image_data else {"role": "user", "content": prompt}
            ] if m is not None],
            api_key="your-api-key",
            **kwargs,
        ) if image_data else openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key="your-api-key",
            **kwargs,
        ),
        embedding_func=EmbeddingFunc(
            embedding_dim=3072,
            max_token_size=8192,
            func=lambda texts: openai_embed(
                texts,
                model="text-embedding-3-large",
                api_key="your-api-key",
                base_url="your-base-url",  # optional
            ),
        ),
    )

    # Process a document
    await rag.process_document_complete(
        file_path="path/to/your/document.pdf",
        output_dir="./output",
        parse_method="auto"
    )

    # Query the processed content
    # Pure text query - for basic knowledge base search
    text_result = await rag.aquery(
        "What are the main findings shown in the figures and tables?",
        mode="hybrid"
    )
    print("Text query result:", text_result)

    # Multimodal query with specific multimodal content
    multimodal_result = await rag.aquery_with_multimodal(
        "Explain this formula and its relevance to the document content",
        multimodal_content=[{
            "type": "equation",
            "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}",
            "equation_caption": "Document relevance probability"
        }],
        mode="hybrid"
    )
    print("Multimodal query result:", multimodal_result)

if __name__ == "__main__":
    asyncio.run(main())
```

#### 2. Direct Multimodal Content Processing

```python
import asyncio
from lightrag import LightRAG
from raganything.modalprocessors import ImageModalProcessor, TableModalProcessor

async def process_multimodal_content():
    # Initialize LightRAG
    rag = LightRAG(
        working_dir="./rag_storage",
        # ... your LLM and embedding configurations
    )
    await rag.initialize_storages()

    # Process an image
    image_processor = ImageModalProcessor(
        lightrag=rag,
        modal_caption_func=your_vision_model_func
    )

    image_content = {
        "img_path": "path/to/image.jpg",
        "img_caption": ["Figure 1: Experimental results"],
        "img_footnote": ["Data collected in 2024"]
    }

    description, entity_info = await image_processor.process_multimodal_content(
        modal_content=image_content,
        content_type="image",
        file_path="research_paper.pdf",
        entity_name="Experimental Results Figure"
    )

    # Process a table
    table_processor = TableModalProcessor(
        lightrag=rag,
        modal_caption_func=your_llm_model_func
    )

    table_content = {
        "table_body": """
        | Method | Accuracy | F1-Score |
        |--------|----------|----------|
        | RAGAnything | 95.2% | 0.94 |
        | Baseline | 87.3% | 0.85 |
        """,
        "table_caption": ["Performance Comparison"],
        "table_footnote": ["Results on test dataset"]
    }

    description, entity_info = await table_processor.process_multimodal_content(
        modal_content=table_content,
        content_type="table",
        file_path="research_paper.pdf",
        entity_name="Performance Results Table"
    )

if __name__ == "__main__":
    asyncio.run(process_multimodal_content())
```

#### 3. Batch Processing

```python
# Process multiple documents
await rag.process_folder_complete(
    folder_path="./documents",
    output_dir="./output",
    file_extensions=[".pdf", ".docx", ".pptx"],
    recursive=True,
    max_workers=4
)
```

#### 4. Custom Modal Processors

```python
from raganything.modalprocessors import GenericModalProcessor

class CustomModalProcessor(GenericModalProcessor):
    async def process_multimodal_content(self, modal_content, content_type, file_path, entity_name):
        # Your custom processing logic
        enhanced_description = await self.analyze_custom_content(modal_content)
        entity_info = self.create_custom_entity(enhanced_description, entity_name)
        return await self._create_entity_and_chunk(enhanced_description, entity_info, file_path)
```
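
Wiring a custom processor in then follows the same pattern as the built-in ones. The fragment below (meant to live inside an async function, as in example 2) assumes the `rag` instance and `your_llm_model_func` placeholder from example 2; the payload shape is whatever your processor expects.

```python
# Hook the custom processor up like the built-in ones (assumes `rag`
# and `your_llm_model_func` from example 2; the payload shape is yours).
custom_processor = CustomModalProcessor(
    lightrag=rag,
    modal_caption_func=your_llm_model_func
)

description, entity_info = await custom_processor.process_multimodal_content(
    modal_content={"data": "your custom payload"},
    content_type="custom",
    file_path="research_paper.pdf",
    entity_name="Custom Content Element"
)
```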

#### 5. Query Options

RAG-Anything provides two types of query methods:

**Pure Text Queries** - Direct knowledge base search using LightRAG:
```python
# Different query modes for text queries
text_result_hybrid = await rag.aquery("Your question", mode="hybrid")
text_result_local = await rag.aquery("Your question", mode="local")
text_result_global = await rag.aquery("Your question", mode="global")
text_result_naive = await rag.aquery("Your question", mode="naive")

# Synchronous version
sync_text_result = rag.query("Your question", mode="hybrid")
```

**Multimodal Queries** - Enhanced queries with multimodal content analysis:
```python
# Query with table data
table_result = await rag.aquery_with_multimodal(
    "Compare these performance metrics with the document content",
    multimodal_content=[{
        "type": "table",
        "table_data": """Method,Accuracy,Speed
RAGAnything,95.2%,120ms
Traditional,87.3%,180ms""",
        "table_caption": "Performance comparison"
    }],
    mode="hybrid"
)

# Query with equation content
equation_result = await rag.aquery_with_multimodal(
    "Explain this formula and its relevance to the document content",
    multimodal_content=[{
        "type": "equation",
        "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}",
        "equation_caption": "Document relevance probability"
    }],
    mode="hybrid"
)
```

#### 6. Loading Existing LightRAG Instance

```python
import asyncio
from raganything import RAGAnything
from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc
import os

async def load_existing_lightrag():
    # First, create or load an existing LightRAG instance
    lightrag_working_dir = "./existing_lightrag_storage"

    # Check if previous LightRAG instance exists
    if os.path.exists(lightrag_working_dir) and os.listdir(lightrag_working_dir):
        print("✅ Found existing LightRAG instance, loading...")
    else:
        print("❌ No existing LightRAG instance found, will create new one")

    # Create/Load LightRAG instance with your configurations
    lightrag_instance = LightRAG(
        working_dir=lightrag_working_dir,
        llm_model_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key="your-api-key",
            **kwargs,
        ),
        embedding_func=EmbeddingFunc(
            embedding_dim=3072,
            max_token_size=8192,
            func=lambda texts: openai_embed(
                texts,
                model="text-embedding-3-large",
                api_key="your-api-key",
                base_url="your-base-url",  # optional
            ),
        )
    )

    # Initialize storage (this will load existing data if available)
    await lightrag_instance.initialize_storages()

    # Now initialize RAGAnything with the existing LightRAG instance
    rag = RAGAnything(
        lightrag=lightrag_instance,  # Pass the existing LightRAG instance
        # Only need vision model for multimodal processing
        vision_model_func=lambda prompt, system_prompt=None, history_messages=[], image_data=None, **kwargs: openai_complete_if_cache(
            "gpt-4o",
            "",
            system_prompt=None,
            history_messages=[],
            # drop the None placeholder so no invalid message reaches the API
            messages=[m for m in [
                {"role": "system", "content": system_prompt} if system_prompt else None,
                {"role": "user", "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}}
                ]} if image_data else {"role": "user", "content": prompt}
            ] if m is not None],
            api_key="your-api-key",
            **kwargs,
        ) if image_data else openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key="your-api-key",
            **kwargs,
        )
        # Note: working_dir, llm_model_func, embedding_func, etc. are inherited from lightrag_instance
    )

    # Query the existing knowledge base
    result = await rag.aquery_with_multimodal(
        "What data has been processed in this LightRAG instance?",
        mode="hybrid"
    )
    print("Query result:", result)

    # Add new multimodal documents to the existing LightRAG instance
    await rag.process_document_complete(
        file_path="path/to/new/multimodal_document.pdf",
        output_dir="./output"
    )

if __name__ == "__main__":
    asyncio.run(load_existing_lightrag())
```

---

## 🛠️ Examples

*Practical Implementation Demos*

<div align="center">
<img src="https://user-images.githubusercontent.com/74038190/212257455-13e3e01e-d6a6-45dc-bb92-3ab87b12dfc1.gif" width="300">
</div>

The `examples/` directory contains comprehensive usage examples:

- **`raganything_example.py`**: End-to-end document processing with MinerU
- **`modalprocessors_example.py`**: Direct multimodal content processing
- **`office_document_test.py`**: Office document parsing test with MinerU (no API key required)
- **`image_format_test.py`**: Image format parsing test with MinerU (no API key required)
- **`text_format_test.py`**: Text format parsing test with MinerU (no API key required)

**Run examples:**

```bash
# End-to-end processing
python examples/raganything_example.py path/to/document.pdf --api-key YOUR_API_KEY

# Direct modal processing
python examples/modalprocessors_example.py --api-key YOUR_API_KEY

# Office document parsing test (MinerU only)
python examples/office_document_test.py --file path/to/document.docx

# Image format parsing test (MinerU only)
python examples/image_format_test.py --file path/to/image.bmp

# Text format parsing test (MinerU only)
python examples/text_format_test.py --file path/to/document.md

# Check LibreOffice installation
python examples/office_document_test.py --check-libreoffice --file dummy

# Check PIL/Pillow installation
python examples/image_format_test.py --check-pillow --file dummy

# Check ReportLab installation
python examples/text_format_test.py --check-reportlab --file dummy
```

---

## 🔧 Configuration

*System Optimization Parameters*

### Environment Variables

Create a `.env` file (refer to `.env.example`):

```bash
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=your_base_url  # Optional
```

> **Note**: API keys are only required for full RAG processing with LLM integration. The parsing test files (`office_document_test.py`, `image_format_test.py`, and `text_format_test.py`) only test MinerU functionality and do not require API keys.
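
In your own scripts, one way to consume that file is via the third-party `python-dotenv` package (an assumption on our part, not a stated RAG-Anything dependency), which makes `api_key` and `base_url` available to the snippets above without hard-coding:

```python
# Sketch: load .env and expose the credentials used in the examples
# above. Assumes the third-party `python-dotenv` package is installed.
import os
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY / OPENAI_BASE_URL from .env

api_key = os.environ["OPENAI_API_KEY"]
base_url = os.environ.get("OPENAI_BASE_URL")  # optional; may be None
```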

### MinerU Configuration

MinerU 2.0 uses a simplified configuration approach:

```bash
# MinerU 2.0 uses command-line parameters instead of config files
# Check available options:
mineru --help

# Common configurations:
mineru -p input.pdf -o output_dir -m auto                     # Automatic parsing mode
mineru -p input.pdf -o output_dir -m ocr                      # OCR-focused parsing
mineru -p input.pdf -o output_dir -b pipeline --device cuda   # GPU acceleration
```

You can also configure MinerU through RAGAnything parameters:

```python
# Configure parsing behavior
await rag.process_document_complete(
    file_path="document.pdf",
    parse_method="auto",   # or "ocr", "txt"
    device="cuda",         # GPU acceleration
    backend="pipeline",    # parsing backend
    lang="en"              # language optimization
)
```

> **Note**: MinerU 2.0 no longer uses the `magic-pdf.json` configuration file. All settings are now passed as command-line parameters or function arguments.

### Processing Requirements

Different content types require specific optional dependencies:

- **Office Documents** (.doc, .docx, .ppt, .pptx, .xls, .xlsx): Install [LibreOffice](https://www.libreoffice.org/download/download/)
- **Extended Image Formats** (.bmp, .tiff, .gif, .webp): Install with `pip install 'raganything[image]'`
- **Text Files** (.txt, .md): Install with `pip install 'raganything[text]'`

> **📋 Quick Install**: Use `pip install 'raganything[all]'` to enable all format support (Python dependencies only - LibreOffice still needs separate installation)
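
If you want to verify these prerequisites from Python before kicking off a batch run, a small capability probe works. This sketch uses only the standard library; the function name `check_optional_support` is ours, not part of RAG-Anything.

```python
# Sketch: probe for the optional prerequisites listed above.
# `check_optional_support` is illustrative, not a RAG-Anything API.
import importlib.util
import shutil

def check_optional_support() -> dict:
    return {
        "office_docs (LibreOffice)": (
            shutil.which("libreoffice") or shutil.which("soffice")
        ) is not None,
        "extended_images (Pillow)": importlib.util.find_spec("PIL") is not None,
        "text_files (ReportLab)": importlib.util.find_spec("reportlab") is not None,
    }

for feature, available in check_optional_support().items():
    print(f"{feature}: {'✅' if available else '❌'}")
```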

---

## 🧪 Supported Content Types

### Document Formats

- **PDFs** - Research papers, reports, presentations
- **Office Documents** - DOC, DOCX, PPT, PPTX, XLS, XLSX
- **Images** - JPG, PNG, BMP, TIFF, GIF, WebP
- **Text Files** - TXT, MD

### Multimodal Elements

- **Images** - Photographs, diagrams, charts, screenshots
- **Tables** - Data tables, comparison charts, statistical summaries
- **Equations** - Mathematical formulas in LaTeX format
- **Generic Content** - Custom content types via extensible processors

*For installation of format-specific dependencies, see the [Configuration](#-configuration) section.*

---

## 📖 Citation

*Academic Reference*

<div align="center">
<div style="width: 60px; height: 60px; margin: 20px auto; position: relative;">
<div style="width: 100%; height: 100%; border: 2px solid #00d9ff; border-radius: 50%; position: relative;">
<div style="position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); font-size: 24px; color: #00d9ff;">📖</div>
</div>
<div style="position: absolute; bottom: -5px; left: 50%; transform: translateX(-50%); width: 20px; height: 20px; background: white; border-right: 2px solid #00d9ff; border-bottom: 2px solid #00d9ff; transform: rotate(45deg);"></div>
</div>
</div>

If you find RAG-Anything useful in your research, please cite our paper:

```bibtex
@article{guo2024lightrag,
  title={LightRAG: Simple and Fast Retrieval-Augmented Generation},
  author={Zirui Guo and Lianghao Xia and Yanhua Yu and Tu Ao and Chao Huang},
  year={2024},
  eprint={2410.05779},
  archivePrefix={arXiv},
  primaryClass={cs.IR}
}
```

---

## 🔗 Related Projects

*Ecosystem & Extensions*

<div align="center">
<table>
<tr>
<td align="center">
<a href="https://github.com/HKUDS/LightRAG">
<div style="width: 100px; height: 100px; background: linear-gradient(135deg, rgba(0, 217, 255, 0.1) 0%, rgba(0, 217, 255, 0.05) 100%); border-radius: 15px; border: 1px solid rgba(0, 217, 255, 0.2); display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
<span style="font-size: 32px;">⚡</span>
</div>
<b>LightRAG</b><br>
<sub>Simple and Fast RAG</sub>
</a>
</td>
<td align="center">
<a href="https://github.com/HKUDS/VideoRAG">
<div style="width: 100px; height: 100px; background: linear-gradient(135deg, rgba(0, 217, 255, 0.1) 0%, rgba(0, 217, 255, 0.05) 100%); border-radius: 15px; border: 1px solid rgba(0, 217, 255, 0.2); display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
<span style="font-size: 32px;">🎥</span>
</div>
<b>VideoRAG</b><br>
<sub>Extreme Long-Context Video RAG</sub>
</a>
</td>
<td align="center">
<a href="https://github.com/HKUDS/MiniRAG">
<div style="width: 100px; height: 100px; background: linear-gradient(135deg, rgba(0, 217, 255, 0.1) 0%, rgba(0, 217, 255, 0.05) 100%); border-radius: 15px; border: 1px solid rgba(0, 217, 255, 0.2); display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
<span style="font-size: 32px;">✨</span>
</div>
<b>MiniRAG</b><br>
<sub>Extremely Simple RAG</sub>
</a>
</td>
</tr>
</table>
</div>

---

## ⭐ Star History

*Community Growth Trajectory*

<div align="center">
<a href="https://star-history.com/#HKUDS/RAG-Anything&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=HKUDS/RAG-Anything&type=Date&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=HKUDS/RAG-Anything&type=Date" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=HKUDS/RAG-Anything&type=Date" style="border-radius: 15px; box-shadow: 0 0 30px rgba(0, 217, 255, 0.3);" />
</picture>
</a>
</div>

---

## 🤝 Contribution

*Join the Innovation*

<div align="center">
We thank all our contributors for their valuable contributions.
</div>

<div align="center">
<a href="https://github.com/HKUDS/RAG-Anything/graphs/contributors">
<img src="https://contrib.rocks/image?repo=HKUDS/RAG-Anything" style="border-radius: 15px; box-shadow: 0 0 20px rgba(0, 217, 255, 0.3);" />
</a>
</div>

---

<div align="center" style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 15px; padding: 30px; margin: 30px 0;">
<div>
<img src="https://user-images.githubusercontent.com/74038190/212284100-561aa473-3905-4a80-b561-0d28506553ee.gif" width="500">
</div>
<div style="margin-top: 20px;">
<a href="https://github.com/HKUDS/RAG-Anything" style="text-decoration: none;">
<img src="https://img.shields.io/badge/⭐%20Star%20us%20on%20GitHub-1a1a2e?style=for-the-badge&logo=github&logoColor=white">
</a>
<a href="https://github.com/HKUDS/RAG-Anything/issues" style="text-decoration: none;">
<img src="https://img.shields.io/badge/🐛%20Report%20Issues-ff6b6b?style=for-the-badge&logo=github&logoColor=white">
</a>
<a href="https://github.com/HKUDS/RAG-Anything/discussions" style="text-decoration: none;">
<img src="https://img.shields.io/badge/💬%20Discussions-4ecdc4?style=for-the-badge&logo=github&logoColor=white">
</a>
</div>
</div>

<div align="center">
<div style="width: 100%; max-width: 600px; margin: 20px auto; padding: 20px; background: linear-gradient(135deg, rgba(0, 217, 255, 0.1) 0%, rgba(0, 217, 255, 0.05) 100%); border-radius: 15px; border: 1px solid rgba(0, 217, 255, 0.2);">
<div style="display: flex; justify-content: center; align-items: center; gap: 15px;">
<span style="font-size: 24px;">⭐</span>
<span style="color: #00d9ff; font-size: 18px;">Thank you for visiting RAG-Anything!</span>
<span style="font-size: 24px;">⭐</span>
</div>
<div style="margin-top: 10px; color: #00d9ff; font-size: 16px;">Building the Future of Multimodal AI</div>
</div>
</div>