DocsRay

Live Demo (Base Model)
A universal document question-answering system that combines advanced embedding models and multimodal LLMs with a Coarse-to-Fine search (RAG) approach. It integrates with Claude Desktop through MCP (Model Context Protocol) and provides directory management, visual content analysis, and a hybrid OCR system.
Quick Start
DocsRay now features automatic setup: install the package, and it installs dependencies and downloads the lite model for you.
pip install docsray
That's it! DocsRay will automatically:
- Install system dependencies
- Download the lite model (~3GB)
- Configure the environment
Manual Setup (if automatic setup fails)
If the automatic setup doesn't work properly, you can run the setup manually:
pip install docsray
docsray setup
docsray download-models --model-type lite
Optional Components
Audio/Video Processing (Optional)
# Ubuntu/Debian
sudo apt-get install ffmpeg
# macOS (Homebrew)
brew install ffmpeg
# CentOS/RHEL
sudo yum install epel-release
sudo yum install ffmpeg
# Windows (Chocolatey)
choco install ffmpeg
Additional Format Support
# Ubuntu/Debian
sudo apt-get install pandoc
# macOS (Homebrew)
brew install pandoc
# Korean fonts (Ubuntu/Debian)
sudo apt-get install fonts-nanum fonts-nanum-coding fonts-nanum-extra
Tesseract OCR (for enhanced OCR performance)
# Ubuntu/Debian (with Korean language data)
sudo apt-get install tesseract-ocr tesseract-ocr-kor
# macOS (Homebrew, with all language data)
brew install tesseract tesseract-lang
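To see which of these optional tools are already installed, a quick check like the one below can help. It is a minimal sketch that only looks for the `ffmpeg`, `pandoc`, and `tesseract` executables on `PATH` using Python's `shutil.which`; it does not verify language packs such as `tesseract-ocr-kor`, and it does not tell you whether DocsRay itself has detected them. The helper name `check_optional_tools` is hypothetical.

```python
import shutil

# Optional external tools named in this README, mapped to what they enable.
OPTIONAL_TOOLS = {
    "ffmpeg": "audio/video processing",
    "pandoc": "additional document formats",
    "tesseract": "Tesseract OCR",
}


def check_optional_tools(tools=OPTIONAL_TOOLS):
    """Return {tool_name: bool}: True if the executable is found on PATH."""
    return {name: shutil.which(name) is not None for name in tools}


if __name__ == "__main__":
    for name, found in check_optional_tools().items():
        status = "found" if found else "missing"
        print(f"{name:10} {status:8} ({OPTIONAL_TOOLS[name]})")
```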
Start Using DocsRay
# Launch the web interface
docsray web
# Start the API server
docsray api
# Configure Claude Desktop (MCP) integration
docsray configure-claude
Core Features
- Advanced RAG System: Coarse-to-Fine search for accurate document retrieval
- Multimodal AI: Visual content analysis using Gemma-3 vision capabilities
- Hybrid OCR: Intelligent selection between AI-powered OCR and Pytesseract
- Adaptive Performance: Automatically adjusts model precision and context length to available system resources
- Flexible Model Selection: Choose among lite (4B), base (12B), and pro (27B) models
- MCP Integration: Seamless integration with Claude Desktop
- Multiple Interfaces: Web UI, API server, CLI, and MCP server
- Universal Document Support: 30+ file formats with automatic conversion
- Multi-Language: Supports Korean, English, and other languages
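The Coarse-to-Fine idea can be illustrated with a toy two-stage search: first pick the most relevant sections using one representative vector per section, then rank only the chunks inside those sections. This is a minimal sketch with hand-made vectors and plain cosine similarity, not DocsRay's actual implementation; the `sections` layout and the `top_sections`/`top_chunks` parameters are hypothetical. In a real system the vectors would come from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def coarse_to_fine_search(query_vec, sections, top_sections=2, top_chunks=3):
    """Two-stage retrieval over a toy index.

    sections: list of {"title": str, "rep": vector, "chunks": [(text, vector), ...]}
    Coarse stage: rank sections by their representative vector.
    Fine stage: rank only the chunks inside the best-ranked sections.
    """
    ranked = sorted(sections, key=lambda s: cosine(query_vec, s["rep"]), reverse=True)
    candidates = [
        (cosine(query_vec, vec), text)
        for section in ranked[:top_sections]
        for text, vec in section["chunks"]
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in candidates[:top_chunks]]


sections = [
    {"title": "Results", "rep": [1.0, 0.0],
     "chunks": [("accuracy rose", [0.9, 0.1]), ("latency fell", [0.7, 0.3])]},
    {"title": "Methods", "rep": [0.0, 1.0],
     "chunks": [("we used OCR", [0.1, 0.9])]},
]
print(coarse_to_fine_search([1.0, 0.0], sections, top_sections=1, top_chunks=2))
```

Restricting the fine stage to a few sections keeps the number of chunk comparisons small for long documents, at the cost of missing relevant chunks in sections the coarse stage ranked low.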
What's New
v1.9.0: Enhanced Document Conversion
- LibreOffice Integration: Better quality conversions for Office documents when LibreOffice is installed
- Improved Format Support: Enhanced handling of DOCX, XLSX, PPTX, ODT, ODS, ODP, HWP/HWPX
v1.8.0: Multimedia Support
- Video/Audio Processing: Extract and analyze content from video and audio files
- Automatic Setup: DocsRay now automatically installs dependencies and downloads models
Recent Updates
- Auto-restart capability for all servers
- Enhanced embedding method (v1.7.0) - requires reindexing existing documents
For the detailed changelog, see CHANGELOG.md.
Usage Guide
Model Management
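# Download the lite model (~3GB)
docsray download-models --model-type lite
# Download the base model
docsray download-models --model-type base
# Download the pro model
docsray download-models --model-type pro
# Force a re-download of the base model
docsray download-models --model-type base --force
# Check model status
docsray download-models --check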
Document Processing
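# Process a document with the base model
docsray process document.pdf --model-type base
# Process with a custom timeout
docsray process report.docx --timeout 300
# Skip visual content analysis
docsray process spreadsheet.xlsx --no-visuals
# Ask a question about a document
docsray ask document.pdf "What are the key findings?"
# Ask using the pro model
docsray ask report.docx "Summarize the conclusions" --model-type pro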
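To process several files in one go, the same commands can be scripted. The sketch below assumes the `docsray` executable is on `PATH` and uses only the `--model-type`, `--timeout`, and `--no-visuals` flags shown above; `build_process_cmd` and `process_directory` are hypothetical helpers, not part of DocsRay, and the default `extensions` tuple is just an example.

```python
import subprocess
from pathlib import Path


def build_process_cmd(path, model_type=None, timeout=None, no_visuals=False):
    """Build a `docsray process` command from the flags documented in this README."""
    cmd = ["docsray", "process", str(path)]
    if model_type:
        cmd += ["--model-type", model_type]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if no_visuals:
        cmd.append("--no-visuals")
    return cmd


def process_directory(directory, extensions=(".pdf", ".docx", ".xlsx"), **options):
    """Run `docsray process` once per matching file in `directory` (not recursive)."""
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() in extensions:
            cmd = build_process_cmd(path, **options)
            print("Running:", " ".join(cmd))
            subprocess.run(cmd, check=True)  # stop on the first failing file
```

For example, `process_directory("reports/", model_type="base", timeout=300)` would run one `docsray process` call per PDF, DOCX, or XLSX file in `reports/`.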
Web Interface
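# Launch the web interface
docsray web
# Use the base model on port 8080
docsray web --model-type base --port 8080
# Restart automatically on failure
docsray web --auto-restart
# Limit restart attempts to 5
docsray web --auto-restart --max-retries 5
# Custom timeout and page limit
docsray web --timeout 300 --pages 10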
API Server
# Start the API server on port 8000
docsray api --port 8000
# Restart automatically on failure
docsray api --auto-restart
# Auto-restart with a longer timeout
docsray api --auto-restart --timeout 600
# Ask a question about a document
curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{
"document_path": "/path/to/document.pdf",
"question": "What is the main topic?",
"use_coarse_search": true
}'
# Show cache information
curl http://localhost:8000/cache/info
# Clear the cache
curl -X POST http://localhost:8000/cache/clear
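The same `/ask` call can be made from Python with only the standard library. This sketch mirrors the JSON body of the curl example; the 600-second client timeout is an arbitrary assumption, and because the response schema is not documented here, `ask` simply returns the decoded JSON without assuming any field names. `build_ask_request` and `ask` are hypothetical helpers.

```python
import json
import urllib.request


def build_ask_request(document_path, question, use_coarse_search=True,
                      base_url="http://localhost:8000"):
    """Build a POST /ask request with the same JSON body as the curl example."""
    payload = {
        "document_path": document_path,
        "question": question,
        "use_coarse_search": use_coarse_search,
    }
    return urllib.request.Request(
        f"{base_url}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(document_path, question, **kwargs):
    """Send the request to a running `docsray api` server and decode the JSON reply."""
    request = build_ask_request(document_path, question, **kwargs)
    with urllib.request.urlopen(request, timeout=600) as response:  # assumed timeout
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes it easy to inspect or log the exact payload before contacting the server.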
Performance Testing
# Basic performance test
docsray perf-test document.pdf "What is this about?"
# Five iterations against a server on localhost:8000
docsray perf-test document.pdf "Analyze key points" \
--iterations 5 --port 8000 --host localhost
# With a longer timeout
docsray perf-test document.pdf "What is this?" --timeout 600
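If you want to time calls from your own client code rather than through `docsray perf-test`, a small helper like this is enough. It is a generic sketch, not how `perf-test` measures latency internally: it times any zero-argument callable (for example, a function that posts a question to the API server) and reports simple wall-clock statistics in seconds. The `benchmark` name and its result keys are hypothetical.

```python
import statistics
import time


def benchmark(call, iterations=5):
    """Time `call()` repeatedly and summarize wall-clock latency in seconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)
    return {
        "iterations": iterations,
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


print(benchmark(lambda: time.sleep(0.01), iterations=3))
```

The median is reported alongside the mean because a single slow first call (for example, while a model loads) can skew the average.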
MCP Integration (Claude Desktop)
# Register DocsRay with Claude Desktop
docsray configure-claude
# Run the MCP server with auto-restart
docsray mcp --auto-restart
Supported File Formats
Office Documents: Word (.docx, .doc), Excel (.xlsx, .xls), PowerPoint (.pptx, .ppt)
Text Formats: Plain Text (.txt), Markdown (.md), HTML (.html)
Images: JPEG, PNG, GIF, BMP, TIFF, WebP
Korean Documents: HWP (.hwp, .hwpx)
PDFs: Native PDF support with visual analysis
Audio: MP3, WAV, M4A, FLAC, OGG, WMA, AAC (requires ffmpeg)
Video: MP4, AVI, MOV, WMV, FLV, MKV, WebM, M4V, MPG, MPEG (requires ffmpeg)
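Before handing files to DocsRay, it can be useful to pre-check them against this list, for example to warn when audio or video would need ffmpeg. The sketch below is a hypothetical helper, not DocsRay's own format detection: the extension sets are transcribed from the list above (the image extensions are assumed spellings of the named formats), and DocsRay may accept additional formats not listed here.

```python
import shutil
from pathlib import Path

# Extension groups transcribed from the supported-formats list (image suffixes assumed).
FORMAT_GROUPS = {
    "office": {".docx", ".doc", ".xlsx", ".xls", ".pptx", ".ppt"},
    "text": {".txt", ".md", ".html"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff", ".webp"},
    "korean": {".hwp", ".hwpx"},
    "pdf": {".pdf"},
    "audio": {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma", ".aac"},
    "video": {".mp4", ".avi", ".mov", ".wmv", ".flv", ".mkv", ".webm", ".m4v", ".mpg", ".mpeg"},
}


def classify(path):
    """Return the format group for a file name, or None if its extension is not listed."""
    suffix = Path(path).suffix.lower()
    for group, extensions in FORMAT_GROUPS.items():
        if suffix in extensions:
            return group
    return None


def ready_to_process(path):
    """True if the type is listed and, for audio/video, ffmpeg is found on PATH."""
    group = classify(path)
    if group is None:
        return False
    if group in ("audio", "video"):
        return shutil.which("ffmpeg") is not None
    return True
```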
Advanced Configuration
Environment Variables
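# Default model type
export DOCSRAY_MODEL_TYPE=base
# Disable visual content analysis
export DOCSRAY_DISABLE_VISUALS=1
# Enable debug output
export DOCSRAY_DEBUG=1
# Custom DocsRay home directory
export DOCSRAY_HOME=/custom/path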
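When DocsRay is launched from a script, these variables can be set per process instead of exported globally. This sketch builds a copy of the current environment with the four variables listed above and starts `docsray web`; `docsray_env` and `launch_web` are hypothetical helpers, it assumes `docsray` is on `PATH`, and it assumes `1` is the enabling value, as in the `export` lines.

```python
import os
import subprocess


def docsray_env(model_type=None, disable_visuals=False, debug=False, home=None):
    """Return a copy of the current environment with DocsRay variables set."""
    env = dict(os.environ)  # copy, so the parent process is left unchanged
    if model_type:
        env["DOCSRAY_MODEL_TYPE"] = model_type
    if disable_visuals:
        env["DOCSRAY_DISABLE_VISUALS"] = "1"
    if debug:
        env["DOCSRAY_DEBUG"] = "1"
    if home:
        env["DOCSRAY_HOME"] = str(home)
    return env


def launch_web(**settings):
    """Start `docsray web` in the background with the chosen settings."""
    return subprocess.Popen(["docsray", "web"], env=docsray_env(**settings))
```

For example, `launch_web(model_type="base", debug=True)` starts the web interface with the base model and debug output, without touching your shell environment.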
Python API
from docsray import PDFChatBot
from docsray.scripts import pdf_extractor, chunker, build_index, section_rep_builder

# Extract text and analyze visual content
extracted = pdf_extractor.extract_content("document.pdf", analyze_visuals=True)
# Split the extracted content into chunks
chunks = chunker.process_extracted_file(extracted)
# Build the chunk-level index
chunk_index = build_index.build_chunk_index(chunks)
# Build section-level representations from the extracted sections
sections = section_rep_builder.build_section_reps(extracted["sections"], chunk_index)
# Create the chatbot and ask a question
chatbot = PDFChatBot(sections, chunk_index)
answer, references = chatbot.answer("What are the key points?")
print(answer)
System Requirements
Hardware Requirements
- CPU Mode: Any system with 4GB+ RAM
- GPU Acceleration: CUDA-compatible GPU or Apple Silicon (MPS)
- Storage: 3-16GB depending on the chosen model type
Performance Modes (Auto-detected)
| Memory | Mode | Model Precision | Context Length |
|---|---|---|---|
| < 16GB | FAST | Q4 quantized | 8K |
| 16-32GB | STANDARD | Q8 quantized | 16K |
| > 32GB | FULL_FEATURE | F16 precision | 32K |
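The table amounts to a simple threshold rule. The function below restates it in code as a reading aid only: it is not DocsRay's detection logic, it takes the memory figure as an input rather than measuring it (the table does not say whether RAM or GPU memory is meant), and `select_performance_mode` is a hypothetical name. Exactly 16GB and exactly 32GB fall in the STANDARD row, which follows from the `<` and `>` in the first and last rows.

```python
def select_performance_mode(memory_gb):
    """Map available memory in GB to (mode, precision, context) as in the table."""
    if memory_gb < 16:
        return ("FAST", "Q4 quantized", "8K")
    if memory_gb <= 32:
        return ("STANDARD", "Q8 quantized", "16K")
    return ("FULL_FEATURE", "F16 precision", "32K")


for memory in (8, 16, 24, 32, 64):
    print(memory, select_performance_mode(memory))
```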
Troubleshooting
Common Issues
# Check model status
docsray download-models --check
# Force a model re-download
docsray download-models --force
# Run the web interface with debug output
DOCSRAY_DEBUG=1 docsray web
Performance Issues
- Use --model-type lite for faster processing
- Enable --no-visuals for text-only documents
- Increase --timeout for large documents
- Use --auto-restart for stability
Performance Benchmarks
Run your own benchmarks:
# Run 10 iterations
docsray perf-test document.pdf "test question" --iterations 10
# Compare the lite and base models
docsray perf-test document.pdf "test question" --model-type lite
docsray perf-test document.pdf "test question" --model-type base
Contributing
We welcome contributions! Please check our GitHub repository for:
- Bug reports and feature requests
- Code contributions and pull requests
- Documentation improvements
License
This project is licensed under the MIT License - see the LICENSE file for details.
Open Source Dependencies
DocsRay is built on top of these excellent open source projects:
- llama.cpp - GGML/GGUF model inference (MIT License)
- PyMuPDF - PDF processing (AGPL-3.0 License)
- pdfplumber - PDF text extraction (MIT License)
- FastAPI - Web framework (MIT License)
- Gradio - Web UI components (Apache-2.0 License)
- OpenCV - Image processing (Apache-2.0 License)
- faster-whisper - Audio transcription (MIT License)
- Pandas - Data manipulation (BSD-3-Clause License)
- NumPy - Numerical computing (BSD-3-Clause License)
- scikit-learn - Machine learning utilities (BSD-3-Clause License)
Links