
Company News
/Security News
Socket Selected for OpenAI's Cybersecurity Grant Program
Socket is an initial recipient of OpenAI's Cybersecurity Grant Program, which commits $10M in API credits to defenders securing open source software.
thelamapi/next-ocr
Next OCR 8B is an 8-billion-parameter model optimized for optical character recognition (OCR), including understanding of mathematical and tabular content.
It supports multilingual OCR in 30+ languages (including Turkish, English, German, Spanish, French, Chinese, Japanese, Korean, and Russian) and handles structured documents such as tables, forms, and formulas.

| Model | OCR-Bench Accuracy (%) | Multilingual Accuracy (%) | Layout / Table Understanding (%) |
|---|---|---|---|
| Next OCR | 99.0 | 96.8 | 95.3 |
| PaddleOCR | 95.2 | 93.9 | 95.3 |
| Deepseek OCR | 90.6 | 87.4 | 86.1 |
| Tesseract | 92.0 | 88.4 | 72.0 |
| EasyOCR | 90.4 | 84.7 | 78.9 |
| Google Cloud Vision / DocAI | 98.7 | 95.5 | 93.6 |
| Amazon Textract | 94.7 | 86.2 | 86.1 |
| Azure Document Intelligence | 95.1 | 93.6 | 91.4 |

| Model | Handwriting (%) | Scene Text (%) | Complex Tables (%) |
|---|---|---|---|
| Next OCR | 92 | 96 | 91 |
| PaddleOCR | 88 | 92 | 90 |
| Deepseek OCR | 80 | 85 | 83 |
| Tesseract | 75 | 88 | 70 |
| EasyOCR | 78 | 86 | 75 |
| Google Cloud Vision / DocAI | 90 | 95 | 92 |
| Amazon Textract | 85 | 90 | 88 |
| Azure Document Intelligence | 87 | 91 | 89 |
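To compare the figures above column by column, here is a small sketch that copies the numbers from the two tables into a dictionary and reports the top-scoring system for each metric. The names `METRICS`, `SCORES`, and `leaders` are illustrative and not part of any Next OCR tooling; the scores are taken as listed, with no claim about how they were measured.

```python
# Scores copied verbatim from the two benchmark tables above.
# Column order follows METRICS.
METRICS = ["OCR-Bench", "Multilingual", "Layout/Table",
           "Handwriting", "Scene Text", "Complex Tables"]

SCORES = {
    "Next OCR": [99.0, 96.8, 95.3, 92, 96, 91],
    "PaddleOCR": [95.2, 93.9, 95.3, 88, 92, 90],
    "Deepseek OCR": [90.6, 87.4, 86.1, 80, 85, 83],
    "Tesseract": [92.0, 88.4, 72.0, 75, 88, 70],
    "EasyOCR": [90.4, 84.7, 78.9, 78, 86, 75],
    "Google Cloud Vision / DocAI": [98.7, 95.5, 93.6, 90, 95, 92],
    "Amazon Textract": [94.7, 86.2, 86.1, 85, 90, 88],
    "Azure Document Intelligence": [95.1, 93.6, 91.4, 87, 91, 89],
}


def leaders(scores, metrics):
    """Return {metric: (best_score, [models tied at that score])}."""
    result = {}
    for i, metric in enumerate(metrics):
        best = max(row[i] for row in scores.values())
        tied = [name for name, row in scores.items() if row[i] == best]
        result[metric] = (best, tied)
    return result


if __name__ == "__main__":
    for metric, (best, tied) in leaders(SCORES, METRICS).items():
        print(f"{metric}: {best} ({', '.join(tied)})")
```

On the numbers as listed, Next OCR leads four of the six columns outright, ties PaddleOCR on layout/table understanding, and trails Google Cloud Vision / DocAI on complex tables.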
```python
from PIL import Image
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "Lamapi/next-ocr"

# The processor handles both the text (chat template, tokenization) and the image inputs.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16)

img = Image.open("image.jpg")

# ATTENTION: The content list must include both an image and text.
messages = [
    {"role": "system", "content": "You are Next-OCR, an helpful AI assistant trained by Lamapi."},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": "Read the text in this image and summarize it."},
        ],
    },
]

# Render the chat template to a prompt string, then encode the prompt and image together.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

# Generate without tracking gradients, then decode the first sequence.
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=256)

print(processor.decode(generated[0], skip_special_tokens=True))
```
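The comment in the snippet above says the user content list must contain both an image and a text entry. A minimal sketch of a helper that builds the message list and checks that requirement before anything is passed to the processor might look like this. `build_messages` and `validate_messages` are hypothetical names, not functions from `transformers` or from the Next OCR repository, and the check assumes a single-turn conversation in which every user message carries an image.

```python
from typing import Any

# System prompt copied verbatim from the usage snippet above.
SYSTEM_PROMPT = "You are Next-OCR, an helpful AI assistant trained by Lamapi."


def build_messages(image: Any, instruction: str) -> list[dict]:
    """Build a chat message list with one image entry and one text entry."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": instruction},
            ],
        },
    ]


def validate_messages(messages: list[dict]) -> None:
    """Raise ValueError unless every user message has an image and a text entry."""
    user_messages = [m for m in messages if m.get("role") == "user"]
    if not user_messages:
        raise ValueError("no user message found")
    for message in user_messages:
        content = message.get("content")
        # Plain-string content carries no typed entries at all.
        types = {part.get("type") for part in content} if isinstance(content, list) else set()
        missing = {"image", "text"} - types
        if missing:
            raise ValueError(f"user content is missing: {sorted(missing)}")
```

The image argument is typed as `Any` so the helper can be exercised without loading a real image; in practice it would be the `PIL.Image` object opened earlier.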
| Feature | Description |
|---|---|
| 🖼️ High-Accuracy OCR | Extracts text from images, documents, and screenshots reliably. |
| 🇹🇷 Multilingual Support | Works with 30+ languages including Turkish. |
| ⚡ Lightweight & Efficient | Optimized for resource-constrained environments. |
| 📄 Layout & Math Awareness | Handles tables, forms, and mathematical formulas. |
| 🏢 Reliable Outputs | Suitable for enterprise document workflows. |

| Specification | Details |
|---|---|
| Base Model | Qwen 3 |
| Parameters | 8 Billion |
| Architecture | Vision + Transformer (OCR LLM) |
| Modalities | Image-to-text |
| Fine-Tuning | OCR datasets with multilingual and math/tabular content |
| Optimizations | Quantization-ready, FP16 support |
| Primary Focus | Text extraction, document understanding, mathematical OCR |
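For a rough sense of what the "8 Billion" parameter count and the "Quantization-ready, FP16 support" row imply for memory, the weight footprint can be estimated as parameter count times bytes per parameter. The sketch below assumes exactly 8e9 parameters and counts weights only; activations, the KV cache, and framework overhead are not included. The 8-bit and 4-bit rows are illustrative quantization levels, not formats the specification table confirms.

```python
PARAMS = 8_000_000_000  # "8 Billion" from the specification table (assumed exact)

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gb(params: int, bytes_per_param: float) -> float:
    """Weights-only size in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name}: ~{weight_footprint_gb(PARAMS, width):.0f} GB")
```

Under these assumptions the FP16 weights alone come to about 16 GB, and a 4-bit quantized copy to about 4 GB.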
MIT License: free for commercial and non-commercial use.

Next OCR: a compact, math-capable OCR model that blends accuracy, speed, and multilingual document intelligence.
FAQs
We found that lamapi/next-ocr demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
