
Rhubarb is a lightweight Python framework that makes it easy to build document and video understanding applications using multi-modal Large Language Models (LLMs) and embedding models. Rhubarb is built from the ground up to work with Amazon Bedrock and supports multiple foundation models, including Anthropic Claude V3 multi-modal language models and Amazon Nova models for document and video processing, along with the Amazon Titan Multi-modal Embedding model for embeddings.
Visit the Rhubarb documentation.
Rhubarb can do multiple document processing tasks such as
Rhubarb comes with built-in system prompts that make it easy to use for a number of different document understanding use cases. You can customize Rhubarb by passing in your own system prompts. It also supports output generation that conforms to an exact JSON schema, which makes it easy to integrate into downstream applications.
Start by installing Rhubarb using pip.
pip install pyrhubarb
Create a boto3 session.
import boto3
session = boto3.Session()
Local file
from rhubarb import DocAnalysis
da = DocAnalysis(file_path="./path/to/doc/doc.pdf",
                 boto3_session=session)
resp = da.run(message="What is the employee's name?")
resp
With a file in Amazon S3
from rhubarb import DocAnalysis
da = DocAnalysis(file_path="s3://path/to/doc/doc.pdf",
                 boto3_session=session)
resp = da.run(message="What is the employee's name?")
resp
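As a side note on the S3 form of file_path: under the standard S3 URI convention, the first component after s3:// is the bucket name and the remainder is the object key. The sketch below illustrates that split using only the Python standard library. The helper name split_s3_uri is hypothetical and is not part of Rhubarb; how Rhubarb parses the path internally is not shown here.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key).

    Hypothetical helper for illustration only; not part of Rhubarb.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # netloc holds the bucket name; the path holds the key with a leading slash
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://path/to/doc/doc.pdf")
print(bucket, key)
```

In the example URI used above, the bucket would therefore be path and the key to/doc/doc.pdf.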
from rhubarb import VideoAnalysis
import boto3
session = boto3.Session()
# Initialize video analysis with a video in S3
va = VideoAnalysis(
    file_path="s3://my-bucket/my-video.mp4",
    boto3_session=session
)
# Ask questions about the video
response = va.run(message="What is happening in this video?")
print(response)
Rhubarb supports processing documents with more than 20 pages using a sliding window approach. This feature is particularly useful when working with Claude models, which can process only 20 pages at a time.
To enable this feature, set sliding_window_overlap to a value between 1 and 10 when creating a DocAnalysis object:
doc_analysis = DocAnalysis(
    file_path="path/to/large-document.pdf",
    boto3_session=session,
    sliding_window_overlap=2  # Number of pages to overlap between windows (1-10)
)
When the sliding window approach is enabled, Rhubarb will:
Note: The sliding window technique is not yet supported for document classification. When using classification with large documents, only the first 20 pages will be considered.
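To make the idea of overlapping windows concrete, here is a minimal sketch of how page ranges could be laid out for a 20-page window with a given overlap. This is not Rhubarb's implementation: the function name page_windows, the 1-indexed inclusive ranges, and the stepping rule (advance by window size minus overlap) are assumptions chosen for illustration.

```python
def page_windows(total_pages, window_size=20, overlap=2):
    """Return 1-indexed, inclusive (start, end) page ranges.

    Hypothetical illustration of a sliding window with overlap;
    Rhubarb's internal windowing may differ.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in [0, window_size)")

    step = window_size - overlap  # pages advanced between consecutive windows
    windows = []
    start = 1
    while start <= total_pages:
        end = min(start + window_size - 1, total_pages)
        windows.append((start, end))
        if end == total_pages:
            break  # the last window already reaches the final page
        start += step
    return windows


print(page_windows(45, window_size=20, overlap=2))
# [(1, 20), (19, 38), (37, 45)]
```

With an overlap of 2, each window repeats the last two pages of the previous one, so content that straddles a window boundary appears in full in at least one window.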
For more details, see the Large Document Processing Cookbook.
For more usage examples, see the cookbooks.
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.