
Security News
Open Source Maintainers Demand Ability to Block Copilot-Generated Issues and PRs
Open source maintainers are urging GitHub to let them block Copilot from submitting AI-generated issues and pull requests to their repositories.
[!IMPORTANT]
π Now part of Docling!
Easily build document-native generative AI applications, such as RAG, leveraging Docling's efficient PDF extraction and rich data model β while still using your favorite framework, π¦ LlamaIndex or π¦π LangChain.
To use Quackling, simply install quackling
from your package manager, e.g. pip:
pip install quackling
Quackling offers core capabilities (quackling.core
), as well as framework integration components (quackling.llama_index
and quackling.langchain
). Below you find examples of both.
Here is a basic RAG pipeline using LlamaIndex:
[!NOTE] To use as is, first
pip install llama-index-embeddings-huggingface llama-index-llms-huggingface-api
additionally toquackling
to install the models. Otherwise, you can setEMBED_MODEL
&LLM
as desired, e.g. using local models.
import os
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from quackling.llama_index.node_parsers import HierarchicalJSONNodeParser
from quackling.llama_index.readers import DoclingPDFReader
DOCS = ["https://arxiv.org/pdf/2206.01062"]
QUESTION = "How many pages were human annotated?"
EMBED_MODEL = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
LLM = HuggingFaceInferenceAPI(
token=os.getenv("HF_TOKEN"),
model_name="mistralai/Mistral-7B-Instruct-v0.3",
)
index = VectorStoreIndex.from_documents(
documents=DoclingPDFReader(parse_type=DoclingPDFReader.ParseType.JSON).load_data(DOCS),
embed_model=EMBED_MODEL,
transformations=[HierarchicalJSONNodeParser()],
)
query_engine = index.as_query_engine(llm=LLM)
result = query_engine.query(QUESTION)
print(result.response)
# > 80K pages were human annotated
You can also use Quackling as a standalone with any pipeline. For instance, to split the document to chunks based on document structure and returning pointers to Docling document's nodes:
from docling.document_converter import DocumentConverter
from quackling.core.chunkers import HierarchicalChunker
doc = DocumentConverter().convert_single("https://arxiv.org/pdf/2408.09869").output
chunks = list(HierarchicalChunker().chunk(doc))
# > [
# > ChunkWithMetadata(
# > path='$.main-text[4]',
# > text='Docling Technical Report\n[...]',
# > page=1,
# > bbox=[117.56, 439.85, 494.07, 482.42]
# > ),
# > [...]
# > ]
Please read Contributing to Quackling for details.
If you use Quackling in your projects, please consider citing the following:
@techreport{Docling,
author = "Deep Search Team",
month = 8,
title = "Docling Technical Report",
url = "https://arxiv.org/abs/2408.09869",
eprint = "2408.09869",
doi = "10.48550/arXiv.2408.09869",
version = "1.0.0",
year = 2024
}
The Quackling codebase is under MIT license. For individual component usage, please refer to the component licenses found in the original packages.
FAQs
Quackling enables document-native generative AI applications
We found that quackling demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago.Β It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
Open source maintainers are urging GitHub to let them block Copilot from submitting AI-generated issues and pull requests to their repositories.
Research
Security News
Malicious Koishi plugin silently exfiltrates messages with hex strings to a hardcoded QQ account, exposing secrets in chatbots across platforms.
Research
Security News
Malicious PyPI checkers validate stolen emails against TikTok and Instagram APIs, enabling targeted account attacks and dark web credential sales.