
Library for extracting schemas and building ontologies from documents using LLMs.
The generated schemas can be used to define tables in a database or to generate a knowledge graph from the content of a document.
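As a rough illustration of the knowledge graph use case, the sketch below walks a JSON-Schema-like dictionary and turns every nested property into a (parent, relation, child) edge. The function name `schema_to_edges`, the `has_property` relation label, and the small sample schema are assumptions made for this example; they are not part of the library.

```python
def schema_to_edges(name, node):
    """Yield (parent, relation, child) edges for a JSON-Schema-like node."""
    node_type = node.get("type")
    if node_type == "object":
        for prop, child in node.get("properties", {}).items():
            yield (name, "has_property", prop)
            # Recurse so nested objects contribute their own edges
            yield from schema_to_edges(prop, child)
    elif node_type == "array":
        # Array elements are described by "items"; attach them to the same node name
        yield from schema_to_edges(name, node.get("items", {}))


# Hypothetical sample schema, shaped like the library's output
sample = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "fees": {
            "type": "object",
            "properties": {"salesCharges": {"type": "string"}},
        },
        "yearByYearReturns": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"year": {"type": "string"}},
            },
        },
    },
}

edges = list(schema_to_edges("portfolio", sample))
for edge in edges:
    print(edge)
```

The resulting edge list could be loaded into a graph library or a graph database of your choice.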
Before you begin, ensure you have the following installed on your system:
To install Poppler on macOS, use the following command:
brew install poppler
To install Poppler on Linux, use the following command:
sudo apt-get install poppler-utils
On Windows, extract Poppler to a location on your computer (e.g., C:\Program Files\poppler).
Then add the bin directory of the extracted folder (e.g., C:\Program Files\poppler\bin) to your system's PATH environment variable.
After installation, restart your terminal or command prompt for the changes to take effect. If Poppler is still not found, try restarting your computer.
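To confirm that Poppler is reachable after the PATH change, a quick check like the one below may help. It is a minimal sketch that looks up two command-line tools shipped with Poppler (`pdftoppm` and `pdfinfo`) using the standard library; the helper name `find_poppler_tools` is made up for this example.

```python
import shutil


def find_poppler_tools(tools=("pdftoppm", "pdfinfo")):
    """Map each tool name to its resolved executable path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in find_poppler_tools().items():
    # A missing tool usually means the Poppler bin directory is not on PATH yet
    print(f"{tool}: {path if path else 'NOT FOUND on PATH'}")
```

If a tool is reported as not found, open a new terminal so the updated PATH is picked up and run the check again.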
To install scrape_schema, clone the repository and install its dependencies:
git clone https://github.com/ScrapeGraphAI/scrape_schema
cd scrape_schema
pip install -r requirements.txt
After installing the prerequisites and dependencies, you can start using scrape_schema to extract entities and their schema from PDFs.
Here’s a basic example:
from scrape_schema import FileExtractor, PDFParser
# LLMClient is used below; this import path is an assumption, adjust it if the package layout differs
from scrape_schema.llm_client import LLMClient
import os
from dotenv import load_dotenv
load_dotenv()  # Load environment variables from .env file
api_key = os.getenv("OPENAI_API_KEY")
# Path to your PDF file
pdf_path = "./test.pdf"
# Create an LLMClient instance
llm_client = LLMClient(api_key)
# Create a PDFParser instance with the LLMClient
pdf_parser = PDFParser(llm_client)
# Create a FileExtractor instance with the PDF parser
pdf_extractor = FileExtractor(pdf_path, pdf_parser)
# Generate a JSON schema for the entities found in the PDF
entities = pdf_extractor.generate_json_schema()
print(entities)
{
  "ROOT": {
    "portfolio": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "series": {
          "type": "string"
        },
        "fees": {
          "type": "object",
          "properties": {
            "salesCharges": {
              "type": "string"
            },
            "fundExpenses": {
              "type": "object",
              "properties": {
                "managementExpenseRatio": {
                  "type": "string"
                },
                "tradingExpenseRatio": {
                  "type": "string"
                },
                "totalExpenses": {
                  "type": "string"
                }
              }
            },
            "trailingCommissions": {
              "type": "string"
            }
          }
        },
        "withdrawalRights": {
          "type": "object",
          "properties": {
            "timeLimit": {
              "type": "string"
            },
            "conditions": {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          }
        },
        "contactInformation": {
          "type": "object",
          "properties": {
            "companyName": {
              "type": "string"
            },
            "address": {
              "type": "string"
            },
            "phone": {
              "type": "string"
            },
            "email": {
              "type": "string"
            },
            "website": {
              "type": "string"
            }
          }
        },
        "yearByYearReturns": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "year": {
                "type": "string"
              },
              "return": {
                "type": "string"
              }
            }
          }
        },
        "bestWorstReturns": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "type": {
                "type": "string"
              },
              "return": {
                "type": "string"
              },
              "date": {
                "type": "string"
              },
              "investmentValue": {
                "type": "string"
              }
            }
          }
        },
        "averageReturn": {
          "type": "string"
        },
        "targetInvestors": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "taxInformation": {
          "type": "string"
        }
      }
    }
  }
}
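One way to use a schema like this for database tables is to flatten nested objects into prefixed columns. The sketch below does that for a trimmed copy of the schema above and runs the resulting statement against an in-memory SQLite database. The helper names, the underscore-joined column naming, the type mapping, and the choice to skip arrays (which would normally become child tables) are all assumptions of this example, not behaviour of the library.

```python
import sqlite3

# Assumed mapping from JSON Schema types to SQLite column types
SQL_TYPES = {"string": "TEXT", "number": "REAL", "integer": "INTEGER", "boolean": "INTEGER"}


def flatten_columns(properties, prefix=""):
    """Return (column_name, sql_type) pairs, flattening nested objects with a prefix."""
    columns = []
    for prop, spec in properties.items():
        column = f"{prefix}{prop}"
        if spec.get("type") == "object":
            columns.extend(flatten_columns(spec.get("properties", {}), prefix=f"{column}_"))
        elif spec.get("type") == "array":
            continue  # arrays would need a separate child table; skipped in this sketch
        else:
            columns.append((column, SQL_TYPES.get(spec.get("type"), "TEXT")))
    return columns


def create_table_sql(table, schema):
    """Build a CREATE TABLE statement from an object schema."""
    columns = flatten_columns(schema.get("properties", {}))
    body = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table} (\n{body}\n);"


# Trimmed copy of the "portfolio" entity shown above
portfolio = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "fees": {
            "type": "object",
            "properties": {
                "salesCharges": {"type": "string"},
                "fundExpenses": {
                    "type": "object",
                    "properties": {"totalExpenses": {"type": "string"}},
                },
            },
        },
        "targetInvestors": {"type": "array", "items": {"type": "string"}},
    },
}

ddl = create_table_sql("portfolio", portfolio)
print(ddl)

# Check that SQLite accepts the generated statement
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
column_names = [row[1] for row in conn.execute("PRAGMA table_info(portfolio)")]
conn.close()
```

Flattening keeps everything in a single table, which is simple but loses the one-to-many structure of arrays such as yearByYearReturns; a fuller mapping would give each array its own table with a foreign key back to the parent.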
Feel free to contribute, and join our Discord server to discuss improvements and share suggestions!
Please see the contributing guidelines.
FAQs
We found that scrapontology demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.