
The original project is https://github.com/timClicks/slate. It does not support Python 3. I thank the original author, @timClicks, and the other contributors.
Slate is a Python package that simplifies the process of extracting text from PDF files. It depends on the PDFMiner package.
Slate provides one class, PDF. PDF takes a file-like object and extracts all text from the document, presenting each page as a string of text::

    >>> import slate3k as slate
    >>> with open('example.pdf', 'rb') as f:
    ...     doc = slate.PDF(f)
    ...
    >>> doc
    [..., ..., ...]
    >>> doc[1]
    'Text from page 2...'

If your PDF is password-protected, pass the password as the second argument::

    >>> with open('secrets.pdf', 'rb') as f:
    ...     doc = slate.PDF(f, 'password')
    ...
    >>> doc[0]
    "My mother doesn't know this, but..."

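Because the examples above show the result behaving like a list of page strings, ordinary list and string operations should be enough for simple post-processing. The following is a minimal sketch under that assumption. The helper names ``pages_containing`` and ``full_text`` are hypothetical and are not part of slate, and a plain list stands in for a real ``slate.PDF`` object so the snippet runs without a PDF file.

```python
def pages_containing(pages, term):
    """Return 1-based page numbers whose text contains `term` (case-insensitive).

    `pages` is assumed to be a sequence of page strings, such as the object
    slate.PDF is shown returning in the examples above.
    """
    needle = term.lower()
    return [
        number
        for number, text in enumerate(pages, start=1)
        if needle in text.lower()
    ]


def full_text(pages, separator="\n\n"):
    """Join the per-page strings into one document-wide string."""
    return separator.join(pages)


# Stand-in data (hypothetical); with slate this would be the `doc` object.
sample_pages = [
    "Introduction to slate",
    "Text from page 2...",
    "Closing remarks",
]

print(pages_containing(sample_pages, "PAGE"))
print(full_text(sample_pages, separator=" | "))
```

Page numbers are reported starting from 1 to match how readers refer to pages, whereas indexing into the document itself, as in ``doc[1]``, is zero-based.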
If you would like access to the images, font files and other information, then take some time to learn the PDFMiner API.