
pdfsp is a Python package that extracts tables from PDF files and saves them to Excel. It also provides a simple Streamlit app for interactive viewing of the extracted data.
pdfplumber
pandas DataFrames
.xlsx Excel files using openpyxl
streamlit
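The description and the libraries listed above suggest a pipeline of roughly this shape: read tables with pdfplumber, load them into pandas DataFrames, and write .xlsx files through openpyxl. The following is a minimal sketch of that idea, not pdfsp's actual implementation. The names `split_header` and `extract_tables_sketch`, the output file naming, and the choice to treat the first row as the header are all assumptions made for illustration.

```python
from pathlib import Path


def split_header(rows, skiprows=0):
    """Drop leading rows and fully empty rows, then use the first row as header.

    `rows` is a list of lists, the shape pdfplumber returns for one table.
    Hypothetical helper: pdfsp may handle headers differently.
    """
    rows = [r for r in rows[skiprows:] if any(c not in (None, "") for c in r)]
    if not rows:
        return [], []
    # Give empty header cells a placeholder name so every column is labeled.
    header = [str(c) if c is not None else f"col{i}" for i, c in enumerate(rows[0])]
    return header, rows[1:]


def extract_tables_sketch(pdf_path, out_dir):
    """Write each table found in `pdf_path` to its own .xlsx file in `out_dir`."""
    # Third-party imports are local so split_header stays usable without them.
    import pandas as pd
    import pdfplumber

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            for table_no, rows in enumerate(page.extract_tables(), start=1):
                header, body = split_header(rows)
                if not header:
                    continue  # nothing usable in this table
                df = pd.DataFrame(body, columns=header)
                # Assumed naming scheme: <pdf name>_p<page>_t<table>.xlsx
                target = out_dir / f"{Path(pdf_path).stem}_p{page_no}_t{table_no}.xlsx"
                df.to_excel(target, index=False)  # .xlsx writing goes through openpyxl
                written.append(target)
    return written
```

Calling `extract_tables_sketch("some.pdf", "output")` would return the list of files written. Combining tables that span several pages, which pdfsp offers, is deliberately left out of this sketch.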
Make sure you're using Python 3.10 or newer, then install with:
pip install pdfsp -U
# pdf.py
from pdfsp import extract_tables, Options

# Define extraction options
source_folder = "."
output_folder = "output"
combine_tables = True

options = Options(
    source_folder=source_folder,
    output_folder=output_folder,
    combine=combine_tables,
)

# Run the table extraction
extract_tables(options)
# Extract all tables from all PDF files in the current folder and save them to the current folder
pdfsp . .
# Extract and COMBINE large tables (spanning multiple pages) into single files, saved to the current folder
pdfsp . . --combine
# Extract and COMBINE tables, skipping the first row of each table (e.g., header rows)
pdfsp . . --combine --skiprows=1
# Extract all tables from PDF files in 'someFolder' and save them to 'SomeOutFolder'
pdfsp someFolder SomeOutFolder
# Extract all tables from 'some.pdf' and save them to the current folder
pdfsp some.pdf .
# Extract all tables from 'some.pdf' and save them to 'toThisFolder'
pdfsp some.pdf toThisFolder
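When the command-line tool needs to be driven from a script, the invocations above can be assembled programmatically. This is a small sketch using only the standard library. It relies solely on the arguments shown in the examples (source, output, `--combine`, `--skiprows=N`); the helper names `build_pdfsp_command` and `run_pdfsp` are hypothetical. The examples only show `--skiprows` together with `--combine`, so whether it works on its own is not confirmed here.

```python
import subprocess


def build_pdfsp_command(source, output, combine=False, skiprows=None):
    """Build the argument list for a pdfsp invocation (hypothetical helper)."""
    cmd = ["pdfsp", str(source), str(output)]
    if combine:
        cmd.append("--combine")
    if skiprows is not None:
        cmd.append(f"--skiprows={int(skiprows)}")
    return cmd


def run_pdfsp(source, output, **kwargs):
    """Run pdfsp and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_pdfsp_command(source, output, **kwargs), check=True)
```

For example, `build_pdfsp_command(".", ".", combine=True, skiprows=1)` produces the same arguments as `pdfsp . . --combine --skiprows=1`.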
=== Extraction Summary Report ===
✅ Successful Files: 3
- pdfs/report1.pdf → 5 tables extracted
- pdfs/summary2.pdf → 3 tables extracted
- pdfs/report2.pdf → 7 tables extracted
❌ Failed Files: 1
- pdfs/corrupted.pdf
⚠️ Some files failed to process. See details above.
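A report of this shape can be produced by recording one result per file and formatting the totals at the end. The sketch below illustrates that idea in plain text. It is not pdfsp's reporting code, and the function name `format_summary` and the convention of using `None` for a failed file are assumptions.

```python
def format_summary(results):
    """Format a summary from a mapping of PDF path to table count.

    A value of None marks a file that failed to process (assumed convention).
    """
    ok = {path: n for path, n in results.items() if n is not None}
    failed = [path for path, n in results.items() if n is None]

    lines = ["=== Extraction Summary Report ===", f"Successful Files: {len(ok)}"]
    lines += [f"- {path}: {n} tables extracted" for path, n in ok.items()]
    lines.append(f"Failed Files: {len(failed)}")
    lines += [f"- {path}" for path in failed]
    if failed:
        lines.append("Some files failed to process. See details above.")
    return "\n".join(lines)


print(format_summary({"pdfs/report1.pdf": 5, "pdfs/corrupted.pdf": None}))
```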
FAQs
Extracts data from PDF files and saves it to Excel files.
We found that pdfsp demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.