
Security News
Risky Biz Podcast: Making Reachability Analysis Work in Real-World Codebases
This episode explores the hard problem of reachability analysis, from static analysis limits to handling dynamic languages and massive dependency trees.
Examine all PDF files in lookup directories, remove passwords (if present), rename them, and copy them to a new directory using regular expressions.
gem install pdfh
You need to install pdf handling dependencies in order to use this gem.
brew install qpdf xpdf # < for pdftotext
sudo dnf install -y qpdf poppler-utils
sudo pacman -S qpdf poppler
After installing this gem, create your configuration file in one of the following directories:
~/.config/pdfh.yml
~/pdfh.yml
PDFH_CONFIG_FILE
environment variableExample configuration:
---
lookup_dirs: # Directories where all PDFs will be analyzed
- ~/Downloads
destination_base_path: ~/PDFs # Directory where all matching documents will be copied (MUST exist)
document_types:
- name: My Bank # Description (type)
re_file: '.*MyBankReg\.pdf' # Regular expression to match its filename
re_date: '\d{1,2} de (\w+) de (\d+)' # Date regular expression
pwd: base64_encoded # [OPTIONAL] Password if the document is protected
store_path: "{year}/bank_docs" # Relative path to copy this document
name_template: '{period} {subtype}' # Template for new filename when copied
sub_types: # [OPTIONAL] In case your need an extra category
- name: AccountX # Regular expression to match this subtype
re_date: '\d{1,2} de (\w+)' # [OPTIONAL] Date regular expression
month_offset: -1 # [OPTIONAL] Integer (signed) value to adjust month
zip_types: # [OPTIONAL] Zip files to be processed BEFORE the PDFs
- name: My Bank 2 # Description
re_file: 'Document_MR5664_\d+_\d+.zip' # Regular expression to match its filename
pwd: base64_encoded # [OPTIONAL] Password if the document is protected
[!CAUTION]
pwd
is not encrypted, so be careful with this option. It is stored as a base64 string as a very thin layer of obfuscation. You can useecho -n 'password' | base64
to encode your password.
Store Path and Name Template supported placeholders:
Placeholder | Description | Example |
---|---|---|
{original} | Original filename | MyBankDocument2.pdf |
{period} | Year-Month | 2022-01 |
{year} | Year | 2022 |
{month} | Month | 01 |
{type} | Document type name | My Bank |
{subtype} | Sub type name | AccountX |
{extra} | day if captured/matched | 01 |
period
, year
, month
and {extra}
are calculated from the date captured by the regular expression.
Date text | RegEx | Captured |
---|---|---|
01/02/2025 | (?<d>\d{2}\/(?<m>\d{2})\/(?<y>\d{4}) | d: 01 m: 02 y: 2025 |
072025 - | (?<m>\d{2})(?<y>\d{4}) - | m: 07 y: 2025 |
After checking out the repo, run bin/setup
to install dependencies. Then, run rake spec
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run rake install
. To release a new version, run rake bump
, and then run rake release
, which will create a git tag for the version, push git commits and tags, and push the .gem
file to rubygems.org.
rake install
# step by step
build pdfh.gemspec
gem install pdfh-*
To release a new version, run:
rake bump
rake release
This will create a git tag for the version, push git commits and tags, and upload the .gem
file to rubygems.org.
npm install -g @commitlint/cli @commitlint/config-conventional
commitlint --from origin --to @
Bug reports and pull requests are welcome on GitHub at https://github.com/iax7/pdfh. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
The gem is available as open source under the terms of the MIT License.
Everyone interacting in the Pdfh project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
FAQs
Unknown package
We found that pdfh demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
This episode explores the hard problem of reachability analysis, from static analysis limits to handling dynamic languages and massive dependency trees.
Security News
/Research
Malicious Nx npm versions stole secrets and wallet info using AI CLI tools; Socket’s AI scanner detected the supply chain attack and flagged the malware.
Security News
CISA’s 2025 draft SBOM guidance adds new fields like hashes, licenses, and tool metadata to make software inventories more actionable.