pyrolysate

Parser that converts lists of emails and URLs into JSON- and CSV-formatted files

Package index: PyPI (pip)
Version: 0.12.0
Maintainers: 1

Pyrolysate

Pyrolysate is a Python library and CLI tool for parsing and validating URLs and email addresses. It breaks down URLs and emails into their component parts, validates against IANA's official TLD list, and outputs structured data in JSON, CSV, or text format.

The library offers both a programmer-friendly API and a command-line interface, making it suitable for both development integration and quick data processing tasks. It handles single entries or large datasets efficiently using Python's generator functionality, and provides flexible input/output options including file processing with custom delimiters.

Features

URL Parsing

  • Extract scheme, subdomain, domain, TLD, port, path, query, and fragment components
  • Support for complex URL patterns including ports, queries, and fragments
  • Support for IP addresses in URLs
  • Support for both direct input and file processing via CLI or API
  • Output as JSON, CSV, or text format through CLI or API

Email Parsing

  • Extract username, mail server, and domain components
  • Support for both direct input and file processing via CLI or API
  • Output as JSON, CSV, or text format through CLI or API

Top Level Domain Validation

  • Automatic updates from IANA's official TLD list
  • Local TLD file caching for offline use
  • Fallback to common TLDs if both online and local sources fail
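
The three-tier lookup described above (online list, then local cache, then a built-in set) can be sketched as follows. This is a minimal illustration and not the library's implementation: `load_tlds`, `fetch_online`, and the contents of `FALLBACK_TLDS` are hypothetical names and values chosen for the example.

```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical fallback set; the library's own list of common TLDs may differ.
FALLBACK_TLDS = ("com", "org", "net", "edu", "gov")


def load_tlds(fetch_online: Callable[[], Iterable[str]], cache_file: Path) -> list[str]:
    """Try the online source, then the local cache, then the built-in list."""
    try:
        tlds = [t.strip().lower() for t in fetch_online() if t.strip()]
        if tlds:
            # Refresh the cache so the list is available offline next time.
            cache_file.write_text("\n".join(tlds), encoding="utf-8")
            return tlds
    except OSError:
        pass  # network or write failure: fall through to the cache
    try:
        cached = cache_file.read_text(encoding="utf-8").split()
        if cached:
            return cached
    except OSError:
        pass  # no readable cache: fall through to the built-in list
    return list(FALLBACK_TLDS)
```

Passing the fetcher in as a callable keeps the sketch testable without network access; network errors raised by `urllib` are subclasses of `OSError`, so they are covered by the first `except`.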

Flexible Input/Output

  • Process single or multiple entries
  • Support for government domain emails (.gov.tld)
  • Custom delimiters for file input
  • Multiple output formats (JSON, CSV, text), with text (.txt) as the default
  • Pretty-printed or minified JSON output
  • Console output or file saving options
  • Memory-efficient processing of large datasets using Python generators
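
To illustrate what generator-based processing means in practice, here is a small sketch of a lazy pipeline. It does not show the library's internals; `iter_entries` and `iter_parsed` are hypothetical helpers, and the "parser" is a stand-in that only measures each entry.

```python
from typing import Iterable, Iterator


def iter_entries(lines: Iterable[str]) -> Iterator[str]:
    """Yield stripped, non-empty entries one at a time instead of building a list."""
    for line in lines:
        entry = line.strip()
        if entry:
            yield entry


def iter_parsed(entries: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Stand-in for a parser: pairs each entry with its length."""
    for entry in entries:
        yield entry, len(entry)


# An open file handle could be passed instead of a list; both are iterables of lines.
pipeline = iter_parsed(iter_entries(["example.com\n", "\n", "test.org\n"]))
first = next(pipeline)  # only the first entry has been consumed at this point
```

Because each stage pulls one item at a time, memory use stays roughly constant regardless of how many lines the input holds.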

Developer Friendly

  • Type hints for better IDE support
  • Comprehensive docstrings
  • Modular design for easy integration
  • Command-line interface for quick testing

🚀 Installation

From PyPI

pip install pyrolysate

For Development

  • Clone the repository
git clone https://github.com/dawnandrew100/pyrolysate.git
cd pyrolysate
  • Create and activate a virtual environment
# Using hatch (recommended)
hatch env create

# Or using venv
python -m venv .venv
# Windows
.venv\Scripts\activate
# Unix/MacOS
source .venv/bin/activate
  • Install in development mode
# Using hatch
hatch run dev

# Or using pip
pip install -e .

Verify Installation

# Using hatch (recommended)
hatch run pyro -u example.com

# Or using the CLI directly
pyro -u example.com

The CLI command pyro will be available after installation. If the command isn't found, ensure Python's Scripts directory is in your PATH.
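
If it is unclear whether the `pyro` command is reachable, a quick check from Python can help. This sketch uses only the standard library; note that `sysconfig.get_path("scripts")` reports the scripts directory of the running interpreter, which may not be where a `--user` install placed the command.

```python
import shutil
import sysconfig

# Where this interpreter installs console scripts (Scripts on Windows, bin elsewhere).
scripts_dir = sysconfig.get_path("scripts")

# shutil.which returns the full path of the executable, or None if it is not on PATH.
pyro_path = shutil.which("pyro")

if pyro_path is None:
    print(f"pyro not found on PATH; check that {scripts_dir} is included")
else:
    print(f"pyro found at {pyro_path}")
```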

Usage

Input File Parsing

from pyrolysate import parse_input_file

Parse file with default newline delimiter

urls = parse_input_file("urls.txt")

Parse file with custom delimiter

emails = parse_input_file("emails.csv", delimiter=",")
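
For readers who want to see what delimiter-based splitting amounts to, here is a rough standard-library equivalent. `split_file` is a hypothetical helper, not the library's code; in particular, trimming whitespace and dropping empty entries are assumptions that `parse_input_file` may or may not share.

```python
from pathlib import Path


def split_file(input_file_name: str, delimiter: str = "\n") -> list[str]:
    """Read a text file and split it into trimmed, non-empty entries."""
    text = Path(input_file_name).read_text(encoding="utf-8")
    return [item.strip() for item in text.split(delimiter) if item.strip()]
```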

Supported Outputs

  • JSON (prettified or minified)
  • CSV
  • Text (default)
  • File output with custom naming
  • Console output
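
The difference between prettified and minified JSON can be shown with the standard `json` module. This is only an illustration of the two styles; the indentation width and separators the library actually uses are not documented here, so the values below are assumptions.

```python
import json

parsed = {"user@gmail.com": {"username": "user", "mail_server": "gmail", "domain": "com"}}

# Prettified: one key per line, indented (indent width is an assumption).
pretty = json.dumps(parsed, indent=4)

# Minified: no whitespace between tokens.
minified = json.dumps(parsed, separators=(",", ":"))
```

Both strings decode to the same data; only the whitespace differs.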

Email Parsing

from pyrolysate import email

Parse single email

result = email.parse_email("user@example.com")

Parse multiple emails

emails = ["user1@example.com", "user2@agency.gov.uk"]
result = email.parse_email_array(emails)

Convert to JSON

json_output = email.to_json("user@example.com")
json_output = email.to_json(["user1@example.com", "user2@example.com"])

Save to JSON file

email.to_json_file("output", "user@example.com")
email.to_json_file("output", ["user1@example.com", "user2@test.org"])

Convert to CSV

csv_output = email.to_csv("user@example.com")
csv_output = email.to_csv(["user1@example.com", "user2@test.org"])

Save to CSV file

email.to_csv_file("output", "user@example.com")
email.to_csv_file("output", ["user1@example.com", "user2@test.org"])

URL Parsing

from pyrolysate import url

Parse single URL

result = url.parse_url("https://www.example.com/path?q=test#fragment")

Parse multiple URLs

urls = ["example.com", "https://www.test.org"]
result = url.parse_url_array(urls)

Convert to JSON

json_output = url.to_json("example.com")
json_output = url.to_json(["example.com", "test.org"])

Save to JSON file

url.to_json_file("output", "example.com")
url.to_json_file("output", ["example.com", "test.org"])

Convert to CSV

csv_output = url.to_csv("example.com")
csv_output = url.to_csv(["example.com", "test.org"])

Save to CSV file

url.to_csv_file("output", "example.com")
url.to_csv_file("output", ["example.com", "test.org"])

Command Line Interface

CLI help

pyro -h

Parse single URL

pyro -u example.com

Parse multiple URLs

pyro -u example1.com example2.com

Parse URLs from file (one per line by default)

pyro -u -i urls.txt

Parse URLs from CSV file with comma delimiter

pyro -u -i urls.csv -d ","

Parse multiple emails and save as JSON

pyro -e user1@example.com user2@example.com -j -o output

Parse URLs from file and save as CSV

pyro -u -i urls.txt -c -o parsed_urls

Parse emails from file with comma delimiter

pyro -e -i emails.txt -d "," -o output

Parse emails with non-prettified JSON output

pyro -e user@example.com -j -np

API Reference

Email Class

Method                                          Parameters                                         Description
parse_email(email_str)                          email_str: str                                     Parses single email address
parse_email_array(emails)                       emails: list[str]                                  Parses list of email addresses
to_json(emails, prettify=True)                  emails: str|list[str], prettify: bool              Converts to JSON format
to_json_file(file_name, emails, prettify=True)  file_name: str, emails: list[str], prettify: bool  Converts and saves JSON to file
to_csv(emails)                                  emails: str|list[str]                              Converts to CSV format
to_csv_file(file_name, emails)                  file_name: str, emails: list[str]                  Converts and saves CSV to file

URL Class

Method                                        Parameters                                       Description
parse_url(url_str, tlds=[])                   url_str: str, tlds: list[str]                    Parses single URL
parse_url_array(urls, tlds=[])                urls: list[str], tlds: list[str]                 Parses list of URLs
to_json(urls, prettify=True)                  urls: str|list[str], prettify: bool              Converts to JSON format
to_json_file(file_name, urls, prettify=True)  file_name: str, urls: list[str], prettify: bool  Converts and saves JSON to file
to_csv(urls)                                  urls: str|list[str]                              Converts to CSV format
to_csv_file(file_name, urls)                  file_name: str, urls: list[str]                  Converts and saves CSV to file
get_tld(path_to_tlds_file='tld.txt')          path_to_tlds_file: str = 'tld.txt'               Fetches current TLD list from IANA
local_tld_file(file_name)                     file_name: str                                   Fetches and stores get_tld() output as a local txt file

Miscellaneous

Method                                             Parameters                            Description
parse_input_file(input_file_name, delimiter='\n')  input_file_name: str, delimiter: str  Parses input file into a Python list, split by delimiter

CLI Reference

Argument            Type  Value when argument is omitted  Description
target              str   None                            Email or URL string(s) to process
-u, --url           flag  False                           Specify URL input
-e, --email         flag  False                           Specify Email input
-i, --input_file    str   None                            Input file name with extension
-o, --output_file   str   None                            Output file name without extension
-c, --csv           flag  False                           Save output as CSV format
-j, --json          flag  False                           Save output as JSON format
-np, --no_prettify  flag  True                            Turn off prettified JSON output
-d, --delimiter     str   '\n'                            Delimiter for input file parsing
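
As a way to read the table above, the same options can be expressed as an `argparse` definition. This is a sketch reconstructed from the table, not the actual CLI source: `build_parser` is a hypothetical name, and using `store_false` for `-np` is one reading of the "True when omitted" column.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented pyro options."""
    parser = argparse.ArgumentParser(prog="pyro")
    parser.add_argument("target", nargs="*", help="Email or URL string(s) to process")
    parser.add_argument("-u", "--url", action="store_true", help="Specify URL input")
    parser.add_argument("-e", "--email", action="store_true", help="Specify Email input")
    parser.add_argument("-i", "--input_file", help="Input file name with extension")
    parser.add_argument("-o", "--output_file", help="Output file name without extension")
    parser.add_argument("-c", "--csv", action="store_true", help="Save output as CSV format")
    parser.add_argument("-j", "--json", action="store_true", help="Save output as JSON format")
    # store_false leaves the value True when the flag is omitted, matching the table.
    parser.add_argument("-np", "--no_prettify", action="store_false",
                        help="Turn off prettified JSON output")
    parser.add_argument("-d", "--delimiter", default="\n",
                        help="Delimiter for input file parsing")
    return parser


# Equivalent of: pyro -e user@example.com -j -np
args = build_parser().parse_args(["-e", "user@example.com", "-j", "-np"])
```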

Output Formats

Email Parse Output

Field        Description        Example
username     Part before @      user
mail_server  Domain before TLD  gmail
domain       Top-level domain   com

Example output:

{
    "user@gmail.com": {
        "username": "user",
        "mail_server": "gmail",
        "domain": "com"
    }
}

email,username,mail_server,domain
user@gmail.com,user,gmail,com
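
The field breakdown above can be approximated with plain string operations. The sketch below is not the library's parser: `split_email` is a hypothetical helper that performs no validation and does not consult the TLD list, so multi-part suffixes such as `gov.uk` may be split differently by the library.

```python
def split_email(address: str) -> dict[str, dict[str, str]]:
    """Split 'user@server.tld' into the three documented fields (no validation)."""
    username, _, host = address.rpartition("@")
    # Everything after the first dot is treated as the domain in this sketch.
    mail_server, _, domain = host.partition(".")
    return {address: {"username": username, "mail_server": mail_server, "domain": domain}}
```

For `user@gmail.com` this yields the same structure as the example output above.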

URL Parse Output

Field                Description       Example
scheme               Protocol          https
subdomain            Domain prefix     www
second_level_domain  Main domain       example
top_level_domain     Domain suffix     com
port                 Port number       443
path                 URL path          blog/post
query                Query parameters  q=test
fragment             URL fragment      section1

Example output:

{
    "https://www.example.com:443/blog/post?q=test#section1": {
        "scheme": "https",
        "subdomain": "www",
        "second_level_domain": "example",
        "top_level_domain": "com",
        "port": "443",
        "path": "blog/post",
        "query": "q=test",
        "fragment": "section1"
    }
}

url,scheme,subdomain,second_level_domain,top_level_domain,port,path,query,fragment
https://www.example.com:443/blog/post?q=test#section1,https,www,example,com,443,blog/post,q=test,section1
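
To show how these fields relate to the parts of a URL, here is a sketch built on the standard library's `urllib.parse.urlsplit`. It is not the library's algorithm: `split_url` is a hypothetical helper that assumes a single-label TLD and does not check the IANA list, so a host such as `example.gov.uk` would come out differently.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict[str, dict[str, str]]:
    """Map urlsplit's components onto the documented field names."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    # Assumption: the last label is the TLD and the one before it is the main domain.
    top_level_domain = labels[-1] if len(labels) >= 2 else ""
    second_level_domain = labels[-2] if len(labels) >= 2 else labels[0]
    subdomain = ".".join(labels[:-2])
    return {url: {
        "scheme": parts.scheme,
        "subdomain": subdomain,
        "second_level_domain": second_level_domain,
        "top_level_domain": top_level_domain,
        "port": str(parts.port) if parts.port is not None else "",
        "path": parts.path.lstrip("/"),
        "query": parts.query,
        "fragment": parts.fragment,
    }}
```

Applied to the example URL above, the sketch produces the same field values as the documented JSON output.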

Supported Formats

Email Formats

  • Standard: example@mail.com
  • Government: example@agency.gov.uk

URL Formats

  • Basic: example.com
  • With subdomain: www.example.com
  • With scheme: https://example.com
  • With path: example.com/path/to/file.txt
  • With port: example.com:8080
  • With query: example.com/search?q=test
  • With fragment: example.com#section1
  • IP addresses: 192.168.1.1:8080
  • Government domains: agency.gov.uk
  • Full complex URLs: https://www.example.gov.uk:8080/path?q=test#section1
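
Several of the formats listed above have no scheme, which matters when comparing against the standard library: `urlsplit` only recognises a host after a `//` marker. The sketch below shows one way to normalise such inputs and detect IP hosts; `host_and_port` is a hypothetical helper and says nothing about how the library itself handles these cases.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit


def host_and_port(text: str) -> tuple[Optional[str], Optional[int], bool]:
    """Return (host, port, is_ip) for inputs with or without a scheme."""
    # Without "//", urlsplit would treat "example.com:8080" as a path or scheme.
    if "://" not in text:
        text = "//" + text
    parts = urlsplit(text)
    host = parts.hostname
    try:
        ipaddress.ip_address(host or "")
        is_ip = True
    except ValueError:
        is_ip = False
    return host, parts.port, is_ip
```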

Outputs

  • Text file (default)
  • JSON file (prettified or minified)
  • CSV file
  • Console output
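
As an illustration of the CSV layout, the following sketch writes a parsed email record with the standard `csv` module. It assumes the dictionary shape shown under Email Parse Output and is not taken from the library's source.

```python
import csv
import io

parsed = {"user@gmail.com": {"username": "user", "mail_server": "gmail", "domain": "com"}}

buffer = io.StringIO()
# lineterminator="\n" avoids the csv module's default "\r\n" line endings.
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["email", "username", "mail_server", "domain"])
for address, fields in parsed.items():
    writer.writerow([address, fields["username"], fields["mail_server"], fields["domain"]])

csv_text = buffer.getvalue()
```

Writing to an `io.StringIO` buffer covers console output; replacing it with an open file handle would cover file saving.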
