pyrolysate


A parser that converts lists of emails and URLs into JSON- and CSV-formatted files

PyPI

Version: 0.12.0

Maintainers: 1

Pyrolysate

Pyrolysate is a Python library and CLI tool for parsing and validating URLs and email addresses. It breaks URLs and emails down into their component parts, validates top-level domains against IANA's official TLD list, and outputs structured data in JSON, CSV, or text format.

The library offers both a Python API and a command-line interface, so it suits integration into larger projects as well as quick data-processing tasks. It handles single entries or large datasets, using Python generators to keep memory use low, and supports file input with custom delimiters alongside several output options.

Features

URL Parsing

Extract scheme, subdomain, domain, TLD, port, path, query, and fragment components
Support for complex URL patterns including ports, queries, and fragments
Support for IP addresses in URLs
Support for both direct input and file processing via CLI or API
Output as JSON, CSV, or text format through CLI or API

Email Parsing

Extract username, mail server, and domain components
Support for both direct input and file processing via CLI or API
Output as JSON, CSV, or text format through CLI or API

Top Level Domain Validation

Automatic updates from IANA's official TLD list
Local TLD file caching for offline use
Fallback to common TLDs if both online and local sources fail
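
The fallback order described above (online list, then local cache, then a built-in list) can be sketched roughly as follows. This is not pyrolysate's implementation: `load_tlds`, `fetch_online`, `read_local`, `parse_tld_text`, and the short `COMMON_TLDS` list are hypothetical names chosen for illustration, and the sketch assumes the local cache uses the same one-entry-per-line layout as IANA's published file.

```python
from pathlib import Path
from typing import Callable, Iterable
from urllib.request import urlopen

IANA_TLD_URL = "https://data.iana.org/TLD/tlds-alpha-by-domain.txt"
COMMON_TLDS = ["com", "org", "net", "edu", "gov"]  # illustrative fallback only


def parse_tld_text(text: str) -> list[str]:
    """Skip blank lines and '#' comment lines, lowercase every entry."""
    return [
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]


def fetch_online() -> list[str]:
    """Download the current list from IANA (requires network access)."""
    with urlopen(IANA_TLD_URL, timeout=10) as response:
        return parse_tld_text(response.read().decode("utf-8"))


def read_local(path: str = "tld.txt") -> list[str]:
    """Read a cached copy; assumes one TLD per line."""
    return parse_tld_text(Path(path).read_text(encoding="utf-8"))


def load_tlds(
    sources: Iterable[Callable[[], list[str]]] = (fetch_online, read_local),
) -> list[str]:
    """Try each source in order; fall back to COMMON_TLDS if all fail."""
    for source in sources:
        try:
            tlds = source()
        except (OSError, ValueError):  # network, file, or decoding problems
            continue
        if tlds:
            return tlds
    return list(COMMON_TLDS)
```

Passing the sources in as callables keeps the order explicit and makes the chain easy to exercise offline, for example `load_tlds([read_local])` to skip the network entirely.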

Flexible Input/Output

Process single or multiple entries
Support for government domain emails (.gov.tld)
Custom delimiters for file input
Multiple output formats (JSON, CSV, text), with text (.txt) as the default
Pretty-printed or minified JSON output
Console output or file saving options
Memory-efficient processing of large datasets using Python generators
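
To illustrate what generator-based processing with a custom delimiter can look like, here is a minimal sketch. `iter_entries` and its `chunk_size` parameter are hypothetical and are not part of pyrolysate's API; the sketch only shows the general technique of yielding entries one at a time instead of loading a whole file into a list.

```python
import io
from typing import Iterator, TextIO


def iter_entries(
    stream: TextIO, delimiter: str = "\n", chunk_size: int = 8192
) -> Iterator[str]:
    """Lazily yield non-empty, stripped entries separated by `delimiter`."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last delimiter is complete; the final piece
        # may be cut off mid-entry, so keep it for the next chunk.
        *complete, buffer = buffer.split(delimiter)
        for entry in complete:
            entry = entry.strip()
            if entry:
                yield entry
    tail = buffer.strip()
    if tail:
        yield tail


# Usage: only one chunk and one entry are held in memory at a time.
sample = io.StringIO("user1@example.com, user2@test.org,user3@example.net")
for entry in iter_entries(sample, delimiter=","):
    print(entry)
```

Reading in fixed-size chunks rather than line by line is a deliberate choice here, since a comma-delimited file may contain no newlines at all.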

Developer Friendly

Type hints for better IDE support
Comprehensive docstrings
Modular design for easy integration
Command-line interface for quick testing

🚀 Installation

From PyPI

pip install pyrolysate

For Development

Clone the repository

git clone https://github.com/dawnandrew100/pyrolysate.git
cd pyrolysate

Create and activate a virtual environment

# Using hatch (recommended)
hatch env create

# Or using venv
python -m venv .venv
# Windows
.venv\Scripts\activate
# Unix/MacOS
source .venv/bin/activate

Install in development mode

# Using hatch
hatch run dev

# Or using pip
pip install -e .

Verify Installation

# Using hatch (recommended)
hatch run pyro -u example.com

# Or using the CLI directly
pyro -u example.com

The CLI command pyro will be available after installation. If the command isn't found, ensure Python's Scripts directory is in your PATH.

Usage

Input File Parsing

from pyrolysate import parse_input_file

Parse file with default newline delimiter

urls = parse_input_file("urls.txt")

Parse file with custom delimiter

emails = parse_input_file("emails.csv", delimiter=",")

Supported Outputs

JSON (prettified or minified)
CSV
Text (default)
File output with custom naming
Console output

Email Parsing

from pyrolysate import email

Parse single email

result = email.parse_email("user@example.com")

Parse multiple emails

emails = ["user1@example.com", "user2@agency.gov.uk"]
result = email.parse_email_array(emails)

Convert to JSON

json_output = email.to_json("user@example.com")
json_output = email.to_json(["user1@example.com", "user2@example.com"])

Save to JSON file

email.to_json_file("output", "user@example.com")
email.to_json_file("output", ["user1@example.com", "user2@test.org"])

Convert to CSV

csv_output = email.to_csv("user@example.com")
csv_output = email.to_csv(["user1@example.com", "user2@test.org"])

Save to CSV file

email.to_csv_file("output", "user@example.com")
email.to_csv_file("output", ["user1@example.com", "user2@test.org"])

URL Parsing

from pyrolysate import url

Parse single URL

result = url.parse_url("https://www.example.com/path?q=test#fragment")

Parse multiple URLs

urls = ["example.com", "https://www.test.org"]
result = url.parse_url_array(urls)

Convert to JSON

json_output = url.to_json("example.com")
json_output = url.to_json(["example.com", "test.org"])

Save to JSON file

url.to_json_file("output", "example.com")
url.to_json_file("output", ["example.com", "test.org"])

Convert to CSV

csv_output = url.to_csv("example.com")
csv_output = url.to_csv(["example.com", "test.org"])

Save to CSV file

url.to_csv_file("output", "example.com")
url.to_csv_file("output", ["example.com", "test.org"])

Command Line Interface

CLI help

pyro -h

Parse single URL

pyro -u example.com

Parse multiple URLs

pyro -u example1.com example2.com

Parse URLs from file (one per line by default)

pyro -u -i urls.txt

Parse URLs from CSV file with comma delimiter

pyro -u -i urls.csv -d ","

Parse multiple emails and save as JSON

pyro -e user1@example.com user2@example.com -j -o output

Parse URLs from file and save as CSV

pyro -u -i urls.txt -c -o parsed_urls

Parse emails from file with comma delimiter

pyro -e -i emails.txt -d "," -o output

Parse emails with non-prettified JSON output

pyro -e user@example.com -j -np
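
For readers unsure what the prettified and non-prettified variants look like, the difference can be reproduced with the standard `json` module. This is a sketch of the general idea only: `render_json` is a hypothetical helper, and the indent width of four spaces is an assumption rather than something the library documents.

```python
import json

parsed = {
    "user@example.com": {
        "username": "user",
        "mail_server": "example",
        "domain": "com",
    }
}


def render_json(data: dict, prettify: bool = True) -> str:
    """Return indented JSON, or a compact single line when prettify is False."""
    if prettify:
        return json.dumps(data, indent=4)
    # Dropping the spaces after ',' and ':' gives the smallest valid output.
    return json.dumps(data, separators=(",", ":"))


print(render_json(parsed))                  # multi-line, indented
print(render_json(parsed, prettify=False))  # one compact line
```

Both forms decode to the same data; the compact form is mainly useful when the output is consumed by another program or stored in bulk.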

API Reference

Email Class

| Method | Parameters | Description |
| --- | --- | --- |
| `parse_email(email_str)` | `email_str: str` | Parses single email address |
| `parse_email_array(emails)` | `emails: list[str]` | Parses list of email addresses |
| `to_json(emails, prettify=True)` | `emails: str\|list[str]`, `prettify: bool` | Converts to JSON format |
| `to_json_file(file_name, emails, prettify=True)` | `file_name: str`, `emails: list[str]`, `prettify: bool` | Converts and saves JSON to file |
| `to_csv(emails)` | `emails: str\|list[str]` | Converts to CSV format |
| `to_csv_file(file_name, emails)` | `file_name: str`, `emails: list[str]` | Converts and saves CSV to file |

URL Class

| Method | Parameters | Description |
| --- | --- | --- |
| `parse_url(url_str, tlds=[])` | `url_str: str`, `tlds: list[str]` | Parses single URL |
| `parse_url_array(urls, tlds=[])` | `urls: list[str]`, `tlds: list[str]` | Parses list of URLs |
| `to_json(urls, prettify=True)` | `urls: str\|list[str]`, `prettify: bool` | Converts to JSON format |
| `to_json_file(file_name, urls, prettify=True)` | `file_name: str`, `urls: list[str]`, `prettify: bool` | Converts and saves JSON to file |
| `to_csv(urls)` | `urls: str\|list[str]` | Converts to CSV format |
| `to_csv_file(file_name, urls)` | `file_name: str`, `urls: list[str]` | Converts and saves CSV to file |
| `get_tld(path_to_tlds_file='tld.txt')` | `path_to_tlds_file: str = 'tld.txt'` | Fetches current TLD list from IANA |
| `local_tld_file(file_name)` | `file_name: str` | Fetches and stores `get_tld()` output as a local txt file |

Miscellaneous

| Method | Parameters | Description |
| --- | --- | --- |
| `parse_input_file(input_file_name, delimiter='\n')` | `input_file_name: str`, `delimiter: str` | Parses input file into Python list by delimiter |

CLI Reference

| Argument | Type | Value when argument is omitted | Description |
| --- | --- | --- | --- |
| `target` | `str` | `None` | Email or URL string(s) to process |
| `-u`, `--url` | `flag` | `False` | Specify URL input |
| `-e`, `--email` | `flag` | `False` | Specify Email input |
| `-i`, `--input_file` | `str` | `None` | Input file name with extension |
| `-o`, `--output_file` | `str` | `None` | Output file name without extension |
| `-c`, `--csv` | `flag` | `False` | Save output as CSV format |
| `-j`, `--json` | `flag` | `False` | Save output as JSON format |
| `-np`, `--no_prettify` | `flag` | `True` | Turn off prettified JSON output |
| `-d`, `--delimiter` | `str` | `'\n'` | Delimiter for input file parsing |

Output Formats

Email Parse Output

| Field | Description | Example |
| --- | --- | --- |
| username | Part before @ | user |
| mail_server | Domain before TLD | gmail |
| domain | Top-level domain | com |

Example output:

{
    "user@gmail.com": {
        "username": "user",
        "mail_server": "gmail",
        "domain": "com"
    }
}

email,username,mail_server,domain
user@gmail.com,user,gmail,com
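
The field breakdown and CSV layout above can be reproduced with a few lines of standard-library Python. This is a rough sketch and not the library's parser: `split_email` and `to_csv_text` are hypothetical names, there is no TLD validation, and treating everything after the first dot of the host as `domain` is an assumption of this sketch that may differ from how pyrolysate handles addresses such as `example@agency.gov.uk`.

```python
import csv
import io
from typing import Optional


def split_email(address: str) -> Optional[dict[str, dict[str, str]]]:
    """Split 'user@server.tld' into the three documented fields."""
    username, at, host = address.strip().rpartition("@")
    if not at or not username or "." not in host:
        return None  # not something this simple sketch can handle
    mail_server, _, domain = host.partition(".")
    return {
        address: {
            "username": username,
            "mail_server": mail_server,
            "domain": domain,
        }
    }


def to_csv_text(parsed: dict[str, dict[str, str]]) -> str:
    """Render parsed emails using the header row shown above."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["email", "username", "mail_server", "domain"])
    for address, fields in parsed.items():
        writer.writerow(
            [address, fields["username"], fields["mail_server"], fields["domain"]]
        )
    return out.getvalue()


result = split_email("user@gmail.com")
if result is not None:
    print(to_csv_text(result))
```

Using the `csv` module rather than joining strings with commas means values that contain commas or quotes are escaped correctly.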

URL Parse Output

| Field | Description | Example |
| --- | --- | --- |
| scheme | Protocol | https |
| subdomain | Domain prefix | www |
| second_level_domain | Main domain | example |
| top_level_domain | Domain suffix | com |
| port | Port number | 443 |
| path | URL path | blog/post |
| query | Query parameters | q=test |
| fragment | URL fragment | section1 |

Example output:

{
    "https://www.example.com:443/blog/post?q=test#section1": {
        "scheme": "https",
        "subdomain": "www",
        "second_level_domain": "example",
        "top_level_domain": "com",
        "port": "443",
        "path": "blog/post",
        "query": "q=test",
        "fragment": "section1"
    }
}

url,scheme,subdomain,second_level_domain,top_level_domain,port,path,query,fragment
https://www.example.com:443/blog/post?q=test#section1,https,www,example,com,443,blog/post,q=test,section1
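
As a point of comparison, the same eight fields can be approximated with `urllib.parse.urlsplit` from the standard library. This is a sketch under stated assumptions, not pyrolysate's method: `split_url` is a hypothetical helper, missing components are returned as empty strings (the library's own convention may differ), and the host is split naively on dots, so IP addresses and multi-label suffixes such as `gov.uk` are not handled the way a TLD-aware parser would handle them.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict[str, dict[str, str]]:
    """Break a URL into the fields listed in the table above."""
    # urlsplit only recognises the host when a scheme or leading '//' is
    # present, so bare inputs like 'example.com' get a '//' prefix.
    candidate = url if "://" in url else "//" + url
    parts = urlsplit(candidate)

    labels = (parts.hostname or "").split(".")
    if len(labels) > 1:
        second_level, top_level = labels[-2], labels[-1]
    else:
        second_level, top_level = labels[0], ""

    return {
        url: {
            "scheme": parts.scheme,
            "subdomain": ".".join(labels[:-2]),
            "second_level_domain": second_level,
            "top_level_domain": top_level,
            "port": str(parts.port) if parts.port is not None else "",
            "path": parts.path.lstrip("/"),
            "query": parts.query,
            "fragment": parts.fragment,
        }
    }


full = "https://www.example.com:443/blog/post?q=test#section1"
print(split_url(full)[full])
```

The naive last-label rule is exactly where a validated TLD list becomes useful: without one, there is no reliable way to tell that `gov.uk` should be treated as a unit.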

Supported Formats

Email Formats

Standard: example@mail.com
Government: example@agency.gov.uk

URL Formats

Basic: example.com
With subdomain: www.example.com
With scheme: https://example.com
With path: example.com/path/to/file.txt
With port: example.com:8080
With query: example.com/search?q=test
With fragment: example.com#section1
IP addresses: 192.168.1.1:8080
Government domains: agency.gov.uk
Full complex URLs: https://www.example.gov.uk:8080/path?q=test#section1

Outputs

Text file (default)
JSON file (prettified or minified)
CSV file
Console output
