url-normalize

URL normalization for Python

2.2.1

PyPI

Maintainers: 1

url-normalize

A Python library for standardizing and normalizing URLs with support for internationalized domain names (IDN).

Introduction

url-normalize provides a robust URI normalization function that:

Takes care of IDN domains.
Always provides the URI scheme in lowercase characters.
Always provides the host, if any, in lowercase characters.
Only performs percent-encoding where it is essential.
Always uses uppercase A-through-F characters when percent-encoding.
Prevents dot-segments appearing in non-relative URI paths.
For schemes that define a default authority, uses an empty authority if the default is desired.
For schemes that define an empty path to be equivalent to a path of "/", uses "/".
For schemes that define a port, uses an empty port if the default is desired.
Ensures all portions of the URI are UTF-8 encoded NFC from Unicode strings.

Inspired by Sam Ruby's urlnorm.py
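
To make a few of the rules above concrete, here is a minimal standard-library sketch. It is not url-normalize's implementation: the function name `sketch_normalize` and the `DEFAULT_PORTS` table are hypothetical, and the sketch ignores IDN handling, userinfo, dot-segments, and NFC normalization.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports, for illustration only; the library's table may differ.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}
_PERCENT_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")


def sketch_normalize(url: str) -> str:
    """Lowercase scheme and host, drop the default port, use "/" for an
    empty path, and uppercase the hex digits of percent-escapes in the path."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it differs from the scheme's default.
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    # An empty path becomes "/" when an authority is present.
    path = parts.path or ("/" if netloc else "")
    # Existing escapes are only uppercased here, never decoded.
    path = _PERCENT_ESCAPE.sub(lambda m: m.group(0).upper(), path)
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(sketch_normalize("HTTP://WWW.Example.COM:80/%c3%a9"))
# http://www.example.com/%C3%A9
```

A non-default port is preserved, so `https://example.com:8443/a` comes back unchanged.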

Features

IDN Support: Full internationalized domain name handling
Configurable Defaults:
- Customizable default scheme (https by default)
- Configurable default domain for absolute paths
Query Parameter Control:
- Parameter filtering with allowlists
- Support for domain-specific parameter rules
Versatile URL Handling:
- Empty string URLs
- Double slash URLs (//domain.tld)
- Shebang (#!) URLs
Developer Friendly:
- Cross-version Python compatibility (3.8+)
- 100% test coverage
- Modern type hints and string handling
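
The allowlist idea behind the query parameter control can be sketched with the standard library alone. The helper `filter_query` below is hypothetical and is not the library's code. It assumes that a list applies to every host, that a dict maps a host to its allowed names, and that a host missing from the dict keeps no parameters; the library's real fallback behavior may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def filter_query(url, allowlist):
    """Return `url` with only allowlisted query parameters kept.

    `allowlist` is a list of names applied to every host, or a dict
    mapping a host to the list of names allowed for that host.
    """
    parts = urlsplit(url)
    if isinstance(allowlist, dict):
        # Assumption: hosts missing from the dict keep no parameters.
        allowed = set(allowlist.get(parts.hostname or "", []))
    else:
        allowed = set(allowlist)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name in allowed
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(filter_query("https://example.com/?page=1&id=123&ref=test", ["page", "id"]))
# https://example.com/?page=1&id=123
```

Keeping the allowlist per host lets one site retain a parameter such as `id` while the same name is stripped from URLs on other sites.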

Installation

pip install url-normalize

Usage

Python API

from url_normalize import url_normalize

# Basic normalization (uses https by default)
print(url_normalize("www.foo.com:80/foo"))
# Output: https://www.foo.com:80/foo (port 80 is not the https default, so it is kept)

# With custom default scheme
print(url_normalize("www.foo.com/foo", default_scheme="http"))
# Output: http://www.foo.com/foo

# With query parameter filtering enabled
print(url_normalize("www.google.com/search?q=test&utm_source=test", filter_params=True))
# Output: https://www.google.com/search?q=test

# With custom parameter allowlist as a dict
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist={"example.com": ["page", "id"]}
))
# Output: https://example.com/?page=1&id=123

# With custom parameter allowlist as a list
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist=["page", "id"]
))
# Output: https://example.com/?page=1&id=123

# With default domain for absolute paths
print(url_normalize("/images/logo.png", default_domain="example.com"))
# Output: https://example.com/images/logo.png

# With default domain and custom scheme
print(url_normalize("/images/logo.png", default_scheme="http", default_domain="example.com"))
# Output: http://example.com/images/logo.png
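
A common reason to normalize URLs is to spot duplicates that differ only in spelling. The helper below is a hypothetical sketch, not part of url-normalize: it groups raw URLs by whatever a supplied `normalize` callable returns, so it can be tried without the package installed.

```python
from typing import Callable, Dict, Iterable, List


def group_by_normalized(
    urls: Iterable[str], normalize: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each normalized URL to the raw URLs that produced it."""
    groups: Dict[str, List[str]] = {}
    for url in urls:
        groups.setdefault(normalize(url), []).append(url)
    return groups


# Stand-in normalizer for demonstration only; it merely lowercases the input.
print(group_by_normalized(["Example.com/a", "example.com/a", "example.com/b"], str.lower))
# {'example.com/a': ['Example.com/a', 'example.com/a'], 'example.com/b': ['example.com/b']}
```

With the package installed, pass the real function instead of the stand-in: `group_by_normalized(urls, url_normalize)`.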

Command-line Usage

You can also use url-normalize from the command line:

$ url-normalize "www.foo.com:80/foo"
# Output: https://www.foo.com:80/foo

# With custom default scheme
$ url-normalize -s http "www.foo.com/foo"
# Output: http://www.foo.com/foo

# With query parameter filtering
$ url-normalize -f "www.google.com/search?q=test&utm_source=test"
# Output: https://www.google.com/search?q=test

# With custom allowlist
$ url-normalize -f -p page,id "example.com?page=1&id=123&ref=test"
# Output: https://example.com/?page=1&id=123

# With default domain for absolute paths
$ url-normalize -d example.com "/images/logo.png"
# Output: https://example.com/images/logo.png

# With default domain and custom scheme
$ url-normalize -d example.com -s http "/images/logo.png"
# Output: http://example.com/images/logo.png

# Via uv tool/uvx
$ uvx url-normalize www.foo.com:80/foo
# Output: https://www.foo.com:80/foo

Documentation

For a complete history of changes, see CHANGELOG.md.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License
