A Python implementation of `RFC 3986`_ including validation and authority
parsing.

Use pip to install ``rfc3986`` like so::

    pip install rfc3986

``rfc3986`` is licensed under the `Apache License Version 2.0`_.

The following are the two most common use cases envisioned for ``rfc3986``.


Replacing ``urlparse``
----------------------

To parse a URI and receive something very similar to the standard library's
``urllib.parse.urlparse``:

.. code-block:: python

    from rfc3986 import urlparse

    ssh = urlparse('ssh://user@git.openstack.org:29418/openstack/glance.git')
    print(ssh.scheme)  # => ssh
    print(ssh.userinfo)  # => user
    print(ssh.params)  # => None
    print(ssh.port)  # => 29418


To create a copy of it with new pieces you can use ``copy_with``:

.. code-block:: python

    new_ssh = ssh.copy_with(
        scheme='https',
        userinfo='',
        port=443,
        path='/openstack/glance'
    )
    print(new_ssh.scheme)  # => https
    print(new_ssh.userinfo)  # => None
    # etc.

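
For comparison, here is a minimal sketch of the closest standard-library
equivalent, assuming only ``urllib.parse``. ``SplitResult`` is a plain named
tuple, so ``_replace`` stands in for ``copy_with``, and the userinfo, host, and
port have to be rewritten together through ``netloc``. This illustrates the
standard library's behaviour only, not how ``rfc3986`` is implemented.

.. code-block:: python

    from urllib.parse import urlsplit

    parts = urlsplit('ssh://user@git.openstack.org:29418/openstack/glance.git')
    print(parts.scheme)  # => ssh
    print(parts.username)  # => user
    print(parts.hostname)  # => git.openstack.org
    print(parts.port)  # => 29418

    # _replace returns a modified copy; the original tuple is left unchanged.
    https = parts._replace(
        scheme='https',
        netloc='git.openstack.org:443',
        path='/openstack/glance',
    )
    print(https.geturl())  # => https://git.openstack.org:443/openstack/glance
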

Strictly Parsing a URI and Applying Validation
----------------------------------------------

To parse a URI into a convenient named tuple, you can simply:

.. code-block:: python

    from rfc3986 import uri_reference

    example = uri_reference('http://example.com')
    email = uri_reference('mailto:user@domain.com')
    ssh = uri_reference('ssh://user@git.openstack.org:29418/openstack/keystone.git')


With a parsed URI you can access data about the components:

.. code-block:: python

    print(example.scheme)  # => http
    print(email.path)  # => user@domain.com
    print(ssh.userinfo)  # => user
    print(ssh.host)  # => git.openstack.org
    print(ssh.port)  # => 29418

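
These components follow the generic syntax of RFC 3986. As a rough
illustration (not the parser that ``rfc3986`` itself uses), the non-validating
regular expression given in Appendix B of the RFC splits a reference into the
same five top-level parts. ``split_uri`` is a hypothetical helper name chosen
for this sketch.

.. code-block:: python

    import re

    # Regular expression from RFC 3986, Appendix B. It splits but does not validate.
    URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')


    def split_uri(uri):
        """Split a URI reference into its five top-level components."""
        match = URI_RE.match(uri)  # every group is optional, so this always matches
        return {
            'scheme': match.group(2),
            'authority': match.group(4),
            'path': match.group(5),
            'query': match.group(7),
            'fragment': match.group(9),
        }


    email = split_uri('mailto:user@domain.com')
    print(email['scheme'])  # => mailto
    print(email['authority'])  # => None
    print(email['path'])  # => user@domain.com

An absent component comes back as ``None``, while the path is always a string
(possibly empty).
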

It can also parse URIs with unicode present:

.. code-block:: python

    uni = uri_reference(b'http://httpbin.org/get?utf8=\xe2\x98\x83')  # ☃
    print(uni.query)  # utf8=%E2%98%83

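
The query above is shown percent-encoded: each UTF-8 byte of the snowman
becomes a ``%XX`` triplet. A small standard-library sketch of that encoding
step, using ``urllib.parse.quote`` and ``unquote`` rather than anything from
``rfc3986``:

.. code-block:: python

    from urllib.parse import quote, unquote

    snowman = b'\xe2\x98\x83'  # UTF-8 encoding of U+2603
    encoded = quote(snowman)
    print(encoded)  # => %E2%98%83
    print(unquote(encoded))  # => ☃
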

With a parsed URI you can also validate it:

.. code-block:: python

    import subprocess

    if ssh.is_valid():
        subprocess.call(['git', 'clone', ssh.unsplit()])


You can also take a parsed URI and normalize it:

.. code-block:: python

    mangled = uri_reference('hTTp://exAMPLe.COM')
    print(mangled.scheme)  # => hTTp
    print(mangled.authority)  # => exAMPLe.COM

    normal = mangled.normalize()
    print(normal.scheme)  # => http
    print(normal.authority)  # => example.com


But these two URIs are (functionally) equivalent:

.. code-block:: python

    import webbrowser

    if normal == mangled:
        webbrowser.open(normal.unsplit())


Your paths, queries, and fragments are safe with us though:

.. code-block:: python

    mangled = uri_reference('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth')
    normal = mangled.normalize()
    assert normal == 'hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth'
    assert normal == 'http://example.com/Some/reallY/biZZare/pAth'
    assert normal != 'http://example.com/some/really/bizzare/path'


If you do not actually need a real reference object and just want to normalize
your URI:

.. code-block:: python

    from rfc3986 import normalize_uri

    assert (normalize_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth') ==
            'http://example.com/Some/reallY/biZZare/pAth')

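
The case handling shown here follows from RFC 3986: the scheme and host are
case-insensitive, while the path, query, and fragment are not. A minimal
standard-library sketch of just that case rule is below. ``normalize_case`` is
a hypothetical helper, and it covers only case folding; it makes no claim about
the other normalization steps ``rfc3986`` may apply.

.. code-block:: python

    from urllib.parse import urlsplit, urlunsplit


    def normalize_case(uri):
        """Lowercase the scheme and host; leave every other component untouched."""
        parts = urlsplit(uri)
        # Userinfo is case-sensitive, so only the host[:port] portion is lowered.
        userinfo, at, hostport = parts.netloc.rpartition('@')
        netloc = userinfo + at + hostport.lower()
        return urlunsplit(
            (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
        )


    print(normalize_case('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth'))
    # => http://example.com/Some/reallY/biZZare/pAth
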

You can also very simply validate a URI:

.. code-block:: python

    from rfc3986 import is_valid_uri

    assert is_valid_uri('hTTp://exAMPLe.COM/Some/reallY/biZZare/pAth')


Requiring Components
--------------------

You can validate that a particular string is a valid URI and require
independent components:

.. code-block:: python

    from rfc3986 import is_valid_uri

    assert is_valid_uri('http://localhost:8774/v2/resource',
                        require_scheme=True,
                        require_authority=True,
                        require_path=True)

    # Assert that a mailto URI is invalid if you require an authority
    # component
    assert is_valid_uri('mailto:user@example.com', require_authority=True) is False


If you have an instance of a ``URIReference``, you can pass the same arguments
to ``URIReference#is_valid``, e.g.,

.. code-block:: python

    from rfc3986 import uri_reference

    http = uri_reference('http://localhost:8774/v2/resource')
    assert http.is_valid(require_scheme=True,
                         require_authority=True,
                         require_path=True)

    # Assert that a mailto URI is invalid if you require an authority
    # component
    mailto = uri_reference('mailto:user@example.com')
    assert mailto.is_valid(require_authority=True) is False

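
To make the idea of a required component concrete, here is a rough
standard-library sketch that checks only whether each component is present.
``has_components`` is a hypothetical helper, not part of ``rfc3986``, and
unlike ``is_valid_uri`` it does not check that the components are well formed.

.. code-block:: python

    from urllib.parse import urlsplit


    def has_components(uri, require_scheme=False, require_authority=False,
                       require_path=False):
        """Return True if every required component is non-empty (presence only)."""
        parts = urlsplit(uri)
        if require_scheme and not parts.scheme:
            return False
        if require_authority and not parts.netloc:
            return False
        if require_path and not parts.path:
            return False
        return True


    print(has_components('http://localhost:8774/v2/resource',
                         require_scheme=True,
                         require_authority=True,
                         require_path=True))  # => True

    # A mailto URI has a scheme and a path but no authority component.
    print(has_components('mailto:user@example.com', require_authority=True))  # => False
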

Alternatives
------------

- `rfc3987 <https://pypi.python.org/pypi/rfc3987/1.3.4>`_

  This is a direct competitor to this library, with extra features,
  licensed under the GPL.

- `uritools <https://pypi.python.org/pypi/uritools/0.5.1>`_

  This can parse URIs in the manner of RFC 3986 but provides no validation and
  only recently added Python 3 support.

- Standard library's ``urlparse``/``urllib.parse``

  The functions in these libraries can only split a URI (valid or not) and
  provide no validation.

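
To illustrate the last point, a short sketch using only ``urllib.parse``:
a string with unencoded spaces, which RFC 3986 does not allow in a URI, is
split without any complaint.

.. code-block:: python

    from urllib.parse import urlsplit

    # Spaces are not legal in a URI, yet the string is split without an error.
    parts = urlsplit('http://exa mple.com/a b')
    print(parts.netloc)  # => exa mple.com
    print(parts.path)  # => /a b
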

Contributing
------------

This project follows and enforces the Python Software Foundation's `Code of
Conduct <https://www.python.org/psf/codeofconduct/>`_.

If you would like to contribute but do not have a bug or feature in mind, feel
free to email Ian and find out how you can help.

The git repository for this project is maintained at
https://github.com/python-hyper/rfc3986

.. _RFC 3986: https://datatracker.ietf.org/doc/html/rfc3986/
.. _Apache License Version 2.0: https://www.apache.org/licenses/LICENSE-2.0