This module provides regular expressions according to RFC 3986 "Uniform Resource Identifier (URI): Generic Syntax" <http://tools.ietf.org/html/rfc3986>
_ and RFC 3987 "Internationalized Resource Identifiers (IRIs)" <http://tools.ietf.org/html/rfc3987>
_, and
utilities for composition and relative resolution of references.
API
match (string, rule='IRI_reference')
Convenience function for checking if string
matches a specific rule.
Returns a match object or None::
>>> assert match('%C7X', 'pct_encoded') is None
>>> assert match('%C7', 'pct_encoded')
>>> assert match('%c7', 'pct_encoded')
parse (string, rule='IRI_reference')
Parses string
according to rule
into a dict of subcomponents.
If `rule` is None, parse an IRI_reference `without validation
<http://tools.ietf.org/html/rfc3986#appendix-B>`_.
If regex_ is available, any rule is supported; with re_, `rule` must be
'IRI_reference' or some special case thereof ('IRI', 'absolute_IRI',
'irelative_ref', 'irelative_part', 'URI_reference', 'URI', 'absolute_URI',
'relative_ref', 'relative_part'). ::
>>> d = parse('http://tools.ietf.org/html/rfc3986#appendix-A',
... rule='URI')
>>> assert all([ d['scheme'] == 'http',
... d['authority'] == 'tools.ietf.org',
... d['path'] == '/html/rfc3986',
... d['query'] == None,
... d['fragment'] == 'appendix-A' ])
compose (**parts)
Returns an URI composed_ from named parts.
.. _composed: http://tools.ietf.org/html/rfc3986#section-5.3
resolve (base, uriref, strict=True, return_parts=False)
Resolves_ an URI reference
relative to a base
URI.
`Test cases <http://tools.ietf.org/html/rfc3986#section-5.4>`_::
>>> base = resolve.test_cases_base
>>> for relative, resolved in resolve.test_cases.items():
... assert resolve(base, relative) == resolved
If `return_parts` is True, returns a dict of named parts instead of
a string.
Examples::
>>> assert resolve('urn:rootless', '../../name') == 'urn:name'
>>> assert resolve('urn:root/less', '../../name') == 'urn:/name'
>>> assert resolve('http://a/b', 'http:g') == 'http:g'
>>> assert resolve('http://a/b', 'http:g', strict=False) == 'http://a/g'
.. _Resolves: http://tools.ietf.org/html/rfc3986#section-5.2
patterns
A dict of regular expressions with useful group names.
Compilable (with regex_ only) without need for any particular compilation
flag.
[bmp_][u]patterns[_no_names]
Alternative versions of patterns
.
[u]nicode strings without group names for the re_ module.
BMP only for narrow builds.
get_compiled_pattern (rule, flags=0)
Returns a compiled pattern object for a rule name or template string.
Usage for validation::
>>> uri = get_compiled_pattern('^%(URI)s$')
>>> assert uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')
>>> assert not get_compiled_pattern('^%(relative_ref)s$').match('#f#g')
>>> from unicodedata import lookup
>>> smp = 'urn:' + lookup('OLD ITALIC LETTER A') # U+00010300
>>> assert not uri.match(smp)
>>> m = get_compiled_pattern('^%(IRI)s$').match(smp)
On narrow builds, non-BMP characters are (incorrectly) excluded::
>>> assert NARROW_BUILD == (not m)
For parsing, some subcomponents are captured in named groups (*only if*
regex_ is available, otherwise see `parse`)::
>>> match = uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')
>>> d = match.groupdict()
>>> if REGEX:
... assert all([ d['scheme'] == 'http',
... d['authority'] == 'tools.ietf.org',
... d['path'] == '/html/rfc3986',
... d['query'] == None,
... d['fragment'] == 'appendix-A' ])
>>> for r in patterns.keys():
... assert get_compiled_pattern(r)
format_patterns (**names)
Returns a dict of patterns (regular expressions) keyed by
rule names for URIs
_ and rule names for IRIs
_.
See also the module level dicts of patterns, and `get_compiled_pattern`.
To wrap a rule in a named capture group, pass it as keyword argument:
rule_name='group_name'. By default, the formatted patterns contain no
named groups.
Patterns are `str` instances (be it in python 2.x or 3.x) containing ASCII
characters only.
Caveats:
- with re_, named capture groups cannot occur on multiple branches of an
alternation
- with re_ before python 3.3, ``\u`` and ``\U`` escapes must be
preprocessed (see `issue3665 <http://bugs.python.org/issue3665>`_)
- on narrow builds, character ranges beyond BMP are not supported
.. _rule names for URIs: http://tools.ietf.org/html/rfc3986#appendix-A
.. _rule names for IRIs: http://tools.ietf.org/html/rfc3987#section-2.2
Dependencies
Some features require regex_.
This package's docstrings are tested on python 2.6, 2.7, and 3.2 to 3.6.
Note that in python<=3.2, characters beyond the Basic Multilingual Plane are
not supported on narrow builds (see issue12729 <http://bugs.python.org/issue12729>
_).
Release notes
version 1.3.8:
- fixed deprecated escape sequence
version 1.3.6:
version 1.3.4:
- allowed for lower case percent encoding
version 1.3.3:
- fixed a bug in
resolve
which left "../" at the beginning of some paths
version 1.3.2:
- convenience function
match
- patterns restricted to the BMP for narrow builds
- adapted doctests for python 3.3
- compatibility with python 2.6 (thanks to Thijs Janssen)
version 1.3.1:
- some re_ compatibility: get_compiled_pattern, parse
- dropped regex_ from setup.py requirements
version 1.3.0:
- python 3.x compatibility
- format_patterns
version 1.2.1:
.. _re: http://docs.python.org/library/re
.. _regex: http://pypi.python.org/pypi/regex
Support
This is free software. You may show your appreciation with a donation
_.
.. _donation: http://danielgerber.net/¤#Thanks-for-python-package-rfc3987