PathSpec
pathspec is a utility library for pattern matching of file paths. So far this
only includes Git's gitignore_ pattern matching.
.. _gitignore: http://git-scm.com/docs/gitignore
Tutorial
Say you have a "Projects" directory and you want to back up only certain
files from it, ignoring others depending on certain conditions::
>>> from pathspec import PathSpec
>>> # The gitignore-style patterns for files to select, but we're including
>>> # instead of ignoring.
>>> spec_text = """
...
... # This is a comment because the line begins with a hash: "#"
...
... # Include several project directories (and all descendants) relative to
... # the current directory. To reference only a directory you must end with a
... # slash: "/"
... /project-a/
... /project-b/
... /project-c/
...
... # Patterns can be negated by prefixing with exclamation mark: "!"
...
... # Ignore temporary files beginning or ending with "~" and ending with
... # ".swp".
... !~*
... !*~
... !*.swp
...
... # These are python projects so ignore compiled python files from
... # testing.
... !*.pyc
...
... # Ignore the build directories but only directly under the project
... # directories.
... !/*/build/
...
... """
The PathSpec class provides an abstraction around pattern implementations,
and we want to compile our patterns as "gitignore" patterns. You could call it a
wrapper for a list of compiled patterns::
>>> spec = PathSpec.from_lines('gitignore', spec_text.splitlines())
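As a minimal sketch of what the compiled spec does, the snippet below uses a shortened, hypothetical pattern list (not the full spec_text above) and a few made-up paths. With PathSpec, the last pattern that matches a path decides the result:

```python
from pathspec import PathSpec

# Hypothetical patterns for illustration: select one project directory,
# then exclude compiled Python files from the selection.
lines = [
    "/project-a/",
    "!*.pyc",
]
spec = PathSpec.from_lines("gitignore", lines)

# Selected by "/project-a/".
print(spec.match_file("project-a/main.py"))   # True
# Selected by "/project-a/", then negated by "!*.pyc".
print(spec.match_file("project-a/main.pyc"))  # False
# Not matched by any pattern.
print(spec.match_file("project-z/main.py"))   # False
```

Paths are given relative to the directory the patterns are anchored to, using forward slashes.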
If we want to compile the patterns manually, we can use the GitIgnoreBasicPattern
class directly. It is what "gitignore" uses in the background, and it internally
converts patterns to regular expressions::
>>> from pathspec.patterns.gitignore.basic import GitIgnoreBasicPattern
>>> patterns = map(GitIgnoreBasicPattern, spec_text.splitlines())
>>> spec = PathSpec(patterns)
PathSpec.from_lines() is a class method which simplifies that.
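To see the "wrapper for a list of compiled patterns" idea in practice, the following sketch inspects the compiled patterns of a small spec. The pattern list is hypothetical, and the exact regular expression text is an implementation detail that may differ between releases, so it is printed rather than relied upon:

```python
from pathspec import PathSpec

# Hypothetical patterns: one normal pattern and one negated pattern.
spec = PathSpec.from_lines("gitignore", ["*.pyc", "!keep.pyc"])

for pattern in spec.patterns:
    # `include` is True for a normal pattern and False for a negated one.
    # `regex` is the compiled regular expression used for matching.
    print(pattern.include, pattern.regex.pattern)

# The compiled regex can also be used directly on a normalized path.
first_regex = spec.patterns[0].regex
print(first_regex.match("pkg/module.pyc") is not None)  # True
```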
If you want to load the patterns from file, you can pass the file object
directly as well::
>>> with open('patterns.list', 'r') as fh:
...     spec = PathSpec.from_lines('gitignore', fh)
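A self-contained version of loading patterns from a file might look like the sketch below. It writes a hypothetical patterns.list into a temporary directory first, so that the example does not depend on any file already existing:

```python
import tempfile
from pathlib import Path

from pathspec import PathSpec

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical pattern file created only for this example.
    pattern_file = Path(tmp) / "patterns.list"
    pattern_file.write_text("# Log files, except one\n*.log\n!important.log\n")

    # The open file object is iterated line by line.
    with open(pattern_file, "r") as fh:
        spec = PathSpec.from_lines("gitignore", fh)

print(spec.match_file("debug.log"))      # True
print(spec.match_file("important.log"))  # False
```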
You can perform matching on a whole directory tree with::
>>> matches = set(spec.match_tree_files('path/to/directory'))
Or you can perform matching on a specific set of file paths with::
>>> matches = set(spec.match_files(file_paths))
Or check to see if an individual file matches::
>>> is_matched = spec.match_file(file_path)
There are actually two implementations of "gitignore". The basic implementation is
used by PathSpec and follows patterns as documented by gitignore_.
However, Git's behavior differs from the documented patterns. There are some
edge cases, and in particular, Git allows including files from excluded
directories, which appears to contradict the documentation. GitIgnoreSpec
handles these cases to more closely replicate Git's behavior::
>>> from pathspec import GitIgnoreSpec
>>> spec = GitIgnoreSpec.from_lines(spec_text.splitlines())
You do not specify the style of pattern for GitIgnoreSpec because it should
always use GitIgnoreSpecPattern internally.
Performance
Running lots of regular expression matches against thousands of files in Python
is slow. Alternate regular expression backends can be used to improve
performance. PathSpec and GitIgnoreSpec both accept a backend
parameter to control the backend. The default is "best" to automatically choose
the best available backend. There are currently 3 backends.
The "simple" backend uses Python's re.Pattern objects that are normally
created, and it is always available because it needs no optional dependency.
This can be the fastest when there are only 1 or 2 patterns.
The "hyperscan" backend uses the hyperscan_ library. Hyperscan tends to be at
least 2 times faster than "simple", and generally slower than "re2". This can be
faster than "re2" under the right conditions with pattern counts of 1-25.
The "re2" backend uses the google-re2_ library (not to be confused with the
re2 library on PyPI which is unrelated and abandoned). Google's re2 tends to
be significantly faster than "simple", and 3 times faster than "hyperscan" at
high pattern counts.
See benchmarks_backends.md_ for comparisons between native Python regular
expressions and the optional backends.
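Selecting a backend explicitly could look like the sketch below. It assumes that backend is passed as a keyword argument and that the string "simple" is accepted, as described above. The fallback branch is only there so the snippet still runs on releases older than 1.0.0, which have no backend argument:

```python
from pathspec import PathSpec

# Hypothetical patterns for illustration.
lines = ["*.pyc", "build/"]

try:
    # Force the pure-Python backend instead of the default "best".
    spec = PathSpec.from_lines("gitignore", lines, backend="simple")
except TypeError:
    # Assumption: releases before 1.0.0 reject the unknown keyword.
    spec = PathSpec.from_lines("gitignore", lines)

print(spec.match_file("pkg/module.pyc"))  # True
print(spec.match_file("pkg/module.py"))   # False
```

The matching results should not depend on the backend chosen; only the speed does.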
.. _benchmarks_backends.md: https://github.com/cpburnz/python-pathspec/blob/master/benchmarks_backends.md
.. _google-re2: https://pypi.org/project/google-re2/
.. _hyperscan: https://pypi.org/project/hyperscan/
FAQ
- How do I ignore files like .gitignore?
+++++++++++++++++++++++++++++++++++++++++++
GitIgnoreSpec (and PathSpec) positively match files by default. To find
the files to keep, and exclude files like .gitignore, you need to set
negate=True to flip the results::
>>> from pathspec import GitIgnoreSpec
>>> spec = GitIgnoreSpec.from_lines([...])
>>> keep_files = set(spec.match_tree_files('path/to/directory', negate=True))
>>> ignore_files = set(spec.match_tree_files('path/to/directory'))
License
pathspec is licensed under the `Mozilla Public License Version 2.0`_. See
LICENSE_ or the FAQ_ for more information.
In summary, you may use pathspec with any closed or open source project
without affecting the license of the larger work so long as you:
.. _Mozilla Public License Version 2.0: http://www.mozilla.org/MPL/2.0
.. _LICENSE: LICENSE
.. _FAQ: http://www.mozilla.org/MPL/2.0/FAQ.html
Source
The source code for pathspec is available from the GitHub repo
`cpburnz/python-pathspec`_.
.. _cpburnz/python-pathspec: https://github.com/cpburnz/python-pathspec
Installation
pathspec is available for install through PyPI_::
pip install pathspec
pathspec can also be built from source. The following packages will be
required:
- build_
pathspec can then be built and installed with::
python -m build
pip install dist/pathspec-*-py3-none-any.whl
The following optional dependencies can be installed:
- google-re2_: Enables optional "re2" backend.
- hyperscan_: Enables optional "hyperscan" backend.
- typing-extensions_: Improves some type hints.
.. _PyPI: http://pypi.python.org/pypi/pathspec
.. _build: https://pypi.org/project/build/
.. _typing-extensions: https://pypi.org/project/typing-extensions/
Documentation
Documentation for pathspec is available on `Read the Docs`_.
The full change history can be found in CHANGES.rst_ and `Change History`_.
An upgrade guide is available in UPGRADING.rst_ and `Upgrade Guide`_.
.. _CHANGES.rst: https://github.com/cpburnz/python-pathspec/blob/master/CHANGES.rst
.. _Change History: https://python-path-specification.readthedocs.io/en/stable/changes.html
.. _Read the Docs: https://python-path-specification.readthedocs.io
.. _UPGRADING.rst: https://github.com/cpburnz/python-pathspec/blob/master/UPGRADING.rst
.. _Upgrade Guide: https://python-path-specification.readthedocs.io/en/stable/upgrading.html
Other Languages
The related project pathspec-ruby_ (by highb) provides a similar library as
a `Ruby gem`_.
.. _pathspec-ruby: https://github.com/highb/pathspec-ruby
.. _Ruby gem: https://rubygems.org/gems/pathspec
Change History
1.1.1 (2026-04-26)
Improvements:
- Improved type checking with mypy and pyright.
Bug fixes:
- Fixed typing on PathSpec[TPattern] to PathSpec[TPattern_co].
- Added missing variant type-hint type[Pattern] to PathSpec.from_lines() parameter pattern_factory.
- Fixed possible type error when using + and += operators on PathSpec.
1.1.0 (2026-04-22)
New features:
- `Issue #108`_: Specialize pattern type for PathSpec as PathSpec[TPattern] for better debugging of PathSpec().patterns.
Bug fixes:
- `Issue #93`_: Git discards invalid range notation. GitIgnoreSpecPattern now discards patterns with invalid range notation like Git.
- `Pull #106`_: Fix escape() not escaping backslash characters.
Improvements:
- `Pull #110`_: Nicer debug print outs (and str for regex pattern).
.. _Pull #106: https://github.com/cpburnz/python-pathspec/pull/106
.. _Issue #108: https://github.com/cpburnz/python-pathspec/issues/108
.. _Pull #110: https://github.com/cpburnz/python-pathspec/pull/110
1.0.4 (2026-01-26)
Bug fixes:
- `Issue #103`_: Using re2 fails if pyre2 is also installed.
.. _Issue #103: https://github.com/cpburnz/python-pathspec/issues/103
1.0.3 (2026-01-09)
Bug fixes:
- `Issue #101`_: pyright strict errors with pathspec >= 1.0.0.
- `Issue #102`_: No module named 'tomllib'.
.. _Issue #101: https://github.com/cpburnz/python-pathspec/issues/101
.. _Issue #102: https://github.com/cpburnz/python-pathspec/issues/102
1.0.2 (2026-01-07)
Bug fixes:
- Type hint collections.abc.Callable does not properly replace typing.Callable until Python 3.9.2.
1.0.1 (2026-01-06)
Bug fixes:
- `Issue #100`_: ValueError(f"{patterns=!r} cannot be empty.") when using black.
.. _Issue #100: https://github.com/cpburnz/python-pathspec/issues/100
1.0.0 (2026-01-05)
Major changes:
- `Issue #91`_: Dropped support of EoL Python 3.8.
- Added concept of backends to allow for faster regular expression matching. The backend can be controlled using the backend argument to PathSpec(), PathSpec.from_lines(), GitIgnoreSpec(), and GitIgnoreSpec.from_lines().
- Renamed "gitwildmatch" pattern back to "gitignore". The "gitignore" pattern behaves slightly differently when used with PathSpec (gitignore as documented) than with GitIgnoreSpec (replicates Git's edge cases).
API changes:
- Breaking: protected method pathspec.pathspec.PathSpec._match_file() (with a leading underscore) has been removed and replaced by backends. This does not affect normal usage of PathSpec or GitIgnoreSpec. Only custom subclasses will be affected. If this breaks your usage, let me know by `opening an issue <https://github.com/cpburnz/python-pathspec/issues>`_.
- Deprecated: "gitwildmatch" is now an alias for "gitignore".
- Deprecated: pathspec.patterns.GitWildMatchPattern is now an alias for pathspec.patterns.gitignore.spec.GitIgnoreSpecPattern.
- Deprecated: pathspec.patterns.gitwildmatch module has been replaced by the pathspec.patterns.gitignore package.
- Deprecated: pathspec.patterns.gitwildmatch.GitWildMatchPattern is now an alias for pathspec.patterns.gitignore.spec.GitIgnoreSpecPattern.
- Deprecated: pathspec.patterns.gitwildmatch.GitWildMatchPatternError is now an alias for pathspec.patterns.gitignore.GitIgnorePatternError.
- Removed: pathspec.patterns.gitwildmatch.GitIgnorePattern has been deprecated since v0.4 (2016-07-15).
- Signature of method pathspec.pattern.RegexPattern.match_file() has been changed from def match_file(self, file: str) -> RegexMatchResult | None to def match_file(self, file: AnyStr) -> RegexMatchResult | None to reflect usage.
- Signature of class method pathspec.pattern.RegexPattern.pattern_to_regex() has been changed from def pattern_to_regex(cls, pattern: str) -> tuple[str, bool] to def pattern_to_regex(cls, pattern: AnyStr) -> tuple[AnyStr | None, bool | None] to reflect usage and documentation.
New features:
- Added optional "hyperscan" backend using hyperscan_ library. It will automatically be used when installed. This dependency can be installed with pip install 'pathspec[hyperscan]'.
- Added optional "re2" backend using the google-re2_ library. It will automatically be used when installed. This dependency can be installed with pip install 'pathspec[re2]'.
- Added optional dependency on typing-extensions_ library to improve some type hints.
Bug fixes:
- `Issue #93`_: Do not remove leading spaces.
- `Issue #95`_: Matching for files inside folder does not seem to behave like .gitignore's.
- `Issue #98`_: UnboundLocalError in RegexPattern when initialized with pattern=None.
- Type hint on return value of pathspec.pattern.RegexPattern.match_file() to match documentation.
Improvements:
- Mark Python 3.13 and 3.14 as supported.
- No-op patterns are now filtered out when matching files, slightly improving performance.
- Fix performance regression in iter_tree_files() from v0.10.
.. _Issue #38: https://github.com/cpburnz/python-pathspec/issues/38
.. _Issue #91: https://github.com/cpburnz/python-pathspec/issues/91
.. _Issue #93: https://github.com/cpburnz/python-pathspec/issues/93
.. _Issue #95: https://github.com/cpburnz/python-pathspec/issues/95
.. _Issue #98: https://github.com/cpburnz/python-pathspec/issues/98