PathSpec

pathspec is a utility library for pattern matching of file paths. So far this only includes Git's gitignore_ pattern matching.

.. _gitignore: http://git-scm.com/docs/gitignore

Tutorial

Say you have a "Projects" directory and you want to back it up, but only certain files, and ignore others depending on certain conditions::

>>> from pathspec import PathSpec
>>> # The gitignore-style patterns for files to select, but we're including
>>> # instead of ignoring.
>>> spec_text = """
...
... # This is a comment because the line begins with a hash: "#"
...
... # Include several project directories (and all descendants) relative to
... # the current directory. To reference only a directory you must end with a
... # slash: "/"
... /project-a/
... /project-b/
... /project-c/
...
... # Patterns can be negated by prefixing with exclamation mark: "!"
...
... # Ignore temporary files beginning or ending with "~", or ending with
... # ".swp".
... !~*
... !*~
... !*.swp
...
... # These are python projects so ignore compiled python files from
... # testing.
... !*.pyc
...
... # Ignore the build directories but only directly under the project
... # directories.
... !/*/build/
...
... """

The PathSpec class provides an abstraction around pattern implementations, and we want to compile our patterns as "gitignore" patterns. You could call it a wrapper for a list of compiled patterns::

>>> spec = PathSpec.from_lines('gitignore', spec_text.splitlines())

If we want to compile the patterns manually, we can use the GitIgnoreBasicPattern class directly. It is what "gitignore" uses in the background, and it internally converts patterns to regular expressions::

>>> from pathspec.patterns.gitignore.basic import GitIgnoreBasicPattern
>>> patterns = map(GitIgnoreBasicPattern, spec_text.splitlines())
>>> spec = PathSpec(patterns)

PathSpec.from_lines() is a class method which simplifies that.

If you want to load the patterns from file, you can pass the file object directly as well::

>>> with open('patterns.list', 'r') as fh:
...     spec = PathSpec.from_lines('gitignore', fh)
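
As a rough, self-contained sketch of the file-loading case, the following writes a throwaway pattern file into a temporary directory and passes the open file object to PathSpec.from_lines(). The file name patterns.list comes from the snippet above; the patterns and the paths being checked are made up for illustration.

```python
import tempfile
from pathlib import Path

from pathspec import PathSpec

# Hypothetical pattern file contents, used only for this illustration.
pattern_text = "*.log\n!keep.log\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    pattern_file = Path(tmp_dir) / "patterns.list"
    pattern_file.write_text(pattern_text)

    # A file object iterates over its lines, so it can be passed directly.
    with open(pattern_file, "r") as fh:
        spec = PathSpec.from_lines("gitignore", fh)

# The compiled spec no longer needs the file once it has been built.
print(spec.match_file("debug.log"))
print(spec.match_file("keep.log"))
```

Here debug.log should match "*.log", while keep.log is dropped again by the negated pattern on the following line.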

You can perform matching on a whole directory tree with::

>>> matches = set(spec.match_tree_files('path/to/directory'))

Or you can perform matching on a specific set of file paths with::

>>> matches = set(spec.match_files(file_paths))

Or check to see if an individual file matches::

>>> is_matched = spec.match_file(file_path)
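
The two in-memory calls can be tried without touching the file system. This is a minimal sketch with made-up patterns and paths (a cut-down version of the tutorial's spec), not a recommended layout.

```python
from pathspec import PathSpec

# Hypothetical patterns and paths, chosen only to illustrate the calls.
spec = PathSpec.from_lines("gitignore", [
    "/project-a/",  # select everything under project-a
    "!*.pyc",       # but drop compiled Python files
])

file_paths = [
    "project-a/app.py",
    "project-a/app.pyc",
    "project-b/app.py",
]

# match_files() yields the matching paths lazily, so collect them in a set.
matches = set(spec.match_files(file_paths))

# match_file() checks a single path and returns a bool.
is_matched = spec.match_file("project-a/app.py")

print(matches, is_matched)
```

Only project-a/app.py is selected: the .pyc file is negated by the later pattern, and nothing selects project-b.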

There are actually two implementations of "gitignore". The basic implementation is used by PathSpec and follows patterns as documented by gitignore_. However, Git's behavior differs from the documented patterns. There are some edge cases, and in particular, Git allows including files from excluded directories, which appears to contradict the documentation. GitIgnoreSpec handles these cases to more closely replicate Git's behavior::

>>> from pathspec import GitIgnoreSpec
>>> spec = GitIgnoreSpec.from_lines(spec_text.splitlines())

You do not specify the style of pattern for GitIgnoreSpec because it should always use GitIgnoreSpecPattern internally.

Performance

Running lots of regular expression matches against thousands of files in Python is slow. Alternate regular expression backends can be used to improve performance. PathSpec and GitIgnoreSpec both accept a backend parameter to control the backend. The default is "best" to automatically choose the best available backend. There are currently 3 backends.

The "simple" backend is always available and serves as the fallback when no optional backend is installed. It simply uses Python's re.Pattern objects that are normally created. This can be the fastest when there are only 1 or 2 patterns.

The "hyperscan" backend uses the hyperscan_ library. Hyperscan tends to be at least 2 times faster than "simple", and generally slower than "re2". This can be faster than "re2" under the right conditions with pattern counts of 1-25.

The "re2" backend uses the google-re2_ library (not to be confused with the re2 library on PyPI which is unrelated and abandoned). Google's re2 tends to be significantly faster than "simple", and 3 times faster than "hyperscan" at high pattern counts.

See benchmarks_backends.md_ for comparisons between native Python regular expressions and the optional backends.

.. _benchmarks_backends.md: https://github.com/cpburnz/python-pathspec/blob/master/benchmarks_backends.md
.. _google-re2: https://pypi.org/project/google-re2/
.. _hyperscan: https://pypi.org/project/hyperscan/

FAQ

How do I ignore files like .gitignore?

GitIgnoreSpec (and PathSpec) positively match files by default. To find the files to keep, and exclude files like .gitignore, you need to set negate=True to flip the results::

>>> from pathspec import GitIgnoreSpec
>>> spec = GitIgnoreSpec.from_lines([...])
>>> keep_files = set(spec.match_tree_files('path/to/directory', negate=True))
>>> ignore_files = set(spec.match_tree_files('path/to/directory'))

License

pathspec is licensed under the Mozilla Public License Version 2.0. See LICENSE or the FAQ_ for more information.

In summary, you may use pathspec with any closed or open source project without affecting the license of the larger work so long as you:

  • give credit where credit is due,

  • and release any custom changes made to pathspec.

.. _Mozilla Public License Version 2.0: http://www.mozilla.org/MPL/2.0
.. _LICENSE: LICENSE
.. _FAQ: http://www.mozilla.org/MPL/2.0/FAQ.html

Source

The source code for pathspec is available from the GitHub repo cpburnz/python-pathspec_.

.. _cpburnz/python-pathspec: https://github.com/cpburnz/python-pathspec

Installation

pathspec is available for install through PyPI_::

pip install pathspec

pathspec can also be built from source. The following packages will be required:

  • build_ (>=0.6.0)

pathspec can then be built and installed with::

python -m build
pip install dist/pathspec-*-py3-none-any.whl

The following optional dependencies can be installed:

  • google-re2_: Enables optional "re2" backend.
  • hyperscan_: Enables optional "hyperscan" backend.
  • typing-extensions_: Improves some type hints.

.. _PyPI: http://pypi.python.org/pypi/pathspec
.. _build: https://pypi.org/project/build/
.. _typing-extensions: https://pypi.org/project/typing-extensions/

Documentation

Documentation for pathspec is available on Read the Docs_.

The full change history can be found in CHANGES.rst_ and Change History_.

An upgrade guide is available in UPGRADING.rst_ and Upgrade Guide_.

.. _CHANGES.rst: https://github.com/cpburnz/python-pathspec/blob/master/CHANGES.rst
.. _Change History: https://python-path-specification.readthedocs.io/en/stable/changes.html
.. _Read the Docs: https://python-path-specification.readthedocs.io
.. _UPGRADING.rst: https://github.com/cpburnz/python-pathspec/blob/master/UPGRADING.rst
.. _Upgrade Guide: https://python-path-specification.readthedocs.io/en/stable/upgrading.html

Other Languages

The related project pathspec-ruby_ (by highb) provides a similar library as a Ruby gem_.

.. _pathspec-ruby: https://github.com/highb/pathspec-ruby
.. _Ruby gem: https://rubygems.org/gems/pathspec

Change History

1.1.1 (2026-04-26)

Improvements:

  • Improved type checking with mypy and pyright.

Bug fixes:

  • Fixed typing on PathSpec[TPattern] to PathSpec[TPattern_co].
  • Added missing variant type-hint type[Pattern] to PathSpec.from_lines() parameter pattern_factory.
  • Fixed possible type error when using + and += operators on PathSpec.

1.1.0 (2026-04-22)

New features:

  • Issue #108_: Specialize pattern type for PathSpec as PathSpec[TPattern] for better debugging of PathSpec().patterns.

Bug fixes:

  • Issue #93_: Git discards invalid range notation. GitIgnoreSpecPattern now discards patterns with invalid range notation like Git.
  • Pull #106_: Fix escape() not escaping backslash characters.

Improvements:

  • Pull #110_: Nicer debug print outs (and str for regex pattern).

.. _Pull #106: https://github.com/cpburnz/python-pathspec/pull/106
.. _Issue #108: https://github.com/cpburnz/python-pathspec/issues/108
.. _Pull #110: https://github.com/cpburnz/python-pathspec/pull/110

1.0.4 (2026-01-26)

Bug fixes:

  • Issue #103_: Using re2 fails if pyre2 is also installed.

.. _Issue #103: https://github.com/cpburnz/python-pathspec/issues/103

1.0.3 (2026-01-09)

Bug fixes:

  • Issue #101_: pyright strict errors with pathspec >= 1.0.0.
  • Issue #102_: No module named 'tomllib'.

.. _Issue #101: https://github.com/cpburnz/python-pathspec/issues/101
.. _Issue #102: https://github.com/cpburnz/python-pathspec/issues/102

1.0.2 (2026-01-07)

Bug fixes:

  • Type hint collections.abc.Callable does not properly replace typing.Callable until Python 3.9.2.

1.0.1 (2026-01-06)

Bug fixes:

  • Issue #100_: ValueError(f"{patterns=!r} cannot be empty.") when using black.

.. _Issue #100: https://github.com/cpburnz/python-pathspec/issues/100

1.0.0 (2026-01-05)

Major changes:

  • Issue #91_: Dropped support of EoL Python 3.8.
  • Added concept of backends to allow for faster regular expression matching. The backend can be controlled using the backend argument to PathSpec(), PathSpec.from_lines(), GitIgnoreSpec(), and GitIgnoreSpec.from_lines().
  • Renamed "gitwildmatch" pattern back to "gitignore". The "gitignore" pattern behaves slightly differently when used with PathSpec (gitignore as documented) than with GitIgnoreSpec (replicates Git's edge cases).

API changes:

  • Breaking: protected method pathspec.pathspec.PathSpec._match_file() (with a leading underscore) has been removed and replaced by backends. This does not affect normal usage of PathSpec or GitIgnoreSpec. Only custom subclasses will be affected. If this breaks your usage, let me know by opening an issue <https://github.com/cpburnz/python-pathspec/issues>_.
  • Deprecated: "gitwildmatch" is now an alias for "gitignore".
  • Deprecated: pathspec.patterns.GitWildMatchPattern is now an alias for pathspec.patterns.gitignore.spec.GitIgnoreSpecPattern.
  • Deprecated: pathspec.patterns.gitwildmatch module has been replaced by the pathspec.patterns.gitignore package.
  • Deprecated: pathspec.patterns.gitwildmatch.GitWildMatchPattern is now an alias for pathspec.patterns.gitignore.spec.GitIgnoreSpecPattern.
  • Deprecated: pathspec.patterns.gitwildmatch.GitWildMatchPatternError is now an alias for pathspec.patterns.gitignore.GitIgnorePatternError.
  • Removed: pathspec.patterns.gitwildmatch.GitIgnorePattern has been deprecated since v0.4 (2016-07-15).
  • Signature of method pathspec.pattern.RegexPattern.match_file() has been changed from def match_file(self, file: str) -> RegexMatchResult | None to def match_file(self, file: AnyStr) -> RegexMatchResult | None to reflect usage.
  • Signature of class method pathspec.pattern.RegexPattern.pattern_to_regex() has been changed from def pattern_to_regex(cls, pattern: str) -> tuple[str, bool] to def pattern_to_regex(cls, pattern: AnyStr) -> tuple[AnyStr | None, bool | None] to reflect usage and documentation.

New features:

  • Added optional "hyperscan" backend using hyperscan_ library. It will automatically be used when installed. This dependency can be installed with pip install 'pathspec[hyperscan]'.
  • Added optional "re2" backend using the google-re2_ library. It will automatically be used when installed. This dependency can be installed with pip install 'pathspec[re2]'.
  • Added optional dependency on typing-extensions_ library to improve some type hints.

Bug fixes:

  • Issue #93_: Do not remove leading spaces.
  • Issue #95_: Matching for files inside folder does not seem to behave like .gitignore's.
  • Issue #98_: UnboundLocalError in RegexPattern when initialized with pattern=None.
  • Type hint on return value of pathspec.pattern.RegexPattern.match_file() to match documentation.

Improvements:

  • Mark Python 3.13 and 3.14 as supported.
  • No-op patterns are now filtered out when matching files, slightly improving performance.
  • Fix performance regression in iter_tree_files() from v0.10.

.. _Issue #38: https://github.com/cpburnz/python-pathspec/issues/38
.. _Issue #91: https://github.com/cpburnz/python-pathspec/issues/91
.. _Issue #93: https://github.com/cpburnz/python-pathspec/issues/93
.. _Issue #95: https://github.com/cpburnz/python-pathspec/issues/95
.. _Issue #98: https://github.com/cpburnz/python-pathspec/issues/98
.. _google-re2: https://pypi.org/project/google-re2/
.. _hyperscan: https://pypi.org/project/hyperscan/
.. _typing-extensions: https://pypi.org/project/typing-extensions/
