pcre2

Python bindings for the PCRE2 regular expression library

  • 0.4.0
  • PyPI

PCRE2.py: Python bindings for the PCRE2 regular expression library

This project contains Python bindings for PCRE2. PCRE2 is the revised API for the Perl-compatible regular expressions (PCRE) library created by Philip Hazel. For original source code, see the official PCRE2 repository.

Installation

From PyPI:

pip install pcre2

If a wheel is not available for your platform, the module will be built from source. Building requires:

  • cmake
  • C compiler toolchain, such as gcc and make
  • libtool
  • Python headers

Usage

Regular expressions are compiled with pcre2.compile(), which accepts both unicode strings and bytes-like objects and returns a Pattern object. Expressions can be compiled with a number of options (combined with the bitwise-or operator) and can be JIT compiled:

>>> import pcre2
>>> expr = r'(?<head>\w+)\s+(?<tail>\w+)'
>>> patn = pcre2.compile(expr, options=pcre2.I, jit=True)
>>> # Patterns can also be JIT compiled after initialization.
>>> patn.jit_compile()

Pattern objects can be inspected as follows:

>>> patn.jit_size
980
>>> patn.name_dict()
{1: 'head', 2: 'tail'}
>>> patn.options
524296
>>> # Deeper inspection into options is available.
>>> pcre2.CompileOption.decompose(patn.options)
[<CompileOption.CASELESS: 0x8>, <CompileOption.UTF: 0x80000>]
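The options value shown above is a plain bitmask: 524296 equals 0x80000 + 0x8. UTF appears next to the requested pcre2.I, presumably because the expression was given as a unicode string. The following is a minimal sketch of how such a mask can be combined and split, using a hypothetical stand-in enum built on the standard library rather than the pcre2 module itself. Only the two flag values shown in the output above are included.

```python
from enum import IntFlag


class CompileOption(IntFlag):
    # Hypothetical stand-in for illustration only; the values are the two
    # flags that appear in the decompose() output above.
    CASELESS = 0x00000008
    UTF = 0x00080000


def decompose(value):
    """Return the individual flags that are set in an integer bitmask."""
    return [flag for flag in CompileOption if value & flag]


# Options are combined with the bitwise-or operator.
combined = CompileOption.CASELESS | CompileOption.UTF
combined_value = int(combined)

# Splitting the same integer recovers the individual flags.
flags = decompose(524296)
flag_names = [flag.name for flag in flags]
```

Here combined_value is 524296 and flag_names is ['CASELESS', 'UTF'], matching the values reported for the pattern above.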

Once compiled, Pattern objects can be used to match against strings. Matching returns a Match object, which has several methods for viewing results:

>>> subj = 'foo bar buzz bazz'
>>> match = patn.match(subj)
>>> match.substring()
'foo bar'
>>> match.start(), match.end()
(0, 7)
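Start and end offsets index into the subject, so slicing the subject with them should give back the matched text. As a cross-check, here is a sketch of the same match using Python's standard re module instead of pcre2. Note the assumptions: re spells named groups (?P<name>...) rather than (?<name>...), and its Match API uses group() where the example above uses substring().

```python
import re

subj = 'foo bar buzz bazz'

# Standard-library analogue of the pattern above, not the pcre2 API.
pattern = re.compile(r'(?P<head>\w+)\s+(?P<tail>\w+)', re.IGNORECASE)
match = pattern.match(subj)

matched_text = match.group(0)
start, end = match.start(), match.end()

# The offsets delimit the matched text within the subject.
sliced = subj[start:end]
```

With this subject, matched_text and sliced are both 'foo bar', and (start, end) is (0, 7).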

Substitution is also supported from both Pattern and Match objects:

>>> repl = '$2 $1'
>>> patn.substitute(repl, subj) # Global substitutions by default.
'bar foo bazz buzz'
>>> patn.substitute(repl, subj, suball=False)
'bar foo buzz bazz'
>>> match.expand(repl)
'bar foo buzz bazz'
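For comparison, the sketch below reproduces the global and first-only substitutions with the standard re module. It is an analogue, not the pcre2 API: re writes group references as \2 \1 instead of $2 $1, and limits replacements with count=1 where the example above uses suball=False. One difference to keep in mind is that re's Match.expand() returns only the expanded template, whereas the match.expand() output above shows the whole subject.

```python
import re

subj = 'foo bar buzz bazz'
pattern = re.compile(r'(?P<head>\w+)\s+(?P<tail>\w+)')

# Replace every non-overlapping match (re's default behaviour).
swap_all = pattern.sub(r'\2 \1', subj)

# Replace only the first match.
swap_first = pattern.sub(r'\2 \1', subj, count=1)

# In re, expand() fills in the template for one match and nothing else.
template_only = pattern.match(subj).expand(r'\2 \1')
```

Here swap_all is 'bar foo bazz buzz', swap_first is 'bar foo buzz bazz', and template_only is 'bar foo'.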

Additionally, Pattern objects support scanning over subjects for all non-overlapping matches:

>>> for match in patn.scan(subj):
...     print(match.substring('head'))
...
foo
buzz
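To illustrate what "non-overlapping" means here, the following sketch walks the same subject with re.finditer from the standard library, used as a stand-in for scan(). Each match starts at or after the end of the previous one, so the subject yields exactly two matches.

```python
import re

subj = 'foo bar buzz bazz'
pattern = re.compile(r'(?P<head>\w+)\s+(?P<tail>\w+)')

# finditer yields successive non-overlapping matches, left to right.
matches = list(pattern.finditer(subj))

heads = [m.group('head') for m in matches]
spans = [m.span() for m in matches]

# Each match begins no earlier than the previous match ended.
non_overlapping = all(
    prev.end() <= cur.start() for prev, cur in zip(matches, matches[1:])
)
```

For this subject, heads is ['foo', 'buzz'] and spans is [(0, 7), (8, 17)].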

Performance

PCRE2 provides a fast regular expression library, particularly with JIT compilation enabled. Below are the regex-redux benchmark results included in this repository:

Script             Number of runs  Total time  Real time  User time  System time
baseline.py        10              3.020       0.302      0.020      0.086
vanilla.py         10              51.380      5.138      11.408     0.529
hand_optimized.py  10              13.190      1.319      2.846      0.344
pcre2_module.py    10              13.670      1.367      2.269      0.532

Script descriptions are as follows:

Script             Description
baseline.py        Reads input file and outputs stored expected output
vanilla.py         Pure Python version
hand_optimized.py  Manually written Python ctypes bindings for shared PCRE2 C library
pcre2_module.py    Implementation using Python bindings written here

Tests were performed on an M2 MacBook Air. To run the benchmarks locally, Git LFS must be installed to download the input dataset. Additionally, a Python virtual environment must be created with make init, and the package built with make build. For more information on this benchmark, see The Computer Language Benchmarks Game. See the source code of the benchmark scripts for details and original sources.
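The benchmark figures can be read more easily as ratios. The sketch below copies the run counts, total times, and real times from the results table and makes two assumptions explicit: that the real time column is the total time divided by the number of runs (the numbers are consistent with this), and that comparing real times is a fair way to express relative speed. The variable names are illustrative only.

```python
import math

# (number of runs, total time, real time), copied from the results table.
results = {
    'baseline.py': (10, 3.020, 0.302),
    'vanilla.py': (10, 51.380, 5.138),
    'hand_optimized.py': (10, 13.190, 1.319),
    'pcre2_module.py': (10, 13.670, 1.367),
}

# Check the assumption that real time is total time averaged over the runs.
consistent = all(
    math.isclose(total / runs, real, abs_tol=1e-9)
    for runs, total, real in results.values()
)

# Speed of each script relative to the pure Python version.
vanilla_real = results['vanilla.py'][2]
speedup = {name: vanilla_real / real for name, (_, _, real) in results.items()}

for name, factor in speedup.items():
    print(f'{name:<18} {factor:6.2f}x')
```

On these figures the bindings come out roughly 3.8 times faster than the pure Python version, and within a few percent of the hand-written ctypes version.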
