This project contains Python bindings for PCRE2. PCRE2 is the revised API for the Perl-compatible regular expressions (PCRE) library created by Philip Hazel. For original source code, see the official PCRE2 repository.
From PyPI:
pip install pcre2
If a wheel is not available for your platform, the module will be built from source. Building requires:
cmake
gcc and make
libtool
Regular expressions are compiled with pcre2.compile(), which accepts both unicode strings and bytes-like objects. This returns a Pattern object.
Expressions can be compiled with a number of options (combined with the bitwise-or operator) and can be JIT compiled:
>>> import pcre2
>>> expr = r'(?<head>\w+)\s+(?<tail>\w+)'
>>> patn = pcre2.compile(expr, options=pcre2.I, jit=True)
>>> # Patterns can also be JIT compiled after initialization.
>>> patn.jit_compile()
Pattern objects can be inspected as follows:
>>> patn.jit_size
980
>>> patn.name_dict()
{1: 'head', 2: 'tail'}
>>> patn.options
524296
>>> # Deeper inspection into options is available.
>>> pcre2.CompileOption.decompose(patn.options)
[<CompileOption.CASELESS: 0x8>, <CompileOption.UTF: 0x80000>]
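The integer 524296 shown above is a bit field: 0x80000 plus 0x8. As a rough illustration of what decomposing such a value involves, the sketch below uses the standard library's enum.IntFlag. DemoOption and decompose are hypothetical names that copy only the two flag values shown above; this is a stand-in for illustration, not how pcre2.CompileOption is implemented.

```python
import enum


class DemoOption(enum.IntFlag):
    """Hypothetical stand-in; only the two values shown above are copied."""
    CASELESS = 0x8
    UTF = 0x80000


def decompose(value: int) -> list:
    """Return the individual flags whose bits are set in value."""
    return [flag for flag in DemoOption if value & flag.value]


# The bitwise-or operator combines options into a single integer.
combined = DemoOption.CASELESS | DemoOption.UTF
flags = decompose(524296)

print(int(combined))  # 524296
print(flags)
```

Because each option occupies its own bit, combining options with bitwise-or is lossless, and the individual options can always be recovered from the combined value.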
Once compiled, Pattern objects can be used to match against strings. Matching returns a Match object, which has several functions to view results:
>>> subj = 'foo bar buzz bazz'
>>> match = patn.match(subj)
>>> match.substring()
'foo bar'
>>> match.start(), match.end()
(0, 7)
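For readers coming from the standard library, the same match can be sketched with Python's built-in re module. This is a comparison under the assumption that both engines treat this simple pattern the same way; it uses re, not pcre2, and re spells named groups as (?P<name>...).

```python
import re

# Same pattern as above, written with re's (?P<name>...) named-group syntax.
expr = r'(?P<head>\w+)\s+(?P<tail>\w+)'
patn = re.compile(expr, flags=re.IGNORECASE)

subj = 'foo bar buzz bazz'
match = patn.match(subj)

matched_text = match.group(0)  # whole match: 'foo bar'
span = match.span()            # (start, end) offsets into subj: (0, 7)
groups = match.groupdict()     # {'head': 'foo', 'tail': 'bar'}

print(matched_text, span, groups)
```

The offsets are half-open, so subj[0:7] is exactly the matched text.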
Substitution is also supported, both from Pattern and Match objects:
>>> repl = '$2 $1'
>>> patn.substitute(repl, subj) # Global substitutions by default.
'bar foo bazz buzz'
>>> patn.substitute(repl, subj, suball=False)
'bar foo buzz bazz'
>>> match.expand(repl)
'bar foo buzz bazz'
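As a point of comparison, the substitutions above can be approximated with the standard library's re module. This sketch uses re rather than pcre2, so the replacement template is written with \2 and \1 instead of $2 and $1, and count=1 plays the role that suball=False plays above.

```python
import re

patn = re.compile(r'(?P<head>\w+)\s+(?P<tail>\w+)', flags=re.IGNORECASE)
subj = 'foo bar buzz bazz'

# re writes backreferences as \2 and \1 where the example above uses $2 and $1.
repl = r'\2 \1'

swapped_all = patn.sub(repl, subj)             # every match replaced
swapped_first = patn.sub(repl, subj, count=1)  # only the first match replaced
expanded = patn.match(subj).expand(repl)       # template applied to one match

print(swapped_all)    # bar foo bazz buzz
print(swapped_first)  # bar foo buzz bazz
print(expanded)       # bar foo
```

One difference worth noting: re's Match.expand returns only the expanded template, whereas the match.expand call shown above returns the whole subject with the match replaced.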
Additionally, Pattern objects support scanning over subjects for all non-overlapping matches:
>>> for match in patn.scan(subj):
... print(match.substring('head'))
...
foo
buzz
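To make "non-overlapping" concrete, here is a small sketch using the standard library's re.finditer, which follows the same convention. It uses re, not pcre2, and is meant only to illustrate why the scan above reports two matches rather than three.

```python
import re

patn = re.compile(r'(?P<head>\w+)\s+(?P<tail>\w+)')
subj = 'foo bar buzz bazz'

# Each search resumes where the previous match ended, so the pair
# 'bar buzz' is never reported even though the pattern could match it.
pairs = [(m.group('head'), m.group('tail'), m.span()) for m in patn.finditer(subj)]

for head, tail, span in pairs:
    print(head, tail, span)
```

The first match consumes offsets 0 to 7, so the next search starts at offset 7 and finds 'buzz bazz' at offsets 8 to 17.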
PCRE2 provides a fast regular expression library, particularly with JIT compilation enabled.
Below are the regex-redux benchmark results included in this repository:
Script | Number of runs | Total time | Real time | User time | System time |
---|---|---|---|---|---|
baseline.py | 10 | 3.020 | 0.302 | 0.020 | 0.086 |
vanilla.py | 10 | 51.380 | 5.138 | 11.408 | 0.529 |
hand_optimized.py | 10 | 13.190 | 1.319 | 2.846 | 0.344 |
pcre2_module.py | 10 | 13.670 | 1.367 | 2.269 | 0.532 |
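The columns in the table above are related by simple arithmetic: the real time is the total time divided by the number of runs. The sketch below copies the table values (the table does not state units, so none are assumed here) and computes each script's per-run real time relative to pcre2_module.py. The variable names are illustrative only.

```python
# Values copied from the benchmark table above:
# script -> (number of runs, total time, real time)
results = {
    'baseline.py': (10, 3.020, 0.302),
    'vanilla.py': (10, 51.380, 5.138),
    'hand_optimized.py': (10, 13.190, 1.319),
    'pcre2_module.py': (10, 13.670, 1.367),
}

# The "Real time" column equals total time divided by the number of runs.
per_run = {name: total / runs for name, (runs, total, _) in results.items()}

# Ratio of each script's real time to that of pcre2_module.py.
reference = results['pcre2_module.py'][2]
relative = {name: round(real / reference, 2) for name, (_, _, real) in results.items()}

for name, ratio in relative.items():
    print(f'{name:<20} {ratio:.2f}x')
```

By this measure the pure Python script takes roughly 3.76 times as long per run as pcre2_module.py, while the hand-written ctypes version comes out within about 4% of it.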
Script descriptions are as follows:
Script | Description |
---|---|
baseline.py | Reads input file and outputs stored expected output |
vanilla.py | Pure Python version |
hand_optimized.py | Manually written Python ctypes bindings for shared PCRE2 C library |
pcre2_module.py | Implementation using Python bindings written here |
Tests were performed on an M2 MacBook Air.
Note that to run benchmarks locally, Git LFS must be installed to download the input dataset.
Additionally, a Python virtual environment must be created and the package built, with make init and make build respectively.
For more information on this benchmark, see The Computer Language Benchmarks Game.
See source code of benchmark scripts for details and original sources.