
ezregex

A readable and intuitive way to generate Regular Expressions

Source: pip (PyPI)
Version: 3.0.2
Maintainers: 1



EZRegex

A readable and intuitive way to write Regular Expressions without having to know any of the syntax


Usage

Quickstart

TL;DR: This is to regular expressions what CMake is to Makefiles (i.e. it's a tool that generates the older tool's syntax)

from ezregex import *

'foo' + number + optional(whitespace) + group(word)
# Or if you prefer the method syntax (they can be mixed)
number.append(whitespace.optional).prepend('foo').append(word.group())

# These match `foo123abc` and `foo123 abc`
# but not `abc123foo` or `foo bar`

Importing as a named package is recommended if you're using it in a larger project:

import ezregex as ez

# ow is part of ez already as "optional chunk of whitespace" (`\s*`)
params = ez.group(ez.at_least_none(ez.ow + ez.word + ez.ow + ez.optional(',') + ez.ow))
# Separate parts as variables for cleaner patterns
function = ez.word + ez.ow + '(' + params + ')'

function.search('some string containing func( param1 , param2)')

# Boolean test
'some string containing func( param1 , param2)' in function

# The test() method is helpful for debugging, and color codes groups for you
function.test('this should match func(param1,\tparam2 ), foo(), and bar( foo,)')

.test() prints every match, color-coded by match and group (colors not shown here):

╭─────────────────────────────── Testing Regex ────────────────────────────────╮
│ Testing expression:                                                          │
│         \w+\s*\(((?:\s*\w+\s*,?\s*)*)\)                                      │
│ for matches in:                                                              │
│         this should match func(param1,  param2 ), foo(), and bar( foo,)      │
│                                                                              │
│ Match = "func(param1,   param2 )" (18:39)                                    │
│ Unnamed Groups:                                                              │
│         1: "param1,     param2 " (23:38)                                     │
│                                                                              │
│ Match = "foo()" (41:46)                                                      │
│ Unnamed Groups:                                                              │
│         1: "" (45:45)                                                        │
│                                                                              │
│ Match = "bar( foo,)" (52:62)                                                 │
│ Unnamed Groups:                                                              │
│         1: " foo," (56:61)                                                   │
│                                                                              │
│                                                                              │
╰─────────────────────────────────── Found  ───────────────────────────────────╯

Check out the gotchas for some common issues.
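
As a cross-check, the generated pattern printed in the .test() output above can be run through Python's standard re module directly. This is a minimal sketch that uses only the standard library, not the EZRegex API; the variable names are illustrative.

```python
import re

# The pattern EZRegex generated for `function`, copied from the .test() output
pattern = re.compile(r'\w+\s*\(((?:\s*\w+\s*,?\s*)*)\)')
text = 'this should match func(param1,\tparam2 ), foo(), and bar( foo,)'

# Collect (match span, captured parameter list) for every match
results = [(m.span(), m.group(1)) for m in pattern.finditer(text)]
for span, params in results:
    print(span, repr(params))
```

The spans and group contents should line up with the three matches reported by .test().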

Inverting

The invert function (available as ez.invert(expression), expression.invert(), or ~expression) is useful for debugging. You pass it an expression, and it returns an example string that is guaranteed to match that expression. Because it is designed for debugging, it uses actual words where possible, and 12345... for sequences of numbers.

Generation

In v1.7.0 we introduced a new function: generate_regex. It takes two sets of strings and returns a regular expression that matches everything in the first set and nothing in the second. It may be a bit crude, but it can be a good starting point if you don't know where to begin. It's also really good at regex golf.
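
The acceptance criterion described above (match everything in the first set, nothing in the second) can be checked with the standard library alone. The sketch below is not part of EZRegex: separates is a hypothetical helper, and it assumes re.search semantics, since the matching mode generate_regex targets is not stated here.

```python
import re

def separates(pattern, should_match, should_not_match):
    """Return True if `pattern` matches every string in the first set
    and no string in the second set (using re.search semantics)."""
    regex = re.compile(pattern)
    return (all(regex.search(s) for s in should_match)
            and not any(regex.search(s) for s in should_not_match))

winners = {'foo', 'food', 'fool'}
losers = {'bar', 'baz', 'far'}

print(separates('oo', winners, losers))  # True
print(separates('f', winners, losers))   # False: 'f' also matches 'far'
```

A helper like this is a convenient way to sanity-check any candidate pattern, generated or hand-written, against both sets.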

Functions vs Methods

As of v2.1.0, elemental methods were added to EZRegex objects. These shadow their element function counterparts exactly and work the same way; they exist purely for convenience and preference.

For example, these are all equivalent:

# Element functions
optional(whitespace) + group(either(repeat('a'), 'b')) + if_followed_by(word)
# Elemental methods
whitespace.optional.append(literal('a').repeat.or_('b').group).if_followed_by(word)
# Mixed
whitespace.optional + repeat('a').or_('b').group + if_followed_by(word)

Dialects

As of v1.6.0, the concept of dialects was introduced. Different languages often have slight variations on regular expression syntax. As this library is meant to be language independent (even though it's written in Python), you should be able to compile regular expressions that work with other languages as well. To do that, you can simply import all the elements from a sub-package, and they should work identically, although some languages may not have the same features as others.

>>> import ezregex as ez # The Python dialect is the default dialect
>>> ez.group(ez.digit, name='name') + ez.earlier_group('name')
PythonEZRegex((?P<name>\d)(?P=name), {...})
>>> import ezregex.javascript as ez
>>> ez.group(ez.digit, name='name') + ez.earlier_group('name')
JavascriptEZRegex(/(?<name>\d)\k<name>/, {...})
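
To see why the dialect matters, the two outputs above can be fed to Python's own re module. This is a small standard-library sketch (it does not use EZRegex); the pattern strings are copied from the session above, and the variable names are illustrative.

```python
import re

# Python dialect: a named group plus a backreference to it (a repeated digit)
python_pattern = r'(?P<name>\d)(?P=name)'
match = re.search(python_pattern, 'a11b')
print(match.group(0))  # 11

# The JavaScript-dialect syntax is rejected by Python's re module
javascript_pattern = r'(?<name>\d)\k<name>'
try:
    re.compile(javascript_pattern)
    js_compiles = True
except re.error:
    js_compiles = False
print(js_compiles)  # False
```

The same logical expression therefore needs different syntax depending on the engine that will run it.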

The currently implemented dialects are:

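| Dialect    | Completeness | Tests pass |
|------------|--------------|------------|
| Python     | ~100%        | Yes        |
| JavaScript | ~90%         | Yes        |
| PCRE2      | ~60%         | Yes        |
| R          | 100%         | Yes        |
| Rust       | 0%           | No         |
| C#         | 0%           | No         |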

Just because a dialect is implemented doesn't mean it supports every feature of that language's regex syntax. However, everything implemented is tested, so if you can import it, it's usable.

If you know a particular flavor of regex and would like to contribute, feel free to read the developer documentation and make a pull request! If you would like a dialect that's not implemented yet, you can also open a GitHub issue.

Utilities

All the functions in the Python re library (search, match, sub, etc.) are implemented in the Python dialect, and act identically to their equivalents. If you still want to use the Python re library directly, note that functions like search and sub don't accept EZRegex patterns as valid regex. Be sure to call either .str() (or cast it to a string) or .compile() (to compile to an re.Pattern) when passing a pattern to those. Using the member functions, however, is more efficient, because EZRegex caches the compiled re.Pattern internally.

There's also an api function, which acts like an API endpoint for regular expressions. This is used by the EZRegex frontend, as it loads this library locally in the browser. It made sense to put it in the library itself, because it could be useful for other purposes.
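
The idea of caching a compiled pattern behind member functions can be sketched with the standard library. CachedPattern below is a hypothetical stand-in written for illustration only; it is not EZRegex's actual class, and its attribute names are assumptions.

```python
import re
from functools import cached_property

class CachedPattern:
    """Hypothetical stand-in for a pattern object; not EZRegex's real class."""

    def __init__(self, source):
        self.source = source

    def __str__(self):
        return self.source

    @cached_property
    def compiled(self):
        # Compiled on first access, then reused on every later call
        return re.compile(self.source)

    def search(self, string):
        return self.compiled.search(string)

pattern = CachedPattern(r'\d+')
print(pattern.search('abc 123').group(0))           # 123
print(pattern.compiled is pattern.compiled)         # True
print(re.search(str(pattern), 'abc 123').group(0))  # 123

# Passing the object itself to re.search fails, so it must be cast first
try:
    re.search(pattern, 'abc 123')
    accepted = True
except TypeError:
    accepted = False
print(accepted)  # False
```

The last part mirrors the note above: the re functions only accept strings or compiled patterns, so a wrapper object has to be converted before being passed in.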

Aliases

A lot of the EZRegexs have multiple names, either because different names make more sense in different contexts, or simply to allow different formatting. You can see the aliases for each EZRegex in the docs. As a general rule, there are snake_case and camelCase versions for each one, where applicable.

Installation

EZRegex is distributed on PyPI as a pure-Python universal wheel with no dependencies. It is available on Linux, macOS, and Windows, and supports Python 3.10+ and PyPy.

pip install ezregex

The import name is the same as the package name:

import ezregex as ez

License

EZRegex is distributed under the MIT License.

Contributing

I love contributions! I don't have many rules for contributing. I just ask that if you're going to add a dialect, before you open a PR, please set up tests for it, and make sure they pass. It doesn't have to be fully implemented, but it should at least be a valid framework to build off of.

Credits

This library was written from scratch entirely by Copeland Carter. Inspirations for this project include:

  • PyParsing
    • I stole a bunch of the operators (especially the [] operator) from them, though we happened upon the same basic structure independently (convergent evolution, anyone?)
  • regular-expressions.info
    • Their reference is where I got a lot of the documentation on other regex dialects
  • human-regex
    • Gave me the idea for including element methods, instead of solely element functions
  • Peter Norvig and Stefan Pochmann
    • Peter Norvig's blog is where I ripped most of the generation code from. All credit goes to him.
