ret

Tokenizes a string that represents a regular expression.

Version: 0.4.2

Version published: 3 years ago

Weekly downloads: 16M; increased by 11.65%

Maintainers: 1

Created: 13 years ago

What is ret?

The ret npm package tokenizes regular expressions: it parses a pattern string into a tree of typed tokens that can be analyzed, manipulated, or transformed programmatically. It is particularly useful when working with dynamic or complex regular expressions, where inspecting individual groups, sets, and repetitions is easier than working with the raw pattern text.

What are ret's main functionalities?

Tokenization of Regular Expressions

This feature tokenizes a regular expression, breaking it down into its constituent parts. The code sample tokenizes a simple pattern that matches 'hello' or 'world'. Because ret takes the pattern as a string, the sample passes the regex's source property; flags such as i are not part of that string. The result is a structured representation of the pattern, with a type and one array of tokens per alternative.

const ret = require('ret');
// ret expects a string, so pass the pattern's source (the i flag is not included)
const tokens = ret(/hello|world/i.source);
console.log(tokens);

Analysis of Character Classes

With ret, you can also analyze character classes within regular expressions. The code sample tokenizes a pattern that matches any lowercase letter from 'a' to 'z'. The output describes the character class as a set token containing a range token.

const ret = require('ret');
// ret expects a string, so pass the pattern's source
const tokens = ret(/[a-z]/.source);
console.log(tokens);

Handling of Quantifiers

ret also parses quantifiers within regular expressions. The code sample tokenizes a pattern that matches between two and four digits. The tokenized output describes the quantifier as a repetition token with the minimum and maximum number of repetitions.

const ret = require('ret');
// ret expects a string, so pass the pattern's source
const tokens = ret(/\d{2,4}/.source);
console.log(tokens);

Other packages similar to ret

regexpp
regexp-tree

Regular Expression Tokenizer

Tokenizes strings that represent regular expressions.

Usage

const ret = require('ret');

let tokens = ret(/foo|bar/.source);

tokens will contain the following object:

{
  "type": ret.types.ROOT,
  "options": [
    [ { "type": ret.types.CHAR, "value": 102 },
      { "type": ret.types.CHAR, "value": 111 },
      { "type": ret.types.CHAR, "value": 111 } ],
    [ { "type": ret.types.CHAR, "value":  98 },
      { "type": ret.types.CHAR, "value":  97 },
      { "type": ret.types.CHAR, "value": 114 } ]
  ]
}
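
Because CHAR tokens store character codes rather than characters, the tree above is easier to read once the codes are decoded. The following is a minimal sketch under stated assumptions: the `types` object is a local stand-in for `ret.types`, the token tree is a hand-written copy of the one shown above, and `decodeOptions` is a hypothetical helper that is not part of ret.

```javascript
// Local stand-in for ret.types; real code would use ret.types instead.
const types = { ROOT: 'ROOT', CHAR: 'CHAR' };

// Hand-written copy of the token tree shown above for the pattern foo|bar.
const tokens = {
  type: types.ROOT,
  options: [
    [102, 111, 111].map((value) => ({ type: types.CHAR, value })),
    [98, 97, 114].map((value) => ({ type: types.CHAR, value })),
  ],
};

// Hypothetical helper: decode each alternative made only of CHAR tokens.
function decodeOptions(root) {
  return root.options.map((branch) =>
    branch.map((token) => String.fromCharCode(token.value)).join('')
  );
}

console.log(decodeOptions(tokens)); // [ 'foo', 'bar' ]
```

In real use, the `tokens` value would come from `ret(/foo|bar/.source)` as in the usage sample.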

Reconstructing Regular Expressions from Tokens

The reconstruct function accepts any token and returns, as a string, the component of the regular expression that is associated with that token.

const ret = require('ret')
const { reconstruct, types } = ret
const tokens = ret(/foo|bar/.source)
const setToken = {
  "type": types.SET,
  "set": [
    { "type": types.CHAR, "value": 97 },
    { "type": types.CHAR, "value": 98 },
    { "type": types.CHAR, "value": 99 }
  ],
  "not": true
}
reconstruct(tokens)                               // 'foo|bar'
reconstruct({ "type": types.CHAR, "value": 102 }) // 'f'
reconstruct(setToken)                             // '^abc'

Token Types

ret.types is a collection of the various token types exported by ret.

ROOT

Only used in the root of the regexp. This is needed due to the possibility of the root containing a pipe | character. In that case, the token will have an options key that is an array of arrays of tokens. If not, it will contain a stack key that is an array of tokens.

{
  "type": ret.types.ROOT,
  "stack": [token1, token2...],
}

{
  "type": ret.types.ROOT,
  "options": [
    [token1, token2...],
    [othertoken1, othertoken2...]
    ...
  ],
}
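
Code that walks a token tree usually has to handle both shapes. One way to avoid branching everywhere is to normalize them up front. This is only a sketch: `branchesOf` is a hypothetical helper, not part of ret, and the string type tags are stand-ins for `ret.types` values.

```javascript
// Hypothetical helper: return the alternatives of a ROOT (or GROUP) token
// as an array of token arrays, whether it carries "stack" or "options".
function branchesOf(token) {
  if (Array.isArray(token.options)) return token.options;
  if (Array.isArray(token.stack)) return [token.stack];
  return [];
}

// Stand-in tokens built by hand; 'ROOT' and 'CHAR' replace ret.types values.
const char = (code) => ({ type: 'CHAR', value: code });
const withoutPipe = { type: 'ROOT', stack: [char(97), char(98)] };
const withPipe = { type: 'ROOT', options: [[char(97)], [char(98)]] };

console.log(branchesOf(withoutPipe).length); // 1
console.log(branchesOf(withPipe).length);    // 2
```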

GROUP

Groups contain the tokens inside a pair of parentheses. If the group begins with ? followed by another character, it is a special type of group: ':' tells the group not to be remembered when exec is used, '=' means the previous token matches only if followed by this group, and '!' means the previous token matches only if NOT followed by this group.

Like root, it can contain an options key instead of stack if there is a pipe.

{
  "type": ret.types.GROUP,
  "remember": true,
  "followedBy": false,
  "notFollowedBy": false,
  "stack": [token1, token2...],
}

{
  "type": ret.types.GROUP,
  "remember": true,
  "followedBy": false,
  "notFollowedBy": false,
  "options": [
    [token1, token2...],
    [othertoken1, othertoken2...]
    ...
  ],
}
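
The three boolean keys together identify which kind of group was written. A small sketch of that mapping, assuming the key names shown above; `groupKind` is a hypothetical helper, not part of ret.

```javascript
// Hypothetical helper: name the kind of group from its flags.
function groupKind(group) {
  if (group.followedBy) return 'lookahead';             // (?=...)
  if (group.notFollowedBy) return 'negative lookahead'; // (?!...)
  return group.remember ? 'capturing' : 'non-capturing'; // (...) or (?:...)
}

console.log(groupKind({ remember: true, followedBy: false, notFollowedBy: false }));  // capturing
console.log(groupKind({ remember: false, followedBy: false, notFollowedBy: false })); // non-capturing
console.log(groupKind({ remember: false, followedBy: true, notFollowedBy: false }));  // lookahead
console.log(groupKind({ remember: false, followedBy: false, notFollowedBy: true }));  // negative lookahead
```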

POSITION

\b, \B, ^, and $ specify positions in the regexp.

{
  "type": ret.types.POSITION,
  "value": "^",
}

SET

Contains a key set specifying what tokens are allowed and a key not specifying if the set should be negated. A set can contain other sets, ranges, and characters.

{
  "type": ret.types.SET,
  "set": [token1, token2...],
  "not": false,
}

RANGE

Used in set tokens to specify a character range. from and to are character codes.

{
  "type": ret.types.RANGE,
  "from": 97,
  "to": 122,
}
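
To illustrate how SET, RANGE, and CHAR tokens combine, the sketch below tests whether a character code is accepted by a set token. It assumes the token shapes shown above, uses a local `types` stand-in for `ret.types`, and ignores regex flags such as case-insensitivity. `inSet` is a hypothetical helper, not part of ret.

```javascript
// Local stand-in for ret.types; real code would use ret.types instead.
const types = { SET: 'SET', RANGE: 'RANGE', CHAR: 'CHAR' };

// Hypothetical helper: does this token accept the given character code?
function inSet(token, code) {
  switch (token.type) {
    case types.CHAR:
      return token.value === code;
    case types.RANGE:
      return token.from <= code && code <= token.to;
    case types.SET: {
      // A set matches if any member matches; "not" inverts the result.
      const hit = token.set.some((member) => inSet(member, code));
      return token.not ? !hit : hit;
    }
    default:
      return false;
  }
}

// Hand-written tokens for [a-z] and [^abc].
const lowercase = {
  type: types.SET,
  set: [{ type: types.RANGE, from: 97, to: 122 }],
  not: false,
};
const notAbc = {
  type: types.SET,
  set: [97, 98, 99].map((value) => ({ type: types.CHAR, value })),
  not: true,
};

console.log(inSet(lowercase, 'q'.charCodeAt(0))); // true
console.log(inSet(lowercase, 'Q'.charCodeAt(0))); // false
console.log(inSet(notAbc, 'a'.charCodeAt(0)));    // false
console.log(inSet(notAbc, 'z'.charCodeAt(0)));    // true
```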

REPETITION

{
  "type": ret.types.REPETITION,
  "min": 0,
  "max": Infinity,
  "value": token,
}
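
The token shape above carries min and max bounds on how many times value repeats. One use for those bounds is estimating the shortest input a pattern can consume. The following is a rough sketch, not a feature of ret: `minLength` is a hypothetical helper, `types` is a local stand-in for `ret.types`, and back-references are counted as zero length, which keeps the result a safe lower bound.

```javascript
// Local stand-in for ret.types; real code would use ret.types instead.
const types = {
  ROOT: 'ROOT', GROUP: 'GROUP', POSITION: 'POSITION', SET: 'SET',
  REPETITION: 'REPETITION', REFERENCE: 'REFERENCE', CHAR: 'CHAR',
};

// Hypothetical helper: smallest number of characters a token can consume.
function minLength(token) {
  switch (token.type) {
    case types.ROOT:
    case types.GROUP: {
      // Lookaheads assert without consuming input.
      if (token.followedBy || token.notFollowedBy) return 0;
      const branches = token.options || [token.stack];
      const sums = branches.map((branch) =>
        branch.reduce((sum, child) => sum + minLength(child), 0)
      );
      return Math.min(...sums);
    }
    case types.REPETITION:
      return token.min * minLength(token.value);
    case types.SET:
    case types.CHAR:
      return 1;
    default:
      // POSITION consumes nothing; REFERENCE is treated as 0 here.
      return 0;
  }
}

// Hand-written tokens for a{3,5} and a*.
const a = { type: types.CHAR, value: 97 };
const threeToFive = {
  type: types.ROOT,
  stack: [{ type: types.REPETITION, min: 3, max: 5, value: a }],
};
const star = {
  type: types.ROOT,
  stack: [{ type: types.REPETITION, min: 0, max: Infinity, value: a }],
};

console.log(minLength(threeToFive)); // 3
console.log(minLength(star));        // 0
```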

REFERENCE

References a group token. value is 1-9.

{
  "type": ret.types.REFERENCE,
  "value": 1,
}
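
Since a reference points at a remembered group by number, a token tree can be checked for references that exceed the number of capturing groups. The sketch below works on a hand-built tree; it makes no claim about which tokens ret itself emits for such a pattern. `danglingReferences` is a hypothetical helper, and `types` is a local stand-in for `ret.types`.

```javascript
// Local stand-in for ret.types; real code would use ret.types instead.
const types = {
  ROOT: 'ROOT', GROUP: 'GROUP', REPETITION: 'REPETITION',
  REFERENCE: 'REFERENCE', CHAR: 'CHAR',
};

// Hypothetical helper: list reference values with no matching capturing group.
function danglingReferences(root) {
  let capturing = 0;
  const references = [];
  const visit = (token) => {
    if (token.type === types.GROUP && token.remember) capturing += 1;
    if (token.type === types.REFERENCE) references.push(token.value);
    if (token.type === types.REPETITION) visit(token.value);
    const branches = token.options || (token.stack ? [token.stack] : []);
    branches.forEach((branch) => branch.forEach(visit));
  };
  visit(root);
  return references.filter((n) => n > capturing);
}

// Hand-built tree: one capturing group followed by references 1 and 2.
const root = {
  type: types.ROOT,
  stack: [
    { type: types.GROUP, remember: true, stack: [{ type: types.CHAR, value: 97 }] },
    { type: types.REFERENCE, value: 1 },
    { type: types.REFERENCE, value: 2 },
  ],
};

console.log(danglingReferences(root)); // [ 2 ]
```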

CHAR

Represents a single character token. value is the character code. Emitting one token per character may seem more cluttered than concatenating adjacent characters into a single token. But since repetition tokens repeat only the last token, not the last clause as the pipe does, it is simpler to do it this way.

{
  "type": ret.types.CHAR,
  "value": 123,
}

Errors

ret.js will throw an error if given a string that contains an invalid regular expression. All possible errors are:

Invalid group. When a group with an immediate ? character is followed by an invalid character. It can only be followed by !, =, or :. Example: /(?_abc)/
Nothing to repeat. Thrown when a repetition token is used as the first token in the current clause, that is, at the very beginning of the regexp or group, or right after a pipe. Example: /foo|?bar/, /{1,3}foo|bar/, /foo(+bar)/
Unmatched ). A group was not opened, but was closed. Example: /hello)2u/
Unterminated group. A group was not closed. Example: /(1(23)4/
Unterminated character class. A custom character set was not closed. Example: /[abc/
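
Because invalid input results in a thrown error, callers that accept patterns from users may want to wrap the call. The sketch below takes the tokenizer as a parameter so it runs without ret installed; in real code the first argument would be `require('ret')`. `tryTokenize` and `fakeTokenize` are hypothetical names, and the stand-in's error message is illustrative only, not ret's exact wording.

```javascript
// Hypothetical wrapper: convert a thrown error into a result object.
function tryTokenize(tokenize, source) {
  try {
    return { ok: true, tokens: tokenize(source) };
  } catch (err) {
    return { ok: false, message: err.message };
  }
}

// Stand-in tokenizer for this sketch only. It imitates one failure mode
// so the example is runnable; pass require('ret') here in real code.
const fakeTokenize = (source) => {
  if (source.includes('[') && !source.includes(']')) {
    throw new SyntaxError('Unterminated character class');
  }
  return { type: 'ROOT', stack: [] };
};

console.log(tryTokenize(fakeTokenize, 'abc').ok);  // true
console.log(tryTokenize(fakeTokenize, '[abc').ok); // false
```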

Regular Expression Syntax

Regular expressions follow the JavaScript syntax.

The following recent JavaScript additions are not supported yet:

\p and \P: Unicode property escapes
(?<group>) and \k<group>: Named groups
(?<=) and (?<!): Lookbehind assertions (positive and negative)
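
A pattern can be screened for these constructs before it is tokenized. The check below is a heuristic sketch, not part of ret: `findUnsupported` is a hypothetical helper, and it scans the raw pattern text without tracking escapes or character classes, so it can report false positives (for example on an escaped parenthesis followed by ?<).

```javascript
// Heuristic detectors for the syntax listed above.
const unsupported = [
  { name: 'Unicode property escape', pattern: /\\[pP]\{/ },
  { name: 'named group', pattern: /\(\?<[A-Za-z_$]/ },
  { name: 'named backreference', pattern: /\\k</ },
  { name: 'lookbehind assertion', pattern: /\(\?<[=!]/ },
];

// Hypothetical helper: names of unsupported constructs found in the source.
function findUnsupported(source) {
  return unsupported
    .filter(({ pattern }) => pattern.test(source))
    .map(({ name }) => name);
}

console.log(findUnsupported('(?<year>\\d{4})')); // [ 'named group' ]
console.log(findUnsupported('(?<=a)b'));         // [ 'lookbehind assertion' ]
console.log(findUnsupported('foo|bar'));         // []
```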

Examples

/abc/

{
  "type": ret.types.ROOT,
  "stack": [
    { "type": ret.types.CHAR, "value": 97 },
    { "type": ret.types.CHAR, "value": 98 },
    { "type": ret.types.CHAR, "value": 99 }
  ]
}

/[abc]/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.SET,
    "set": [
      { "type": ret.types.CHAR, "value": 97 },
      { "type": ret.types.CHAR, "value": 98 },
      { "type": ret.types.CHAR, "value": 99 }
    ],
    "not": false
  }]
}

/[^abc]/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.SET,
    "set": [
      { "type": ret.types.CHAR, "value": 97 },
      { "type": ret.types.CHAR, "value": 98 },
      { "type": ret.types.CHAR, "value": 99 }
    ],
    "not": true
  }]
}

/[a-z]/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.SET,
    "set": [
      { "type": ret.types.RANGE, "from": 97, "to": 122 }
    ],
    "not": false
  }]
}

/\w/

// Similar logic for `\W`, `\d`, `\D`, `\s` and `\S`    
{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.SET,
    "set": [
      { "type": ret.types.CHAR, "value": 95 },
      { "type": ret.types.RANGE, "from": 97, "to": 122 },
      { "type": ret.types.RANGE, "from": 65, "to": 90 },
      { "type": ret.types.RANGE, "from": 48, "to": 57 }
    ],
    "not": false
  }]
}

/./

// any character but CR, LF, U+2028 or U+2029
{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.SET,
    "set": [ 
      { "type": ret.types.CHAR, "value": 10 },
      { "type": ret.types.CHAR, "value": 13 },
      { "type": ret.types.CHAR, "value": 8232 },
      { "type": ret.types.CHAR, "value": 8233 }
    ],
    "not": true
  }]
}

/a*/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 0,
    "max": Infinity,
    "value": { "type": ret.types.CHAR, "value": 97 }
  }]
}

/a+/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 1,
    "max": Infinity,
    "value": { "type": ret.types.CHAR, "value": 97 },
  }]
}

/a?/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 0,
    "max": 1,
    "value": { "type": ret.types.CHAR, "value": 97 }
  }]
}

/a{3}/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 3,
    "max": 3,
    "value": { "type": ret.types.CHAR, "value": 97 }
  }]
}

/a{3,5}/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 3,
    "max": 5,
    "value": { "type": ret.types.CHAR, "value": 97 }
  }]
}

/a{3,}/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 3,
    "max": Infinity,
    "value": { "type": ret.types.CHAR, "value": 97 }
  }]
}

/(a)/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.GROUP, 
    "stack": [{ "type": ret.types.CHAR, "value": 97 }],
    "remember": true
  }]
}

/(?:a)/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.GROUP, 
    "stack": [{ "type": ret.types.CHAR, "value": 97 }],
    "remember": false
  }]
}

/(?=a)/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.GROUP, 
    "stack": [{ "type": ret.types.CHAR, "value": 97 }],
    "remember": false,
    "followedBy": true
  }]
}

/(?!a)/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.GROUP, 
    "stack": [{ "type": ret.types.CHAR, "value": 97 }],
    "remember": false,
    "notFollowedBy": true
  }]
}

/a|b/

{
  "type": ret.types.ROOT,
  "options": [
    [{ "type": ret.types.CHAR, "value": 97 }], 
    [{ "type": ret.types.CHAR, "value": 98 }] 
  ]
}

/(a|b)/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.GROUP,
    "remember": true,
    "options": [
      [{ "type": ret.types.CHAR, "value": 97 }], 
      [{ "type": ret.types.CHAR, "value": 98 }] 
    ]
  }]
}

/^/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.POSITION,
    "value": "^"
  }]
}

/$/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.POSITION,
    "value": "$"
  }]
}

/\b/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.POSITION,
    "value": "b"
  }]
}

/\B/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.POSITION,
    "value": "B"
  }]
}

/\1/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.REFERENCE,
    "value": 1
  }]
}

Install

npm install ret

Tests

Tests are written with vows.

npm test

Security

To report a security vulnerability, please use the Tidelift security contact. Tidelift will coordinate the fix and disclosure.

Package last updated on 11 Feb 2022
