character-parser
Parse JavaScript one character at a time to look for snippets in Templates. This is not a validator, it's just designed to allow you to have sections of JavaScript delimited by brackets robustly.
The character-parser npm package is designed for parsing through strings to identify and handle different characters or sequences of characters. It is particularly useful in scenarios where you need to process code or text to find specific patterns, such as brackets, quotes, or custom-defined sequences. This package can be utilized in developing compilers, interpreters, or any application that requires detailed analysis of text or code.
Bracket Matching
This feature allows you to parse through a string to find matching brackets. It's useful for syntax analysis in code editors or compilers.
const characterParser = require('character-parser');

let content = 'function(a, b) { return a + b; }';
let state = characterParser.defaultState();
for (let i = 0; i < content.length; i++) {
  // Feed one character at a time, carrying the state forward
  state = characterParser.parseChar(content.charAt(i), state);
  if (state.isNesting()) {
    console.log('Nesting at index', i);
  }
}
String Detection
This feature helps in identifying when the parser is within a string. It's particularly useful for syntax highlighting or escaping strings in code.
const characterParser = require('character-parser');

// A template literal lets the sample contain both quote styles unescaped
let content = `"Hello, world!" and 'another string'`;
let state = characterParser.defaultState();
for (let i = 0; i < content.length; i++) {
  state = characterParser.parseChar(content.charAt(i), state);
  if (state.isString()) {
    console.log('Inside a string at index', i);
  }
}
Acorn is a robust, full-featured JavaScript parser that can parse ECMAScript code. It provides detailed analysis of script structure, which makes it more comprehensive than character-parser for JavaScript-specific projects but potentially heavier for simple character parsing tasks.
Esprima is another JavaScript parser that supports ECMAScript 5.1 and newer versions. It's used for static analysis and code manipulation. Compared to character-parser, Esprima offers more detailed parsing capabilities specific to JavaScript syntax but might be overkill for basic character parsing needs.
Chevrotain is a fast and feature-rich parser building toolkit for JavaScript. Unlike character-parser, which is focused on character-level parsing, Chevrotain provides tools for creating complex parsers and interpreters, making it suitable for building full programming languages or complex domain-specific languages (DSLs).
npm install character-parser
Work out how much depth changes:
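var assert = require('assert');
var parser = require('character-parser');

var state = parser.parse('foo(arg1, arg2, {\n foo: [a, b\n');
assert.deepEqual(state.stack, [')', '}', ']']);

// Continue from the previous state to close the open brackets
parser.parse(' c, d]\n })', state);
assert.deepEqual(state.stack, []);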
Find code up to a custom delimiter:
// EJS-style
var section = parser.parseUntil('foo.bar("%>").baz%> bing bong', '%>');
assert(section.start === 0);
assert(section.end === 17); // exclusive end of string
assert(section.src === 'foo.bar("%>").baz');

var section = parser.parseUntil('<%foo.bar("%>").baz%> bing bong', '%>', {start: 2});
assert(section.start === 2);
assert(section.end === 19); // exclusive end of string
assert(section.src === 'foo.bar("%>").baz');

// Jade-style
var section = parser.parseUntil('#[p= [1, 2][i]]', ']', {start: 2});
assert(section.start === 2);
assert(section.end === 14); // exclusive end of string
assert(section.src === 'p= [1, 2][i]');

// Dumb parsing
// Stop at first delimiter encountered, doesn't matter if it's nested or not
// This is the character-parser@1 default behavior.
var section = parser.parseUntil('#[p= [1, 2][i]]', ']', {start: 2, ignoreNesting: true});
assert(section.start === 2);
assert(section.end === 10); // exclusive end of string
assert(section.src === 'p= [1, 2');
Delimiters are ignored if they are inside strings or comments.
All methods may throw an exception in the case of syntax errors. The exception contains an additional code property that always starts with CHARACTER_PARSER: and is unique to the error.
parse
Parses a string starting at the index start and returns the state after parsing that string.
If you want to parse one string in multiple sections, keep passing the resulting state to the next parse operation.
Returns a State object.
parseUntil
Parses the source until the first occurrence of delimiter that is not inside a string or a comment.
If ignoreLineComment is true, a delimiter that occurs inside a line comment still counts.
If ignoreNesting is true, parsing stops at the first occurrence of the delimiter, regardless of whether it is nested inside other brackets. See the example above.
It returns an object with the structure:
{
  start: 0, // index of first character of string
  end: 13, // index of first character after the end of string
  src: 'source string'
}
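To make the effect of ignoreNesting concrete, here is a deliberately simplified sketch of the bracket-tracking idea. It is not the library's implementation: naiveParseUntil and CLOSERS are hypothetical names, and the sketch ignores strings, comments, and regular expressions, all of which the real parseUntil takes into account.

```javascript
// Illustrative model only: maps each opening bracket to its closing bracket.
const CLOSERS = { '(': ')', '[': ']', '{': '}' };

// Hypothetical helper, NOT part of character-parser.
function naiveParseUntil(src, delimiter, { start = 0, ignoreNesting = false } = {}) {
  const stack = []; // closing brackets we still expect, innermost last
  for (let i = start; i < src.length; i++) {
    const atDelimiter = src.startsWith(delimiter, i);
    // Stop only at the top level, unless nesting is being ignored.
    if (atDelimiter && (ignoreNesting || stack.length === 0)) {
      return { start, end: i, src: src.slice(start, i) };
    }
    const ch = src[i];
    if (CLOSERS[ch]) {
      stack.push(CLOSERS[ch]);
    } else if (stack.length > 0 && ch === stack[stack.length - 1]) {
      stack.pop();
    }
    // Mismatched closers are silently skipped here; the real library throws.
  }
  throw new Error('delimiter not found');
}

const nested = naiveParseUntil('#[p= [1, 2][i]]', ']', { start: 2 });
const dumb = naiveParseUntil('#[p= [1, 2][i]]', ']', { start: 2, ignoreNesting: true });
console.log(nested.src); // p= [1, 2][i]
console.log(dumb.src); // p= [1, 2
```

With nesting respected, the inner [1, 2] and [i] pairs are skipped and the search ends at the final bracket; with ignoreNesting it ends at the first ] it sees, matching the two results in the usage examples.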
parseChar
Parses a single character and returns the state. See parse for the structure of the returned state object. N.B. character must be a single character, not a multi-character string.
defaultState
Gets a default starting state.
isPunctuator
Returns true if character represents punctuation in JavaScript.
isKeyword
Returns true if name is a keyword in JavaScript.
TOKEN_TYPES and BRACKETS
Objects whose values can be a frame in the stack property of a State (documented below).
A state is an object with the following structure:
{
  stack: [], // stack of detected brackets; the outermost is [0]
  regexpStart: false, // true if a slash is just encountered and a REGEXP state has just been added to the stack
  escaped: false, // true if in a string and the last character was an escape character
  hasDollar: false, // true if in a template string and the last character was a dollar sign
  src: '', // the concatenated source string
  history: '', // reversed `src`
  lastChar: '' // last parsed character
}
The stack property can contain any of the following:
values of characterParser.TOKEN_TYPES
values of characterParser.BRACKETS (the end bracket, not the starting bracket)

A state also has the following useful methods:
.current() returns the innermost bracket (i.e. the last stack frame).
.isString() returns true if the current location is inside a string.
.isComment() returns true if the current location is inside a comment.
.isNesting([opts]) returns true if the current location is not at the top level, i.e. if the stack is not empty. If opts.ignoreLineComment is true, line comments are not counted as a level, so for // a it will still return false.

All errors thrown by character-parser have a code property attached that identifies what sort of error was thrown. For errors thrown from parse and parseUntil, an additional index property is available.
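One way to act on the code and index properties is a small helper that recognizes character-parser errors by their documented prefix and rethrows everything else. This is a sketch under assumptions: describeParserError is a hypothetical name, and the error object below is a stand-in with a made-up code, not a value the library is known to produce.

```javascript
// Prefix that the documentation says every error code starts with.
const CODE_PREFIX = 'CHARACTER_PARSER:';

// Hypothetical helper, NOT part of character-parser.
function describeParserError(err) {
  const isParserError =
    err != null && typeof err.code === 'string' && err.code.startsWith(CODE_PREFIX);
  if (!isParserError) {
    throw err; // not a character-parser error: let the caller handle it
  }
  // index is only documented for errors thrown from parse and parseUntil
  const where = typeof err.index === 'number' ? ' at index ' + err.index : '';
  return err.code + where + ': ' + err.message;
}

// Stand-in error for demonstration; 'CHARACTER_PARSER:EXAMPLE' is a made-up code.
const fakeError = Object.assign(new Error('something went wrong'), {
  code: 'CHARACTER_PARSER:EXAMPLE',
  index: 7,
});
console.log(describeParserError(fakeError));
// CHARACTER_PARSER:EXAMPLE at index 7: something went wrong
```

In real use, the helper would sit in the catch block around a parse or parseUntil call, so that template authors get a position they can look up in their source.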
In character-parser@2, we have changed the APIs quite a bit. These are some notes that will help you transition to the new version.
Instead of keeping depths of different brackets, we are now keeping a stack. We also removed some properties:
state.lineComment → state.current() === parser.TOKEN_TYPES.LINE_COMMENT
state.blockComment → state.current() === parser.TOKEN_TYPES.BLOCK_COMMENT
state.singleQuote → state.current() === parser.TOKEN_TYPES.SINGLE_QUOTE
state.doubleQuote → state.current() === parser.TOKEN_TYPES.DOUBLE_QUOTE
state.regexp → state.current() === parser.TOKEN_TYPES.REGEXP
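Code that still reads the removed boolean properties could be bridged with a small adapter built from the mapping above. The sketch below is an assumption-laden illustration: legacyFlags is a hypothetical helper, and the stub objects stand in for a real State and for parser.TOKEN_TYPES so the block runs on its own.

```javascript
// Hypothetical compatibility helper, NOT part of character-parser.
// Rebuilds the removed v1-style flags from a v2 state.
function legacyFlags(state, TOKEN_TYPES) {
  const current = state.current(); // innermost stack frame
  return {
    lineComment: current === TOKEN_TYPES.LINE_COMMENT,
    blockComment: current === TOKEN_TYPES.BLOCK_COMMENT,
    singleQuote: current === TOKEN_TYPES.SINGLE_QUOTE,
    doubleQuote: current === TOKEN_TYPES.DOUBLE_QUOTE,
    regexp: current === TOKEN_TYPES.REGEXP,
  };
}

// Stubs for demonstration only; the real values come from the library.
const stubTypes = {
  LINE_COMMENT: 'lc',
  BLOCK_COMMENT: 'bc',
  SINGLE_QUOTE: 'sq',
  DOUBLE_QUOTE: 'dq',
  REGEXP: 're',
};
const stubState = { current: () => 'dq' };

const flags = legacyFlags(stubState, stubTypes);
console.log(flags.doubleQuote); // true
console.log(flags.lineComment); // false
```

With the real package, the call would presumably look like legacyFlags(state, parser.TOKEN_TYPES), where state comes from parse, parseChar, or defaultState.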
parseMax
This function has been removed since its usefulness has been questioned. You should find that parseUntil is a better choice for your task.
parseUntil
The default behavior when the delimiter is a bracket has been changed so that nesting is taken into account to determine if the end is reached.
To preserve the original behavior, pass ignoreNesting: true as an option.
To see the difference between the new and old behaviors, compare the Jade-style and dumb-parsing parseUntil examples earlier.
parseMaxBracket
This function has been merged into parseUntil. You can directly rename the function call without any repercussions.
MIT
FAQs
The npm package character-parser receives a total of 1,396,022 weekly downloads. As such, character-parser is classified as popular.
We found that character-parser has an unhealthy version release cadence and level of project activity because the last version was released a year ago. It has 3 open source maintainers collaborating on the project.