
Language parser generator - fork of PegJS
Important note: My bandwidth is tight, and this project is low in my priorities. I'm open to merging fixes and improvements, but I'm unlikely to investigate issues unless they also affect my own projects. If you hit a bug or want to see something added, I suggest you implement a fix yourself. Thanks for your understanding 😃
yarn add arpege
Arpege can be used similarly to PegJS; for the moment I recommend taking a look at their documentation.
Arpege is distributed with a CLI that makes it easier to interact with your parser. Some useful commands:
# Generates the parser on the standard output
yarn peg path/to/grammar.pegjs
# Generates path/to/grammar.parser.mjs
yarn peg path/to/grammar.pegjs -o
# Generates the parser with a custom path, and its TypeScript definition files
yarn peg path/to/grammar.pegjs -o path/to/grammar.js --with-types
# Generates both path/to/grammar.parser.mjs and path/to/grammar.tokenizer.mjs
yarn peg path/to/grammar.pegjs --mode parser --mode tokenizer
# Generates the parser and tokenizer, and also the TypeScript definition files
yarn peg path/to/grammar.pegjs --mode parser --mode tokenizer --with-types
# Prints the result of the parser called on the given file
yarn peg path/to/grammar.pegjs --input-file ./my-file.txt
# Prints the result of the parser called on the given string
yarn peg path/to/grammar.pegjs --input-data '1 + 2'
SuperSyntax is a VSCode extension that lets you write PegJS grammars to highlight your custom languages. To see an example in practice, add Arpege to your project, install the extension, and add the following to your settings.json file (the ~ means that the grammar will be retrieved from your dependencies):
{
"supersyntax.parsers": {
"math": "~arpege/examples/grammar.math.pegjs"
}
}
Then create a new my-file.math.stx file and add a mathematical expression. It'll be automatically colored as per the rules defined in the grammar! Note that SuperSyntax also automatically adds support for PegJS syntax highlighting.
Backtick strings are now supported in addition to single and double quotes to bring the language closer to JavaScript; interpolation is not permitted, since the PegJS grammar doesn't have equivalents for variables:
keyword = "if" / "then" / "else" / "endif"
keyword = 'if' / 'then' / 'else' / 'endif'
keyword = `if` / `then` / `else` / `endif`
Actions can now be written using the fat arrow syntax (=>) instead of the curly braces; the return value must be wrapped in parentheses:
identifier = name:$[a-zA-Z]+ => (name.toUpperCase())
/ in alternatives
Alternatives are now allowed to be prefixed with a /:
keyword =
/ break
/ case
/ catch
/ continue
Arpege will check the labels you assigned into parsing sequences, and generate meaningful return values if you didn't explicitly add one (unlike PegJS, which always returned an array of all elements in the sequence). Three variants are supported:
type = "type(" name:identifier ")" nullable:"?"?
// Returns {name: string, nullable: "?" | null}
type = "type(" ::identifier ")" ::"?"?
// Returns [string, "?" | null]
identifier = !reservedWord ::identifierString
// Returns string
If any of those three variants is detected, all other elements in the parsing sequence will see their results discarded.
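As a rough illustration of the first two shapes, the TypeScript sketch below hand-writes the values a parser could produce for the labeled and the `::` forms of the `type` rule. Every name in it (`LabeledResult`, `PickedResult`, `matchType`, and so on) is hypothetical and not something Arpege generates; only the result shapes follow the comments above.

```typescript
// Hypothetical hand-written equivalents of the first two variants.
// None of these names come from Arpege; only the result shapes follow the text.

// Labels (name:..., nullable:...) produce an object keyed by label.
type LabeledResult = { name: string; nullable: "?" | null };

// Pick markers (::) on several elements produce a tuple of the picked values.
type PickedResult = [string, "?" | null];

// Simulates matching `"type(" identifier ")" "?"?` against a whole input string.
function matchType(input: string): LabeledResult | null {
  const m = /^type\(([a-zA-Z]+)\)(\?)?$/.exec(input);
  if (m === null) return null;
  const name = m[1] ?? "";
  return { name, nullable: m[2] === "?" ? "?" : null };
}

// Labeled form: the object is returned as-is.
function asLabeled(input: string): LabeledResult | null {
  return matchType(input);
}

// Picked form: same match, but only the picked values are kept, in order.
function asPicked(input: string): PickedResult | null {
  const r = matchType(input);
  return r === null ? null : [r.name, r.nullable];
}

console.log(asLabeled("type(foo)?"));
console.log(asPicked("type(foo)"));
```

In both forms the literal pieces (`"type("` and `")"`) never reach the result, which is the "discarded" behavior the paragraph above describes.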
@if
The @if annotation lets you conditionally include a parsing sequence. For example, the following rule that parses hexadecimal numbers will recover from end-of-file parsing errors if the tokenizer variable is set:
hex_number
= @if(tokenizer) "0x" $
/ "0x" digits:[0-9a-fA-F]+
@separator
The + and * operators now accept a @separator annotation that lets you specify an expression to inject between each element of the repetition:
identifiers =
@separator(S "," S)
identifier+
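To make the semantics concrete, the sketch below hand-writes in TypeScript what a separated repetition amounts to: one element, followed by any number of (separator, element) pairs. It is an illustration under assumptions rather than Arpege's generated code: the `Matcher` type and the `regex` and `separatedPlus` helpers are hypothetical, and it assumes that separator results are dropped and that a trailing separator is left unconsumed.

```typescript
// A matcher tries to match at `pos` and returns the value plus the next position, or null.
type Matcher<T> = (input: string, pos: number) => { value: T; next: number } | null;

// Builds a matcher from a regular expression source, anchored at `pos` via the sticky flag.
function regex(source: string): Matcher<string> {
  return (input, pos) => {
    const re = new RegExp(source, "y");
    re.lastIndex = pos;
    const m = re.exec(input);
    return m === null ? null : { value: m[0], next: pos + m[0].length };
  };
}

// Equivalent of `element+` with a separator: element (separator element)*.
// Assumption: separator results are discarded and a dangling separator is not consumed.
function separatedPlus<T>(element: Matcher<T>, separator: Matcher<unknown>): Matcher<T[]> {
  return (input, pos) => {
    const first = element(input, pos);
    if (first === null) return null;
    const values = [first.value];
    let cursor = first.next;
    for (;;) {
      const sep = separator(input, cursor);
      if (sep === null) break;
      const item = element(input, sep.next);
      if (item === null) break; // keep the cursor before the dangling separator
      values.push(item.value);
      cursor = item.next;
    }
    return { value: values, next: cursor };
  };
}

const identifier = regex("[a-z]+");
const comma = regex("\\s*,\\s*"); // plays the role of `S "," S`
const identifiers = separatedPlus(identifier, comma);

console.log(identifiers("foo, bar ,baz", 0));
```

Writing the same thing by hand in a grammar would mean repeating the element expression twice (`identifier (S "," S identifier)*`), which is the duplication the annotation avoids.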
Warning
This feature isn't compatible with the cache: true option.
Arpege can generate a tokenizer for your grammar by adding the --tokenizer flag to the command line (note that in this case, the output will be a parser that will return a stream of tokens but won't be able to parse your file as originally intended; in other words, you may have to generate two parsers: one for the regular parsing, and another for the tokenizer).
The tokenizer tries to be smart and automatically detect the places where tokens should be read:
The $ operator will turn everything it covers into a single token
Each string literal produces its own token ("eval" "(" generates two tokens, "eval(" a single one)
Where several tokens would otherwise be produced, use $ to aggregate them into a single token
Tokens all have by default the general type syntax. You can assign custom token types by using the @token annotation. For example, the following grammar defines a number token that covers the whole run of digits (without the need to use $):
@token("number")
Number = [0-9]+
You may also define parent tokens that will be used to group tokens together. For example, this simplified grammar will define a regex token that also provides tokens for special characters:
@token("regex")
Regex = "/" chars:RegexChar+ "/"
RegexChar =
/ @token("special") [(){}*?|.^$]
/ @token("text") .
Warning
Support for this feature is experimental. Some bugs may exist, and some APIs may change in the future.
Arpege supports generating .d.ts files for your parsers by adding the --with-types flag to the command line. Unlike ts-pegjs, which simply makes the parser function return any, Arpege attempts to return types that match what the actual parser would return, by using some introspection mechanisms.
The generated parsers also have access to the following helpers, which often prove useful in grammar actions:
The tuple(val: [...any]) function will force TypeScript to type the provided input value as a tuple rather than a non-descriptive array (for instance, tuple(["hello", 42]) will be typed [string, number] instead of Array<string | number>). It works around the lack of an as tuple assertion.
Despite its best efforts, Arpege may generate invalid files if recursion is used. For instance, given the following syntax:
Term = Number / "(" ::Term ")"
Number = [0-9]+ => (parseInt(text(), 10))
Arpege will generate the following output:
type TermType = NumberType | TermType;
type NumberType = ReturnType<typeof peg$type$action0>;
TypeScript doesn't support recursive types except in very specific cases, and will degrade TermType into any. To avoid this, you must use the @type annotation, which lets you assign a manual type to a node. For example, to ignore the recursive branch, you can manually type it as never:
Term = Number / @type("never") "(" ::Term ")"
Number = [0-9]+ => (parseInt(text(), 10))
The TermType type will then become NumberType | never, which TypeScript will coalesce into simply NumberType. In more complicated cases, you may have to provide more concrete types or even fall back to unknown or any; check the grammar.css.pegjs and grammar.pegjs.pegjs files in this repository for some examples.
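Two of the typing points in this section can be checked in plain TypeScript. The sketch below is not Arpege's source: the `tuple` function is a hypothetical re-implementation of a helper with the behavior described earlier, and `NumberType` is simplified to `number` for the sake of the example.

```typescript
// Hypothetical re-implementation of a `tuple` helper; Arpege's own source may differ.
// The variadic tuple parameter `[...T]` makes TypeScript infer a tuple instead of an array.
function tuple<T extends unknown[]>(val: [...T]): T {
  return val;
}

const asArray = ["hello", 42];        // inferred as (string | number)[]
const asTuple = tuple(["hello", 42]); // inferred as [string, number]

// Type-level check: this assignment only compiles if the tuple type was inferred.
const check: [string, number] = asTuple;

// `never` disappears from unions, so `NumberType | never` is just `NumberType`.
type NumberType = number; // simplified stand-in for the generated type
type TermType = NumberType | never;
const term: TermType = 7;

console.log(asArray.length, check[0].toUpperCase(), check[1] + 1, term); // 2 HELLO 43 7
```

Assigning `asArray` to `check` instead would fail to compile, which is the difference the helper is there to make.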
Warning
This feature isn't compatible with the cache: true option.
Parsers have access to a new onRollback function. This function will be called if Arpege detects that the alternative it follows won't work, and can be used to clean state previously set by your actions. For example:
identifier = name:$[a-z]+ {
identifiers.push(name);
onRollback(() => {
identifiers.pop();
});
}
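The idea behind rollback hooks can be sketched outside of any grammar. The TypeScript below is a minimal illustration, not Arpege's runtime: the `Alternative` type and the `choice` and `record` functions are hypothetical, and it assumes that undo callbacks run most recent first when an alternative fails.

```typescript
// Minimal sketch of rollback hooks around ordered choice. All names are hypothetical;
// this is not Arpege's runtime, only an illustration of the idea.
type Alternative<T> = (onRollback: (undo: () => void) => void) => T | null;

// Tries each alternative in order; when one fails, runs the undo callbacks it
// registered (most recent first) before moving on to the next alternative.
function choice<T>(alternatives: Alternative<T>[]): T | null {
  for (const alternative of alternatives) {
    const undos: Array<() => void> = [];
    const result = alternative((undo) => undos.push(undo));
    if (result !== null) return result;
    while (undos.length > 0) undos.pop()!();
  }
  return null;
}

const identifiers: string[] = [];

// Records an identifier as a side effect, the way the grammar action does.
function record(name: string, onRollback: (undo: () => void) => void): void {
  identifiers.push(name);
  onRollback(() => {
    identifiers.pop();
  });
}

const result = choice<string>([
  (onRollback) => { record("foo", onRollback); return null; },      // fails after recording
  (onRollback) => { record("bar", onRollback); return "matched"; }, // succeeds
]);

console.log(result, identifiers); // matched [ 'bar' ]
```

Without the undo callback, `identifiers` would end up containing both "foo" and "bar" even though the first alternative was abandoned.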
I wanted to use PegJS on a small language idea I have. However, it proved very difficult to integrate it within VSCode with proper syntactic coloration. I started working on implementing a tokenizer in my own fork, but before that I decided to move the codebase to TypeScript to make this work easier and safer.
Once the tokenizer was done I decided to implement some of the features I felt were missing, and in the end I got a project that was almost but not entirely backward-compatible with PegJS. Given that I'm already leading the development for Yarn, I unfortunately don't have the bandwidth required to efficiently contribute to an additional community (with the back-and-forth that usually comes with it), so I decided to fork the project to simplify the contribution process.
Copyright © 2022 Mael Nison
Copyright © 2010-2016 David Majda
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.