Security News
pnpm 10.0.0 Blocks Lifecycle Scripts by Default
pnpm 10 blocks lifecycle scripts by default to improve security, addressing supply chain attack risks but sparking debate over compatibility and workflow changes.
@bfwk/expression-completion
Advanced tools
The Expression Completion Engine is a general purpose editor/validator for expressions defined by an ANTLR grammar and with input contexts described by JSON Schema.
The grammar is a superset of the legacy expression grammar (Formula.g4). In order to use a grammar for code completion it is useful to inject various rules to facilitate the discovery of a token's scope wrt completion, so there is some rule redefinition.
This uses the Javascript ANTLR4 runtime.
The build target includes a code generation step, based upon the grammar (UnifiedExpression.g4), to create various grammar specific source files in a non-repository gen/ directory. The code generator is Java-based and saved in the repository in the bin/ directory. Directions for running the code generator are available on the ANTLR site.
Do NOT checkin changes to the grammar without rebuilding and testing first - the generated sources will be altered and, while not saved in the repo, might become incompatible with the other source files.
Note that we use an unchanged reference copy of the legacy parser and lexer grammars which we then munge slightly into a local copy for import into the unified grammars. The cpgrammars.sh script is the only place that is aware of these reference grammars. The rest of the build process uses locally generated versions in place. For now the legacy grammars are copied into our repo, but they could be dynamically recovered from their home repo at some point. The script might need changes to keep the locally generated grammars consistent with our local expectations (documented within the script).
As the grammar is expanded in subsequent releases we might want to consider the use of this code completion engine. This was not used in 232 because of difficulties rolling up the Typescript ANTLR4 runtime upon which it depends. An early cut using this is available on the branch https://git.soma.salesforce.com/BuilderFramework/builder-framework/tree/drobertson/DO-NOT-DELETE-typescript-completion-engine
The completion engine works generally as follows:
We need to navigate the parse tree to find the terminal node at the caret position. Note that this does not have to be at the end of the expression - the user might have clicked into the input field or left-arrowed to a position within the current expression.
This is the bulk of the logic. This is where semantic knowledge is introduced.
In order to capture the semantic state of the expression it is necessary to "massage" the base grammar so that the parser listener is triggered at semantically useful moments. This is easiest done by introducing parser rules at appropriate points within the grammar. This allows such things as sensible completion suggestions after a dot and scoping of candidates within nested field references.
The symbol table that is built by the listener during the parse process is keyed on token. That is how the completion candidates are associated with the input token found during Step 2. The symbol table represents the aggregate semantic knowledge of the expression (wrt completion anyway) at any token position.
The current implementation strives to base itself off the legacy formula grammar without requiring changes to it that would be visible to other consumers. It does this by manipulating the parser and lexer grammars "in-place" through a number of ANTLR tricks.
The current completion engine allows and can be configured for three parser entry points.
These are defined in terms of each other wrt parser and lexer rules so as much as is possible the grammar and supporting code is reused.
There are three things you really have to understand about ANTLR to make sense of the grammar manipulations that are employed:
Local manipulation of the legacy grammars comes in two forms:
Direct edits are applied to the reference copies of the legacy grammars (in the reference/ directory) by shell scripts (in the scripts/ directory) to produce temporary modified legacy grammar files (in the src/ directory). These form the starting point for subsequent manipulation by the override tactics. An example of a direct edit is the removal of Java grammar embeds. These scripts are sensitive and will have to be maintained as changes to the legacy formula grammar are made. They strive not to change the underlying grammar syntax or semantic at all but rather simply aim to make the legacy grammar files usable within the unified grammar structure (pure vs combined grammars etc).
We create new rules within the unified parser and lexer grammars and override legacy rules for a number of reasons:
String interpolation requires the use of a modal lexer. This has downstream consequences that affect the relationships between parser and lexer grammars. In particular, you cannot import a modal lexer into a parser grammar in ANTLR to form a combined grammar - you have to reference the tokens from a pure lexer grammar.
The formula grammar contains a definition of the IDENT token that is inappropriate for our needs. This is an intractable problem if we were to use the legacy lexer grammar as is. This is quite a severe problem. The legacy IDENT rule consumes all parts of a qualified name (ie. 'a.b.c') including the dot separators. It also allows for otherwise questionable identifiers like 'a.1'.
For our purposes we needed to break up this lexer token and redefine the uses of IDENT to use the newly defined rules (see UnifiedLexer2.g4). This works fine, tortuous though it is, but it has the consequence that identifiers within the expression completion engine are more restrictive than those within the legacy formula grammar. This is the only grammatical inconsistency as far as I know.
In order to get the string interpolation modal grammar to work we need a readahead mechanism. This works fine within the grammar for parsing, but we need compensation within the code to account for tokens that were read but not consumed. This is undoubtedly a hack, but at least it's isolated.
FAQs
LBF Expression Completion Engine
We found that @bfwk/expression-completion demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 2 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
pnpm 10 blocks lifecycle scripts by default to improve security, addressing supply chain attack risks but sparking debate over compatibility and workflow changes.
Product
Socket now supports uv.lock files to ensure consistent, secure dependency resolution for Python projects and enhance supply chain security.
Research
Security News
Socket researchers have discovered multiple malicious npm packages targeting Solana private keys, abusing Gmail to exfiltrate the data and drain Solana wallets.