bson-transpilers
Comparing version 0.13.4 to 0.13.5
@@ -7,3 +7,3 @@ # Contributing to bson-transpilers
to create a parse tree. As `ANTLR` is written in Java, you will need to set up a
few tools before being able to compile this locally.
@@ -15,20 +15,5 @@ Make sure you have Java installed:
Download `ANTLR4`:
```shell
$ cd /usr/local/lib && curl -O http://www.antlr.org/download/antlr-4.7.2-complete.jar
```
You will then need to add it to your `$CLASSPATH`:
```shell
$ export CLASSPATH=".:/usr/local/lib/antlr-4.7.2-complete.jar:$CLASSPATH"
```
Alias `antlr4` and `grun`:
```shell
$ alias antlr4='java -Xmx500M -cp "/usr/local/lib/antlr-4.7.2-complete.jar:$CLASSPATH" org.antlr.v4.Tool' && alias grun='java org.antlr.v4.gui.TestRig'
```
_I strongly suggest using an IDE that will help you visualize ANTLR trees (JetBrains has a good plugin).
Otherwise, you can use the Java version of the grammar and compile it with
`javac <Language>*.java && grun <Language> <StartRule> -gui`.
[This might be helpful](https://github.com/antlr/antlr4/blob/master/doc/getting-started.md)._
@@ -45,19 +30,21 @@
- __OUTPUT=:__ comma-separated output languages you want to test. Also called the "target" language.
- __MODE=:__ comma-separated names of the test files (without `.yaml`) that you want to run.
```shell
OUTPUT=csharp INPUT=shell MODE=native,bson npm run test
```
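For orientation, here is a hypothetical sketch of how comma-separated variables like these could be turned into filters. `parseList`, `selectTests`, and the `error.yaml` file name are assumptions for illustration; this is not the project's actual test harness.

```javascript
// Hypothetical sketch, not the project's real test runner: how the
// comma-separated OUTPUT / INPUT / MODE variables could narrow a test run.
function parseList(value) {
  // An unset or empty variable means "no filter".
  if (!value) return null;
  return value
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
}

function selectTests(env, files) {
  const modes = parseList(env.MODE);
  return files
    // MODE names the files without their .yaml extension.
    .map((file) => file.replace(/\.yaml$/, ''))
    .filter((name) => modes === null || modes.includes(name));
}

const env = { OUTPUT: 'csharp', INPUT: 'shell', MODE: 'native,bson' };
console.log(parseList(env.OUTPUT)); // [ 'csharp' ]
console.log(selectTests(env, ['native.yaml', 'bson.yaml', 'error.yaml'])); // [ 'native', 'bson' ]
```

Treating an unset variable as "no filter" keeps the default behaviour of running everything.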
# How it works
See also the original presentation: https://drive.google.com/file/d/1jvwtR3k9oBUzIjL4z_VtpHvdWahfcjTK/view
## Compilation Stages
Like many transpilers, this package parses the input
string into a tree and then generates code from the tree using the [Visitor
pattern](https://en.wikipedia.org/wiki/Visitor_pattern).
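As a refresher on the pattern, here is a minimal sketch. It uses a simplified dispatch on a `type` tag rather than ANTLR's `accept()` double dispatch, and the node shapes and the `ToPython` class are hypothetical. It turns the same `{x: '1'}` input used in the test example into `{'x': '1'}`.

```javascript
// Minimal Visitor-pattern sketch with hypothetical node shapes ({ type, ... }).
class Visitor {
  visit(node) {
    // Dispatch on the node's type tag to a matching visit<Type> method.
    const method = this[`visit${node.type}`];
    if (typeof method !== 'function') {
      throw new Error(`No visitor for node type ${node.type}`);
    }
    return method.call(this, node);
  }
}

// A tiny "code generator": turns a JS-like object literal tree into Python-ish text.
class ToPython extends Visitor {
  visitObject(node) {
    const props = node.properties.map((p) => this.visit(p));
    return `{${props.join(', ')}}`;
  }
  visitProperty(node) {
    // Python dict keys must be quoted.
    return `'${node.key}': ${this.visit(node.value)}`;
  }
  visitString(node) {
    return `'${node.value}'`;
  }
}

const tree = {
  type: 'Object',
  properties: [{ type: 'Property', key: 'x', value: { type: 'String', value: '1' } }],
};
console.log(new ToPython().visit(tree)); // {'x': '1'}
```

Adding another output language means writing another visitor class over the same tree, which is the property the project relies on.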
### Step 1: Parsing
Parsing and tree generation are handled by ANTLR4.
The grammar files are located in the `grammars` folder, and the JavaScript
parser, lexer, etc. generated from the grammar are located in `lib/antlr`. To make
changes to the grammar, modify the `.g4` file in `grammars`, then
run `npm run compile`. Never modify files in `lib` directly.
@@ -70,3 +57,3 @@
ANTLR generates a "shell" visitor class for each tree in
`lib/antlr/<grammar name>Visitor.js`. It contains an empty method
@@ -79,3 +66,3 @@ for each node in the parse tree.
Because the project is designed to handle multiple input languages and multiple
output languages, the tree visitation stage is split into parts. The first part
@@ -85,8 +72,8 @@ is handled in the visitor class defined in `codegeneration/<input language>/Visitor.js`.
This visitor class <b>is specific to the input language</b> and can only visit
a tree generated by that grammar. The visitor visits each node and uses a
[string template](#templates) defined in either the [symbol table](#symbols)
or the [type table](#types) to generate code in the output language.
For expressions that are too complex for a string template, the visitor will call an
`emit` method defined in the [Generator](#step-3:-generator). The general rule is
that emit methods aren't required unless you're doing something very unusual! Or
@@ -96,6 +83,6 @@ if you need to do any tree manipulation, since the templates only have access to the
If the node requires special treatment for all output languages, the visitor will
define a `process<type>` method that will do some pre-processing before calling
the appropriate string template or `emit` method. An example is `processDate` in
the JS visitor, which constructs a date object from the input and passes it to the
Date template.
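The sketch below illustrates the `process<type>` idea under stated assumptions: `processDate` and the `templates.Date` function here are hypothetical, not the JS visitor's actual code, and the Python `datetime` output is chosen only for illustration. The point is that the input is validated once, in the input-language layer, before any output-specific template runs.

```javascript
// Hypothetical output-language template (Python-flavoured), standing in for an
// entry loaded from symbols/<output language>/templates.yaml.
const templates = {
  Date: (date) => {
    const parts = [
      date.getUTCFullYear(),
      date.getUTCMonth() + 1, // JS months are 0-based, Python's are 1-based
      date.getUTCDate(),
      date.getUTCHours(),
      date.getUTCMinutes(),
      date.getUTCSeconds(),
    ];
    return `datetime.datetime(${parts.join(', ')}, tzinfo=datetime.timezone.utc)`;
  },
};

// Hypothetical input-language pre-processing: build a real Date once,
// so every output language receives an already-validated value.
function processDate(args) {
  const date = args.length === 0 ? new Date() : new Date(...args);
  if (Number.isNaN(date.getTime())) {
    throw new Error('Invalid date argument');
  }
  return templates.Date(date);
}

console.log(processDate(['2019-01-01T00:00:00Z']));
// datetime.datetime(2019, 1, 1, 0, 0, 0, tzinfo=datetime.timezone.utc)
```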
@@ -111,6 +98,6 @@
### Step 3: Generator
The Generator is the other half of the tree visitation stage. Each output language
has a Generator class defined in `codegeneration/<output language>/Generator.js`.
The Generator class generates code, so it is <b>specific to the output language</b>.
The Generator class is a subclass of the input language's visitor class.
So, for example, when translating from JS to Python, the order of inheritance will be:
@@ -122,3 +109,3 @@ 1. `lib/antlr/ECMAScriptVisitor.js` ["empty" superclass, specific to the tree built by ANTLR]
For nodes that cannot be translated using
templates, the Generator class will define a method called `emit<type>` which
@@ -131,17 +118,17 @@ takes in a tree node, some optional metadata, and returns the transformed string.
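To make the inheritance order concrete, here is a minimal sketch of the three layers. `ShellVisitor`, `JsVisitor`, and `PythonGenerator` are hypothetical stand-ins for `lib/antlr/ECMAScriptVisitor.js`, the JavaScript input visitor, and a Python Generator. The node shapes, the lookup of `emit<callee>` by name, and the `Int64(...)` output are illustrative assumptions, not the project's real code.

```javascript
// 1. "Empty" ANTLR-style visitor: every visit method just visits the children.
class ShellVisitor {
  visit(node) {
    return this[`visit${node.type}`](node);
  }
  visitChildren(node) {
    return (node.children || []).map((child) => this.visit(child)).join('');
  }
  visitProgram(node) { return this.visitChildren(node); }
  visitCall(node) { return this.visitChildren(node); }
  visitLiteral(node) { return this.visitChildren(node); }
}

// 2. Input-language visitor: understands the JS tree, defers output details.
class JsVisitor extends ShellVisitor {
  visitLiteral(node) {
    return JSON.stringify(node.value);
  }
  visitCall(node) {
    const args = node.children.map((child) => this.visit(child));
    // If a subclass (the Generator) defines emit<callee>, let it handle the node.
    const emit = this[`emit${node.callee}`];
    if (typeof emit === 'function') return emit.call(this, node, args);
    // Otherwise fall back to a plain call with the same name.
    return `${node.callee}(${args.join(', ')})`;
  }
}

// 3. Output-language Generator: subclass of the input visitor, adds emit methods.
class PythonGenerator extends JsVisitor {
  emitNumberLong(node, args) {
    // Illustrative output only: render NumberLong(...) as Int64(...).
    return `Int64(${args.join(', ')})`;
  }
}

const tree = {
  type: 'Program',
  children: [
    { type: 'Call', callee: 'NumberLong', children: [{ type: 'Literal', value: 1 }] },
  ],
};
console.log(new PythonGenerator().visit(tree)); // Int64(1)
console.log(new JsVisitor().visit(tree)); // NumberLong(1)
```

Because the Generator only adds methods, the input visitor's traversal logic is shared by every output language.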
When the visitor in [step #1](#step-1:-parsing) reaches a function call, variable, attribute access, or other "identifier"
expression, it needs a way of knowing what that symbol evaluates to in order to know if it is valid.
### Symbols
Each input language has its own set of symbols that are part of the
language. The majority of symbols supported in the input languages are BSON types
(e.g. `Int32`, `ObjectId`, etc.), but there are a few native types like `RegExp` and
`Date` that are not BSON-specific. In order for the transpiler to know if a symbol
is undefined, we store symbol information in a
[Symbol Table](https://en.wikipedia.org/wiki/Symbol_table).
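A symbol table lookup can be sketched as a map from identifier to metadata. This is a hypothetical, heavily simplified version: the real entries come from `symbols/<input language>/symbols.yaml` and carry more metadata, and the project throws its own error classes rather than the `ReferenceError` used here.

```javascript
// Hypothetical, minimal symbol table: identifier -> metadata.
const symbols = new Map([
  ['ObjectId', { callable: true }],
  ['Int32', { callable: true }],
  ['RegExp', { callable: true }],
  ['Date', { callable: true }],
]);

function resolve(identifier) {
  const symbol = symbols.get(identifier);
  if (symbol === undefined) {
    // Unknown identifiers are reported instead of being passed through.
    throw new ReferenceError(`Symbol '${identifier}' is undefined`);
  }
  return symbol;
}

console.log(resolve('ObjectId')); // { callable: true }
try {
  resolve('Int33');
} catch (err) {
  console.log(err.message); // Symbol 'Int33' is undefined
}
```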
#### Symbol Metadata
The visitor uses the symbol table to determine if a symbol is undefined, but the
symbol table also stores some metadata so the visitor can do type and other validity checks. The symbols
are defined in [YAML](https://en.wikipedia.org/wiki/YAML) in the
`symbols/<input language>/symbols.yaml` file. A symbol definition looks like:
@@ -184,8 +171,8 @@
### Types
Each input language also has a set of types that are part of the language.
The types that are universal to all languages (i.e. "primitives" or
"literals", like `string`, `integer`, etc.) are defined in the file
`symbols/basic_types.yaml`.
Types that are specific to the input language are defined in
`symbols/<input language>/types.yaml`. These include BSON types, i.e. classes like `ObjectId`, and
@@ -195,23 +182,23 @@ language-specific types like `RegExp` and `Date`. The types are defined in the same
NOTE: It is important not to mix up symbols and types. They can share
the same identifier and are closely related, but we have to keep them distinct
because otherwise we will end up generating invalid code.
The **symbol** `ObjectId` has attributes like `ObjectId.fromString(...)`
and is a constructor, so `ObjectId()` is valid. The **type** `ObjectId` has
attributes like `ObjectId().toString()` and is *a variable*, so `ObjectId()()`
is not valid and will error with `ObjectId() is not callable` or a similar error.
It may help to think of types as instantiated symbols.
So `ObjectId.toString()` and `ObjectId().fromString('x')` are both invalid, while
`ObjectId().toString()` and `ObjectId.fromString('x')` are both valid.
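The valid and invalid cases above can be checked mechanically by walking an expression against two tables: start in the symbol table, and after each call switch to the type table. The table contents, the step encoding (`'()'` for a call, `'.name'` for attribute access), and the error messages are assumptions for illustration; call arguments are ignored.

```javascript
// Hypothetical tables: symbols describe names as written in the input;
// types describe the values those names produce once called.
const symbols = {
  ObjectId: {
    callable: true,
    returns: 'ObjectId',
    attr: { fromString: { callable: true, returns: 'ObjectId' } },
  },
};
const types = {
  ObjectId: { callable: false, attr: { toString: { callable: true, returns: 'string' } } },
  string: { callable: false, attr: {} },
};

// Own-property check, so names like "toString" don't match Object.prototype.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// ['ObjectId', '()', '.toString', '()'] stands for ObjectId().toString().
function check(steps) {
  let text = steps[0];
  if (!has(symbols, text)) throw new Error(`${text} is undefined`);
  let current = symbols[text];
  for (const step of steps.slice(1)) {
    if (step === '()') {
      if (!current.callable) throw new Error(`${text} is not callable`);
      // Calling a symbol or method yields an instance of its return type.
      current = types[current.returns];
    } else {
      const name = step.slice(1); // drop the leading '.'
      if (!has(current.attr || {}, name)) {
        throw new Error(`${name} is not an attribute of ${text}`);
      }
      current = current.attr[name];
    }
    text += step;
  }
  return current;
}

for (const steps of [
  ['ObjectId', '()', '.toString', '()'],
  ['ObjectId', '.fromString', '()'],
  ['ObjectId', '.toString', '()'],
  ['ObjectId', '()', '()'],
]) {
  try {
    check(steps);
    console.log(`${steps.join('')}: valid`);
  } catch (err) {
    console.log(`${steps.join('')}: ${err.message}`);
  }
}
// ObjectId().toString(): valid
// ObjectId.fromString(): valid
// ObjectId.toString(): toString is not an attribute of ObjectId
// ObjectId()(): ObjectId() is not callable
```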
## Templates
The symbol table includes an additional piece of metadata, called a `template`.
These are functions that accept strings and return strings, and are responsible for
transforming code from one language's syntax to another's.
They are defined in `symbols/<output language>/templates.yaml`. This is where
the majority of code generation happens, so the templates are **specific to the output language**.
Some templates take additional arguments, which are documented in comments in `symbols/sample_template.yaml`.
Templates can be split into `template` and `argsTemplate`. For symbols that are function
calls, the `argsTemplate` is a function that gets applied to the arguments in case they
need rearranging between languages.
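To illustrate the `template`/`argsTemplate` split, suppose (purely as an assumption) that the input language's `Timestamp(low, high)` takes the increment first, while the output language expects `Timestamp(time, inc)`. The functions below are hypothetical and not copied from any `templates.yaml`; they only show how an `argsTemplate` can reorder arguments that have already been transpiled to strings.

```javascript
// Hypothetical template pair for a Timestamp call (strings in, strings out).
const TimestampTemplate = {
  // Applied to the callee; here the name is unchanged.
  template: () => 'Timestamp',
  // Applied to the already-transpiled arguments: swap them into (time, inc) order.
  argsTemplate: (low, high) => `(${high}, ${low})`,
};

function renderCall(entry, args) {
  // Callee text from template, argument list from argsTemplate.
  return entry.template() + entry.argsTemplate(...args);
}

console.log(renderCall(TimestampTemplate, ['1', '1565545664'])); // Timestamp(1565545664, 1)
```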
@@ -232,3 +219,3 @@
- `codegeneration/<output language>/Generator.js` - The generator for the specific output language.
- `lib/symbol-table/<input language>to<output language>.js` - The symbol table for
the input+output combination.
@@ -304,3 +291,3 @@
}
/* ... and every other language that can compile to your language.
 * Make sure you update the getTree method, as well as the input-language
@@ -338,13 +325,13 @@ * specific visitor and the ANTLR visitor to match the input lang. */
};
```
9. Next thing is tests! You must go through each test file and add the results of
compiling each input into your output language under the `output` field.
```yaml
Document:
  - input:
      javascript: "{x: '1'}"
      shell: "{x: '1'}"
      python: "{'x': '1'}"
    output:
      javascript: "{\n 'x': '1'\n}"
@@ -351,0 +338,0 @@ python: "{\n 'x': '1'\n}"
{
"name": "bson-transpilers",
-"version": "0.13.4",
+"version": "0.13.5",
"apiVersion": "0.0.1",
@@ -15,8 +15,9 @@ "productName": "BSON Transpilers",
"start": "node index.js",
+"precompile": "node download-antlr.js",
"compile": "npm run antlr4-js && npm run antlr4-py && npm run symbol-table",
-"antlr4-js": "java -Xmx500M -cp '/usr/local/lib/antlr-4.7.2-complete.jar:$CLASSPATH' org.antlr.v4.Tool -Dlanguage=JavaScript -lib grammars -o lib/antlr -visitor -Xexact-output-dir grammars/ECMAScript.g4",
-"antlr4-py": "java -Xmx500M -cp '/usr/local/lib/antlr-4.7.2-complete.jar:$CLASSPATH' org.antlr.v4.Tool -Dlanguage=JavaScript -lib grammars -o lib/antlr -visitor -Xexact-output-dir grammars/Python3.g4",
+"antlr4-js": "java -Xmx500M -cp './antlr-4.7.2-complete.jar:$CLASSPATH' org.antlr.v4.Tool -Dlanguage=JavaScript -lib grammars -o lib/antlr -visitor -Xexact-output-dir grammars/ECMAScript.g4",
+"antlr4-py": "java -Xmx500M -cp './antlr-4.7.2-complete.jar:$CLASSPATH' org.antlr.v4.Tool -Dlanguage=JavaScript -lib grammars -o lib/antlr -visitor -Xexact-output-dir grammars/Python3.g4",
"symbol-table": "node compile-symbol-table.js",
"test": "npm run symbol-table && mocha",
-"prepublish": "npm run compile",
+"prepublishOnly": "npm run compile",
"check": "mongodb-js-precommit './codegeneration/**/*{.js,.jsx}' './test/**/*.js' index.js",
@@ -23,0 +24,0 @@ "ci": "npm run check && npm run test"
@@ -9,2 +9,4 @@ # BSON-Transpilers
See also the original presentation: https://drive.google.com/file/d/1jvwtR3k9oBUzIjL4z_VtpHvdWahfcjTK/view
# Usage
@@ -60,3 +62,3 @@
### Errors
There are a few different error classes thrown by `bson-transpilers`, each with
its own error code:
@@ -102,3 +104,3 @@
#### BsonTranspilersSyntaxError
###### code: E_BSONTRANSPILERS_SYNTAX
This is thrown if you have a syntax error, for example missing a colon in
@@ -114,3 +116,3 @@ Object assignment, or forgetting a comma in array definition:
// ✔: neither of these will throw
{ key: 'beep' }
@@ -134,4 +136,4 @@ [ 'beep', 'boop', 'beepBoop' ]
###### code: E_BSONTRANSPILERS_UNIMPLEMENTED
Thrown if there is a feature in the input code that is not currently supported by the
transpiler.
@@ -141,3 +143,3 @@ #### BsonTranspilersRuntimeError
A generic runtime error will be thrown for all errors that are not covered by the
above list of errors. These usually come from constructor requirements, for example
when an unsupported flag is passed to `RegExp()`:
@@ -155,3 +157,3 @@
###### code: E_BSONTRANSPILERS_INTERNAL
Thrown when something has gone wrong within compilation and an error has
occurred. If you see this error, please create [an issue](https://github.com/mongodb-js/bson-transpilers/issues) on GitHub!
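Callers can branch on these error codes. The sketch below assumes the codes are exposed as a `code` property on an `Error`; that shape, and the `fakeCompile` stand-in that throws one, are assumptions for illustration and not the package's compile API. Only the three codes named in this section are handled explicitly; anything else falls through to the default branch.

```javascript
// Hypothetical stand-in that throws an error shaped like the ones described above.
function fakeCompile(input) {
  if (!input.trim().startsWith('{')) {
    const err = new Error('syntax error');
    err.code = 'E_BSONTRANSPILERS_SYNTAX';
    throw err;
  }
  return input;
}

// Map each documented error code to a user-facing hint.
function explain(err) {
  switch (err.code) {
    case 'E_BSONTRANSPILERS_SYNTAX':
      return 'Fix the input syntax (e.g. a missing colon or comma).';
    case 'E_BSONTRANSPILERS_UNIMPLEMENTED':
      return 'This feature is not supported by the transpiler yet.';
    case 'E_BSONTRANSPILERS_INTERNAL':
      return 'Transpiler bug: please report it on GitHub.';
    default:
      return `Unexpected error: ${err.message}`;
  }
}

try {
  fakeCompile("key: 'beep' }");
} catch (err) {
  console.log(explain(err)); // Fix the input syntax (e.g. a missing colon or comma).
}
```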
@@ -158,0 +160,0 @@
Network access
Supply chain risk: This module accesses the network.
Found 1 instance in 1 package
New author
Supply chain risk: A new npm collaborator published a version of the package for the first time. New collaborators are usually benign additions to a project, but do indicate a change to the security surface area of a package.
Found 1 instance in 1 package