CJS Module Lexer
A very fast JS CommonJS module syntax lexer used to detect the most likely list of named exports of a CommonJS module.
Outputs the list of named exports (exports.name = ...), whether the __esModule interop flag is used, and possible module reexports (module.exports = require('...')).
As an example of the performance, Angular 1 (720KiB) is fully parsed in 5ms, whereas Acorn, the fastest JS parser, takes over 100ms.
Comprehensively handles the JS language grammar while remaining small and fast: ~10ms per MB of JS cold and ~5ms per MB of JS warm. See the Benchmarks section for more info.
Usage
npm install cjs-module-lexer
For use in CommonJS:
const { init, parse } = require('cjs-module-lexer');
(async () => {
  await init();
  const { exports, reexports, esModule } = parse(`
    // named exports detection
    module.exports.a = 'a';
    (function () {
      exports.b = 'b';
    })();
    Object.defineProperty(exports, 'c', { value: 'c' });
    /* exports.d = 'not detected'; */
    // reexports detection
    if (maybe) module.exports = require('./dep1.js');
    if (another) module.exports = require('./dep2.js');
    // literal exports assignments
    module.exports = { a, b: c, d, 'e': f }
    // __esModule detection
    Object.defineProperty(module.exports, '__esModule', { value: true })
  `);
})();
Grammar
CommonJS exports matches are run against the source token stream.
The token grammar is:
IDENTIFIER: As defined by ECMA-262, without support for identifier `\` escapes, filtered to remove strict reserved words:
"implements", "interface", "let", "package", "private", "protected", "public", "static", "yield", "enum"
STRING_LITERAL: A `"` or `'` bounded ECMA-262 string literal.
IDENTIFIER_STRING: ( `"` IDENTIFIER `"` | `'` IDENTIFIER `'` )
COMMENT_SPACE: Any ECMA-262 whitespace, ECMA-262 block comment or ECMA-262 line comment
MODULE_EXPORTS: `module` COMMENT_SPACE `.` COMMENT_SPACE `exports`
EXPORTS_IDENTIFIER: MODULE_EXPORTS | `exports`
EXPORTS_DOT_ASSIGN: EXPORTS_IDENTIFIER COMMENT_SPACE `.` COMMENT_SPACE IDENTIFIER COMMENT_SPACE `=`
EXPORTS_LITERAL_COMPUTED_ASSIGN: EXPORTS_IDENTIFIER COMMENT_SPACE `[` COMMENT_SPACE IDENTIFIER_STRING COMMENT_SPACE `]` COMMENT_SPACE `=`
EXPORTS_LITERAL_PROP: (IDENTIFIER (COMMENT_SPACE `:` COMMENT_SPACE IDENTIFIER)?) | (IDENTIFIER_STRING COMMENT_SPACE `:` COMMENT_SPACE IDENTIFIER)
EXPORTS_MEMBER: EXPORTS_DOT_ASSIGN | EXPORTS_LITERAL_COMPUTED_ASSIGN
EXPORTS_DEFINE: `Object` COMMENT_SPACE `.` COMMENT_SPACE `defineProperty` COMMENT_SPACE `(` EXPORTS_IDENTIFIER COMMENT_SPACE `,` COMMENT_SPACE IDENTIFIER_STRING
EXPORTS_LITERAL: MODULE_EXPORTS COMMENT_SPACE `=` COMMENT_SPACE `{` COMMENT_SPACE (EXPORTS_LITERAL_PROP COMMENT_SPACE `,` COMMENT_SPACE)+ `}`
WEBPACK_EXPORTS: `__webpack_exports__` COMMENT_SPACE `,` COMMENT_SPACE IDENTIFIER_STRING
EXPORTS_ASSIGN: MODULE_EXPORTS COMMENT_SPACE `=` COMMENT_SPACE `require` COMMENT_SPACE `(` STRING_LITERAL `)`
- The returned export names are the matched IDENTIFIER and IDENTIFIER_STRING slots for all EXPORTS_MEMBER, EXPORTS_DEFINE and EXPORTS_LITERAL matches.
- The reexport specifiers are taken to be the STRING_LITERAL slots of all EXPORTS_ASSIGN matches.
- If WEBPACK_EXPORTS has matched slots, these IDENTIFIER_STRING slots are returned instead of any of the export names and reexport specifiers from the two points above.
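The three selection rules above can be sketched as a small function. This is only an illustration of how the matches might combine, not the lexer's implementation: the name selectResult and the shape of its matches argument are hypothetical, deduplicating names is an assumption, and returning no reexports when a Webpack match exists is one reading of the third rule.

```javascript
// Hypothetical helper: combines grammar matches into a final result.
// The `matches` shape is assumed for illustration; it is not the lexer's API.
function selectResult(matches) {
  const {
    memberNames = [],        // slots of EXPORTS_MEMBER matches
    defineNames = [],        // IDENTIFIER_STRING slots of EXPORTS_DEFINE matches
    literalNames = [],       // property names of EXPORTS_LITERAL matches
    reexportSpecifiers = [], // STRING_LITERAL slots of EXPORTS_ASSIGN matches
    webpackNames = []        // IDENTIFIER_STRING slots of WEBPACK_EXPORTS matches
  } = matches;

  // Third rule: Webpack matches replace everything gathered by the first two rules.
  if (webpackNames.length > 0) {
    return { exports: [...new Set(webpackNames)], reexports: [] };
  }

  // First and second rules: merge export names (deduplicated, an assumption)
  // and keep the reexport specifiers.
  return {
    exports: [...new Set([...memberNames, ...defineNames, ...literalNames])],
    reexports: [...new Set(reexportSpecifiers)]
  };
}

const plain = selectResult({
  memberNames: ['a', 'b'],
  defineNames: ['c'],
  reexportSpecifiers: ['./dep1.js']
});

const bundled = selectResult({
  memberNames: ['a', 'b'],
  webpackNames: ['WP_A', 'WP_B']
});

console.log(plain, bundled);
```

With these inputs, plain keeps a, b, c and the ./dep1.js reexport, while bundled reports only WP_A and WP_B.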
Not Supported
- No scope analysis:
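// "a" is detected as an export, even though this exports is a shadowing function parameter
(function (exports) {
  exports.a = 'a';
})(notExports);
// "a" is not detected, because m is not recognized as an alias of exports
(function (m) {
  m.a = 'a';
})(exports);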
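- module.exports require assignment only handled at the base-level:
// Detected: top-level assignment
module.exports = require('./a.js');
// Detected: unbraced if statement
if (condition)
  module.exports = require('./b.js');
// Not detected: inside a block
if (condition) {
  module.exports = require('./c.js');
}
// Not detected: inside a function
(function () {
  module.exports = require('./d.js');
})();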
- No object parsing:
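// Not detected: Object.defineProperties is not part of the grammar
Object.defineProperties(exports, {
  a: { value: 'a' },
  b: { value: 'b' }
});
// Not detected: property values are not identifiers
module.exports = {
  c: 'c',
  d: 'd'
}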
- Webpack exports heuristic
exports.a = 'a';
exports.b = 'b';
__webpack_require__.d(__webpack_exports__, "WP_A", function() { return setBaseUrl; });
__webpack_require__.d(__webpack_exports__, "WP_B", function() { return setBaseUrl; });
Environment Support
Node.js 10+, and all browsers with WebAssembly support.
Grammar Support
- Token state parses all line comments, block comments, strings, template strings, blocks, parens and punctuators.
- Division operator / regex token ambiguity is handled via backtracking checks against punctuator prefixes, including closing brace or paren backtracking.
- Always correctly parses valid JS source, but may parse invalid JS source without errors.
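To illustrate the division / regex ambiguity mentioned above, here is a minimal sketch of a punctuator-prefix check. It is not the lexer's code: the function name, the keyword list and the punctuator list are assumptions made for illustration, and it treats a closing paren, bracket or brace as always ending an expression instead of performing the backtracking the lexer does. It also misreads postfix ++ and --.

```javascript
// Assumed keyword list: a `/` directly after one of these starts a regex.
const REGEX_KEYWORDS = new Set([
  'return', 'typeof', 'case', 'void', 'delete', 'in', 'of',
  'instanceof', 'new', 'throw', 'else', 'do', 'yield', 'await'
]);

// Assumed punctuator list: a `/` directly after one of these starts a regex.
const REGEX_PUNCTUATORS = new Set('(,=:[!&|?{;+-*%<>~^'.split(''));

// Decides whether the `/` at slashIndex begins a regular expression literal
// by inspecting the previous non-whitespace token.
function slashStartsRegex(source, slashIndex) {
  let i = slashIndex - 1;
  while (i >= 0 && /\s/.test(source[i])) i--;
  if (i < 0) return true; // start of input: must be a regex

  const ch = source[i];
  if (REGEX_PUNCTUATORS.has(ch)) return true;

  // Simplification: no backtracking, closing delimiters are read as operands.
  if (ch === ')' || ch === ']' || ch === '}') return false;

  if (/[\w$]/.test(ch)) {
    // Read the preceding word backwards and check it against the keyword list.
    let start = i;
    while (start > 0 && /[\w$]/.test(source[start - 1])) start--;
    return REGEX_KEYWORDS.has(source.slice(start, i + 1));
  }
  return false; // e.g. after a closing quote: division
}

console.log(slashStartsRegex('a / b', 2));       // false
console.log(slashStartsRegex('x = /ab/', 4));    // true
console.log(slashStartsRegex('return /x/', 7));  // true
console.log(slashStartsRegex('(a + b) / 2', 8)); // false
```

The closing-delimiter case is where a simple prefix check falls short: in if (x) /re/.test(s) the slash starts a regex, which is why backtracking over the paren or brace is needed.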
Benchmarks
Benchmarks can be run with npm run bench.
Current results:
Cold Run, All Samples
test/samples/*.js (3057 KiB)
> 24ms
Warm Runs (average of 25 runs)
test/samples/angular.js (719 KiB)
> 5.12ms
test/samples/angular.min.js (188 KiB)
> 3.04ms
test/samples/d3.js (491 KiB)
> 4.08ms
test/samples/d3.min.js (274 KiB)
> 2.04ms
test/samples/magic-string.js (34 KiB)
> 0ms
test/samples/magic-string.min.js (20 KiB)
> 0ms
test/samples/rollup.js (902 KiB)
> 5.92ms
test/samples/rollup.min.js (429 KiB)
> 3.08ms
Warm Runs, All Samples (average of 25 runs)
test/samples/*.js (3057 KiB)
> 17.4ms
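As a rough cross-check of the per-megabyte figures quoted in the introduction, the all-samples results above can be converted to milliseconds per MiB. This is plain arithmetic on the numbers listed here, assuming 1 MB is read as 1024 KiB; the helper name msPerMiB is made up for the example.

```javascript
const KIB_PER_MIB = 1024;

// Converts a timing over a sample of the given size into milliseconds per MiB.
function msPerMiB(milliseconds, sizeKiB) {
  return milliseconds / (sizeKiB / KIB_PER_MIB);
}

// Figures taken from the "All Samples" results above (3057 KiB total).
const cold = msPerMiB(24, 3057);
const warm = msPerMiB(17.4, 3057);

console.log(cold.toFixed(2)); // 8.04
console.log(warm.toFixed(2)); // 5.83
```

Both values are in line with the approximate ~10ms cold and ~5ms warm figures.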
Building
To build, download the WASI SDK from https://github.com/CraneStation/wasi-sdk/releases.
The Makefile assumes that the clang in PATH corresponds to LLVM 8 (provided by the WASI SDK, or a standard clang 8 install can be used), and that ../wasi-sdk-6 contains the SDK as extracted above, which is needed to locate the WASI sysroot.
The build through the Makefile is then run via make lib/lexer.wasm, which can also be triggered via npm run build-wasm to create dist/lexer.js.
On Windows it may be preferable to use the Linux subsystem.
After the WebAssembly build, the CJS build can be triggered via npm run build.
Optimization passes are run with Binaryen prior to publishing to reduce the WebAssembly footprint.
License
MIT