snapdragon-lexer
Converts a string into an array of tokens, with useful methods for looking ahead and behind, capturing, matching, et cetera.
Please consider following this project's author, Jon Schlinkert, and consider starring the project to show your :heart: and support.
Install
Install with npm:
$ npm install --save snapdragon-lexer
Breaking changes in v2.0!
Please see the changelog for details!
Usage
const Lexer = require('snapdragon-lexer');
const lexer = new Lexer();
lexer.capture('slash', /^\//);
lexer.capture('text', /^\w+/);
lexer.capture('star', /^\*/);
console.log(lexer.tokenize('foo/*'));
API
Create a new Lexer with the given options.
Params
input {String|Object}: (optional) Input string or options. You can also set input directly on lexer.input after initializing.
options {Object}
Example
const Lexer = require('snapdragon-lexer');
const lexer = new Lexer('foo/bar');
Create a new Token with the given type and value.
Params
type {String|Object}: (required) The type of token to create
value {String}: (optional) The captured string
match {Array}: (optional) Match arguments returned from String.match or RegExp.exec
returns {Object}: Returns an instance of snapdragon-token
Events
Example
console.log(lexer.token({type: 'star', value: '*'}));
console.log(lexer.token('star', '*'));
console.log(lexer.token('star'));
Returns true if the given value is a snapdragon-token instance.
Params
token {Object}
returns {Boolean}
Example
const Token = require('snapdragon-token');
lexer.isToken({});
lexer.isToken(new Token({type: 'star', value: '*'}));
Consume the given length from lexer.string. The consumed value is used to update lexer.consumed, as well as the current position.
Params
len {Number}
value {String}: Optionally pass the value being consumed.
returns {String}: Returns the consumed value
Example
lexer.consume(1);
lexer.consume(1, '*');
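The bookkeeping described above can be sketched without the library. This is a minimal illustration only: `cursor` and this standalone `consume` function are hypothetical stand-ins, and the real lexer may track position differently.

```javascript
// Hypothetical stand-in for lexer state; not the library's implementation.
const cursor = {
  string: 'foo\nbar',
  consumed: '',
  loc: { index: 0, column: 0, line: 1 }
};

function consume(state, len, value = state.string.slice(0, len)) {
  state.consumed += value;                // remember what was taken
  state.string = state.string.slice(len); // drop it from the remaining input
  for (let i = 0; i < value.length; i++) {
    state.loc.index++;
    if (value[i] === '\n') {
      // a newline starts a new line and resets the column
      state.loc.line++;
      state.loc.column = 0;
    } else {
      state.loc.column++;
    }
  }
  return value;
}

const taken = consume(cursor, 4);
console.log(JSON.stringify(taken), cursor.string, cursor.loc);
```

Passing `value` explicitly simply skips the `slice` when the caller already holds the matched text.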
Use the given regex to match a substring from lexer.string. Also validates the regex to ensure that it starts with ^ since matching should always be against the beginning of the string, and throws if the regex matches an empty string, which can cause catastrophic backtracking.
Params
regex {RegExp}: (required)
returns {Array|null}: Returns the match array from RegExp.exec or null.
Example
const lexer = new Lexer('foo/bar');
const match = lexer.match(/^\w+/);
console.log(match);
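The two validation rules described above might look roughly like the sketch below. `safeMatch` is a hypothetical helper written for illustration; the library's own checks and error messages are not reproduced here.

```javascript
// Illustrative only: `safeMatch` is a hypothetical helper, not part of snapdragon-lexer.
function safeMatch(regex, string) {
  // Rule 1: the pattern must be anchored to the start of the remaining input.
  if (!regex.source.startsWith('^')) {
    throw new Error('expected regex to start with "^"');
  }
  const match = regex.exec(string);
  // Rule 2: an empty match consumes nothing, so it is rejected.
  if (match && match[0] === '') {
    throw new Error('regex should not match an empty string');
  }
  return match;
}

console.log(safeMatch(/^\w+/, 'foo/bar')[0]); // 'foo'
console.log(safeMatch(/^\//, 'foo/bar'));     // null
```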
Scan for a matching substring by calling .match() with the given regex. If a match is found, 1) a token of the specified type is created, 2) match[0] is used as token.value, and 3) the length of match[0] is sliced from lexer.string (by calling .consume()).
Params
type {String}
regex {RegExp}
returns {Object}: Returns a token if a match is found, otherwise undefined.
Events
Example
lexer.string = '/foo/';
console.log(lexer.scan(/^\//, 'slash'));
console.log(lexer.scan(/^\w+/, 'text'));
console.log(lexer.scan(/^\//, 'slash'));
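The three steps listed above can be sketched as a standalone function. The `scanState` object and this `scan` function are assumptions made for illustration, and plain objects stand in for snapdragon-token instances.

```javascript
// Hypothetical sketch of the scan steps; names mirror the prose, not the library source.
function scan(state, regex, type) {
  const match = regex.exec(state.string);
  if (!match) return undefined;
  const token = { type, value: match[0], match };      // steps 1 and 2
  state.consumed += match[0];
  state.string = state.string.slice(match[0].length);  // step 3: consume
  return token;
}

const scanState = { string: '/foo/', consumed: '' };
const slash = scan(scanState, /^\//, 'slash');
const text = scan(scanState, /^\w+/, 'text');
const miss = scan(scanState, /^\w+/, 'text'); // remaining input is '/', so no match
console.log(slash.value, text.value, miss);
```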
Capture a token of the specified type using the provided regex for scanning and matching substrings. Automatically registers a handler when a function is passed as the last argument.
Params
type {String}: (required) The type of token being captured.
regex {RegExp}: (required) The regex for matching substrings.
fn {Function}: (optional) If supplied, the function will be called on the token before pushing it onto lexer.tokens.
returns {Object}
Example
lexer.capture('text', /^\w+/);
lexer.capture('text', /^\w+/, token => {
  if (token.value === 'foo') {
    // inspect or modify the token here
  }
  return token;
});
Calls handler type on lexer.string.
Params
type {String}: The handler type to call on lexer.string
returns {Object}: Returns a token of the given type or undefined.
Events
Example
const lexer = new Lexer('/a/b');
lexer.capture('slash', /^\//);
lexer.capture('text', /^\w+/);
console.log(lexer.handle('text'));
console.log(lexer.handle('slash'));
console.log(lexer.handle('text'));
Get the next token by iterating over lexer.handlers and calling each handler on lexer.string until a handler returns a token. If no handlers return a token, an error is thrown with the substring that couldn't be lexed.
returns {Object}: Returns the first token returned by a handler, or the first character in the remaining string if options.mode is set to character.
Example
const token = lexer.advance();
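A rough sketch of that loop follows, assuming handlers are stored in a `Map` in registration order. The `registry`, `pending`, `register`, and `advanceOnce` names are hypothetical and exist only for this example.

```javascript
// Hypothetical sketch: handlers are kept in a Map and tried in registration order.
const registry = new Map();
const pending = { string: 'a/b' };

function register(type, regex) {
  registry.set(type, () => {
    const match = regex.exec(pending.string);
    if (!match) return undefined;
    pending.string = pending.string.slice(match[0].length);
    return { type, value: match[0] };
  });
}

function advanceOnce() {
  for (const handler of registry.values()) {
    const token = handler();
    if (token) return token; // first handler to produce a token wins
  }
  throw new Error(`unmatched input: "${pending.string}"`);
}

register('slash', /^\//);
register('text', /^\w+/);

const lexed = [];
while (pending.string.length > 0) lexed.push(advanceOnce());
console.log(lexed.map(token => token.type)); // [ 'text', 'slash', 'text' ]
```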
Tokenizes a string and returns an array of tokens.
Params
input {String}: The string to tokenize.
returns {Array}: Returns an array of tokens.
Example
lexer.capture('slash', /^\//);
lexer.capture('text', /^\w+/);
const tokens = lexer.tokenize('a/b/c');
console.log(tokens);
Push a token onto the lexer.queue array.
Params
token {Object}
returns {Object}: Returns the given token with updated token.index.
Example
console.log(lexer.queue.length);
lexer.enqueue(new Token('star', '*'));
console.log(lexer.queue.length);
Shift a token from lexer.queue.
returns {Object}: Returns the given token with updated token.index.
Example
console.log(lexer.queue.length);
lexer.dequeue();
console.log(lexer.queue.length);
Lookbehind n tokens.
Params
n {Number}
returns {Object}
Example
const token = lexer.lookbehind(2);
Get the previous token.
returns
{Object}: Returns a token.
Example
const token = lexer.prev();
Lookahead n tokens and return the last token. Pushes any intermediate tokens onto lexer.tokens. To lookahead a single token, use .peek().
Params
n {Number}
returns {Object}
Example
const token = lexer.lookahead(2);
Lookahead a single token.
Example
const token = lexer.peek();
Get the next token, either from the queue or by advancing.
returns {Object|String}: Returns a token, or (when options.mode is set to character) either gets the next character from lexer.queue, or consumes the next character in the string.
Example
const token = lexer.next();
Skip n tokens or characters in the string. Skipped values are not enqueued.
Params
n {Number}
returns {Object}: returns the very last lexed/skipped token.
Example
const token = lexer.skip(1);
Skip tokens while the given function returns true.
Params
fn {Function}: Called with each upcoming token; return true to skip it.
returns {Array}: Returns an array of skipped tokens.
Example
lexer.skipWhile(tok => tok.type !== 'space');
Skip the given token types.
Params
types {String|Array}: One or more token types to skip.
returns {Array}: Returns an array of skipped tokens.
Example
lexer.skipType('space');
lexer.skipType(['newline', 'space']);
Pushes the given value onto lexer.stash.
Params
value {any}
returns {Object}: Returns the Lexer instance.
Events
Example
lexer.append('abc');
lexer.append('/');
lexer.append('*');
lexer.append('.');
lexer.append('js');
console.log(lexer.stash);
Pushes the given token onto lexer.tokens and calls .append() to push token.value onto lexer.stash. Disable pushing onto the stash by setting lexer.options.append or token.append to false.
Params
token {Object|String}
returns {Object}: Returns the given token.
Events
Example
console.log(lexer.tokens.length);
lexer.push(new Token('star', '*'));
console.log(lexer.tokens.length);
console.log(lexer.stash)
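How the append switches interact with pushing can be sketched as follows. This is an assumption-laden illustration: `lex` and this `push` function are hypothetical, a plain array stands in for the snapdragon-stack stash, and the `options.value` lookup follows the one-line logic shown in the Options section.

```javascript
// Hypothetical sketch; a plain array stands in for the snapdragon-stack stash.
const lex = { options: {}, tokens: [], stash: [] };

function push(lexerLike, token) {
  lexerLike.tokens.push(token);
  // Either switch set to false keeps the value off the stash.
  if (lexerLike.options.append !== false && token.append !== false) {
    lexerLike.stash.push(token[lexerLike.options.value || 'value']);
  }
  return token;
}

push(lex, { type: 'text', value: 'abc' });
push(lex, { type: 'slash', value: '/', append: false });
push(lex, { type: 'star', value: '*' });
console.log(lex.tokens.length, lex.stash); // 3 [ 'abc', '*' ]
```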
Returns true if a token with the given type is on the stack.
Params
type {String}: The type to check for.
returns {Boolean}
Example
if (lexer.isInside('bracket') || lexer.isInside('brace')) {
}
Returns the value of a token using the property defined on lexer.options.value or token.value.
returns {String|undefined}
Returns true if lexer.string and lexer.queue are empty.
Creates a new Lexer instance with the given options, and copies the handlers from the current instance to the new instance.
Params
options {Object}
parent {Object}: Optionally pass a different lexer instance to copy handlers from.
returns {Object}: Returns a new Lexer instance
Throw a formatted error message with details including the cursor position.
Params
msg {String}: Message to use in the Error.
node {Object}
returns {undefined}
Example
lexer.set('foo', function(tok) {
  if (tok.value !== 'foo') {
    throw this.error('expected token.value to be "foo"', tok);
  }
});
Static method that returns true if the given value is an instance of snapdragon-lexer.
Params
lexer {Object}
returns {Boolean}
Example
const Lexer = require('snapdragon-lexer');
const lexer = new Lexer();
console.log(Lexer.isLexer(lexer));
console.log(Lexer.isLexer({}));
Static method for getting or setting the Stack constructor.
Static method for getting or setting the Token constructor, used by lexer.token() to create a new token.
Static method that returns true if the given value is an instance of snapdragon-token. This is a proxy to Token#isToken.
Params
token {Object}
returns {Boolean}
Example
const Token = require('snapdragon-token');
const Lexer = require('snapdragon-lexer');
console.log(Lexer.isToken(new Token({type: 'foo'})));
console.log(Lexer.isToken({}));
.set
Register a handler function.
Params
type {String}
fn {Function}: The handler function to register.
Example
lexer.set('star', function(token) {
});
As an alternative to .set, the .capture method will automatically register a handler when a function is passed as the last argument.
.get
Get a registered handler function.
Params
type {String}: The type of the handler to get.
Example
lexer.set('star', function() {
});
const star = lexer.get('star');
Properties
lexer.isLexer
Type: {boolean}
Default: true (constant)
This property is defined as a convenience, to make it easy for plugins to check for an instance of Lexer.
lexer.input
Type: {string}
Default: ''
The unmodified source string provided by the user.
lexer.string
Type: {string}
Default: ''
The source string minus the part of the string that has already been consumed.
lexer.consumed
Type: {string}
Default: ''
The part of the source string that has been consumed.
lexer.tokens
Type: {array}
Default: []
Array of lexed tokens.
lexer.stash
Type: {array}
Default: [''] (instance of snapdragon-stack)
Array of captured strings. Similar to the lexer.tokens array, but stores strings instead of token objects.
lexer.stack
Type: {array}
Default: []
LIFO (last in, first out) array. A token is pushed onto the stack when an "opening" character or character sequence needs to be tracked. When the (matching) "closing" character or character sequence is encountered, the (opening) token is popped off of the stack.
The stack is not used by any lexer methods; it's reserved for the user. Stacks are necessary for creating Abstract Syntax Trees (ASTs), but if you require this functionality it would be better to use a parser such as snapdragon-parser, which has methods and other conveniences for creating an AST.
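Since the stack is left to the user, here is one way it could be used to track nesting. The sketch is standalone: `openers`, `pairs`, and `insideType` are hypothetical names, and a plain array stands in for lexer.stack.

```javascript
// Hypothetical sketch: a plain array used as the LIFO stack.
const openers = [];
const pairs = { '{': '}', '[': ']' };
const insideType = type => openers.some(token => token.type === type);

const seen = [];
for (const ch of 'a{b[c]d}e') {
  const top = openers[openers.length - 1];
  if (pairs[ch]) {
    // opening character: push a token so the nesting can be tracked
    openers.push({ type: ch === '{' ? 'brace' : 'bracket', value: ch });
  } else if (top && pairs[top.value] === ch) {
    // matching closing character: pop the opening token
    openers.pop();
  } else {
    seen.push(`${ch}:${insideType('brace')}`);
  }
}
console.log(seen); // [ 'a:false', 'b:true', 'c:true', 'd:true', 'e:false' ]
```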
lexer.queue
Type: {array}
Default: []
FIFO (first in, first out) array, for temporarily storing tokens that are created when .lookahead() is called (or a method that calls .lookahead(), such as .peek()).
Tokens are dequeued when .next() is called.
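The relationship between looking ahead, the queue, and .next() can be sketched like this. Everything here is a simplified assumption: `upcoming` stands in for tokens the lexer has not produced yet, and `lookAhead`, `peekToken`, and `nextToken` are hypothetical names rather than the library's methods.

```javascript
// Hypothetical sketch: `upcoming` stands in for tokens the lexer has yet to produce.
const upcoming = ['a', '/', 'b'].map(value => ({ value }));
const buffered = []; // plays the role of the FIFO queue

const produce = () => upcoming.shift();

function lookAhead(n) {
  // Fill the queue until it holds n tokens (or the input runs out).
  while (buffered.length < n) {
    const token = produce();
    if (!token) break;
    buffered.push(token);
  }
  return buffered[n - 1];
}

const peekToken = () => lookAhead(1);
// Queued tokens are returned first; otherwise a fresh token is produced.
const nextToken = () => (buffered.length > 0 ? buffered.shift() : produce());

console.log(lookAhead(2).value, buffered.length); // '/' 2
console.log(nextToken().value, peekToken().value); // 'a' '/'
```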
lexer.loc
Type: {Object}
Default: { index: 0, column: 0, line: 1 }
The updated source string location with the following properties.
index - 0-index
column - 0-index
line - 1-index
The following plugins are available for automatically updating tokens with the location:
Options
options.source
Type: {string}
Default: undefined
The source of the input string. This is typically a filename or file path, but can also be 'string' if a string or buffer is provided directly.
If lexer.input is undefined, and options.source is a string, the lexer will attempt to set lexer.input by calling fs.readFileSync() on the value provided on options.source.
options.mode
Type: {string}
Default: undefined
If options.mode is character, instead of calling handlers (which match using regex) the .advance() method will consume and return one character at a time.
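The difference between the two modes can be illustrated with a small standalone sketch. `makeAdvance` is a hypothetical factory, and a single regex stands in for the registered handlers; neither reflects the library's internals.

```javascript
// Hypothetical sketch of a mode switch; `makeAdvance` is not a library function.
function makeAdvance(state, options = {}) {
  const wordOrSymbol = /^(?:\w+|\W)/; // stand-in for registered regex handlers
  return function advance() {
    if (state.string === '') return undefined;
    const value = options.mode === 'character'
      ? state.string[0]                      // character mode: no regex involved
      : wordOrSymbol.exec(state.string)[0];  // default: match with a regex
    state.string = state.string.slice(value.length);
    return value;
  };
}

const byToken = makeAdvance({ string: 'foo/*' });
const byChar = makeAdvance({ string: 'foo/*' }, { mode: 'character' });
const tokenValues = [byToken(), byToken(), byToken()];
const charValues = [byChar(), byChar(), byChar()];
console.log(tokenValues); // [ 'foo', '/', '*' ]
console.log(charValues);  // [ 'f', 'o', 'o' ]
```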
options.value
Type: {string}
Default: undefined
Specify the token property to use when the .push method pushes a value onto lexer.stash. The logic works something like this:
lexer.append(token[lexer.options.value || 'value']);
Tokens
See the snapdragon-token documentation for more details.
Plugins
Plugins are registered with the lexer.use() method and use the following conventions.
Plugin Conventions
Plugins are functions that take an instance of snapdragon-lexer.
However, it's recommended that you always wrap your plugin function in another function that takes an options object. This allows users to pass options when using the plugin. Even if your plugin doesn't take options, it's a best practice for users to always be able to use the same signature.
Example
function plugin(options) {
  return function(lexer) {
  };
}
lexer.use(plugin());
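To make the convention concrete, here is a standalone sketch that runs without the library. The `host` object and its chainable `use` method are assumptions standing in for a lexer instance, and `whitespace` is a hypothetical plugin; the real `lexer.use()` may behave differently.

```javascript
// Hypothetical host object with a chainable `use`; the real lexer's `use` may differ.
const host = {
  handlers: new Map(),
  use(fn) {
    fn.call(this, this);
    return this;
  }
};

// Plugin wrapped in an options-taking function, per the convention above.
function whitespace(options = {}) {
  const type = options.type || 'space';
  return function(lexer) {
    lexer.handlers.set(type, /^\s+/);
  };
}

// Same call signature whether or not options are passed.
host.use(whitespace()).use(whitespace({ type: 'ws' }));
console.log([...host.handlers.keys()]); // [ 'space', 'ws' ]
```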
About
Contributing
Pull requests and stars are always welcome. For bugs and feature requests, please create an issue.
Please read the contributing guide for advice on opening issues, pull requests, and coding standards.
Running Tests
Running and reviewing unit tests is a great way to get familiarized with a library and its API. You can install dependencies and run tests with the following command:
$ npm install && npm test
Building docs
(This project's readme.md is generated by verb, please don't edit the readme directly. Any changes to the readme must be made in the .verb.md readme template.)
To generate the readme, run the following command:
$ npm install -g verbose/verb
Related projects
You might also be interested in these projects:
Author
Jon Schlinkert
License
Copyright © 2018, Jon Schlinkert.
Released under the MIT License.
This file was generated by verb-generate-readme, v0.6.0, on February 16, 2018.