baa-lexer

![](img/baa-sheep-lemmling.svg)


Changelog

0.3.1 (2023-05-26)

Performance Improvements

  • remove redundant rules (c01ca0b)

Features

  • add debug method for human inspection (7f9d13c)

Readme

Original image by lemmling on OpenClipArt.org

Baa!

Baa is a highly-optimised tokenizer/lexer written in TypeScript. It is inspired by moo, but completely rewritten.

It accepts most of moo's configurations, but lacks some features.

  • No support for arrays of keywords.
  • No support for rules that are arrays of rule definitions.
  • No support for regular expressions with the unicode flag.
  • Fewer dynamic checks (e.g. it silently drops all provided regex flags).

Advantages:

  • Compiles to a reusable, concurrency-safe lexer instead of creating an iterable object directly (see "Usage").
  • Different token format.
  • Slightly faster than moo (or at least not much slower).
  • About 2.2 kB in size.
  • Strong typings, including state names and token types.
  • Understandable code.

Note: This was mostly an exercise for me to practice test-driven development and think about architecture a bit. In the end, I tried to optimize speed and build size. I don't think it makes a lot of difference whether you use moo or baa. moo is more popular and may be better supported in the long run. I will use baa in handlebars-ng though.

Installation

Install baa-lexer with

npm install baa-lexer

Usage

The examples/ directory shows how to use baa. One of the simple examples is this:

import { baa } from "baa-lexer";

const lexer = baa({
  main: {
    A: "a",
    FALLBACK: { fallback: true },
    B: "b",
  },
});

for (const token of lexer.lex("a b")) {
  console.log(token);
}

This will print the following tokens:

{ type: 'A',  original: 'a', value: 'a', start: { line: 1, column: 0 }, end: { line: 1, column: 1 } }
{ type: 'FALLBACK', original: ' ', value: ' ', start: { line: 1, column: 1 }, end: { line: 1, column: 2 } }
{ type: 'B', original: 'b', value: 'b', start: { line: 1, column: 2 }, end: { line: 1, column: 3 } }

For a complete list of rules, have a look at the tests.

Using types

If you create a type

interface Typings {
  tokenType: "my" | "token" | "types";
  stateName: "my" | "state" | "names";
}

and pass it as a generic parameter to the baa function, you will get auto-completion for types within the configuration as well as for the "type" field in the created tokens. The following screenshot highlights all the places that are type-checked and auto-completed.

Benchmarks

See performance/ for the exact tests and run them yourself with

yarn perf

These are the results, but be aware that results may vary a lot:

 BENCH  Summary

  moo - performance/moo-baa.bench.ts > moo-baa test: './tests/abab.ts' (+0)
    1.07x faster than baa

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/fallback.ts' (+0)
    1.19x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (+0)
    1.50x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (1)
    1.25x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/handlears-ng.ts' (2)
    1.19x faster than moo

  baa - performance/moo-baa.bench.ts > moo-baa test: './tests/json-regex.ts' (+0)
    1.15x faster than moo

  moo - performance/moo-baa.bench.ts > moo-baa test: './tests/json.ts' (+0)
    1.04x faster than baa

Readable / Extendable code

What bothered me most about moo was that it is just one large JavaScript file, and it took me a long while to understand all the optimizations they implemented.

I tried to take a modular approach. Basically, the whole program is divided into:

  • The Lexer: Responsible for creating an IterableIterator of tokens and for managing state transitions. Uses the TokenFactory to create the actual tokens.
  • The Matcher: Finds the next token match. There are different strategies
    • RegexMatcher: Creates a large regex to find the next match
    • StickySingleCharMatcher: Uses an array to map char-codes to rules. Can only find single-char tokens, but this can be done much faster than with Regex.
  • The StateProcessor: Uses the Matcher to find the next match, interleaves matches for fallback and error rules.
  • The TokenFactory: Keeps track of the current location and creates tokens from matches.
  • The mooAdapter takes a moo-config and combines all those components so that they do what they should.
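To illustrate the idea behind the StickySingleCharMatcher, here is a self-contained sketch of the char-code lookup strategy described above. This is not baa's actual implementation; the function and type names are made up for illustration:

```typescript
// Map each single-character rule to its token type via an array indexed
// by char code, so matching is a single array lookup instead of a regex.
type SingleCharRules = Record<string, string>; // token type -> single char

function createSingleCharMatcher(rules: SingleCharRules) {
  const table: (string | undefined)[] = [];
  for (const [type, char] of Object.entries(rules)) {
    table[char.charCodeAt(0)] = type;
  }
  return (
    input: string,
    offset: number
  ): { type: string; value: string } | null => {
    const type = table[input.charCodeAt(offset)];
    return type === undefined ? null : { type, value: input[offset] };
  };
}

const match = createSingleCharMatcher({ PLUS: "+", MINUS: "-" });
console.log(match("a+b", 1)); // matches the "+" at offset 1 as PLUS
```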

Advanced usage

You do not have to use the mooAdapter though: most of the internal components are exposed, so you can use them yourself. You can create a StateProcessor and pass your own Matcher instance to it. You can also create a completely new StateProcessor with completely custom logic.

The program could also be extended to allow a custom TokenFactory, applying the token format that you need (but I won't do this unless somebody needs it).
