leac - npm Package Compare versions

Comparing version 0.5.0 to 0.5.1
CHANGELOG.md

 # Changelog
+## Version 0.5.1
+- documentation updates.
 ## Version 0.5.0

@@ -4,0 +8,0 @@


lib/leac.d.ts

@@ -70,3 +70,7 @@ /** Lexer options (not many so far). */
     name: string;
-    /** Matched token won't be added to the output array if this set to `true`. */
+    /**
+     * Matched token won't be added to the output array if this set to `true`.
+     *
+     * (_Think twice before using this._)
+     * */
     discard?: boolean;
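A minimal sketch of what `discard` does in practice (rule names are illustrative; the `createLexer` call and result shape follow the leac docs):

```ts
import { createLexer } from 'leac';

const lex = createLexer([
  { name: 'number', regex: /[0-9]+/ },
  // Whitespace is matched so the lexer can advance past it,
  // but discarded matches never reach the output array.
  { name: 'ws', regex: /\s+/, discard: true },
]);

const { tokens } = lex('12 34');
// Expected: two "number" tokens, no "ws" tokens in between.
```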

@@ -103,6 +107,9 @@ /**
  *
- * Can't have the global flag.
+ * - Can't have the global flag.
  *
- * All regular expressions are used as sticky,
- * you don't have to specify the sticky flag.
+ * - All regular expressions are used as sticky,
+ *   you don't have to specify the sticky flag.
+ *
+ * - Empty matches are considered as non-matches -
+ *   no token will be emitted in that case.
  */
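A small sketch of the regex requirements listed above (rules illustrative):

```ts
import { createLexer } from 'leac';

// No sticky (`y`) flag needed - leac applies rule regexes as sticky
// itself; a global (`g`) flag would violate the contract quoted above.
const lex = createLexer([
  // Note /[a-z]+/ rather than /[a-z]*/: a zero-length match counts as
  // a non-match and emits no token, so a `*` quantifier here would
  // simply never produce "word" tokens on its own.
  { name: 'word', regex: /[a-z]+/ },
  { name: 'ws', regex: /\s+/, discard: true },
]);
```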

@@ -117,3 +124,3 @@ regex: RegExp;
  * Note: the regex has to be able to match the matched substring when taken out of context
- * in order for replace to work - boundary/neighbourhood conditions may prevent this.
+ * in order for replace to work - boundary/neighborhood conditions may prevent this.
  */
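A sketch of how `replace` interacts with this note; the `$1` capture-group syntax is assumed to follow `String.prototype.replace` conventions:

```ts
import { createLexer } from 'leac';

const lex = createLexer([
  // The token text becomes just the capture group - the string content
  // without quotes. /"([^"]*)"/ still matches `"abc"` when taken out of
  // context, so the boundary condition in the note above is satisfied.
  { name: 'string', regex: /"([^"]*)"/, replace: '$1' },
  { name: 'ws', regex: /\s+/, discard: true },
]);

const { tokens } = lex('"abc"');
// Expected: tokens[0].name === 'string', tokens[0].text === 'abc'.
```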

@@ -127,4 +134,4 @@ replace?: string;
  *
- * Rules can have the same name - you can have separate rules
- * for keywords and use the same name "keyword" for example.
+ * Rules can have the same name. For example, you can have
+ * separate rules for various keywords and use the same name "keyword".
  */

@@ -131,0 +138,0 @@ export declare type Rules = [
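In practice the same-name convention reads like this - a sketch where the `str` property (an explicit string to match) is assumed from the leac docs rather than shown in this diff:

```ts
import { createLexer } from 'leac';

const lex = createLexer([
  // Two rules, one shared name: a parser can treat every "keyword"
  // token uniformly regardless of which rule produced it.
  { name: 'keyword', str: 'if' },
  { name: 'keyword', str: 'else' },
  { name: 'ws', regex: /\s+/, discard: true },
]);
```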

package.json

 {
   "name": "leac",
-  "version": "0.5.0",
+  "version": "0.5.1",
   "description": "Lexer / tokenizer",

@@ -54,20 +54,20 @@ "keywords": [
     "@tsconfig/node12": "^1.0.9",
-    "@types/node": "12.20.25",
-    "@typescript-eslint/eslint-plugin": "^4.33.0",
-    "@typescript-eslint/parser": "^4.33.0",
-    "ava": "^3.15.0",
-    "concurrently": "^6.3.0",
-    "denoify": "^0.10.5",
-    "eslint": "^7.32.0",
-    "eslint-plugin-jsonc": "^1.7.0",
+    "@types/node": "12.20.42",
+    "@typescript-eslint/eslint-plugin": "^5.10.1",
+    "@typescript-eslint/parser": "^5.10.1",
+    "ava": "^4.0.1",
+    "concurrently": "^7.0.0",
+    "denoify": "^0.11.0",
+    "eslint": "^8.7.0",
+    "eslint-plugin-jsonc": "^2.1.0",
     "eslint-plugin-tsdoc": "^0.2.14",
-    "markdownlint-cli2": "^0.3.2",
+    "markdownlint-cli2": "^0.4.0",
     "rimraf": "^3.0.2",
-    "rollup": "^2.58.3",
+    "rollup": "^2.66.1",
     "rollup-plugin-terser": "^7.0.2",
     "ts-node": "^10.4.0",
     "tslib": "^2.3.1",
-    "typedoc": "^0.22.7",
-    "typedoc-plugin-markdown": "^3.11.3",
-    "typescript": "~4.4.4"
+    "typedoc": "^0.22.11",
+    "typedoc-plugin-markdown": "^3.11.12",
+    "typescript": "~4.5.5"
   },

@@ -85,5 +85,2 @@ "ava": {
     ],
-    "nonSemVerExperiments": {
-      "configurableModuleFormat": true
-    },
     "verbose": true

@@ -90,0 +87,0 @@ },

README.md

@@ -6,2 +6,4 @@ # leac
 [![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/mxxii/leac/blob/main/LICENSE)
 [![npm](https://img.shields.io/npm/v/leac?logo=npm)](https://www.npmjs.com/package/leac)
+[![deno](https://img.shields.io/badge/deno.land%2Fx%2F-leac-informational?logo=deno)](https://deno.land/x/leac)

@@ -27,3 +29,3 @@ Lexer / tokenizer.
-- **Only text tokens, no arbitraty values**. It seems to be a good habit to have tokens that are *trivially* serializable back into valid input string. Don't do the parser's job. There are a couple of convenience features such as the ability to discard matches or string replacements for regular expression rules but that has to be used mindfully.
+- **Only text tokens, no arbitrary values**. It seems to be a good habit to have tokens that are *trivially* serializable back into a valid input string. Don't do the parser's job. There are a couple of convenience features such as the ability to discard matches or string replacements for regular expression rules but that has to be used mindfully (more on this below).

@@ -33,11 +35,24 @@
+### Node
 ```shell
 > npm i leac
+> yarn add leac
 ```
+```ts
+import { createLexer, Token } from 'leac';
+```
+### Deno
+```ts
+import { createLexer, Token } from 'https://deno.land/x/leac@.../leac.ts';
+```
 ## Examples
-- [JSON](https://github.com/mxxii/leac/blob/main/examples/json.ts);
-- [Calc](https://github.com/mxxii/leac/blob/main/examples/calc.ts).
+- [JSON](https://github.com/mxxii/leac/blob/main/examples/json.ts) ([output snapshot](https://github.com/mxxii/leac/blob/main/test/snapshots/examples.ts.md#json));
+- [Calc](https://github.com/mxxii/leac/blob/main/examples/calc.ts) ([output snapshot](https://github.com/mxxii/leac/blob/main/test/snapshots/examples.ts.md#calc)).
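For orientation, a minimal end-to-end sketch tying the import together with the rule shapes from the type declarations above (the `{ tokens, offset, complete }` result shape is per the leac docs):

```ts
import { createLexer } from 'leac';

const lex = createLexer([
  { name: 'number', regex: /[0-9]+/ },
  { name: '+' }, // a name-only rule matches its own name as a string
  { name: 'ws', regex: /\s+/, discard: true },
]);

const { tokens, offset, complete } = lex('2 + 2');
// complete - whether the whole input was consumed;
// offset - how far the lexer advanced;
// tokens - the output array (without the discarded "ws" matches).
```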

@@ -61,2 +76,35 @@ ```typescript
+## A word of caution
+
+It is often really tempting to rewrite tokens on the go. But it can be dangerous unless you are absolutely mindful of all the edge cases.
+
+For example, who needs to carry string quotes around, right? The parser will only need the string content...
+
+We'll have to consider the following things:
+
+- Regular expressions. Sometimes we want to match strings that can have a length *from zero* and up.
+- Tokens are not produced without moving the offset. If something is missing - there is no token.
+  If we allowed a token of zero length - it would cause an infinite loop, as the same rule would be matched at the same offset, again and again.
+- Discardable tokens - a convenience feature that may seem harmless at first glance.
+
+When put together, these things plus some intuition traps can lead to a broken array of tokens.
+Strings can be empty, which means the token can be absent. With no content and no quotes, the tokens array will most likely make no sense to a parser.
+
+How to avoid potential issues:
+
+- Don't discard anything that you may need to insert back when you try to immediately serialize the tokens array to a string. This means whitespace is usually safe to discard while string quotes are not (what can be considered safe will heavily depend on the grammar - you may have a language with significant spaces and insignificant quotes...);
+- You can introduce a higher-priority rule to capture an empty string (an opening quote immediately followed by a closing quote) and emit a special token for that. This way an empty string between quotes can't occur down the line;
+- Match the whole string (content and quotes) with a single regular expression and let the parser deal with it. This can actually lead to a cleaner design than trying to be clever and removing "unnecessary" parts early;
+- Match the whole string (content and quotes) with a single regular expression, using capture groups and the [replace](https://github.com/mxxii/leac/blob/main/docs/interfaces/RegexRule.md#replace) property. This can produce a non-zero-length token with empty text. A sketch of the last two options follows this section.
+
+Another note about quotes: if the grammar allows for different quotes and you're still willing to get rid of them early - think about how you're going to unescape the string later. Make sure you carry the information about the exact string kind in the token name at least - you will need it later.
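A sketch of the last two suggestions from the list above (rule names are illustrative; `replace` semantics assumed per the leac docs):

```ts
import { createLexer } from 'leac';

const lex = createLexer([
  // Keep the whole literal, quotes included - the parser unquotes it
  // later. Tokens stay trivially serializable back into the input.
  { name: 'stringWithQuotes', regex: /"[^"]*"/ },

  // Alternative: capture group + replace. The *match* ("" with quotes)
  // has non-zero length, so a token is emitted even when its text is
  // empty - no absent-token surprises for the parser.
  // { name: 'stringContent', regex: /"([^"]*)"/, replace: '$1' },

  { name: 'ws', regex: /\s+/, discard: true },
]);

const { tokens } = lex('""');
// A token is emitted even for the empty string literal.
```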
## What about ...?

@@ -63,0 +111,0 @@
