# Minilex
A little lexer toolkit, for basic lexing needs.
It's designed for the cases where parsers do the parsing, and all you need from
your lexer is an array of simple tokens.
## Usage
```ruby
Expression = Minilex::Lexer.new do
  skip :whitespace, /\s+/
  tok :number, /\d+(?:\.\d+)?/
  tok :operator, /[\+\=\/\*]/
end

Expression.lex('1 + 2.34')
```
To create a lexer, instantiate a `Minilex::Lexer` and define rules. There are
two methods for defining rules: `skip` and `tok`.
`skip` takes an `id` and a `pattern`. The lexer will ignore all occurrences of
the pattern in the input text. The `id` isn't strictly necessary, but it's nice
for readability, and it's a required argument.
`tok` also takes an `id` and a `pattern`. The lexer will turn all occurrences
of the pattern into a token of the form:

```ruby
[id, value, line, offset]
```
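The rule-matching loop behind `skip`/`tok` can be pictured with Ruby's
`StringScanner`. This is a hypothetical sketch, not Minilex's actual
implementation, and for brevity it only handles single-line input:

```ruby
require 'strscan'

# Hypothetical sketch of a skip/tok-style rule loop -- NOT Minilex's source.
RULES = [
  [:skip,     /\s+/],             # ignored, like `skip :whitespace`
  [:number,   /\d+(?:\.\d+)?/],
  [:operator, /[\+\=\/\*]/]
]

def lex(input)
  scanner = StringScanner.new(input)
  tokens  = []
  until scanner.eos?
    # Try each rule at the current position; scan advances only on a match.
    rule = RULES.find { |_id, pattern| scanner.scan(pattern) }
    raise "no rule matches at #{scanner.pos}" unless rule
    text  = scanner.matched
    start = scanner.pos - text.length
    # Line tracking is omitted here, so line is hard-coded to 1.
    tokens << [rule.first, text, 1, start] unless rule.first == :skip
  end
  tokens
end

lex('1 + 2.34')
# => [[:number, "1", 1, 0], [:operator, "+", 1, 2], [:number, "2.34", 1, 4]]
```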
## Overriding the token format
If you'd like to customize the token format, override `append_token`:
```ruby
Digits = Minilex::Lexer.new do
  skip :whitespace, /\s+/
  tok :digit, /\d/

  def append_token(id, value)
    tokens << Integer(value)
  end

  # empty body: no end-of-stream token is appended
  def append_eos
  end
end

Digits.lex('1 2 3 4')
```
## Processing values
There's one more thing you can do. It's just for convenience, though I'm not
sure it really belongs in something that's supposed to do as little as
possible. I might remove it.
The `tok` method accepts an optional third `processor` argument, which should
name a method on the lexer (you'll have to write the method, of course). This
gives you a chance to get at the matched text before it gets stuffed into a
token:
```ruby
DigitsConverter = Minilex::Lexer.new do
  skip :whitespace, /\s+/
  tok :digit, /\d/, :integer

  def integer(str)
    Integer(str)
  end
end

DigitsConverter.lex('123')
```
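The hook itself is easy to picture: the lexer dispatches the matched text
through the named method before building the token. Here's a hypothetical
standalone sketch of that dispatch (again, not Minilex's actual code; the
`TinyLexer` class and its two-element tokens are inventions for illustration):

```ruby
require 'strscan'

# Hypothetical sketch of the processor hook -- NOT Minilex's source.
# A rule may name a method; the matched text is passed through that
# method before it's stored in the token.
class TinyLexer
  def initialize
    @rules = []
  end

  def tok(id, pattern, processor = nil)
    @rules << [id, pattern, processor]
  end

  def lex(input)
    scanner = StringScanner.new(input)
    tokens  = []
    until scanner.eos?
      rule = @rules.find { |_id, pattern, _p| scanner.scan(pattern) }
      raise "no rule matches at #{scanner.pos}" unless rule
      id, _pattern, processor = rule
      value = scanner.matched
      value = send(processor, value) if processor  # the processor hook
      tokens << [id, value]
    end
    tokens
  end

  # The processor: you write this method yourself.
  def integer(str)
    Integer(str)
  end
end

lexer = TinyLexer.new
lexer.tok(:digit, /\d/, :integer)
lexer.lex('123')
# => [[:digit, 1], [:digit, 2], [:digit, 3]]
```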