# Minilex
A little lexer toolkit, for basic lexing needs.
It's designed for the cases where parsers do the parsing, and all you need from
your lexer is an array of simple tokens.
## Usage
```ruby
Expression = Minilex::Lexer.new do
  skip :whitespace, /\s+/
  tok :number, /\d+(?:\.\d+)?/
  tok :operator, /[\+\=\/\*]/
end

Expression.lex('1 + 2.34')
```
To create a lexer, instantiate a `Minilex::Lexer` and define rules. There are
two methods for defining rules: `skip` and `tok`.
`skip` takes an `id` and a `pattern`. The lexer will ignore all occurrences of
the pattern in the input text. The `id` isn't strictly necessary, but it's nice
for readability, and it's a required argument.
`tok` also takes an `id` and a `pattern`. The lexer will turn all occurrences
of the pattern into a token of the form:

```ruby
[id, value, line, offset]
```
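The rule-matching loop behind `skip`/`tok` can be pictured with Ruby's
`StringScanner`. This is a hypothetical sketch, not Minilex's actual
implementation, and for brevity it only handles single-line input:

```ruby
require 'strscan'

# Hypothetical sketch of a skip/tok-style rule loop -- NOT Minilex's source.
RULES = [
  [:skip,     /\s+/],             # ignored, like `skip :whitespace`
  [:number,   /\d+(?:\.\d+)?/],
  [:operator, /[\+\=\/\*]/]
]

def lex(input)
  scanner = StringScanner.new(input)
  tokens  = []
  until scanner.eos?
    # Try each rule at the current position; scan advances only on a match.
    rule = RULES.find { |_id, pattern| scanner.scan(pattern) }
    raise "no rule matches at #{scanner.pos}" unless rule
    text  = scanner.matched
    start = scanner.pos - text.length
    # Line tracking is omitted here, so line is hard-coded to 1.
    tokens << [rule.first, text, 1, start] unless rule.first == :skip
  end
  tokens
end

lex('1 + 2.34')
# => [[:number, "1", 1, 0], [:operator, "+", 1, 2], [:number, "2.34", 1, 4]]
```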
## Overriding the token format
If you'd like to customize the token format, override `append_token`:
```ruby
Digits = Minilex::Lexer.new do
  skip :whitespace, /\s+/
  tok :digit, /\d/

  def append_token(id, value)
    tokens << Integer(value)
  end

  # empty body: no end-of-stream token is appended
  def append_eos
  end
end

Digits.lex('1 2 3 4')
```
## Processing values
There's one more thing you can do. It's just for convenience, though I'm not
sure it really belongs in something that's supposed to do as little as
possible. I might remove it.
The `tok` method accepts an optional third `processor` argument, which should
name a method on the lexer (you'll have to write the method, of course). This
gives you a chance to get at the matched text before it gets stuffed into a
token:
```ruby
DigitsConverter = Minilex::Lexer.new do
  skip :whitespace, /\s+/
  tok :digit, /\d/, :integer

  def integer(str)
    Integer(str)
  end
end

DigitsConverter.lex('123')
```
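The hook itself is easy to picture: the lexer dispatches the matched text
through the named method before building the token. Here's a hypothetical
standalone sketch of that dispatch (again, not Minilex's actual code; the
`TinyLexer` class and its two-element tokens are inventions for illustration):

```ruby
require 'strscan'

# Hypothetical sketch of the processor hook -- NOT Minilex's source.
# A rule may name a method; the matched text is passed through that
# method before it's stored in the token.
class TinyLexer
  def initialize
    @rules = []
  end

  def tok(id, pattern, processor = nil)
    @rules << [id, pattern, processor]
  end

  def lex(input)
    scanner = StringScanner.new(input)
    tokens  = []
    until scanner.eos?
      rule = @rules.find { |_id, pattern, _p| scanner.scan(pattern) }
      raise "no rule matches at #{scanner.pos}" unless rule
      id, _pattern, processor = rule
      value = scanner.matched
      value = send(processor, value) if processor  # the processor hook
      tokens << [id, value]
    end
    tokens
  end

  # The processor: you write this method yourself.
  def integer(str)
    Integer(str)
  end
end

lexer = TinyLexer.new
lexer.tok(:digit, /\d/, :integer)
lexer.lex('123')
# => [[:digit, 1], [:digit, 2], [:digit, 3]]
```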