
It is, however, released already because it offers quite useful parsing of ARGV, as demonstrated by the following
[speculations](https://rubygems.org/gems/speculate_about).
See [README_spec.rb](spec/speculations/README_spec.rb) for the generated code.
## arg_parser
Given the following argument specification

```ruby
include L43Peg::Combinators

let :args_spec do
  {
    start: "--start=(.*)",
    end: "(?:--end|-e)=(.*)",
    kwd: "--(alpha|beta|gamma)"
  }
end
```
And the associated parser

```ruby
let(:parser) { args_parser(args_spec) }
```
Then we can parse some input

```ruby
assert_parse_success(parser, %w[--start=42 --beta -e=44], ast: {start: "42", kwd: "beta", end: "44"}, rest: [])
```
And we can get the rest as a list of tokens

```ruby
assert_parse_success(parser, %w[--start=42 --beta -e=44 -s=not_an_arg --end=too_late], ast: {start: "42", kwd: "beta", end: "44"}, rest: %w[-s=not_an_arg --end=too_late])
```
Also note that multiple values are collected into an array

```ruby
input = %w[--end=42 --beta -e=44 --beta --end=not_too_late --gamma]
ast = {end: %w[42 44 not_too_late], kwd: %w[beta beta gamma]}
assert_parse_success(parser, input, ast:, rest: [])
```
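The idea behind such a spec-driven parser can be sketched in plain Ruby. The following is an illustrative toy only (`match_args` is a hypothetical helper, not L43Peg's implementation): each token is tried against the anchored patterns in order, the first capture becomes the value, and repeated keywords accumulate into arrays.

```ruby
# Toy sketch of spec-driven argument matching -- NOT L43Peg's code.
# spec maps keywords to regex sources whose first capture is the value.
def match_args(spec, tokens)
  ast = {}
  rest = tokens.dup
  until rest.empty?
    key, capture = nil, nil
    spec.each do |kwd, pattern|
      if (m = Regexp.new("\\A(?:#{pattern})\\z").match(rest.first))
        key, capture = kwd, m.captures.first
        break
      end
    end
    break unless key # first non-matching token ends parsing; it stays in rest
    rest.shift
    # repeated keywords accumulate their values into an array
    ast[key] = ast.key?(key) ? Array(ast[key]) << capture : capture
  end
  [ast, rest]
end
```

This mirrors the behavior shown above: matching stops at the first token no pattern accepts, and everything from there on is returned untouched as the rest.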
When we map the parser

```ruby
let :int_args do
  {
    start: "--start=(.*)",
    end: "--end=(.*)",
    inc: "--inc=(.*)"
  }
end

let(:int_arg_parser) { args_parser(int_args, name: "int parser", &:to_i) }
```
Then we can convert the string values

```ruby
assert_parse_success(int_arg_parser, %w[--start=42 --end=44 --inc=2], ast: {start: 42, end: 44, inc: 2}, rest: [])
```
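The effect of passing a block like `&:to_i` can be pictured as converting every captured value after matching; the snippet below only illustrates that picture with plain Ruby hashes, it is not the gem's mechanism.

```ruby
# Illustrative only: a block such as &:to_i amounts to converting
# every captured string value in the resulting AST.
raw    = { start: "42", end: "44", inc: "2" }
mapped = raw.transform_values(&:to_i)
# mapped == { start: 42, end: 44, inc: 2 }
```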
A self-respecting argument parser provides a means to end argument parsing even if more matches follow.
An example of this is the POSIX argument `--`.
We can use whatever stop token we want in `args_parser`; here is a variation:
Given the specification

```ruby
let :args do
  {
    width: "w:(\\d+)",
    height: "h:(\\d+)",
    __stop: "(::)"
  }
end

let(:wh_parser) { args_parser(args, stop: :__stop, &:to_i) }
```
Then parsing the following input

```ruby
input = %w[h:42 w:73 :: w:74]
ast = {height: 42, width: 73}
assert_parse_success(wh_parser, input, ast:, rest: %w[w:74])
```
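The stop-token mechanics can be mimicked in a self-contained sketch (`parse_until_stop` is a hypothetical helper, not the gem's code): tokens are consumed as usual, but when the designated stop pattern matches, it is consumed and parsing ends, leaving the remaining tokens untouched.

```ruby
# Toy sketch of stop-token handling -- NOT L43Peg's implementation.
def parse_until_stop(spec, stop_key, tokens)
  ast = {}
  rest = tokens.dup
  until rest.empty?
    key, capture = nil, nil
    spec.each do |kwd, pattern|
      if (m = Regexp.new("\\A(?:#{pattern})\\z").match(rest.first))
        key, capture = kwd, m.captures.first
        break
      end
    end
    break unless key
    rest.shift
    break if key == stop_key # the stop token is consumed but not stored
    ast[key] = capture.to_i  # mimics the &:to_i mapping from the example
  end
  [ast, rest]
end
```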
Above we had to include an internal module in order to get access to `args_parser`.
Client code might not want to use such intrusive mixins; therefore the parsers are also exposed as module methods.
Given an exposed args_parser

```ruby
let :parser do
  L43Peg::Parsers.args_parser(
    {
      negative: "(-\\d+)",
      positive: "\\+?(\\d+)"
    },
    &:to_i
  )
end
```
But we are also not interested in the internal representation of parsing success and failure that was used in the speculations above, nor do we want to transform our input into the internal representations as the helpers did above. (If you need to see the details, you can inspect the file parser_test.rb in spec/support.)
Then we can use the interface of L43Peg

```ruby
L43Peg.parse_tokens(parser, %w[43 -44 +45]) => :ok, result
expect(result).to eq(positive: [43, 45], negative: -44)
```
And if we get an error the result is as follows

```ruby
parser = L43Peg::Parsers.char_parser('a')
L43Peg.parse_string(parser, 'b') => :error, message
expect(message).to eq("char \"b\"")
```
The basic concept is the `rgx_parser`.
Given a rgx_parser for an identifier

```ruby
include L43Peg::Parsers

let(:id_parser) { rgx_parser("[[:alpha:]][_[:alnum:]]*") }
```
Then we can parse strings that start as such

```ruby
assert_parse_success(id_parser, "l43_peg", ast: "l43_peg")
```
And we can discard some input from the ast with the aid of captures

```ruby
sym_parser = rgx_parser(":([[:alpha:]][_[:alnum:]]*)")
assert_parse_success(sym_parser, ":no_colon", ast: "no_colon")
```
But it can also fail

```ruby
reason = "input does not match /\\A[[:alpha:]][_[:alnum:]]*/ (in rgx_parser(\"[[:alpha:]][_[:alnum:]]*\"))"
assert_parse_failure(id_parser, "42", reason:)
```
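The core mechanic can be sketched as an anchored match whose AST is the first capture, falling back to the whole match. `rgx_match` below is an illustrative helper, not the gem's code:

```ruby
# Toy sketch of a regex parser -- NOT L43Peg's implementation.
# Matches only at the beginning of the input; the AST is the first
# capture if the pattern defines one, otherwise the full match.
def rgx_match(pattern, input)
  m = Regexp.new("\\A(?:#{pattern})").match(input)
  return nil unless m
  [m.captures.first || m[0], input[m[0].size..]]
end
```

Anchoring with `\A` is what makes this a parser rather than a scanner: the match may only consume a prefix of the input, never skip ahead.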
Oftentimes bugs in PEG parsing are caused by zero-width matches. While this is quite obvious with the `many` and `opt` or `maybe` combinators (N.B. the latter two are not yet implemented, use `many(max: 1)` instead), the common usage patterns of these combinators are safe.
Regular expression parsing, however, might hide zero-width matches, and that is why they trigger a warning by default.
Given an empty-match rgx parser

```ruby
let(:empty_parser) { rgx_parser("a*") }
```

Then we get a warning when matching an empty string

```ruby
expect { assert_parse_success(empty_parser, "", ast: "") }
  .to output("Warning, parser rgx_parser(\"a*\") succeeds with empty match\n").to_stderr
```
However, this behavior can also be disabled

```ruby
parser = rgx_parser("a*", warn_on_empty: false)
expect { assert_parse_success(parser, "", ast: "") }
  .not_to output.to_stderr
```
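Whether a pattern can succeed with a zero-width match is easy to check up front: a pattern is dangerous exactly when it matches the empty string. A minimal sketch (`zero_width?` is a hypothetical helper, not part of the gem):

```ruby
# A regex can produce a zero-width match iff it matches the empty string.
def zero_width?(pattern)
  !Regexp.new("\\A(?:#{pattern})\\z").match("").nil?
end

zero_width?("a*") # => true  -- would warn by default
zero_width?("a+") # => false -- always consumes at least one character
```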
Now we can use a list of rgx_parsers to tokenize a string (in the same way we can use `tokens_parser` to quantify elements of an array, but with dynamic bounds).
Given some regexen

```ruby
let :regexen do
  [
    [:verb, "<<", nil, ->(*){ [:verb, "<"] }],
    [:verb, "\\$(\\$)"],
    [:color_and_style, "<(.+?),(.+?)>", :all],
    [:color, "<(.+?)>", 1],
    [:reset, "\\$"],
    [:verb, "[^<$]+"],
  ]
end

let(:tokenizer) { L43Peg::Combinators.rgx_tokenize(regexen) }
```
Then we can tokenize some inputs

```ruby
input = "<red,bold>HELLO$and<<<green>$$<reset>"
ast = [
  [:color_and_style, ["<red,bold>", "red", "bold"]],
  [:verb, "HELLO"],
  [:reset, "$"],
  [:verb, "and"],
  [:verb, "<"],
  [:color, "green"],
  [:verb, "$"],
  [:color, "reset"]
]
assert_parse_success(tokenizer, input, ast:)
```
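The tokenizer's behavior can be approximated by a first-match-wins loop over anchored regexen. `tokenize` below is an illustrative reimplementation of the idea, not the gem's `rgx_tokenize`; it assumes entries of the shape `[tag, pattern, capture_selector, mapper]`, where selector `:all` keeps the full match plus all captures, an integer picks one group, and a mapper lambda may replace the token entirely.

```ruby
# Toy first-match-wins tokenizer -- NOT the gem's rgx_tokenize.
def tokenize(regexen, input)
  compiled = regexen.map { |tag, pat, sel, fn| [tag, Regexp.new("\\A(?:#{pat})"), sel, fn] }
  tokens = []
  until input.empty?
    tag = sel = fn = m = nil
    compiled.each do |t, rgx, s, f|
      next unless (match = rgx.match(input))
      tag, m, sel, fn = t, match, s, f
      break
    end
    raise "no token matches #{input.inspect}" unless m
    ast =
      case sel
      when :all    then [m[0], *m.captures] # full match plus all captures
      when Integer then m[sel]              # one numbered group
      else m.captures.first || m[0]         # default: first capture or match
      end
    tokens << (fn ? fn.(ast) : [tag, ast])
    input = input[m[0].size..] # consume the matched prefix
  end
  tokens
end
```

Ordering matters: `"<<"` must come before the color patterns, and the catch-all `"[^<$]+"` last, exactly as in the regexen above.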
As parsers are by design imbricated (nested) functions, debugging is not always simple.
Enter the `debug_parser`: a parser that debugs other parsers without changing their behavior, by displaying more or less detailed information.
Given a parser

```ruby
include L43Peg::Combinators

let :args do
  {
    lat: "lat:(\\d+)",
    long: "long:(\\d+)",
  }
end

let(:geo_parser) { args_parser(args, &:to_i) }
```
Given a minimum-level debug parser

```ruby
let(:debugger) { debug_parser(geo_parser, level: :min) }
```

Then we will get some output

```ruby
expected =
  "Tokens<[\"lat:43\", \"long:2\"]>\nSuccess: @1\n"
expect { parsed_success(debugger, ["lat:43", "long:2"]) }
  .to output(expected).to_stderr
```
Given a default debug parser

```ruby
let(:debugger) { debug_parser(char_parser("a")) }
```

Then we will get some output on errors

```ruby
expected = "Input<\"b\"@1:1>\nFailure: char \"b\" @[1, 1]\n"
expect { parsed_failure(debugger, "b") }
  .to output(expected).to_stderr
```
Given a maximum-level debug parser

```ruby
let(:max_debugger) { debug_parser(char_parser("b"), level: :max) }
```

Then we will get this output

```ruby
expected =
  [
    "================================================================================",
    'Input<col:1 input:"bc" lnb:1 context:{}>',
    "================================================================================",
    'Success<ast:"b" cache:{} rest:"c">',
    "================================================================================",
    ""
  ].join("\n")
expect { parsed_success(max_debugger, "bc") }
  .to output(expected).to_stderr
```
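The underlying idea, a wrapper that observes a callable without altering its result, can be sketched independently of the gem. `debug_wrap` below is a hypothetical helper, not the gem's `debug_parser`:

```ruby
# Sketch of a transparent debug wrapper -- NOT the gem's debug_parser.
# It logs input and output to stderr and returns the result unchanged.
def debug_wrap(parser, label: "parser")
  lambda do |input|
    warn "#{label} <- #{input.inspect}"
    result = parser.(input)
    warn "#{label} -> #{result.inspect}"
    result
  end
end
```

Because the wrapper only adds side effects on stderr, it can be inserted around any parser in a combinator chain without changing parse results.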
Copyright © 2024 Robert Dober robert.dober@gmail.com
GNU AFFERO GENERAL PUBLIC LICENSE, Version 3, 19 November 2007. Please refer to LICENSE for details.