bs-sedlex
For details on the purpose, usage, and API of sedlex, scroll down. The sections I've added at the top are specific to the ways in which installation and usage of the bs-sedlex distribution differ from using the upstream release.
This repository contains a fork of the sedlex lexer-generator tooling for OCaml-family languages, packaged for use in projects utilizing BuckleScript (an OCaml-to-JavaScript compiler) and ReasonML (an alternative OCaml syntax targeting that compiler).
Care is taken in this project to publish pre-compiled binaries of the ppx syntax-extension component necessary to use sedlex in practice. These are published to npm as the separate npm package, ppx-sedlex, versioned in lockstep with the parent bs-sedlex package. Instructions for enabling this extension in your BuckleScript configuration-file, bsconfig.json, are included below. Don't miss them!
You can safely ignore the installation instructions in the upstream README reproduced below, when compiling to JS using BuckleScript. Instead:
If you're writing an app or a similar end-consumer project, install the BuckleScript compiler (a peerDependency of this project) via npm.
$ npm install --save bs-platform
Worth repeating: do not add this as a direct dependency of a library. The final application-developer should generally select the version of the BuckleScript compiler; you don't want users having duplicated versions of the compiler in their node_modules. Instead, library developers should add bs-platform to both "peerDependencies" (with a permissive version), and "devDependencies" (with a restrictive version):
$ npm install --save-dev bs-platform
"devDependencies": {
...
"bs-platform": "^5.0.0"
},
"peerDependencies": {
+ "bs-platform": "4.x || 5.x" // example. express the versions of BuckleScript you support here.
},
Add the ppx transformer to your "devDependencies":
$ npm install --save-dev ppx-sedlex
Add the runtime package (this one!) to your direct "dependencies", for both libraries and apps:
$ npm install --save bs-sedlex
Manually add it (the runtime package, bs-sedlex) to your bsconfig.json's bs-dependencies field:
"bs-dependencies": [
...
+ "bs-sedlex"
],
Additionally, tell BuckleScript to apply the ppx-sedlex syntax-transformer over your source-code by adding a ppx-flags field at the root level of the same bsconfig.json. (Note that, unintuitively, this is not a relative path; it follows the format package-name/file-path.)
"bs-dependencies": [
...
"bs-sedlex"
],
+"ppx-flags": [
+ "ppx-sedlex/ppx.js"
+],
Write blazing-fast, type-safe, and Unicode-aware / multilingual lexers and parsers galore!
Thanks to SemVer not including a ‘generation’ number, there's really no way I can reasonably tie this project's version on npm to the upstream version of Sedlex as released by the community maintainers. As ugly as it is, I've opted to pin the major version of bs-sedlex to the flattened major and minor versions of the upstream project.
I started doing this with Sedlex 2.0; thus, the mapping looks like this:
Sedlex | bs-sedlex |
---|---|
v1.99.4 | v1.99.4 |
v2.0 | v20.0.x |
Correspondingly, this project can't really strictly adhere to SemVer. Tentatively, I intend to use the ‘minor’ field for breaking changes to the port, and the ‘patch’ field for everything else.
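To make the mapping concrete, here is a small sketch of the arithmetic in plain OCaml. The function names are hypothetical, and it assumes "flattened" means concatenating the decimal digits of the upstream major and minor fields (so v2.0 becomes major version 20); that reading is inferred from the table above and is not a documented algorithm.

```ocaml
(* Hypothetical helper, not part of bs-sedlex: derive the npm major version
   from an upstream sedlex (major, minor) pair.
   Assumption: "flattening" is decimal concatenation, e.g. 2 and 0 -> 20. *)
let bs_sedlex_major ~major ~minor =
  int_of_string (string_of_int major ^ string_of_int minor)

(* Render the family of npm versions that tracks one upstream release. *)
let bs_sedlex_range ~major ~minor =
  Printf.sprintf "v%d.x.x" (bs_sedlex_major ~major ~minor)

let () =
  (* Upstream v2.0 corresponds to the bs-sedlex v20 line. *)
  print_endline (bs_sedlex_range ~major:2 ~minor:0)
```

Note that this only applies from Sedlex 2.0 onward; v1.99.4 was published under its upstream version number unchanged.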
I'm dogfooding this port on a parsing-project in JavaScript & ML (Excmd.js, https://excmd.js.org). Feel free to refer to that for a real-world example of compiling industrial-strength OCaml parsing tooling down to JavaScript for the web. Some takeaways follow:
Use Menhir for parser-generation. Seriously. It's got spectacularly clear docs, an entire chapter in Real World OCaml dedicated to it, and a laundry-list of advanced features — everything from automated tooling that explains reported parsing-conflicts to you, neophyte language-developer; to an incremental-parsing API allowing you to implement extremely advanced error-recovery and introspection/reporting tools.
If you want to take that advice, unfortunately, there's no cool, easy port to JavaScript for you, like this one for Sedlex. 😉 (Maybe I'll publish one someday!) Until one exists, you'll have to maintain a dualistic build-system that uses the standard OCaml tooling and build-system (i.e. opam and https://dune.build) to produce the .ml parsing-automaton, and then feed that into the BuckleScript build. Maybe you can glean some ideas from my experiences here.
A major selling-point of sedlex is the deep and thorough Unicode compatibility. Use it! I suggest reading through the Unicode Consortium's documentation on the topic, known as Unicode Standard Annex #31, or UAX #31. It goes into more detail than you could ever want to know about a vast number of topics. Get this stuff right!
I (ELLIOTTCABLE) am also very happy to help with any of these topics. I spent a lot of time and effort figuring this out; and although it'll hopefully improve as the BuckleScript community grows, until then, there's a lot of minutiae to get just right. I'm active on both the OCaml and ReasonML Discord servers (why there are two, I cannot fathom); as well as on the Freenode IRC server, in both #ocaml and #ELLIOTTCABLE. Feel free to reach out if you just want to chat about these topics, or to get more formal support!
Unicode-friendly lexer generator for OCaml.
This package is licensed by LexiFi under the terms of the MIT license.
sedlex was originally written by Alain Frisch (alain.frisch@lexifi.com) and is now maintained as part of the ocaml-community repositories on GitHub.
The API is documented here.
sedlex is a lexer generator for OCaml, similar to ocamllex, but supporting Unicode. Contrary to ocamllex, lexer specifications for sedlex are embedded in regular OCaml source files.
The lexers work with a new kind of "lexbuf", similar to ocamllex Lexing lexbufs, but designed to support Unicode, and abstracting from a specific encoding. A single lexer can work with arbitrary encodings of the input stream.
sedlex is the successor of the ulex project. Contrary to ulex, which was implemented as a Camlp4 syntax extension, sedlex is based on the new "-ppx" technology of OCaml, which allows rewriting OCaml parse trees through external rewriters. (And what better name than "sed" for a rewriter?)
Like any -ppx rewriter, sedlex does not touch the concrete syntax of the language: lexer specifications are written in source files which comply with the standard grammar of OCaml programs. sedlex reuses the syntax for pattern matching in order to describe lexers (regular expressions are encoded within OCaml patterns). A nice consequence is that your editor (vi, emacs, ...) won't get confused (indentation, coloring) and you don't need to learn new priority rules. Moreover, sedlex is compatible with any front-end parsing technology: it works fine even if you use camlp4 or camlp5, with the standard or revised syntax.
sedlex adds a new kind of expression to OCaml: lexer definitions. The syntax for the new construction is:
match%sedlex lexbuf with
| R1 -> e1
...
| Rn -> en
| _ -> def
or:
[%sedlex match lexbuf with
| R1 -> e1
...
| Rn -> en
| _ -> def
]
(The first vertical bar is optional as in any OCaml pattern matching. Guard expressions are not allowed.)
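where:
lexbuf is a lowercase identifier, which must refer to an existing value of type Sedlexing.lexbuf.
R1, ..., Rn are regular expressions (see below).
e1, ..., en and def are OCaml expressions (the actions), all of the same type, which is the type of the whole lexer definition.
Unlike ocamllex, lexers work on a stream of Unicode codepoints, not bytes.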
The actions can call functions from the Sedlexing module to extract (parts of) the matched lexeme, in the desired encoding.
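As an illustration of the construction above, here is a minimal tokenizer sketch. It is not taken from the sedlex sources: the token type and the function names next_token and tokens are made up for this example, and the code only compiles when the sedlex ppx and runtime are enabled (for BuckleScript, via the ppx-flags and bs-dependencies settings described earlier). It relies on Sedlexing.Utf8.from_string and Sedlexing.Utf8.lexeme from the runtime module.

```ocaml
(* Sketch only: needs the sedlex ppx rewriter and the Sedlexing runtime. *)
type token = Number of string | Ident of string | Op of string | Eof

let rec next_token buf =
  match%sedlex buf with
  (* One or more ASCII digits. *)
  | Plus ('0' .. '9') -> Number (Sedlexing.Utf8.lexeme buf)
  (* A letter followed by any number of letters or digits. *)
  | ('a' .. 'z' | 'A' .. 'Z'), Star ('a' .. 'z' | 'A' .. 'Z' | '0' .. '9') ->
    Ident (Sedlexing.Utf8.lexeme buf)
  (* A single arithmetic operator. *)
  | Chars "+-*/" -> Op (Sedlexing.Utf8.lexeme buf)
  (* Skip whitespace and keep lexing. *)
  | Plus (Chars " \t\n") -> next_token buf
  | eof -> Eof
  | _ -> failwith "unexpected character"

(* Lex a whole UTF-8 string into a list of tokens, ending with Eof. *)
let tokens (s : string) : token list =
  let buf = Sedlexing.Utf8.from_string s in
  let rec go acc =
    match next_token buf with
    | Eof -> List.rev (Eof :: acc)
    | t -> go (t :: acc)
  in
  go []
```

Each action reads the matched text with Sedlexing.Utf8.lexeme, so the lexer itself never deals with byte offsets or encodings.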
Regular expressions are syntactically OCaml patterns:
"...." (string constant): recognize the specified string
'....' (character constant): recognize the specified character
i (integer constant): recognize the specified codepoint
'...' .. '...': character range
i1 .. i2: range between two codepoints
R1 | R2: alternation
R1, R2, ..., Rn: concatenation
Star R: Kleene star (0 or more repetitions)
Plus R: equivalent to R, R*
Opt R: equivalent to ("" | R)
Rep (R, n): equivalent to R{n}
Rep (R, n .. m): equivalent to R{n, m}
Chars "...": recognize any character in the string
Compl R: assume that R is a single-character length regexp (see below) and recognize the complement set
Sub (R1,R2): assume that R1 and R2 are single-character length regexps (see below) and recognize the set of items in R1 but not in R2 ("subtract")
Intersect (R1,R2): assume that R1 and R2 are single-character length regexps (see below) and recognize the set of items which are in both R1 and R2
lid (lowercase identifier): reference a named regexp (see below)
A single-character length regexp is a regexp which does not contain (after expansion of references) concatenation, Star, Plus, Opt or string constants with a length different from one.
Note:
It is possible to define named regular expressions with the following construction, that can appear in place of a structure item:
let lid = [%sedlex.regexp? R]
where lid is the regexp name to be defined and R its definition. The scope of the "lid" regular expression is the rest of the structure, after the definition.
The same syntax can be used for local binding:
let lid = [%sedlex.regexp? R] in
body
The scope of "lid" is the body expression.
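A short sketch of both binding forms, assuming the sedlex ppx and runtime are available. The names digit, hex, sign and classify are invented for this example. Because sedlex picks the longest match, "12.5" is reported as a float even though the integer rule also matches its prefix.

```ocaml
(* Sketch only: needs the sedlex ppx rewriter and the Sedlexing runtime. *)

(* Structure-level named regexps: visible in the rest of the file. *)
let digit = [%sedlex.regexp? '0' .. '9']
let hex = [%sedlex.regexp? digit | 'a' .. 'f' | 'A' .. 'F']

(* Classify the longest numeric prefix of a string and return it. *)
let classify (s : string) : string * string =
  let buf = Sedlexing.Utf8.from_string s in
  (* Local named regexp: visible only in the body below. *)
  let sign = [%sedlex.regexp? Opt (Chars "+-")] in
  match%sedlex buf with
  | "0x", Plus hex -> ("hex", Sedlexing.Utf8.lexeme buf)
  | sign, Plus digit -> ("int", Sedlexing.Utf8.lexeme buf)
  | sign, Plus digit, '.', Star digit -> ("float", Sedlexing.Utf8.lexeme buf)
  | _ -> ("other", "")
```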
sedlex provides a set of predefined regexps:
See the interface of the Sedlexing module for a description of how to create lexbuf values (from strings, streams or channels encoded in Latin1, utf8 or utf16, or from integer arrays or streams representing Unicode code points).
It is possible to work with a custom implementation for lex buffers. To do this, you just have to ensure that a module called Sedlexing is in scope of your lexer specifications, and that it defines at least the following functions: start, next, mark, backtrack. See the interface of the Sedlexing module for more information.
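The following plain-OCaml sketch shows what those four functions might look like for a buffer backed by an array of code points. It is an assumption-laden illustration, not the real runtime: the module name My_buffer, the record fields, and the create function are hypothetical, and the signatures (in particular next returning an int, with -1 at end of input) are assumed and may differ between sedlex versions. Check the Sedlexing interface shipped with your version before relying on them. To be picked up by generated lexers, such a module would have to be in scope under the name Sedlexing.

```ocaml
(* Hypothetical custom lex buffer over an array of Unicode code points.
   Assumed signatures: start : lexbuf -> unit, next : lexbuf -> int,
   mark : lexbuf -> int -> unit, backtrack : lexbuf -> int. *)
module My_buffer = struct
  type lexbuf = {
    input : int array;          (* code points to lex *)
    mutable pos : int;          (* index of the next code point to read *)
    mutable start_pos : int;    (* where the current lexeme began *)
    mutable marked_pos : int;   (* position saved by the last mark *)
    mutable marked_val : int;   (* value saved by the last mark *)
  }

  let create input =
    { input; pos = 0; start_pos = 0; marked_pos = 0; marked_val = -1 }

  (* Begin a new lexeme at the current position and clear the mark. *)
  let start b =
    b.start_pos <- b.pos;
    b.marked_pos <- b.pos;
    b.marked_val <- -1

  (* Return the next code point, or -1 at end of input (assumed convention). *)
  let next b =
    if b.pos >= Array.length b.input then -1
    else begin
      let c = b.input.(b.pos) in
      b.pos <- b.pos + 1;
      c
    end

  (* Remember the current position together with a value chosen by the caller. *)
  let mark b v =
    b.marked_pos <- b.pos;
    b.marked_val <- v

  (* Rewind to the last marked position and return the saved value. *)
  let backtrack b =
    b.pos <- b.marked_pos;
    b.marked_val
end
```

The mark and backtrack pair is what lets a generated automaton read ahead past the last accepting position and then rewind to it.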
The quick way:
opam install sedlex
Otherwise, the first thing to do is to compile and install sedlex. You need a recent version of OCaml and dune.
make
If you have findlib, you can use it to install and use sedlex. The name of the findlib package is "sedlex".
Installation (after "make"):
make install
Compilation of OCaml files with lexer specifications:
ocamlfind ocamlc -c -package sedlex my_file.ml
When linking, you must also include the sedlex package:
ocamlfind ocamlc -o my_prog -linkpkg -package sedlex my_file.cmo
There is also a sedlex.ppx subpackage containing the code of the ppx filter. This can be used to build custom drivers (combining several ppx transformations in a single process).
You can use sedlex without findlib. To compile, you need to run the source file through the -ppx rewriter ppx_sedlex. Moreover, you need to link the application with the runtime support library for sedlex (sedlexing.cma / sedlexing.cmxa).
The examples/ subdirectory contains several samples of sedlex in use.