Kramdown Tree-sitter Highlighter
Syntax highlight code with Tree-sitter
via Kramdown.
This is a syntax highlighter plugin for Kramdown that
leverages
Tree-sitter's native syntax highlighter
to highlight code blocks (and spans) when rendering HTML.
Getting started
Requirements and compatibility
This plugin is built for Kramdown and hence requires a
compatible Ruby installation to function. It is also
essentially an adapter for the
Tree-sitter highlight library and
hence also requires a compatible Rust installation to
function. It is officially compatible with the following environments:
- Ruby: 3.0, 3.1, 3.2, 3.3
- Rust: 1.75, 1.76, 1.77, 1.78
- Platforms: MacOS, Linux
Installation
For projects using Bundler for dependency management, run the
following command to both install the gem and add it to the Gemfile:
bundle add kramdown-syntax_tree_sitter
Otherwise, install the gem via the following command:
gem install kramdown-syntax_tree_sitter
Usage
Quickstart
For the following example to function, the
Tree-sitter Python parser library
must be present inside a directory called tree_sitter_parsers
, which in turn must be
located in the home directory. See the subsequent
'Tree-sitter parsers' section for more information.
require 'kramdown'
require 'kramdown/syntax_tree_sitter'
text = <<~MARKDOWN
~~~python
print('Hello, World!')
~~~
MARKDOWN
Kramdown::Document.new(text, syntax_highlighter: :'tree-sitter').to_html
General usage
To successfully highlight input text via Kramdown with this plugin enabled, make sure
that every language referenced has a corresponding Tree-sitter parser library installed.
See the subsequent 'Language identifiers' and
'Tree-sitter parsers' sections for more information.
Usage with Jekyll
This plugin can be used with the popular static site generator
Jekyll. Jekyll projects using Kramdown for Markdown rendering
(the default setting for Jekyll), can enable this plugin by installing the gem (most
likely via Bundler, as described in the 'Installation' section) and
then adding the following lines to the Jekyll project's configuration file
(_config.yml
):
plugins:
- kramdown/syntax_tree_sitter
kramdown:
syntax_highlighter: tree-sitter
Note that this plugin's general usage prerequisites (described in the previous
'General usage' section) still apply when using this plugin with
Jekyll.
Also, there are multiple ways to render highlighted code blocks with Jekyll, as
illustrated in the following table:
Method name | Example | Supported by this plugin |
---|
Fenced code block |
```python
print('Hello, World!')
```
or
~~~python
print('Hello, World!')
~~~
| :white_check_mark: |
Indented code block |
print('Hello, World!')
{: class="language-python" }
| :white_check_mark: |
Liquid highlight tag |
{% highlight python %}
print('Hello, World!')
{% endhighlight %}
| :x: |
Since Jekyll does not defer to Kramdown to render Liquid highlight tags, this plugin
does not support highlighting code using that method. Therefore, code blocks must be
represented in either fenced or indented notation in order to be rendered via this
plugin.
Tree-sitter parsers
Tree-sitter relies on external parser libraries to understand each language grammar.
Thus, in order to syntax highlight a given language using this plugin, that language's
Tree-sitter parser library must be installed to the correct directory on your machine.
This directory is set as ~/tree_sitter_parsers
by default but is also configurable
(see the 'Configuration' section for details).
For most such parser libraries, installation simply amounts to downloading the
repository into the configured Tree-sitter parsers directory.
A partial list of languages for which Tree-sitter parser libraries have been developed
can be found on
the official Tree-sitter website.
Language identifiers
For most popular languages, this plugin recognizes the same language identifiers as used
by the popular highlighter Rouge. For example,
both 'python' and 'py' may be used to identify a code block written in Python. For the
complete list of Rouge language identifiers recognized by this plugin, see
lib/kramdown/syntax_tree_sitter/languages.rb
.
This plugin also automatically supports the language identifier format used by
Tree-sitter, known as as the 'scope' name. For example, the scope name for Python is
'source.python', whereas for HTML it is 'text.html.basic'.
For any identifiers currently supported by Rouge but not by this plugin, feel free to
open an issue or pull request on GitHub to have them included.
CSS styling
Code highlights can be rendered either via inline CSS styles or via CSS classes, which
are applied to each token in the parsed code.
To use CSS classes for highlighting, set the css_classes
Kramdown syntax-highlighting
option to true
. Otherwise, the plugin will apply highlights via inline CSS styles.
The inline CSS styles are derived from Tree-sitter's built-in default highlighting
theme, repeated here for convenience:
Token name | CSS style |
---|
attribute | color: #af0000; font-style: italic; |
comment | color: #8a8a8a; font-style: italic; |
constant | color: #875f00; |
function | color: o#005fd7; |
function.builtin | color: #005fd7; font-weight: bold; |
keyword | color: #5f00d7; |
operator | color: #4e4e4e; font-weight: bold; |
property | color: #af0000; |
string | color: #008700; |
string.special | color: #008787; |
tag | color: #000087; |
type | color: #005f5f; |
type.builtin | color: #005f5f; font-weight: bold; |
variable.builtin | font-weight: bold; |
variable.parameter | text-decoration: underline; |
constant.builtin, number | color: #875f00; font-weight: bold; |
constructor, module | color: #af8700; |
punctuation.bracket, punctuation.delimiter | color: #4e4e4e; |
Any and all token types not represented in this default theme are consequently not
highlighted when using the inline CSS styles option.
The CSS class names are derived directly from Tree-sitter token names by replacing all
full stops ('.') with dashes ('-') and adding the prefix 'ts-'. For example, the CSS
class for 'function.builtin' tokens is 'ts-function-builtin'. The use of CSS classes
allows for customization of highlighting styles, including the ability to highlight more
token types than with the default inline CSS method. Of course, this also requires that
an externally created CSS stylesheet defining the style for each token type is provided
whenever the Kramdown-generated HTML is rendered.
Configuration
This Kramdown plugin currently supports the following options when provided as sub-keys
of the Kramdown option syntax_highlighter_opts
:
Key | Description | Type | Default value |
---|
tree_sitter_parsers_dir | The path to the Tree-sitter language parsers directory. | String | ~/tree_sitter_parsers |
css_classes | Whether to use CSS classes for highlights. | Boolean | false |
Contributing
Contributions are welcome!
Development
To set up a compatible local development environment, please first refer to the
'Requirements and Compatibility' section of this
document.
This project also depends on Git submodules to run some of its tests. Accordingly, make
sure to initialize recursive submodules when cloning the project for development
purposes, for example with the following command:
git clone --recurse-submodules https://github.com/andrewtbiehl/kramdown-syntax_tree_sitter.git
After checking out the project, run bundle install
from within it to install
dependencies. Then run bundle exec rake --tasks
to list all available Rake tasks. Each
task can be invoked via bundle exec rake <task name>
. For example,
bundle exec rake test
runs unit tests and bundle exec rake smoke_test
installs the
gem and runs a smoke test against it.
This project uses GitHub Actions workflows to
facilitate continuous integration. The 'Quality Control' workflow runs any time new
commits are pushed to GitHub on any branch. This workflow runs the rubocop
, test
,
and smoke_test
Rake tasks to verify that new changes meet the project's code quality
standards, so it is strongly recommended that these tasks are first run locally against
new changes before such changes are pushed.
The 'Unreleased' section of the changelog should also be updated accordingly whenever
significant changes are introduced.
Release process
-
Determine the new version number based on the changes being introduced. This project
follows the Semantic Versioning versioning
standard.
-
Make a commit to the trunk branch with the following changes:
-
Update the project's version number (located in
lib/kramdown/syntax_tree_sitter/version.rb
).
-
Update the project's changelog
(CHANGELOG.md
)
with documentation for the new
version by modifying the current 'Unreleased' section accordingly.
-
Add a copy of the following release template to the top of the changelog and update
the 'Unreleased' section link to restart the process for the next release.
## [Unreleased]
### Added
<!-- For new features -->
### Changed
<!-- For changes in existing functionality -->
### Deprecated
<!-- For soon-to-be removed features -->
### Removed
<!-- For now removed features -->
### Fixed
<!-- For any bug fixes -->
### Security
<!-- In case of vulnerabilities -->
-
Draft and publish a new GitHub release
with a new trunk branch tag and title corresponding to the new version number and a
description copied over from the changelog. This will trigger the 'Gem Publication'
GitHub Actions workflow, which will push the new version to
RubyGems.org.
About
Tree-sitter is a modern, general-purpose
parsing library that outclasses many existing tools at the task of syntax highlighting.
This plugin adapts Tree-sitter's native highlighter for
Kramdown, so that Tree-sitter's superior highlighting
capabilities can be easily leveraged in the context of rendering Markdown.
The basic functionality of this plugin was originally presented as a blog post:
"Syntax highlight your Jekyll site with Tree-sitter!".
This article explains the original use case and inspiration for the project, walks
through its implementation, and even provides some fun examples of syntax highlighting
with Tree-sitter.
Disclaimer
Neither this plugin nor its author are affiliated with Tree-sitter or Kramdown in any
way. For any information particular to
Tree-sitter or
Kramdown, please refer to their respective
documentation directly.
License
This project is released under the MIT License.
The text for this license can be found in the project's LICENSE.txt file.