What is tree-sitter?
Tree-sitter is a parser generator tool and an incremental parsing library. It is used to build parsers for programming languages and to parse code into syntax trees. Tree-sitter is designed to be fast and efficient, making it suitable for real-time applications like code editors.
What are tree-sitter's main functionalities?
Parsing Code
This feature allows you to parse source code into a syntax tree. In this example, we parse a simple JavaScript code snippet and print the resulting syntax tree.
const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');
const parser = new Parser();
parser.setLanguage(JavaScript);
const sourceCode = 'const x = 1 + 2;';
const tree = parser.parse(sourceCode);
console.log(tree.rootNode.toString());
Querying Syntax Trees
This feature allows you to query syntax trees using a pattern-matching language. In this example, we query the syntax tree for binary expressions with an identifier on the left and a number on the right.
const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');
// Query is exposed on the Parser export of node-tree-sitter
const { Query } = Parser;
const parser = new Parser();
parser.setLanguage(JavaScript);
// `x + 2` has an identifier on the left and a number on the right
const sourceCode = 'const y = x + 2;';
const tree = parser.parse(sourceCode);
const q = new Query(JavaScript, '(binary_expression left: (identifier) right: (number))');
const matches = q.matches(tree.rootNode);
console.log(matches);
Incremental Parsing
This feature allows you to incrementally parse code, which is useful for real-time applications like code editors. In this example, we first parse a JavaScript code snippet and then update the syntax tree with a modified version of the code.
const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');
const parser = new Parser();
parser.setLanguage(JavaScript);
let sourceCode = 'const x = 1 + 2;';
let tree = parser.parse(sourceCode);
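sourceCode = 'const x = 1 + 2 + 3;';
// Describe the change (' + 3' inserted at index 15) before reusing the old tree
tree.edit({
  startIndex: 15,
  oldEndIndex: 15,
  newEndIndex: 19,
  startPosition: {row: 0, column: 15},
  oldEndPosition: {row: 0, column: 15},
  newEndPosition: {row: 0, column: 19},
});
tree = parser.parse(sourceCode, tree);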
console.log(tree.rootNode.toString());
Other packages similar to tree-sitter
esprima
Esprima is a high-performance, standard-compliant ECMAScript parser. It parses JavaScript code into an abstract syntax tree (AST). Compared to Tree-sitter, Esprima is specifically focused on JavaScript and does not support incremental parsing.
acorn
Acorn is a small, fast, JavaScript-based JavaScript parser. It generates an abstract syntax tree (AST) and is known for its performance and modularity. Unlike Tree-sitter, Acorn is limited to JavaScript and does not support incremental parsing.
node tree-sitter
Incremental parsers for node
Installation
npm install tree-sitter
Usage
First, you'll need a Tree-sitter grammar for the language you want to parse. There are many existing grammars such as tree-sitter-javascript and tree-sitter-go. You can also develop a new grammar using the Tree-sitter CLI.
Once you've got your grammar, create a parser with that grammar.
const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');
const parser = new Parser();
parser.setLanguage(JavaScript);
Then you can parse some source code,
const sourceCode = 'let x = 1; console.log(x);';
const tree = parser.parse(sourceCode);
and inspect the syntax tree.
console.log(tree.rootNode.toString());
const callExpression = tree.rootNode.child(1).firstChild;
console.log(callExpression);
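For larger files the string form of the root node gets hard to read. A minimal sketch of a recursive walk that prints one indented line per node follows. It assumes each node exposes `type` and `children`, as the `SyntaxNode` objects in node-tree-sitter do. `outline` is a hypothetical helper name, and `fakeRoot` is a simplified, hand-written stand-in (not real parser output) so the snippet runs without a grammar installed.

```javascript
// Collect one indented line per node, depth-first.
// Assumes `node.type` is a string and `node.children` is an array.
function outline(node, depth = 0, lines = []) {
  lines.push('  '.repeat(depth) + node.type);
  for (const child of node.children) {
    outline(child, depth + 1, lines);
  }
  return lines;
}

// Hand-written stand-in for a parsed tree; a real tree has more nodes.
const fakeRoot = {
  type: 'program',
  children: [
    { type: 'lexical_declaration', children: [] },
    {
      type: 'expression_statement',
      children: [{ type: 'call_expression', children: [] }],
    },
  ],
};

console.log(outline(fakeRoot).join('\n'));
```

With a real parse, the same helper would be called as `outline(tree.rootNode)`.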
If your source code changes, you can update the syntax tree. This will take less time than the first parse.
const newSourceCode = 'const x = 1; console.log(x);';
tree.edit({
  startIndex: 0,
  oldEndIndex: 3,
  newEndIndex: 5,
  startPosition: {row: 0, column: 0},
  oldEndPosition: {row: 0, column: 3},
  newEndPosition: {row: 0, column: 5},
});
const newTree = parser.parse(newSourceCode, tree);
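The index and position values passed to `tree.edit` above were written by hand. When only the old and new text are available, one way to derive them is to strip the common prefix and suffix of the two strings. The sketch below does that. `computeEdit` and `positionAt` are hypothetical helper names, not part of tree-sitter. The sketch assumes that indices and columns are counted in JavaScript string units and that lines end with `\n`.

```javascript
// Convert a string offset into a {row, column} position.
function positionAt(text, index) {
  let row = 0;
  let lineStart = 0;
  for (let i = 0; i < index; i++) {
    if (text[i] === '\n') {
      row++;
      lineStart = i + 1;
    }
  }
  return { row, column: index - lineStart };
}

// Build an edit descriptor covering the smallest changed range.
function computeEdit(oldText, newText) {
  const limit = Math.min(oldText.length, newText.length);

  // Length of the shared prefix.
  let prefix = 0;
  while (prefix < limit && oldText[prefix] === newText[prefix]) prefix++;

  // Length of the shared suffix, never overlapping the prefix.
  let suffix = 0;
  while (
    suffix < limit - prefix &&
    oldText[oldText.length - 1 - suffix] === newText[newText.length - 1 - suffix]
  ) {
    suffix++;
  }

  const oldEndIndex = oldText.length - suffix;
  const newEndIndex = newText.length - suffix;
  return {
    startIndex: prefix,
    oldEndIndex,
    newEndIndex,
    startPosition: positionAt(oldText, prefix),
    oldEndPosition: positionAt(oldText, oldEndIndex),
    newEndPosition: positionAt(newText, newEndIndex),
  };
}

const edit = computeEdit('const x = 1 + 2;', 'const x = 1 + 2 + 3;');
console.log(edit.startIndex, edit.oldEndIndex, edit.newEndIndex);
```

For the `let` to `const` change, this helper reports an old range of 0 to 2 and a new range of 0 to 4, because the trailing `t` is shared by both words. That is narrower than the hand-written edit but describes the same change.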
Parsing Text From a Custom Data Structure
If your text is stored in a data structure other than a single string, you can parse it by supplying a callback to parse instead of a string:
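const sourceLines = [
  'let x = 1;',
  'console.log(x);'
];
const tree = parser.parse((index, position) => {
  let line = sourceLines[position.row];
  // Compare against undefined so that empty lines do not end the input early
  if (line !== undefined) {
    // Re-add the line break the array omits so the parser can reach the next row
    return line.slice(position.column) + '\n';
  }
});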
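The callback can also locate text by its first argument, the index, which suits text stored as arbitrary chunks rather than lines. The sketch below builds such a callback and, so it runs without a grammar installed, drains it with a small loop that mimics a consumer asking for text at increasing offsets. `makeChunkReader` and `drain` are hypothetical helper names, and the sketch assumes the index counts the same units as JavaScript string offsets.

```javascript
// Build a reader callback over an array of string chunks.
function makeChunkReader(chunks) {
  // offsets[i] is the index at which chunks[i] starts in the full text.
  const offsets = [];
  let total = 0;
  for (const chunk of chunks) {
    offsets.push(total);
    total += chunk.length;
  }

  return (index) => {
    if (index >= total) return undefined; // past the end: no more text
    // Find the last chunk that starts at or before `index`.
    let i = offsets.length - 1;
    while (i > 0 && offsets[i] > index) i--;
    return chunks[i].slice(index - offsets[i]);
  };
}

// Stand-in consumer: keep asking for text at the current offset
// until the reader returns nothing.
function drain(reader) {
  let text = '';
  for (;;) {
    const piece = reader(text.length);
    if (!piece) return text;
    text += piece;
  }
}

const reader = makeChunkReader(['let x', ' = 1;', ' console', '.log(x);']);
console.log(drain(reader));
```

A reader built this way would be passed in place of the string, as in `parser.parse(reader)`.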
Asynchronous Parsing
If you have source code stored in a superstring TextBuffer, you can parse that source code on a background thread with a Promise-based interface:
const {TextBuffer} = require('superstring');
async function test() {
  const buffer = new TextBuffer('const x= 1; console.log(x);');
  const newTree = await parser.parseTextBuffer(buffer, oldTree);
}
Handing work to a background thread adds a slight delay, so you may want to let some parsing happen on the main thread first, in the hope that it completes before a background thread is needed at all:
async function test2() {
  const buffer = new TextBuffer('const x= 1; console.log(x);');
  const newTree = await parser.parseTextBuffer(buffer, oldTree, {
    syncOperationCount: 1000
  });
}