What is tree-sitter?
Tree-sitter is a parser generator tool and an incremental parsing library. It is used to build parsers for programming languages and to parse code into syntax trees. Tree-sitter is designed to be fast and efficient, making it suitable for real-time applications like code editors.
What are tree-sitter's main functionalities?
Parsing Code
This feature allows you to parse source code into a syntax tree. In this example, we parse a simple JavaScript code snippet and print the resulting syntax tree.
const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');
const parser = new Parser();
parser.setLanguage(JavaScript);
const sourceCode = 'const x = 1 + 2;';
const tree = parser.parse(sourceCode);
console.log(tree.rootNode.toString());
Querying Syntax Trees
This feature allows you to query syntax trees using a pattern-matching language. In this example, we query the syntax tree for binary expressions with an identifier on the left and a number on the right.
const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');
// Query is exposed by the tree-sitter package itself; no extra package is needed.
const { Query } = Parser;
const parser = new Parser();
parser.setLanguage(JavaScript);
// `x + 2` has an identifier on the left and a number on the right.
const sourceCode = 'const y = x + 2;';
const tree = parser.parse(sourceCode);
const query = new Query(JavaScript, '(binary_expression left: (identifier) right: (number))');
const matches = query.matches(tree.rootNode);
console.log(matches);
Incremental Parsing
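This feature allows you to incrementally parse code, which is useful for real-time applications like code editors. In this example, we first parse a JavaScript code snippet, then describe the change with tree.edit and reparse, passing the old tree so that unchanged parts can be reused.
const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');
const parser = new Parser();
parser.setLanguage(JavaScript);
let sourceCode = 'const x = 1 + 2;';
let tree = parser.parse(sourceCode);
// Tell the old tree that ' + 3' was inserted at index 15, just before the semicolon.
tree.edit({
  startIndex: 15,
  oldEndIndex: 15,
  newEndIndex: 19,
  startPosition: {row: 0, column: 15},
  oldEndPosition: {row: 0, column: 15},
  newEndPosition: {row: 0, column: 19},
});
sourceCode = 'const x = 1 + 2 + 3;';
tree = parser.parse(sourceCode, tree);
console.log(tree.rootNode.toString());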
Other packages similar to tree-sitter
esprima
Esprima is a high-performance, standard-compliant ECMAScript parser. It parses JavaScript code into an abstract syntax tree (AST). Compared to Tree-sitter, Esprima is specifically focused on JavaScript and does not support incremental parsing.
acorn
Acorn is a small, fast, JavaScript-based JavaScript parser. It generates an abstract syntax tree (AST) and is known for its performance and modularity. Unlike Tree-sitter, Acorn is limited to JavaScript and does not support incremental parsing.
node tree-sitter
Incremental parsers for node
Installation
npm install tree-sitter
Usage
First, you'll need a Tree-sitter grammar for the language you want to parse. There are many existing grammars such as tree-sitter-javascript and tree-sitter-go. You can also develop a new grammar using the Tree-sitter CLI.
Once you've got your grammar, create a parser with that grammar.
const Parser = require('tree-sitter');
const JavaScript = require('tree-sitter-javascript');
const parser = new Parser();
parser.setLanguage(JavaScript);
Then you can parse some source code,
const sourceCode = 'let x = 1; console.log(x);';
const tree = parser.parse(sourceCode);
and inspect the syntax tree.
console.log(tree.rootNode.toString());
const callExpression = tree.rootNode.child(1).firstChild;
console.log(callExpression);
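To go beyond printing the whole tree at once, a small recursive walk can list each named node with its start position. The following is a minimal sketch: the helper name outline is hypothetical, and it assumes each node exposes type, startPosition, and namedChildren. The mock object below is a simplified stand-in (a real tree has more nodes) so the sketch runs without a grammar installed.

```javascript
// Collects one indented line per named node, depth-first.
// Assumption: `node` exposes `type`, `startPosition`, and `namedChildren`.
function outline(node, depth = 0, lines = []) {
  const { row, column } = node.startPosition;
  lines.push(`${'  '.repeat(depth)}${node.type} [${row}:${column}]`);
  for (const child of node.namedChildren) {
    outline(child, depth + 1, lines);
  }
  return lines;
}

// Simplified stand-in shaped like a syntax tree for 'let x = 1; console.log(x);'.
// With a real parse result, pass `tree.rootNode` instead.
const mockRoot = {
  type: 'program',
  startPosition: { row: 0, column: 0 },
  namedChildren: [
    { type: 'lexical_declaration', startPosition: { row: 0, column: 0 }, namedChildren: [] },
    {
      type: 'expression_statement',
      startPosition: { row: 0, column: 11 },
      namedChildren: [
        { type: 'call_expression', startPosition: { row: 0, column: 11 }, namedChildren: [] },
      ],
    },
  ],
};

const outlineLines = outline(mockRoot);
console.log(outlineLines.join('\n'));
```

Returning an array of lines rather than printing inside the recursion keeps the walk easy to reuse, for example to filter by node type before printing.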
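If your source code changes, you can update the syntax tree: first call tree.edit to describe the change, then pass the edited tree to parse along with the new source. This typically takes less time than the first parse.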
const newSourceCode = 'const x = 1; console.log(x);';
tree.edit({
  startIndex: 0,
  oldEndIndex: 3,
  newEndIndex: 5,
  startPosition: {row: 0, column: 0},
  oldEndPosition: {row: 0, column: 3},
  newEndPosition: {row: 0, column: 5},
});
const newTree = parser.parse(newSourceCode, tree);
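Writing the edit offsets by hand is error-prone, so one option is to derive them from the old and new text. The sketch below is not part of the library: computeEdit and positionAt are hypothetical helpers that trim the common prefix and suffix of the two strings and fill in the same fields passed to tree.edit above. It assumes offsets and columns are counted the way JavaScript string indices are.

```javascript
// Converts a string index into a {row, column} position.
function positionAt(text, index) {
  let row = 0;
  let lineStart = 0;
  // Count every newline that occurs strictly before `index`.
  for (let i = text.indexOf('\n'); i !== -1 && i < index; i = text.indexOf('\n', i + 1)) {
    row++;
    lineStart = i + 1;
  }
  return { row, column: index - lineStart };
}

// Hypothetical helper: derives an edit descriptor from two versions of the
// text by trimming their common prefix and common suffix.
function computeEdit(oldText, newText) {
  const maxPrefix = Math.min(oldText.length, newText.length);
  let start = 0;
  while (start < maxPrefix && oldText[start] === newText[start]) start++;

  let oldEnd = oldText.length;
  let newEnd = newText.length;
  while (oldEnd > start && newEnd > start && oldText[oldEnd - 1] === newText[newEnd - 1]) {
    oldEnd--;
    newEnd--;
  }

  return {
    startIndex: start,
    oldEndIndex: oldEnd,
    newEndIndex: newEnd,
    startPosition: positionAt(oldText, start),
    oldEndPosition: positionAt(oldText, oldEnd),
    newEndPosition: positionAt(newText, newEnd),
  };
}

const edit = computeEdit('let x = 1; console.log(x);', 'const x = 1; console.log(x);');
console.log(edit);
```

For the let to const change, this sketch reports a slightly narrower range than the hand-written edit (the shared trailing "t" is treated as unchanged); both describe the same modification. The resulting object could then be passed to tree.edit before reparsing.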
Parsing Text From a Custom Data Structure
If your text is stored in a data structure other than a single string, you can parse it by supplying a callback to parse instead of a string:
const sourceLines = [
  'let x = 1;',
  'console.log(x);'
];
const tree = parser.parse((index, position) => {
  let line = sourceLines[position.row];
  if (line) {
    return line.slice(position.column);
  }
});
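The same idea can be applied to text stored as arbitrary chunks rather than lines, by looking up the chunk from the index argument instead of the position. This is a rough sketch under stated assumptions: makeChunkReader and readAll are hypothetical names, index is assumed to be an offset in the same units as JavaScript string indices, and readAll only emulates a consumer so the callback can be exercised without a parser.

```javascript
// Hypothetical helper: builds an input callback over an array of text chunks.
function makeChunkReader(chunks) {
  // Precompute the starting offset of every chunk.
  const starts = [];
  let total = 0;
  for (const chunk of chunks) {
    starts.push(total);
    total += chunk.length;
  }
  return (index) => {
    if (index >= total) return undefined; // no more text
    // Find the last chunk that starts at or before `index` (linear scan for brevity).
    let i = starts.length - 1;
    while (starts[i] > index) i--;
    return chunks[i].slice(index - starts[i]);
  };
}

// Stand-in for a consumer: keeps asking for text at the current offset
// until the callback returns nothing.
function readAll(read) {
  let text = '';
  for (let piece = read(text.length); piece; piece = read(text.length)) {
    text += piece;
  }
  return text;
}

const chunks = ['let x ', '= 1;\ncon', 'sole.log(x);'];
const read = makeChunkReader(chunks);
const reassembled = readAll(read);
console.log(reassembled);
```

With a real parser, the callback returned by makeChunkReader would be passed to parse in place of a string. For many chunks, the linear scan could be replaced with a binary search over the precomputed offsets.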