Product
Socket Now Supports uv.lock Files
Socket now supports uv.lock files to ensure consistent, secure dependency resolution for Python projects and enhance supply chain security.
sql-parser-cst
Advanced tools
SQL Parser which produces a Concrete Syntax Tree (CST).
Unlike a more usual parser which produces an Abstract Syntax Tree (AST), this parser preserves all the syntax elements present in the parsed source code, with the goal of being able to re-create the exact original source code.
Try it live in SQL Explorer.
Supports the following SQL dialects:
Note: This software is in very active development. The syntax tree structure is mostly stable now, though there are bound to be changes as new SQL dialects are added and they contain features that need to be accommodated to the syntax tree.
npm install sql-parser-cst
import { parse, show, cstVisitor } from "sql-parser-cst";
const cst = parse("select * from my_table", {
dialect: "sqlite",
// These are optional:
includeSpaces: true, // Adds spaces/tabs
includeNewlines: true, // Adds newlines
includeComments: true, // Adds comments
includeRange: true, // Adds source code location data
});
// convert all keywords to uppercase
const toUpper = cstVisitor({
keyword: (kw) => {
kw.text = kw.text.toUpperCase();
},
});
toUpper(cst);
// Serialize back to SQL
show(cst); // --> SELECT * FROM my_table
For example, given the following SQL:
/* My query */
SELECT ("first_name" || ' jr.') as fname
-- use important table
FROM persons;
An AST-parser might parse this to the following abstract syntax tree:
{
"type": "select_stmt",
"columns": [
{
"type": "alias",
"expr": {
"type": "binary_expr",
"left": { "type": "column_ref", "column": "first_name" },
"operator": "||",
"right": { "type": "string", "value": " jr." }
},
"alias": "fname"
}
],
"from": [{ "type": "table_ref", "table": "persons" }]
}
Note that the above AST is missing the following information:
AS
or as
was written)In contrast, this CST parser produces the following concrete syntax tree, which preserves all of this information:
{
"type": "program",
"statements": [
{
"type": "select_stmt",
"clauses": [
{
"type": "select_clause",
"selectKw": { "type": "keyword", "text": "SELECT", "name": "SELECT" },
"options": [],
"columns": {
"type": "list_expr",
"items": [
{
"type": "alias",
"expr": {
"type": "paren_expr",
"expr": {
"type": "binary_expr",
"left": { "type": "identifier", "text": "\"first_name\"", "name": "first_name" },
"operator": "||",
"right": { "type": "string_literal", "text": "' jr.'", "value": " jr." }
}
},
"asKw": { "type": "keyword", "text": "as", "name": "AS" },
"alias": { "type": "identifier", "text": "fname", "name": "fname" }
}
]
}
},
{
"type": "from_clause",
"fromKw": { "type": "keyword", "text": "FROM", "name": "FROM" },
"expr": { "type": "identifier", "text": "persons", "name": "persons" },
"leading": [
{ "type": "newline", "text": "\n" },
{ "type": "line_comment", "text": "-- use important table" },
{ "type": "newline", "text": "\n" }
]
}
]
},
{ "type": "empty" }
],
"leading": [
{ "type": "block_comment", "text": "/* My query */" },
{ "type": "newline", "text": "\n" }
]
}
Note the following conventions:
type: keyword
nodes, which are usually
stored in fields named like someNameKw
.type: paren_expr
node.type: list_expr
node.type: empty
node in the end.text
fields.leading
and trailing
fields,
which store comments and newlines immediately before or after that node.
These fields will also contain information about regular spaces/tabs
(e.g. {"type": "space", "text": " \t"}
). This has been left out from this
example for the sake of simplicity.Parses SQL string and returns the CST tree. Takes the following options:
'sqlite' | 'bigquery' | 'mysql' | 'mariadb'
The SQL dialect to parse (required).boolean
When enabled adds range: [number, number]
field to all CST nodes,
which contains the start and end locations of the node.boolean
When enabled adds leading: Whitespace[]
and/or trailing: Whitespace[]
to nodes which are preceded or followed by comments.boolean
Like includeComments
, but includes newlines info to the same fields.boolean
Like includeComments
, but includes horizontal whitespace info to the same fields.("?" | "?nr" | ":name" | "$name" | "@name")[]
Determines the types of bound parameters supported by the parser.
By default a query like SELECT * FROM tbl WHERE id = ?
will result in parse error.
To fix it, use paramTypes: ["?"]
config option.string
Name of the SQL file. This is only used for error-reporting.When parsing fails with syntax error, it throws FormattedSyntaxError
which contains a message like:
Syntax Error: Unexpected "WHERE"
Was expecting to see: "!", "$", "(", "-", ":", "?", "@", "CASE", ...
--> my_db.sql:2:33
|
2 | SELECT * FROM my_table ORDER BY WHERE
| ^
Converts CST back to string.
Important caveat: the CST has to contain whitespace data, meaning,
it was generated with includeComments
, includeNewlines
and includeSpaces
options enabled.
For any valid SQL the following assertion will always hold:
const opts = {
dialect: "sqlite",
includeComments: true,
includeNewlines: true,
includeSpaces: true,
};
show(parse(sql, opts)) === sql; // always true
Generates a function that walks through the whole CST tree and calls
a function in map
whenever it encounters a node with that type.
For example the following code checks that all table and column aliases
use the explicit AS
keyword:
const checkAliases = cstVisitor({
alias: (node) => {
if (!node.asKw) {
throw new Error("All alias definitions must use AS keyword!");
}
},
});
checkAliases(cst);
Transforms the whole CST into some other type T
. The map
object
should contain an entry for each of the CST node types it expects to
encounter (this generally means all of them).
For example, the following implements a toString()
function that
serializes very basic SQL queries like SELECT 1, 2, 3 + 4
:
const toString = cstTransformer({
program: (node) => node.statements.map(toString).join(";"),
select_statement: (node) => node.clauses.map(toString).join(" "),
select_clause: (node) => "SELECT " + node.columns.map(toString).join(", "),
binary_expr: (node) =>
toString(node.left) + " " + node.operator + " " + toString(node.right),
number_literal: (node) => node.text,
});
The builtin show()
function is implemented as such a transform.
Additionally the parser exports lists of reserved keywords for each supported SQL dialect:
sqliteKeywords
, bigqueryKeywords
, mysqlKeywords
, mariadbKeywords
.
These are simple JavaScript objects, useful for doing lookups:
export const sqliteKeywords = {
ABORT: true,
ACTION: true,
ADD: true,
...
};
yarn generate
will generate parser.
The testsuite contains two kinds of tests:
When running the testsuite one always needs to pick a dialect.
For example yarn test:sqlite
or yarn test:mysql
.
Running one of these commands will run the testsuite against the parser
of that dialect. It will execute all the generic tests plus tests
applicable for that dialect.
yarn test
will execute the testsuite for each supported dialect,
covering all the possible combinations.
Start the parser-generator watch process in one terminal:
yarn watch:generate
and the tests watch process in another terminal:
yarn test:sqlite --watch
Note that yarn test --watch
doesn't work.
A separate watch process needs to be started manually for each dialect.
This started as a fork of node-sql-parser, which is based on @flora/sql-parser, which in turn was extracted from Alibaba's nquery module.
There's very little left of the original code though.
FAQs
Parses SQL into Concrete Syntax Tree (CST)
The npm package sql-parser-cst receives a total of 5,100 weekly downloads. As such, sql-parser-cst popularity was classified as popular.
We found that sql-parser-cst demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Product
Socket now supports uv.lock files to ensure consistent, secure dependency resolution for Python projects and enhance supply chain security.
Research
Security News
Socket researchers have discovered multiple malicious npm packages targeting Solana private keys, abusing Gmail to exfiltrate the data and drain Solana wallets.
Security News
PEP 770 proposes adding SBOM support to Python packages to improve transparency and catch hidden non-Python dependencies that security tools often miss.