js-ast-tokenizer
A JavaScript code tokenizer for ease in using with code embeddings and vector storage. This library analyzes JavaScript files or code snippets and provides a structured tokenization output, making it suitable for code analysis, embeddings, and search applications.
To install the library, use npm:
npm install js-ast-tokenizer
Here’s a basic example of how to use the tokenizer:
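// Example 1: tokenize a file by path (top-level await requires an ES module).
import tokenizeFileOrCode from 'js-ast-tokenizer';
const result = await tokenizeFileOrCode('path/to/your/file.js');
console.log(result);

// Example 2: tokenize a code string.
import tokenizeFileOrCode from 'js-ast-tokenizer';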
const jsCode = `
import { somethingExternal } from 'some-module';
class SomeClass {
constructor() {
this.value = 42;
}
}
const someVariable = 2;
function someFunction() {
return 'Hello, world!';
}
someExternalFunction();
someExternalClass.someMethod();
export { someFunction };
export default SomeClass;
`;
const result = await tokenizeFileOrCode(jsCode);
console.log(result);
The output of the tokenizeFileOrCode function is a structured object that represents the tokenized components of your JavaScript code. Below is an example output with the key components, including globalVariables, externalReferences, and exports:
{
  "file": "path/to/your/file.js",
  "nodes": {
    "imports": [["some-module", "import { somethingExternal } from 'some-module';"]],
    "classes": [["SomeClass", "class SomeClass { constructor() { this.value = 42; } }"]],
    "globalVariables": [
      ["someVariable", "const someVariable = 2"]
    ],
    "globalFunctions": [
      ["someFunction", "function someFunction() { return 'Hello, world!'; }"]
    ],
    "exports": [
      ["someFunction", "export { someFunction }"],
      ["default", "export default SomeClass"]
    ],
    "externalReferences": [
      ["someExternalFunction", "someExternalFunction()"],
      ["someExternalClass.someMethod", "someExternalClass.someMethod()"]
    ]
  },
  "content": "...",
  "length": 375
}
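Because every category under nodes holds [name, code] pairs, a result can be summarized with one generic loop. The sketch below is illustrative only: summarizeNodes is a hypothetical helper, not part of js-ast-tokenizer, and the result object is a hand-written copy of the example output above rather than a real call to the library.

```javascript
// Hand-written sample shaped like the documented example output.
const result = {
  file: 'path/to/your/file.js',
  nodes: {
    imports: [['some-module', "import { somethingExternal } from 'some-module';"]],
    classes: [['SomeClass', 'class SomeClass { constructor() { this.value = 42; } }']],
    globalVariables: [['someVariable', 'const someVariable = 2']],
    globalFunctions: [['someFunction', "function someFunction() { return 'Hello, world!'; }"]],
    exports: [
      ['someFunction', 'export { someFunction }'],
      ['default', 'export default SomeClass'],
    ],
    externalReferences: [
      ['someExternalFunction', 'someExternalFunction()'],
      ['someExternalClass.someMethod', 'someExternalClass.someMethod()'],
    ],
  },
  content: '...',
  length: 375,
};

// Hypothetical helper: map each category to the list of names it contains.
function summarizeNodes(nodes) {
  const summary = {};
  for (const [category, entries] of Object.entries(nodes)) {
    // Each entry is a [name, code] pair; keep only the name.
    summary[category] = entries.map(([name]) => name);
  }
  return summary;
}

const summary = summarizeNodes(result.nodes);
console.log(summary.exports); // [ 'someFunction', 'default' ]
```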
tokenizeFileOrCode(input: string): Promise<TokenizeResult>
Tokenizes the given input, which can be either a file path or a JavaScript code string.
Returns a Promise that resolves to a TokenizeResult object containing details about the tokenized JavaScript code.
The result object contains the following properties:
interface TokenizeResult {
  file: string | null;       // Full file path if input is a file, otherwise null
  nodes: TokenizedStructure; // Tokenized components of the code
  content: string;           // The original JavaScript code or file content
  length: number;            // The length of the input code
}
interface TokenizedStructure {
  imports: [string, string][];            // List of [moduleName, importStatement]
  classes: [string, string][];            // List of [className, classCode]
  globalVariables: [string, string][];    // List of [variableName, declaration]
  globalFunctions: [string, string][];    // List of [functionName, functionCode]
  exports: [string, string][];            // List of [exportedName, exportCode]
  externalReferences: [string, string][]; // List of external references [referenceName, fullExpression]
}
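If tokenized results are saved to disk or a database and loaded back later, it can be worth checking that they still match the TokenizedStructure shape. The following is a minimal sketch of such a runtime check; CATEGORIES, isPair, and isTokenizedStructure are hypothetical names introduced here, not exports of js-ast-tokenizer.

```javascript
// The six categories listed in the TokenizedStructure interface.
const CATEGORIES = [
  'imports',
  'classes',
  'globalVariables',
  'globalFunctions',
  'exports',
  'externalReferences',
];

// True when value is a two-element array of strings, i.e. a [name, code] pair.
function isPair(value) {
  return (
    Array.isArray(value) &&
    value.length === 2 &&
    typeof value[0] === 'string' &&
    typeof value[1] === 'string'
  );
}

// True when every category is present and holds only [string, string] pairs.
function isTokenizedStructure(nodes) {
  if (nodes === null || typeof nodes !== 'object') return false;
  return CATEGORIES.every(
    (key) => Array.isArray(nodes[key]) && nodes[key].every(isPair)
  );
}
```

A check like this only verifies the shape; it says nothing about whether the stored code strings still match the source file.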
globalVariables, externalReferences, and exports
globalVariables: This array captures top-level variables in the file, such as const someVariable = 2. It includes the variable name and its full declaration. Example:
[
["someVariable", "const someVariable = 2"]
]
externalReferences: This array captures references to variables, functions, or classes that are not defined in the local scope but are used within the code, such as someExternalFunction() and someExternalClass.someMethod(). Example:
[
["someExternalFunction", "someExternalFunction()"],
["someExternalClass.someMethod", "someExternalClass.someMethod()"]
]
exports: This array captures any named or default exports from the module. Example:
[
["someFunction", "export { someFunction }"],
["default", "export default SomeClass"]
]
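These arrays can be cross-referenced. As a rough sketch, the hypothetical helper below (findExportedDefinitions is not part of the library) matches each exported name against the classes, globalFunctions, and globalVariables entries to recover the code behind a named export. Note that in the documented example the default export is keyed as "default", so this name-based lookup skips it.

```javascript
// Hypothetical helper: pair each named export with its local definition.
function findExportedDefinitions(nodes) {
  // Index every local definition by name, remembering which category it came from.
  const definitions = new Map();
  for (const kind of ['classes', 'globalFunctions', 'globalVariables']) {
    for (const [name, code] of nodes[kind]) {
      definitions.set(name, { kind, code });
    }
  }
  // Keep only exports whose exported name matches a local definition.
  return nodes.exports
    .filter(([exportedName]) => definitions.has(exportedName))
    .map(([exportedName]) => ({ name: exportedName, ...definitions.get(exportedName) }));
}

// Hand-written sample mirroring the examples above.
const nodes = {
  classes: [['SomeClass', 'class SomeClass { constructor() { this.value = 42; } }']],
  globalFunctions: [['someFunction', "function someFunction() { return 'Hello, world!'; }"]],
  globalVariables: [['someVariable', 'const someVariable = 2']],
  exports: [
    ['someFunction', 'export { someFunction }'],
    ['default', 'export default SomeClass'],
  ],
};

const exported = findExportedDefinitions(nodes);
console.log(exported.map((entry) => `${entry.name} (${entry.kind})`));
```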
The output of this tokenizer is specifically structured to facilitate integration with code embeddings and vector storage systems. By breaking down code into its components, this library can help developers build searchable embeddings of JavaScript code.
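One way to prepare a result for a vector store is to turn each [name, code] pair into a small document with an id, the text to embed, and metadata for filtering. This is only a sketch under stated assumptions: toEmbeddingDocuments, the id format, and the '<inline>' placeholder are all invented for illustration, and the call to an embedding model or vector database is deliberately left out because it depends on the system you use.

```javascript
// Hypothetical helper: convert a TokenizeResult-shaped object into embedding-ready documents.
function toEmbeddingDocuments(result) {
  // file is null when the input was a code string, so fall back to a placeholder.
  const source = result.file ?? '<inline>';
  const documents = [];
  for (const [kind, entries] of Object.entries(result.nodes)) {
    entries.forEach(([name, code], index) => {
      documents.push({
        id: `${source}#${kind}:${name}:${index}`, // assumed id scheme, unique within one result
        text: code,                               // the snippet that would be embedded
        metadata: { file: result.file, kind, name },
      });
    });
  }
  return documents;
}

// Hand-written sample standing in for the output of tokenizing a code string.
const sample = {
  file: null,
  nodes: {
    imports: [],
    classes: [],
    globalVariables: [['someVariable', 'const someVariable = 2']],
    globalFunctions: [['someFunction', "function someFunction() { return 'Hello, world!'; }"]],
    exports: [['someFunction', 'export { someFunction }']],
    externalReferences: [],
  },
  content: '...',
  length: 3,
};

const docs = toEmbeddingDocuments(sample);
console.log(docs.map((doc) => doc.id));
```

Keeping kind and name in the metadata lets a search layer filter by category, for example restricting a query to globalFunctions, without re-parsing the code.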
The tokenizer uses various Babel plugins, enabled by default, to support modern JavaScript features. You can extend the functionality by adjusting the Babel configuration if necessary.
The tokenizer uses errorRecovery mode to gracefully handle parsing errors and attempt to continue.
This library is licensed under the MIT License. See the LICENSE file for more details.