# Llama-Cpp-Node

Llama-Cpp-Node is a Node.js binding for llama.cpp, a C++ inference library for Large Language Models (LLMs) such as LLaMA and its derivatives (e.g. Wizard-Vicuna).

This module allows you to load a model file, create a context, encode strings into tokens, evaluate tokens on the context to predict the next token, and decode tokens back into strings.
## Prerequisites

Before using llama-cpp-node, ensure the following prerequisites are met:
- **C++ compiler**: A C++ compiler is required to build the underlying llama.cpp library. Make sure you have a compatible C++ compiler installed. On Linux, install `build-essential` or the equivalent package for your distribution. On Windows, you can use Visual Studio with C++ support.
## Installation

To install llama-cpp-node, use npm:

```shell
npm install llama-cpp-node
```

Note: the latest llama.cpp source code is downloaded automatically from the llama.cpp repository during installation.
## Usage

To get started, require the module in your Node.js application:

```js
var llamaCppNode = require('llama-cpp-node');
var { LLAMAModel, LLAMAContext } = llamaCppNode;
```
### Loading a Model File and Creating a Context

Before you can use llama-cpp-node, you need to load a model file and create a context. The model file must be in the ggml format.

```js
var model = new LLAMAModel('C:/models/13B/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin');
var ctx = new LLAMAContext(model);
```
### Encoding Strings into Tokens

To use the model for predictions, you need to encode input strings into tokens. Tokens are represented as `Uint32Array`s.

```js
var prompt = 'You are a 25 year old human named ASSISTANT. The following is a transcript between you and your wife named USER.\nASSISTANT:';
var tokens = ctx.encode(prompt);
```
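llama.cpp models typically expect the token sequence to begin with a beginning-of-sequence (BOS) token, and the chatbot example below prepends one before the first evaluation. Since `Uint32Array` is fixed-length and has no `unshift`, the conversion goes through a regular array. A small helper for this (a sketch, not part of the module's API; `bosToken` would come from `llamaCppNode.tokenBos()`):

```javascript
// Prepend a beginning-of-sequence token to an encoded prompt.
// Uint32Array is fixed-length, so convert to a regular array first.
function withBos(tokens, bosToken) {
  var arr = Array.from(tokens);
  arr.unshift(bosToken);
  return Uint32Array.from(arr);
}
```

Used together with the module, this would look like `var tokens = withBos(ctx.encode(prompt), llamaCppNode.tokenBos());`.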
### Evaluating Tokens to Predict the Next Token

After encoding the input string, you can evaluate the tokens on the context to predict the next token. `eval` is asynchronous and returns a Promise.

```js
var nextToken = await ctx.eval(tokens);
```
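`eval` predicts one token at a time, so generating a longer completion means feeding each predicted token back into the context until the end-of-sequence token appears or a length limit is reached. A minimal sketch of such a loop (the `generate` helper is hypothetical, not part of the module's API; `ctx` is the context from above and `eosToken` would be `llamaCppNode.tokenEos()`):

```javascript
// Generate a completion by repeatedly predicting the next token.
// The context carries the state, so after the initial prompt tokens
// only the newest token needs to be passed to each eval call.
async function generate(ctx, promptTokens, maxTokens, eosToken) {
  var output = '';
  var tokens = promptTokens;
  for (var i = 0; i < maxTokens; i++) {
    var nextToken = await ctx.eval(tokens);  // predict the next token id
    if (nextToken === eosToken) break;       // stop at end-of-sequence
    tokens = Uint32Array.from([nextToken]);  // feed the prediction back in
    output += ctx.decode(tokens);
  }
  return output;
}
```

A call would then look like `var text = await generate(ctx, ctx.encode(prompt), 64, llamaCppNode.tokenEos());`.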
### Decoding Tokens into Strings

To decode tokens back into strings, use the `decode` method.

```js
var tokens = Uint32Array.from([nextToken]);
var tokenStr = ctx.decode(tokens);
```
### Chatbot Example

To create a conversational chatbot, you can use the readline module to communicate with the user over stdin/stdout:
```js
var llamaCppNode = require('llama-cpp-node');
var { LLAMAModel, LLAMAContext } = llamaCppNode;
var readline = require('readline/promises');

console.log(llamaCppNode.systemInfo());

var model = new LLAMAModel('C:/models/13B/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin');
var ctx = new LLAMAContext(model);

var prompt = 'You are a 25 year old human named ASSISTANT. The following is a transcript between you and your wife named USER.\nASSISTANT:';

// Encode the prompt and prepend the beginning-of-sequence token.
var tokens = Array.from(ctx.encode(prompt));
tokens.unshift(llamaCppNode.tokenBos());
tokens = Uint32Array.from(tokens);

var rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
});

var interact = async () => {
    var line = 'ASSISTANT:';
    var tokenStr;
    process.stdout.write(line);
    while (1) {
        var nextToken = await ctx.eval(tokens);
        if (nextToken === llamaCppNode.tokenEos()) {
            tokenStr = '\nUSER: ';
            process.stdout.write(tokenStr);
            break;
        }
        tokens = Uint32Array.from([nextToken]);
        tokenStr = ctx.decode(tokens);
        // Drop a leading space directly after "ASSISTANT: ".
        if (tokenStr.startsWith(' ') && line === 'ASSISTANT: ') {
            tokenStr = tokenStr.slice(1);
        }
        process.stdout.write(tokenStr);
        if (tokenStr === '') {
            process.stdout.write('[' + nextToken + ']');
        }
        if (tokenStr === '\n') {
            line = '';
        } else {
            line += tokenStr;
        }
        // Stop generating once the model starts speaking for the user.
        if (line.toUpperCase().startsWith('USER:')) {
            if (!line.endsWith(' ')) {
                tokenStr += ' ';
                process.stdout.write(' ');
            }
            break;
        }
    }
    var input = await rl.question('USER: ');
    tokens = ctx.encode(tokenStr + input + '\nASSISTANT:');
};

var main = async () => {
    try {
        while (1) {
            await interact();
        }
    } catch (e) {
        console.log(e.stack);
    }
};

main();
```
Make sure to replace `C:/models/13B/Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_0.bin` with the actual path to your llama model file.
## API

### `llamaCppNode.systemInfo()`

This function returns information about the system where llama-cpp-node is running, such as CPU and GPU details.

### `llamaCppNode.tokenBos()` / `llamaCppNode.tokenEos()`

These functions return the beginning-of-sequence and end-of-sequence token ids, as used in the chatbot example above.

### `new LLAMAModel(modelPath: string)`

This class represents a llama model. It takes the path to the model file as a parameter and can be used to create a context.

### `new LLAMAContext(model: LLAMAModel)`

This class represents a context for the llama model. It takes a model instance as a parameter and can be used to encode, evaluate, and decode tokens.

### `ctx.encode(input: string): Uint32Array`

This method takes an input string and encodes it into tokens. It returns a `Uint32Array` representing the tokens.

### `ctx.eval(tokens: Uint32Array): Promise<number>`

This method takes a `Uint32Array` of tokens and evaluates them on the context to predict the next token. It returns a Promise that resolves to the next predicted token.

### `ctx.decode(tokens: Uint32Array): string`

This method takes a `Uint32Array` of tokens and decodes them back into a string. It returns the decoded string.
## Contributing

Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue on the GitHub repository.

## License

This module is released under the MIT License. See the LICENSE file for more details.