llama-tokenizer-js
Comparing version 1.1.1 to 1.1.2
  {
    "name": "llama-tokenizer-js",
-   "version": "1.1.1",
+   "version": "1.1.2",
    "description": "JS tokenizer for LLaMA-based LLMs",
@@ -5,0 +5,0 @@ "main": "llama-tokenizer.js",
@@ -9,5 +9,6 @@ # 🦙 llama-tokenizer-js 🦙
- Features:
+ ## Features
  - Easy to use: 0 dependencies, code and data baked into a single file.
- - Compatible with most LLaMA-based models (see [Compatibility](#compatibility))
+ - Compatible with most LLaMA models (see [Compatibility](#compatibility))
  - Optimized running time: tokenize a sentence in roughly 1ms, or 2000 tokens in roughly 20ms.
@@ -88,9 +89,10 @@ - Optimized bundle size: 670KiB before minification and gzipping (the heaviest part of the tokenizer, merge data, has been compressed into a simple and efficient binary format, and then base64-encoded to bake it into the .js file)
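The bundle-size note above describes packing merge data into a binary format and base64-encoding it so it can live inside the .js file. A minimal sketch of that general idea in Node.js (the encoding scheme here is hypothetical for illustration, not the library's actual format):

```javascript
// Hypothetical scheme: pack an array of [tokenId, tokenId] merge pairs
// into big-endian 16-bit integers, then base64-encode the bytes so the
// data can be baked into a .js source file as a string literal.
function packMerges(pairs) {
  const buf = new Uint8Array(pairs.length * 4);
  pairs.forEach(([a, b], i) => {
    buf[i * 4] = a >> 8;
    buf[i * 4 + 1] = a & 0xff;
    buf[i * 4 + 2] = b >> 8;
    buf[i * 4 + 3] = b & 0xff;
  });
  return buf;
}

// Inverse: recover the merge pairs from the raw bytes.
function unpackMerges(buf) {
  const pairs = [];
  for (let i = 0; i < buf.length; i += 4) {
    pairs.push([(buf[i] << 8) | buf[i + 1], (buf[i + 2] << 8) | buf[i + 3]]);
  }
  return pairs;
}

// Round-trip: bytes -> base64 string (embeddable in JS) -> bytes -> pairs.
const merges = [[100, 200], [300, 7]];
const base64 = Buffer.from(packMerges(merges)).toString("base64");
const roundTripped = unpackMerges(Buffer.from(base64, "base64"));
console.log(JSON.stringify(roundTripped)); // [[100,200],[300,7]]
```

A binary-then-base64 layout like this is far smaller than shipping the merge list as JSON, at the cost of a one-time decode when the module loads.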
- What is this tokenizer compatible with? All LLaMA models which have been trained on top of the checkpoints (model weights) leaked by Facebook in early 2023.
+ What is this tokenizer compatible with? All LLaMA models which have been trained on top of checkpoints (model weights) released by Facebook in March 2023 ("LLaMA") and July of 2023 ("LLaMA2").
  Examples of compatible models:
  - llama2-13b-4bit-gptq
  - wizard-vicuna-13b-uncensored-gptq
  - manticore-7b-ggml
- Incompatible LLaMA models are those which have been trained from scratch, not on top of the checkpoints leaked by Facebook. For example, [OpenLLaMA](https://github.com/openlm-research/open_llama) models are incompatible.
+ Incompatible LLaMA models are those which have been trained from scratch, not on top of the checkpoints released by Facebook. For example, [OpenLLaMA](https://github.com/openlm-research/open_llama) models are incompatible.
@@ -97,0 +99,0 @@ When you see a new LLaMA model released, this tokenizer is most likely compatible with it without any modifications. If you are unsure, try it and see if the token ids are the same (compared to running the model with, for example, oobabooga webui). You can find great test input/output samples by searching for `runTests` inside `llama-tokenizer.js`.
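The compatibility check described above boils down to comparing the token-id sequence produced by this tokenizer against the ids reported by a reference implementation for the same text. A minimal sketch of that comparison (the token ids below are made up for illustration):

```javascript
// Two tokenizers agree on an input iff they produce identical id sequences;
// any length or element mismatch means the model is not compatible.
function sameTokenization(idsA, idsB) {
  return idsA.length === idsB.length && idsA.every((id, i) => id === idsB[i]);
}

// Hypothetical ids -- in practice, compare this library's encode() output
// against the ids shown by, e.g., oobabooga webui for the same text.
const jsIds = [1, 15043, 3186];
const referenceIds = [1, 15043, 3186];
console.log(sameTokenization(jsIds, referenceIds)); // true
```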
License Policy Violation
License: This package is not allowed per your license policy. Review the package's license to ensure compliance.
Found 1 instance in 1 package