JavaScript BPE encoder/decoder for GPT-2 / GPT-3. The "gpt-3-encoder" module provides functions for encoding and decoding text using the Byte Pair Encoding (BPE) algorithm. It can be used to process text data for input into machine learning models, or to
Also check out the browser demo.
npm install gptoken
const gptoken = require('gptoken')
let tokens = gptoken.encode("hello world, we all share a goal of life")
console.log("Tokens: ", tokens);
console.log("TokenStats: ", JSON.stringify(gptoken.tokenStats(tokens)));
Or open the browser demo:
firefox ../node_modules/gptoken/browser.html
Or check out the full Express demo:
cd demo_app
npm install
npm start
I created this to add general GPT helper functionality and to make a complete package.
The plan is to clean up and stabilize this original implementation.
I would then like to make it useful for other models and features.
Improved performance.
More utility functions.
More research on how to interact with GPT models.
Add a simple, elegant OpenAI API integration so this can serve as a minimal frontend and backend base.
JavaScript library for encoding and decoding text using Byte Pair Encoding (BPE), as used in the GPT-2 and GPT-3 models by OpenAI. This is a fork of the original Python implementation by OpenAI.
This fork includes additional features such as the countTokens and tokenStats functions, as well as updated documentation.
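To illustrate what Byte Pair Encoding does, here is a minimal sketch of the greedy merge loop used by GPT-2-style BPE: a word starts as individual symbols, and the adjacent pair with the lowest merge rank is merged repeatedly until no ranked pair remains. This is not gptoken's actual code; `bpeMerge` and the `toyRanks` table are hypothetical, and the real merge table in bpe_data is far larger. GPT-2's encoder also splits text with a regular expression and maps bytes to printable characters before this step; those parts are omitted here.

```javascript
// Sketch of GPT-2-style BPE merging (illustrative, not gptoken's implementation).
// ranks: Map from "left right" pair keys to merge priority (lower merges first).
function bpeMerge(word, ranks) {
  // Start from individual characters.
  let symbols = Array.from(word);
  while (symbols.length > 1) {
    // Find the adjacent pair with the lowest rank.
    let best = null;
    let bestRank = Infinity;
    for (let i = 0; i < symbols.length - 1; i++) {
      const rank = ranks.get(symbols[i] + " " + symbols[i + 1]);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        best = [symbols[i], symbols[i + 1]];
      }
    }
    if (best === null) break; // no ranked pair left, so we are done
    // Merge every occurrence of the best pair, scanning left to right.
    const merged = [];
    for (let i = 0; i < symbols.length; i++) {
      if (i < symbols.length - 1 && symbols[i] === best[0] && symbols[i + 1] === best[1]) {
        merged.push(best[0] + best[1]);
        i++; // skip the right half of the pair we just merged
      } else {
        merged.push(symbols[i]);
      }
    }
    symbols = merged;
  }
  return symbols;
}

// Hypothetical toy merge table.
const toyRanks = new Map([
  ["l l", 0],
  ["h e", 1],
  ["he ll", 2],
  ["hell o", 3],
]);

console.log(bpeMerge("hello", toyRanks)); // [ 'hello' ]
console.log(bpeMerge("hells", toyRanks)); // [ 'hell', 's' ]
```

Each resulting symbol would then be looked up in the encoder map to get its token id.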
To install with npm:
npm install gptoken
or, under the old package name:
npm install @syonfox/gpt-3-encoder
The main interface is defined in index.js or index.d.ts
The code is in Encoder.js
The encoding data (maps) are in the bpe_data directory; Encoder loads them to perform the conversion.
There are useful scripts defined in package.json.
Tests use Jest and are defined in Encoder.test.js.
Docs are built with JSDoc (npm run doc); after the build, run cp browser.* docs/ so the demo works on GitHub Pages.
There are two demos: one using Node.js (npm run demo) and one using the browserify build in an HTML page (npm run browser).
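The bpe_data maps, and the "Built bpe_ranks" timing in the Performance section, refer to a lookup table from merge pairs to their priority. As a hedged sketch, assuming the merges are stored in the same text format as OpenAI's GPT-2 vocab.bpe file (a `#version` header followed by one space-separated pair per line, in priority order), such a table could be built like this. `buildBpeRanks` is a hypothetical name; gptoken's own loader in Encoder.js may store the data differently.

```javascript
// Hypothetical sketch: parse GPT-2-style merge rules into a rank table.
// Assumed input format (as in OpenAI's vocab.bpe):
//   #version: 0.2
//   Ġ t
//   Ġ a
//   ...
function buildBpeRanks(mergesText) {
  const ranks = new Map();
  let rank = 0;
  for (const line of mergesText.split("\n")) {
    const trimmed = line.trim(); // also strips a trailing "\r"
    // Skip the version header and blank lines (e.g. a trailing newline).
    if (trimmed === "" || trimmed.startsWith("#version")) continue;
    const parts = trimmed.split(/\s+/);
    if (parts.length !== 2) continue; // ignore malformed lines
    // Key by "left right"; earlier lines get lower (higher-priority) ranks.
    ranks.set(parts[0] + " " + parts[1], rank++);
  }
  return ranks;
}

const ranks = buildBpeRanks("#version: 0.2\nĠ t\nĠ a\nh e\n");
console.log(ranks.get("Ġ t"), ranks.get("h e"), ranks.size); // 0 2 3
```

A `Map` keyed by a joined string keeps lookups during merging to a single hash access per adjacent pair.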
Compatible with Node >= 12
To use the library in your project, import it as follows:
const gptoken = require('gptoken');
In addition to the original encoding and decoding functions, this fork includes the following additional features:
countTokens(text: string): number
This function returns the number of tokens in the provided text, after encoding it using BPE.
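Since countTokens is defined as the length of the encoded output, a common use is checking whether a prompt fits within a token limit. The sketch below assumes you pass in an encode function such as the one gptoken exports; `countTokensWith`, `fitsTokenBudget`, and the whitespace-based `toyEncode` are hypothetical helpers for illustration only, and real BPE counts will differ from the toy encoder.

```javascript
// Hypothetical helper: counts tokens the way countTokens is described,
// i.e. encode the text and take the length of the resulting array.
function countTokensWith(encode, text) {
  return encode(text).length;
}

// Hypothetical helper: true when `text` fits within `maxTokens`.
function fitsTokenBudget(encode, text, maxTokens) {
  return countTokensWith(encode, text) <= maxTokens;
}

// Stand-in encoder for demonstration only: one "token" per
// whitespace-separated word. Not BPE; real token counts will differ.
const toyEncode = (text) => text.split(/\s+/).filter(Boolean).map((_, i) => i);

console.log(countTokensWith(toyEncode, "hello world")); // 2
console.log(fitsTokenBudget(toyEncode, "hello world", 5)); // true
```

With the real library you would pass its encode function instead, e.g. `fitsTokenBudget(require('gptoken').encode, prompt, maxTokens)`.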
tokenStats(text: string): object
This function returns an object containing statistics about the tokens in the provided text, after encoding it using BPE. The returned object includes the following properties:
count: the total number of tokens in the text.
unique: the number of unique tokens in the text.
frequencies: an object containing the frequency of each token in the text.
positions: an object mapping tokens to their positions in the encoded output.
tokens: the encoded token array itself (the same output as encode).
This library is compatible with both Node.js and browser environments. In Node.js:
const gptoken = require('gptoken');
For the browser, webpack is used to build /dist/bundle.js (1.5 MB including the data). A compiled version for both environments is included in the package. Load it with a script tag:
<script src="/js/gptoken/browser.js"></script>
and make the package files available at that path, for example by copying them:
cp -r node_modules/gptoken ./public/js/gptoken
or by serving them with Express:
app.use('/js/gptoken', express.static(path.join(__dirname, 'node_modules/gptoken')));
This library was created as a fork of the original GPT-3-Encoder library by latitudegames.
See browser.html and demo.js. Note that you may need to import the library from the appropriate place in node_modules, or under the npm package name you installed.
// ES module import:
import {encode, decode, countTokens, tokenStats} from "gptoken"
// or CommonJS (use '@syonfox/gpt-3-encoder' instead of 'gptoken' if that is the package you installed):
const {encode, decode, countTokens, tokenStats} = require('gptoken')
const str = 'This is an example sentence to try encoding out on!'
const encoded = encode(str)
console.log('Encoded this string looks like: ', encoded)
console.log('We can look at each token and what it represents')
for (let token of encoded) {
console.log({token, string: decode([token])})
}
//example count tokens usage
if (countTokens(str) > 5) {
console.log("String is over five tokens, inconceivable");
}
const decoded = decode(encoded)
console.log('We can decode it back into:\n', decoded)
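The example above imports tokenStats without calling it. To make the documented fields concrete (count, unique, frequencies, positions, tokens), here is a hypothetical re-implementation operating on an already-encoded token array. It is a sketch under assumptions: representing positions as arrays of indices is a guess, and gptoken's actual output may differ in detail. `tokenStatsSketch` is not a gptoken export.

```javascript
// Hypothetical sketch of the statistics tokenStats is documented to return.
// Note: plain-object keys are strings, so token ids become string keys.
function tokenStatsSketch(tokens) {
  const frequencies = {};
  const positions = {};
  tokens.forEach((token, index) => {
    // Count occurrences of each token id.
    frequencies[token] = (frequencies[token] || 0) + 1;
    // Record every index at which the token id appears.
    if (!positions[token]) positions[token] = [];
    positions[token].push(index);
  });
  return {
    count: tokens.length,
    unique: Object.keys(frequencies).length,
    frequencies,
    positions,
    tokens,
  };
}

// Example token ids (any integer array works).
const stats = tokenStatsSketch([31373, 995, 31373]);
console.log(stats.count, stats.unique, stats.frequencies[31373]); // 3 2 2
console.log(stats.positions);
```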
I have added some other examples to the examples folder. See package.json for the available scripts.
# the original repo
git clone https://github.com/syonfox/GPT-3-Encoder.git
cd GPT-3-Encoder
npm install # install dev deps (docs tests build)
npm run test # run tests
npm run docs # build docs
npm run build # builds it for the browser
npm run browser # launches demo in firefox
npm run demo # runs node.js demo
less Encoder.js # the main code is here
firefox ./docs/index.html # view docs locally
npm publish --access public # dev publish to npm
Performance
Built bpe_ranks in 100 ms
Using JS loading (probably before caching): loaded encoder in 121 ms, loaded bpe_ranks in 91 ms
Using fs loading: loaded encoder in 32 ms, loaded bpe_ranks in 44 ms
Back to JS loading: loaded encoder in 35 ms, loaded bpe_ranks in 40 ms
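Timings like the ones above can be collected with a small harness. The following is a hedged sketch, not the script that produced those numbers: `timed` is a hypothetical helper built on Node's `perf_hooks`, and the toy workload only stands in for loading the real encoder data, so its timings say nothing about gptoken itself.

```javascript
const { performance } = require("perf_hooks");

// Hypothetical helper: runs fn once, logs "<label> in N ms" in the same style
// as the timings above, and returns both the result and the elapsed time.
function timed(label, fn) {
  const start = performance.now();
  const result = fn();
  const ms = performance.now() - start;
  console.log(`${label} in ${Math.round(ms)} ms`);
  return { result, ms };
}

// Toy workload standing in for building bpe_ranks: fill a Map with 50,000
// made-up pair keys. Real load times depend on the data and on caching.
const { result: toyRanks, ms } = timed("Built toy bpe_ranks", () => {
  const ranks = new Map();
  for (let i = 0; i < 50000; i++) {
    ranks.set(`a${i} b${i}`, i);
  }
  return ranks;
});
```

Running each measurement a few times, as the "probably before cache" note suggests, helps separate first-load costs from cached loads.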
More stats that work well with this token representation.
Clean up and keep it simple.
Here are some additional suggestions for improving the GPT-3 Encoder:
The npm package gptoken receives a total of 187 weekly downloads. As such, gptoken popularity was classified as not popular.
We found that gptoken demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.