curated-tokenizers

Lightweight piece tokenization library

  • 2.0.0
  • PyPI

🥢 Curated Tokenizers

This Python library provides wordpiece and sentencepiece tokenizers. The following types of tokenizers are currently supported:

Tokenizer   Binding         Example model
BPE         sentencepiece
Byte BPE    Native          RoBERTa/GPT-2
Unigram     sentencepiece   XLM-RoBERTa
Wordpiece   Native          BERT
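
To give a feel for what a piece tokenizer does, here is a minimal sketch of greedy longest-match wordpiece splitting in plain Python. It does not use this package's API: the function name, the toy vocabulary, the `##` continuation prefix and the `[UNK]` fallback are all assumptions made for illustration (the latter two follow the convention commonly seen in BERT vocabularies).

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into pieces by greedy longest-match lookup.

    Illustrative only; not the implementation used by curated-tokenizers.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece covers this position: give up on the whole word.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, not taken from any real model.
TOY_VOCAB = {"token", "##izer", "##s", "un", "##related"}

print(wordpiece_tokenize("tokenizers", TOY_VOCAB))
```

With this toy vocabulary, "tokenizers" is split into "token", "##izer" and "##s", while a word that cannot be covered by any piece falls back to the unknown token.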

⚠️ Warning: experimental package

This package is experimental and it is likely that the APIs will change in incompatible ways.

⏳ Install

Curated tokenizers is available through PyPI:

pip install curated_tokenizers
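
One way to confirm that the install succeeded is to ask Python for the installed distribution's version. This is a small sketch that relies only on the standard library; the helper name `installed_version` is hypothetical, and the distribution name is assumed to match the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("curated_tokenizers"))
```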

🚀 Quickstart

The best way to get started with curated tokenizers is through the curated-transformers library, which also provides functionality to load tokenization models from the Hugging Face Hub.
