token_estimator

  • 0.1.2
  • Rubygems

TokenEstimator

TokenEstimator is a Rails gem that counts tokens in plain text and in files of several formats, using a tokenizer that you select by name (for example, "gpt2").

Installation

Add this line to your application's Gemfile:

    gem "token_estimator"

And then execute:

    bundle install

Methods

count_tokens_from_text

Count tokens from a given text.

    require "token_estimator"

    tokenizer_name = "gpt2"
    estimator = TokenEstimator::Estimator.new(tokenizer_name)

    text = "Your sample text here."
    token_estimation = estimator.count_tokens_from_text(text)

    puts "Token estimation: #{token_estimation}"

count_tokens_from_file

Count tokens from a file. The file type is determined by the file extension.

    require "token_estimator"

    file_path = "spec/fixtures/files/lorem.pdf"
    tokenizer_name = "gpt2"
    estimator = TokenEstimator::Estimator.new(tokenizer_name)

    token_estimation = estimator.count_tokens_from_file(file_path)

    puts "Token estimation: #{token_estimation}"

count_tokens_from_excel_file

Counts tokens from an Excel (.xlsx) file.

count_tokens_from_csv_file

Counts tokens from a CSV file.

count_tokens_from_pdf_file

Counts tokens from a PDF file.

count_tokens_from_txt_file

Counts tokens from a plain text (.txt) file.

count_tokens_from_markdown_file

Counts tokens from a Markdown (.md) file.

count_tokens_from_json_file

Counts tokens from a JSON file.

count_tokens_from_html_file

Counts tokens from an HTML file.
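The per-format methods above line up with the extension-based behaviour described for count_tokens_from_file. As an illustration only, here is a minimal sketch of how a path could be mapped to one of those method names. The method names come from this README; the lookup table and the `method_for` helper are hypothetical and are not taken from the gem's source.

```ruby
# Hypothetical lookup table: file extension => Estimator method name.
# The method names appear in this README; the mapping is an assumption.
METHOD_BY_EXTENSION = {
  ".xlsx" => :count_tokens_from_excel_file,
  ".csv"  => :count_tokens_from_csv_file,
  ".pdf"  => :count_tokens_from_pdf_file,
  ".txt"  => :count_tokens_from_txt_file,
  ".md"   => :count_tokens_from_markdown_file,
  ".json" => :count_tokens_from_json_file,
  ".html" => :count_tokens_from_html_file
}.freeze

# Returns the method name for a path, or nil when the extension is unknown.
# The extension is downcased so "notes.MD" and "notes.md" behave the same.
def method_for(path)
  METHOD_BY_EXTENSION[File.extname(path).downcase]
end

puts method_for("spec/fixtures/files/lorem.pdf").inspect # :count_tokens_from_pdf_file
puts method_for("notes.MD").inspect                      # :count_tokens_from_markdown_file
puts method_for("photo.png").inspect                     # nil
```

With the gem loaded, `estimator.public_send(method_for(path), path)` would call the matching method, assuming each per-format method takes a file path as its only argument, which this README does not state explicitly.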

count_tokens_from_json

Counts tokens from a JSON object.
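The README does not say whether count_tokens_from_json expects a Ruby Hash or an already serialized JSON string. If you serialize the data yourself, the serialization style changes how much text there is to tokenize. The sketch below uses only Ruby's standard json library to compare the two common styles; the `json_variants` helper is a hypothetical name introduced for illustration.

```ruby
require "json"

# Serializes a Ruby object in compact and pretty-printed form so the
# two sizes can be compared before either string is tokenized.
def json_variants(payload)
  {
    compact: JSON.generate(payload),       # single line, no extra whitespace
    pretty: JSON.pretty_generate(payload)  # indented, one entry per line
  }
end

variants = json_variants({ "title" => "Lorem", "tags" => ["a", "b"], "pages" => 3 })
variants.each { |name, text| puts "#{name}: #{text.length} characters" }
```

Either string could then be passed to count_tokens_from_text, which this README documents as accepting text. The pretty-printed form is longer, so it will typically yield a higher token count.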

count_tokens_from_html

Counts tokens from an HTML string.

TokenEstimator::Estimator::SUPPORTED_FILE_TYPES

A constant that lists the supported file types.
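One use for this constant is to check a path before counting. The README does not show how the entries are written (for example "pdf", ".pdf", or :pdf), so the sketch below normalizes both sides of the comparison. It assumes the constant is a flat list of extensions; `normalize_type`, `supported_type?`, and `EXAMPLE_TYPES` are hypothetical names used only for illustration.

```ruby
# Normalizes an extension-like value ("PDF", ".pdf", :pdf) to "pdf".
def normalize_type(value)
  value.to_s.downcase.delete_prefix(".")
end

# True when the path's extension appears in supported_types.
# Paths without an extension are treated as unsupported.
def supported_type?(path, supported_types)
  extension = normalize_type(File.extname(path))
  return false if extension.empty?

  supported_types.any? { |type| normalize_type(type) == extension }
end

# Stand-in list so the sketch runs without the gem installed. With the gem
# loaded, pass TokenEstimator::Estimator::SUPPORTED_FILE_TYPES instead.
EXAMPLE_TYPES = %w[pdf md csv xlsx json txt html].freeze

puts supported_type?("report.PDF", EXAMPLE_TYPES) # true
puts supported_type?("photo.png", EXAMPLE_TYPES)  # false
puts supported_type?("README", EXAMPLE_TYPES)     # false
```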

Roadmap

The following formats are currently supported for token counting:

  • PDF
  • Markdown (.md)
  • CSV
  • Excel (XLSX)
  • JSON
  • Plain Text
  • HTML

The following formats are planned for future releases:

  • DOCX (Word Documents)
  • XML
  • RTF (Rich Text Format)
  • PNG
  • JPG

Error Handling

If you try to count tokens from an unsupported file type, the gem raises an UnsupportedFileTypeError:

    require "token_estimator"

    estimator = TokenEstimator::Estimator.new("gpt2")

    begin
      token_count = estimator.count_tokens_from_file("path/to/your/file.unsupported")
    rescue TokenEstimator::UnsupportedFileTypeError => e
      puts e.message
    end
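When counting many files, it can help to record the unsupported ones and carry on instead of stopping at the first error. The following is a minimal sketch of that pattern, not part of the gem: `count_all`, `DemoUnsupportedError`, and `FAKE_COUNTER` are hypothetical stand-ins so the example runs without the gem installed.

```ruby
# Stand-in error class; with the gem loaded you would use
# TokenEstimator::UnsupportedFileTypeError instead.
DemoUnsupportedError = Class.new(StandardError)

# Calls counter for each path. Returns [counts_by_path, skipped_paths],
# where skipped_paths holds every path that raised unsupported_error.
def count_all(paths, counter, unsupported_error:)
  counts = {}
  skipped = []
  paths.each do |path|
    begin
      counts[path] = counter.call(path)
    rescue unsupported_error
      skipped << path
    end
  end
  [counts, skipped]
end

# Fake counter for the demo: rejects .png files and otherwise returns the
# path length as a placeholder number (not a real token count).
FAKE_COUNTER = lambda do |path|
  raise DemoUnsupportedError, "unsupported: #{path}" if File.extname(path) == ".png"

  path.length
end

counts, skipped = count_all(%w[a.txt b.png], FAKE_COUNTER, unsupported_error: DemoUnsupportedError)
puts "counted: #{counts.keys.join(', ')}"
puts "skipped: #{skipped.join(', ')}"
```

With the gem, the call would look like `count_all(paths, estimator.method(:count_tokens_from_file), unsupported_error: TokenEstimator::UnsupportedFileTypeError)`.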

Contributing

To contribute, fork the repository, create a new branch, and submit a pull request for review. Please write tests for your changes and follow the project's coding standards.

License

The gem is available as open source under the terms of the MIT License.

Package last updated on 15 Jul 2024
