
TokenEstimator is a Rails gem that provides functionality to count tokens in various file formats and text inputs using different tokenizers.
Add this line to your application's Gemfile:
gem "token_estimator"
And then execute:
bundle install
count_tokens_from_text
Count tokens from a given text.
require "token_estimator"
tokenizer_name = "gpt2"
estimator = TokenEstimator::Estimator.new(tokenizer_name)
text = "Your sample text here."
token_estimation = estimator.count_tokens_from_text(text)
puts "Token estimation: #{token_estimation}"
count_tokens_from_file
Count tokens from a file. The file type is determined by the file extension.
require "token_estimator"
file_path = "spec/fixtures/files/lorem.pdf"
tokenizer_name = "gpt2"
estimator = TokenEstimator::Estimator.new(tokenizer_name)
token_estimation = estimator.count_tokens_from_file(file_path)
puts "Token estimation: #{token_estimation}"
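To make the extension-based dispatch described above concrete, here is a minimal sketch of how a file's extension could select a reader. It is not the gem's implementation: `SketchEstimator`, `READERS`, and `read_plain` are hypothetical names, and the whitespace split is a toy stand-in for a real tokenizer such as GPT-2's.

```ruby
require "tempfile"

# Hypothetical stand-in, NOT the gem's implementation: it only illustrates
# choosing a reader from the file extension.
class SketchEstimator
  class UnsupportedFileTypeError < StandardError; end

  # Maps a file extension to the private method that reads that format.
  READERS = {
    ".txt" => :read_plain,
    ".md"  => :read_plain
  }.freeze

  def count_tokens_from_text(text)
    # Toy tokenizer: splits on whitespace. A real tokenizer would
    # generally produce different counts.
    text.split.size
  end

  def count_tokens_from_file(path)
    ext = File.extname(path).downcase
    reader = READERS.fetch(ext) do
      raise UnsupportedFileTypeError, "Unsupported file type: #{ext}"
    end
    count_tokens_from_text(send(reader, path))
  end

  private

  def read_plain(path)
    File.read(path)
  end
end

# Usage: write a small .txt file and count its whitespace-separated tokens.
Tempfile.create(["sample", ".txt"]) do |file|
  file.write("four short words here")
  file.flush
  puts SketchEstimator.new.count_tokens_from_file(file.path) # prints 4
end
```

Keeping the extension-to-reader mapping in a single table means a new format needs only one new entry and one reader method.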
count_tokens_from_excel_file
Counts tokens from an Excel (.xlsx) file.
count_tokens_from_csv_file
Counts tokens from a CSV file.
count_tokens_from_pdf_file
Counts tokens from a PDF file.
count_tokens_from_txt_file
Counts tokens from a plain text (.txt) file.
count_tokens_from_markdown_file
Counts tokens from a Markdown (.md) file.
count_tokens_from_json_file
Counts tokens from a JSON file.
count_tokens_from_html_file
Counts tokens from an HTML file.
count_tokens_from_json
Counts tokens from a JSON object.
count_tokens_from_html
Counts tokens from an HTML string.
TokenEstimator::Estimator::SUPPORTED_FILE_TYPES
Return the supported file types.
Here is a checklist of the formats we currently support for token counting and those we plan to support in the future:
If you try to count tokens from an unsupported file type, the gem raises a TokenEstimator::UnsupportedFileTypeError:
begin
  token_count = estimator.count_tokens_from_file("path/to/your/file.unsupported")
rescue TokenEstimator::UnsupportedFileTypeError => e
  puts e.message
end
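When counting tokens across many files, the same rescue pattern can be wrapped in a helper so that one unsupported file does not abort the whole run. The sketch below is an illustration only: `count_tokens_for_files` and `FakeEstimator` are hypothetical names. The helper accepts any object that responds to `count_tokens_from_file`, and the error class to rescue is passed in as an argument; with the gem, that would presumably be `TokenEstimator::UnsupportedFileTypeError`.

```ruby
# Hypothetical helper: counts tokens for each path and records the files
# that were skipped instead of stopping at the first failure.
def count_tokens_for_files(estimator, paths, unsupported_error: StandardError)
  counts = {}
  skipped = {}
  paths.each do |path|
    begin
      counts[path] = estimator.count_tokens_from_file(path)
    rescue unsupported_error => e
      skipped[path] = e.message
    end
  end
  { counts: counts, skipped: skipped }
end

# Stand-in estimator so the sketch runs without the gem installed.
# It accepts only .txt paths and returns a fixed count.
class FakeEstimator
  class UnsupportedError < StandardError; end

  def count_tokens_from_file(path)
    ext = File.extname(path)
    raise UnsupportedError, "Unsupported file type: #{ext}" unless ext == ".txt"
    42
  end
end

result = count_tokens_for_files(
  FakeEstimator.new,
  ["notes.txt", "archive.xyz"],
  unsupported_error: FakeEstimator::UnsupportedError
)
puts result[:counts].inspect
puts result[:skipped].inspect
```

Passing the error class as a parameter keeps the helper independent of any particular estimator, at the cost of the caller having to name the right class.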
To contribute, fork the repository, create a new branch, and submit a pull request for review. Please write tests for your contributions and follow the project's coding standards.
The gem is available as open source under the terms of the MIT License.