TokenEstimator is a Rails gem that provides functionality to count tokens in various file formats and text inputs using different tokenizers.
Add this line to your application's Gemfile:
gem "token_estimator"
And then execute:
bundle install
count_tokens_from_text
Count tokens from a given text.
require "token_estimator"
tokenizer_name = "gpt2"
estimator = TokenEstimator::Estimator.new(tokenizer_name)
text = "Your sample text here."
token_estimation = estimator.count_tokens_from_text(text)
puts "Token estimation: #{token_estimation}"
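A common use for a text count is checking whether a prompt fits a model's context limit. The following is a minimal sketch of that check, not part of the gem: `fits_token_budget?` is a hypothetical helper, and `WordCountEstimator` is a stand-in that counts whitespace-separated words so the example runs without the gem installed. It assumes `count_tokens_from_text` returns an integer.

```ruby
# Stand-in estimator, used only so this sketch runs without the gem.
# It counts whitespace-separated words, which is NOT how a real tokenizer works.
class WordCountEstimator
  def count_tokens_from_text(text)
    text.split.size
  end
end

# Hypothetical helper: true when the estimated count fits within `limit`.
# `estimator` is any object that responds to count_tokens_from_text.
def fits_token_budget?(estimator, text, limit)
  estimator.count_tokens_from_text(text) <= limit
end

estimator = WordCountEstimator.new
puts fits_token_budget?(estimator, "Your sample text here.", 10)
```

In an application, you would pass a `TokenEstimator::Estimator` instance in place of the stand-in.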
count_tokens_from_file
Count tokens from a file. The file type is determined by the file extension.
require "token_estimator"
file_path = "spec/fixtures/files/lorem.pdf"
tokenizer_name = "gpt2"
estimator = TokenEstimator::Estimator.new(tokenizer_name)
token_estimation = estimator.count_tokens_from_file(file_path)
puts "Token estimation: #{token_estimation}"
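To illustrate what "determined by the file extension" can mean in practice, here is a sketch of an extension lookup that maps a path to one of the per-format methods documented in this README. The table `EXTENSION_TO_METHOD` and the helper `method_for_file` are hypothetical; the gem's internal dispatch may differ, and it raises its own error class rather than `ArgumentError`.

```ruby
# Hypothetical mapping from file extension to the per-format method names
# documented in this README. The gem's real internal table may differ.
EXTENSION_TO_METHOD = {
  ".xlsx" => :count_tokens_from_excel_file,
  ".csv"  => :count_tokens_from_csv_file,
  ".pdf"  => :count_tokens_from_pdf_file,
  ".txt"  => :count_tokens_from_txt_file,
  ".md"   => :count_tokens_from_markdown_file,
  ".json" => :count_tokens_from_json_file,
  ".html" => :count_tokens_from_html_file
}.freeze

# Returns the method name for a path, matching the extension case-insensitively.
def method_for_file(file_path)
  extension = File.extname(file_path).downcase
  EXTENSION_TO_METHOD.fetch(extension) do
    raise ArgumentError, "unsupported file type: #{extension.inspect}"
  end
end

puts method_for_file("spec/fixtures/files/lorem.pdf")
```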
count_tokens_from_excel_file
Counts tokens from an Excel (.xlsx) file.
count_tokens_from_csv_file
Counts tokens from a CSV file.
count_tokens_from_pdf_file
Counts tokens from a PDF file.
count_tokens_from_txt_file
Counts tokens from a plain text (.txt) file.
count_tokens_from_markdown_file
Counts tokens from a Markdown (.md) file.
count_tokens_from_json_file
Counts tokens from a JSON file.
count_tokens_from_html_file
Counts tokens from an HTML file.
count_tokens_from_json
Counts tokens from a JSON object.
count_tokens_from_html
Counts tokens from an HTML string.
TokenEstimator::Estimator::SUPPORTED_FILE_TYPES
A constant that lists the supported file types.
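One way to use this constant is to check a file before trying to count it. The sketch below is hedged: this README does not show the constant's exact shape, so the hypothetical helper `supported_file?` takes the list as an argument and assumes it is an enumerable of strings or symbols, with or without a leading dot.

```ruby
# Hypothetical helper: true when the file's extension appears in `supported_types`.
# Entries may be strings or symbols, with or without a leading dot (an assumption).
def supported_file?(file_path, supported_types)
  extension = File.extname(file_path).delete_prefix(".").downcase
  normalized = supported_types.map { |type| type.to_s.delete_prefix(".").downcase }
  !extension.empty? && normalized.include?(extension)
end

# Assumed shape, for illustration only. In an application you would pass
# TokenEstimator::Estimator::SUPPORTED_FILE_TYPES instead.
example_types = %w[xlsx csv pdf txt md json html]

puts supported_file?("report.PDF", example_types)
puts supported_file?("archive.zip", example_types)
```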
Here is a checklist of the formats we currently support for token counting and those we plan to support in the future:
If you try to count tokens from an unsupported file type, the gem raises a TokenEstimator::UnsupportedFileTypeError:
begin
token_count = estimator.count_tokens_from_file("path/to/your/file.unsupported")
rescue TokenEstimator::UnsupportedFileTypeError => e
puts e.message
end
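When processing many files, it can be useful to skip unsupported ones instead of aborting the whole batch. The sketch below shows one possible pattern. `count_tokens_for_files` is a hypothetical helper, and `StubEstimator` and `StubUnsupportedFileTypeError` are stand-ins so the example runs without the gem; in an application you would pass a real estimator and `TokenEstimator::UnsupportedFileTypeError`.

```ruby
# Stand-ins so the sketch runs without the gem installed.
class StubUnsupportedFileTypeError < StandardError; end

class StubEstimator
  # Pretends only .txt files are supported and returns a fixed placeholder count.
  def count_tokens_from_file(file_path)
    unless File.extname(file_path) == ".txt"
      raise StubUnsupportedFileTypeError, "Unsupported file type: #{File.extname(file_path)}"
    end
    42
  end
end

# Hypothetical helper: counts tokens per path and collects unsupported
# files in a separate list instead of letting the error propagate.
def count_tokens_for_files(estimator, file_paths, error_class)
  counts = {}
  skipped = []
  file_paths.each do |path|
    begin
      counts[path] = estimator.count_tokens_from_file(path)
    rescue error_class
      skipped << path
    end
  end
  [counts, skipped]
end

counts, skipped = count_tokens_for_files(
  StubEstimator.new, ["notes.txt", "data.bin"], StubUnsupportedFileTypeError
)
puts "Counted: #{counts.keys.join(', ')}"
puts "Skipped: #{skipped.join(', ')}"
```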
To contribute, fork the repository, create a new branch, and submit a pull request for review. Please write tests for your contributions and follow the coding standards set in the project.
The gem is available as open source under the terms of the MIT License.