= Tokenizer
{RubyGems}[http://rubygems.org/gems/tokenizer] | {Homepage}[http://bu.chsta.be/projects/tokenizer] | {Source Code}[https://github.com/arbox/tokenizer] | {Bug Tracker}[https://github.com/arbox/tokenizer/issues]
== DESCRIPTION
A simple multilingual tokenizer -- a linguistic tool intended to split a written text into tokens for NLP tasks. This tool provides a CLI and a library for linguistic tokenization, which is an unavoidable step for many HLT (Human Language Technology) tasks in the preprocessing phase for further syntactic, semantic and other higher-level processing goals.
The tokenization task involves Sentence Segmentation, Word Segmentation and Boundary Disambiguation for both tasks.
Use it for tokenization of German, English and Dutch texts.
=== Implemented Algorithms to be ...
== INSTALLATION
+Tokenizer+ is provided as a .gem package. Simply install it via {RubyGems}[http://rubygems.org/gems/tokenizer].
To install +tokenizer+, issue the following command:

  $ gem install tokenizer

If you want to do a system-wide installation, do this as root (possibly using +sudo+).
Alternatively use your Gemfile for dependency management.
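A minimal Gemfile entry might look like the sketch below. It assumes the default RubyGems index as the source and leaves the version unconstrained; pin a version if your project needs one.

```ruby
# Gemfile (sketch; the source URL and the missing version constraint are assumptions)
source 'https://rubygems.org'

gem 'tokenizer'
```

After adding the entry, run <tt>bundle install</tt> to fetch the gem.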
== SYNOPSIS
You can use +Tokenizer+ in two ways.
As a command line tool:

  $ echo 'Hi, ich gehe in die Schule!' | tokenize

As a library for embedded tokenization:

  require 'tokenizer'
  de_tokenizer = Tokenizer::WhitespaceTokenizer.new
  de_tokenizer.tokenize('Ich gehe in die Schule!')
  # => ["Ich", "gehe", "in", "die", "Schule", "!"]

The PRE and POST lists are customizable:

  require 'tokenizer'
  de_tokenizer = Tokenizer::WhitespaceTokenizer.new(:de, { post: Tokenizer::Tokenizer::POST + ['|'] })
  de_tokenizer.tokenize('Ich gehe|in die Schule!')
  # => ["Ich", "gehe", "|in", "die", "Schule", "!"]

See documentation in the Tokenizer::WhitespaceTokenizer class for details on particular methods.
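To illustrate the general idea behind PRE and POST lists, here is a minimal, self-contained sketch in plain Ruby. It is not the gem's implementation. The method name +sketch_tokenize+ and the contents of +SKETCH_PRE+ and +SKETCH_POST+ are assumptions made for this example. The sketch only peels listed characters off the edges of whitespace-separated fragments, so it is not expected to reproduce the gem's output for characters inside a fragment.

```ruby
# Hypothetical illustration only; names and list contents are assumptions,
# not part of the tokenizer gem.
SKETCH_PRE  = ['(', '[', '"', "'"].freeze
SKETCH_POST = ['!', '?', ',', '.', ':', ';', ')', ']', '"', "'"].freeze

def sketch_tokenize(text, pre: SKETCH_PRE, post: SKETCH_POST)
  # Split on whitespace, then refine each fragment.
  text.split.flat_map do |fragment|
    leading = []
    trailing = []

    # Peel listed prefix characters off the front, one at a time.
    while fragment.length > 1 && pre.include?(fragment[0])
      leading << fragment[0]
      fragment = fragment[1..-1]
    end

    # Peel listed suffix characters off the end, one at a time.
    while fragment.length > 1 && post.include?(fragment[-1])
      trailing.unshift(fragment[-1])
      fragment = fragment[0...-1]
    end

    leading + [fragment] + trailing
  end
end

p sketch_tokenize('Ich gehe in die Schule!')
p sketch_tokenize('(Hallo) Welt|', post: SKETCH_POST + ['|'])
```

Passing the lists as keyword arguments mirrors the idea of extending a default list, as in the POST example above, without modifying the defaults.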
== SUPPORT
If you have questions, bug reports or suggestions, please drop me an email :) Any help is deeply appreciated!
== CHANGELOG
For details on future plans and work in progress, see CHANGELOG.rdoc.
== CAUTION
This library is a work in progress! Though the interface is mostly complete, you might encounter some features that are not yet implemented.
Please contact me with your suggestions, bug reports and feature requests.
== LICENSE
+Tokenizer+ is copyrighted software by Andrei Beliankou, 2011-
You may use, redistribute and change it under the terms provided in the LICENSE.rdoc file.