Farsi Normalizer
FarsiProcessor is a Ruby gem that normalizes and stems Persian/Farsi text.
Normalization is defined as:
Stemming is defined as removing these suffixes (+ suffixes of plural form)
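To make the idea of suffix stripping concrete, here is a minimal sketch in plain Ruby. It is not the gem's implementation: the `SketchStemmer` module and its `SUFFIXES` list are hypothetical, and the list contains only the two suffixes that appear in this README's usage examples, so the gem's real suffix table is likely longer.

```ruby
# Hypothetical sketch, not FarsiProcessor's source code.
module SketchStemmer
  # Assumption: only the suffixes seen in this README's examples.
  SUFFIXES = [
    "\u0647\u0627\u06CC", # "های"
    "\u0647\u0627"        # "ها"
  ].freeze

  # Removes one trailing suffix, together with any whitespace or
  # zero-width non-joiner (U+200C) that separates it from the word.
  def self.stem(text, suffixes: SUFFIXES)
    # Longest suffix first, so "های" is tried before "ها".
    pattern = Regexp.union(suffixes.sort_by { |s| -s.length })
    text.sub(/[\s\u200C]*(?:#{pattern})\z/, "")
  end
end
```

Sorting the suffixes by length matters because a shorter suffix can be a prefix of a longer one; anchoring the pattern at the end of the string keeps the sketch from touching the same letters inside a word.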
Installation
Add this line to your application's Gemfile:
gem "farsi_processor"
And then execute:
$ bundle
Or install it yourself as:
$ gem install farsi_processor
Usage
require 'farsi_processor'
[1] pry(main)> FarsiProcessor.process("ك")
=> "ک"
[2] pry(main)> FarsiProcessor.process("کتاب ها")
=> "کتاب"
[3] pry(main)> FarsiProcessor.process("ك ي", only: ["ك"])
=> "ک ي"
[4] pry(main)> FarsiProcessor.process("ك ي", except: ["ك"])
=> "ك ی"
[5] pry(main)> FarsiProcessor.process('دخترهای', except: ['های'])
=> "دختره"
[6] pry(main)> FarsiProcessor.normalize("ك")
=> "ک"
[7] pry(main)> FarsiProcessor.stem("کتاب ها")
=> "کتاب"
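The `only:` and `except:` options in the session above select which rules are applied. The following sketch shows one way such filtering could work for character normalization. It is an illustration under assumptions, not the gem's code: `SketchNormalizer` and `CHAR_MAP` are hypothetical names, and the map holds only the two character pairs that appear in the examples.

```ruby
# Hypothetical sketch, not FarsiProcessor's source code.
module SketchNormalizer
  # Assumption: only the two pairs shown in this README's examples.
  CHAR_MAP = {
    "\u0643" => "\u06A9", # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A" => "\u06CC"  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
  }.freeze

  # Replaces mapped characters; only:/except: restrict the active rules.
  def self.normalize(text, only: nil, except: nil)
    rules = CHAR_MAP
    rules = rules.select { |from, _| only.include?(from) } if only
    rules = rules.reject { |from, _| except.include?(from) } if except
    return text.dup if rules.empty?

    # gsub with a Hash replaces each match with the value stored under it.
    text.gsub(Regexp.union(rules.keys), rules)
  end
end

SketchNormalizer.normalize("\u0643 \u064A", only: ["\u0643"])   # => "\u06A9 \u064A"
SketchNormalizer.normalize("\u0643 \u064A", except: ["\u0643"]) # => "\u0643 \u06CC"
```

Filtering the rule table before building the pattern keeps the replacement itself a single pass over the string, whichever options are given.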
Questions or Problems?
If you run into an issue with farsi_processor and cannot find a solution, please open an issue on GitHub, or fork the project and send a pull request.
License
The gem is available as open source under the terms of the MIT License.