Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

ruby-pinyin

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

ruby-pinyin

  • 0.5.0
  • Rubygems
  • Socket score

Version published
Maintainers
1
Created
Source

ruby-pinyin: 支持多音字的汉字转拼音工具

Build Status

ruby-pinyin: zhī chí duō yīn zì de hàn zì zhuǎn pīn yīn gōng jù

ruby-pinyin可以把汉字转化为对应的拼音,并能够较好的处理多音字的情况。比如:

    PinYin.of_string('南京市长江大桥', :ascii)

能够正确的将“长”转为"chang2", 而不是"zhang3".

Features

  • 支持多音字
  • 使用最新的UNICODE数据(6.3.0 published at 2013/02/26)
  • 能够显示数字或者UNICODE音调(eg: 'cao1', 'cāo')
  • 丰富的API
  • 支持中英文标点混合字符串
  • 中文标点转为英文标点
  • 支持自定义读音

Installation

    gem install ruby-pinyin

或者把ruby-pinyin加入你的Gemfile:

    gem 'ruby-pinyin'

Examples

    # encoding: utf-8
    require 'ruby-pinyin'

    # return ['jie', 'cao']
    PinYin.of_string('节操')

    # return ['jie2', 'cao1']
    PinYin.of_string('节操', true)
    PinYin.of_string('节操', :ascii)

    # return ["jié", "cāo"]
    PinYin.of_string('节操', :unicode)

    # 正确处理多音字: return ["nán", "jīng", "shì", "cháng", "jiāng", "dà", "qiáo"]
    PinYin.of_string('南京市长江大桥', :unicode)

    # return %w(gan xie party gan xie guo jia)
    PinYin.of_string('感谢party感谢guo jia')

    # return 'gan-xie-party-gan-xie-guo-jia'
    PinYin.permlink('感谢party感谢guo jia')

    # return 'gxpartygxguojia'
    PinYin.abbr('感谢party感谢guo jia')

    # return 'gan xie party, gan xie guo jia!'
    # PinYin.sentence保留标点符号, 同时用对应英文标点代替中文标点
    PinYin.sentence('感谢party, 感谢guo家!')

    # override readings with your own data file
    PinYin.override_files = [File.expand_path('../my.dat', __FILE__)]

更多的例子和参数请参考测试用例

配置

ruby-pinyin有两个PinYin::Backend: PinYin::Backend::Simple 以及PinYin::Backend::MMSeg. 默认是使用MMSeg backend, 支持多音字识别。如果你不需要多音字识别,或是对内存使用要求很高,或是有其它任何原因想要fallback到Simple backend, 可以如下配置:

PinYin.backend = PinYin::Backend::Simple.new

自定义发音

通过PinYin.override_files可以自定义某些字的发音。自定义的数据以普通文本文件存放,每行定义一个字的发音,以ASCII空格将汉字的unicode编码和拼音隔开。格式可参考lib/ruby-pinyin/data/Mandarin.dat文件。

欢迎任何帮助

如果你喜欢这个项目,请通过(不限)以下方式帮助她!

  • 各种使用
  • 各种宣传
  • 各种报告bug, 提供建议 (github issue tracker)
  • 各种修bug, 实现feature (github pull request)

LICENSE

BSD LICENSE

ruby-pinyin中的拼音数据由作者整理自互联网,你可以在ruby-pinyin之外的地方任意使用,但是请注明数据来自ruby-pinyin :-)

Contributors

FAQs

Package last updated on 21 Apr 2016

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc