enamdict

Efficiently query ENAMDICT using Node.js.

Version: 0.1.2

node-enamdict

A module for efficiently querying name records from ENAMDICT, a Japanese-English mapping of proper names. It is designed for one use case: finding a good English/Kana/Kanji mapping for given names and surnames. These mappings are especially hard to find, and ENAMDICT appears to offer the best available one. At this time all other entries in ENAMDICT (place names, full names, company names, etc.) are ignored.

This utility was created to correct artist names in the romaji-name library, which is used in the Ukiyo-e.org service, created by John Resig. All code is available under an MIT license.

Example Usage:

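// Path used inside this repository; when the package is installed
// from npm, use require("enamdict") instead.
var enamdict = require("./enamdict");

// init() loads the reduced dictionary and must finish before any lookup.
enamdict.init(function() {
    // Look up by romaji (the optimized search path).
    var entries = enamdict.find("utagawa");
    console.log("Romaji:", entries.romaji());
    console.log("Kana:", entries.kana());
    console.log("Kanji:", entries.kanji());
    console.log("Type:", entries.type());

    // Look up by kanji (not optimized, may be slow).
    entries = enamdict.findKanji("曷川");
    console.log("Romaji:", entries.romaji());
    console.log("Kana:", entries.kana());
    console.log("Kanji:", entries.kanji());
    console.log("Type:", entries.type());
});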

Sample Output:

# From `find()`
Romaji: Utagawa
Kana: うたがわ
Kanji: [ '哥川', '唄川', '宇多川', '宇田川', '歌川', '詩川', '雅楽川' ]
Type: surname

# From `findKanji()`
Romaji: [ 'katsugawa', 'katsukawa' ]
Kana: [ 'かつがわ', 'かつかわ' ]
Kanji: 曷川
Type: surname

Installation

This package can be installed by running:

npm install enamdict

ENAMDICT Pre-Processing

When this package is installed, a copy of ENAMDICT is downloaded from [ftp.monash.edu.au](http://ftp.monash.edu.au/pub/nihongo/enamdict.gz). A couple of optimizations are then performed to speed up searches and to reduce the file size of the dictionary.

  • To start, ENAMDICT is converted from an EUC-JP encoding to the more widely used UTF-8 encoding.
  • All entries that aren't "surname", "given", "male" (given), "female" (given), or "unknown" are removed.
  • Extraneous non-name details are stripped from the entry (such as the years in which the individual lived).
  • All entries that aren't an individual name part are removed. (e.g. "hiroshige" is kept but "utagawa hiroshige" is removed)
  • Only the "romaji", "kana", "kanji", and "type" fields are preserved; everything else is removed.
  • All the entries are then sorted by their romaji name (to improve lookup performance).

The result is written to a new enamdict.gz file in the same directory as the enamdict.js script itself. For comparison, the original ENAMDICT file is 7.2MB, whereas the reduced one is only 2.8MB.
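
The filtering and sorting steps above can be sketched roughly as follows. This is only an illustration, not the module's actual build script: it assumes the dictionary has already been decoded and parsed into plain objects, and the names `reduceEntries`, `KEPT_TYPES`, `sampleRecords`, and the `note` field are hypothetical.

```javascript
// Hypothetical sketch of the reduction steps; not the module's real code.
// Assumes records were already decoded to UTF-8 and parsed into objects.
const KEPT_TYPES = new Set(["surname", "given", "male", "female", "unknown"]);

function reduceEntries(records) {
    return records
        // Keep only the name types listed above.
        .filter((r) => KEPT_TYPES.has(r.type))
        // Strip parenthesized details such as "(1760-1849)" from the romaji.
        .map((r) => ({ ...r, romaji: r.romaji.replace(/\s*\(.*?\)\s*/g, " ").trim() }))
        // Drop anything that is not a single name part.
        .filter((r) => !/\s/.test(r.romaji))
        // Preserve only the four documented fields.
        .map(({ romaji, kana, kanji, type }) => ({ romaji, kana, kanji, type }))
        // Sort by romaji (case-insensitively) so lookups can binary search.
        .sort((a, b) => {
            const x = a.romaji.toLowerCase();
            const y = b.romaji.toLowerCase();
            return x < y ? -1 : x > y ? 1 : 0;
        });
}

// Made-up input records, for illustration only.
const sampleRecords = [
    { romaji: "Utagawa", kana: "うたがわ", kanji: "歌川", type: "surname", note: "extra" },
    { romaji: "Tokyo", kana: "とうきょう", kanji: "東京", type: "place" },
    { romaji: "Hokusai (1760-1849)", kana: "ほくさい", kanji: "北斎", type: "given" },
    { romaji: "Utagawa Hiroshige", kana: "うたがわひろしげ", kanji: "歌川広重", type: "unknown" },
    { romaji: "Hiroshige", kana: "ひろしげ", kanji: "広重", type: "given" }
];

console.log(reduceEntries(sampleRecords).map((r) => r.romaji));
// → [ 'Hiroshige', 'Hokusai', 'Utagawa' ]
```

Sorting once at install time is what makes the romaji lookup cheap later; the cost is paid during pre-processing rather than on every query.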

Methods

.init(callback)

Asynchronously loads the previously-generated reduced ENAMDICT. Must be called before attempting to call .find() or .findKanji().

.find(romajiName)

Finds matching entries by Romaji name (English name). This is the default search mechanism, and the search index is optimized for this particular method. Returns an Entries object.

.findKanji(kanjiName)

Finds matching entries by Kanji name (Japanese name). The search index is NOT optimized for this particular method and may be slow. Returns an Entries object.
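
To illustrate why a romaji lookup can be fast while a kanji lookup may be slow, here is a minimal sketch over an in-memory array sorted by romaji. It is an assumption about how such an index could work, not the module's actual implementation; `findByRomaji`, `findByKanji`, and `sortedEntries` are hypothetical names, and the sample data is taken from the sample output above.

```javascript
// Hypothetical sketch; the real module's index may work differently.
// Entries are assumed to be sorted by lowercase romaji.
const sortedEntries = [
    { romaji: "katsugawa", kana: "かつがわ", kanji: "曷川", type: "surname" },
    { romaji: "katsukawa", kana: "かつかわ", kanji: "曷川", type: "surname" },
    { romaji: "utagawa", kana: "うたがわ", kanji: "哥川", type: "surname" },
    { romaji: "utagawa", kana: "うたがわ", kanji: "歌川", type: "surname" }
];

// Binary search for the first match, then collect the adjacent matches.
function findByRomaji(entries, name) {
    const key = name.toLowerCase();
    let lo = 0;
    let hi = entries.length;
    while (lo < hi) {
        const mid = (lo + hi) >>> 1;
        if (entries[mid].romaji.toLowerCase() < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    const matches = [];
    for (let i = lo; i < entries.length && entries[i].romaji.toLowerCase() === key; i++) {
        matches.push(entries[i]);
    }
    return matches;
}

// The array is not sorted by kanji, so every entry has to be checked.
function findByKanji(entries, kanji) {
    return entries.filter((e) => e.kanji === kanji);
}

console.log(findByRomaji(sortedEntries, "Utagawa").map((e) => e.kanji));
// → [ '哥川', '歌川' ]
console.log(findByKanji(sortedEntries, "曷川").map((e) => e.romaji));
// → [ 'katsugawa', 'katsukawa' ]
```

The romaji search takes a logarithmic number of comparisons, while the kanji search is linear in the size of the dictionary.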

Models

Entries

The result object returned from the .find() and .findKanji() methods. Holds a collection of entries that are then used in aggregate.

.entries()

Returns an array of objects representing matching entries. The objects have the following properties:

  • romaji: A string holding an English (Romaji) representation of a name.
  • kana: A string holding a Kana representation of a name.
  • kanji: A string holding a Kanji representation of a name.
  • type: A string that represents the type of the name. Possible values are: "surname", "given", or "unknown".

.type()

Returns the most common type of the name, aggregated from all matching entries. For example, if 5 entries were found, 3 of which were "surname", 1 of which was "given", and 1 of which was "unknown", then this method would return "surname". Returns the same possible values as the type property itself.
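
The aggregation described above amounts to a majority vote. A minimal sketch, assuming plain entry objects with a `type` property; `mostCommonType` is a hypothetical helper, and its tie-breaking rule (the first type seen wins) is an assumption, since tie handling is not documented here.

```javascript
// Hypothetical sketch of the majority vote; not the module's real code.
function mostCommonType(entries) {
    // Count how often each type occurs.
    const counts = new Map();
    for (const { type } of entries) {
        counts.set(type, (counts.get(type) || 0) + 1);
    }
    // Pick the type with the highest count; on a tie the first seen wins.
    let best;
    let bestCount = 0;
    for (const [type, count] of counts) {
        if (count > bestCount) {
            best = type;
            bestCount = count;
        }
    }
    return best;
}

const found = [
    { type: "surname" },
    { type: "given" },
    { type: "surname" },
    { type: "unknown" },
    { type: "surname" }
];

console.log(mostCommonType(found));
// → surname
```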

.kana()

If a query was done with .find() then this will return a string representing the Kana reading of the name.

If a query was done with .findKanji() then this will return an array of all the possible Kana readings of the Kanji.

.romaji()

If a query was done with .find() then this will return a string representing the Romaji reading of the name.

If a query was done with .findKanji() then this will return an array of all the possible Romaji readings of the Kanji.

.kanji()

If a query was done with .find() then this will return an array of all the possible Kanji versions of the name.

If a query was done with .findKanji() then this will return a string representing the Kanji version of the name.
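
Because .kana(), .romaji(), and .kanji() return either a string or an array depending on which search method was used, calling code may want to normalize the result before iterating. A minimal sketch; `toArray` is a hypothetical helper that is not part of this module, and treating a missing value as an empty list is an assumption.

```javascript
// Hypothetical helper, not part of enamdict: always hand back an array.
function toArray(value) {
    // Treat a missing value as "no results" (an assumption).
    if (value === undefined || value === null) {
        return [];
    }
    return Array.isArray(value) ? value : [value];
}

console.log(toArray("うたがわ"));
// → [ 'うたがわ' ]
console.log(toArray(["かつがわ", "かつかわ"]));
// → [ 'かつがわ', 'かつかわ' ]
```

With this in place, the same loop can handle results from both .find() and .findKanji().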

Package last updated on 23 Aug 2014
