The unidecode npm package is used to transliterate Unicode text into plain ASCII characters. This is particularly useful for converting non-Latin scripts into a readable and searchable format.
Basic Transliteration
This feature allows you to convert non-Latin scripts into their closest ASCII representation. For example, Chinese characters are converted to their pinyin equivalents.
const unidecode = require('unidecode');
console.log(unidecode('你好,世界')); // Output: 'Ni Hao ,Shi Jie ' (each ideograph becomes a capitalized syllable followed by a space)
Handling Accented Characters
This feature removes accents from Latin characters, making them plain ASCII. This is useful for normalizing text for search or comparison.
const unidecode = require('unidecode');
console.log(unidecode('Café')); // Output: Cafe
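Unidecode itself works from lookup tables, but for accented Latin letters alone a similar result can be sketched with the built-in `String.prototype.normalize`. Decomposing to NFD splits each accented letter into a base letter plus a combining mark, and the marks in the Combining Diacritical Marks block (U+0300 to U+036F) can then be deleted. This is a swapped-in technique, not unidecode's implementation, and the `stripAccents` name is hypothetical.

```javascript
// Hypothetical helper (not part of unidecode): strips accents using
// Unicode canonical decomposition instead of a lookup table.
function stripAccents(text) {
  return text
    .normalize('NFD')                 // 'é' (U+00E9) becomes 'e' + U+0301 COMBINING ACUTE ACCENT
    .replace(/[\u0300-\u036f]/g, ''); // delete the combining marks, keep the base letters
}

console.log(stripAccents('Café'));         // Cafe
console.log(stripAccents('Crème brûlée')); // Creme brulee
console.log(stripAccents('Æther'));        // Æther (no decomposition exists, so it is left as-is)
```

The last line shows the limit of this approach: letters such as Æ, ø, or ł have no canonical decomposition, so normalization alone cannot turn them into ASCII.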
Transliteration of Special Characters
This feature converts special characters and ligatures into their ASCII equivalents, making the text more universally readable.
const unidecode = require('unidecode');
console.log(unidecode('Æther')); // Output: AEther
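Compatibility normalization (NFKD) does split some ligatures, for example U+FB01 LATIN SMALL LIGATURE FI into "fi", but Æ, Œ, and ß have no decomposition in Unicode, which is why a table-driven tool is needed for them. The sketch below assumes a small hand-written fallback map; the `NO_DECOMPOSITION` entries and the `toAsciiSketch` name are illustrative assumptions, not unidecode's data or API.

```javascript
// Illustrative, deliberately incomplete map for letters that Unicode
// normalization leaves untouched. These entries are assumptions, not unidecode's tables.
const NO_DECOMPOSITION = {
  'Æ': 'AE', 'æ': 'ae', 'Œ': 'OE', 'œ': 'oe', 'ß': 'ss',
  'Ø': 'O', 'ø': 'o', 'Ł': 'L', 'ł': 'l',
};

function toAsciiSketch(text) {
  return text
    .normalize('NFKD')               // '\uFB01' -> 'fi', 'é' -> 'e' + U+0301
    .replace(/[\u0300-\u036f]/g, '') // remove combining accents
    // Map any remaining non-ASCII code unit via the table, or drop it.
    .replace(/[^\x00-\x7f]/g, (ch) => NO_DECOMPOSITION[ch] ?? '');
}

console.log(toAsciiSketch('Æther'));              // AEther
console.log(toAsciiSketch('\uFB01nal \u0152uvre')); // final OEuvre
console.log(toAsciiSketch('Straße'));             // Strasse
```

Anything not covered by normalization or the map is silently dropped here, which mirrors unidecode's default of returning an empty string for characters it cannot translate.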
The transliteration package provides similar functionality to unidecode by converting Unicode text to ASCII. It also supports custom transliteration rules, which can be useful for specific use cases.
The diacritics package focuses on removing diacritical marks from characters, similar to unidecode's handling of accented characters. It is simpler and more lightweight, making it suitable for projects that only need this specific functionality.
The slugify package converts strings into URL-friendly slugs, which includes transliterating non-ASCII characters to ASCII. While its primary use case is different, it offers transliteration capabilities similar to unidecode's.
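To show how transliteration feeds into slug generation, here is a minimal sketch that is not slugify's API or algorithm. It lowercases, strips accents via NFD, and collapses every run of other characters into a single hyphen. The `toSlug` name is hypothetical, and because this sketch simply discards non-Latin scripts, text such as Chinese would first need a table-based transliterator like unidecode.

```javascript
// Hypothetical slug helper; not the slugify package's implementation.
function toSlug(text) {
  return text
    .toLowerCase()                    // lowercase first so any marks it introduces get stripped below
    .normalize('NFD')                 // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .replace(/[^a-z0-9]+/g, '-')      // each run of non-alphanumeric characters becomes one hyphen
    .replace(/^-+|-+$/g, '');         // trim hyphens at either end
}

console.log(toSlug('  Café Crème: 100% Arabica! ')); // cafe-creme-100-arabica
```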
Unidecode is a JavaScript port of the Perl module Text::Unidecode. It takes UTF-8 data and tries to represent it in US-ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F). The representation is almost always an attempt at transliteration, that is, conveying in Roman letters the pronunciation expressed by the text in some other writing system.
See Text::Unidecode for the original README file, including methodology and limitations.
Note that all the files named 'x??.js' in the data directory are derived directly from the equivalent Perl files, and both sets of files are distributed under the Perl license, not the BSD license.
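The x??.js naming follows the Perl original, where each data file holds the transliterations for one block of 256 code points: the file is named after the code point's high byte, and the low byte indexes into it. The sketch below illustrates that lookup scheme with a tiny in-memory table instead of the real data files. The `TABLES` contents, `buildTable`, and `transliterate` are hypothetical, and the real package's internals may differ in detail. The optional `replacement` parameter imitates the substitution argument described further down.

```javascript
// Build a 256-slot table (indexed by low byte) from a sparse { lowByte: ascii } object.
function buildTable(entries) {
  const table = new Array(256).fill(undefined);
  for (const [low, ascii] of Object.entries(entries)) table[Number(low)] = ascii;
  return table;
}

// Hypothetical, heavily truncated tables keyed by the high byte of a code point.
const TABLES = {
  0x00: buildTable({ 0xc6: 'AE', 0xe0: 'a', 0xe7: 'c', 0xe9: 'e' }), // Latin-1 block
  0x4f: buildTable({ 0x60: 'Ni ' }),  // U+4F60 你
  0x59: buildTable({ 0x7d: 'Hao ' }), // U+597D 好
};

function transliterate(text, replacement = '') {
  let out = '';
  for (const ch of text) {                    // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp < 0x80) { out += ch; continue; }   // ASCII passes through unchanged
    // Only the Basic Multilingual Plane is covered: high byte picks the table.
    const table = cp <= 0xffff ? TABLES[cp >> 8] : undefined;
    const ascii = table ? table[cp & 0xff] : undefined; // low byte picks the entry
    out += ascii !== undefined ? ascii : replacement;   // unknown -> replacement
  }
  return out;
}

console.log(transliterate('Café'));                 // Cafe
console.log(JSON.stringify(transliterate('你好'))); // "Ni Hao "
console.log(transliterate('ab\uFFFFc', 'X'));       // abXc
console.log(transliterate('ab\uFFFFc'));            // abc
```

Splitting the data by high byte means an implementation only needs to load the blocks that actually occur in the input, rather than one table covering every code point.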
$ npm install unidecode
$ node
> var unidecode = require('unidecode');
> unidecode("aéà)àçé");
'aea)ace'
> unidecode("に間違いがないか、再度確認してください。再読み込みしてください。");
'niJian Wei iganaika, Zai Du Que Ren sitekudasai. Zai Du miIp misitekudasai. '
Characters that cannot be transliterated are replaced with an empty string, i.e. dropped. You can override this behavior by passing a custom substitution string as the second argument to unidecode:
$ node
> var unidecode = require('unidecode');
> unidecode("ab\uFFFFc", "X");
'abXc'
> unidecode("ab\uFFFFc");
'abc'
I maintain this project in my free time. If it helped you, please support my work via PayPal or Bitcoin. Thanks a lot!
I accept pull requests!
FAQs
ASCII transliterations of Unicode text
The npm package unidecode receives a total of 253,974 weekly downloads. As such, its popularity is classified as popular.
We found that unidecode demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has one open-source maintainer collaborating on the project.