Provides various sophisticated regular expressions to work with Emoji in strings, incorporating the latest Unicode / Emoji standards.
Additional features:
Emoji version: 16.0 (September 2024)
CLDR version (used for sub-region flags): 46 (October 2024)
gem "unicode-emoji"
The gem includes multiple Emoji regexes, which are compiled out of various Emoji Unicode data sources.
require "unicode/emoji"
string = "String which contains all types of Emoji sequences:
- Basic Emoji: 😴
- Textual Emoji with Emoji variation (VS16): ▶️
- Emoji with skin tone modifier: 🛌🏽
- Region flag: 🇵🇹
- Sub-Region flag: 🏴
- Keycap sequence: 2️⃣
- Skin tone modifier: 🏻
- Sequence using ZWJ (zero width joiner): 🤾🏽♀️
"
string.scan(Unicode::Emoji::REGEX) # => ["😴", "▶️", "🛌🏽", "🇵🇹", "🏴", "2️⃣", "🏻", "🤾🏽♀️"]
Depending on your exact use case, you can choose between multiple levels of Emoji detection:
Regex | Description | Example Matches | Example Non-Matches |
---|---|---|---|
Unicode::Emoji::REGEX | Use this one if unsure! Matches (non-textual) Basic Emoji and all kinds of recommended Emoji sequences (RGI/FQE) | 😴 , ▶️ , 🛌🏽 , 🇵🇹 , 2️⃣ , 🏴 , 🤾🏽♀️ , 🏻 | 🤾🏽♀ , 🏌♂️ , 😴︎ , ▶ , 🇵🇵 , 🏴 , 🤠🤢 , 1 , 1⃣ |
Unicode::Emoji::REGEX_VALID | Matches (non-textual) Basic Emoji and all kinds of valid Emoji sequences | 😴 , ▶️ , 🛌🏽 , 🇵🇹 , 2️⃣ , 🏴 , 🏴 , 🤾🏽♀️ , 🤾🏽♀ ,🏌♂️ , 🤠🤢 , 🏻 | 😴︎ , ▶ , 🇵🇵 , 1 , 1⃣ |
Unicode::Emoji::REGEX_WELL_FORMED | Matches (non-textual) Basic Emoji and all kinds of well-formed Emoji sequences | 😴 , ▶️ , 🛌🏽 , 🇵🇹 , 2️⃣ , 🏴 , 🏴 , 🤾🏽♀️ , 🤾🏽♀ ,🏌♂️ , 🤠🤢 , 🇵🇵 , 🏻 | 😴︎ , ▶ , 1 , 1⃣ |
Unicode::Emoji::REGEX_POSSIBLE | Matches all singleton Emoji, all kinds of Emoji sequences, and even non-Emoji singleton components like digits. Only exception: Unqualified keycap sequences are not matched | 😴 , ▶️ , 🛌🏽 , 🇵🇹 , 2️⃣ , 🏴 , 🏴 , 🤾🏽♀️ , 🤾🏽♀ , 🏌♂️ , 🤠🤢 , 🇵🇵 , 😴︎ , ▶ , 🏻 , 1 | 1⃣ |
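
To illustrate what a "possible Emoji" first check can look like, here is a minimal sketch in plain Ruby (no gem required). It is loosely modeled on the pattern suggested in the Unicode Emoji standard (UTS #51) and is not the gem's actual REGEX_POSSIBLE; the names POSSIBLE_SKETCH and possible_emoji? are hypothetical. It ignores text variation selectors, so it may disagree with the gem on such edge cases.

```ruby
# Hypothetical sketch, not the gem's implementation.
# Regional indicator symbols, used in pairs for region flags.
RI = /[\u{1F1E6}-\u{1F1FF}]/

# Optional tail after an Emoji character: a skin tone modifier,
# VS16 (optionally followed by the enclosing keycap), or a tag sequence.
TAIL = /(?:\p{EMod}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?/

# One element: a pair of regional indicators, or an Emoji character plus tail.
ELEMENT = /(?:#{RI}#{RI}|\p{Emoji}#{TAIL})/

# Elements may be chained with ZWJ (U+200D).
POSSIBLE_SKETCH = /#{ELEMENT}(?:\u{200D}#{ELEMENT})*/

# True if the whole string is a single possible Emoji under this sketch.
def possible_emoji?(str)
  str.match?(/\A#{POSSIBLE_SKETCH}\z/)
end
```

Under this sketch, a plain digit such as "1" passes (digits carry the Emoji property), while the unqualified keycap sequence "1\u{20E3}" does not, which mirrors the REGEX_POSSIBLE row above.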
By default, textual Emoji (Emoji characters followed by a text variation selector, or those that have a default text presentation) are not matched by the regexes (except by REGEX_POSSIBLE). To match them as well, append the _INCLUDE_TEXT suffix:
Regex | Description | Example Matches | Example Non-Matches |
---|---|---|---|
Unicode::Emoji::REGEX_INCLUDE_TEXT | REGEX + REGEX_TEXT | 😴 , ▶️ , 🛌🏽 , 🇵🇹 , 2️⃣ , 🏴 , 🤾🏽♀️ , 😴︎ , ▶ , 1⃣ , 🏻 | 🤾🏽♀ , 🏌♂️ , 🇵🇵 , 🏴 , 🤠🤢 , 1 |
Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT | REGEX_VALID + REGEX_TEXT | 😴 , ▶️ , 🛌🏽 , 🇵🇹 , 2️⃣ , 🏴 , 🏴 , 🤾🏽♀️ , 🤾🏽♀ , 🏌♂️ , 🤠🤢 , 😴︎ , ▶ , 1⃣ , 🏻 | 🇵🇵 , 1 |
Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT | REGEX_WELL_FORMED + REGEX_TEXT | 😴 , ▶️ , 🛌🏽 , 🇵🇹 , 2️⃣ , 🏴 , 🏴 , 🤾🏽♀️ , 🤾🏽♀ , 🏌♂️ , 🤠🤢 , 🇵🇵 , 😴︎ , ▶ , 1⃣ , 🏻 | 1 |
Regex | Description | Example Matches | Example Non-Matches |
---|---|---|---|
Unicode::Emoji::REGEX_INCLUDE_MQE | Like REGEX , but additionally includes Emoji with missing Emoji Presentation Variation Selectors, where the first partial Emoji has all required Variation Selectors | 😴 , ▶️ , 🛌🏽 , 🇵🇹 , 2️⃣ , 🏴 , 🤾🏽♀️ , 🤾🏽♀ , 🏻 | 🏌♂️ , 😴︎ , ▶ , 🇵🇵 , 🏴 , 🤠🤢 , 1 , 1⃣ |
Unicode::Emoji::REGEX_INCLUDE_MQE_UQE | Like REGEX , but additionally includes Emoji with missing Emoji Presentation Variation Selectors | 😴 , ▶️ , 🛌🏽 , 🇵🇹 , 2️⃣ , 🏴 , 🤾🏽♀️ , 🤾🏽♀ , 🏌♂️ , 🏻 | 😴︎ , ▶ , 🇵🇵 , 🏴 , 🤠🤢 , 1 , 1⃣ |
List of MQE and UQE Emoji sequences
Matches only simple one-codepoint (+ optional variation selector) Emoji:
Regex | Description | Example Matches | Example Non-Matches |
---|---|---|---|
Unicode::Emoji::REGEX_BASIC | Matches (non-textual) Basic Emoji, but no sequences at all | 😴 , ▶️ , 🏻 | 😴︎ , ▶ , 🛌🏽 , 🇵🇹 , 🇵🇵 ,2️⃣ , 🏴 , 🏴 , 🤾🏽♀️ , 🤾🏽♀ , 🏌♂️ , 🤠🤢 , 1 |
Unicode::Emoji::REGEX_TEXT | Matches only textual singleton Emoji | 😴︎ , ▶ | 😴 , ▶️ , 🏻 , 🛌🏽 , 🇵🇹 , 🇵🇵 ,2️⃣ , 🏴 , 🏴 , 🤾🏽♀️ , 🤾🏽♀ , 🏌♂️ , 🤠🤢 , 1 |
Here is a list of all Emoji that can be matched using the two regexes: character.construction/emoji-vs-text. The REGEX_BASIC regex also matches visual Emoji components (skin tone modifiers and hair components).
While REGEX_BASIC is part of the above regexes, REGEX_TEXT is only included in the *_INCLUDE_TEXT or *_UQE variants.
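
The difference between emoji and text presentation for a single character can be approximated with Ruby's native properties. The following is a rough sketch under stated assumptions, not the gem's REGEX_BASIC or REGEX_TEXT: the names BASIC_SKETCH, TEXT_SKETCH, and presentation are hypothetical, and keycap base characters (digits, `*`, `#`) are simply excluded.

```ruby
# Hypothetical sketch, not the gem's implementation.
VS15 = "\u{FE0E}" # text variation selector
VS16 = "\u{FE0F}" # emoji variation selector
KEYCAP_BASE = /[0-9*#]/ # these carry the Emoji property but are excluded here

# Emoji presentation: a default-emoji character not followed by VS15,
# or any Emoji character followed by VS16.
BASIC_SKETCH = /(?!#{KEYCAP_BASE})(?:\p{EPres}(?!#{VS15})|\p{Emoji}#{VS16})/

# Text presentation: a default-emoji character followed by VS15,
# or a default-text Emoji character not followed by VS16.
TEXT_SKETCH = /(?!#{KEYCAP_BASE})(?:\p{EPres}#{VS15}|[\p{Emoji}&&\P{EPres}](?!#{VS16}))/

# Classifies a single character (plus optional variation selector).
def presentation(char)
  return :emoji if char.match?(/\A#{BASIC_SKETCH}\z/)
  return :text if char.match?(/\A#{TEXT_SKETCH}\z/)
  :none
end
```

For example, `presentation("\u{25B6}")` yields `:text`, while `presentation("\u{25B6}\u{FE0F}")` yields `:emoji`. Results depend on the Unicode version bundled with the running Ruby.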
Regex | 1 RGI/FQE | 2 RGI/MQE | 3 RGI/UQE | 4 Non-RGI | 5 Valid Region | 6 Any Region | 7 RGI Tag | 8 Valid Tag | 9 Any Tag | 10 Basic Emoji | 11 Basic Text | 12 Text Keycap |
---|---|---|---|---|---|---|---|---|---|---|---|---|
REGEX | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
REGEX_INCLUDE_TEXT | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
REGEX_INCLUDE_MQE | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
REGEX_INCLUDE_MQE_UQE | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
REGEX_VALID | ✅ | ✅ | (✅)¹ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ |
REGEX_VALID_INCLUDE_TEXT | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
REGEX_WELL_FORMED | ✅ | ✅ | (✅)¹ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
REGEX_WELL_FORMED_INCLUDE_TEXT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
REGEX_POSSIBLE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
REGEX_BASIC | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
REGEX_TEXT | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
¹ Matches all unqualified Emoji, except for textual singleton Emoji (see columns 11, 12)
See spec files for detailed examples about which regex matches which kind of Emoji.
- REGEX (recommended Emoji set, RGI) is the one to use if unsure.
- REGEX_INCLUDE_MQE or REGEX_INCLUDE_MQE_UQE, if you want to catch Emoji sequences with missing Variation Selectors.
- REGEX_VALID, if you also want to match valid Emoji sequences outside the recommended set (non-RGI ZWJ sequences, valid tag sequences).
- REGEX_WELL_FORMED, if you also want to match any region flag and any tag sequence.
- The _INCLUDE_TEXT suffix with any of the above base regexes, if you want to also match basic textual Emoji.
- REGEX_POSSIBLE, which is a simplified test for possible Emoji, comparable to REGEX_WELL_FORMED*. It might produce false positives; however, the regex is less complex and is suggested in the Unicode standard itself as a first check.

Desc | Emoji | Escaped | REGEX (RGI/FQE) | REGEX_INCLUDE_MQE (RGI/MQE) | REGEX_VALID | REGEX_WELL_FORMED / REGEX_POSSIBLE |
---|---|---|---|---|---|---|
RGI ZWJ Sequence | 🤾🏽♀️ | \u{1F93E 1F3FD 200D 2640 FE0F} | ✅ | ✅ | ✅ | ✅ |
RGI ZWJ Sequence MQE | 🤾🏽♀ | \u{1F93E 1F3FD 200D 2640} | ❌ | ✅ | ✅ | ✅ |
Valid ZWJ Sequence, Non-RGI | 🤠🤢 | \u{1F920 200D 1F922} | ❌ | ❌ | ✅ | ✅ |
Known Region | 🇵🇹 | \u{1F1F5 1F1F9} | ✅ | ✅ | ✅ | ✅ |
Unknown Region | 🇵🇵 | \u{1F1F5 1F1F5} | ❌ | ❌ | ❌ | ✅ |
RGI Tag Sequence | 🏴 | \u{1F3F4 E0067 E0062 E0073 E0063 E0074 E007F} | ✅ | ✅ | ✅ | ✅ |
Valid Tag Sequence | 🏴 | \u{1F3F4 E0067 E0062 E0061 E0067 E0062 E007F} | ❌ | ❌ | ✅ | ✅ |
Well-formed Tag Sequence | 😴 | \u{1F634 E0067 E0062 E0061 E0061 E0061 E007F} | ❌ | ❌ | ❌ | ✅ |
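
The escaped forms in the table above can be inspected with plain Ruby to see what distinguishes the rows. The sketch below uses a hypothetical helper, describe_sequence, that reports whether a string contains a ZWJ or VS16, and decodes regional indicators and tag characters back to ASCII. It only looks at codepoints and makes no claim about which of the gem's regexes would match.

```ruby
# Hypothetical helper for inspecting Emoji sequences; not part of the gem.
ZWJ_CP  = 0x200D # zero width joiner
VS16_CP = 0xFE0F # emoji variation selector
REGIONAL_INDICATORS = (0x1F1E6..0x1F1FF) # map to the letters A..Z
TAG_CHARACTERS = (0xE0020..0xE007E)      # mirror ASCII 0x20..0x7E

def describe_sequence(str)
  cps = str.codepoints

  # Regional indicators decode to the letters of a region code.
  region = cps.select { |cp| REGIONAL_INDICATORS.cover?(cp) }
              .map { |cp| (cp - 0x1F1E6 + "A".ord).chr }
              .join

  # Tag characters decode to the ASCII payload of a tag sequence.
  tag = cps.select { |cp| TAG_CHARACTERS.cover?(cp) }
           .map { |cp| (cp - 0xE0000).chr }
           .join

  {
    escaped: cps.map { |cp| cp.to_s(16).upcase }.join(" "),
    zwj: cps.include?(ZWJ_CP),
    vs16: cps.include?(VS16_CP),
    region: region.empty? ? nil : region,
    tag: tag.empty? ? nil : tag,
  }
end
```

For instance, the two ZWJ rows differ only in the trailing FE0F, so `describe_sequence("\u{1F93E 1F3FD 200D 2640}")[:vs16]` is `false`, and the region row `"\u{1F1F5 1F1F9}"` decodes to `"PT"`.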
Please see the standard for more details, examples, explanations.
More info about valid vs. recommended Emoji can also be found in this blog article on Emojipedia.
Ruby includes native regex Emoji properties, as listed in the following table. You can also opt in to use the *_PROP_* regexes to get the Emoji support level of this gem (instead of Ruby's).
Gem Regex (Unicode::Emoji's Emoji support level) | Native Regex (Ruby's Emoji support level) |
---|---|
Unicode::Emoji::REGEX_PROP_EMOJI | /\p{Emoji}/ |
Unicode::Emoji::REGEX_PROP_MODIFIER | /\p{EMod}/ |
Unicode::Emoji::REGEX_PROP_MODIFIER_BASE | /\p{EBase}/ |
Unicode::Emoji::REGEX_PROP_COMPONENT | /\p{EComp}/ |
Unicode::Emoji::REGEX_PROP_PRESENTATION | /\p{EPres}/ |
Unicode::Emoji::REGEX_TEXT_PRESENTATION | /[\p{Emoji}&&\P{EPres}]/ |
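
As a small illustration of the native column, the following sketch checks a single character against Ruby's built-in Emoji properties. The constant NATIVE_PROPS and the method native_properties are hypothetical names for this example; the results reflect the Unicode version bundled with the running Ruby, not the gem's data.

```ruby
# Hypothetical helper built only on Ruby's native regex properties.
NATIVE_PROPS = {
  "Emoji" => /\p{Emoji}/,
  "Emoji_Modifier" => /\p{EMod}/,
  "Emoji_Modifier_Base" => /\p{EBase}/,
  "Emoji_Component" => /\p{EComp}/,
  "Emoji_Presentation" => /\p{EPres}/,
}.freeze

# Returns the names of all native Emoji properties the character has.
def native_properties(char)
  NATIVE_PROPS.select { |_name, regex| char.match?(regex) }.keys
end

# Text presentation: has the Emoji property, but not Emoji_Presentation.
def native_text_presentation?(char)
  char.match?(/[\p{Emoji}&&\P{EPres}]/)
end
```

Calling `native_properties("\u{261D}")` (☝) returns `["Emoji", "Emoji_Modifier_Base"]`, and `native_text_presentation?("\u{25B6}")` (▶) returns `true`.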
Unicode::Emoji::REGEX_PICTO matches single codepoints with the Extended_Pictographic property. For example, it will match ✀ BLACK SAFETY SCISSORS.
Unicode::Emoji::REGEX_PICTO_NO_EMOJI matches single codepoints with the Extended_Pictographic property, but excludes Emoji characters.
See character.construction/picto for a list of all non-Emoji pictographic characters.
Use Unicode::Emoji::LIST or the list method to get an ordered and categorized list of Emoji:
Unicode::Emoji.list.keys
# => ["Smileys & Emotion", "People & Body", "Component", "Animals & Nature", "Food & Drink", "Travel & Places", "Activities", "Objects", "Symbols", "Flags"]
Unicode::Emoji.list("Food & Drink").keys
# => ["food-fruit", "food-vegetable", "food-prepared", "food-asian", "food-marine", "food-sweet", "drink", "dishware"]
Unicode::Emoji.list("Food & Drink", "food-asian")
# => ["🍱", "🍘", "🍙", "🍚", "🍛", "🍜", "🍝", "🍠", "🍢", "🍣", "🍤", "🍥", "🥮", "🍡", "🥟", "🥠", "🥡"]
Please note that categories might change with future versions of the Emoji standard, although this has not happened often.
A list of all Emoji (generated from this gem) can be found at character.construction/emoji.
Allows you to access the codepoint data for a single character from Unicode's emoji-data.txt file:
require "unicode/emoji"
Unicode::Emoji.properties "☝" # => ["Emoji", "Emoji_Modifier_Base"]