🚀 Big News: Socket Acquires Coana to Bring Reachability Analysis to Every Appsec Team.Learn more
Socket
Book a DemoInstallSign in
Socket

unicode-string-to-identifier

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

unicode-string-to-identifier

A tool that converts arbitrary text (like user input or file names) into valid Python identifiers while preserving as much of the original meaning as possible.

0.1.0a0
PyPI
Maintainers
1

A tool that converts arbitrary text (like user input or file names) into valid Python identifiers while preserving as much of the original meaning as possible.

Example Usage

from __future__ import print_function
from unicode_string_to_identifier import unicode_string_to_identifier

print(unicode_string_to_identifier(u""))
# u"_"
print(unicode_string_to_identifier(u" \r\n\t"))
# u"_"
print(unicode_string_to_identifier(u"123abc"))
# u"_123abc"
print(unicode_string_to_identifier(u"&abc 123"))
# u"_abc_123"
print(unicode_string_to_identifier(u"  hello  world  $"))
# u"_hello_world__"
print(unicode_string_to_identifier(u"测试@unicode"))
# u"测试_unicode"

How It Works

First, get_character_type() categorizes each Unicode character into one of four types:

  • LETTER_OR_UNDERSCORE (Unicode category starting with 'L' or underscore _)
  • DECIMAL_DIGIT (Unicode category 'Nd')
  • SPACE_OR_CONTROL (Unicode categories starting with 'Z' or 'Cc')
  • OTHER (all other characters)

Then, it implements the following conversion rules using a state machine:

  • The first character must be a letter/underscore. If the first character is a digit, prepend _ to make it valid.
  • Subsequent valid characters (letters/underscores/digits) are kept as-is.
  • Other characters are replaced with underscores, but whitespace/control character sequences are collapsed into single underscores.
  • Ensures the output is non-empty (appends _ if empty).

Contributing

Contributions are welcome! Please submit pull requests or open issues on the GitHub repository.

License

This project is licensed under the MIT License.

FAQs

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts