micromark-util-character
The micromark-util-character package provides utility functions for character classification and manipulation, specifically designed to support the micromark Markdown parser. It includes methods for identifying whitespace, punctuation, and other character types relevant to Markdown syntax parsing.
Whitespace detection
This feature allows you to detect if a character is considered whitespace (e.g., spaces, tabs). It's useful for parsing tasks where whitespace needs to be treated specially.
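import {markdownSpace, unicodeWhitespace} from 'micromark-util-character'
// These functions take character codes (numbers), not strings.
console.log(markdownSpace(' '.charCodeAt(0))) // true
console.log(unicodeWhitespace('\t'.charCodeAt(0))) // true
console.log(markdownSpace('a'.charCodeAt(0))) // false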
Punctuation detection
This feature enables the identification of punctuation characters, which is crucial for parsing and tokenizing Markdown syntax where punctuation often has special meaning.
import {asciiPunctuation} from 'micromark-util-character'
// Pass character codes (numbers), not strings.
console.log(asciiPunctuation('!'.charCodeAt(0))) // true
console.log(asciiPunctuation('.'.charCodeAt(0))) // true
console.log(asciiPunctuation('a'.charCodeAt(0))) // false
Alphanumeric check
This functionality allows for checking if a character is alphanumeric, supporting the parsing process by distinguishing between textual content and syntax markers.
import {asciiAlphanumeric} from 'micromark-util-character'
// Pass character codes (numbers), not strings.
console.log(asciiAlphanumeric('a'.charCodeAt(0))) // true
console.log(asciiAlphanumeric('1'.charCodeAt(0))) // true
console.log(asciiAlphanumeric('!'.charCodeAt(0))) // false
markdown-it is a comprehensive Markdown parser with a focus on speed and extensibility. While it provides its own character handling utilities, micromark-util-character is more focused on low-level character classification, making it more suitable for detailed parsing tasks.
remark-parse is a plugin for the remark Markdown processor that parses Markdown into an abstract syntax tree (AST). It shares a similar goal with micromark-util-character in terms of parsing Markdown, but remark-parse offers a higher-level API focused on the structure of Markdown documents rather than the individual characters.
micromark utility to handle character codes.
This package exposes algorithms to check whether characters match groups.
This package might be useful when you are making your own micromark extensions.
This package is ESM only. In Node.js (version 16+), install with npm:
npm install micromark-util-character
In Deno with esm.sh:
import * as character from 'https://esm.sh/micromark-util-character@1'
In browsers with esm.sh:
<script type="module">
import * as character from 'https://esm.sh/micromark-util-character@1?bundle'
</script>
import {asciiAlpha} from 'micromark-util-character'
console.log(asciiAlpha(64)) // false
console.log(asciiAlpha(65)) // true
This module exports the identifiers asciiAlpha, asciiAlphanumeric, asciiAtext, asciiControl, asciiDigit, asciiHexDigit, asciiPunctuation, markdownLineEnding, markdownLineEndingOrSpace, markdownSpace, unicodePunctuation, and unicodeWhitespace.
There is no default export.
asciiAlpha(code)
Check whether the character code represents an ASCII alpha (a through z, case insensitive).
An ASCII alpha is an ASCII upper alpha or ASCII lower alpha.
An ASCII upper alpha is a character in the inclusive range U+0041 (A) to U+005A (Z).
An ASCII lower alpha is a character in the inclusive range U+0061 (a) to U+007A (z).
code (Code): the character code to check
Returns whether it matches (boolean).
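The definition above comes down to two inclusive range checks. Here is a minimal sketch using a hypothetical helper named `isAsciiAlpha`. It illustrates the definition only and is not the package's own implementation, and it treats `null` as a non-match by assumption.

```javascript
// Hypothetical re-implementation for illustration only; the package's real
// `asciiAlpha` may be written differently.
function isAsciiAlpha(code) {
  // Assumption: a `null` code (no character) never matches.
  if (code === null) return false
  const upper = code >= 0x41 && code <= 0x5a // U+0041 (A) to U+005A (Z)
  const lower = code >= 0x61 && code <= 0x7a // U+0061 (a) to U+007A (z)
  return upper || lower
}

console.log(isAsciiAlpha(64)) // false: U+0040 (@) sits just below `A`
console.log(isAsciiAlpha(65)) // true
console.log(isAsciiAlpha('z'.charCodeAt(0))) // true
```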
asciiAlphanumeric(code)
Check whether the character code represents an ASCII alphanumeric (a through z, case insensitive, or 0 through 9).
An ASCII alphanumeric is an ASCII digit (see asciiDigit) or ASCII alpha (see asciiAlpha).
code (Code): the character code to check
Returns whether it matches (boolean).
asciiAtext(code)
Check whether the character code represents an ASCII atext.
atext is an ASCII alphanumeric (see asciiAlphanumeric), or a character in the inclusive ranges U+0023 NUMBER SIGN (#) to U+0027 APOSTROPHE ('), U+002A ASTERISK (*), U+002B PLUS SIGN (+), U+002D DASH (-), U+002F SLASH (/), U+003D EQUALS TO (=), U+003F QUESTION MARK (?), U+005E CARET (^) to U+0060 GRAVE ACCENT (`), or U+007B LEFT CURLY BRACE ({) to U+007E TILDE (~) ([RFC5322]).
See [RFC5322]: Internet Message Format. P. Resnick. IETF.
code (Code): the character code to check
Returns whether it matches (boolean).
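A list of ranges like the one above can be written as a regular expression character class. The sketch below transcribes the ranges exactly as this section lists them into a hypothetical `isAsciiAtext` helper. The names are illustrative, and the package's actual implementation may accept a slightly different set of characters.

```javascript
// Hypothetical sketch that follows the prose definition above; it is not
// the package's source.
const alphanumeric = /[0-9A-Za-z]/
// `#` to `'`, `*`, `+`, `-`, `/`, `=`, `?`, `^` to backtick, `{` to `~`.
const atextSymbols = /[#-'*+\-\/=?^-`{-~]/

function isAsciiAtext(code) {
  // Only real ASCII codes can match; `null` and out-of-range values cannot.
  if (code === null || code < 0 || code > 0x7f) return false
  const char = String.fromCharCode(code)
  return alphanumeric.test(char) || atextSymbols.test(char)
}

console.log(isAsciiAtext('#'.charCodeAt(0))) // true
console.log(isAsciiAtext('a'.charCodeAt(0))) // true
console.log(isAsciiAtext('('.charCodeAt(0))) // false
```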
asciiControl(code)
Check whether a character code is an ASCII control character.
An ASCII control is a character in the inclusive range U+0000 NULL (NUL) to U+001F (US), or U+007F (DEL).
code (Code): the character code to check
Returns whether it matches (boolean).
asciiDigit(code)
Check whether the character code represents an ASCII digit (0 through 9).
An ASCII digit is a character in the inclusive range U+0030 (0) to U+0039 (9).
code (Code): the character code to check
Returns whether it matches (boolean).
asciiHexDigit(code)
Check whether the character code represents an ASCII hex digit (a through f, case insensitive, or 0 through 9).
An ASCII hex digit is an ASCII digit (see asciiDigit), ASCII upper hex digit, or an ASCII lower hex digit.
An ASCII upper hex digit is a character in the inclusive range U+0041 (A) to U+0046 (F).
An ASCII lower hex digit is a character in the inclusive range U+0061 (a) to U+0066 (f).
code (Code): the character code to check
Returns whether it matches (boolean).
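The hex digit definition is built from three smaller checks. The sketch below mirrors that composition with hypothetical helpers (`isDigit`, `isUpperHex`, `isLowerHex`, `isAsciiHexDigit`) and adds an illustrative `isHexString` function that applies the check to every code unit of a string. None of these names come from the package.

```javascript
// Hypothetical helpers for illustration; not the package's implementation.
const inRange = (code, start, end) =>
  code !== null && code >= start && code <= end

const isDigit = (code) => inRange(code, 0x30, 0x39) // 0 to 9
const isUpperHex = (code) => inRange(code, 0x41, 0x46) // A to F
const isLowerHex = (code) => inRange(code, 0x61, 0x66) // a to f

function isAsciiHexDigit(code) {
  return isDigit(code) || isUpperHex(code) || isLowerHex(code)
}

// Check that a non-empty string consists only of hex digits.
function isHexString(value) {
  if (value.length === 0) return false
  for (let index = 0; index < value.length; index++) {
    if (!isAsciiHexDigit(value.charCodeAt(index))) return false
  }
  return true
}

console.log(isHexString('1F600')) // true
console.log(isHexString('1G')) // false
```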
asciiPunctuation(code)
Check whether the character code represents ASCII punctuation.
An ASCII punctuation is a character in the inclusive ranges U+0021 EXCLAMATION MARK (!) to U+002F SLASH (/), U+003A COLON (:) to U+0040 AT SIGN (@), U+005B LEFT SQUARE BRACKET ([) to U+0060 GRAVE ACCENT (`), or U+007B LEFT CURLY BRACE ({) to U+007E TILDE (~).
code (Code): the character code to check
Returns whether it matches (boolean).
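The four ranges above can also be kept in a small table and tested in a loop, which keeps the code close to the prose. This is a sketch with a hypothetical `isAsciiPunctuation` helper, not the package's implementation.

```javascript
// Inclusive [start, end] ranges, transcribed from the definition above.
const punctuationRanges = [
  [0x21, 0x2f], // ! to /
  [0x3a, 0x40], // : to @
  [0x5b, 0x60], // [ to `
  [0x7b, 0x7e] // { to ~
]

// Hypothetical helper for illustration.
function isAsciiPunctuation(code) {
  if (code === null) return false
  return punctuationRanges.some(
    ([start, end]) => code >= start && code <= end
  )
}

console.log(isAsciiPunctuation('!'.charCodeAt(0))) // true
console.log(isAsciiPunctuation('0'.charCodeAt(0))) // false
```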
markdownLineEnding(code)
Check whether a character code is a markdown line ending.
A markdown line ending is the virtual characters M-0003 CARRIAGE RETURN LINE FEED (CRLF), M-0004 LINE FEED (LF) and M-0005 CARRIAGE RETURN (CR).
In micromark, the actual character U+000A LINE FEED (LF) and U+000D CARRIAGE RETURN (CR) are replaced by these virtual characters depending on whether they occurred together.
code (Code): the character code to check
Returns whether it matches (boolean).
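To make the replacement of actual line ending characters concrete, here is a rough sketch of such a preprocessing step. It assumes the virtual characters M-0003, M-0004, and M-0005 are represented by the numbers -3, -4, and -5. The `toCodes` and `isMarkdownLineEnding` functions are hypothetical and are not micromark's actual preprocessor.

```javascript
// Assumed numeric values for the virtual characters M-0003 to M-0005.
const CRLF = -3
const LF = -4
const CR = -5

// Hypothetical preprocessing step: turn a string into character codes,
// replacing actual CR and LF characters with virtual line endings.
function toCodes(value) {
  const codes = []
  let index = 0
  while (index < value.length) {
    const code = value.charCodeAt(index)
    if (code === 0x0d && value.charCodeAt(index + 1) === 0x0a) {
      codes.push(CRLF) // CR directly followed by LF becomes one code
      index += 2
    } else {
      codes.push(code === 0x0d ? CR : code === 0x0a ? LF : code)
      index++
    }
  }
  return codes
}

function isMarkdownLineEnding(code) {
  return code === CRLF || code === LF || code === CR
}

console.log(JSON.stringify(toCodes('a\r\nb\nc\r'))) // [97,-3,98,-4,99,-5]
```

In this sketch, `isMarkdownLineEnding(10)` is false: once the input has been preprocessed, only the virtual codes count as line endings.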
markdownLineEndingOrSpace(code)
Check whether a character code is a markdown line ending (see markdownLineEnding) or markdown space (see markdownSpace).
code (Code): the character code to check
Returns whether it matches (boolean).
markdownSpace(code)
Check whether a character code is a markdown space.
A markdown space is the concrete character U+0020 SPACE (SP) and the virtual characters M-0001 VIRTUAL SPACE (VS) and M-0002 HORIZONTAL TAB (HT).
In micromark, the actual character U+0009 CHARACTER TABULATION (HT) is replaced by one M-0002 HORIZONTAL TAB (HT) and between 0 and 3 M-0001 VIRTUAL SPACE (VS) characters, depending on the column at which the tab occurred.
code (Code): the character code to check
Returns whether it matches (boolean).
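The tab replacement described above can be sketched as follows. The sketch assumes tab stops every 4 columns (consistent with the 0 to 3 virtual spaces mentioned), a 0-indexed column, and the numeric values -2 and -1 for M-0002 and M-0001. `expandTab` and `isMarkdownSpace` are hypothetical names and not part of the package.

```javascript
const HT = -2 // assumed value for M-0002 HORIZONTAL TAB
const VS = -1 // assumed value for M-0001 VIRTUAL SPACE
const SPACE = 0x20 // U+0020 SPACE
const tabSize = 4 // assumption: tab stops every 4 columns

// Expand a tab found at a 0-indexed `column` into one HT code followed by
// enough virtual spaces to reach the next tab stop.
function expandTab(column) {
  const width = tabSize - (column % tabSize) // 1 to 4 columns
  const codes = [HT]
  for (let count = 1; count < width; count++) codes.push(VS)
  return codes
}

function isMarkdownSpace(code) {
  return code === HT || code === VS || code === SPACE
}

console.log(JSON.stringify(expandTab(0))) // [-2,-1,-1,-1]
console.log(JSON.stringify(expandTab(3))) // [-2]
```

Every code that `expandTab` produces satisfies `isMarkdownSpace`, so later checks do not need to know about columns.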
unicodePunctuation(code)
Check whether the character code represents Unicode punctuation.
A Unicode punctuation is a character in the Unicode Pc (Punctuation, Connector), Pd (Punctuation, Dash), Pe (Punctuation, Close), Pf (Punctuation, Final quote), Pi (Punctuation, Initial quote), Po (Punctuation, Other), or Ps (Punctuation, Open) categories, or an ASCII punctuation (see asciiPunctuation) ([UNICODE]).
See [UNICODE]: The Unicode Standard. Unicode Consortium.
code (Code): the character code to check
Returns whether it matches (boolean).
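The seven categories listed above together form the Unicode general category P, which JavaScript regular expressions can match with the `\p{P}` property escape. The sketch below combines that with the ASCII punctuation ranges. The `isUnicodePunctuation` helper is hypothetical, follows the prose definition in this section, and may differ from what the package actually matches.

```javascript
// General category P covers Pc, Pd, Pe, Pf, Pi, Po, and Ps.
const unicodeP = /\p{P}/u
// ASCII punctuation also includes symbols such as `$` and `+`, which are
// not in category P, so they need their own check.
const asciiPunctuationClass = /[!-\/:-@[-`{-~]/

// Hypothetical helper for illustration; not the package's implementation.
function isUnicodePunctuation(code) {
  if (code === null || code < 0 || code > 0x10ffff) return false
  const char = String.fromCodePoint(code)
  return asciiPunctuationClass.test(char) || unicodeP.test(char)
}

console.log(isUnicodePunctuation(0x2014)) // true: U+2014 is in Pd
console.log(isUnicodePunctuation('$'.charCodeAt(0))) // true: ASCII punctuation
console.log(isUnicodePunctuation('a'.charCodeAt(0))) // false
```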
unicodeWhitespace(code)
Check whether the character code represents Unicode whitespace.
Note that this does not handle micromark specific markdown whitespace characters. See markdownLineEndingOrSpace to check that.
A Unicode whitespace is a character in the Unicode Zs (Separator, Space) category, or U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED (LF), U+000C (FF), or U+000D CARRIAGE RETURN (CR) ([UNICODE]).
See [UNICODE]: The Unicode Standard. Unicode Consortium.
code (Code): the character code to check
Returns whether it matches (boolean).
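The whitespace definition above can be sketched with the `\p{Zs}` property escape plus the four listed control characters. `isUnicodeWhitespace` is a hypothetical helper that follows the prose in this section and is not the package's implementation. In this sketch, negative (virtual) codes never match.

```javascript
const spaceSeparator = /\p{Zs}/u // Unicode category Zs (Separator, Space)

// Hypothetical helper for illustration.
function isUnicodeWhitespace(code) {
  if (code === null || code < 0 || code > 0x10ffff) return false
  // U+0009 (HT), U+000A (LF), U+000C (FF), U+000D (CR)
  if (code === 0x09 || code === 0x0a || code === 0x0c || code === 0x0d) {
    return true
  }
  return spaceSeparator.test(String.fromCodePoint(code))
}

console.log(isUnicodeWhitespace(0x20)) // true: U+0020 SPACE is in Zs
console.log(isUnicodeWhitespace(0x3000)) // true: U+3000 IDEOGRAPHIC SPACE is in Zs
console.log(isUnicodeWhitespace(0x0b)) // false: U+000B is not in the list
```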
This package is fully typed with TypeScript. It exports no additional types.
Projects maintained by the unified collective are compatible with maintained versions of Node.js.
When we cut a new major release, we drop support for unmaintained versions of Node.
This means we try to keep the current release line, micromark-util-character@^2, compatible with Node.js 16.
This package works with micromark@^3.
This package is safe.
See security.md in micromark/.github for how to submit a security report.
See contributing.md in micromark/.github for ways to get started.
See support.md for ways to get help.
This project has a code of conduct. By interacting with this repository, organisation, or community you agree to abide by its terms.
FAQs
micromark utility to handle character codes
We found that micromark-util-character demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.