hast-util-to-text
hast utility to get the plain-text value of a node.
What is this?
This package is a utility that takes a hast node and gets its plain-text
value.
It is like the DOM’s Node#innerText, which is a bit nicer than Node#textContent,
because this turns <br> elements into line breaks and uses '\t' (tabs) between
table cells.
There are some small deviations from the spec, because the DOM has knowledge of
associated CSS and can take into account that elements have display: none or
text-transform associated with them, while this utility can’t do that.
When should I use this?
This is a small utility that is useful when you want a plain-text version of a
node that is close to how it’s “visible” to users.
This utility is similar to hast-util-to-string, which is simpler, and more like
the Node#textContent algorithm discussed above.
There is also a package hast-util-from-text, which sort of does the inverse: it
takes a string and sets that as text on the node, while turning line endings
into <br>s.
Install
This package is ESM only.
In Node.js (version 16+), install with npm:
npm install hast-util-to-text
In Deno with esm.sh:
import {toText} from 'https://esm.sh/hast-util-to-text@4'
In browsers with esm.sh:
<script type="module">
  import {toText} from 'https://esm.sh/hast-util-to-text@4?bundle'
</script>
Use
import {h} from 'hastscript'
import {toText} from 'hast-util-to-text'

const tree = h('div', [
  h('h1', {hidden: true}, 'Alpha.'),
  h('article', [
    h('p', ['Bravo', h('br'), 'charlie.']),
    h('p', 'Delta echo \t foxtrot.')
  ])
])

console.log(toText(tree))
Yields:
Bravo
charlie.

Delta echo foxtrot.
API
This package exports the identifier toText.
There is no default export.
toText(tree[, options])
Get the plain-text value of a node.
Parameters
- tree (Node): tree to turn into text
- options (Options, optional): configuration
Returns
Serialized tree (string).
Algorithm
- if tree is a comment, returns its value
- if tree is a text, applies normal whitespace collapsing to its value, as
  defined by the CSS Text spec
- if tree is a root or element, applies an algorithm similar to the innerText
  getter as defined by HTML
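The three cases above can be sketched in plain JavaScript. This is a simplified illustration under stated assumptions, not the package’s implementation: the names collapse, collect, and sketchToText are hypothetical, only <br> is handled specially, and block-level line breaks, table cells, and elements that are not displayed are left out.

```javascript
// Simplified illustration only; NOT the implementation of hast-util-to-text.
// `collapse`, `collect`, and `sketchToText` are hypothetical names.

// Collapse runs of spaces, tabs, and line endings into one space, then trim.
function collapse(value) {
  return value.replace(/[ \t\n\r\f]+/g, ' ').trim()
}

// Gather raw text below a root or element, turning <br> into a line feed.
function collect(node) {
  if (node.type === 'text') return node.value.replace(/[\n\r\f]+/g, ' ')
  if (node.type === 'element' && node.tagName === 'br') return '\n'
  if (node.type === 'element' || node.type === 'root') {
    return node.children.map(collect).join('')
  }
  return '' // Comments and other nodes inside a tree contribute nothing here.
}

function sketchToText(node) {
  // Case 1: a comment yields its value.
  if (node.type === 'comment') return node.value
  // Case 2: a text node gets whitespace collapsing.
  if (node.type === 'text') return collapse(node.value)
  // Case 3: a root or element is walked, loosely like `innerText`.
  if (node.type === 'root' || node.type === 'element') {
    return collect(node).split('\n').map(collapse).join('\n')
  }
  return ''
}

const paragraph = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    {type: 'text', value: 'Bravo'},
    {type: 'element', tagName: 'br', properties: {}, children: []},
    {type: 'text', value: 'charlie.'}
  ]
}

console.log(sketchToText({type: 'comment', value: 'a note'})) // a note
console.log(sketchToText({type: 'text', value: 'Delta echo \t foxtrot.'})) // Delta echo foxtrot.
console.log(JSON.stringify(sketchToText(paragraph))) // "Bravo\ncharlie."
```

For real trees, call toText itself; the sketch only shows how the result depends on the type of the node passed in.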
Notes
👉 Note: the algorithm acts as if tree is being rendered, and as if we’re a
CSS-supporting user agent, with scripting enabled.
- if tree is an element that is not displayed (such as a head), we’ll still use
  the innerText algorithm instead of switching to textContent
- if descendants of tree are elements that are not displayed, they are ignored
- CSS is not considered, except for the default user agent style sheet
- a line feed is collapsed instead of ignored in cases where Fullwidth, Wide,
  or Halfwidth East Asian Width characters are used; the same goes for cases
  with Chinese, Japanese, or Yi writing systems
- replaced elements (such as audio) are treated like non-replaced elements
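The first two notes can be illustrated with a small sketch. It is an assumption-laden simplification, not the package’s code: isDisplayed and visibleText are hypothetical names, the tag list is only an illustrative subset of what default browser styles hide, and whitespace handling is left out entirely.

```javascript
// Simplified illustration only; NOT the implementation of hast-util-to-text.

// Illustrative subset of elements that default browser styles do not display.
const notDisplayedTags = new Set(['head', 'script', 'style', 'template', 'title'])

// Hypothetical check: would this node be displayed?
function isDisplayed(node) {
  if (node.type !== 'element') return true
  if (node.properties && node.properties.hidden) return false
  return !notDisplayedTags.has(node.tagName)
}

// Hypothetical walker: concatenate text, skipping descendants that are not
// displayed. The starting node itself is never checked.
function visibleText(node) {
  if (node.type === 'text') return node.value
  if (!Array.isArray(node.children)) return ''
  return node.children.filter(isDisplayed).map(visibleText).join('')
}

const heading = {
  type: 'element',
  tagName: 'h1',
  properties: {hidden: true},
  children: [{type: 'text', value: 'Alpha.'}]
}

const tree = {
  type: 'element',
  tagName: 'div',
  properties: {},
  children: [
    heading,
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [{type: 'text', value: 'Bravo.'}]
    }
  ]
}

console.log(visibleText(tree)) // Bravo.
console.log(visibleText(heading)) // Alpha.
```

The hidden heading is dropped when it is a descendant, but its text is still produced when it is the node passed in, which mirrors the distinction the notes draw between tree and its descendants.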
Options
Configuration (TypeScript type).
Fields
- whitespace (Whitespace, default: 'normal'): default whitespace setting to use
Whitespace
Valid and useful whitespace values (from CSS) (TypeScript type).
Type
type Whitespace = 'normal' | 'nowrap' | 'pre' | 'pre-wrap'
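As a rough guide to what these values mean, the sketch below follows their CSS definitions: 'normal' and 'nowrap' collapse whitespace, while 'pre' and 'pre-wrap' preserve it. The helper applyWhitespace is hypothetical and not part of this package, and how the package treats each value internally may differ from this simplification.

```javascript
// Hypothetical helper following the CSS meaning of each value; it is NOT
// part of hast-util-to-text and ignores wrapping, which plain text lacks.
function applyWhitespace(value, whitespace = 'normal') {
  // `pre` and `pre-wrap` preserve spaces, tabs, and line endings as written.
  if (whitespace === 'pre' || whitespace === 'pre-wrap') return value
  // `normal` and `nowrap` collapse whitespace runs into a single space.
  return value.replace(/[ \t\n\r\f]+/g, ' ').trim()
}

console.log(JSON.stringify(applyWhitespace('Delta   echo \t foxtrot.'))) // "Delta echo foxtrot."
console.log(JSON.stringify(applyWhitespace('Delta   echo', 'pre'))) // "Delta   echo"
```

With the package itself, the setting is passed as the whitespace field of the options argument, for example toText(tree, {whitespace: 'pre'}).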
Types
This package is fully typed with TypeScript.
It exports the additional types Options and Whitespace.
Compatibility
Projects maintained by the unified collective are compatible with maintained
versions of Node.js.
When we cut a new major release, we drop support for unmaintained versions of
Node.
This means we try to keep the current release line, hast-util-to-text@^4,
compatible with Node.js 16.
Security
hast-util-to-text does not change the syntax tree, so there are no openings for
cross-site scripting (XSS) attacks.
Related
Contribute
See contributing.md in syntax-tree/.github for ways to get started.
See support.md for ways to get help.
This project has a code of conduct.
By interacting with this repository, organization, or community you agree to
abide by its terms.
License
MIT © Titus Wormer