hast-util-to-nlcst
hast utility to transform to nlcst.
This package is a utility that takes a hast (HTML) syntax tree as input and turns it into nlcst (natural language).
This project is useful when you want to deal with ASTs and inspect the natural language inside HTML. Unfortunately, there is no way yet to apply changes to the nlcst back into hast.
The mdast utility mdast-util-to-nlcst does the same but uses a markdown tree as input.
The rehype plugin rehype-retext wraps this utility to do the same at a higher-level (easier) abstraction.
This package is ESM only. In Node.js (version 16+), install with npm:
npm install hast-util-to-nlcst
In Deno with esm.sh:
import {toNlcst} from 'https://esm.sh/hast-util-to-nlcst@4'
In browsers with esm.sh:
<script type="module">
import {toNlcst} from 'https://esm.sh/hast-util-to-nlcst@4?bundle'
</script>
Say our document example.html contains:
<article>
  Implicit.
  <h1>Explicit: <strong>foo</strong>s-ball</h1>
  <pre><code class="language-foo">bar()</code></pre>
</article>
…and our module example.js looks as follows:
import {fromHtml} from 'hast-util-from-html'
import {toNlcst} from 'hast-util-to-nlcst'
import {ParseEnglish} from 'parse-english'
import {read} from 'to-vfile'
import {inspect} from 'unist-util-inspect'
const file = await read('example.html')
const tree = fromHtml(file)
console.log(inspect(toNlcst(tree, file, ParseEnglish)))
…now running node example.js yields:
RootNode[2] (1:1-6:1, 0-134)
├─0 ParagraphNode[3] (1:10-3:3, 9-24)
│   ├─0 WhiteSpaceNode "\n  " (1:10-2:3, 9-12)
│   ├─1 SentenceNode[2] (2:3-2:12, 12-21)
│   │   ├─0 WordNode[1] (2:3-2:11, 12-20)
│   │   │   └─0 TextNode "Implicit" (2:3-2:11, 12-20)
│   │   └─1 PunctuationNode "." (2:11-2:12, 20-21)
│   └─2 WhiteSpaceNode "\n  " (2:12-3:3, 21-24)
└─1 ParagraphNode[1] (3:7-3:43, 28-64)
    └─0 SentenceNode[4] (3:7-3:43, 28-64)
        ├─0 WordNode[1] (3:7-3:15, 28-36)
        │   └─0 TextNode "Explicit" (3:7-3:15, 28-36)
        ├─1 PunctuationNode ":" (3:15-3:16, 36-37)
        ├─2 WhiteSpaceNode " " (3:16-3:17, 37-38)
        └─3 WordNode[4] (3:25-3:43, 46-64)
            ├─0 TextNode "foo" (3:25-3:28, 46-49)
            ├─1 TextNode "s" (3:37-3:38, 58-59)
            ├─2 PunctuationNode "-" (3:38-3:39, 59-60)
            └─3 TextNode "ball" (3:39-3:43, 60-64)
This package exports the identifier toNlcst.
There is no default export.
toNlcst(tree, file, Parser)
Turn a hast tree into an nlcst tree.
👉 Note: tree must have positional info and file must be a VFile corresponding to tree.
tree (HastNode): hast tree to transform
file (VFile): virtual file
Parser (ParserConstructor or ParserInstance): parser to use
The algorithm supports implicit and explicit paragraphs, such as:
<article>
  An implicit paragraph.
  <h1>An explicit paragraph.</h1>
</article>
Overlapping paragraphs are also supported (see the tests or the HTML spec for more info).
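The implicit/explicit split can be illustrated with a small, dependency-free sketch. This is not the package's implementation: splitParagraphs, textOf, and the phrasing set are hypothetical names chosen for illustration, the list of phrasing elements is a deliberately short assumption, and overlapping paragraphs are not handled. It takes the children of a container such as <article> as hast-like objects, groups runs of text and phrasing elements into implicit paragraphs, and treats any other element as an explicit paragraph.

```javascript
// Hypothetical, simplified model of implicit vs. explicit paragraphs.
// Assumption: only these tag names count as phrasing content.
const phrasing = new Set(['a', 'b', 'code', 'em', 'i', 'span', 'strong'])

// Concatenate the text of a hast-like node and its descendants.
function textOf(node) {
  if (node.type === 'text') return node.value
  return (node.children || []).map(textOf).join('')
}

// Split the children of a container into paragraph descriptions.
function splitParagraphs(children) {
  const paragraphs = []
  let run = []

  // Close the current run of loose text, if it holds anything but whitespace.
  const flush = () => {
    const text = run.join('').trim()
    if (text) paragraphs.push({kind: 'implicit', text})
    run = []
  }

  for (const child of children) {
    if (child.type === 'text') {
      run.push(child.value)
    } else if (child.type === 'element' && phrasing.has(child.tagName)) {
      run.push(textOf(child))
    } else if (child.type === 'element') {
      // A non-phrasing element ends the loose run and forms its own paragraph.
      flush()
      paragraphs.push({kind: 'explicit', text: textOf(child).trim()})
    }
  }

  flush()
  return paragraphs
}

// The <article> from the example above, written as hast-like objects.
const articleChildren = [
  {type: 'text', value: '\n  An implicit paragraph.\n  '},
  {
    type: 'element',
    tagName: 'h1',
    children: [{type: 'text', value: 'An explicit paragraph.'}]
  },
  {type: 'text', value: '\n'}
]

console.log(splitParagraphs(articleChildren))
```

Under these assumptions the loose text becomes one implicit paragraph and the <h1> becomes one explicit paragraph, which mirrors the two ParagraphNodes in the usage output earlier in this document.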
Some elements are ignored and their content will not be present in nlcst: <script>, <style>, <svg>, <math>, <del>.
To ignore other elements, add a data-nlcst attribute with a value of ignore:
<p>This is <span data-nlcst="ignore">hidden</span>.</p>
<p data-nlcst="ignore">Completely hidden.</p>
<code> elements are mapped to Source nodes in nlcst.
To mark other elements as source, add a data-nlcst attribute with a value of source:
<p>This is <span data-nlcst="source">marked as source</span>.</p>
<p data-nlcst="source">Completely marked.</p>
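The ignore and source rules above can be summarized as a small decision function. The sketch below is illustrative only and is not taken from the package: classify and ignoredByDefault are hypothetical names, it relies on hast exposing the data-nlcst attribute as the dataNlcst property, and checking the ignore rules before the source rules is an assumption the text above does not settle.

```javascript
// Hypothetical sketch of the ignore/source rules; not the package's code.
// Elements whose content is dropped by default, per the list above.
const ignoredByDefault = new Set(['script', 'style', 'svg', 'math', 'del'])

// Classify a hast-like element as 'ignore', 'source', or 'text'.
function classify(element) {
  // Assumption: hast exposes `data-nlcst` as the `dataNlcst` property.
  const flag = element.properties ? element.properties.dataNlcst : undefined

  // Assumption: ignoring takes precedence over marking as source.
  if (flag === 'ignore' || ignoredByDefault.has(element.tagName)) return 'ignore'
  if (flag === 'source' || element.tagName === 'code') return 'source'
  return 'text'
}

// Small helper to build hast-like elements for the demonstration.
const element = (tagName, properties = {}) => ({
  type: 'element',
  tagName,
  properties,
  children: []
})

console.log(classify(element('span', {dataNlcst: 'ignore'}))) // 'ignore'
console.log(classify(element('del'))) // 'ignore'
console.log(classify(element('code'))) // 'source'
console.log(classify(element('p', {dataNlcst: 'source'}))) // 'source'
console.log(classify(element('p'))) // 'text'
```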
ParserConstructor
Create a new parser (TypeScript type).
type ParserConstructor = new () => ParserInstance
ParserInstance
nlcst parser (TypeScript type).
For example, parse-dutch, parse-english, or parse-latin.
type ParserInstance = {
  parse(value?: string | null | undefined): NlcstRoot
  tokenize(value?: string | null | undefined): Array<NlcstSentenceContent>
  tokenizeParagraph(value?: string | null | undefined): NlcstParagraph
  tokenizeParagraphPlugins: Array<(node: NlcstParagraph) => undefined | void>
  tokenizeSentencePlugins: Array<(node: NlcstSentence) => undefined | void>
}
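Because the Parser parameter accepts either a ParserConstructor or a ParserInstance, a caller-side helper can normalize the two forms. The sketch below shows one way to do that; resolveParser and FakeParser are hypothetical names for illustration, and this is not necessarily how the package itself tells the two apart.

```javascript
// Hypothetical helper; `resolveParser` is not part of the package's API.
// Constructors are functions (classes), instances are plain objects.
function resolveParser(Parser) {
  return typeof Parser === 'function' ? new Parser() : Parser
}

// Stand-in parser for illustration only; real parsers such as parse-english
// build full nlcst trees and implement the whole ParserInstance shape.
class FakeParser {
  parse(value) {
    return {type: 'RootNode', children: []}
  }
}

// Passing the class creates a fresh instance.
const fromConstructor = resolveParser(FakeParser)

// Passing an instance returns that same instance, so any plugins already
// registered on it are kept.
const existing = new FakeParser()
const fromInstance = resolveParser(existing)

console.log(fromConstructor instanceof FakeParser) // true
console.log(fromInstance === existing) // true
```

Passing an instance is the form to reach for when the parser has been customized first, for example by adding entries to tokenizeSentencePlugins.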
This package is fully typed with TypeScript.
It exports the additional types ParserConstructor and ParserInstance.
Projects maintained by the unified collective are compatible with maintained versions of Node.js.
When we cut a new major release, we drop support for unmaintained versions of Node.js.
This means we try to keep the current release line, hast-util-to-nlcst@^4, compatible with Node.js 16.
hast-util-to-nlcst does not change the original syntax tree so there are no openings for cross-site scripting (XSS) attacks.
mdast-util-to-nlcst: transform mdast to nlcst
hast-util-to-mdast: transform hast to mdast
hast-util-to-xast: transform hast to xast
See contributing.md in syntax-tree/.github for ways to get started.
See support.md for ways to get help.
This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.