mdast-util-to-nlcst
mdast utility to transform to nlcst.
What is this?
This package is a utility that takes an mdast (markdown) syntax tree as
input and turns it into nlcst (natural language).
When should I use this?
This project is useful when you want to deal with ASTs and inspect the natural
language inside markdown.
Unfortunately, there is no way yet to apply changes to the nlcst back into
mdast.
The hast utility hast-util-to-nlcst does the same but uses an HTML tree as input.
The remark plugin remark-retext wraps this utility to do the same at a higher-level (easier) abstraction.
Install
This package is ESM only.
In Node.js (version 16+), install with npm:
npm install mdast-util-to-nlcst
In Deno with esm.sh:
import {toNlcst} from 'https://esm.sh/mdast-util-to-nlcst@7'
In browsers with esm.sh:
<script type="module">
  import {toNlcst} from 'https://esm.sh/mdast-util-to-nlcst@7?bundle'
</script>
Use
Say we have the following example.md:
Some *foo*sball.
…and next to it a module example.js:
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toNlcst} from 'mdast-util-to-nlcst'
import {ParseEnglish} from 'parse-english'
import {read} from 'to-vfile'
import {inspect} from 'unist-util-inspect'

// Read the file, parse its contents to mdast, then transform to nlcst.
// `fromMarkdown` takes a string (or bytes), so serialize the file first.
const file = await read('example.md')
const mdast = fromMarkdown(String(file))
const nlcst = toNlcst(mdast, file, ParseEnglish)

console.log(inspect(nlcst))
Yields:
RootNode[1] (1:1-1:17, 0-16)
└─0 ParagraphNode[1] (1:1-1:17, 0-16)
    └─0 SentenceNode[4] (1:1-1:17, 0-16)
        ├─0 WordNode[1] (1:1-1:5, 0-4)
        │   └─0 TextNode "Some" (1:1-1:5, 0-4)
        ├─1 WhiteSpaceNode " " (1:5-1:6, 4-5)
        ├─2 WordNode[2] (1:7-1:16, 6-15)
        │   ├─0 TextNode "foo" (1:7-1:10, 6-9)
        │   └─1 TextNode "sball" (1:11-1:16, 10-15)
        └─3 PunctuationNode "." (1:16-1:17, 15-16)
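Once you have an nlcst tree, inspecting the natural language comes down to walking it. The following is a minimal sketch, not part of this package: it hand-writes a tree with the same shape as the output above (positional info omitted) and uses two hypothetical helpers, `toText` and `collectWords`, to gather the text of each `WordNode`. Note how the emphasis in `*foo*sball` yields one word with two text nodes.

```javascript
// Hand-written nlcst tree mirroring the output above (positional info omitted).
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Some'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {
              type: 'WordNode',
              children: [
                {type: 'TextNode', value: 'foo'},
                {type: 'TextNode', value: 'sball'}
              ]
            },
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: concatenate the values of all literal descendants.
function toText(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map((child) => toText(child)).join('')
}

// Hypothetical helper: collect the text of every `WordNode` in document order.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(toText(node))
  } else {
    for (const child of node.children || []) collectWords(child, words)
  }

  return words
}

console.log(collectWords(tree)) // [ 'Some', 'foosball' ]
```

In a real project, a tree walker such as unist-util-visit would usually replace the hand-rolled recursion.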
API
This package exports the identifier toNlcst.
There is no default export.
toNlcst(tree, file, Parser[, options])
Turn an mdast tree into an nlcst tree.
👉 Note: tree must have positional info and file must be a VFile corresponding to tree.
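Because the transform relies on positional info, it can help to check a tree before passing it in, for example when the tree was built by hand rather than parsed from a file. The check below is a minimal sketch under that assumption. `isPoint` and `hasPositions` are hypothetical helpers, not exports of this package, and requiring a position on every node may be stricter than the package itself is.

```javascript
// Hypothetical helper: does a unist point carry a line and a column?
function isPoint(point) {
  return (
    Boolean(point) &&
    typeof point.line === 'number' &&
    typeof point.column === 'number'
  )
}

// Hypothetical helper: check that `node` and all of its descendants have
// both `position.start` and `position.end`.
function hasPositions(node) {
  const position = node.position
  if (!position || !isPoint(position.start) || !isPoint(position.end)) {
    return false
  }

  return (node.children || []).every((child) => hasPositions(child))
}

// A tree shaped like parser output: every node carries a position.
const span = {start: {line: 1, column: 1}, end: {line: 1, column: 3}}
const parsed = {
  type: 'root',
  position: span,
  children: [
    {
      type: 'paragraph',
      position: span,
      children: [{type: 'text', value: 'Hi', position: span}]
    }
  ]
}

// A hand-built tree: same content, no positions.
const handBuilt = {
  type: 'root',
  children: [{type: 'paragraph', children: [{type: 'text', value: 'Hi'}]}]
}

console.log(hasPositions(parsed)) // true
console.log(hasPositions(handBuilt)) // false
```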
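Parameters
tree (mdast node): tree to transform
file (VFile): virtual file corresponding to tree
Parser (ParserConstructor or ParserInstance): parser to use
options (Options, optional): configuration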
Returns
nlcst tree (NlcstNode).
Options
Configuration (TypeScript type).
Fields
ignore
List of mdast node types to ignore (Array<string>, optional).
The types 'table', 'tableRow', and 'tableCell' are always ignored.
Show example
Say we have the following file example.md:
A paragraph.

> A paragraph in a block quote.
…and if we now transform with ignore: ['blockquote'], we get:
RootNode[2] (1:1-3:1, 0-14)
├─0 ParagraphNode[1] (1:1-1:13, 0-12)
│   └─0 SentenceNode[4] (1:1-1:13, 0-12)
│       ├─0 WordNode[1] (1:1-1:2, 0-1)
│       │   └─0 TextNode "A" (1:1-1:2, 0-1)
│       ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
│       ├─2 WordNode[1] (1:3-1:12, 2-11)
│       │   └─0 TextNode "paragraph" (1:3-1:12, 2-11)
│       └─3 PunctuationNode "." (1:12-1:13, 11-12)
└─1 WhiteSpaceNode "\n\n" (1:13-3:1, 12-14)
source
List of mdast node types to mark as nlcst source nodes (Array<string>, optional).
The type 'inlineCode' is always marked as source.
Show example
Say we have the following file example.md:
A paragraph.

> A paragraph in a block quote.
…and if we now transform with source: ['blockquote'], we get:
RootNode[3] (1:1-3:32, 0-45)
├─0 ParagraphNode[1] (1:1-1:13, 0-12)
│   └─0 SentenceNode[4] (1:1-1:13, 0-12)
│       ├─0 WordNode[1] (1:1-1:2, 0-1)
│       │   └─0 TextNode "A" (1:1-1:2, 0-1)
│       ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
│       ├─2 WordNode[1] (1:3-1:12, 2-11)
│       │   └─0 TextNode "paragraph" (1:3-1:12, 2-11)
│       └─3 PunctuationNode "." (1:12-1:13, 11-12)
├─1 WhiteSpaceNode "\n\n" (1:13-3:1, 12-14)
└─2 ParagraphNode[1] (3:1-3:32, 14-45)
    └─0 SentenceNode[1] (3:1-3:32, 14-45)
        └─0 SourceNode "> A paragraph in a block quote." (3:1-3:32, 14-45)
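One way to picture how ignore and source combine with their built-in types is as two lists that user-supplied types are appended to. The sketch below only illustrates the behavior documented above and is not the package's actual implementation. `resolveOptions`, `classify`, and the constant names are hypothetical, and checking ignore before source is an assumption.

```javascript
// Types documented above as always ignored / always marked as source.
const defaultIgnore = ['table', 'tableRow', 'tableCell']
const defaultSource = ['inlineCode']

// Hypothetical helper: merge user options with the documented defaults.
function resolveOptions(options = {}) {
  return {
    ignore: [...defaultIgnore, ...(options.ignore || [])],
    source: [...defaultSource, ...(options.source || [])]
  }
}

// Hypothetical helper: classify an mdast node type under those options.
// Assumption: `ignore` wins when a type is listed in both fields.
function classify(type, options) {
  const {ignore, source} = resolveOptions(options)
  if (ignore.includes(type)) return 'ignored'
  if (source.includes(type)) return 'source'
  return 'parsed'
}

console.log(classify('blockquote', {ignore: ['blockquote']})) // ignored
console.log(classify('blockquote', {source: ['blockquote']})) // source
console.log(classify('table')) // ignored
console.log(classify('inlineCode')) // source
console.log(classify('paragraph')) // parsed
```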
ParserConstructor
Create a new parser (TypeScript type).
Type
type ParserConstructor = new () => ParserInstance
ParserInstance
nlcst parser (TypeScript type).
For example, parse-dutch, parse-english, or parse-latin.
Type
type ParserInstance = {
  tokenizeSentencePlugins: ((node: NlcstSentence) => undefined)[]
  tokenizeParagraphPlugins: ((node: NlcstParagraph) => undefined)[]
  tokenizeRootPlugins: ((node: NlcstRoot) => undefined)[]
  parse(value: string | null | undefined): NlcstRoot
  tokenize(value: string | null | undefined): Array<NlcstSentenceContent>
}
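Since both a ParserConstructor and a ParserInstance are described, calling code may want to accept either. The following is a minimal sketch of one way to normalize the two, assuming a constructor can be recognized by being a function. `toParserInstance` and `FakeParser` are hypothetical stand-ins, not part of this package or of the parse-* packages, and `FakeParser` does no real tokenizing.

```javascript
// Hypothetical stand-in with the same members as `ParserInstance`.
class FakeParser {
  constructor() {
    this.tokenizeSentencePlugins = []
    this.tokenizeParagraphPlugins = []
    this.tokenizeRootPlugins = []
  }

  // Real parsers build paragraphs and sentences; this returns an empty root.
  parse(value) {
    return {type: 'RootNode', children: []}
  }

  // Real parsers split words and punctuation; this wraps the input in one
  // text node, or returns nothing for empty input.
  tokenize(value) {
    return value ? [{type: 'TextNode', value: String(value)}] : []
  }
}

// Hypothetical helper: accept a constructor or an instance, return an instance.
function toParserInstance(parser) {
  return typeof parser === 'function' ? new parser() : parser
}

const fromConstructor = toParserInstance(FakeParser)
const existing = new FakeParser()
const fromInstance = toParserInstance(existing)

console.log(fromConstructor instanceof FakeParser) // true
console.log(fromInstance === existing) // true
```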
Types
This package is fully typed with TypeScript.
It exports the types Options, ParserConstructor, and ParserInstance.
Compatibility
Projects maintained by the unified collective are compatible with maintained
versions of Node.js.
When we cut a new major release, we drop support for unmaintained versions of
Node.
This means we try to keep the current release line, mdast-util-to-nlcst@^7, compatible with Node.js 16.
Security
Use of mdast-util-to-nlcst does not involve hast, so there are no openings for cross-site scripting (XSS) attacks.
Related
Contribute
See contributing.md in syntax-tree/.github for ways to get started.
See support.md for ways to get help.
This project has a code of conduct.
By interacting with this repository, organization, or community you agree to
abide by its terms.
License
MIT © Titus Wormer