mdast-util-to-hast
The mdast-util-to-hast package is a utility that allows for the transformation of MDAST (Markdown Abstract Syntax Tree) to HAST (Hypertext Abstract Syntax Tree). This is particularly useful for applications that need to convert markdown content into HTML or other formats that can be more easily manipulated or displayed in web environments.
Convert MDAST to HAST
This feature allows for the conversion of a markdown document represented as an MDAST into a HAST. The code sample demonstrates how to use unified with the remark-parse plugin to parse markdown into MDAST, and then convert that MDAST into HAST using mdast-util-to-hast.
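import {unified} from 'unified'
import remarkParse from 'remark-parse'
import {toHast} from 'mdast-util-to-hast'

// Parse markdown to mdast, then swap the tree for hast in a transformer.
const processor = unified()
  .use(remarkParse)
  .use(() => (tree) => toHast(tree))

// `process` would need a compiler plugin, so parse and run instead.
const mdast = processor.parse('# Hello world!')
const hast = await processor.run(mdast)
console.log(hast)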
remark-html is a plugin for the remark processor that compiles markdown to HTML. Like mdast-util-to-hast, it converts markdown content to a web-friendly format. However, remark-html is an end-to-end solution that goes straight from markdown to HTML, whereas mdast-util-to-hast performs the lower-level conversion to HAST, which can then be manipulated further or serialized to HTML.
rehype is a plugin-powered processor in the unified ecosystem that manipulates HTML documents. Like mdast-util-to-hast, it deals with HAST nodes. However, rehype operates on HTML content or HAST directly, whereas mdast-util-to-hast is designed specifically for converting MDAST to HAST.
mdast utility to transform to hast.
This package is a utility that takes an mdast (markdown) syntax tree as input and turns it into a hast (HTML) syntax tree.
This project is useful when you want to deal with ASTs and turn markdown to HTML.
The hast utility hast-util-to-mdast does the inverse of this utility.
It turns HTML into markdown.
The remark plugin remark-rehype wraps this utility to also turn markdown to HTML at a higher-level (easier) abstraction.
This package is ESM only. In Node.js (version 12.20+, 14.14+, or 16.0+), install with npm:
npm install mdast-util-to-hast
In Deno with esm.sh:
import {toHast} from "https://esm.sh/mdast-util-to-hast@12"
In browsers with esm.sh:
<script type="module">
import {toHast} from "https://esm.sh/mdast-util-to-hast@12?bundle"
</script>
Say we have the following example.md:
## Hello **World**!
…and next to it a module example.js:
import {promises as fs} from 'node:fs'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'
const markdown = String(await fs.readFile('example.md'))
const mdast = fromMarkdown(markdown)
const hast = toHast(mdast)
const html = toHtml(hast)
console.log(html)
…now running node example.js yields:
<h2>Hello <strong>World</strong>!</h2>
This package exports the identifiers toHast, defaultHandlers, all, and one.
There is no default export.
toHast(node[, options])
mdast utility to transform to hast.
options
Configuration (optional).
options.allowDangerousHtml
Whether to persist raw HTML in markdown in the hast tree (boolean, default: false).
Raw HTML is available in mdast as html nodes and can be embedded in hast as semistandard raw nodes.
Most utilities ignore raw nodes but two notable ones don’t:
- hast-util-to-html also has an option allowDangerousHtml which will output the raw HTML. This is typically discouraged, as noted by the option name, but is useful if you completely trust authors
- hast-util-raw can handle the raw embedded HTML strings by parsing them into standard hast nodes (element, text, etc). This is a heavy task as it needs a full HTML parser, but it is the only way to support untrusted content
options.clobberPrefix
Prefix to use before the id attribute on footnotes to prevent it from clobbering (string, default: 'user-content-').
DOM clobbering is this:
<p id=x></p>
<script>alert(x) // `x` now refers to the DOM `p#x` element</script>
Elements by their ID are made available by browsers on the window object, which is a security risk.
Using a prefix solves this problem.
More information on how to handle clobbering and the prefix is explained in Example: headings (DOM clobbering) in rehype-sanitize.
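As a rough illustration of why a prefix helps, the following dependency-free sketch prefixes a user-derived id so that it can no longer collide with a name a page script relies on. The helper `prefixId` and the `reserved` set are hypothetical and only exist for this example; they are not part of mdast-util-to-hast.

```javascript
// Hypothetical helper, not part of mdast-util-to-hast.
// Puts a fixed prefix in front of a user-derived id.
function prefixId(id, prefix = 'user-content-') {
  return prefix + id
}

// Names a page script might expect to be its own globals
// (assumed purely for illustration).
const reserved = new Set(['x', 'config', 'location'])

const rawId = 'x'
const safeId = prefixId(rawId)

console.log(reserved.has(rawId)) // true: an element with this id could shadow `x`
console.log(reserved.has(safeId)) // false: the prefixed id no longer collides
console.log(safeId) // 'user-content-x'
```

The prefix only needs to be something that page scripts would never use as a global name, which is presumably why the default is as verbose as 'user-content-'.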
👉 Note: this option affects footnotes. Footnotes are not specified by CommonMark. They are supported by GitHub, so they can be enabled by using the utility mdast-util-gfm.
options.footnoteLabel
Label to use for the footnotes section (string, default: 'Footnotes').
Affects screen readers.
Change it when the markdown is not in English.
👉 Note: this option affects footnotes. Footnotes are not specified by CommonMark. They are supported by GitHub, so they can be enabled by using the utility mdast-util-gfm.
options.footnoteLabelTagName
HTML tag to use for the footnote label (string, default: h2).
Can be changed to match your document structure and play well with your CSS.
👉 Note: this option affects footnotes. Footnotes are not specified by CommonMark. They are supported by GitHub, so they can be enabled by using the utility mdast-util-gfm.
options.footnoteLabelProperties
Properties to use on the footnote label (object, default: {className: ['sr-only']}).
Importantly, id: 'footnote-label' is always added, because footnote calls use it with aria-describedby to provide an accessible label.
A sr-only class is added by default to hide this from sighted users.
Change it to make the label visible, or add classes for other purposes.
👉 Note: this option affects footnotes. Footnotes are not specified by CommonMark. They are supported by GitHub, so they can be enabled by using the utility mdast-util-gfm.
options.footnoteBackLabel
Label to use from backreferences back to their footnote call (string, default: 'Back to content').
Affects screen readers.
Change it when the markdown is not in English.
👉 Note: this option affects footnotes. Footnotes are not specified by CommonMark. They are supported by GitHub, so they can be enabled by using the utility mdast-util-gfm.
options.handlers
Object mapping node types to functions handling the corresponding nodes.
See lib/handlers/ for examples.
In a handler, you have access to h, which should be used to create hast nodes from mdast nodes.
On h, there are several fields that may be of interest.
options.passThrough
List of mdast node types to pass through (keep) in hast (Array<string>, default: []).
If the passed through nodes have children, those children are expected to be mdast and will be handled.
Similar functionality can be achieved with a custom handler.
A passThrough of ['customNode'] is equivalent to:
toHast(/* … */, {
  handlers: {
    customNode(h, node) {
      return 'children' in node ? {...node, children: all(h, node)} : node
    }
  }
})
options.unknownHandler
Handler for unknown nodes (Handler?).
Unknown nodes are nodes with a type that isn’t in handlers or passThrough.
The default behavior for unknown nodes is:
- when the node has a value (and doesn’t have data.hName, data.hProperties, or data.hChildren, see later), create a hast text node
- otherwise, create a <div> element (which could be changed with data.hName), with its children mapped from mdast to hast as well
defaultHandlers
Object mapping mdast node types to functions that can handle them.
See lib/handlers/index.js.
all(h, parent)
Helper function for writing custom handlers passed to options.handlers.
Pass it h and a parent node (mdast) and it will turn the node’s children into an array of transformed nodes (hast).
one(h, node, parent)
Helper function for writing custom handlers passed to options.handlers.
Pass it h, a node, and its parent (mdast) and it will turn node into hast content.
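To make the relationship between handlers, one, and all more concrete, here is a simplified, dependency-free sketch of the dispatch idea. It is not the library's implementation and it leaves out h entirely: the names `sketchHandlers`, `sketchOne`, `sketchAll`, and `element` are hypothetical, and the fallback only approximates the documented default for unknown nodes (it ignores the data fields).

```javascript
// Simplified sketch; NOT the library's implementation.
// `sketchHandlers`, `sketchOne`, `sketchAll`, and `element` are hypothetical.
const sketchHandlers = {
  paragraph: (node) => element('p', sketchAll(node)),
  strong: (node) => element('strong', sketchAll(node)),
  text: (node) => ({type: 'text', value: node.value})
}

// Build a bare hast element.
function element(tagName, children) {
  return {type: 'element', tagName, properties: {}, children}
}

// Transform one mdast node by dispatching on its type.
function sketchOne(node) {
  const handler = sketchHandlers[node.type]
  if (handler) return handler(node)
  // Approximation of the documented default for unknown nodes:
  // nodes with a `value` become text, everything else becomes a <div>.
  if ('value' in node) return {type: 'text', value: node.value}
  return element('div', sketchAll(node))
}

// Transform every child of a parent node.
function sketchAll(parent) {
  return (parent.children || []).map((child) => sketchOne(child))
}

const result = sketchOne({
  type: 'paragraph',
  children: [
    {type: 'text', value: 'Hello '},
    {type: 'strong', children: [{type: 'text', value: 'World'}]},
    {type: 'custom', value: '!'} // unknown type with a value
  ]
})

console.log(JSON.stringify(result, null, 2))
```

A custom handler passed to options.handlers plays the role of an extra entry in the lookup table, and calling all inside it is what keeps the recursion going for that node's children.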
If you completely trust authors (or plugins) and want to allow them to use HTML in markdown, and the last utility has an allowDangerousHtml option as well (such as hast-util-to-html), you can pass allowDangerousHtml to this utility (mdast-util-to-hast):
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'
const markdown = 'It <i>works</i>! <img onerror="alert(1)">'
const mdast = fromMarkdown(markdown)
const hast = toHast(mdast, {allowDangerousHtml: true})
const html = toHtml(hast, {allowDangerousHtml: true})
console.log(html)
…now running node example.js yields:
<p>It <i>works</i>! <img onerror="alert(1)"></p>
⚠️ Danger: observe that the XSS attack through the onerror attribute is still present.
If you do not trust the authors of the input markdown, or if you want to make sure that further utilities can see HTML embedded in markdown, use hast-util-raw.
The following example passes allowDangerousHtml to this utility (mdast-util-to-hast), then turns the raw embedded HTML into proper HTML nodes (hast-util-raw), and finally sanitizes the HTML by only allowing safe things (hast-util-sanitize):
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
import {raw} from 'hast-util-raw'
import {sanitize} from 'hast-util-sanitize'
import {toHtml} from 'hast-util-to-html'
const markdown = 'It <i>works</i>! <img onerror="alert(1)">'
const mdast = fromMarkdown(markdown)
const hast = raw(toHast(mdast, {allowDangerousHtml: true}))
const safeHast = sanitize(hast)
const html = toHtml(safeHast)
console.log(html)
…now running node example.js yields:
<p>It <i>works</i>! <img></p>
👉 Note: observe that the XSS attack through the onerror attribute is no longer present.
If you know that the markdown is authored in a language other than English, and you’re using micromark-extension-gfm and mdast-util-gfm to match how GitHub renders markdown, and you know that footnotes are (or can be) used, you should translate the labels associated with them.
Let’s first set the stage:
import {fromMarkdown} from 'mdast-util-from-markdown'
import {gfm} from 'micromark-extension-gfm'
import {gfmFromMarkdown} from 'mdast-util-gfm'
import {toHast} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'
const markdown = 'Bonjour[^1]\n\n[^1]: Monde!'
const mdast = fromMarkdown(markdown, {
extensions: [gfm()],
mdastExtensions: [gfmFromMarkdown()]
})
const hast = toHast(mdast)
const html = toHtml(hast)
console.log(html)
…now running node example.js yields:
<p>Bonjour<sup><a href="#user-content-fn-1" id="user-content-fnref-1" data-footnote-ref aria-describedby="footnote-label">1</a></sup></p>
<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
<ol>
<li id="user-content-fn-1">
<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref class="data-footnote-backref" aria-label="Back to content">↩</a></p>
</li>
</ol>
</section>
This is a mix of English and French that screen readers can’t handle nicely. Let’s say our program does know that the markdown is in French. In that case, it’s important to translate and define the labels relating to footnotes so that screen readers can properly pronounce the page:
@@ -9,7 +9,10 @@ const mdast = fromMarkdown(markdown, {
extensions: [gfm()],
mdastExtensions: [gfmFromMarkdown()]
})
-const hast = toHast(mdast)
+const hast = toHast(mdast, {
+ footnoteLabel: 'Notes de bas de page',
+ footnoteBackLabel: 'Arrière'
+})
const html = toHtml(hast)
console.log(html)
…now running node example.js with the above patch applied yields:
@@ -1,8 +1,8 @@
<p>Bonjour<sup><a href="#user-content-fn-1" id="user-content-fnref-1" data-footnote-ref aria-describedby="footnote-label">1</a></sup></p>
-<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
+<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Notes de bas de page</h2>
<ol>
<li id="user-content-fn-1">
-<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref class="data-footnote-backref" aria-label="Back to content">↩</a></p>
+<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref class="data-footnote-backref" aria-label="Arrière">↩</a></p>
</li>
</ol>
</section>
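If a program handles documents in more than one language, the label options could be picked from a lookup table. The sketch below is one possible shape for that; `footnoteLabels` and `footnoteOptionsFor` are hypothetical names, the French strings are the ones from the patch above, and the English strings are the documented defaults.

```javascript
// Hypothetical lookup table. Only the French strings come from the example
// above; the English ones are the documented defaults of the two options.
const footnoteLabels = {
  en: {footnoteLabel: 'Footnotes', footnoteBackLabel: 'Back to content'},
  fr: {footnoteLabel: 'Notes de bas de page', footnoteBackLabel: 'Arrière'}
}

// Return label options for a language code, falling back to English.
function footnoteOptionsFor(lang) {
  return footnoteLabels[lang] || footnoteLabels.en
}

const options = footnoteOptionsFor('fr')
console.log(options)
// The result could then be passed along, as in: toHast(mdast, options)
```

Falling back to English for unknown languages is an assumption of this sketch; a real program might prefer to fail loudly so that a missing translation is noticed.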
This project supports CommonMark and the GFM constructs (footnotes, strikethrough, tables) and the frontmatter constructs YAML and TOML. Support can be extended to other constructs in two ways: a) with handlers, b) through fields on nodes.
For example, when we represent a mark element in markdown and want to turn it into a <mark> element in HTML, we can use a handler:
import {toHast, all} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'
const mdast = {
type: 'paragraph',
children: [{type: 'mark', children: [{type: 'text', value: 'x'}]}]
}
const hast = toHast(mdast, {
handlers: {
mark(h, node) {
return h(node, 'mark', all(h, node))
}
}
})
console.log(toHtml(hast))
We can do the same through certain fields on nodes:
import {toHast} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'
const mdast = {
type: 'paragraph',
children: [
{
type: 'mark',
children: [{type: 'text', value: 'x'}],
data: {hName: 'mark'}
}
]
}
console.log(toHtml(toHast(mdast)))
This project by default handles CommonMark, GFM (footnotes, strikethrough, tables) and common frontmatter (YAML, TOML).
Existing handlers can be overwritten and handlers for more nodes can be added. It’s also possible to define how mdast is turned into hast through fields on nodes.
The following table gives insight into what input turns into what output:
[Table: mdast node | markdown example | hast node | html example. Row contents not recoverable.]
👉 Note: GFM prescribes that the obsolete align attribute on td and th elements is used. To use style attributes instead of obsolete features, combine this utility with @mapbox/hast-util-table-cell-style.
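To illustrate what moving from align to style amounts to on a hast tree, here is a dependency-free sketch. The function `alignToStyle` is hypothetical and is not how any particular plugin is implemented; it also overwrites an existing style property, which a real tool would probably merge instead.

```javascript
// Hypothetical sketch: replace `align` on td/th elements with an inline style.
// It mutates the tree in place and overwrites any existing `style` property.
function alignToStyle(node) {
  const isCell =
    node.type === 'element' && (node.tagName === 'td' || node.tagName === 'th')

  if (isCell) {
    const {align, ...rest} = node.properties || {}
    node.properties = align ? {...rest, style: `text-align: ${align}`} : rest
  }

  for (const child of node.children || []) alignToStyle(child)
  return node
}

const cell = alignToStyle({
  type: 'element',
  tagName: 'td',
  properties: {align: 'center'},
  children: [{type: 'text', value: 'x'}]
})

console.log(cell.properties) // {style: 'text-align: center'}
```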
🧑🏫 Info: this project is concerned with turning one syntax tree into another. It does not deal with markdown syntax or HTML syntax. The preceding examples are illustrative rather than authoritative or exhaustive.
A frequent problem arises when having to turn one syntax tree into another. As the original tree (in this case, mdast for markdown) is in some cases limited compared to the destination (in this case, hast for HTML) tree, is it possible to provide more info in the original to define what the result will be in the destination? This is possible by defining data on mdast nodes, which this utility will read as instructions on what hast nodes to create.
An example is math, which is a nonstandard markdown extension that this utility doesn’t understand.
To solve this, mdast-util-math defines instructions on mdast nodes that this utility does understand because they define a certain hast structure.
The following fields can be used:
- node.data.hName configures the element’s tag name
- node.data.hProperties is mixed into the element’s properties
- node.data.hChildren configures the element’s children
hName
node.data.hName sets the tag name of an element.
The following mdast:
{
type: 'strong',
data: {hName: 'b'},
children: [{type: 'text', value: 'Alpha'}]
}
…yields (hast):
{
type: 'element',
tagName: 'b',
properties: {},
children: [{type: 'text', value: 'Alpha'}]
}
hProperties
node.data.hProperties sets the properties of an element.
The following mdast:
{
  type: 'image',
  url: 'circle.svg',
  alt: 'Big red circle on a black background',
  title: null,
  data: {hProperties: {className: ['responsive']}}
}
…yields (hast):
{
type: 'element',
tagName: 'img',
properties: {
src: 'circle.svg',
alt: 'Big red circle on a black background',
className: ['responsive']
},
children: []
}
hChildren
node.data.hChildren sets the children of an element.
The following mdast:
{
type: 'code',
lang: 'js',
data: {
hChildren: [
{
type: 'element',
tagName: 'span',
properties: {className: ['hljs-meta']},
children: [{type: 'text', value: '"use strict"'}]
},
{type: 'text', value: ';'}
]
},
value: '"use strict";'
}
…yields (hast):
{
type: 'element',
tagName: 'pre',
properties: {},
children: [{
type: 'element',
tagName: 'code',
properties: {className: ['language-js']},
children: [
{
type: 'element',
tagName: 'span',
properties: {className: ['hljs-meta']},
children: [{type: 'text', value: '"use strict"'}]
},
{type: 'text', value: ';'}
]
}]
}
👉 Note: the pre and language-js class are normal mdast-util-to-hast functionality.
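The three fields can be thought of as overrides applied on top of whatever element a handler would normally produce. The following dependency-free sketch shows that idea only; `applyDataSketch` is a hypothetical name and this is not the library's code. In particular, it does not model cases such as code, where the children end up on the inner code element rather than on pre.

```javascript
// Simplified sketch of how `data` fields could steer the result.
// `applyDataSketch` is a hypothetical name, not the library's code.
function applyDataSketch(mdastNode, hastElement) {
  const data = mdastNode.data || {}
  return {
    ...hastElement,
    // hName replaces the tag name.
    tagName: data.hName || hastElement.tagName,
    // hProperties is mixed into the existing properties.
    properties: {...hastElement.properties, ...data.hProperties},
    // hChildren replaces the children.
    children: data.hChildren || hastElement.children
  }
}

const strong = {
  type: 'strong',
  data: {hName: 'b'},
  children: [{type: 'text', value: 'Alpha'}]
}

// What a default handler might have produced for `strong`.
const defaultElement = {
  type: 'element',
  tagName: 'strong',
  properties: {},
  children: [{type: 'text', value: 'Alpha'}]
}

const applied = applyDataSketch(strong, defaultElement)
console.log(applied.tagName) // 'b'
```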
Assuming you know how to use (semantic) HTML and CSS, then it should generally be straightforward to style the HTML produced by this utility. With CSS, you can get creative and style the results as you please.
Some semistandard features, notably GFM’s tasklists and footnotes, generate HTML that may be unintuitive, as it matches exactly what GitHub produces for their website.
There is a project, sindresorhus/github-markdown-css, that exposes the stylesheet that GitHub uses for rendered markdown, which might either be inspirational for more complex features, or can be used as-is to exactly match how GitHub styles rendered markdown.
The following CSS is needed to make footnotes look a bit like GitHub:
/* Style the footnotes section. */
.footnotes {
font-size: smaller;
color: #8b949e;
border-top: 1px solid #30363d;
}
/* Hide the section label for visual users. */
.sr-only {
position: absolute;
width: 1px;
height: 1px;
padding: 0;
overflow: hidden;
clip: rect(0, 0, 0, 0);
word-wrap: normal;
border: 0;
}
/* Place `[` and `]` around footnote calls. */
[data-footnote-ref]::before {
content: '[';
}
[data-footnote-ref]::after {
content: ']';
}
The following interfaces are added to hast by this utility.
Raw
interface Raw <: Literal {
type: "raw"
}
Raw (Literal) represents a string of raw HTML inside hast.
Raw nodes are typically ignored but are handled by hast-util-to-html and hast-util-raw.
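Because most utilities skip raw nodes, it can be handy to check whether a tree contains any before deciding how to process it. The walker below is a dependency-free sketch; `findRaw` is a hypothetical helper, and the tree is hand-written to resemble what allowDangerousHtml: true might produce for inline HTML rather than actual output of this utility.

```javascript
// Dependency-free walker; `findRaw` is a hypothetical helper.
// Collects the string values of all `raw` nodes in a hast tree.
function findRaw(node, found = []) {
  if (node.type === 'raw') found.push(node.value)
  for (const child of node.children || []) findRaw(child, found)
  return found
}

// Hand-written tree for illustration.
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        {type: 'text', value: 'It '},
        {type: 'raw', value: '<i>'},
        {type: 'text', value: 'works'},
        {type: 'raw', value: '</i>'}
      ]
    }
  ]
}

const rawValues = findRaw(tree)
console.log(rawValues) // ['<i>', '</i>']
```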
This package is fully typed with TypeScript.
It also exports Options, Handler, Handlers, H, and Raw types.
If you’re working with raw nodes in the hast syntax tree (which are added when allowDangerousHtml: true), make sure to import this utility somewhere in your types, as that registers the new node types in the tree.
/** @typedef {import('mdast-util-to-hast')} */
import {visit} from 'unist-util-visit'
/** @type {import('hast').Root} */
const tree = { /* … */ }
visit(tree, (node) => {
// `node` can now be `raw`.
})
Projects maintained by the unified collective are compatible with all maintained versions of Node.js. As of now, that is Node.js 12.20+, 14.14+, and 16.0+. Our projects sometimes work with older versions, but this is not guaranteed.
Use of mdast-util-to-hast can open you up to a cross-site scripting (XSS) attack.
Embedded hast properties (hName, hProperties, hChildren), custom handlers, and the allowDangerousHtml option all provide openings.
The following example shows how a script is injected where a benign code block is expected with embedded hast properties:
const code = {type: 'code', value: 'alert(1)'}
code.data = {hName: 'script'}
Yields:
<script>alert(1)</script>
The following example shows how an image is changed to fail loading and therefore run code in a browser.
const image = {type: 'image', url: 'existing.png'}
image.data = {hProperties: {src: 'missing', onError: 'alert(2)'}}
Yields:
<img src="missing" onerror="alert(2)">
The following example shows the default handling of embedded HTML:
# Hello
<script>alert(3)</script>
Yields:
<h1>Hello</h1>
Passing allowDangerousHtml: true to mdast-util-to-hast is typically still not enough to run unsafe code:
<h1>Hello</h1>
&#x3C;script>alert(3)&#x3C;/script>
If allowDangerousHtml: true is also given to hast-util-to-html (or rehype-stringify), the unsafe code runs:
<h1>Hello</h1>
<script>alert(3)</script>
Use hast-util-sanitize to make the hast tree safe.
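The openings listed above all leave recognizable traces in the hast tree. As an illustration only, the following dependency-free sketch flags the three patterns from the examples: script elements, event handler properties, and raw nodes. `findRisks` is a hypothetical helper and is deliberately incomplete; it is a way to see the problem, not a substitute for an allowlist-based sanitizer such as hast-util-sanitize.

```javascript
// Illustrative and deliberately incomplete; `findRisks` is hypothetical.
// It only reports a few known-bad patterns and does not make a tree safe.
function findRisks(node, risks = []) {
  if (node.type === 'element') {
    if (node.tagName === 'script') risks.push('script element')
    for (const name of Object.keys(node.properties || {})) {
      if (/^on/i.test(name)) risks.push(`event handler: ${name}`)
    }
  }
  if (node.type === 'raw') risks.push('raw HTML')
  for (const child of node.children || []) findRisks(child, risks)
  return risks
}

// Hand-written tree mirroring the two injection examples above.
const unsafeTree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'script',
      properties: {},
      children: [{type: 'text', value: 'alert(1)'}]
    },
    {
      type: 'element',
      tagName: 'img',
      properties: {src: 'missing', onError: 'alert(2)'},
      children: []
    }
  ]
}

const risks = findRisks(unsafeTree)
console.log(risks) // ['script element', 'event handler: onError']
```

A denylist like this misses many vectors (for example javascript: URLs), which is why an allowlist approach is the safer default.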
- hast-util-to-mdast: transform hast to mdast
- hast-util-to-xast: transform hast to xast
- hast-util-sanitize: sanitize hast nodes
See contributing.md in syntax-tree/.github for ways to get started.
See support.md for ways to get help.
This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.