github.com/syntax-tree/hast
Hypertext Abstract Syntax Tree format.
hast is a specification for representing HTML (and embedded SVG or MathML) as an abstract syntax tree. It implements the unist spec.
This document may not be released.
See releases for released documents.
The latest released version is 2.4.0.
This document defines a format for representing hypertext as an abstract syntax tree. Development of hast started in April 2016 for rehype. This specification is written in a Web IDL-like grammar.
hast extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.
hast relates to JavaScript in that it has an ecosystem of utilities for working with compliant syntax trees in JavaScript. However, hast is not limited to JavaScript and can be used in other programming languages.
hast relates to the unified and rehype projects in that hast syntax trees are used throughout their ecosystems.
The reason for introducing a new “virtual” DOM is primarily:
If you are using TypeScript, you can use the hast types by installing them with npm:
npm install @types/hast
Literal
interface Literal <: UnistLiteral {
  value: string
}
Literal (UnistLiteral) represents a node in hast containing a value.
Parent
interface Parent <: UnistParent {
  children: [Comment | Doctype | Element | Text]
}
Parent (UnistParent) represents a node in hast containing other nodes (said to be children).
Its content is limited to only other hast content.
Comment
interface Comment <: Literal {
  type: 'comment'
}
Comment (Literal) represents a Comment ([DOM]).
For example, the following HTML:
<!--Charlie-->
Yields:
{type: 'comment', value: 'Charlie'}
Doctype
interface Doctype <: Node {
  type: 'doctype'
}
Doctype (Node) represents a DocumentType ([DOM]).
For example, the following HTML:
<!doctype html>
Yields:
{type: 'doctype'}
Element
interface Element <: Parent {
  type: 'element'
  tagName: string
  properties: Properties?
  content: Root?
  children: [Comment | Element | Text]
}
Element (Parent) represents an Element ([DOM]).
A tagName field must be present.
It represents the element’s local name ([DOM]).
The properties field represents information associated with the element.
The value of the properties field implements the Properties interface.
If the tagName field is 'template', a content field can be present.
The value of the content field implements the Root interface.
If the tagName field is 'template', the element must be a leaf.
If the tagName field is 'noscript', its children should be represented as if scripting is disabled ([HTML]).
For example, the following HTML:
<a href="https://alpha.com" class="bravo" download></a>
Yields:
{
  type: 'element',
  tagName: 'a',
  properties: {
    href: 'https://alpha.com',
    className: ['bravo'],
    download: true
  },
  children: []
}
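The constraints on Element described above can be sketched as a small check. This is a minimal illustration only: the type names (HastElement, HastRoot, and so on) and the checkElement helper are hypothetical, defined locally for the example, and are not the types published in @types/hast. Treating a content field on a non-template element as a problem is one reading of the text above, not something the specification states outright.

```typescript
// Hypothetical local types for illustration; not the published @types/hast types.
type HastPropertyValue =
  | string
  | number
  | boolean
  | Array<string | number>
  | null
  | undefined

interface HastText {
  type: 'text'
  value: string
}

interface HastComment {
  type: 'comment'
  value: string
}

interface HastDoctype {
  type: 'doctype'
}

interface HastElement {
  type: 'element'
  tagName: string
  properties?: Record<string, HastPropertyValue>
  content?: HastRoot
  children: Array<HastComment | HastElement | HastText>
}

interface HastRoot {
  type: 'root'
  children: Array<HastComment | HastDoctype | HastElement | HastText>
}

// Returns a list of problems found on one element (empty when none are found).
function checkElement(node: HastElement): string[] {
  const problems: string[] = []
  if (node.tagName === 'template' && node.children.length > 0) {
    problems.push('a template element must be a leaf')
  }
  // Assumption: `content` is only meaningful on template elements.
  if (node.tagName !== 'template' && node.content !== undefined) {
    problems.push('content is only expected on template elements')
  }
  return problems
}

// The <a> element from the example above.
const exampleLink: HastElement = {
  type: 'element',
  tagName: 'a',
  properties: {href: 'https://alpha.com', className: ['bravo'], download: true},
  children: []
}

// A template that wrongly carries children instead of a content root.
const filledTemplate: HastElement = {
  type: 'element',
  tagName: 'template',
  children: [{type: 'text', value: 'x'}]
}

const linkProblems = checkElement(exampleLink)
const templateProblems = checkElement(filledTemplate)
```

Here linkProblems is empty, while templateProblems reports the leaf violation.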
Root
interface Root <: Parent {
  type: 'root'
}
Root (Parent) represents a document.
Root can be used as the root of a tree, or as a value of the content field on a 'template' Element, never as a child.
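The placement rule for Root can be checked with a short recursive walk. The sketch below assumes a simplified node shape (SketchNode) and a helper name (rootsAreWellPlaced) that are both hypothetical and exist only for this example.

```typescript
// Hypothetical, simplified node shape used only for this sketch.
interface SketchNode {
  type: string
  tagName?: string
  value?: string
  children?: SketchNode[]
  content?: SketchNode
}

// Returns true when no 'root' node appears inside any `children` list.
function rootsAreWellPlaced(node: SketchNode): boolean {
  for (const child of node.children ?? []) {
    if (child.type === 'root' || !rootsAreWellPlaced(child)) return false
  }
  // A root is allowed as the `content` of a template element, so look inside it.
  if (node.content !== undefined) return rootsAreWellPlaced(node.content)
  return true
}

// A root containing a template whose content is itself a root: allowed.
const wellPlacedTree: SketchNode = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'template',
      children: [],
      content: {type: 'root', children: [{type: 'text', value: 'hi'}]}
    }
  ]
}

// A root used as a child of another root: not allowed.
const misplacedTree: SketchNode = {
  type: 'root',
  children: [{type: 'root', children: []}]
}

const wellPlacedResult = rootsAreWellPlaced(wellPlacedTree)
const misplacedResult = rootsAreWellPlaced(misplacedTree)
```

The walk returns true for wellPlacedTree and false for misplacedTree.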
Text
interface Text <: Literal {
  type: 'text'
}
Text (Literal) represents a Text ([DOM]).
For example, the following HTML:
<span>Foxtrot</span>
Yields:
{
  type: 'element',
  tagName: 'span',
  properties: {},
  children: [{type: 'text', value: 'Foxtrot'}]
}
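Because text lives in Text nodes nested under elements, reading the plain text of a subtree means collecting those values recursively. The following is a rough sketch in the spirit of textContent, using hypothetical local types (TextLike, CommentLike, ElementLike) and a hypothetical helper (plainText). For real use, utilities such as hast-util-to-string exist; this sketch does not claim to match their behavior in every case.

```typescript
// Hypothetical local types for this sketch.
interface TextLike {
  type: 'text'
  value: string
}

interface CommentLike {
  type: 'comment'
  value: string
}

interface ElementLike {
  type: 'element'
  tagName: string
  properties?: Record<string, unknown>
  children: ChildLike[]
}

type ChildLike = TextLike | CommentLike | ElementLike

// Concatenates the values of all text nodes in document order; comments are skipped.
function plainText(node: ChildLike): string {
  if (node.type === 'text') return node.value
  if (node.type === 'comment') return ''
  return node.children.map((child) => plainText(child)).join('')
}

// The <span>Foxtrot</span> example from above.
const foxtrotSpan: ElementLike = {
  type: 'element',
  tagName: 'span',
  properties: {},
  children: [{type: 'text', value: 'Foxtrot'}]
}

// <p>Alpha <b>bravo</b><!--Charlie--></p>
const nestedParagraph: ElementLike = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    {type: 'text', value: 'Alpha '},
    {type: 'element', tagName: 'b', properties: {}, children: [{type: 'text', value: 'bravo'}]},
    {type: 'comment', value: 'Charlie'}
  ]
}

const foxtrotText = plainText(foxtrotSpan) // 'Foxtrot'
const nestedText = plainText(nestedParagraph) // 'Alpha bravo'
```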
Properties
interface Properties {}
Properties represents information associated with an element.
Every field must be a PropertyName and every value a PropertyValue.
PropertyName
typedef string PropertyName
Property names are keys on Properties objects and reflect HTML, SVG, ARIA, XML, XMLNS, or XLink attribute names.
Often, they have the same value as the corresponding attribute (for example, id is a property name reflecting the id attribute name), but there are some notable differences.
These rules aren’t simple. Use hastscript (or property-information directly) to help.
The following rules are used to transform HTML attribute names to property names. These rules are based on how ARIA is reflected in the DOM ([ARIA]), and differ from how some (older) HTML attributes are reflected in the DOM.
stroke-miterlimit becomes strokeMiterLimit, autocorrect becomes autoCorrect, and allowfullscreen becomes allowFullScreen.
readonly becomes readOnly.
itemid becomes itemId and bgcolor becomes bgColor.
Some jargon is seen as one word even though it may not be seen as such by dictionaries.
For example, nohref becomes noHref, playsinline becomes playsInline, and accept-charset becomes acceptCharset.
The HTML attributes class and for respectively become className and htmlFor in alignment with the DOM.
No other attributes gain different names as properties, other than a change in casing.
property-information lists all property names.
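Since the renaming cannot be derived from the attribute name alone (word boundaries such as "auto correct" are a matter of vocabulary), a lookup table is the natural shape for it. The sketch below uses a tiny hand-written table limited to examples from this section. The table and the propertyName helper are hypothetical. Returning unknown names unchanged is an assumption made for brevity, and lowercasing the input only suits HTML attribute names. property-information is the source to rely on for the full list.

```typescript
// A tiny, hand-picked table covering only examples mentioned in this section.
const knownPropertyNames: Record<string, string> = {
  'stroke-miterlimit': 'strokeMiterLimit',
  autocorrect: 'autoCorrect',
  allowfullscreen: 'allowFullScreen',
  readonly: 'readOnly',
  itemid: 'itemId',
  bgcolor: 'bgColor',
  nohref: 'noHref',
  playsinline: 'playsInline',
  'accept-charset': 'acceptCharset',
  class: 'className',
  for: 'htmlFor'
}

// Maps an HTML attribute name to a property name using the table above.
// Assumption: names missing from the table are returned unchanged (lowercased).
function propertyName(attribute: string): string {
  const key = attribute.toLowerCase()
  const found: string | undefined = Object.prototype.hasOwnProperty.call(
    knownPropertyNames,
    key
  )
    ? knownPropertyNames[key]
    : undefined
  return found === undefined ? key : found
}

const classProperty = propertyName('class') // 'className'
const charsetProperty = propertyName('accept-charset') // 'acceptCharset'
const idProperty = propertyName('ID') // 'id'
```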
The property name rules differ from how HTML is reflected in the DOM for the following attributes:
charoff becomes charOff (not chOff)
char stays char (does not become ch)
rel stays rel (does not become relList)
checked stays checked (does not become defaultChecked)
muted stays muted (does not become defaultMuted)
value stays value (does not become defaultValue)
selected stays selected (does not become defaultSelected)
allowfullscreen becomes allowFullScreen (not allowFullscreen)
hreflang becomes hrefLang (not hreflang)
autoplay becomes autoPlay (not autoplay)
autocomplete becomes autoComplete (not autocomplete)
autofocus becomes autoFocus (not autofocus)
enctype becomes encType (not enctype)
formenctype becomes formEncType (not formEnctype)
vspace becomes vSpace (not vspace)
hspace becomes hSpace (not hspace)
lowsrc becomes lowSrc (not lowsrc)
PropertyValue
typedef any PropertyValue
Property values should reflect the data type determined by their property name.
For example, the HTML <div hidden></div> has a hidden attribute, which is reflected as a hidden property name set to the property value true, and <input minlength="5">, which has a minlength attribute, is reflected as a minLength property name set to the property value 5.
In JSON, the value null must be treated as if the property was not included. In JavaScript, both null and undefined must be similarly ignored.
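One way to honor this rule when reading a Properties object is to filter out absent values before inspecting it. The definedProperties helper below is a hypothetical name used only for this sketch.

```typescript
// Returns a copy of `properties` without entries whose value is null or undefined.
function definedProperties(
  properties: Record<string, unknown>
): Record<string, unknown> {
  const result: Record<string, unknown> = {}
  for (const key of Object.keys(properties)) {
    const value = properties[key]
    if (value !== null && value !== undefined) result[key] = value
  }
  return result
}

const presentProperties = definedProperties({
  id: 'a',
  title: null,
  hidden: undefined,
  download: true
})
// presentProperties has only the keys `id` and `download`.
```

Note that false, 0, and the empty string are kept: only null and undefined count as "not included".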
The DOM has strict rules on how it coerces HTML to expected values, whereas hast is more lenient in how it reflects the source.
Where the DOM treats <div hidden="no"></div> as having a value of true and <img width="yes"> as having a value of 0, these should be reflected as 'no' and 'yes', respectively, in hast.
The reason for this is to allow plugins and utilities to inspect these non-standard values.
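The lenient reflection described above can be sketched as: convert when the source value fits the expected type, otherwise keep the source string. The schema (propertyKinds) and helper (reflectValue) below are hypothetical and cover only the properties used in this section's examples. Treating a boolean attribute whose value equals its own name as true is an assumption borrowed from HTML's boolean attribute convention.

```typescript
type SketchKind = 'boolean' | 'number'

// Hypothetical mini schema: which data type each property name is expected to have.
const propertyKinds: Record<string, SketchKind> = {
  hidden: 'boolean',
  minLength: 'number',
  width: 'number'
}

// Reflects a raw attribute value as a property value, keeping non-standard
// values as strings instead of coercing them the way the DOM would.
function reflectValue(property: string, raw: string): string | number | boolean {
  const kind: SketchKind | undefined = Object.prototype.hasOwnProperty.call(
    propertyKinds,
    property
  )
    ? propertyKinds[property]
    : undefined

  if (
    kind === 'boolean' &&
    (raw === '' || raw.toLowerCase() === property.toLowerCase())
  ) {
    return true
  }

  if (kind === 'number' && raw.trim() !== '' && isFinite(Number(raw))) {
    return Number(raw)
  }

  // Anything else is kept as written so tools can inspect it.
  return raw
}

const hiddenBare = reflectValue('hidden', '') // true
const hiddenNo = reflectValue('hidden', 'no') // 'no'
const minLengthFive = reflectValue('minLength', '5') // 5
const widthYes = reflectValue('width', 'yes') // 'yes'
```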
The DOM also specifies comma-separated and space-separated lists as attribute values.
In hast, these should be treated as ordered lists.
For example, <div class="alpha bravo"></div> is represented as ['alpha', 'bravo'].
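Splitting such attribute values into ordered lists might look like the sketch below. Both helper names are hypothetical, and dropping empty entries in the comma-separated case is a simplification chosen here. The space-separated-tokens and comma-separated-tokens packages listed later in this document are the dedicated tools for this.

```typescript
// Splits a space-separated attribute value into an ordered list of tokens.
function parseSpaceSeparated(value: string): string[] {
  const trimmed = value.trim()
  return trimmed === '' ? [] : trimmed.split(/\s+/)
}

// Splits a comma-separated attribute value into an ordered list of tokens.
// Simplification: surrounding whitespace is trimmed and empty entries are dropped.
function parseCommaSeparated(value: string): string[] {
  return value
    .split(',')
    .map((token) => token.trim())
    .filter((token) => token !== '')
}

const classList = parseSpaceSeparated('alpha bravo') // ['alpha', 'bravo']
const acceptList = parseCommaSeparated('image/png, image/jpeg') // ['image/png', 'image/jpeg']
```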
There’s no special format for the property value of the style property name.
See the unist glossary.
See the unist list of utilities for more utilities.
hastscript — create trees
hast-util-assert — assert nodes
hast-util-class-list — simulate the browser’s classList API for hast nodes
hast-util-classnames — merge class names together
hast-util-embedded — check if a node is an embedded element
hast-util-excerpt — truncate the tree to a comment
hast-util-find-and-replace — find and replace text in a tree
hast-util-from-dom — transform from DOM tree
hast-util-from-html — parse from HTML
hast-util-from-parse5 — transform from Parse5’s AST
hast-util-from-selector — parse CSS selectors to nodes
hast-util-from-string — set the plain-text value of a node (textContent)
hast-util-from-text — set the plain-text value of a node (innerText)
hast-util-from-webparser — transform Webparser’s AST to hast
hast-util-has-property — check if an element has a certain property
hast-util-heading — check if a node is heading content
hast-util-heading-rank — get the rank (also known as depth or level) of headings
hast-util-interactive — check if a node is interactive
hast-util-is-body-ok-link — check if a link element is “Body OK”
hast-util-is-conditional-comment — check if node is a conditional comment
hast-util-is-css-link — check if node is a CSS link
hast-util-is-css-style — check if node is a CSS style
hast-util-is-element — check if node is a (certain) element
hast-util-is-event-handler — check if property is an event handler
hast-util-is-javascript — check if node is a JavaScript script
hast-util-labelable — check if node is labelable
hast-util-parse-selector — create an element from a simple CSS selector
hast-util-phrasing — check if a node is phrasing content
hast-util-raw — parse a tree again
hast-util-reading-time — estimate the reading time
hast-util-sanitize — sanitize nodes
hast-util-script-supporting — check if node is script-supporting content
hast-util-select — querySelector, querySelectorAll, and matches
hast-util-sectioning — check if node is sectioning content
hast-util-shift-heading — change heading rank (depth, level)
hast-util-table-cell-style — transform deprecated styling attributes on table cells to inline styles
hast-util-to-dom — transform to a DOM tree
hast-util-to-estree — transform to estree (JavaScript AST) JSX
hast-util-to-html — serialize as HTML
hast-util-to-jsx — transform hast to JSX
hast-util-to-jsx-runtime — transform to preact, react, solid, svelte, vue, etc
hast-util-to-mdast — transform to mdast (markdown)
hast-util-to-nlcst — transform to nlcst (natural language)
hast-util-to-parse5 — transform to Parse5’s AST
hast-util-to-portable-text — transform to portable text
hast-util-to-string — get the plain-text value of a node (textContent)
hast-util-to-text — get the plain-text value of a node (innerText)
hast-util-to-xast — transform to xast (xml)
hast-util-transparent — check if node is transparent content
hast-util-truncate — truncate the tree to a certain number of characters
hast-util-whitespace — check if node is inter-element whitespace

a-rel — List of link types for rel on a / area
aria-attributes — List of ARIA attributes
collapse-white-space — Replace multiple white-space characters with a single space
comma-separated-tokens — Parse/stringify comma separated tokens
html-tag-names — List of HTML tag names
html-dangerous-encodings — List of dangerous HTML character encoding labels
html-encodings — List of HTML character encoding labels
html-element-attributes — Map of HTML attributes
html-event-attributes — List of HTML event handler content attributes
html-void-elements — List of void HTML tag names
link-rel — List of link types for rel on link
mathml-tag-names — List of MathML tag names
meta-name — List of values for name on meta
property-information — Information on HTML properties
space-separated-tokens — Parse/stringify space separated tokens
svg-tag-names — List of SVG tag names
svg-element-attributes — Map of SVG attributes
svg-event-attributes — List of SVG event handler content attributes
web-namespaces — Map of web namespaces

As hast represents HTML, and improper use of HTML can open you up to a cross-site scripting (XSS) attack, improper use of hast is also unsafe.
Always be careful with user input and use hast-util-sanitize to make the hast tree safe.
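To make the risk concrete, the sketch below removes two obvious carriers of active content from a tree: script elements and properties whose names start with "on" (event handlers). It is a deliberately incomplete illustration with hypothetical names (SafeChild, stripActiveContent). It does not cover other vectors such as javascript: URLs, and it is not a substitute for hast-util-sanitize.

```typescript
// Hypothetical local types for this sketch.
interface SafeText {
  type: 'text'
  value: string
}

interface SafeElement {
  type: 'element'
  tagName: string
  properties: Record<string, unknown>
  children: SafeChild[]
}

type SafeChild = SafeText | SafeElement

// Returns a copy of `nodes` without script elements and without properties
// whose names start with "on". Incomplete by design; illustration only.
function stripActiveContent(nodes: SafeChild[]): SafeChild[] {
  const result: SafeChild[] = []
  for (const node of nodes) {
    if (node.type === 'text') {
      result.push(node)
      continue
    }
    if (node.tagName.toLowerCase() === 'script') continue

    const properties: Record<string, unknown> = {}
    for (const key of Object.keys(node.properties)) {
      if (key.slice(0, 2).toLowerCase() !== 'on') {
        properties[key] = node.properties[key]
      }
    }

    result.push({
      type: 'element',
      tagName: node.tagName,
      properties,
      children: stripActiveContent(node.children)
    })
  }
  return result
}

// Untrusted input: a script element and a link with an event handler.
const untrustedNodes: SafeChild[] = [
  {
    type: 'element',
    tagName: 'script',
    properties: {},
    children: [{type: 'text', value: 'alert(1)'}]
  },
  {
    type: 'element',
    tagName: 'a',
    properties: {href: 'https://alpha.com', onClick: 'alert(2)'},
    children: [{type: 'text', value: 'Alpha'}]
  }
]

const cleanedNodes = stripActiveContent(untrustedNodes)
```

After the call, cleanedNodes holds only the a element, with href kept and onClick removed. A removal list like this one misses anything it does not name, which is why a dedicated sanitizer is the safer choice.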
See contributing.md in syntax-tree/.github for ways to get started.
See support.md for ways to get help.
Ideas for new utilities and tools can be posted in syntax-tree/ideas.
A curated list of awesome syntax-tree, unist, mdast, hast, xast, and nlcst resources can be found in awesome syntax-tree.
This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.
The initial release of this project was authored by @wooorm.
Special thanks to @eush77 for their work, ideas, and incredibly valuable feedback!
Thanks to @andrewburgess, @arobase-che, @arystan-sw, @BarryThePenguin, @brechtcs, @ChristianMurphy, @ChristopherBiscardi, @craftzdog, @cupojoe, @davidtheclark, @derhuerst, @detj, @DxCx, @erquhart, @flurmbo, @Hamms, @Hypercubed, @inklesspen, @jeffal, @jlevy, @Justineo, @lfittl, @kgryte, @kmck, @kthjm, @KyleAMathews, @macklinu, @medfreeman, @Murderlon, @nevik, @nokome, @phiresky, @revolunet, @rhysd, @Rokt33r, @rubys, @s1n, @Sarah-Seo, @sethvincent, @simov, @s1n, @StarpTech, @stefanprobst, @stuff, @subhero24, @tripodsan, @tunnckoCore, @vhf, @voischev, and @zjaml, for contributing to hast and related projects!