Big News: Socket raises $60M Series C at a $1B valuation to secure software supply chains for AI-driven development.Announcement →

@mdream/js

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

@mdream/js

JavaScript HTML-to-Markdown engine for mdream. Escape hatch for hooks and edge runtimes.

Source

npm

Version: 1.2.1

Version published: 2 weeks ago

Maintainers: 1

Created: 3 months ago

Source

@mdream/js

Installation

# pnpm
pnpm add @mdream/js@beta

# npm
npm install @mdream/js@beta

# yarn
yarn add @mdream/js@beta

Entry Points

Import	Description
`@mdream/js`	Core `htmlToMarkdown` and `streamHtmlToMarkdown` APIs
`@mdream/js/plugins`	Plugin utilities: `createPlugin`, `extractionPlugin`, `extractionCollectorPlugin`, `filterPlugin`, `frontmatterPlugin`, `isolateMainPlugin`, `tailwindPlugin`
`@mdream/js/preset/minimal`	`withMinimalPreset` for declarative config combining frontmatter, isolateMain, tailwind, and filter plugins
`@mdream/js/negotiate`	HTTP content negotiation: `shouldServeMarkdown`, `parseAcceptHeader`
`@mdream/js/parse`	Low-level HTML parser: `parseHtml`, `parseHtmlStream`
`@mdream/js/splitter`	Single-pass markdown splitter: `htmlToMarkdownSplitChunks`, `htmlToMarkdownSplitChunksStream`
`@mdream/js/llms-txt`	llms.txt artifact generation: `generateLlmsTxtArtifacts`, `createLlmsTxtStream`

API Reference

`htmlToMarkdown(html, options?)`

Converts an HTML string to Markdown synchronously.

import { htmlToMarkdown } from '@mdream/js'

const md = htmlToMarkdown('<h1>Hello</h1><p>World</p>')
// # Hello\n\nWorld

Parameters:

Parameter	Type	Description
`html`	`string`	The HTML string to convert
`options`	`Partial<MdreamOptions>`	Optional configuration (see MdreamOptions)

Returns: string

`streamHtmlToMarkdown(htmlStream, options?)`

Converts an HTML stream to Markdown incrementally. Useful for large documents or streaming HTTP responses.

import { streamHtmlToMarkdown } from '@mdream/js'

const stream = streamHtmlToMarkdown(response.body, {
  origin: 'https://example.com',
})

for await (const chunk of stream) {
  process.stdout.write(chunk)
}

Parameters:

Parameter	Type	Description
`htmlStream`	`ReadableStream<Uint8Array \| string> \| null`	A web `ReadableStream` of HTML content
`options`	`Partial<MdreamOptions>`	Optional configuration (see MdreamOptions)

Returns: AsyncIterable<string>

Options

`MdreamOptions`

Option	Type	Default	Description
`origin`	`string`	`undefined`	Origin URL for resolving relative image paths and internal links
`plugins`	`BuiltinPlugins`	`undefined`	Declarative built-in plugin configuration (see BuiltinPlugins)
`clean`	`boolean \| CleanOptions`	`undefined`	Post-processing cleanup. Pass `true` for all cleanup rules or an object for specific features (see CleanOptions). Sync API only for `fragments`.
`hooks`	`TransformPlugin[]`	`undefined`	Imperative hook-based transform plugins for custom behavior (see Plugins)

`BuiltinPlugins`

Declarative configuration for built-in plugins. Works with both the JavaScript and Rust engines.

Option	Type	Default	Description
`frontmatter`	`boolean \| ((fm: Record<string, string>) => void) \| FrontmatterConfig`	`undefined`	Extract metadata from HTML `<head>` into YAML frontmatter. Pass `true` for defaults, a callback to receive structured data, or a config object.
`isolateMain`	`boolean`	`undefined`	Isolate main content area. Prioritizes `<main>` elements, then falls back to header-to-footer heuristic.
`tailwind`	`boolean`	`undefined`	Convert Tailwind utility classes (bold, italic, hidden, etc.) to semantic Markdown formatting.
`filter`	`{ include?, exclude?, processChildren? }`	`undefined`	Filter elements by CSS selectors, tag names, or TAG_* constants (see Filter Plugin).
`extraction`	`Record<string, (element: ExtractedElement) => void>`	`undefined`	Extract elements matching CSS selectors during conversion. Each key is a CSS selector; the handler is called for every match.
`tagOverrides`	`Record<string, TagOverride \| string>`	`undefined`	Declarative tag overrides. String values act as aliases (e.g., `{ "x-heading": "h2" }`). Object values override specific handler properties.

`FrontmatterConfig`

Option	Type	Default	Description
`additionalFields`	`Record<string, string>`	`undefined`	Extra key-value pairs to inject into the frontmatter
`metaFields`	`string[]`	`['description', 'keywords', 'author', 'date', 'og:title', 'og:description', 'twitter:title', 'twitter:description']`	Meta tag names to extract beyond the standard set
`onExtract`	`(frontmatter: Record<string, string>) => void`	`undefined`	Callback to receive structured frontmatter data after conversion

`TagOverride`

Option	Type	Description
`enter`	`string`	Markdown string to emit when entering the tag
`exit`	`string`	Markdown string to emit when exiting the tag
`spacing`	`[number, number]`	Newlines to add `[before, after]` the tag
`isInline`	`boolean`	Whether the tag is inline (no block-level spacing)
`isSelfClosing`	`boolean`	Whether the tag is self-closing
`collapsesInnerWhiteSpace`	`boolean`	Whether inner whitespace should be collapsed

`CleanOptions`

Post-processing cleanup options. Pass true to clean to enable all of these.

Option	Type	Default	Description
`urls`	`boolean`	`false`	Strip tracking query parameters (`utm_*`, `fbclid`, `gclid`, etc.) from URLs
`fragments`	`boolean`	`false`	Strip fragment-only links that do not match any heading slug in the output
`emptyLinks`	`boolean`	`false`	Strip links with meaningless hrefs (`#`, `javascript:void(0)`, `data:`, `vbscript:`) and replace with plain text
`blankLines`	`boolean`	`false`	Collapse 3+ consecutive blank lines to 2
`redundantLinks`	`boolean`	`false`	Strip links where text equals URL: `[https://x.com](https://x.com)` becomes `https://x.com`
`selfLinkHeadings`	`boolean`	`false`	Strip self-referencing heading anchors: `## [Title](#title)` becomes `## Title`
`emptyImages`	`boolean`	`false`	Strip images with no alt text (decorative images, tracking pixels)
`emptyLinkText`	`boolean`	`false`	Drop links that produce no visible text: `[](url)` is removed entirely

When clean: true is passed, all options except urls and blankLines are enabled.

Plugins

`createPlugin(plugin)`

Factory function for creating type-safe transform plugins. All hooks are optional.

import { createPlugin } from '@mdream/js/plugins'

const myPlugin = createPlugin({
  beforeNodeProcess(event, state) {
    // Return { skip: true } to skip this node entirely
  },
  onNodeEnter(element, state) {
    // Return a string to inject markdown at this position
  },
  onNodeExit(element, state) {
    // Return a string to inject markdown at this position
  },
  processAttributes(element, state) {
    // Inspect or modify element attributes
  },
  processTextNode(textNode, state) {
    // Return { content: string, skip: boolean } to transform text
    // Return undefined for no transformation
  },
})

Plugin Hook Reference

Hook	Parameters	Return Type	Description
`beforeNodeProcess`	`(event: NodeEvent, state)`	`{ skip: boolean } \| void`	Called before any node processing. Return `{ skip: true }` to skip the node.
`onNodeEnter`	`(element: ElementNode, state)`	`string \| void`	Called when entering an element. Return a string to inject markdown.
`onNodeExit`	`(element: ElementNode, state)`	`string \| void`	Called when exiting an element. Return a string to inject markdown.
`processAttributes`	`(element: ElementNode, state)`	`void`	Called to inspect or modify element attributes.
`processTextNode`	`(textNode: TextNode, state)`	`{ content: string, skip: boolean } \| void`	Called for each text node. Return an object to transform text or skip it.

`filterPlugin(options)`

Filters elements by CSS selectors, tag names, or TAG_* constants.

import { TAG_FOOTER, TAG_NAV } from '@mdream/js'
import { filterPlugin } from '@mdream/js/plugins'

// Exclude navigation and footer
const plugin = filterPlugin({
  exclude: [TAG_NAV, TAG_FOOTER, '.sidebar', '#ads'],
})

// Include only specific elements
const plugin2 = filterPlugin({
  include: ['article', 'main'],
  processChildren: true, // default: true
})

Option	Type	Default	Description
`include`	`(string \| number)[]`	`[]`	CSS selectors, tag names, or TAG_* constants for elements to include (all others excluded)
`exclude`	`(string \| number)[]`	`[]`	CSS selectors, tag names, or TAG_* constants for elements to exclude
`processChildren`	`boolean`	`true`	Whether to also process children of matching elements

`frontmatterPlugin(options?)`

Extracts metadata from HTML <head> into YAML frontmatter. Collects <title> and <meta> tags.

import { frontmatterPlugin } from '@mdream/js/plugins'

const plugin = frontmatterPlugin({
  additionalFields: { source: 'crawler' },
  metaFields: ['robots', 'viewport'],
})

Default meta fields extracted: description, keywords, author, date, og:title, og:description, twitter:title, twitter:description.

`isolateMainPlugin()`

Isolates the main content area of a page using a priority-based strategy:

If an explicit <main> element exists (within 5 depth levels), use its content exclusively.
Otherwise, find content between the first heading (h1-h6) that is not inside a <header> tag and the first <footer>.
The <head> section is always passed through for other plugins (e.g., frontmatter extraction).

import { isolateMainPlugin } from '@mdream/js/plugins'

const plugin = isolateMainPlugin()

`tailwindPlugin()`

Converts Tailwind utility classes to Markdown formatting. Supports mobile-first responsive breakpoints (sm:, md:, lg:, xl:, 2xl:).

Tailwind Class	Markdown Output
`font-bold`, `font-semibold`, `font-black`, `font-extrabold`, `font-medium`	`text`
`italic`, `font-italic`	`text`
`line-through`	`~~text~~`
`hidden`, `invisible`	Element skipped
`absolute`, `fixed`, `sticky`	Element skipped

import { tailwindPlugin } from '@mdream/js/plugins'

const plugin = tailwindPlugin()

`extractionPlugin(selectors)` (Deprecated)

Deprecated. Use plugins.extraction config for declarative extraction that works with both JS and Rust engines.

Extracts elements matching CSS selectors during conversion. Callbacks receive matching elements with their accumulated text content.

import { extractionPlugin } from '@mdream/js/plugins'

const plugin = extractionPlugin({
  'h2': (element, state) => {
    console.log('Heading:', element.textContent)
  },
  'img[alt]': (element, state) => {
    console.log('Image:', element.attributes.src)
  },
})

`extractionCollectorPlugin(selectors)`

Internal extraction collector for the plugins.extraction config. Collects results during processing and calls callbacks post-conversion to match Rust engine behavior.

Presets

`withMinimalPreset(options?)`

Returns a declarative config combining frontmatter, isolateMain, tailwind, and filter plugins. Also enables clean: true by default. You can override any option.

import { htmlToMarkdown } from '@mdream/js'
import { withMinimalPreset } from '@mdream/js/preset/minimal'

const md = htmlToMarkdown(html, withMinimalPreset({
  origin: 'https://example.com',
}))

The minimal preset excludes these elements by default: <form>, <fieldset>, <object>, <embed>, <footer>, <aside>, <iframe>, <input>, <textarea>, <select>, <button>, <nav>.

You can override or extend the plugin config:

const md = htmlToMarkdown(html, withMinimalPreset({
  origin: 'https://example.com',
  clean: { urls: true, fragments: true },
  plugins: {
    frontmatter: {
      additionalFields: { source: 'my-crawler' },
    },
  },
}))

Content Negotiation

`shouldServeMarkdown(acceptHeader?, secFetchDest?)`

Determines if a client prefers Markdown over HTML using HTTP content negotiation.

Returns true when text/markdown or text/plain has higher quality than text/html in the Accept header.
If qualities are equal, earlier position wins.
Bare wildcards (*/*) do not trigger Markdown (prevents breaking OG crawlers).
sec-fetch-dest: document always returns false (browser navigation).

import { shouldServeMarkdown } from '@mdream/js/negotiate'

if (shouldServeMarkdown(request.headers.accept, request.headers['sec-fetch-dest'])) {
  return new Response(markdown, {
    headers: { 'content-type': 'text/markdown' },
  })
}

Parameters:

Parameter	Type	Description
`acceptHeader`	`string \| undefined`	The HTTP `Accept` header value
`secFetchDest`	`string \| undefined`	The `Sec-Fetch-Dest` header value

Returns: boolean

`parseAcceptHeader(accept)`

Parses an HTTP Accept header into an ordered list of media types with quality values.

import { parseAcceptHeader } from '@mdream/js/negotiate'

const entries = parseAcceptHeader('text/markdown;q=0.9, text/html')
// [{ type: 'text/markdown', q: 0.9, position: 0 }, { type: 'text/html', q: 1, position: 1 }]

Returns: Array<{ type: string, q: number, position: number }>

Markdown Splitter

`htmlToMarkdownSplitChunks(html, options?)`

Converts HTML to Markdown and returns an array of chunks. Compatible with LangChain's Document structure.

import { htmlToMarkdownSplitChunks } from '@mdream/js/splitter'

const chunks = htmlToMarkdownSplitChunks(html, {
  chunkSize: 1000,
  chunkOverlap: 200,
  origin: 'https://example.com',
})

for (const chunk of chunks) {
  console.log(chunk.content)
  console.log(chunk.metadata.headers) // e.g., { h2: 'Section Title' }
  console.log(chunk.metadata.code) // e.g., 'typescript'
  console.log(chunk.metadata.loc) // e.g., { lines: { from: 1, to: 10 } }
}

Returns: MarkdownChunk[]

`htmlToMarkdownSplitChunksStream(html, options?)`

Generator version that yields chunks during processing for better memory efficiency.

import { htmlToMarkdownSplitChunksStream } from '@mdream/js/splitter'

for (const chunk of htmlToMarkdownSplitChunksStream(html, options)) {
  process.stdout.write(chunk.content)
}

Returns: Generator<MarkdownChunk, void, undefined>

`SplitterOptions`

Extends EngineOptions with chunking-specific settings.

Option	Type	Default	Description
`headersToSplitOn`	`number[]`	`[TAG_H2, TAG_H3, TAG_H4, TAG_H5, TAG_H6]`	TAG_* header constants to split on
`returnEachLine`	`boolean`	`false`	Return each non-empty line as an individual chunk
`stripHeaders`	`boolean`	`true`	Strip header lines from chunk content
`chunkSize`	`number`	`1000`	Maximum chunk size (measured by `lengthFunction`)
`chunkOverlap`	`number`	`200`	Overlap between chunks for context preservation. Must be less than `chunkSize`.
`lengthFunction`	`(text: string) => number`	`(text) => text.length`	Function to measure chunk length. Replace with a token counter for LLM applications.
`keepSeparator`	`boolean`	`false`	Keep separators in the split chunks

`MarkdownChunk`

Field	Type	Description
`content`	`string`	The markdown content of the chunk
`metadata.headers`	`Record<string, string> \| undefined`	Header hierarchy at this chunk position (e.g., `{ h2: 'API', h3: 'Methods' }`)
`metadata.code`	`string \| undefined`	Code block language if chunk contains code
`metadata.loc`	`{ lines: { from: number, to: number } } \| undefined`	Line number range in original document

llms.txt Generation

`generateLlmsTxtArtifacts(options)`

Generates llms.txt content, optionally including llms-full.txt and individual markdown files.

import { generateLlmsTxtArtifacts } from '@mdream/js/llms-txt'

const result = await generateLlmsTxtArtifacts({
  files: processedPages,
  siteName: 'My Site',
  description: 'A description of my site',
  origin: 'https://example.com',
  generateFull: true,
  generateMarkdown: true,
  sections: [
    {
      title: 'Documentation',
      description: 'API reference and guides',
      links: [
        { title: 'Getting Started', href: '/docs/start', description: 'Quick start guide' },
      ],
    },
  ],
  notes: 'Generated by mdream',
})

// result.llmsTxt         -- index file with links to pages
// result.llmsFullTxt     -- single file with all page content (if generateFull: true)
// result.markdownFiles   -- array of { path, content } (if generateMarkdown: true)
// result.processedFiles  -- the input files passed through

`LlmsTxtArtifactsOptions`

Option	Type	Default	Description
`files`	`ProcessedFile[]`	(required)	Array of processed page files
`siteName`	`string`	`'Site'`	Site name for the header
`description`	`string`	`undefined`	Site description (rendered as blockquote)
`origin`	`string`	`''`	Origin URL to prepend to relative URLs
`generateFull`	`boolean`	`false`	Generate llms-full.txt with complete page content
`generateMarkdown`	`boolean`	`false`	Generate individual markdown files under `md/` directory
`outputDir`	`string`	`undefined`	Output directory (used to compute relative file paths)
`sections`	`LlmsTxtSection[]`	`undefined`	Custom sections to write before the Pages section
`notes`	`string \| string[]`	`undefined`	Notes to append at the end

`ProcessedFile`

Field	Type	Description
`title`	`string`	Page title
`content`	`string`	Markdown content of the page
`url`	`string`	URL path of the page
`filePath`	`string \| undefined`	Optional file path on disk
`metadata`	`{ title?, description?, keywords?, author? }`	Optional metadata extracted from the page

`LlmsTxtSection`

Field	Type	Description
`title`	`string`	Section heading
`description`	`string \| string[]`	Section description (can be multiple paragraphs)
`links`	`LlmsTxtLink[]`	Links in the section

`LlmsTxtLink`

Field	Type	Description
`title`	`string`	Link title
`href`	`string`	Link URL
`description`	`string \| undefined`	Optional description shown after the link

`createLlmsTxtStream(options)`

Creates a WritableStream<ProcessedFile> that generates llms.txt artifacts incrementally. Writes files to disk as pages are streamed in, never keeping full content in memory. Pages in llms.txt are sorted by URL path hierarchy on close.

import { createLlmsTxtStream } from '@mdream/js/llms-txt'

const stream = createLlmsTxtStream({
  outputDir: './output',
  siteName: 'My Site',
  origin: 'https://example.com',
  generateFull: true,
})

const writer = stream.getWriter()
await writer.write({ title: 'Home', content: '# Home\nWelcome', url: '/' })
await writer.write({ title: 'About', content: '# About\nInfo', url: '/about' })
await writer.close()
// Writes llms.txt and llms-full.txt to ./output/

Returns: WritableStream<ProcessedFile>

Low-Level Parser

`parseHtml(html, options?)`

Parses HTML into a list of DOM events. Returns the events and any remaining unparsed HTML.

import { parseHtml } from '@mdream/js/parse'

const { events, remainingHtml } = parseHtml('<p>Hello</p>')

Returns: { events: NodeEvent[], remainingHtml: string }

`parseHtmlStream(html, state, onEvent)`

Streaming parser that calls onEvent for each DOM event. Returns any remaining unparsed HTML (useful for processing partial chunks).

import { parseHtmlStream } from '@mdream/js/parse'

const state = { depthMap: new Uint8Array(1024), depth: 0 }
const remaining = parseHtmlStream(htmlChunk, state, (event) => {
  // Process each event
})

Returns: string (remaining unparsed HTML)

CLI

Reads HTML from stdin and outputs Markdown to stdout.

# Basic conversion
curl -s https://example.com | npx @mdream/js

# With origin URL for resolving relative paths
curl -s https://example.com | npx @mdream/js --origin https://example.com

# With minimal preset
curl -s https://example.com | npx @mdream/js --origin https://example.com --preset minimal

CLI Options

Flag	Description
`--origin <url>`	Origin URL for resolving relative image paths and links
`--preset <preset>`	Conversion preset. Currently supports: `minimal`
`-v, --version`	Show version number
`-h, --help`	Show help

Usage Examples

Custom Tag Overrides

Map custom HTML elements to standard Markdown behavior:

import { htmlToMarkdown } from '@mdream/js'

const md = htmlToMarkdown('<x-heading>Title</x-heading>', {
  plugins: {
    tagOverrides: {
      // String alias: make <x-heading> behave like <h2>
      'x-heading': 'h2',

      // Object override: custom enter/exit strings
      'callout': {
        enter: '> **Note:** ',
        exit: '\n',
        spacing: [2, 2],
      },
    },
  },
})

Declarative Extraction

Extract data from elements during conversion:

import { htmlToMarkdown } from '@mdream/js'

const images: { src: string, alt: string }[] = []

const md = htmlToMarkdown(html, {
  plugins: {
    extraction: {
      'img[alt]': (element) => {
        images.push({
          src: element.attributes.src,
          alt: element.attributes.alt,
        })
      },
    },
  },
})

console.log('Found images:', images)

Frontmatter with Callback

Receive structured frontmatter data for further processing:

import { htmlToMarkdown } from '@mdream/js'

let metadata: Record<string, string> = {}

const md = htmlToMarkdown(html, {
  plugins: {
    frontmatter: {
      additionalFields: { source: 'crawler' },
      metaFields: ['robots'],
      onExtract: (fm) => {
        metadata = fm
      },
    },
  },
})

console.log('Title:', metadata.title)
console.log('Description:', metadata.description)

Custom Hooks for Content Filtering

import { htmlToMarkdown } from '@mdream/js'
import { createPlugin } from '@mdream/js/plugins'

const md = htmlToMarkdown(html, {
  hooks: [
    createPlugin({
      beforeNodeProcess(event) {
        const { node } = event
        if (node.type === 1 /* ELEMENT_NODE */) {
          const el = node as any
          // Skip ads and promotional content
          if (el.attributes?.class?.includes('ad')
            || el.attributes?.id?.includes('promo')) {
            return { skip: true }
          }
        }
      },
      processTextNode(textNode) {
        // Uppercase all TODO markers
        if (textNode.value.includes('TODO')) {
          return { content: textNode.value.toUpperCase(), skip: false }
        }
      },
    }),
  ],
})

Token-Based Chunking for LLMs

import { htmlToMarkdownSplitChunks } from '@mdream/js/splitter'
import { encoding_for_model } from 'tiktoken'

const enc = encoding_for_model('gpt-4')

const chunks = htmlToMarkdownSplitChunks(html, {
  chunkSize: 512,
  chunkOverlap: 64,
  lengthFunction: text => enc.encode(text).length,
  origin: 'https://example.com',
})

Exported Types

import type {
  BuiltinPlugins,
  CleanOptions,
  ElementNode,
  EngineOptions,
  ExtractedElement,
  FrontmatterConfig,
  MarkdownChunk,
  MdreamOptions,
  Node,
  NodeEvent,
  PluginContext,
  SplitterOptions,
  TagOverride,
  TextNode,
  TransformPlugin,
} from '@mdream/js'

Exported Constants

import {
  ELEMENT_NODE, // Node type for HTML elements (1)
  NodeEventEnter, // Event type for entering a node (0)
  NodeEventExit, // Event type for exiting a node (1)
  TAG_H1, // Tag ID constant for <h1>
  TAG_H2, // Tag ID constant for <h2>
  TAG_H3, // Tag ID constant for <h3>
  TAG_H4, // Tag ID constant for <h4>
  TAG_H5, // Tag ID constant for <h5>
  TAG_H6, // Tag ID constant for <h6>
  TEXT_NODE, // Node type for text content (3)
} from '@mdream/js'

FAQs

What is @mdream/js?

Is @mdream/js well maintained?

Package last updated on 20 May 2026

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

@mdream/js

@mdream/js

Installation

Entry Points

API Reference

htmlToMarkdown(html, options?)

streamHtmlToMarkdown(htmlStream, options?)

Options

MdreamOptions

BuiltinPlugins

FrontmatterConfig

TagOverride

CleanOptions

Plugins

createPlugin(plugin)

Plugin Hook Reference

filterPlugin(options)

frontmatterPlugin(options?)

isolateMainPlugin()

tailwindPlugin()

extractionPlugin(selectors) (Deprecated)

extractionCollectorPlugin(selectors)

Presets

withMinimalPreset(options?)

Content Negotiation

shouldServeMarkdown(acceptHeader?, secFetchDest?)

parseAcceptHeader(accept)

Markdown Splitter

htmlToMarkdownSplitChunks(html, options?)

htmlToMarkdownSplitChunksStream(html, options?)

SplitterOptions

MarkdownChunk

llms.txt Generation

generateLlmsTxtArtifacts(options)

LlmsTxtArtifactsOptions

ProcessedFile

LlmsTxtSection

LlmsTxtLink

createLlmsTxtStream(options)

Low-Level Parser

parseHtml(html, options?)

parseHtmlStream(html, state, onEvent)

CLI

CLI Options

Usage Examples

Custom Tag Overrides

Declarative Extraction

Frontmatter with Callback

Custom Hooks for Content Filtering

Token-Based Chunking for LLMs

Exported Types

Exported Constants

Related posts

pnpm 11.5 Adds Support for Recognizing npm Staged Publishes

Federal Audit Finds NIST Wasted Funds With No Plan to Clear NVD Backlog

`htmlToMarkdown(html, options?)`

`streamHtmlToMarkdown(htmlStream, options?)`

`MdreamOptions`

`BuiltinPlugins`

`FrontmatterConfig`

`TagOverride`

`CleanOptions`

`createPlugin(plugin)`

`filterPlugin(options)`

`frontmatterPlugin(options?)`

`isolateMainPlugin()`

`tailwindPlugin()`

`extractionPlugin(selectors)` (Deprecated)

`extractionCollectorPlugin(selectors)`

`withMinimalPreset(options?)`

`shouldServeMarkdown(acceptHeader?, secFetchDest?)`

`parseAcceptHeader(accept)`

`htmlToMarkdownSplitChunks(html, options?)`

`htmlToMarkdownSplitChunksStream(html, options?)`

`SplitterOptions`

`MarkdownChunk`

`generateLlmsTxtArtifacts(options)`

`LlmsTxtArtifactsOptions`

`ProcessedFile`

`LlmsTxtSection`

`LlmsTxtLink`

`createLlmsTxtStream(options)`

`parseHtml(html, options?)`

`parseHtmlStream(html, state, onEvent)`