
Ultra-performant HTML to Markdown converter optimized for LLMs. Generate llms.txt artifacts using the CLI, GitHub Actions, Vite plugin, and more.
Made possible by my Sponsor Program 💖 Follow me @harlan_zw 🐦 • Join Discord for help
Traditional HTML to Markdown converters were not built for LLMs or humans. They tend to be slow and bloated, and they produce output that is poorly suited to LLM token budgets and to human readability.
Other LLM-specific converters focus on supporting all document formats, resulting in larger bundles and lower-quality Markdown output.
Mdream produces high-quality Markdown for LLMs efficiently with no core dependencies. It includes a plugin system to customize the conversion process, allowing you to parse, extract, transform, and filter as needed.
pnpm add mdream
Mdream provides a CLI designed to work exclusively with Unix pipes, so it can be combined freely with other tools.
Pipe Site to Markdown
Fetches the Wikipedia page on Markdown and converts it to Markdown, preserving the original links and images.
curl -s https://en.wikipedia.org/wiki/Markdown \
| npx mdream --origin https://en.wikipedia.org --preset minimal \
| tee streaming.md
Tip: The --origin flag fixes relative image and link paths by resolving them against the given base URL.
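To make the effect of a base URL concrete, here is a minimal sketch of relative-path resolution using the standard WHATWG `URL` constructor. The helper name `resolveAgainstOrigin` is hypothetical and not part of mdream, and this shows only the general technique, not necessarily how mdream implements `--origin` internally.

```typescript
// Hypothetical helper (not part of mdream): resolves an href against a base
// URL the same way a browser would, using the standard URL constructor.
function resolveAgainstOrigin(href: string, origin: string): string {
  try {
    return new URL(href, origin).href
  }
  catch {
    // Leave values that cannot be parsed as URLs untouched.
    return href
  }
}

const origin = 'https://en.wikipedia.org'
const resolved: string[] = [
  '/wiki/HTML', // root-relative path
  '//upload.wikimedia.org/logo.png', // protocol-relative URL
  'https://example.com/page', // already absolute, returned unchanged
  '#History', // fragment-only link
].map(href => resolveAgainstOrigin(href, origin))

console.log(resolved)
```

Without a base URL, root-relative paths such as `/wiki/HTML` would point nowhere once the Markdown is read outside the original site.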
Local File to Markdown
Converts a local HTML file to a Markdown file, using tee to write the output to a file and display it in the terminal.
cat index.html \
| npx mdream --preset minimal \
| tee streaming.md
- --origin <url>: Base URL for resolving relative links and images
- --preset <preset>: Conversion presets: minimal
- --help: Display help information
- --version: Display version information

Mdream provides two main functions for working with HTML:
- htmlToMarkdown: Useful if you already have the entire HTML payload you want to convert.
- streamHtmlToMarkdown: Best practice if you are fetching or reading from a local file.

For browser environments, you can use mdream directly via CDN without any build step:
<!DOCTYPE html>
<html>
<head>
<script src="https://unpkg.com/mdream/dist/iife.js"></script>
</head>
<body>
<script>
// Convert HTML to Markdown in the browser
const html = '<h1>Hello World</h1><p>This is a paragraph.</p>'
const markdown = window.mdream.htmlToMarkdown(html)
console.log(markdown) // # Hello World\n\nThis is a paragraph.
</script>
</body>
</html>
CDN Options:
- https://unpkg.com/mdream/dist/iife.js
- https://cdn.jsdelivr.net/npm/mdream/dist/iife.js

The browser build includes the core htmlToMarkdown function and is optimized for size (44kB uncompressed, 10.3kB gzipped).
Convert existing HTML
import { htmlToMarkdown } from 'mdream'
// Simple conversion
const markdown = htmlToMarkdown('<h1>Hello World</h1>')
console.log(markdown) // # Hello World
Convert from Fetch
import { streamHtmlToMarkdown } from 'mdream'
// Using fetch with streaming
const response = await fetch('https://example.com')
const htmlStream = response.body
const markdownGenerator = streamHtmlToMarkdown(htmlStream, {
origin: 'https://example.com'
})
// Process chunks as they arrive
for await (const chunk of markdownGenerator) {
console.log(chunk)
}
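If the full document is needed once the stream ends, the chunks can be concatenated in arrival order. The sketch below is self-contained: `fakeMarkdownStream` is a hypothetical stand-in for the generator returned by streamHtmlToMarkdown, and `collectMarkdown` is an illustrative helper, not an mdream API. It assumes each chunk is a string.

```typescript
// Hypothetical stand-in for the generator returned by streamHtmlToMarkdown().
// Any AsyncIterable<string> can be passed to collectMarkdown() below.
async function* fakeMarkdownStream(): AsyncGenerator<string> {
  yield '# Hello World\n\n'
  yield 'This is a paragraph.'
}

// Illustrative helper (not an mdream API): concatenates chunks in arrival order.
async function collectMarkdown(chunks: AsyncIterable<string>): Promise<string> {
  let markdown = ''
  for await (const chunk of chunks) {
    markdown += chunk
  }
  return markdown
}

const collected: Promise<string> = collectMarkdown(fakeMarkdownStream())
collected.then(markdown => console.log(markdown))
```

Buffering like this gives up the memory benefit of streaming, so it is best kept for cases where the whole document really is needed at once.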
Pure HTML Parser
If you only need to parse HTML into a DOM-like AST without converting to Markdown, use parseHtml:
import { parseHtml } from 'mdream'
const html = '<div><h1>Title</h1><p>Content</p></div>'
const { events, remainingHtml } = parseHtml(html)
// Process the parsed events
events.forEach((event) => {
if (event.type === 'enter' && event.node.type === 'element') {
console.log('Entering element:', event.node.tagName)
}
})
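Because events arrive as enter/exit pairs, nesting depth can be tracked with a simple counter. The following sketch builds an indented outline from a hand-written event list. The `SketchNode` and `SketchEvent` types are local stand-ins modelled only on the fields used in the example above, so the real mdream types may differ, and `outline` is a hypothetical helper.

```typescript
// Local stand-in types modelled on the fields used in the example above.
// These are assumptions for illustration; the real mdream types may differ.
interface SketchNode { type: 'element' | 'text', tagName?: string }
interface SketchEvent { type: 'enter' | 'exit', node: SketchNode }

// Hypothetical helper: one indented line per entered element.
function outline(events: SketchEvent[]): string[] {
  const lines: string[] = []
  let depth = 0
  for (const event of events) {
    if (event.node.type !== 'element')
      continue // text nodes do not affect nesting depth in this sketch
    if (event.type === 'enter') {
      lines.push('  '.repeat(depth) + (event.node.tagName ?? ''))
      depth++
    }
    else {
      depth--
    }
  }
  return lines
}

// Hand-written events for '<div><h1>Title</h1><p>Content</p></div>'.
const el = (tagName: string): SketchNode => ({ type: 'element', tagName })
const sampleEvents: SketchEvent[] = [
  { type: 'enter', node: el('div') },
  { type: 'enter', node: el('h1') },
  { type: 'enter', node: { type: 'text' } },
  { type: 'exit', node: el('h1') },
  { type: 'enter', node: el('p') },
  { type: 'enter', node: { type: 'text' } },
  { type: 'exit', node: el('p') },
  { type: 'exit', node: el('div') },
]

const outlineLines = outline(sampleEvents)
console.log(outlineLines.join('\n'))
```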
The parseHtml function provides:
Presets are pre-configured combinations of plugins for common use cases.
The minimal preset optimizes for token reduction and cleaner output by removing non-essential content:
import { withMinimalPreset } from 'mdream/preset/minimal'
const options = withMinimalPreset({
origin: 'https://example.com'
})
Plugins included:
- isolateMainPlugin() - Extracts main content area
- frontmatterPlugin() - Generates YAML frontmatter from meta tags
- tailwindPlugin() - Converts Tailwind classes to Markdown
- filterPlugin() - Excludes forms, navigation, buttons, footers, and other non-content elements

CLI Usage:
curl -s https://example.com | npx mdream --preset minimal --origin https://example.com
The plugin system allows you to customize HTML to Markdown conversion by hooking into the processing pipeline. Plugins can filter content, extract data, transform nodes, or add custom behavior.
Mdream includes several built-in plugins that can be used individually or combined:
- extractionPlugin: Extract specific elements using CSS selectors for data analysis
- filterPlugin: Include or exclude elements based on CSS selectors or tag IDs
- frontmatterPlugin: Generate YAML frontmatter from HTML head elements (title, meta tags)
- isolateMainPlugin: Isolate main content using <main> elements or header-to-footer boundaries
- tailwindPlugin: Convert Tailwind CSS classes to Markdown formatting (bold, italic, etc.)
- readabilityPlugin: Content scoring and extraction (experimental)

import { filterPlugin, frontmatterPlugin, isolateMainPlugin } from 'mdream/plugins'
const markdown = htmlToMarkdown(html, {
plugins: [
isolateMainPlugin(),
frontmatterPlugin(),
filterPlugin({ exclude: ['nav', '.sidebar', '#footer'] })
]
})
- beforeNodeProcess: Called before any node processing, can skip nodes
- onNodeEnter: Called when entering an element node
- onNodeExit: Called when exiting an element node
- processTextNode: Called for each text node
- processAttributes: Called to process element attributes

Use createPlugin() to create a plugin with type safety:
import type { ElementNode, TextNode } from 'mdream'
import { htmlToMarkdown } from 'mdream'
import { createPlugin } from 'mdream/plugins'
const myPlugin = createPlugin({
onNodeEnter(node: ElementNode) {
if (node.name === 'h1') {
return '🔥 '
}
},
processTextNode(textNode: TextNode) {
// Transform text content
if (textNode.parent?.attributes?.id === 'highlight') {
return {
content: `**${textNode.value}**`,
skip: false
}
}
}
})
// Use the plugin
const html: string = '<div id="highlight">Important text</div>'
const markdown: string = htmlToMarkdown(html, { plugins: [myPlugin] })
import type { ElementNode, NodeEvent } from 'mdream'
import { ELEMENT_NODE } from 'mdream'
import { createPlugin } from 'mdream/plugins'
const adBlockPlugin = createPlugin({
beforeNodeProcess(event: NodeEvent) {
const { node } = event
if (node.type === ELEMENT_NODE && node.name === 'div') {
const element = node as ElementNode
// Skip ads and promotional content
if (element.attributes?.class?.includes('ad')
|| element.attributes?.id?.includes('promo')) {
return { skip: true }
}
}
}
})
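One caveat with substring checks such as `includes('ad')`: they also match unrelated class names like `page-header` or `shadow`. A stricter variant compares whole class names instead. The sketch below is self-contained, and `hasClassToken` is a hypothetical helper, not part of mdream.

```typescript
// Hypothetical helper (not part of mdream): true only when the class attribute
// contains `token` as a whole, whitespace-separated class name.
function hasClassToken(classAttr: string | undefined, token: string): boolean {
  if (!classAttr)
    return false
  return classAttr.split(/\s+/).some(name => name === token)
}

// Substring matching flags this element, because "header" and "shadow" contain "ad".
const substringMatch: boolean = 'page-header shadow'.includes('ad')

// Token matching only flags elements that actually carry the "ad" class.
const tokenMatchUnrelated: boolean = hasClassToken('page-header shadow', 'ad')
const tokenMatchAd: boolean = hasClassToken('ad banner', 'ad')

console.log({ substringMatch, tokenMatchUnrelated, tokenMatchAd })
```

Inside a beforeNodeProcess hook, a check like this could replace the `includes` call when false positives on layout classes are a concern.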
Extract specific elements and their content during HTML processing for data analysis or content discovery:
import { extractionPlugin, htmlToMarkdown } from 'mdream'
const html: string = `
<article>
<h2>Getting Started</h2>
<p>This is a tutorial about web scraping.</p>
<img src="/hero.jpg" alt="Hero image" />
</article>
`
// Extract elements using CSS selectors
const plugin = extractionPlugin({
'h2': (element: ExtractedElement, state: MdreamRuntimeState) => {
console.log('Heading:', element.textContent) // "Getting Started"
console.log('Depth:', state.depth) // Current nesting depth
},
'img[alt]': (element: ExtractedElement, state: MdreamRuntimeState) => {
console.log('Image:', element.attributes.src, element.attributes.alt)
// "Image: /hero.jpg Hero image"
console.log('Context:', state.options) // Access to conversion options
}
})
htmlToMarkdown(html, { plugins: [plugin] })
The extraction plugin provides memory-efficient element extraction with full text content and attributes, perfect for SEO analysis, content discovery, and data mining.
Licensed under the MIT license.