
markdown-to-markdown-sanitizer
A robust markdown sanitizer that produces unambiguous and sanitized markdown output.
A robust markdown sanitizer focused on avoiding unexpected image and link URLs in markdown.
Note: This is brand new software and comes without security guarantees. Do your own testing for your own use case.
The sanitizer consumes markdown and produces markdown output. Generally speaking, this is less secure than sanitizing the final rendered output, such as the generated HTML. Hence, this package should only be used when the markdown is rendered by a third party such as GitHub or GitLab.
The primary use-case for this package is to sanitize AI-generated markdown which may have been subject to prompt-injection with the goal of exfiltrating data.
Note: The output of the sanitizer is designed to be unambiguous in terms of markdown parsing. This comes at the trade-off of reduced human readability of the generated markdown. Hence, it is only recommended to use this package when the markdown is meant to be rendered to an output format such as HTML, rather than being directly consumed by humans.
Markdown parsing differs substantially between implementations. Hence, a parsed representation that appears valid with one parser may not be valid with another.
The way this package tests whether it is doing a good job is:
tests/bypass-attempts/*.md
remark
marked
markdown-it
showdown
commonmark
The current implementation is quite involved. Simpler implementations may be possible, but the interleaved markdown and HTML nature makes this quite hard.
Current steps:
remark
DOMPurify to sanitize the HTML according to the input rules
turndown to re-create the markdown

The last step causes the reduced readability of the output (see the trade-off documented above), but it robustly avoids parsing ambiguities. Backslash-based escaping has proven to lead to parsing ambiguities between implementations.
This package validates URL prefixes and URL origins. Prefix allow-lists can be circumvented with open redirects, so make sure the prefixes are specific enough to avoid such attacks.
E.g. it is more secure to allow https://example.com/images/ than it is to allow all of https://example.com/, which may contain open redirects.
Additionally, URLs may contain path traversal like /../. This package does not resolve these. It is your responsibility to ensure that your web server does not allow such traversal.
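To illustrate why unresolved traversal segments matter, here is a minimal sketch. The helpers `matchesPrefix` and `normalizedMatchesPrefix` are hypothetical and not part of this package; they assume a plain string-prefix comparison, which may differ from what the package does internally.

```typescript
// Hypothetical helpers for illustration; not part of this package's API.

// Naive check: compares the raw URL string against each allowed prefix.
function matchesPrefix(url: string, prefixes: string[]): boolean {
  return prefixes.some((prefix) => url.startsWith(prefix));
}

// Stricter check: let the WHATWG URL parser resolve "." and ".." segments
// first, then compare the normalized URL against the prefixes.
function normalizedMatchesPrefix(url: string, prefixes: string[]): boolean {
  let normalized: string;
  try {
    normalized = new URL(url).href;
  } catch {
    return false; // not an absolute, parseable URL
  }
  return matchesPrefix(normalized, prefixes);
}

const allowedImagePrefixes = ["https://example.com/images/"];
const traversalUrl =
  "https://example.com/images/../redirect?to=https://evil.example";

// The raw string starts with the allowed prefix, so the naive check passes.
const rawResult = matchesPrefix(traversalUrl, allowedImagePrefixes);
// After normalization the path is /redirect, which is outside /images/.
const normalizedResult = normalizedMatchesPrefix(traversalUrl, allowedImagePrefixes);
console.log(rawResult, normalizedResult);
```

Because a server or browser may resolve `/../` before serving the request, a URL that passes a raw prefix check can still reach a path outside the allowed prefix.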
href and src attributes are validated against configurable prefix allow-lists

npm install markdown-to-markdown-sanitizer
import { sanitizeMarkdown } from "markdown-to-markdown-sanitizer";
const options = {
defaultOrigin: "https://example.com",
allowedLinkPrefixes: ["https://example.com", "https://trusted-site.org"],
allowedImagePrefixes: ["https://example.com/images"],
};
const input = `
# My Document
Check out this [safe link](https://example.com/page) and this [unsafe link](https://malicious.com/page).


`;
const sanitized = sanitizeMarkdown(input, options);
console.log(sanitized);
// Output:
// # My Document
//
// Check out this [safe link](https://example.com/page) and this [unsafe link](#).
//
// 
// ![Unsafe image]()
interface SanitizeOptions {
/**
* Default origin for relative URLs (e.g., "https://github.com")
* Required if your content contains relative URLs that should be allowed.
*/
defaultOrigin: string;
/** Allowed URL prefixes for links (href attributes) */
allowedLinkPrefixes?: string[];
/** Allowed URL prefixes for images (src attributes) */
allowedImagePrefixes?: string[];
/**
* Default origin specifically for relative links
* (overrides defaultOrigin if set)
*/
defaultLinkOrigin?: string;
/**
* Default origin specifically for relative images
* (overrides defaultOrigin if set)
*/
defaultImageOrigin?: string;
/**
* Maximum length of URLs to be sanitized.
* Default is 200 characters. 0 means no limit.
*/
urlMaxLength?: number;
/**
* Maximum length of markdown content to process.
* Default is 100000 characters. 0 means no limit.
*/
maxMarkdownLength?: number;
/**
* Activates sanitization designed to be safe in CommonMark.
* Notably, this is what GitHub uses, and it is needed to avoid GitHub rendering HTML entities.
* The output is less encoded and relies more heavily on the markdown parsing being correct.
* Default is false.
*/
sanitizeForCommonmark?: boolean;
}
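The origin options above can be read as a simple fallback chain. The following is a minimal sketch of that reading, not the package's implementation: `OriginOptions`, `effectiveOrigin`, and `resolveUrl` are hypothetical names, and resolving relative URLs with the standard `URL` constructor is an assumption.

```typescript
// Hypothetical sketch; field names mirror SanitizeOptions, but the
// resolution logic is an assumption, not the package's implementation.
interface OriginOptions {
  defaultOrigin: string;
  defaultLinkOrigin?: string;
  defaultImageOrigin?: string;
}

type UrlKind = "link" | "image";

// Pick the origin used to resolve relative URLs of the given kind.
// The kind-specific origin wins whenever it is set.
function effectiveOrigin(options: OriginOptions, kind: UrlKind): string {
  const specific =
    kind === "link" ? options.defaultLinkOrigin : options.defaultImageOrigin;
  return specific ?? options.defaultOrigin;
}

// Resolve a possibly relative URL against the effective origin.
function resolveUrl(raw: string, options: OriginOptions, kind: UrlKind): string {
  return new URL(raw, effectiveOrigin(options, kind)).href;
}

const originOptions: OriginOptions = {
  defaultOrigin: "https://example.com",
  defaultImageOrigin: "https://cdn.example.com",
};

const linkUrl = resolveUrl("/docs/intro", originOptions, "link");
const imageUrl = resolveUrl("/logo.png", originOptions, "image");
console.log(linkUrl); // https://example.com/docs/intro
console.log(imageUrl); // https://cdn.example.com/logo.png
```

Under this reading, a relative image path resolves against `defaultImageOrigin`, so `allowedImagePrefixes` would need to cover that origin rather than `defaultOrigin`.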
The sanitizer uses DOMPurify with GitHub-compatible allow-lists for HTML elements and attributes:
Text Formatting:
strong, b, em, i, code, pre, tt
s, strike, del, ins, mark
sub, sup (subscript and superscript)

Structure:
h1, h2, h3, h4, h5, h6 (headers)
p, blockquote, q (paragraphs and quotes)
br, hr (line breaks and horizontal rules)

Lists:
ul, ol, li (with start, reversed, value attributes)
dl, dt, dd (definition lists)

Links and Media:
a (with href, name, id, title, target attributes)
img (with src, alt, title, width, height, align attributes)

Code and Technical:
pre, code, samp, kbd, var

Tables:
table, thead, tbody, tfoot, tr, td, th
colspan, rowspan, align, valign

GitHub-Specific:
details, summary (with open attribute)
div, span (with class, id, dir attributes)
ruby, rt, rp (East Asian typography)

href and src are validated against allow-lists
id and name attributes are prefixed with user-content-
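As a rough illustration of the `id` and `name` handling, the sketch below prefixes a value unconditionally. `prefixIdentifier` is a hypothetical helper; the package's actual behavior (for example, for values that already carry the prefix) is not documented here and may differ.

```typescript
// Hypothetical helper; the package's actual prefixing logic may differ.
const USER_CONTENT_PREFIX = "user-content-";

// Prefix an id or name value so that user-supplied identifiers cannot
// collide with identifiers used by the hosting page.
function prefixIdentifier(value: string): string {
  return USER_CONTENT_PREFIX + value;
}

const prefixedId = prefixIdentifier("installation");
console.log(prefixedId); // user-content-installation
```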
The sanitizer supports flexible URL prefix matching:
// Protocol-only prefixes
const options1 = {
defaultOrigin: "https://example.com",
allowedLinkPrefixes: ["https:", "http:"], // Allow any HTTPS or HTTP URL
};
// Domain prefixes
const options2 = {
defaultOrigin: "https://example.com",
allowedLinkPrefixes: ["https://example.com", "https://api.example.com"],
};
// Path prefixes
const options3 = {
defaultOrigin: "https://example.com",
allowedLinkPrefixes: ["https://example.com/docs", "https://example.com/api"],
};
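The three prefix styles above can be thought of as follows. This is a sketch under stated assumptions, not the package's matching code: `isAllowed` is a hypothetical function, and combining an origin comparison with a normalized string-prefix comparison is one plausible reading of "validates URL prefixes and URL origins".

```typescript
// Hypothetical sketch of combined prefix and origin matching; the package's
// real matching rules may differ.
function isAllowed(url: string, prefixes: string[]): boolean {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false; // relative or malformed URLs are out of scope here
  }
  return prefixes.some((prefix) => {
    // Protocol-only prefix such as "https:" matches on the scheme alone.
    if (/^[a-z][a-z0-9+.-]*:$/i.test(prefix)) {
      return parsed.protocol === prefix.toLowerCase();
    }
    let prefixUrl: URL;
    try {
      prefixUrl = new URL(prefix);
    } catch {
      return false; // unparseable prefixes never match
    }
    // Require the same origin, then compare the normalized string prefix.
    return (
      parsed.origin === prefixUrl.origin &&
      parsed.href.startsWith(prefixUrl.href)
    );
  });
}

// A plain startsWith would accept this look-alike host; the origin check does not.
const lookAlike = "https://example.com.evil.org/page";
console.log(lookAlike.startsWith("https://example.com")); // true
console.log(isAllowed(lookAlike, ["https://example.com"])); // false
```

The origin comparison is the reason a domain prefix written without a trailing slash does not also admit hosts that merely begin with the same characters.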
Configure maximum markdown length to prevent DoS attacks:
const options = {
defaultOrigin: "https://example.com",
allowedLinkPrefixes: ["https://example.com"],
maxMarkdownLength: 50000, // Limit to 50k characters
urlMaxLength: 500, // Limit URL length to 500 characters
};
// Content over the limit will be truncated before processing
const longContent = "a".repeat(60000);
const result = sanitizeMarkdown(longContent, options);
// Result will be based on truncated content (first 50k chars)
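The truncation semantics described above can be sketched as below. `truncateMarkdown` is a hypothetical helper; it assumes "characters" means UTF-16 code units (as with `String.prototype.slice`), which may not match how the package counts length.

```typescript
// Hypothetical helper mirroring the documented maxMarkdownLength semantics:
// 0 means no limit, otherwise input is cut to the first `maxLength` units.
function truncateMarkdown(input: string, maxLength: number): string {
  if (maxLength === 0 || input.length <= maxLength) {
    return input;
  }
  return input.slice(0, maxLength);
}

const truncated = truncateMarkdown("a".repeat(60000), 50000);
const unlimited = truncateMarkdown("a".repeat(60000), 0);
console.log(truncated.length); // 50000
console.log(unlimited.length); // 60000
```

Truncating before parsing bounds the work done by every later stage, at the cost of possibly cutting a markdown construct in half at the boundary.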
The sanitizer follows a multi-step pipeline to ensure security:
<url> syntax to [url](url) and rejects URLs with HTML entities
defaultOrigin - Required for relative URL handling

The sanitizer aggressively encodes dangerous characters to prevent XSS:
<>&"'[]:()/!\
&{hex}; (e.g., < becomes &3c;)

The package includes comprehensive test coverage:
Run tests:
# Run all tests
pnpm test
# Run specific test file
pnpm test -- tests/basic-sanitization.test.ts
MIT
We found that markdown-to-markdown-sanitizer demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.