node-html-markdown
Fast HTML to markdown cross-compiler, compatible with both node and the browser
The node-html-markdown package is a tool for converting HTML content into Markdown format. It is useful for developers who need to transform HTML documents into a more readable and lightweight Markdown format, which is often used for documentation, blogging, and other text-based content.
Basic HTML to Markdown Conversion
This feature allows you to convert basic HTML elements like headings and paragraphs into Markdown. The code sample demonstrates how to use the NodeHtmlMarkdown class to translate a simple HTML string into Markdown.
const { NodeHtmlMarkdown } = require('node-html-markdown');
const html = '<h1>Hello World</h1><p>This is a paragraph.</p>';
const markdown = NodeHtmlMarkdown.translate(html);
console.log(markdown);
Customizing Conversion Options
This feature allows you to customize the conversion process by providing options. In the code sample, the strong delimiter is customized to use double underscores instead of the default asterisks.
const { NodeHtmlMarkdown } = require('node-html-markdown');
const html = '<h1>Hello World</h1><p>This is a <strong>bold</strong> paragraph.</p>';
const options = { strongDelimiter: '__' };
const markdown = NodeHtmlMarkdown.translate(html, options);
console.log(markdown);
Handling Complex HTML Structures
This feature demonstrates the package's ability to handle more complex HTML structures, such as lists and nested elements. The code sample shows how a div containing a heading and an unordered list is converted into Markdown.
const { NodeHtmlMarkdown } = require('node-html-markdown');
const html = '<div><h1>Title</h1><ul><li>Item 1</li><li>Item 2</li></ul></div>';
const markdown = NodeHtmlMarkdown.translate(html);
console.log(markdown);
Turndown is a highly customizable HTML to Markdown converter for JavaScript. It offers similar functionality to node-html-markdown, with a focus on extensibility: users can define custom rules for converting individual HTML elements, which makes it a versatile choice for developers who need more control over the conversion process.
html-to-markdown is another package that provides HTML to Markdown conversion. Like node-html-markdown, it is known for its simplicity and ease of use. It may not offer as many customization options as Turndown, so it is best suited to straightforward conversion tasks.
NHM is a fast HTML to markdown converter, compatible with both node and the browser.
It was built with the following two goals in mind:
1. We had a need to convert gigabytes of HTML daily very quickly. All libraries we found were too slow with node. We considered using a low-level language but decided to attempt to write something that would squeeze every bit of performance out of the JIT that we could. The end result was fast enough to make the cut!
2. The other libraries we tested produced output that would break in numerous conditions and produced output with many repeating linefeeds, etc. Generally speaking, outside of a markdown viewer, the result was not easy to read. We took the approach of producing a clean, concise result with consistent spacing rules.
<yarn|npm|pnpm> add node-html-markdown
-----------------------------------------------------------------------------
Estimated processing times (fastest to slowest):
[node-html-markdown (reused instance)]
100 kB: 17ms
1 MB: 176ms
50 MB: 8.80sec
1 GB: 3min, 0sec
50 GB: 2hr, 30min, 14sec
[turndown (reused instance)]
100 kB: 27ms
1 MB: 280ms
50 MB: 13.98sec
1 GB: 4min, 46sec
50 GB: 3hr, 58min, 35sec
-----------------------------------------------------------------------------
Speed comparison - node-html-markdown (reused instance) is:
1.02 times as fast as node-html-markdown
1.57 times as fast as turndown
1.59 times as fast as turndown (reused instance)
-----------------------------------------------------------------------------
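The 1.59 figure in the speed comparison can be checked against the per-size timings listed above. The following is a small arithmetic sketch, not part of the library: `timingsMs` restates the two reused-instance rows from the benchmark output in milliseconds, and `speedRatio` is a hypothetical helper name.

```javascript
// Reused-instance timings copied from the benchmark output, converted to milliseconds.
const timingsMs = {
  nhm: { '100 kB': 17, '1 MB': 176, '50 MB': 8800, '1 GB': 180000, '50 GB': 9014000 },
  turndown: { '100 kB': 27, '1 MB': 280, '50 MB': 13980, '1 GB': 286000, '50 GB': 14315000 },
};

// Hypothetical helper: how many times faster the quicker run is than the slower one.
function speedRatio(slowerMs, fasterMs) {
  return slowerMs / fasterMs;
}

// One ratio per input size, rounded to two decimals like the benchmark summary.
const ratios = Object.keys(timingsMs.nhm).map((size) => ({
  size,
  ratio: speedRatio(timingsMs.turndown[size], timingsMs.nhm[size]).toFixed(2),
}));

for (const { size, ratio } of ratios) {
  console.log(`${size}: ${ratio} times as fast`);
}
```

Every input size rounds to the same ratio, which suggests the gap between the two libraries stays roughly constant as the input grows.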
import { NodeHtmlMarkdown, NodeHtmlMarkdownOptions } from 'node-html-markdown'
/* ********************************************************* *
* Single use
* If using it once, you can use the static method
* ********************************************************* */
// Single file
NodeHtmlMarkdown.translate(
/* html */ `<b>hello</b>`,
/* options (optional) */ {},
/* customTranslators (optional) */ undefined,
/* customCodeBlockTranslators (optional) */ undefined
);
// Multiple files
NodeHtmlMarkdown.translate(
/* FileCollection */ {
'file1.html': `<b>hello</b>`,
'file2.html': `<b>goodbye</b>`
},
/* options (optional) */ {},
/* customTranslators (optional) */ undefined,
/* customCodeBlockTranslators (optional) */ undefined
);
/* ********************************************************* *
* Re-use
* If using it several times, creating an instance saves time
* ********************************************************* */
const nhm = new NodeHtmlMarkdown(
/* options (optional) */ {},
/* customTranslators (optional) */ undefined,
/* customCodeBlockTranslators (optional) */ undefined
);
// Single file
nhm.translate(/* html */ `<b>hello</b>`);
// Multiple Files
nhm.translate(
/* FileCollection */ {
'file1.html': `<b>hello</b>`,
'file2.html': `<b>goodbye</b>`
},
);
export interface NodeHtmlMarkdownOptions {
/**
* Use native window DOMParser when available
* @default false
*/
preferNativeParser: boolean,
/**
* Code block fence
* @default ```
*/
codeFence: string,
/**
* Bullet marker
* @default *
*/
bulletMarker: string,
/**
* Style for code block
* @default fenced
*/
codeBlockStyle: 'indented' | 'fenced',
/**
* Emphasis delimiter
* @default _
*/
emDelimiter: string,
/**
* Strong delimiter
* @default **
*/
strongDelimiter: string,
/**
* Strikethrough delimiter
* @default ~~
*/
strikeDelimiter: string,
/**
* Supplied elements will be ignored (inner text is ignored and children are not parsed)
*/
ignore?: string[],
/**
* Supplied elements will be treated as blocks (surrounded with blank lines)
*/
blockElements?: string[],
/**
* Max consecutive new lines allowed
* @default 3
*/
maxConsecutiveNewlines: number,
/**
* Line Start Escape pattern
* (Note: Setting this will override the default escape settings, you might want to use textReplace option instead)
*/
lineStartEscape: [ pattern: RegExp, replacement: string ]
/**
* Global escape pattern
* (Note: Setting this will override the default escape settings, you might want to use textReplace option instead)
*/
globalEscape: [ pattern: RegExp, replacement: string ]
/**
* User-defined text replacement pattern (Replaces matching text retrieved from nodes)
*/
textReplace?: [ pattern: RegExp, replacement: string ][]
/**
* Keep images with data: URI (Note: These can be up to 1MB each)
* @example
* <img src="data:image/gif;base64,R0lGODlhEAAQAMQAAORHHOVSK......0o/">
* @default false
*/
keepDataImages?: boolean
/**
* Place URLs at the bottom and format links using link reference definitions
*
* @example
* Click <a href="/url1">here</a>. Or <a href="/url2">here</a>. Or <a href="/url1">this link</a>.
*
* Becomes:
* Click [here][1]. Or [here][2]. Or [this link][1].
*
* [1]: /url1
* [2]: /url2
*/
useLinkReferenceDefinitions?: boolean
/**
* Wrap URL text in < > instead of []() syntax.
*
* @example
* The input <a href="https://google.com">https://google.com</a>
* becomes <https://google.com>
* instead of [https://google.com](https://google.com)
*
* @default true
*/
useInlineLinks?: boolean
}
Custom translators are an advanced option that allows certain elements to be handled in a specific way.
These can be modified via the NodeHtmlMarkdown#translators property, or added during creation.
For details on how to use them, see TranslatorConfig and defaultTranslators.
The NodeHtmlMarkdown#codeBlockTranslators property is a collection of translators which handles elements within a <pre><code> block.
Being a performance-centric library, we're always interested in further improvements. There are several probable routes by which we could gain substantial performance increases over the current model.
Such methods include:
These would be fun to implement; however, for the time being, the present library is fast enough for my purposes. That said, I welcome discussion and any PR toward the effort of further improving performance, and I may ultimately do more work in that capacity in the future!
Looking to contribute? Check out our help wanted list for a good place to start!
FAQs
The npm package node-html-markdown receives a total of 97,821 weekly downloads. On that basis, its popularity is classified as popular.
We found that node-html-markdown has an unhealthy version release cadence and level of project activity, because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.