
@wordpress/block-serialization-default-parser
Block serialization specification parser for WordPress posts.
This library contains the default block serialization parser implementations for WordPress documents. It provides native PHP and JavaScript parsers that implement the specification from @wordpress/block-serialization-spec-parser and that normally operate on the document stored in post_content.
Install the module
npm install @wordpress/block-serialization-default-parser --save
This package assumes that your code will run in an ES2015+ environment. If you're using an environment that has limited or no support for such language features and APIs, you should include the polyfill shipped in @wordpress/babel-preset-default in your code.
Parser function that converts input HTML into a block-based structure.
Usage
Input post:
<!-- wp:columns {"columns":3} -->
<div class="wp-block-columns has-3-columns">
<!-- wp:column -->
<div class="wp-block-column">
<!-- wp:paragraph -->
<p>Left</p>
<!-- /wp:paragraph -->
</div>
<!-- /wp:column -->
<!-- wp:column -->
<div class="wp-block-column">
<!-- wp:paragraph -->
<p><strong>Middle</strong></p>
<!-- /wp:paragraph -->
</div>
<!-- /wp:column -->
<!-- wp:column -->
<div class="wp-block-column"></div>
<!-- /wp:column -->
</div>
<!-- /wp:columns -->
Parsing code:
import { parse } from '@wordpress/block-serialization-default-parser';
parse( post ) ===
    [
        {
            blockName: 'core/columns',
            attrs: {
                columns: 3,
            },
            innerBlocks: [
                {
                    blockName: 'core/column',
                    attrs: null,
                    innerBlocks: [
                        {
                            blockName: 'core/paragraph',
                            attrs: null,
                            innerBlocks: [],
                            innerHTML: '\n<p>Left</p>\n',
                        },
                    ],
                    innerHTML: '\n<div class="wp-block-column">\n\n</div>\n',
                },
                {
                    blockName: 'core/column',
                    attrs: null,
                    innerBlocks: [
                        {
                            blockName: 'core/paragraph',
                            attrs: null,
                            innerBlocks: [],
                            innerHTML: '\n<p><strong>Middle</strong></p>\n',
                        },
                    ],
                    innerHTML: '\n<div class="wp-block-column">\n\n</div>\n',
                },
                {
                    blockName: 'core/column',
                    attrs: null,
                    innerBlocks: [],
                    innerHTML: '\n<div class="wp-block-column"></div>\n',
                },
            ],
            innerHTML:
                '\n<div class="wp-block-columns has-3-columns">\n\n\n\n</div>\n',
        },
    ];
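Because the result is a nested list, consuming it usually means walking innerBlocks recursively. The following is a minimal sketch of that, assuming only the shape shown above. The collectBlockNames helper is hypothetical and not part of this package, and sampleBlocks is a hand-written, abbreviated stand-in for a parse() result (innerHTML omitted), not output captured from the library.

```javascript
// Hand-written sample shaped like the documented parse() result.
// It is an illustration only, not output captured from the package.
const sampleBlocks = [
  {
    blockName: 'core/columns',
    attrs: { columns: 3 },
    innerBlocks: [
      {
        blockName: 'core/column',
        attrs: null,
        innerBlocks: [
          { blockName: 'core/paragraph', attrs: null, innerBlocks: [] },
        ],
      },
      {
        blockName: 'core/column',
        attrs: null,
        innerBlocks: [
          { blockName: 'core/paragraph', attrs: null, innerBlocks: [] },
        ],
      },
      { blockName: 'core/column', attrs: null, innerBlocks: [] },
    ],
  },
];

// Hypothetical helper: list block names in depth-first document order.
function collectBlockNames(blocks, names = []) {
  for (const block of blocks) {
    names.push(block.blockName);
    collectBlockNames(block.innerBlocks, names);
  }
  return names;
}

console.log(collectBlockNames(sampleBlocks).join(', '));
```

In real code the array would come from parse( post ) instead of a literal.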
Parameters
string: The HTML document to parse.
Returns
ParsedBlock[]: A block-based representation of the input HTML.
This is a recursive-descent parser that scans linearly once through the input document. Instead of directly recursing it utilizes a trampoline mechanism to prevent stack overflow. It minimizes data copying and passing through the use of globals for tracking state through the parse. Between every token (a block comment delimiter) we can instrument the parser and intervene should we want to; for example we might put a hard limit on how long we can be parsing a document or provide additional debugging diagnostics for a document.
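To make the trampoline idea concrete, here is a minimal sketch rather than the package's actual code. A step function handles one token and reports whether there is more work, and a plain loop keeps calling it, so nesting depth in the input never turns into call-stack depth. The names makeDepthScanner and runTrampoline are hypothetical, the "tokens" are simplified to the strings 'open' and 'close', and state lives in a closure where the real parser is described as using globals.

```javascript
// Hypothetical scanner: tracks nesting depth over simplified tokens.
function makeDepthScanner(tokens) {
  let position = 0;
  let depth = 0;
  let maxDepth = 0;

  // Process exactly one token; return false once the input is exhausted.
  function step() {
    if (position >= tokens.length) {
      return false;
    }
    const token = tokens[position++];
    if (token === 'open') {
      depth += 1;
      maxDepth = Math.max(maxDepth, depth);
    } else if (token === 'close') {
      depth = Math.max(0, depth - 1);
    }
    return true;
  }

  return { step, maxDepth: () => maxDepth };
}

// The trampoline: a flat loop instead of recursive calls. The gap between
// steps is where a budget or diagnostics can be hooked in.
function runTrampoline(step, maxSteps = Infinity) {
  let steps = 0;
  while (steps < maxSteps && step()) {
    steps += 1;
  }
  return steps;
}

// 100,000 levels of nesting, handled without growing the call stack.
const deepTokens = Array(100000).fill('open').concat(Array(100000).fill('close'));
const scanner = makeDepthScanner(deepTokens);
const stepsTaken = runTrampoline(scanner.step);
console.log(stepsTaken, scanner.maxDepth());
```

The maxSteps argument stands in for the "hard limit" mentioned above; a wall-clock check could sit in the same place.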
The spec parser is defined via a Parsing Expression Grammar (PEG) which answers many questions inherently that we must answer explicitly in this parser. The goal for this implementation is to match the characteristics of the PEG so that it can be directly swapped out and so that the only changes are better runtime performance and memory usage.
Every serialized Gutenberg document is nominally an HTML document which, in addition to normal HTML, may also contain specially designed HTML comments -- the block comment delimiters -- which separate and isolate the blocks serialized in the document.
This parser attempts to create a state-machine around the transitions triggered from those delimiters -- the "tokens" of the grammar. Every time we find one we should only be doing either of:
- entering a new block;
- exiting a block.
Those actions have different effects depending on the context; for instance, when we exit a block we either need to add it to the output block list or we need to append it as the next innerBlock on the parent block below it in the block stack (the place where we track open blocks). The details are documented below.
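The enter and exit transitions can be sketched with an explicit stack of open blocks. This is a deliberately reduced illustration, not the package's implementation: the DELIMITER pattern and parseSketch function are hypothetical, block attributes and namespaced block names are not supported, HTML outside of any block is dropped, unclosed blocks are discarded, and closers are not checked against the name of the block they close. Each stack frame also remembers the offset where its next run of HTML begins, which is how innerHTML is assembled by slicing the original document.

```javascript
// Simplified delimiter pattern (assumption): no JSON attributes, no namespaces.
const DELIMITER = /<!--\s+(\/)?wp:([a-z][a-z0-9-]*)\s+(\/)?-->/g;

function parseSketch(doc) {
  const output = [];
  const stack = []; // open blocks, innermost last
  DELIMITER.lastIndex = 0;
  let match;

  while ((match = DELIMITER.exec(doc)) !== null) {
    const [text, closer, name, selfClosing] = match;
    const tokenStart = match.index;
    const tokenEnd = tokenStart + text.length;
    const parent = stack[stack.length - 1];

    if (closer) {
      if (!parent) continue; // stray closer: ignored in this sketch
      // Exit: finish the innermost block with the HTML up to the closer.
      const frame = stack.pop();
      frame.block.innerHTML += doc.slice(frame.prevOffset, tokenStart);
      const enclosing = stack[stack.length - 1];
      if (enclosing) {
        enclosing.block.innerBlocks.push(frame.block);
        enclosing.prevOffset = tokenEnd;
      } else {
        output.push(frame.block);
      }
      continue;
    }

    // Enter: first give the parent the HTML that precedes this inner block.
    if (parent) {
      parent.block.innerHTML += doc.slice(parent.prevOffset, tokenStart);
    }
    const block = { blockName: 'core/' + name, attrs: null, innerBlocks: [], innerHTML: '' };
    if (selfClosing) {
      // A void block is entered and exited in a single token.
      if (parent) {
        parent.block.innerBlocks.push(block);
        parent.prevOffset = tokenEnd;
      } else {
        output.push(block);
      }
    } else {
      stack.push({ block, prevOffset: tokenEnd });
    }
  }
  return output;
}

const columnDoc = '<!-- wp:column -->\n<div>\n<!-- wp:paragraph -->\n<p>Left</p>\n<!-- /wp:paragraph -->\n</div>\n<!-- /wp:column -->';
console.log(JSON.stringify(parseSketch(columnDoc), null, 2));
```

Note that no substring of the document is copied until a block's innerHTML is actually extended; everything else is tracked as numeric offsets.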
The biggest challenge in this parser is making the right accounting of indices required to construct the innerHTML values for each block at every level of nesting depth. We take a simple approach:
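- Start each newly opened block with an empty innerHTML.
- When we push the first block into the innerBlocks list, add the content from where the content of the parent block started to where this inner block starts.
- When we push another block into the innerBlocks list, add the content from where the previous inner block ended to where this inner block starts.
- When we close an open block, add the content from where the last inner block ended to where the closing block delimiter starts.
- If there are no inner blocks, take the entire content between the opening and closing block comment delimiters as the innerHTML.
This parser operates much faster than the generated parser from the specification. Because we know more about the parsing than the PEG does we can take advantage of several tricks to improve our speed and memory usage: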
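- The tokens are few and can all be matched with a regular expression, so instead of parsing character by character we can let the RegExp engine skip over large stretches of the document to find them.
- Because preg_match() takes an offset parameter we can crawl through the input without passing copies of the input text on every step. We can track our position in the string and only pass a number instead.
- Not copying those strings also means we skip many memory allocations.
Further, tokenizing with a RegExp brings an additional advantage. The parser generated by the PEG provides predictable performance characteristics in exchange for control over tokenization rules -- it doesn't allow us to define RegExp patterns in the rules so as to guard against e.g. catastrophic backtracking that would break the PEG guarantees.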
However, since our "token language" of the block comment delimiters is regular and can be trivially matched with RegExp patterns, we can do that here and then something magical happens: we jump out of PHP or JavaScript and into a highly-optimized RegExp engine written in C or C++ on the host system. We thereby leave the virtual machine and its overhead.
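On the JavaScript side, the counterpart of passing an offset to preg_match() is setting lastIndex on a global RegExp before calling exec(). The sketch below shows that pattern under stated assumptions: TOKEN is a simplified, hypothetical delimiter pattern (no JSON attributes, no namespaced names), and nextToken and listTokens are illustrative helpers, not functions exported by this package.

```javascript
// Simplified, hypothetical delimiter pattern; the 'g' flag makes exec()
// start searching at TOKEN.lastIndex instead of at the start of the string.
const TOKEN = /<!--\s+(\/)?wp:([a-z][a-z0-9-]*)\s+(\/)?-->/g;

// Find the next delimiter at or after `offset`. Only a number is passed
// along between calls; the document string itself is never sliced.
function nextToken(doc, offset) {
  TOKEN.lastIndex = offset;
  const match = TOKEN.exec(doc);
  if (match === null) {
    return null;
  }
  return {
    name: match[2],
    isCloser: match[1] === '/',
    isVoid: match[3] === '/',
    start: match.index,
    length: match[0].length,
  };
}

// Crawl through the whole document token by token.
function listTokens(doc) {
  const tokens = [];
  let offset = 0;
  let token;
  while ((token = nextToken(doc, offset)) !== null) {
    tokens.push(token);
    offset = token.start + token.length;
  }
  return tokens;
}

const tokenDoc = '<p>intro</p><!-- wp:paragraph --><p>Hi</p><!-- /wp:paragraph -->';
console.log(listTokens(tokenDoc));
```

The RegExp engine does the scanning between delimiters, so the script-level loop runs once per token rather than once per character.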
This is an individual package that's part of the Gutenberg project. The project is organized as a monorepo. It's made up of multiple self-contained software packages, each with a specific purpose. The packages in this monorepo are published to npm and used by WordPress as well as other software projects.
To find out more about contributing to this package or Gutenberg as a whole, please read the project's main contributor guide.
