
@wordpress/block-serialization-default-parser
Block serialization specification parser for WordPress posts.
This library contains the default block serialization parser implementations for WordPress documents. It provides native PHP and JavaScript parsers that implement the specification from @wordpress/block-serialization-spec-parser and normally operate on the document stored in post_content.
Install the module
npm install @wordpress/block-serialization-default-parser --save
This package assumes that your code will run in an ES2015+ environment. If you're using an environment that has limited or no support for such language features and APIs, you should include the polyfill shipped in @wordpress/babel-preset-default in your code.
parse
Parser function that converts input HTML into a block-based structure.
Usage
Input post:
<!-- wp:columns {"columns":3} -->
<div class="wp-block-columns has-3-columns">
<!-- wp:column -->
<div class="wp-block-column">
<!-- wp:paragraph -->
<p>Left</p>
<!-- /wp:paragraph -->
</div>
<!-- /wp:column -->
<!-- wp:column -->
<div class="wp-block-column">
<!-- wp:paragraph -->
<p><strong>Middle</strong></p>
<!-- /wp:paragraph -->
</div>
<!-- /wp:column -->
<!-- wp:column -->
<div class="wp-block-column"></div>
<!-- /wp:column -->
</div>
<!-- /wp:columns -->
Parsing code:
import { parse } from '@wordpress/block-serialization-default-parser';
// `post` is a string holding the input post shown above.
const blocks = parse( post );
// `blocks` has the following structure:
[
	{
		blockName: 'core/columns',
		attrs: {
			columns: 3,
		},
		innerBlocks: [
			{
				blockName: 'core/column',
				attrs: null,
				innerBlocks: [
					{
						blockName: 'core/paragraph',
						attrs: null,
						innerBlocks: [],
						innerHTML: '\n<p>Left</p>\n',
					},
				],
				innerHTML: '\n<div class="wp-block-column">\n\n</div>\n',
			},
			{
				blockName: 'core/column',
				attrs: null,
				innerBlocks: [
					{
						blockName: 'core/paragraph',
						attrs: null,
						innerBlocks: [],
						innerHTML: '\n<p><strong>Middle</strong></p>\n',
					},
				],
				innerHTML: '\n<div class="wp-block-column">\n\n</div>\n',
			},
			{
				blockName: 'core/column',
				attrs: null,
				innerBlocks: [],
				innerHTML: '\n<div class="wp-block-column"></div>\n',
			},
		],
		innerHTML:
			'\n<div class="wp-block-columns has-3-columns">\n\n\n\n</div>\n',
	},
];
Parameters
string: The HTML document to parse.
Returns
ParsedBlock[]: A block-based representation of the input HTML.
This is a recursive-descent parser that scans linearly once through the input document. Instead of recursing directly, it uses a trampoline mechanism to prevent stack overflow. It minimizes copying and passing of data by tracking parse state in globals. Between any two tokens (block comment delimiters) we can instrument the parser and intervene if we want to; for example, we might put a hard limit on how long a document may take to parse, or provide additional debugging diagnostics for a document.
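The loop structure described above can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not the package's source. The names proceed and runParser, the 'open'/'close' token strings, and the maxSteps limit are all hypothetical. The sketch shows three things: state kept at module level, a flat driver loop in place of recursion, and an instrumentation point between tokens.

```javascript
// Illustrative sketch: `proceed`, `runParser`, the 'open'/'close' token
// strings, and the `maxSteps` limit are hypothetical, not this package's API.

// Parse state lives at module level and is reset per run, so nothing has
// to be copied or passed between steps.
let tokens = [];
let position = 0;
let depth = 0;
let maxDepth = 0;

// Consume exactly one token, then return control to the driver loop.
// Returns false once the input is exhausted.
function proceed() {
	if ( position >= tokens.length ) {
		return false;
	}
	const token = tokens[ position++ ];
	if ( token === 'open' ) {
		depth++;
		maxDepth = Math.max( maxDepth, depth );
	} else if ( token === 'close' ) {
		depth--;
	}
	return true;
}

function runParser( input, maxSteps = Infinity ) {
	tokens = input;
	position = 0;
	depth = 0;
	maxDepth = 0;
	let steps = 0;
	// The trampoline: one flat loop drives every step, so deeply nested
	// input never deepens the call stack.
	while ( proceed() ) {
		steps++;
		// Instrumentation hook between tokens: here, a hard step limit.
		if ( steps > maxSteps ) {
			throw new Error( `Step limit of ${ maxSteps } exceeded` );
		}
	}
	return { maxDepth, steps };
}
```

Each step returns to the same while loop. As a result, input nested 100,000 levels deep is handled by iteration rather than by 100,000 nested calls, whereas a naively recursive parser would risk a stack overflow on the same input.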
The spec parser is defined via a Parsing Expression Grammar (PEG), which inherently answers many questions that this parser must answer explicitly. The goal for this implementation is to match the characteristics of the PEG so that it can be swapped in directly, with the only differences being better runtime performance and memory usage.
Every serialized Gutenberg document is nominally an HTML document which, in addition to normal HTML, may also contain specially designed HTML comments -- the block comment delimiters -- which separate and isolate the blocks serialized in the document.
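To make the delimiters concrete, here is a simplified tokenizer for them. The RegExp, the function name tokenize, and the token type names are assumptions written for this illustration; the pattern is not the package's actual one. In particular, it does not handle attribute JSON that contains the sequence "} -->". The sketch recognizes the three delimiter forms of the block grammar: openers, closers (the leading "/"), and self-closing void blocks ending in "/-->". It applies the default "core/" namespace seen in the usage example above.

```javascript
// Simplified sketch of a block-delimiter tokenizer. The pattern is an
// assumption for illustration; it is NOT the package's exact RegExp.
const DELIMITER =
	/<!--\s+(\/)?wp:([a-z][a-z0-9_-]*\/)?([a-z][a-z0-9_-]*)\s+(\{[\s\S]*?\}\s+)?(\/)?-->/g;

function tokenize( html ) {
	const tokens = [];
	// matchAll iterates every match of the global pattern in order.
	for ( const match of html.matchAll( DELIMITER ) ) {
		const [ text, closer, namespace, name, attrsJson, voidMarker ] = match;
		let type = 'block-opener';
		if ( closer ) {
			type = 'block-closer';
		} else if ( voidMarker ) {
			type = 'void-block'; // self-closing: "/-->"
		}
		tokens.push( {
			type,
			// Names without a namespace default to "core/".
			blockName: ( namespace || 'core/' ) + name,
			attrs: attrsJson ? JSON.parse( attrsJson ) : null,
			startedAt: match.index,
			length: text.length,
		} );
	}
	return tokens;
}
```

For the opener <!-- wp:columns {"columns":3} -->, tokenize returns a single block-opener token with blockName 'core/columns' and attrs { columns: 3 }.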
This parser attempts to create a state machine around the transitions triggered by those delimiters -- the "tokens" of the grammar. Every time we find one, we should be doing only one of two things:
- entering a new block; or
- exiting a block.
Those actions have different effects depending on the context; for instance, when we exit a block we either need to add it to the output block list or we need to append it as the next innerBlock on the parent block below it in the block stack (the place where we track open blocks). The details are documented below.
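The enter/exit transitions and the block stack can be sketched as follows. The token shape { type, blockName } and the function name buildTree are hypothetical, and innerHTML bookkeeping is left out so the stack logic stays visible. Unlike a production parser, this sketch simply throws on mismatched or unclosed delimiters rather than attempting any recovery.

```javascript
// Sketch of the two transitions (enter, exit) driven by a block stack.
// The token shape { type, blockName } is a hypothetical stand-in.
function buildTree( tokens ) {
	const output = []; // completed top-level blocks
	const stack = []; // open blocks, innermost last

	// Exiting a block: nest it in the parent below it on the stack,
	// or emit it as top-level output when there is no parent.
	const exitBlock = ( block ) => {
		if ( stack.length === 0 ) {
			output.push( block );
		} else {
			stack[ stack.length - 1 ].innerBlocks.push( block );
		}
	};

	for ( const { type, blockName } of tokens ) {
		if ( type === 'block-opener' ) {
			// Entering a block: it stays open until its closer arrives.
			stack.push( { blockName, innerBlocks: [] } );
		} else if ( type === 'void-block' ) {
			// A self-closing block is entered and exited in one step.
			exitBlock( { blockName, innerBlocks: [] } );
		} else if ( type === 'block-closer' ) {
			const block = stack.pop();
			if ( ! block || block.blockName !== blockName ) {
				throw new Error( `Unexpected closer for ${ blockName }` );
			}
			exitBlock( block );
		}
	}
	if ( stack.length > 0 ) {
		throw new Error( `Unclosed block: ${ stack[ stack.length - 1 ].blockName }` );
	}
	return output;
}
```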
The biggest challenge in this parser is making the right accounting of indices required to construct the innerHTML values for each block at every level of nesting depth. We take a simple approach:
- Start each newly opened block with an empty innerHTML.
- When pushing the first block into a parent's innerBlocks list, add the content from where the content of the parent block started to where this inner block starts.
- When pushing any later block into the innerBlocks list, add the content from where the previous inner block ended to where this inner block starts.
- When closing a block, add the remaining content to its innerHTML, from where its last inner block ended (or, if it has none, from where its content started) to where the closing delimiter starts.
This parser operates much faster than the generated parser from the specification. Because we know more about the parsing than the PEG does, we can take advantage of several tricks to improve our speed and memory usage:
- Since preg_match() takes an offset parameter, we can crawl through the input without passing copies of the input text on every step. We can track our position in the string and pass only a number instead.
Further, tokenizing with a RegExp brings an additional advantage. The parser generated by the PEG provides predictable performance characteristics in exchange for control over tokenization rules -- it doesn't allow us to define RegExp patterns in the rules, so as to guard against, e.g., catastrophic backtracking that would break the PEG guarantees.
However, since our "token language" of the block comment delimiters is regular and can be trivially matched with RegExp patterns, we can do that here, and then something magical happens: we jump out of PHP or JavaScript and into a highly optimized RegExp engine written in C or C++ on the host system. We thereby leave the virtual machine and its overhead.
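The following sketch combines the innerHTML index accounting with the offset-based crawl. It builds innerHTML values using only integer offsets into the original string. In JavaScript, a RegExp with the g flag plays the role of preg_match()'s offset, because exec() resumes from, and updates, the regex's lastIndex. The pattern, the function name parseInnerHTML, and the frame shape are assumptions for illustration. Attributes are matched but discarded, void blocks are not recognized, and malformed input is not handled.

```javascript
// Simplified sketch: the pattern below is an assumption, not the package's
// tokenizer. Attributes are matched but discarded; void blocks ("/-->") are
// not recognized.
const TOKEN = /<!--\s+(\/)?wp:([a-z][a-z0-9_\/-]*)\s+(?:\{[\s\S]*?\}\s+)?-->/g;

function parseInnerHTML( doc ) {
	const output = [];
	const stack = []; // frames: { block, tokenStart, prevOffset }
	// The search position is a single integer (lastIndex); the input
	// string itself is never copied to advance the scan.
	TOKEN.lastIndex = 0;
	let match;
	while ( ( match = TOKEN.exec( doc ) ) !== null ) {
		const [ text, closer, rawName ] = match;
		const tokenStart = match.index;
		const tokenEnd = tokenStart + text.length; // equals TOKEN.lastIndex
		const blockName = rawName.includes( '/' ) ? rawName : `core/${ rawName }`;

		if ( ! closer ) {
			// A newly opened block starts with an empty innerHTML; its own
			// content begins right after the opener.
			stack.push( {
				block: { blockName, innerBlocks: [], innerHTML: '' },
				tokenStart,
				prevOffset: tokenEnd,
			} );
			continue;
		}

		const frame = stack.pop();
		if ( ! frame ) {
			continue; // this sketch ignores stray closers
		}
		// Closing: add everything from the end of the last inner block (or
		// the start of the content) up to this closing delimiter.
		frame.block.innerHTML += doc.slice( frame.prevOffset, tokenStart );

		if ( stack.length === 0 ) {
			output.push( frame.block );
			continue;
		}
		// Pushing into the parent: add the parent's content from its last
		// recorded offset up to where this inner block's opener started.
		const parent = stack[ stack.length - 1 ];
		parent.block.innerHTML += doc.slice( parent.prevOffset, frame.tokenStart );
		parent.block.innerBlocks.push( frame.block );
		// The next gap in the parent begins after this block's closer.
		parent.prevOffset = tokenEnd;
	}
	return output;
}
```

Run against the input post from the Usage section, this sketch produces '\n<div class="wp-block-columns has-3-columns">\n\n\n\n</div>\n' as the innerHTML of the columns block, matching the example output there.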
This is an individual package that's part of the Gutenberg project. The project is organized as a monorepo. It's made up of multiple self-contained software packages, each with a specific purpose. The packages in this monorepo are published to npm and used by WordPress as well as other software projects.
To find out more about contributing to this package or Gutenberg as a whole, please read the project's main contributor guide.
