
Convert markdown and its elements (tables, lists, code, etc.) into structured, easily processable data formats like lists and hierarchical dictionaries (or JSON), with support for parsing back to markdown.
Line numbers (start_line and end_line) are added to parsed markdown elements.
pip install markdown-to-data
from markdown_to_data import Markdown

markdown = """
---
title: Example text
author: John Doe
---

# Main Header

- [ ] Pending task
    - [x] Completed subtask
- [x] Completed task

## Table Example
| Column 1 | Column 2 |
|----------|----------|
| Cell 1 | Cell 2 |

´´´python
def hello():
    print("Hello World!")
´´´
"""

md = Markdown(markdown)

# Get parsed markdown as list
print(md.md_list)
# Each building block is a separate dictionary in the list

# Get parsed markdown as nested dictionary
print(md.md_dict)
# Headers are used as keys for nesting content

# Get information about markdown elements
print(md.md_elements)
Output (md.md_list):
[
{
'metadata': {'title': 'Example text', 'author': 'John Doe'},
'start_line': 2,
'end_line': 5
},
{
'header': {'level': 1, 'content': 'Main Header'},
'start_line': 7,
'end_line': 7
},
{
'list': {
'type': 'ul',
'items': [
{
'content': 'Pending task',
'items': [
{
'content': 'Completed subtask',
'items': [],
'task': 'checked'
}
],
'task': 'unchecked'
},
{'content': 'Completed task', 'items': [], 'task': 'checked'}
]
},
'start_line': 9,
'end_line': 11
},
{
'header': {'level': 2, 'content': 'Table Example'},
'start_line': 13,
'end_line': 13
},
{
'table': {'Column 1': ['Cell 1'], 'Column 2': ['Cell 2']},
'start_line': 14,
'end_line': 16
},
{
'code': {
'language': 'python',
'content': 'def hello():\n    print("Hello World!")'
},
'start_line': 18,
'end_line': 21
}
]
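Because every entry in the list carries exactly one element key plus start_line and end_line, the list is easy to post-process. The following is a minimal sketch that works on a trimmed copy of the output above rather than on the library itself; element_type and LINE_KEYS are hypothetical names introduced for illustration.

```python
from collections import Counter

# Trimmed copy of the md_list output shown above.
md_list = [
    {"metadata": {"title": "Example text"}, "start_line": 2, "end_line": 5},
    {"header": {"level": 1, "content": "Main Header"}, "start_line": 7, "end_line": 7},
    {"table": {"Column 1": ["Cell 1"], "Column 2": ["Cell 2"]}, "start_line": 14, "end_line": 16},
]

LINE_KEYS = {"start_line", "end_line"}


def element_type(entry):
    """Hypothetical helper: return the one key that is not a line-number field."""
    return next(key for key in entry if key not in LINE_KEYS)


# Count element types and list where each element sits in the source.
type_counts = Counter(element_type(entry) for entry in md_list)
spans = [(element_type(e), e["start_line"], e["end_line"]) for e in md_list]

print(type_counts)
print(spans)
```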
Output (md.md_dict):
{
'metadata': {'title': 'Example text', 'author': 'John Doe'},
'Main Header': {
'list_1': {
'type': 'ul',
'items': [
{
'content': 'Pending task',
'items': [
{
'content': 'Completed subtask',
'items': [],
'task': 'checked'
}
],
'task': 'unchecked'
},
{'content': 'Completed task', 'items': [], 'task': 'checked'}
]
},
'Table Example': {
'table_1': {'Column 1': ['Cell 1'], 'Column 2': ['Cell 2']},
'code_1': {
'language': 'python',
'content': 'def hello():\n    print("Hello World!")'
}
}
}
}
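Since headers become dictionary keys, a nested element can be reached by following its header path, and the whole structure can be serialized as JSON. This sketch uses a shortened copy of the dictionary above; get_path is a hypothetical helper, not part of the package.

```python
import json

# Shortened copy of the md_dict output shown above.
md_dict = {
    "metadata": {"title": "Example text", "author": "John Doe"},
    "Main Header": {
        "list_1": {"type": "ul", "items": []},
        "Table Example": {
            "table_1": {"Column 1": ["Cell 1"], "Column 2": ["Cell 2"]},
        },
    },
}


def get_path(tree, *keys):
    """Hypothetical helper: follow a sequence of keys into the nested dict."""
    node = tree
    for key in keys:
        node = node[key]
    return node


table = get_path(md_dict, "Main Header", "Table Example", "table_1")
print(table)

# All keys here are strings, so the structure survives a JSON round trip.
as_json = json.dumps(md_dict, indent=2)
restored = json.loads(as_json)
print(restored == md_dict)
```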
Output (md.md_elements):
{
'metadata': {
'count': 1,
'positions': [0],
'variants': ['2_fields'],
'summary': {}
},
'header': {
'count': 2,
'positions': [1, 3],
'variants': ['h1', 'h2'],
'summary': {'levels': {1: 1, 2: 1}}
},
'list': {
'count': 1,
'positions': [2],
'variants': ['task', 'ul'],
'summary': {'task_stats': {'checked': 2, 'unchecked': 1, 'total_tasks': 3}}
},
'table': {
'count': 1,
'positions': [4],
'variants': ['2_columns'],
'summary': {'column_counts': [2], 'total_cells': 2}
},
'paragraph': {
'count': 4,
'positions': [5, 6, 7, 8],
'variants': [],
'summary': {}
}
}
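The enhanced md_elements property now provides, for each element type, its count, the positions of its occurrences in md_list, the variants found, and a type-specific summary (for example, task statistics for lists).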
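As one possible use of the summary data, the sketch below computes the share of completed tasks from the task_stats entry. It operates on a copy of the relevant part of the output above; task_progress is a hypothetical helper name.

```python
# Copy of the 'list' entry from the md_elements output shown above.
md_elements = {
    "list": {
        "count": 1,
        "positions": [2],
        "variants": ["task", "ul"],
        "summary": {"task_stats": {"checked": 2, "unchecked": 1, "total_tasks": 3}},
    },
}


def task_progress(elements):
    """Hypothetical helper: fraction of checked tasks, or None if there are no tasks."""
    stats = elements.get("list", {}).get("summary", {}).get("task_stats")
    if not stats or not stats["total_tasks"]:
        return None
    return stats["checked"] / stats["total_tasks"]


progress = task_progress(md_elements)
print(f"{progress:.0%} of tasks done")
```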
Parsing back to markdown (to_md)
The Markdown class provides a method to parse markdown data back to markdown-formatted strings.
The to_md method comes with options to customize the output, such as include, exclude, and spacer:
from markdown_to_data import Markdown
markdown = """
---
title: Example
---
# Main Header
- [x] Task 1
    - [ ] Subtask
- [ ] Task 2
## Code Example
´´´python
print("Hello")
´´´
"""
md = Markdown(markdown)
Example 1: Include specific elements
print(md.to_md(
    include=['header', 'list'],  # Include all headers and lists
    spacer=1                     # One empty line between elements
))
Output:
# Main Header

- [x] Task 1
    - [ ] Subtask
- [ ] Task 2
Example 2: Include by position and exclude specific types
print(md.to_md(
    include=[0, 1, 2],  # Include first three elements
    exclude=['code'],   # But exclude any code blocks
    spacer=2            # Two empty lines between elements
))
Output:
---
title: Example
---


# Main Header


- [x] Task 1
    - [ ] Subtask
- [ ] Task 2
to_md_parser Function
The to_md_parser function can be used directly to convert markdown data structures to markdown text:
from markdown_to_data import to_md_parser
data = [
{
'metadata': {
'title': 'Document'
}
},
{
'header': {
'level': 1,
'content': 'Title'
}
},
{
'list': {
'type': 'ul',
'items': [
{
'content': 'Task 1',
'items': [],
'task': 'checked'
}
]
}
}
]
print(to_md_parser(data=data, spacer=1))
Output:
---
title: Document
---
# Title
- [x] Task 1
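To make the mapping from data to text concrete, here is a small hand-written renderer for the three element types used above. It is an illustrative sketch only, not the library's implementation: render and render_items are hypothetical names, only unordered lists are handled, and the four-space nesting indent is an assumption.

```python
def render_items(items, depth=0):
    """Render (possibly nested) unordered list items, including task checkboxes."""
    lines = []
    for item in items:
        box = {"checked": "[x] ", "unchecked": "[ ] "}.get(item.get("task"), "")
        lines.append("    " * depth + "- " + box + item["content"])
        lines.extend(render_items(item["items"], depth + 1))
    return lines


def render(data, spacer=1):
    """Render metadata, header, and list entries, separated by `spacer` empty lines."""
    blocks = []
    for entry in data:
        if "metadata" in entry:
            body = [f"{key}: {value}" for key, value in entry["metadata"].items()]
            blocks.append("\n".join(["---", *body, "---"]))
        elif "header" in entry:
            header = entry["header"]
            blocks.append("#" * header["level"] + " " + header["content"])
        elif "list" in entry:
            blocks.append("\n".join(render_items(entry["list"]["items"])))
    return ("\n" * (spacer + 1)).join(blocks)


data = [
    {"metadata": {"title": "Document"}},
    {"header": {"level": 1, "content": "Title"}},
    {"list": {"type": "ul", "items": [{"content": "Task 1", "items": [], "task": "checked"}]}},
]

text = render(data, spacer=1)
print(text)
```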
metadata = '''
---
title: Document
author: John Doe
tags: markdown, documentation
---
'''
md = Markdown(metadata)
print(md.md_list)
Output:
[
{
'metadata': {
'title': 'Document',
'author': 'John Doe',
'tags': ['markdown', 'documentation']
},
'start_line': 2,
'end_line': 6
}
]
headers = '''
# Main Title
## Section
### Subsection
'''
md = Markdown(headers)
print(md.md_list)
Output:
[
{
'header': {'level': 1, 'content': 'Main Title'},
'start_line': 2,
'end_line': 2
},
{
'header': {
'level': 2,
'content': 'Section'
},
'start_line': 3,
'end_line': 3
},
{
'header': {'level': 3, 'content': 'Subsection'},
'start_line': 4,
'end_line': 4
}
]
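The level field makes it straightforward to derive an outline from the header entries. A minimal sketch on a copy of the output above; the two-space indent per level is an arbitrary choice for illustration.

```python
# Copy of the header entries from the output shown above.
md_list = [
    {"header": {"level": 1, "content": "Main Title"}, "start_line": 2, "end_line": 2},
    {"header": {"level": 2, "content": "Section"}, "start_line": 3, "end_line": 3},
    {"header": {"level": 3, "content": "Subsection"}, "start_line": 4, "end_line": 4},
]

# Indent each header by its level to form a simple table of contents.
toc = [
    "  " * (entry["header"]["level"] - 1) + "- " + entry["header"]["content"]
    for entry in md_list
    if "header" in entry
]

print("\n".join(toc))
```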
lists = '''
- Regular item
    - Nested item
- [x] Completed task
    - [ ] Pending subtask
1. Ordered item
    1. Nested ordered
'''
md = Markdown(lists)
print(md.md_list)
Output:
[
{
'list': {
'type': 'ul',
'items': [
{
'content': 'Regular item',
'items': [
{'content': 'Nested item', 'items': [], 'task': None}
],
'task': None
},
{
'content': 'Completed task',
'items': [
{
'content': 'Pending subtask',
'items': [],
'task': 'unchecked'
}
],
'task': 'checked'
}
]
},
'start_line': 2,
'end_line': 5
},
{
'list': {
'type': 'ol',
'items': [
{
'content': 'Ordered item',
'items': [
{'content': 'Nested ordered', 'items': [], 'task': None}
],
'task': None
}
]
},
'start_line': 6,
'end_line': 7
}
]
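Each list item holds its children under items, so nested lists can be traversed recursively. The sketch below walks a copy of the unordered list from the output above and tallies task states; walk is a hypothetical helper name.

```python
from collections import Counter

# Copy of the 'ul' list from the output shown above.
ul = {
    "type": "ul",
    "items": [
        {
            "content": "Regular item",
            "items": [{"content": "Nested item", "items": [], "task": None}],
            "task": None,
        },
        {
            "content": "Completed task",
            "items": [{"content": "Pending subtask", "items": [], "task": "unchecked"}],
            "task": "checked",
        },
    ],
}


def walk(items, depth=0):
    """Hypothetical helper: yield (depth, item) for every item, depth-first."""
    for item in items:
        yield depth, item
        yield from walk(item["items"], depth + 1)


depths = [depth for depth, _ in walk(ul["items"])]
task_counts = Counter(item["task"] for _, item in walk(ul["items"]) if item["task"])

print(depths)
print(task_counts)
```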
tables = '''
| Header 1 | Header 2 |
|----------|----------|
| Value 1 | Value 2 |
| Value 3 | Value 4 |
'''
md = Markdown(tables)
print(md.md_list)
Output:
[
{
'table': {
'Header 1': ['Value 1', 'Value 3'],
'Header 2': ['Value 2', 'Value 4']
},
'start_line': 2,
'end_line': 5
}
]
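Tables are stored column by column. If row-oriented records are more convenient, the columns can be zipped back together, as in this short sketch on a copy of the output above.

```python
# Copy of the table from the output shown above (column name -> column values).
table = {
    "Header 1": ["Value 1", "Value 3"],
    "Header 2": ["Value 2", "Value 4"],
}

# zip(*columns) yields one tuple per row; pair each tuple with the column names.
rows = [dict(zip(table, values)) for values in zip(*table.values())]

for row in rows:
    print(row)
```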
code = '''
´´´python
def example():
    return "Hello"
´´´

´´´javascript
console.log("Hello");
´´´
'''
md = Markdown(code)
print(md.md_list)
Output:
[
{
'code': {
'language': 'python',
'content': 'def example():\n    return "Hello"'
},
'start_line': 2,
'end_line': 5
},
{
'code': {'language': 'javascript', 'content': 'console.log("Hello");'},
'start_line': 7,
'end_line': 9
}
]
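With the language recorded on every code entry, code blocks can be grouped or filtered by language. A minimal sketch on a copy of the output above; the four-space indent in the Python snippet is assumed.

```python
# Copy of the code entries from the output shown above.
md_list = [
    {
        "code": {"language": "python", "content": 'def example():\n    return "Hello"'},
        "start_line": 2,
        "end_line": 5,
    },
    {
        "code": {"language": "javascript", "content": 'console.log("Hello");'},
        "start_line": 7,
        "end_line": 9,
    },
]

# Group code block contents by their declared language.
by_language = {}
for entry in md_list:
    if "code" in entry:
        block = entry["code"]
        by_language.setdefault(block["language"], []).append(block["content"])

print(sorted(by_language))
```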
blockquotes = '''
> Simple quote
> Multiple lines

> Nested quote
>> Inner quote
> Back to outer
'''
md = Markdown(blockquotes)
print(md.md_list)
Output:
[
{
'blockquote': [
{'content': 'Simple quote', 'items': []},
{'content': 'Multiple lines', 'items': []}
],
'start_line': 2,
'end_line': 3
},
{
'blockquote': [
{
'content': 'Nested quote',
'items': [{'content': 'Inner quote', 'items': []}]
},
{'content': 'Back to outer', 'items': []}
],
'start_line': 5,
'end_line': 7
}
]
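Blockquote entries nest in the same way as list items, so a recursive walk can turn them back into quoted lines. This is an illustrative sketch on a copy of the second blockquote above; render_quote is a hypothetical helper, not a function of the package.

```python
# Copy of the second blockquote from the output shown above.
blockquote = [
    {"content": "Nested quote", "items": [{"content": "Inner quote", "items": []}]},
    {"content": "Back to outer", "items": []},
]


def render_quote(items, depth=1):
    """Hypothetical helper: prefix each line with one '>' per nesting level."""
    lines = []
    for item in items:
        lines.append(">" * depth + " " + item["content"])
        lines.extend(render_quote(item["items"], depth + 1))
    return lines


quote_lines = render_quote(blockquote)
print("\n".join(quote_lines))
```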
def_lists = '''
Term
: Definition 1
: Definition 2
'''
md = Markdown(def_lists)
print(md.md_list)
Output:
[
{
'def_list': {'term': 'Term', 'list': ['Definition 1', 'Definition 2']},
'start_line': 2,
'end_line': 4
}
]
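The start_line and end_line values can be used to pull the original text of an element out of the source string. The sketch below assumes, as the outputs above suggest, that line numbers are 1-based and that the empty first line of the triple-quoted string counts as line 1; source_lines is a hypothetical helper.

```python
source = "\nTerm\n: Definition 1\n: Definition 2\n"

# Copy of the entry from the output shown above.
entry = {
    "def_list": {"term": "Term", "list": ["Definition 1", "Definition 2"]},
    "start_line": 2,
    "end_line": 4,
}


def source_lines(text, element):
    """Hypothetical helper: return the source lines an element spans (1-based, inclusive)."""
    lines = text.split("\n")
    return lines[element["start_line"] - 1 : element["end_line"]]


snippet = source_lines(source, entry)
print(snippet)
```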
Contributions are welcome! Please feel free to submit a Pull Request or open an issue.
We found that markdown-to-data demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.