
Parses Microsoft's Rich Text Format (RTF) documents. It creates an in-memory object which represents the tree structure of the RTF document. This object can in turn be rendered by using one of the renderers.
So far, rtfparse provides only one renderer (HTML_Decapsulator), which liberates the HTML code encapsulated in RTF. This comes in handy, for example, if you ever need to extract the HTML from an HTML-formatted email message saved by Microsoft Outlook.
MS Outlook also tends to use RTF compression, so the CLI of rtfparse can optionally decompress that, too.
You can of course write your own renderers of parsed RTF documents and consider contributing them to this project.
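As a rough illustration of what a custom renderer could look like, here is a minimal sketch that writes only the plain text of a parsed tree. It relies on assumptions that the text above does not confirm: that group nodes expose their children in a `structure` list and that text nodes carry a `text` string. The class name `Plain_Text_Renderer` is hypothetical, and the stand-in tree is built from `SimpleNamespace` objects so that the sketch runs without rtfparse installed. Check the rtfparse source for the real entity classes before adapting it.

```python
import io
from types import SimpleNamespace


class Plain_Text_Renderer:
    """Hypothetical renderer: walks the tree and writes text nodes only."""

    def render(self, parsed, file) -> None:
        # ASSUMPTION: groups keep their children in a `structure` list.
        for item in getattr(parsed, "structure", []):
            if hasattr(item, "structure"):
                # Nested group: descend recursively.
                self.render(item, file)
            elif isinstance(getattr(item, "text", None), str):
                # ASSUMPTION: text nodes carry their content in `text`.
                file.write(item.text)
            # Anything else (e.g. control words) is skipped.


# Stand-in tree that mimics the assumed shape of a parsed document.
tree = SimpleNamespace(structure=[
    SimpleNamespace(text="Hello, "),
    SimpleNamespace(structure=[SimpleNamespace(text="world")]),
    SimpleNamespace(control_name="par"),
])

buffer = io.StringIO()
Plain_Text_Renderer().render(tree, buffer)
```

The `render(parsed, file)` signature mirrors the way HTML_Decapsulator is called in the examples further down, so a renderer of this shape could be swapped in at the same place.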
Install rtfparse from your local repository with pip:
pip install rtfparse
Installation creates an executable file rtfparse in your Python scripts folder, which should be in your $PATH.
Use the rtfparse executable from the command line. Read rtfparse --help.
rtfparse writes logs to the directory ~/rtfparse/, in these files:
rtfparse.debug.log
rtfparse.info.log
rtfparse.errors.log
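When a conversion fails, the quickest check is usually the end of these log files. The following sketch lists whichever of the three files exist and returns their last lines. The helper names `existing_logs` and `tail` are made up for this example, and the logs are assumed to be readable as UTF-8 text.

```python
from pathlib import Path

# File names as listed above.
LOG_NAMES = ("rtfparse.debug.log", "rtfparse.info.log", "rtfparse.errors.log")


def existing_logs(log_dir: Path = Path.home() / "rtfparse") -> list[Path]:
    """Return the rtfparse log files that are present in `log_dir`."""
    return [log_dir / name for name in LOG_NAMES if (log_dir / name).is_file()]


def tail(path: Path, lines: int = 10) -> list[str]:
    """Return the last `lines` lines of a text file."""
    # ASSUMPTION: logs are text; undecodable bytes are replaced, not fatal.
    text = path.read_text(encoding="utf-8", errors="replace")
    return text.splitlines()[-lines:]


if __name__ == "__main__":
    for log_path in existing_logs():
        print(f"== {log_path} ==")
        print("\n".join(tail(log_path)))
```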
rtfparse --rtf-file "path/to/rtf_file.rtf" --decapsulate-html --output-file "path/to/extracted.html"
To read MS Outlook .msg files and decompress the RTF they contain, the CLI of rtfparse uses extract_msg and compressed_rtf.
rtfparse --msg-file "path/to/email.msg" --decapsulate-html --output-file "path/to/extracted.html"
rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf"
When extracting the RTF from the .msg file, you can save the attachments (which include images embedded in the email text) in a directory:
rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir"
In rtfparse version 1.x you will be able to embed these images in the decapsulated HTML. This functionality will be provided by the package embedimg.
rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir" --embed-img
In the current version the option --embed-img does nothing.
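If you want to drive the CLI from a script, the commands above can be assembled as an argument list and passed to `subprocess`. This is only a sketch: the functions `build_rtfparse_command` and `run_rtfparse` are hypothetical helpers, and the check that exactly one input file is given is a choice made for this example, not documented CLI behaviour. Only flags that appear in the commands above are used.

```python
import subprocess
from typing import Optional


def build_rtfparse_command(
    output_file: str,
    *,
    rtf_file: Optional[str] = None,
    msg_file: Optional[str] = None,
    decapsulate_html: bool = False,
    attachments_dir: Optional[str] = None,
) -> list[str]:
    """Assemble an rtfparse command line as a list of arguments."""
    # This sketch insists on exactly one input; the CLI may behave differently.
    if (rtf_file is None) == (msg_file is None):
        raise ValueError("pass exactly one of rtf_file or msg_file")
    command = ["rtfparse"]
    if rtf_file is not None:
        command += ["--rtf-file", str(rtf_file)]
    else:
        command += ["--msg-file", str(msg_file)]
    if decapsulate_html:
        command.append("--decapsulate-html")
    command += ["--output-file", str(output_file)]
    if attachments_dir is not None:
        # Shown above only in combination with --msg-file.
        command += ["--attachments-dir", str(attachments_dir)]
    return command


def run_rtfparse(output_file: str, **options) -> None:
    """Run rtfparse; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_rtfparse_command(output_file, **options), check=True)
```

Passing a list rather than a single shell string means that paths containing spaces need no extra quoting.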
from pathlib import Path
from rtfparse.parser import Rtf_Parser
from rtfparse.renderers.html_decapsulator import HTML_Decapsulator
source_path = Path(r"path/to/your/rtf/document.rtf")
target_path = Path(r"path/to/your/html/decapsulated.html")
# Create parent directory of `target_path` if it does not already exist:
target_path.parent.mkdir(parents=True, exist_ok=True)
parser = Rtf_Parser(rtf_path=source_path)
parsed = parser.parse_file()
renderer = HTML_Decapsulator()
with open(target_path, mode="w", encoding="utf-8") as html_file:
    renderer.render(parsed, html_file)
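The same calls can be wrapped in a function and applied to a whole directory of RTF files. The sketch below is one possible arrangement, not part of rtfparse: `decapsulate_rtf` and `convert_directory` are hypothetical names, the `*.rtf` glob and the `.html` target suffix are assumptions, and rtfparse is imported inside the function so that the module can be loaded (and the loop tried with a different converter) without rtfparse installed.

```python
from pathlib import Path
from typing import Callable


def decapsulate_rtf(source_path: Path, target_path: Path) -> None:
    """Decapsulate the HTML from one RTF file, as in the example above."""
    # Imported lazily so the rest of this sketch works without rtfparse.
    from rtfparse.parser import Rtf_Parser
    from rtfparse.renderers.html_decapsulator import HTML_Decapsulator

    parsed = Rtf_Parser(rtf_path=source_path).parse_file()
    with open(target_path, mode="w", encoding="utf-8") as html_file:
        HTML_Decapsulator().render(parsed, html_file)


def convert_directory(
    source_dir: Path,
    target_dir: Path,
    convert: Callable[[Path, Path], None] = decapsulate_rtf,
) -> dict[Path, Exception]:
    """Convert every *.rtf file in `source_dir`; return the failures."""
    target_dir.mkdir(parents=True, exist_ok=True)
    failures: dict[Path, Exception] = {}
    for source_path in sorted(source_dir.glob("*.rtf")):
        target_path = target_dir / source_path.with_suffix(".html").name
        try:
            convert(source_path, target_path)
        except Exception as error:
            # Record the failure and continue with the remaining files.
            failures[source_path] = error
    return failures
```

Collecting failures instead of stopping at the first one lets a single malformed document be inspected later without blocking the rest of the batch.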
from pathlib import Path
from extract_msg import openMsg
from compressed_rtf import decompress
from io import BytesIO
from rtfparse.parser import Rtf_Parser
from rtfparse.renderers.html_decapsulator import HTML_Decapsulator
source_file = Path("path/to/your/source.msg")
target_file = Path(r"path/to/your/target.html")
# Create parent directory of `target_file` if it does not already exist:
target_file.parent.mkdir(parents=True, exist_ok=True)
# Get a decompressed RTF bytes buffer from the MS Outlook message
msg = openMsg(source_file)
decompressed_rtf = decompress(msg.compressedRtf)
rtf_buffer = BytesIO(decompressed_rtf)
# Parse the rtf buffer
parser = Rtf_Parser(rtf_file=rtf_buffer)
parsed = parser.parse_file()
# Decapsulate the HTML from the parsed RTF
decapsulator = HTML_Decapsulator()
with open(target_file, mode="w", encoding="utf-8") as html_file:
    decapsulator.render(parsed, html_file)