
@omer-go/docx-parser-converter-ts
A TypeScript library to convert DOCX files to WYSIWYG HTML or plain text formats while preserving styles.
A powerful TypeScript library for converting DOCX documents into HTML and plain text, with detailed parsing of document properties and styles. This project is based on a Python version.
Welcome to the Docx Parser and Converter for TypeScript/JavaScript! This library allows you to easily convert DOCX documents into HTML and plain text formats, extracting detailed properties and styles.
The project is structured to parse DOCX files, convert their content into structured data models, and provide conversion utilities to transform this data into HTML or plain text.
The current version (0.0.1) of this package is primarily designed and tested for browser environments.
While efforts are underway to ensure full Node.js compatibility, using this version in a Node.js environment might lead to errors (such as "document is not defined" or "Buffer is not defined") because some underlying dependencies or utility functions currently rely on browser-specific APIs.
For browser usage, the library should function as expected. Node.js support will be improved in future releases.
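As a rough illustration of guarding against the Node.js limitation described above, calling code could check for browser globals before attempting a conversion. This is only a sketch: `hasBrowserDom` and `GlobalLike` are hypothetical names and are not part of the library.

```typescript
// Hypothetical helper (not part of the library): reports whether the
// browser globals this version is said to rely on are present.
type GlobalLike = { document?: unknown; window?: unknown };

function hasBrowserDom(scope: GlobalLike): boolean {
  return typeof scope.document !== "undefined" && typeof scope.window !== "undefined";
}

// In real code, pass globalThis and skip or defer conversion when the check fails.
if (!hasBrowserDom(globalThis as unknown as GlobalLike)) {
  console.warn("No browser DOM detected; DOCX conversion may fail in this environment.");
}
```

The check takes the scope as a parameter so that it can be exercised with plain objects in tests rather than depending on the real global environment.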
To install the library, you can use npm or yarn:
npm install @omer-go/docx-parser-converter-ts
# or
yarn add @omer-go/docx-parser-converter-ts
ES Modules (Recommended for modern browsers and bundlers):
import { DocxToHtmlConverter, DocxToTxtConverter } from '@omer-go/docx-parser-converter-ts';
UMD (for direct use in browsers via a <script> tag):
If you include the UMD build (dist/docx-parser-converter.umd.js), the library will be available on the global window.DocxParserConverter object:
<script src="path/to/node_modules/@omer-go/docx-parser-converter-ts/dist/docx-parser-converter.umd.js"></script>
<script>
const { DocxToHtmlConverter, DocxToTxtConverter } = window.DocxParserConverter;
// ... use them ...
</script>
This example demonstrates usage with a file input in a browser.
HTML Setup:
<input type="file" id="docxFile" accept=".docx" />
<button onclick="handleConvert()">Convert</button>
<div id="htmlOutput"></div>
<pre id="textOutput"></pre>
JavaScript for Conversion:
// Assuming you've imported or accessed the converters as shown above
async function handleConvert() {
  const fileInput = document.getElementById('docxFile');
  const htmlOutputDiv = document.getElementById('htmlOutput');
  const textOutputPre = document.getElementById('textOutput');
  if (!fileInput.files || fileInput.files.length === 0) {
    alert('Please select a DOCX file.');
    return;
  }
  const file = fileInput.files[0];
  try {
    const arrayBuffer = await file.arrayBuffer(); // DOCX content as ArrayBuffer
    // Convert to HTML
    const htmlConverter = await DocxToHtmlConverter.create(arrayBuffer, { useDefaultValues: true });
    const htmlResult = htmlConverter.convertToHtml();
    htmlOutputDiv.innerHTML = htmlResult;
    // Convert to Plain Text
    const txtConverter = await DocxToTxtConverter.create(arrayBuffer, { useDefaultValues: true });
    const txtResult = txtConverter.convertToTxt({ indent: true });
    textOutputPre.textContent = txtResult;
  } catch (error) {
    console.error("Conversion error:", error);
    alert("Error during conversion: " + error.message);
  }
}
The Docx Parser and Converter library supports parsing various XML components within a DOCX file. Below is a detailed list of the supported and unsupported components:
document.xml:
numbering.xml:
styles.xml:
The Docx Parser and Converter library follows a structured workflow to parse, convert, and merge document properties and styles according to DOCX specifications. Here's a detailed overview of the technical process:
Parsing XML Files:
The library reads word/document.xml, word/styles.xml, and word/numbering.xml.
DocumentParser extracts the main document structure (paragraphs, tables, runs) into structured models.
NumberingParser extracts numbering definitions and levels.
StylesParser extracts styles for paragraphs, runs, tables, and document defaults.
Property and Style Merging:
Conversion to HTML and TXT:
DocxToHtmlConverter takes the parsed document models and converts the elements into HTML format.
DocxToTxtConverter converts the document models into plain text format.
This process ensures accurate parsing and conversion while preserving the original document's structure and style as much as possible within the supported features.
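The merging step is not spelled out above. As a minimal sketch, assuming the usual WordprocessingML precedence (document defaults, then the named style, then direct formatting on the element), layered merging might look like the following. `RunProps` and `mergeProps` are hypothetical and simplified; they are not the library's actual models or functions.

```typescript
// Hypothetical, simplified run-property model (not the library's actual types).
type RunProps = {
  bold?: boolean;
  italic?: boolean;
  sizePt?: number;
  color?: string;
};

// Later layers override earlier ones; undefined values never overwrite.
function mergeProps(...layers: Array<RunProps | undefined>): RunProps {
  const merged: RunProps = {};
  for (const layer of layers) {
    if (!layer) continue;
    for (const [key, value] of Object.entries(layer)) {
      if (value !== undefined) {
        (merged as Record<string, unknown>)[key] = value;
      }
    }
  }
  return merged;
}

// Assumed precedence: document defaults < named style < direct formatting.
const docDefaults: RunProps = { sizePt: 11, color: "000000" };
const headingStyle: RunProps = { bold: true, sizePt: 16 };
const directFormatting: RunProps = { color: "FF0000" };

const effective = mergeProps(docDefaults, headingStyle, directFormatting);
// effective has sizePt 16, color "FF0000", and bold true
console.log(effective);
```

Ignoring undefined values lets a layer leave a property unset so that the value from a lower-priority layer shows through.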
| XML Element | HTML Element | Notes |
|---|---|---|
| w:p | p | Paragraph element |
| w:r | span | Run element, used for inline text formatting |
| w:tbl | table | Table element |
| w:tr | tr | Table row |
| w:tc | td | Table cell |
| w:tblGrid | colgroup | Table grid, converted to colgroup for column definitions |
| w:gridCol | col | Grid column, converted to col for column width |
| w:tblPr | table | Table properties |
| w:tblW | table style="width:Xpt;" | Table width, converted using CSS width property (approx.) |
| w:tblBorders | table, tr, td style="border:X;" | Table borders, converted using CSS border property |
| w:tblCellMar | td style="padding:Xpt;" | Table cell margins, converted using CSS padding property |
| w:b | b or strong or CSS font-weight | Bold text |
| w:i | i or em or CSS font-style | Italic text |
| w:u | span style="text-decoration:underline;" | Underline text, converted using CSS text-decoration property |
| w:color | span style="color:#RRGGBB;" | Text color, converted using CSS color property |
| w:sz | span style="font-size:Xpt;" | Text size, converted using CSS font-size property (in points) |
| w:jc | p style="text-align:left\|center\|right\|justify;" | Text alignment, converted using CSS text-align property |
| w:ind | p style="margin-left:Xpt; text-indent:Xpt;" | Indentation, converted using CSS margin and text-indent |
| w:spacing | p style="line-height:X; margin-top:Ypt; margin-bottom:Zpt;" | Line/paragraph spacing, converted using CSS properties |
| w:highlight | span style="background-color:#RRGGBB;" | Text highlight, converted using CSS background-color property |
| w:shd | span style="background-color:#RRGGBB;" | Shading, converted using CSS background-color property |
| w:vertAlign | span style="vertical-align:super\|sub;" | Vertical alignment (superscript/subscript) |
| w:pgMar | body/div style="padding: Xpt;" | Page margins, applied to a wrapper div or body |
| w:rFonts | span style="font-family:'font-name';" | Font name, converted using CSS font-family property |
| w:tab | span (with calculated width) | Tab characters, converted to spans with appropriate spacing |
| Numbering | ol, ul, li with CSS for styling | List items with various numbering/bullet styles |
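To make a few rows of this mapping concrete, the sketch below turns a simplified set of run properties into an inline style string. It relies on two unit facts from WordprocessingML: w:sz is stored in half-points, and indentation values such as those in w:ind are stored in twips (twentieths of a point). `RunFormatting`, `runToInlineCss`, and `twipsToPt` are hypothetical names used for illustration and do not reflect the library's internal code.

```typescript
// Hypothetical, simplified view of run formatting (not the library's model).
type RunFormatting = {
  bold?: boolean;         // w:b
  italic?: boolean;       // w:i
  underline?: boolean;    // w:u
  color?: string;         // w:color, hex RRGGBB without '#'
  szHalfPoints?: number;  // w:sz, in half-points
  font?: string;          // w:rFonts
};

// Builds the value of a span's style attribute from the run formatting.
function runToInlineCss(run: RunFormatting): string {
  const decls: string[] = [];
  if (run.bold) decls.push("font-weight:bold");
  if (run.italic) decls.push("font-style:italic");
  if (run.underline) decls.push("text-decoration:underline");
  if (run.color) decls.push(`color:#${run.color}`);
  if (run.szHalfPoints !== undefined) decls.push(`font-size:${run.szHalfPoints / 2}pt`);
  if (run.font) decls.push(`font-family:'${run.font}'`);
  return decls.map((d) => d + ";").join("");
}

// Twips are twentieths of a point, so 720 twips is 36pt (half an inch).
const twipsToPt = (twips: number): number => twips / 20;

const css = runToInlineCss({ bold: true, color: "FF0000", szHalfPoints: 24 });
console.log(css); // font-weight:bold;color:#FF0000;font-size:12pt;
console.log(`margin-left:${twipsToPt(720)}pt;`); // margin-left:36pt;
```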
Detailed API documentation will be made available soon. For now, please refer to the exported classes and their methods:
DocxToHtmlConverter
static async create(docxFile: ArrayBuffer | Uint8Array | File | Blob, options?: DocxToHtmlOptions): Promise<DocxToHtmlConverter>
convertToHtml(): string
DocxToTxtConverter
static async create(docxFile: ArrayBuffer | Uint8Array | File | Blob, options?: DocxToTxtOptions): Promise<DocxToTxtConverter>
convertToTxt(options?: { indent?: boolean }): string
Interfaces for options (DocxToHtmlOptions, DocxToTxtOptions) are also exported.
Enjoy using Docx Parser and Converter!