Research
Security News
Malicious npm Packages Inject SSH Backdoors via Typosquatted Libraries
Socket’s threat research team has detected six malicious npm packages typosquatting popular libraries to insert SSH backdoors.
UniParser is a powerful, lightweight Node.js library designed to handle parsing of multiple file formats—such as PDF, DOCX, TXT, HTML, and Markdown—and convert them into plain text with ease.
🚀 Say goodbye to file format limitations! UniParser extracts text content from all these formats, providing a consistent text output for your applications.
To install UniParser, simply run:
npm install uniparser
After installation, you can easily import UniParser to start working with different file formats:
const { parsePDF, parseDOCX, parseTXT, parseHTML, parseMarkdown } = require('uniparser');
// In a CommonJS script, await is only valid inside an async function,
// so the calls are wrapped in an async IIFE.
(async () => {
  // Parsing a PDF file
  const pdfText = await parsePDF('./path/to/sample-file.pdf');
  console.log(pdfText);
  // Parsing a DOCX file
  const docxText = await parseDOCX('./path/to/sample-file.docx');
  console.log(docxText);
  // Parsing a TXT file
  const txtText = await parseTXT('./path/to/sample-file.txt');
  console.log(txtText);
  // Parsing an HTML file
  const htmlText = await parseHTML('./path/to/sample-file.html');
  console.log(htmlText);
  // Parsing a Markdown file
  const markdownText = await parseMarkdown('./path/to/sample-file.md');
  console.log(markdownText);
})();
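When several files need parsing at once, the calls above can run concurrently. The following is a minimal sketch, not part of UniParser: `parseAll` is a hypothetical helper, and `parse` stands for any function with the shape shown above (it takes a path and returns text or a promise of text). It uses `Promise.allSettled` so that one failing file does not abort the rest.

```javascript
// Hypothetical helper (not provided by UniParser): parse several files
// concurrently and report per-file success or failure.
async function parseAll(parse, filePaths) {
  const settled = await Promise.allSettled(
    // The async wrapper turns synchronous throws into rejections as well.
    filePaths.map(async (filePath) => parse(filePath))
  );
  return settled.map((result, i) =>
    result.status === 'fulfilled'
      ? { filePath: filePaths[i], text: result.value }
      : { filePath: filePaths[i], error: result.reason }
  );
}
```

Assuming the library is installed, a call might look like `parseAll(parsePDF, ['./a.pdf', './b.pdf'])`; each entry in the result carries either `text` or `error`.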
For small files, you can use UniParser synchronously:
const { parseTXT, parseMarkdown } = require('uniparser');
// Synchronously read small text files
const txtContent = parseTXT('./path/to/sample-file.txt');
console.log(txtContent);
const markdownContent = parseMarkdown('./path/to/sample-file.md');
console.log(markdownContent);
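The examples call `parseTXT` and `parseMarkdown` both with and without `await`. If it is unclear which form a given function takes, a thin wrapper can accept either. This is only a sketch under that assumption: `readText` is a hypothetical name, and it relies on the fact that `await` passes a non-promise value through unchanged.

```javascript
// Hypothetical wrapper (not part of UniParser): accepts a parser that
// returns either a string or a promise resolving to a string.
async function readText(parse, filePath) {
  // await leaves plain values untouched and unwraps promises.
  const text = await parse(filePath);
  if (typeof text !== 'string') {
    throw new TypeError(`Expected text from parser, got ${typeof text}`);
  }
  return text;
}
```

The type check is a design choice: it surfaces an unexpected return value (for example a Buffer or `undefined`) at the call site instead of later in the pipeline.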
.pdf: Converts PDF documents to plain text.
.docx: Extracts text from Microsoft Word .docx files.
.txt: Reads plain text from simple text files.
.html: Strips HTML tags and returns the text content.
.md: Converts Markdown files to plain text, removing all formatting.
Here's a quick example to get you started with DOCX parsing:
const { parseDOCX } = require('uniparser');
(async () => {
const docxText = await parseDOCX('./path/to/sample-file.docx');
console.log(docxText);
})();
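Since each supported format corresponds to one file extension and one parse function, the choice of parser can be driven by the file name. The sketch below is an illustration, not something the library provides: `PARSER_BY_EXTENSION`, `pickParserName`, and `parseAny` are hypothetical names, and the parser functions are passed in as an object so the code does not depend on how UniParser is loaded.

```javascript
const path = require('path');

// Hypothetical lookup table: file extension -> name of the parse function
// used in the examples above.
const PARSER_BY_EXTENSION = {
  '.pdf': 'parsePDF',
  '.docx': 'parseDOCX',
  '.txt': 'parseTXT',
  '.html': 'parseHTML',
  '.md': 'parseMarkdown',
};

// Returns the parser name for a path, or throws for unknown extensions.
function pickParserName(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  const name = PARSER_BY_EXTENSION[ext];
  if (!name) {
    throw new Error(`Unsupported file extension: ${ext || '(none)'}`);
  }
  return name;
}

// `parsers` is an object exposing the parse functions by name.
async function parseAny(parsers, filePath) {
  const name = pickParserName(filePath);
  const parse = parsers[name];
  if (typeof parse !== 'function') {
    throw new Error(`Parser not available: ${name}`);
  }
  return parse(filePath);
}
```

Assuming the library exports the functions shown earlier, usage could be `parseAny(require('uniparser'), './path/to/sample-file.docx')`.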
This project is licensed under the MIT License. See the LICENSE file for more information.
Contributions are welcome! If you'd like to improve UniParser, feel free to fork the repository and submit a pull request. We appreciate your feedback and contributions!
💡 UniParser makes it easier than ever to extract content from a wide range of file formats. Try it now and streamline your file processing tasks! 🌟
FAQs
A universal parser for PDF, DOCX, TXT, MD, and HTML files into text
We found that uniparser demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 0 open source maintainers collaborating on the project.