office-text-extractor

Version 3.0.0-beta.2

Yet another library to extract text from MS Office (docx, pptx, xlsx) and PDF (pdf) files.

Similar projects

There are other great projects that do the same job and have inspired this project, such as:

  • any-text
  • officeparser
  • textract

How is this project different?

  • Parses a file based on its MIME type, not its file extension.
  • Does not spawn a child process to run a tool installed on the device.
  • If the file contains plain text, reads and returns that text directly.
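The first and third points can be illustrated with a rough sketch of content sniffing. It is not this package's code (which, per the Libraries used section, relies on file-type for detection), and the names `sniffKind` and `looksLikePlainText` are hypothetical. The sketch inspects a file's leading bytes instead of its name: PDF files begin with `%PDF-`, and docx, pptx and xlsx files are ZIP archives beginning with the bytes `PK\x03\x04`. As an assumed heuristic, anything that decodes as strict UTF-8 and contains no NUL bytes is treated as plain text.

```javascript
// Hypothetical helper: classify a Buffer by its leading bytes, not a file name.
// A simplified sketch, not office-text-extractor's implementation.
const PDF_MAGIC = Buffer.from('%PDF-', 'latin1')
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]) // "PK\x03\x04"

function startsWith(bytes, magic) {
  return bytes.length >= magic.length && bytes.subarray(0, magic.length).equals(magic)
}

function looksLikePlainText(bytes) {
  // NUL bytes almost never appear in text files but are common in binaries.
  if (bytes.includes(0)) return false
  try {
    // `fatal: true` makes decode() throw on invalid UTF-8 sequences.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes)
    return true
  } catch {
    return false
  }
}

function sniffKind(bytes) {
  if (startsWith(bytes, PDF_MAGIC)) return 'pdf'
  // docx, pptx and xlsx are all ZIP containers; telling them apart
  // requires inspecting the archive's entries (e.g. word/, ppt/, xl/).
  if (startsWith(bytes, ZIP_MAGIC)) return 'office-zip'
  if (looksLikePlainText(bytes)) return 'text'
  return 'unknown'
}
```

Because the check only needs the start of a file, a caller could read just a small header chunk, which is what read-chunk is listed for in this module. One caveat of the sketch: if only a chunk is passed, a multi-byte UTF-8 character cut off at the chunk boundary makes the strict decode fail, so a real implementation would need to tolerate an incomplete final sequence.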

Libraries used

This module uses some excellent existing libraries, which perform better than the implementations this module originally shipped with and are therefore used in their place:

This module also uses:

  • xml2js - to convert the MS Office XML files into JSON
  • js-yaml - to convert JSON into YAML
  • file-type - to detect the mime type of files
  • decompress - to unzip files
  • read-chunk - to read chunks of data from large files

A big thank you to the contributors of these projects!

Installation

Note

This package is now pure ESM (from version 2.0.0 onwards). Please read this article for a guide on how to ensure your project can import this library.

To use this in a Node project, install it with npm, pnpm, or yarn:

# Using npm
> npm install office-text-extractor

# Using pnpm
> pnpm add office-text-extractor

# Using yarn
> yarn add office-text-extractor

Usage

import { extractText } from 'office-text-extractor'

// Extract the text using `async-await`.
const text = await extractText('path/to/file')
console.log(text)

// Extract the text using Promises.
extractText('path/to/file')
	.then((text) => console.log(text))
	.catch((error) => console.error(error))
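Because `extractText` returns a Promise, several files can be processed concurrently. The sketch below is not part of the package: `extractAll` is a hypothetical helper, and it takes the extractor as a parameter so it can be exercised without the package installed (in real use you would pass `extractText`). It uses `Promise.allSettled` so that one unreadable or unsupported file does not reject the whole batch.

```javascript
// Hypothetical helper, not part of office-text-extractor's API.
// `extract` is any function (path) => Promise<string>, e.g. `extractText`.
async function extractAll(paths, extract) {
  // The async wrapper turns a synchronous throw into a rejection as well.
  const pending = paths.map(async (path) => extract(path))
  // allSettled waits for every extraction, whether it succeeds or fails.
  const settled = await Promise.allSettled(pending)
  return settled.map((result, index) =>
    result.status === 'fulfilled'
      ? { path: paths[index], text: result.value }
      : { path: paths[index], error: result.reason }
  )
}
```

In an ES module that has imported `extractText`, a call such as `await extractAll(['report.docx', 'slides.pptx'], extractText)` would return one entry per path, each carrying either `text` or `error`.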

Note

There is no support for browser environments yet. If you want to add support, please feel free to open a pull request.

License

This project is licensed under the ISC license. Please see license.md for more details.

Package last updated on 15 Jun 2023