doc_ripper

Version: 0.0.9 (RubyGems, Bundler)
Maintainers: 1

DocRipper

Grab the text from common document formats with one command. DocRipper is an extremely lightweight Ruby wrapper that extracts text from common file formats (currently .doc, .docx, .pdf, .txt, and .sketch) without heavy dependencies such as an OCR library or OpenOffice/LibreOffice.

For simple parsing, you'll likely see a large performance improvement with DocRipper over solutions that rely on OpenOffice/LibreOffice for .doc/.docx conversion.

Need OCR support or in-image text parsing? Take a look at Docsplit.

Supported File Formats

.doc
.docx
.pdf
.txt
.sketch

File format   Supported?   Dependencies
.doc          x            Antiword
.docx         x
.pdf          x            Poppler-utils
.txt          x
.sketch       x            Sqlite3
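
Since three of the formats rely on external tools, it can help to check for them before ripping. The following is a minimal sketch, not part of DocRipper. The executable names (antiword, pdftotext, sqlite3) are assumptions, because the table names packages rather than binaries, and the helper names are hypothetical.

```ruby
# Assumed executable names for the dependencies listed in the table.
# These are guesses at the binaries, not values taken from DocRipper.
TOOLS = {
  '.doc'    => 'antiword',
  '.pdf'    => 'pdftotext', # assumed to ship with Poppler-utils
  '.sketch' => 'sqlite3'
}.freeze

# Returns true if an executable file called `name` exists in a PATH directory.
def executable_on_path?(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Returns the file extensions whose external tool could not be found.
def missing_tools(path = ENV.fetch('PATH', ''))
  TOOLS.reject { |_ext, tool| executable_on_path?(tool, path) }.keys
end

missing = missing_tools
if missing.empty?
  puts 'All external tools found.'
else
  puts "Missing tools for: #{missing.join(', ')}"
end
```

Formats without an entry in the Dependencies column (.docx and .txt) are left out of the check.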

Quickstart

  gem install doc_ripper

Pass the path of the file you want to read:

  require 'doc_ripper'

  DocRipper::rip('/path/to/file')

If the file cannot be read, nil will be returned.

  DocRipper::rip('/path/to/missing/file')
  => nil
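
Because rip signals failure with nil, callers may want a default value in place of nil. A minimal sketch, assuming the gem is installed when the default ripper is used; text_or_fallback and default_ripper are hypothetical helpers, not part of DocRipper.

```ruby
# Hypothetical helper (not part of DocRipper): wraps DocRipper.rip in a
# lambda, loading the gem only when it is actually needed.
def default_ripper
  require 'doc_ripper'
  ->(path) { DocRipper.rip(path) }
end

# Hypothetical helper: returns the ripped text, or `fallback` when the
# ripper returns nil (for example, when the file cannot be read).
# `ripper` is any callable that takes a path; it defaults to DocRipper.rip.
def text_or_fallback(path, fallback: '', ripper: nil)
  ripper ||= default_ripper
  text = ripper.call(path)
  text.nil? ? fallback : text
end
```

With the gem installed, text_or_fallback('/path/to/missing/file', fallback: '(no text)') would return '(no text)' rather than nil. The ripper keyword exists only so the helper can be exercised without the gem.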

Want to raise an exception instead? Use #rip!.

#rip! raises an exception if the file cannot be read or the file type isn't supported.

  # invalid file type
  DocRipper::rip!('/path/to/invalid/file.type')
  # raises DocRipper::UnsupportedFileType

  # missing file
  DocRipper::rip!('/path/to/missing/file.doc')
  # raises DocRipper::FileNotFound
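
The two exceptions can be rescued separately so a caller knows why ripping failed. A minimal sketch: rip_with_reason and its return shape are hypothetical, and only rip!, DocRipper::UnsupportedFileType, and DocRipper::FileNotFound come from the README.

```ruby
begin
  require 'doc_ripper'
rescue LoadError
  warn 'doc_ripper is not installed; run `gem install doc_ripper` first'
end

# Hypothetical wrapper (not part of DocRipper): returns a hash with the
# ripped text, or a symbol naming the reason rip! raised.
def rip_with_reason(path)
  { text: DocRipper.rip!(path), error: nil }
rescue DocRipper::UnsupportedFileType
  { text: nil, error: :unsupported_file_type }
rescue DocRipper::FileNotFound
  { text: nil, error: :file_not_found }
end
```

Any other exception still propagates, which keeps unexpected failures visible.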

Package last updated on 05 Feb 2019