textminer

0.1.5

Rubygems

Version published: 10 years ago

Maintainers: 1

Created: 10 years ago

textminer

textminer helps you text mine through Crossref's TDM (Text & Data Mining) services.

Changes

For changes see the CHANGELOG

gem API

Textminer.search - Search by DOI, query string, filters, etc. to get Crossref metadata, which you can use downstream to get full text links. This method essentially wraps Serrano.works(), but exposes only a subset of its parameters. The interface may change depending on feedback.
Textminer.fetch - Fetch full text given a URL. Supports Crossref's Text and Data Mining service.
Textminer.extract - Extract text from a PDF.
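
To illustrate what "wraps Serrano.works(), but only a subset of params" can look like, here is a minimal sketch that uses no gem at all. The names ALLOWED_KEYS, backend_works and search_sketch are hypothetical, and the list of accepted keys is an assumption drawn from the examples in this README, not textminer's actual implementation.

```ruby
# Hypothetical sketch: forward only a known subset of options to a backend.
# ALLOWED_KEYS is an assumption, not textminer's real parameter list.
ALLOWED_KEYS = %i[doi query filter member].freeze

# Stand-in for the metadata client; it simply echoes the options it receives.
def backend_works(**opts)
  opts
end

# Reject unknown options, then delegate the rest unchanged.
def search_sketch(**opts)
  unknown = opts.keys - ALLOWED_KEYS
  raise ArgumentError, "unsupported option(s): #{unknown.join(', ')}" unless unknown.empty?

  backend_works(**opts)
end

search_sketch(doi: '10.7554/elife.06430', filter: { has_full_text: true })
```

Rejecting unknown keys up front is one possible design choice; a wrapper could equally drop them silently.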

Install

Release version

gem install textminer

Development version

git clone git@github.com:sckott/textminer.git
cd textminer
rake install

Examples

Within Ruby

Search

Search by DOI

require 'textminer'
# link to full text available
Textminer.search(doi: '10.7554/elife.06430')
# no link to full text available
Textminer.search(doi: "10.1371/journal.pone.0000308")

Many DOIs at once

require 'serrano'
dois = Serrano.random_dois(sample: 6)
Textminer.search(doi: dois)

Search with filters

Textminer.search(filter: {has_full_text: true})

Get full text links

The object returned from Textminer.search has methods for pulling out all links, XML links only, PDF links only, or plain text links only.

x = Textminer.search(filter: {has_full_text: true})
x.links_xml
x.links_pdf
x.links_plain
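
As a rough illustration of what selecting links by format involves, the sketch below filters records shaped like Crossref work metadata, where each work may carry a "link" array whose entries have "URL" and "content-type" fields. The sample records, the example.org URLs and the links_by_type helper are made up for illustration; this is not how textminer itself is implemented.

```ruby
# Made-up records shaped like Crossref work metadata (placeholder DOIs and URLs).
WORKS = [
  { 'DOI' => '10.1234/example.1',
    'link' => [
      { 'URL' => 'https://example.org/a.xml', 'content-type' => 'application/xml' },
      { 'URL' => 'https://example.org/a.pdf', 'content-type' => 'application/pdf' }
    ] },
  { 'DOI' => '10.1234/example.2',
    'link' => [
      { 'URL' => 'https://example.org/b.txt', 'content-type' => 'text/plain' }
    ] },
  { 'DOI' => '10.1234/example.3' } # a work with no full text links
].freeze

# Hypothetical helper: collect the URLs of all links with the given content type.
def links_by_type(works, content_type)
  works.flat_map { |work| work.fetch('link', []) }
       .select { |link| link['content-type'] == content_type }
       .map { |link| link['URL'] }
end

xml_links   = links_by_type(WORKS, 'application/xml')
pdf_links   = links_by_type(WORKS, 'application/pdf')
plain_links = links_by_type(WORKS, 'text/plain')
```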

Fetch full text

Textminer.fetch() downloads full text from a URL. How the content is pulled down and parsed depends on its content type.

# get some metadata
res = Textminer.search(member: 2258, filter: {has_full_text: true})
# get links
links = res.links_xml(true)
# get full text for the first article
article = Textminer.fetch(url: links[0])
# url
article.url
# file path
article.path
# content type
article.type
# parse content
article.parse
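
Choosing a parsing strategy from the content type can be pictured as a small dispatch step. The sketch below is a hypothetical illustration, not textminer's code: handler_for and the symbols it returns are invented names, and the set of recognised media types is an assumption.

```ruby
# Hypothetical sketch: map a Content-Type header value to a handling strategy.
def handler_for(content_type_header)
  # Drop parameters such as "; charset=utf-8" and normalise case.
  media_type = content_type_header.to_s.split(';').first.to_s.strip.downcase

  case media_type
  when 'application/xml', 'text/xml' then :xml   # hand the body to an XML parser
  when 'application/pdf'             then :pdf   # save to disk, extract text later
  when 'text/plain'                  then :plain # use the body as is
  else :unknown
  end
end

handler_for('application/xml; charset=utf-8')
handler_for('application/pdf')
```

Stripping the parameters first means a header with a charset suffix still matches its media type.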

Extract text from PDF

Textminer.extract() extracts text from a PDF, given the path to the file.

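# find works with full text, then take their PDF links
res = Textminer.search(member: 2258, filter: {has_full_text: true})
links = res.links_pdf(true)
# download the first PDF, then extract text from the saved file
pdf = Textminer.fetch(url: links[0])
Textminer.extract(pdf.path)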

On the CLI

Coming soon...

To do

CLI executable
better test suite
better documentation

Package last updated on 05 Dec 2015
