scholarvista

ScholarVista is a tool that analyzes research papers and extracts and plots information from them. It utilizes Grobid, a library for extracting content from research papers, to extract all the relevant data. The extracted data is then plotted and displayed using Python.

  • 0.1.3
  • PyPI

ScholarVista

ScholarVista is a tool that extracts and plots information from a set of academic research papers in PDF or TEI XML format. To process PDFs, it uses Grobid to generate TEI XML files. ScholarVista then extracts the relevant information from those files and generates the following outputs:

  1. A keyword cloud for each paper's abstract, plus one for all abstracts combined.
  2. A list of the links found in each paper.
  3. A figures histogram comparing the number of figures per paper.
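The three outputs above all start from the same TEI XML input. The following is a minimal sketch of how such data could be pulled out of a TEI document with Python's standard library. It is not ScholarVista's own implementation: the function name `summarize_tei` and the sample document are hypothetical, and the sketch assumes that links appear as `target` attributes and that tables are marked as `<figure type="table">`.

```python
import xml.etree.ElementTree as ET

# The TEI namespace used by TEI XML documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Tiny hand-written sample (hypothetical), standing in for a real TEI file.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <abstract><p>Deep learning improves parsing of scholarly documents.</p></abstract>
    </profileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Code is at <ref type="url" target="https://example.org/code">our site</ref>.</p>
      <figure><head>Figure 1</head></figure>
      <figure><head>Figure 2</head></figure>
      <figure type="table"><head>Table 1</head></figure>
    </body>
  </text>
</TEI>"""


def summarize_tei(xml_text):
    """Return the abstract text, external links, and figure count of one paper."""
    root = ET.fromstring(xml_text)

    # Abstract: join all nested text and normalize whitespace.
    abstract_el = root.find(".//tei:abstract", TEI_NS)
    abstract = ""
    if abstract_el is not None:
        abstract = " ".join("".join(abstract_el.itertext()).split())

    # Links: any element carrying an http(s) "target" attribute (assumption).
    links = [
        el.get("target")
        for el in root.iterfind(".//*[@target]")
        if el.get("target", "").startswith("http")
    ]

    # Figures: skip elements marked as tables (assumption about the input).
    figures = sum(
        1 for el in root.iterfind(".//tei:figure", TEI_NS)
        if el.get("type") != "table"
    )
    return {"abstract": abstract, "links": links, "figures": figures}


summary = summarize_tei(SAMPLE_TEI)
print(summary)
```

The per-paper dictionaries produced this way would be enough to feed a keyword cloud (from `abstract`), a links list (from `links`), and a histogram (from `figures`).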

Requirements

If you want to generate the results from a set of PDF academic papers, you must ensure that the Grobid Service is installed and running on your machine. See the Grobid installation instructions.
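Before processing PDFs, it can help to confirm that the Grobid Service is actually reachable. The sketch below assumes Grobid is listening on its default address `http://localhost:8070` and exposes its `/api/isalive` health endpoint. The helper name `grobid_is_alive` is hypothetical and is not part of ScholarVista.

```python
import urllib.error
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=2.0):
    """Return True if a Grobid service answers its health check at base_url."""
    url = base_url.rstrip("/") + "/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `grobid_is_alive()` before running the PDF workflow gives a quick yes/no answer. Adjust `base_url` if your Grobid instance runs on another host or port.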

If you already have the TEI XML files generated, you can directly generate the information from them.

Install ScholarVista

PIP

$ pip install scholarvista

When using pip, it is good practice to use a virtual environment. Check out the official Python documentation on virtual environments.

Using ScholarVista

CLI Tool

The most convenient way to use ScholarVista is through its CLI.

The CLI tool generates a keyword cloud and a list of URLs for each PDF analyzed, together with a histogram comparing the number of figures in each PDF, and saves them to a directory.
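To give a rough idea of what sits behind a keyword cloud, the sketch below counts word frequencies in an abstract, which is the kind of weighting a cloud renderer typically consumes. It is an illustration only: the function `keyword_frequencies` and the small stop-word list are hypothetical, and ScholarVista may compute its keywords differently.

```python
import re
from collections import Counter

# Deliberately small stop-word list, for illustration only.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "for",
    "is", "on", "with", "we", "that", "this",
}


def keyword_frequencies(text, top_n=5):
    """Return the top_n most frequent non-stop words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        w for w in words if w not in STOP_WORDS and len(w) > 2
    )
    return counts.most_common(top_n)


print(keyword_frequencies("Parsing papers: parsing PDF papers with parsing tools", top_n=2))
```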

Usage: scholarvista [OPTIONS] COMMAND [ARGS]...

  ScholarVista's CLI main entry point.

Options:
  --input-dir PATH   Directory containing PDF files.  [required]
  --output-dir PATH  Directory to save results. Defaults to current directory.
  --help             Show this message and exit.

Commands:
  process-pdfs  Process all PDFs in the given directory.
  process-xmls  Process all TEI XMLs in the given directory.

Python Modules

See example.py

Execution Instructions

You can execute ScholarVista CLI from your shell like this:

# Process PDF files and save the results to a specified directory
$ scholarvista --input-dir ./pdfs --output-dir ./output process-pdfs

Note: The process-pdfs command requires the Grobid Service to be up and running, as described in Requirements.

# Process TEI XML files and save the results to the current directory
$ scholarvista --input-dir ./xmls process-xmls
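The same commands can be driven from a Python script. The sketch below only uses the options and commands shown in the CLI help above. The helper names `build_command` and `run_scholarvista` are hypothetical wrappers, not part of the package, and the sketch assumes the `scholarvista` executable is on your PATH.

```python
import shutil
import subprocess

VALID_COMMANDS = ("process-pdfs", "process-xmls")


def build_command(command, input_dir, output_dir=None):
    """Build the argument list for one scholarvista CLI invocation."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    # The options come before the command, matching the usage line.
    args = ["scholarvista", "--input-dir", str(input_dir)]
    if output_dir is not None:
        args += ["--output-dir", str(output_dir)]
    args.append(command)
    return args


def run_scholarvista(command, input_dir, output_dir=None):
    """Run the CLI and raise if it is missing or exits with an error."""
    if shutil.which("scholarvista") is None:
        raise RuntimeError("scholarvista executable not found on PATH")
    return subprocess.run(
        build_command(command, input_dir, output_dir), check=True
    )
```

For example, `run_scholarvista("process-xmls", "./xmls")` would mirror the second shell command above. Passing the arguments as a list avoids shell quoting issues with directory names that contain spaces.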

License

Please refer to the LICENSE file.

Where to Get Help

For further assistance or to contribute to the project, please refer to the CONTRIBUTING.md file.
