![lint workflow](https://github.com/mciccale/ScholarVista/actions/workflows/lint.yml/badge.svg)
ScholarVista
ScholarVista is a tool that extracts and plots information from a set of academic research papers in PDF or TEI XML format. To process PDFs, it uses Grobid to generate TEI XML files; it then extracts the relevant information from those files and generates the following outputs:
- Keyword Cloud for each paper's abstract and one for all abstracts combined.
- Links List with the links found in each paper.
- Figures Histogram comparing the number of figures per paper.
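To make the three outputs above concrete, here is a minimal sketch of how such data could be pulled out of a TEI XML file with the Python standard library. It is not ScholarVista's actual implementation: the function name `summarize_tei`, the sample document, the four-letter keyword filter, and the choice to skip `figure` elements marked `type="table"` are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# TEI documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written TEI fragment used only for this example.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><abstract>
    <p>Graph neural networks learn graph structure.</p>
  </abstract></profileDesc></teiHeader>
  <text><body>
    <figure xml:id="fig_0"/>
    <figure type="table" xml:id="tab_0"/>
    <p>Code at <ptr target="https://example.org/code"/>.</p>
  </body></text>
</TEI>"""


def summarize_tei(xml_text: str) -> dict:
    """Return keyword counts, figure count, and links for one TEI document."""
    root = ET.fromstring(xml_text)

    # Keyword counts: words of four or more letters in the abstract
    # (assumed filter; a real keyword cloud would also drop stop words).
    abstract = root.find(".//tei:abstract", TEI_NS)
    text = " ".join(abstract.itertext()) if abstract is not None else ""
    keywords = Counter(re.findall(r"[a-z]{4,}", text.lower()))

    # Figure count: skip elements marked as tables (assumption).
    figures = [
        fig for fig in root.iterfind(".//tei:figure", TEI_NS)
        if fig.get("type") != "table"
    ]

    # Links: any element whose "target" attribute is an http(s) URL.
    links = sorted(
        {el.get("target") for el in root.iter()
         if el.get("target", "").startswith("http")}
    )

    return {"keywords": keywords, "figure_count": len(figures), "links": links}


if __name__ == "__main__":
    print(summarize_tei(SAMPLE))
```

The per-paper keyword counts could feed a word cloud, and the per-paper figure counts a histogram; plotting is left out to keep the sketch dependency-free.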
Requirements
If you want to generate the results from a set of PDF academic papers, you must ensure that the Grobid Service is installed and running on your machine. See Grobid Installation Instructions here.
If you already have the TEI XML files generated, you can directly generate the information from them.
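Before processing PDFs, it can help to confirm that Grobid is reachable. The sketch below assumes a default Grobid setup, that is, a service listening on `http://localhost:8070` and exposing a health check at `/api/isalive`; adjust both if your installation differs. The helper name `grobid_is_alive` is hypothetical and is not part of ScholarVista.

```python
import urllib.request


def grobid_is_alive(base_url: str = "http://localhost:8070",
                    timeout: float = 2.0) -> bool:
    """Return True if a Grobid service answers its health check.

    Assumes Grobid's default port (8070) and its /api/isalive endpoint.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive",
                                    timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    if grobid_is_alive():
        print("Grobid is up; PDFs can be processed.")
    else:
        print("Grobid is not reachable; start it or use TEI XML files.")
```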
Install ScholarVista
PIP
$ pip install scholarvista
When using pip, it is good practice to use virtual environments. Check out the official documentation on virtual environments here.
Using ScholarVista
CLI Tool
The most convenient way to use ScholarVista is through its CLI.
The CLI Tool generates a keyword cloud and a list of URLs for each PDF analyzed, together with a histogram comparing the number of figures in each PDF, and saves them to a directory.
Usage: scholarvista [OPTIONS] COMMAND [ARGS]...
ScholarVista's CLI main entry point.
Options:
--input-dir PATH Directory containing PDF files. [required]
--output-dir PATH Directory to save results. Defaults to current directory.
--help Show this message and exit.
Commands:
process-pdfs Process all PDFs in the given directory.
process-xmls Process all TEI XMLs in the given directory.
Python Modules
See example.py
Execution Instructions
You can execute ScholarVista CLI from your shell like this:
$ scholarvista --input-dir ./pdfs --output-dir ./output process-pdfs
Note: The process-pdfs command requires the Grobid Service to be up and running, as described in Requirements.
$ scholarvista --input-dir ./xmls process-xmls
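If you would rather drive the CLI from a Python script, for example to process several directories in a loop, the commands above can be wrapped with `subprocess`. This is only a sketch: `build_command` and `run_scholarvista` are hypothetical helper names, and they rely solely on the options and commands listed in the usage text above, with `scholarvista` assumed to be on your `PATH`.

```python
import subprocess
from typing import List, Optional

VALID_COMMANDS = ("process-pdfs", "process-xmls")


def build_command(input_dir: str,
                  output_dir: Optional[str] = None,
                  command: str = "process-pdfs") -> List[str]:
    """Build the argument list for one scholarvista invocation."""
    if command not in VALID_COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    argv = ["scholarvista", "--input-dir", str(input_dir)]
    # --output-dir is optional; the CLI defaults to the current directory.
    if output_dir is not None:
        argv += ["--output-dir", str(output_dir)]
    # Options come before the command, matching the usage line.
    argv.append(command)
    return argv


def run_scholarvista(input_dir: str,
                     output_dir: Optional[str] = None,
                     command: str = "process-pdfs") -> None:
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(input_dir, output_dir, command), check=True)


if __name__ == "__main__":
    # Equivalent to: scholarvista --input-dir ./xmls process-xmls
    print(build_command("./xmls", command="process-xmls"))
```

Passing the arguments as a list avoids shell quoting problems with directory names that contain spaces.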
License
Please refer to the LICENSE file.
Where to Get Help
For further assistance or to contribute to the project, please refer to the CONTRIBUTING.md file.