SheetReader Python Bindings
SheetReader allows to read your Excel spreadsheet files (.xlsx) blazingly fast. This repository contains the Python bindings, as the core library is implemented in C++.
Quickstart
Sheetreader is available through:
pip install pysheetreader
After successful installation, spreadsheets can be loaded:
import pysheetreader as sr
sheet = sr.read_xlsx("my_favorite_sheet.xlsx")
To convert a spreadsheet into a pandas
Dataframe:
import pysheetreader as sr
import pandas as pd
sheet = sr.read_xlsx("my_favorite_sheet.xlsx")
df = pd.DataFrame.from_dict(sheet[0])
Parameters:
Parameter | Type | Description | Default |
---|
path | string | The path of the .xlsx file to parse. | - |
sheet | integer or string | The sheet of the file to parse, can be either the index (starting at 1) or the name. | 1 |
headers | boolean | Whether to interpret the first parsed row as headers. | True |
skip_rows | integer | How many rows to skip before parsing data. | 0 |
skip_columns | integer | How many columns to skip before parsing data. | 0 |
num_threads | integer | How many threads to use for parsing. Use -1 for automatic threading. | -1 |
col_types | dict or list | How to interpret parsed data, either by names (dict ) or by position (list ). Types: numeric , text , logical , date , skip , guess . | None |
Build Instructions
First install the submodules, which contain the sheetreader-core
dependency with:
git clone --recurse-submodules https://github.com/polydbms/sheetreader-python.git
To build from source, this repository provides a pyproject.toml
.
The SheetReader wheel file can be generated through:
python -m build .
or installed with pip through:
pip install .
More resources
SheetReader is part of the PolyDB Project. We also provide bindings/extensions for several other environments:
Paper
SheetReader was published in the Information Systems Journal. Cite as:
@article{DBLP:journals/is/GavriilidisHZM23,
author = {Haralampos Gavriilidis and
Felix Henze and
Eleni Tzirita Zacharatou and
Volker Markl},
title = {SheetReader: Efficient Specialized Spreadsheet Parsing},
journal = {Inf. Syst.},
volume = {115},
pages = {102183},
year = {2023},
url = {https://doi.org/10.1016/j.is.2023.102183},
doi = {10.1016/J.IS.2023.102183},
timestamp = {Mon, 26 Jun 2023 20:54:32 +0200},
biburl = {https://dblp.org/rec/journals/is/GavriilidisHZM23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}