================================================================================
pyexcel-io - Let you focus on data, instead of file formats
.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png
:target: https://www.patreon.com/chfw
.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel-mobans/master/images/awesome-badge.svg
:target: https://awesome-python.com/#specific-formats-processing
.. image:: https://github.com/pyexcel/pyexcel-io/workflows/run_tests/badge.svg
:target: http://github.com/pyexcel/pyexcel-io/actions
.. image:: https://codecov.io/gh/pyexcel/pyexcel-io/branch/master/graph/badge.svg
:target: https://codecov.io/gh/pyexcel/pyexcel-io
.. image:: https://badge.fury.io/py/pyexcel-io.svg
:target: https://pypi.org/project/pyexcel-io
.. image:: https://anaconda.org/conda-forge/pyexcel-io/badges/version.svg
:target: https://anaconda.org/conda-forge/pyexcel-io
.. image:: https://pepy.tech/badge/pyexcel-io/month
:target: https://pepy.tech/project/pyexcel-io
.. image:: https://anaconda.org/conda-forge/pyexcel-io/badges/downloads.svg
:target: https://anaconda.org/conda-forge/pyexcel-io
.. image:: https://img.shields.io/gitter/room/gitterHQ/gitter.svg
:target: https://gitter.im/pyexcel/Lobby
.. image:: https://img.shields.io/static/v1?label=continuous%20templating&message=%E6%A8%A1%E7%89%88%E6%9B%B4%E6%96%B0&color=blue&style=flat-square
:target: https://moban.readthedocs.io/en/latest/#at-scale-continous-templating-for-open-source-projects
.. image:: https://img.shields.io/static/v1?label=coding%20style&message=black&color=black&style=flat-square
:target: https://github.com/psf/black
.. image:: https://readthedocs.org/projects/pyexcel-io/badge/?version=latest
:target: http://pyexcel-io.readthedocs.org/en/latest/
Support the project
If your company has embedded pyexcel and its components into a revenue generating
product, please support me on github, patreon <https://www.patreon.com/bePatron?u=5537627>
_
or bounty source <https://salt.bountysource.com/teams/chfw-pyexcel>
_ to maintain
the project and develop it further.
If you are an individual, you are welcome to support me too and for however long
you feel like. As my backer, you will receive
early access to pyexcel related contents <https://www.patreon.com/pyexcel/posts>
_.
And your issues will get prioritized if you would like to become my patreon as pyexcel pro user
.
With your financial support, I will be able to invest
a little bit more time in coding, documentation and writing interesting posts.
Known constraints
Fonts, colors and charts are not supported.
Nor to read password protected xls, xlsx and ods files.
Introduction
pyexcel-io provides one application programming interface(API) to read
and write the data in excel format, import the data into and export the data
from database. It provides support for csv(z) format, django database and
sqlalchemy supported databases. Its supported file formats are extended to cover
"xls", "xlsx", "ods" by the following extensions:
.. _file-format-list:
.. _a-map-of-plugins-and-file-formats:
.. table:: A list of file formats supported by external plugins
======================== ======================= =================
Package name Supported file formats Dependencies
======================== ======================= =================
pyexcel-io
_ csv, csvz [#f1], tsv,
tsvz [#f2]
pyexcel-xls
_ xls, xlsx(read only), xlrd
,
xlsm(read only) xlwt
pyexcel-xlsx
_ xlsx openpyxl
_
pyexcel-ods3
_ ods pyexcel-ezodf
,
lxml
pyexcel-ods
ods odfpy
_
======================== ======================= =================
.. table:: Dedicated file reader and writers
======================== ======================= =================
Package name Supported file formats Dependencies
======================== ======================= =================
pyexcel-xlsxw
_ xlsx(write only) XlsxWriter
_
pyexcel-libxlsxw
_ xlsx(write only) libxlsxwriter
_
pyexcel-xlsxr
_ xlsx(read only) lxml
pyexcel-xlsbr
_ xlsb(read only) pyxlsb
pyexcel-odsr
_ read only for ods, fods lxml
pyexcel-odsw
_ write only for ods loxun
pyexcel-htmlr
_ html(read only) lxml,html5lib
pyexcel-pdfr
_ pdf(read only) camelot
======================== ======================= =================
Plugin shopping guide
Since 2020, all pyexcel-io plugins have dropped the support for python versions
which are lower than 3.6. If you want to use any of those Python versions, please use pyexcel-io
and its plugins versions that are lower than 0.6.0.
Except csv files, xls, xlsx and ods files are a zip of a folder containing a lot of
xml files
The dedicated readers for excel files can stream read
In order to manage the list of plugins installed, you need to use pip to add or remove
a plugin. When you use virtualenv, you can have different plugins per virtual
environment. In the situation where you have multiple plugins that does the same thing
in your environment, you need to tell pyexcel which plugin to use per function call.
For example, pyexcel-ods and pyexcel-odsr, and you want to get_array to use pyexcel-odsr.
You need to append get_array(..., library='pyexcel-odsr').
.. _pyexcel-io: https://github.com/pyexcel/pyexcel-io
.. _pyexcel-xls: https://github.com/pyexcel/pyexcel-xls
.. _pyexcel-xlsx: https://github.com/pyexcel/pyexcel-xlsx
.. _pyexcel-ods: https://github.com/pyexcel/pyexcel-ods
.. _pyexcel-ods3: https://github.com/pyexcel/pyexcel-ods3
.. _pyexcel-odsr: https://github.com/pyexcel/pyexcel-odsr
.. _pyexcel-odsw: https://github.com/pyexcel/pyexcel-odsw
.. _pyexcel-pdfr: https://github.com/pyexcel/pyexcel-pdfr
.. _pyexcel-xlsxw: https://github.com/pyexcel/pyexcel-xlsxw
.. _pyexcel-libxlsxw: https://github.com/pyexcel/pyexcel-libxlsxw
.. _pyexcel-xlsxr: https://github.com/pyexcel/pyexcel-xlsxr
.. _pyexcel-xlsbr: https://github.com/pyexcel/pyexcel-xlsbr
.. _pyexcel-htmlr: https://github.com/pyexcel/pyexcel-htmlr
.. _xlrd: https://github.com/python-excel/xlrd
.. _xlwt: https://github.com/python-excel/xlwt
.. _openpyxl: https://bitbucket.org/openpyxl/openpyxl
.. _XlsxWriter: https://github.com/jmcnamara/XlsxWriter
.. _pyexcel-ezodf: https://github.com/pyexcel/pyexcel-ezodf
.. _odfpy: https://github.com/eea/odfpy
.. _libxlsxwriter: http://libxlsxwriter.github.io/getting_started.html
.. rubric:: Footnotes
.. [#f1] zipped csv file
.. [#f2] zipped tsv file
If you need to manipulate the data, you might do it yourself or use its brother
library pyexcel <https://github.com/pyexcel/pyexcel>
__ .
If you would like to extend it, you may use it to write your own
extension to handle a specific file format.
Installation
You can install pyexcel-io via pip:
.. code-block:: bash
$ pip install pyexcel-io
or clone it and install it:
.. code-block:: bash
$ git clone https://github.com/pyexcel/pyexcel-io.git
$ cd pyexcel-io
$ python setup.py install
Development guide
Development steps for code changes
#. git clone https://github.com/pyexcel/pyexcel-io.git
#. cd pyexcel-io
Upgrade your setup tools and pip. They are needed for development and testing only:
#. pip install --upgrade setuptools pip
Then install relevant development requirements:
#. pip install -r rnd_requirements.txt # if such a file exists
#. pip install -r requirements.txt
#. pip install -r tests/requirements.txt
Once you have finished your changes, please provide test case(s), relevant documentation
and update changelog.yml
.. note::
As to rnd_requirements.txt, usually, it is created when a dependent
library is not released. Once the dependecy is installed
(will be released), the future
version of the dependency in the requirements.txt will be valid.
How to test your contribution
Although nose
and doctest
are both used in code testing, it is adviable that unit tests are put in tests. doctest
is incorporated only to make sure the code examples in documentation remain valid across different development releases.
On Linux/Unix systems, please launch your tests like this::
$ make
On Windows, please issue this command::
> test.bat
Before you commit
Please run::
$ make format
so as to beautify your code otherwise your build may fail your unit test.
License
New BSD License
Change log
0.6.6 - 31.1.2022
updated
#. #112 <https://github.com/pyexcel/pyexcel-io/issues/112>
_: Log Empty Row
Warning instead 'print'
0.6.5 - 08.10.2021
updated
#. #109 <https://github.com/pyexcel/pyexcel-io/issues/109>
_: enable ods3 to
have datetime
0.6.4 - 31.10.2020
updated
#. #102 <https://github.com/pyexcel/pyexcel-io/issues/102>
_: skip columns from
imported excel sheet.
0.6.3 - 12.10.2020
fixed
#. #96 <https://github.com/pyexcel/pyexcel-io/issues/96>
_: regression: unknown
file type shall trigger NoSupportingPluginFound
updated
#. extra dependencies uses 0.6.0 based plugins
0.6.2 - 7.10.2020
updated
#. #94 <https://github.com/pyexcel/pyexcel-io/issues/94>
_: keep backward
compatibility for pyexcel-xls 0.4.1
0.6.1 - 7.10.2020
removed
#. python 3.6 lower versions are no longer supported
updated
#. pyexcel-io plugin interface has been rewritten. PyInstaller user will be
impacted. please read 'Packaging with Pyinstaller' in the documentation.
#. new query set reader plugin. pyexcel<=0.6.4 has used intrusive way of getting
query set source done. it is against the plugin interface.
fixed
#. #74 <https://github.com/pyexcel/pyexcel-io/issues/74>
_: handle zip files
which contain non-UTF-8 encoded files.
added
#. #86 <https://github.com/pyexcel/pyexcel-io/issues/86>
_: allow trailing
options, get_data(...keep_trailing_empty_cells=True).
0.5.20 - 17.7.2019
updated
#. #70 <https://github.com/pyexcel/pyexcel-io/issues/70>
_: when the given file
is a root directory, the error shall read it is not a file
0.5.19 - 14.7.2019
updated
#. pyexcel#185 <https://github.com/pyexcel/pyexcel/issues/185>
_: handle stream
conversion if file type(html) needs string content then bytes to handle
0.5.18 - 12.06.2019
updated
#. #69 <https://github.com/pyexcel/pyexcel-io/issues/69>
_: Force file
type(force_file_type) on write
0.5.17 - 04.04.2019
updated
#. #68 <https://github.com/pyexcel/pyexcel-io/issues/68>
_: Raise IOError when
the data file does not exist
0.5.16 - 19.03.2019
updated
#. #67 <https://github.com/pyexcel/pyexcel-io/issues/67>
_: fix conversion
issue for long type on python 2.7 for ods
0.5.15 - 16.03.2019
updated
#. pyexcel-ods#33 <https://github.com/pyexcel/pyexcel-ods/issues/33>
_: fix
integer comparision error on i586
0.5.14 - 21.02.2019
updated
#. #65 <https://github.com/pyexcel/pyexcel-io/issues/65>
_: add
tests/init.py because python2.7 setup.py test needs it
0.5.13 - 12.02.2019
updated
#. #63 <https://github.com/pyexcel/pyexcel-io/issues/63>
_: Version 0.5.12
prevents xslx and ods plugin from being loaded
0.5.12 - 9.02.2019
updated
#. #60 <https://github.com/pyexcel/pyexcel-io/issues/60>
: include tests in
tar ball
#. #61 <https://github.com/pyexcel/pyexcel-io/issues/61>
: enable python
setup.py test
0.5.11 - 3.12.2018
updated
#. #59 <https://github.com/pyexcel/pyexcel-io/issues/59>
_: Please use
scan_plugins_regex, which lml 0.7 complains about
0.5.10 - 27.11.2018
added
#. #57 <https://github.com/pyexcel/pyexcel-io/issues/57>
_, long type will not
be written in ods. please use string type. And if the integer is equal or
greater than 10 to the power of 16, it will not be written either in ods. In
both situation, IntegerPrecisionLossError will be raised. And this version
enables pyexcel-ods and pyexcel-ods3 to do so.
0.5.9.1 - 30.08.2018
updated
#. #53 <https://github.com/pyexcel/pyexcel-io/issues/53>
_, upgrade lml
dependency to at least 0.0.2
0.5.9 - 23.08.2018
added
#. pyexcel#148 <https://github.com/pyexcel/pyexcel/issues/148>
_, support
force_file_type
0.5.8 - 16.08.2018
added
#. #49 <https://github.com/pyexcel/pyexcel-io/issues/49>
_, support additional
options when detecting float values in csv format. default_float_nan,
ignore_nan_text
0.5.7 - 02.05.2018
fixed
#. #48 <https://github.com/pyexcel/pyexcel-io/issues/48>
, turn off pep 0515
#. #47 <https://github.com/pyexcel/pyexcel-io/issues/47>
, csv reader cannot
handle relative file names
0.5.6 - 11.01.2018
fixed
#. #46 <https://github.com/pyexcel/pyexcel-io/issues/46>
_, expose bulk_save
to developer
0.5.5 - 23.12.2017
fixed
#. Issue #45 <https://github.com/pyexcel/pyexcel-io/issues/45>
_, csv reader
throws exception because google app engine does not support mmap. People who
are not working with google app engine, need not to take this update. Enjoy
your Christmas break.
0.5.4 - 10.11.2017
updated
#. PR #44 <https://github.com/pyexcel/pyexcel-io/pull/44>
_, use unicodewriter
for csvz writers.
0.5.3 - 23.10.2017
updated
#. pyexcel pyexcel#105 <https://github.com/pyexcel/pyexcel/issues/105>
_,
remove gease from setup_requires, introduced by 0.5.2.
#. remove python2.6 test support
0.5.2 - 20.10.2017
added
#. pyexcel#103 <https://github.com/pyexcel/pyexcel/issues/103>
_, include
LICENSE file in MANIFEST.in, meaning LICENSE file will appear in the released
tar ball.
0.5.1 - 02.09.2017
Fixed
#. pyexcel-ods#25 <https://github.com/pyexcel/pyexcel-ods/issues/25>
_,
Unwanted dependency on pyexcel.
0.5.0 - 30.08.2017
Added
#. Collect all data type conversion codes as service.py.
Updated
#. #19 <https://github.com/pyexcel/pyexcel-io/issues/19>
_, use cString by
default. For python, it will be a performance boost
0.4.4 - 08.08.2017
Updated
#. #42 <https://github.com/pyexcel/pyexcel-io/issues/42>
_, raise exception if
database table name does not match the sheet name
0.4.3 - 29.07.2017
Updated
#. #41 <https://github.com/pyexcel/pyexcel-io/issues/41>
_, walk away
gracefully when mmap is not available.
0.4.2 - 05.07.2017
Updated
#. #37 <https://github.com/pyexcel/pyexcel-io/issues/37>
_, permanently fix the
residue folder pyexcel by release all future releases in a clean clone.
0.4.1 - 29.06.2017
Updated
#. #39 <https://github.com/pyexcel/pyexcel-io/issues/39>
_, raise exception
when bulk save in django fails. Please bulk_save=False
if you as the
developer choose to save the records one by one if bulk_save cannot be used.
However, exception in one-by-one save case will be raised as well. This
change is made to raise exception in the first place so that you as the
developer will be suprised when it was deployed in production.
0.4.0 - 19.06.2017
Updated
#. 'built-in' as the value to the parameter 'library' as parameter to invoke
pyexcel-io's built-in csv, tsv, csvz, tsvz, django and sql won't work. It is
renamed to 'pyexcel-io'.
#. built-in csv, tsv, csvz, tsvz, django and sql are lazy loaded.
#. pyexcel-io plugin interface has been updated. v0.3.x plugins won't work.
#. #32 <https://github.com/pyexcel/pyexcel-io/issues/32>
_, csv and csvz file
handle are made sure to be closed. File close mechanism is enfored.
#. iget_data function is introduced to cope with dangling file handle issue.
Removed
#. Removed plugin loading code and lml is used instead
0.3.4 - 18.05.2017
Updated
#. #33 <https://github.com/pyexcel/pyexcel-io/issues/33>
, handle mmap object
differently given as file content. This issue has put in a priority to single
sheet csv over multiple sheets in a single memory stream. The latter format
is pyexcel own creation but is rarely used. In latter case,
multiple_sheet=True should be passed along get_data.
#. #34 <https://github.com/pyexcel/pyexcel-io/issues/34>
, treat mmap object
as a file content.
#. #35 <https://github.com/pyexcel/pyexcel-io/issues/35>
_, encoding parameter
take no effect when given along with file content
#. use ZIP_DEFALTED to really do the compression
0.3.3 - 30.03.2017
Updated
#. #31 <https://github.com/pyexcel/pyexcel-io/issues/31>
_, support pyinstaller
0.3.2 - 26.01.2017
Updated
#. #29 <https://github.com/pyexcel/pyexcel-io/issues/29>
_, change
skip_empty_rows to False by default
0.3.1 - 21.01.2017
Added
#. updated versions of extra packages
Updated
#. #23 <https://github.com/pyexcel/pyexcel-io/issues/23>
_, provide helpful
message when old pyexcel plugin exists
#. restored previously available diagnosis message for missing libraries
0.3.0 - 22.12.2016
Added
#. lazy loading of plugins. for example, pyexcel-xls is not entirely loaded
until xls format is used at its first attempted reading or writing. Since it
is loaded, it will not be loaded in the second io action.
#. pyexcel-xls#11 <https://github.com/pyexcel/pyexcel-xls/issues/11>
_, make
case-insensitive for file type
0.2.6 - 21.12.2016
Updated
#. #24 <https://github.com/pyexcel/pyexcel-io/issues/24>
__, pass on batch_size
0.2.5 - 20.12.2016
Updated
#. #26 <https://github.com/pyexcel/pyexcel-io/issues/26>
__, performance issue
with getting the number of columns.
0.2.4 - 24.11.2016
Updated
#. #23 <https://github.com/pyexcel/pyexcel-io/issues/23>
__, Failed to convert
long integer string in python 2 to its actual value
0.2.3 - 16.09.2016
Added
#. #21 <https://github.com/pyexcel/pyexcel-io/issues/21>
, choose subset from
data base tables for export
#. #22 <https://github.com/pyexcel/pyexcel-io/issues/22>
, custom renderer if
given row_renderer
as parameter.
0.2.2 - 31.08.2016
Added
#. support pagination. two pairs: start_row, row_limit and start_column,
column_limit help you deal with large files.
#. skip_empty_rows=True
was introduced. To include empty rows, put it to
False.
Updated
#. #20 <https://github.com/pyexcel/pyexcel-io/issues/20>
__, pyexcel-io
attempts to parse cell contents of 'infinity' as a float/int, crashes
0.2.1 - 11.07.2016
Added
#. csv format: handle utf-16 encoded csv files. Potentially being able to decode
other formats if correct "encoding" is provided
#. csv format: write utf-16 encoded files. Potentially other encoding is also
supported
#. support stdin as input stream and stdout as output stream
Updated
#. Attention, user of pyexcel-io! No longer io stream validation is performed in
python 3. The guideline is: io.StringIO for csv, tsv only, otherwise BytesIO
for xlsx, xls, ods. You can use RWManager.get_io to produce a correct stream
type for you.
#. #15 <https://github.com/pyexcel/pyexcel-io/issues/15>
__, support foreign
django/sql foreign key
0.2.0 - 01.06.2016
Added
#. autoload of pyexcel-io plugins
#. auto detect datetime
, float
and int
. Detection can be switched off by
auto_detect_datetime
, auto_detect_float
, auto_detect_int
0.1.0 - 17.01.2016
Added
#. yield key word to return generator as content