get-reader
This module provides a get_reader() function that returns reader
objects similar to those returned by csv.reader(). This package:
- reduces common boilerplate code for handling files and reading
  records
- reads data from CSV, pandas, SQL connections, MS Excel, DBF, and squint
- provides a single interface across Python versions (including
  seamless Unicode-aware CSV support for Python 2)
- is easy to incorporate into your own projects:
  - has no hard dependencies
  - runs on Python 2.6, 2.7, 3.2 through 3.8, PyPy, PyPy3, and Jython
  - is freely available under the Apache License, version 2
  - can be easily vendored directly into your codebase if you don't
    want to include it as a dependency
Open a UTF-8 encoded CSV:
from get_reader import get_reader
reader = get_reader('myfile.csv')
for row in reader:
    print(', '.join(row))
In the above example, file handling is managed automatically by the
reader object. The file is automatically closed when the iterator is
exhausted or when the object is deleted. It also handles Unicode in
Python 2 without changes.
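For comparison, the boilerplate this replaces looks roughly like the following when written with only the standard library on Python 3. This is a sketch for illustration and not the package's implementation; the temporary file and its contents are made up so that the snippet runs on its own.

```python
import csv
import os
import tempfile

# Create a small UTF-8 CSV so the example is self-contained
# (the location and contents are hypothetical).
path = os.path.join(tempfile.mkdtemp(), 'myfile.csv')
with open(path, 'w', encoding='utf-8', newline='') as fh:
    csv.writer(fh).writerows([['col1', 'col2'], ['a', 'b']])

# Manual handling: open with an explicit encoding and newline='',
# wrap the file in csv.reader(), and let the with block close it.
rows = []
with open(path, encoding='utf-8', newline='') as fh:
    for row in csv.reader(fh):
        rows.append(row)

print(rows)
```

With get_reader(), the open/wrap/close steps collapse into the single call shown above.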
Open a Latin-1 (ISO-8859-1) encoded CSV file:
reader = get_reader('myfile.csv', encoding='latin-1')
for row in reader:
    print(', '.join(row))
Use the reader as a context manager:
with get_reader('myfile.csv') as reader:
    for row in reader:
        print(', '.join(row))
In this example, reader automatically closes its internal file object
when exiting the with block, even if the for-loop doesn't finish
exhausting the reader.
Access other data sources:
# pandas DataFrame
df = pd.DataFrame([...])
reader = get_reader(df)
# DBAPI2 database connection
connection = ...
reader = get_reader(connection, 'SELECT col1, col2 FROM mytable;')
# MS Excel file
reader = get_reader('myfile.xlsx')
# DBF file
reader = get_reader('myfile.dbf')
# squint Query
select = ...
reader = get_reader(select({'col1': 'col2'}).sum())
Call constructors directly to override auto-detect behavior:
reader = get_reader.from_csv('myfile.dat', delimiter='\t')
Install
The get_reader module has no hard dependencies; is tested on
Python 2.6, 2.7, 3.2 through 3.8, PyPy, PyPy3, and Jython; and
is freely available under the Apache License, version 2.
You can install get_reader using pip:
pip install get_reader
To install optional support for MS Excel and DBF files (dBase,
Foxpro, etc.), use the following:
pip install get_reader[excel,dbf]
Python 2 Support Statement
While official support for Python 2 ends on January 1, 2020, this
project will continue to support older versions as long as the
existing ecosystem provides the ability to run automated tests
on those older versions.
Reference
get_reader(obj, *args, **kwds)
Return a Reader object which will iterate over records in
the given obj, like a csv.reader(). The given obj may
be one of the following:
- CSV file (string path or file object)
- iterable of dictionary rows
- database connection (should be DBAPI2 compatible)
- pandas DataFrame, Series, Index, or MultiIndex
- squint Select, Query, or Result
If optional extras are installed, obj may also be:
- MS Excel file path
- DBF file path
When obj is a file path, the Reader contains a file object
that is handled internally. When given a file-like obj (rather
than a path), users are responsible for properly closing this
file themselves.
The given obj is checked against supported types and
automatically passed to the appropriate constructor if a match is
found. If obj is a string, it is treated as a file path whose
extension determines its content type. Any *args and **kwds
are passed along to the matching constructor:
from get_reader import get_reader
reader = get_reader('myfile.csv')
connection = ...
reader = get_reader(connection, 'SELECT col1, col2 FROM mytable;')
df = pd.DataFrame([...])
reader = get_reader(df)
reader = get_reader('myfile.xlsx', worksheet='Sheet2')
If the obj type cannot be determined automatically, users can
call the constructor methods directly.
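The dispatch described above can be pictured with a small stand-alone sketch. Everything in it is an assumption made for illustration: guess_constructor() and EXTENSION_TO_CONSTRUCTOR are hypothetical names, the checks are simplified, and the package's real detection logic may differ.

```python
import io
import os
import sqlite3

# Hypothetical mapping from file extension to constructor name.
EXTENSION_TO_CONSTRUCTOR = {
    '.csv': 'from_csv',
    '.xlsx': 'from_excel',
    '.xls': 'from_excel',
    '.dbf': 'from_dbf',
}


def guess_constructor(obj):
    """Return the name of the constructor a dispatcher might pick."""
    if isinstance(obj, str):
        # Strings are treated as paths; the extension decides the type.
        ext = os.path.splitext(obj)[1].lower()
        if ext in EXTENSION_TO_CONSTRUCTOR:
            return EXTENSION_TO_CONSTRUCTOR[ext]
    elif hasattr(obj, 'cursor'):
        return 'from_sql'  # looks like a DBAPI2 connection
    elif hasattr(obj, 'read'):
        return 'from_csv'  # looks like a file object
    raise TypeError('cannot determine type; call a constructor directly')


print(guess_constructor('myfile.xlsx'))
print(guess_constructor(sqlite3.connect(':memory:')))
print(guess_constructor(io.StringIO('a,b\n')))
```

In this sketch an unrecognized extension such as '.dat' raises TypeError, which corresponds to the case where a constructor method has to be called directly.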
Constructor Methods
get_reader.from_csv(csvfile, encoding='utf-8', dialect='excel', **kwds)
Return a reader object which will iterate over lines in the
given csvfile. The csvfile can be a string (treated as a
file path) or any object which supports the iterator protocol
and returns a string each time its __next__() method is
called; file objects and list objects are both suitable. If
csvfile is a file object, it should be opened with newline=''.
from get_reader import get_reader
reader = get_reader.from_csv('myfile.tab', delimiter='\t')
Using explicit file handling:
from get_reader import get_reader
with open('myfile.csv', newline='') as csvfile:
    reader = get_reader.from_csv(csvfile)
    for row in reader:
        print(', '.join(row))
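Since any iterable of strings is accepted, a plain list of lines behaves the same way as a file object. The sketch below shows that equivalence using the standard library's csv.reader(), which from_csv() is described as resembling; it does not call get_reader itself, and the sample data is made up.

```python
import csv
import io

# A list of strings: each item is one line of CSV text.
lines = ['col1,col2', 'a,b']

# An in-memory file object, created with newline='' as recommended
# for file objects that are handed to the csv module.
stream = io.StringIO('col1,col2\r\na,b\r\n', newline='')

rows_from_list = list(csv.reader(lines))
rows_from_stream = list(csv.reader(stream))

print(rows_from_list)
print(rows_from_list == rows_from_stream)
```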
get_reader.from_dicts(records, fieldnames=None)
Return a reader object which will iterate over the given
dictionary records. This can be thought of as converting a
csv.DictReader() into a plain, non-dictionary csv.reader().
from get_reader import get_reader
dictrows = [
    {'A': 1, 'B': 'x'},
    {'A': 2, 'B': 'y'},
]
reader = get_reader.from_dicts(dictrows)
This method assumes that record contents are consistent. If the first
record is a dictionary, it is assumed that all following records will
be dictionaries with matching keys.
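One way to picture this conversion is the stand-alone generator below. It is only a sketch: rows_from_dicts() is a hypothetical helper, and it assumes that a header row built from the keys comes first and that each row is a list, which may not match what from_dicts() actually returns.

```python
def rows_from_dicts(records, fieldnames=None):
    """Yield a header row, then one list of values per record."""
    iterator = iter(records)
    if fieldnames is None:
        first = next(iterator, None)
        if first is None:
            return  # no records, so there is nothing to yield
        # Take the field names from the first record's keys.
        fieldnames = list(first.keys())
        yield list(fieldnames)
        yield [first[key] for key in fieldnames]
    else:
        fieldnames = list(fieldnames)
        yield list(fieldnames)
    # All later records are assumed to share the same keys.
    for record in iterator:
        yield [record[key] for key in fieldnames]


dictrows = [
    {'A': 1, 'B': 'x'},
    {'A': 2, 'B': 'y'},
]
print(list(rows_from_dicts(dictrows)))
print(list(rows_from_dicts(dictrows, fieldnames=['B', 'A'])))
```

A record that lacks one of the expected keys raises KeyError in this sketch, which is where the consistency assumption becomes visible.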
get_reader.from_sql(connection, table_or_query)
Return a reader object which will iterate over the records
from a given database table or over the records returned from
a SQL query. The connection should be a DBAPI2 compatible
database connection and table_or_query must be a string
with a table name or a SQL query.
Read records from a specified table:
from get_reader import get_reader
connection = ...
reader = get_reader.from_sql(connection, 'mytable')
Read records from the results of a SQL query:
reader = get_reader.from_sql(connection, 'SELECT col1, col2 FROM mytable;')
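For readers unfamiliar with DBAPI2, the sketch below shows how rows and column names can be pulled from any compliant connection using only standard cursor methods. It uses an in-memory sqlite3 database with made-up data; rows_from_query() is a hypothetical helper, and emitting a header row first is an assumption, not a statement about what from_sql() does.

```python
import sqlite3


def rows_from_query(connection, query):
    """Yield a header row, then each result row, from a DBAPI2 connection."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        # cursor.description holds one sequence per column;
        # its first item is the column name.
        yield [column[0] for column in cursor.description]
        # fetchone() returns None once the result set is exhausted.
        for row in iter(cursor.fetchone, None):
            yield list(row)
    finally:
        cursor.close()


connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE mytable (col1 TEXT, col2 INTEGER)')
connection.executemany('INSERT INTO mytable VALUES (?, ?)',
                       [('a', 1), ('b', 2)])

result = list(rows_from_query(connection, 'SELECT col1, col2 FROM mytable;'))
print(result)
```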
get_reader.from_excel(path, worksheet=0)
Return a reader object which will iterate over lines in the given
Excel worksheet. The path must specify an XLSX or XLS file and
worksheet must specify the index or name of the worksheet to
load (defaults to the first worksheet).
Load first worksheet:
from get_reader import get_reader
reader = get_reader.from_excel('mydata.xlsx')
Specific worksheets can be loaded by name (a string) or index
(an integer):
reader = get_reader.from_excel('mydata.xlsx', 'Sheet 2')
get_reader.from_pandas(obj, index=True)
Return a reader object which will iterate over records in
a pandas DataFrame, Series, Index, or MultiIndex.
import pandas as pd
from get_reader import get_reader
df = pd.DataFrame(...)
reader = get_reader.from_pandas(df)
get_reader.from_dbf(filename, encoding=None, **kwds)
Return a reader object which will iterate over lines in the given
DBF file (from dBase, FoxPro, etc.).
from get_reader import get_reader
reader = get_reader.from_dbf('myfile.dbf')
get_reader.from_squint(obj, fieldnames=None)
Return a reader object which will iterate over the records returned
from a squint Select, Query, or Result. If the fieldnames
argument is not provided, this function tries to construct names
using the values from the underlying object.
import squint
from get_reader import get_reader
select = squint.Select(...)
reader = get_reader.from_squint(select)
class Reader(iterable, closefunc=<no value>)
An iterator which will produce rows from the given iterable. The
given iterable should produce non-string sequences. An optional
closefunc may be provided to close associated resources (files,
database cursors, etc.) once the reader is no longer needed—it will
be automatically called when:
- the iterable is exhausted
- exiting a with statement (if used as a context manager)
- the Reader is garbage collected
Reader.close()
Closes any associated resources (calls closefunc early):
from get_reader import Reader
reader = Reader(..., closefunc=...)
reader.close()
If the resources have already been closed, this method passes
without error.
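The closing behavior described for Reader can be illustrated with a minimal Python 3 class. This is a sketch written for this explanation, not the package's source: ClosingReader is a hypothetical name, and it uses None as the default for closefunc, which is an assumption.

```python
class ClosingReader(object):
    """Iterator that calls closefunc once when it is no longer needed."""

    def __init__(self, iterable, closefunc=None):
        self._closefunc = closefunc
        self._iterator = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._iterator)
        except StopIteration:
            self.close()  # the iterable is exhausted
            raise

    def close(self):
        # Clear the reference first so that repeated calls do nothing.
        closefunc, self._closefunc = self._closefunc, None
        if closefunc is not None:
            closefunc()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()  # leaving the with block

    def __del__(self):
        self.close()  # the object is being garbage collected


calls = []
data = [['a', 'b'], ['c', 'd']]
with ClosingReader(data, closefunc=lambda: calls.append('closed')) as reader:
    first = next(reader)  # stop early, before exhausting the reader

reader.close()  # already closed, so this passes without error
print(first, calls)
```

Swapping the stored function for None before calling it is what makes close() safe to call more than once.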
class ReaderLike()
An abstract class that can be used for type checking. Objects
will test as ReaderLike if they are one of the following:
- instance of the Reader class
- object returned by csv.reader()
- non-exhaustible iterable that produces non-string sequences
See the following examples:
>>> isinstance(get_reader(csvfile), ReaderLike)
True
>>> isinstance(csv.reader(csvfile), ReaderLike)
True
>>> list_of_lists = [['col1', 'col2'], ['a', 'b']]
>>> isinstance(list_of_lists, ReaderLike)
True
>>> list_of_strings = ['col1,col2', 'a,b']
>>> isinstance(list_of_strings, ReaderLike)
False
>>> list_of_sets = [{'col1', 'col2'}, {'a', 'b'}]
>>> isinstance(list_of_sets, ReaderLike)
False
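A content-based check of this kind can be built in Python 3 with a metaclass that overrides __instancecheck__(). The sketch below is one possible approach under stated assumptions, not the package's implementation: ReaderLikeSketch is a hypothetical name, it does not know about the Reader class, it inspects only the first item, and it treats an empty iterable as a non-match.

```python
import csv
from collections.abc import Sequence

# The type of objects returned by csv.reader().
_CSV_READER_TYPE = type(csv.reader([]))


class _ReaderLikeMeta(type):
    def __instancecheck__(cls, obj):
        if isinstance(obj, _CSV_READER_TYPE):
            return True
        try:
            iterator = iter(obj)
        except TypeError:
            return False  # not iterable at all
        if iterator is obj:
            return False  # exhaustible: peeking would consume a row
        missing = object()
        first = next(iterator, missing)
        if first is missing:
            return False  # empty; an arbitrary choice in this sketch
        return (isinstance(first, Sequence)
                and not isinstance(first, (str, bytes)))


class ReaderLikeSketch(metaclass=_ReaderLikeMeta):
    """Abstract class used only for isinstance() checks."""


print(isinstance([['col1', 'col2'], ['a', 'b']], ReaderLikeSketch))
print(isinstance(['col1,col2', 'a,b'], ReaderLikeSketch))
print(isinstance([{'col1', 'col2'}, {'a', 'b'}], ReaderLikeSketch))
```

Sets fail the check because they are not sequences, and strings are excluded explicitly even though they are sequences.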
Freely licensed under the Apache License, Version 2.0
(C) Copyright 2018 – 2019 Shawn Brown.