SQLDF - Structured Query Language (SQL) on DataFrames (DF)
A simple wrapper to run SQL (SQLite) queries on pandas.DataFrame objects (Python).
Requirements
- 'python' >= 3.5
- 'pandas' >= 1.0
Installation
With pip
(from PyPI repository):
pip install sqldf
Examples of use
- SELECT query with WHERE condition
import pandas as pd
import numpy as np
import sqldf
df = pd.DataFrame({'col1': ['A', 'B', np.NaN, 'C', 'D'], 'col2': ['F', np.NaN, 'G', 'H', 'I']})
query = """
SELECT *
FROM df
WHERE col_1 IS NOT NULL;
"""
df_view = sqldf.run(query)
- UPDATE query that change inplace a pd.Dataframe
import pandas as pd
import sqldf
url = ('https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv')
tips = pd.read_csv(url)
query = """
UPDATE tips
SET tip = tip*2
WHERE tip < 2;
"""
sqldf.run(query)
How it works
- It create a virtual in-memory SQLite3 database at runtime
- It convert the pd.DataFrame input(s) to SQL table(s)
- It proceed the SQL query on the table(s)
- It convert back the SQL table(s) to updated pd.DataFrame(s) if required
- It returns the result of the query if required