Status
FLUENTQL IS WORK IN PROGRESS. WHILE IT IS AVAILABLE ON PYPI, USE IT WITH CAUTION. PUBLIC APIS MAY CHANGE UNEXPECTEDLY WHILE FEATURES ARE ADDED AT PACE.
Introduction
FluentQL is a lightweight, fast and intuitive Python SQL query builder. It helps you build cross-dialect queries straight in your code, as well as test and validate your queries.
Using SQLAlchemy and Pandas? FluentQL seamlessly integrates with your existing codebase, providing type checking capabilities out of the box.
Getting Started
FluentQL requires Python 3.5 and up.
pip install fluentql
Quick overview
from fluentql import GenericSQLDialect, Q, Table
table1 = Table("table1")
table2 = Table("table2")
query = (
    Q.select(table1.all(), table2["col_x"])
    .set_from(table1)
    .left_join(table2, lambda q: q.on(table1["id"] == table2["id"]))
    .order_by(table1["col_y"])
    .fetch(10)
    .distinct()
)
compiled_query = query.compile(GenericSQLDialect)
print(compiled_query)
will output
select distinct table1.*, table2.col_x from table1 left join table2 on table1.id = table2.id order by col_y limit 10;
Supported SQL Dialects
There are many varieties of SQL out there, and while FluentQL comes with popular dialects implemented out of the box, you might need additional functionality for your use case. This library was built with extensibility in mind, so you can easily extend an existing Dialect implementation or implement your own. Not only that, but you can roll out your own custom functions with ease.
FluentQL includes the following dialect implementations:
- Generic SQL - should provide compatibility with a variety of dialects for basic queries
- Microsoft SQL (coming soon)
- PostgreSQL (coming soon)
- Teradata (coming soon)
Usage
Overview
FluentQL comes with a built-in runtime type-checking system based on Python's typing module and PEP 484 type annotations. By default, however, all types are Any, giving users control over how strictly type checking is applied.
If type checking is used, errors are raised as queries are constructed whenever types don't match, such as when adding a string column to a number column, or when applying SUM to a Date column.
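For example, assuming the concrete column types described below, a mismatch is reported as soon as the expression is built (a minimal sketch; the exact error message depends on the library):
from fluentql.types import NumberColumn, StringColumn
price = NumberColumn("price")
title = StringColumn("title")
total = price + price # fine: both operands hold numbers
broken = price + title # raises TypeError: a string column cannot be added to a number column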
The core logical units of fluentql are:
- Column
- Table
- Function
- Query
- Dialect
Column
A column has a name and a type and it is bound to a table. The base class Column should never be instantiated on its own; the correct way to create a column that holds a numeric type is as follows:
from fluentql.base_types import Collection
from fluentql.types import Column, NumberType
class NumberColumn(Collection[NumberType], Column):
    pass
Or just use fluentql.types.NumberColumn.
The generic type Collection provides the functionality required by the type checking engine and is simply a subclass of typing.Generic.
FluentQL comes with a number of concrete column types in fluentql.types:
- AnyColumn
- BooleanColumn
- DateColumn
- DateTimeColumn
- NumberColumn
- StringColumn
- TimeColumn
Those implementations are based on their base type counterparts found in fluentql.base_types:
- BooleanType
- DateType
- DateTimeType
- NumberType
- StringType
- TimeType
Table
Tables are used as targets for Queries and as containers for columns. That does not mean you need to define a schema, as you will see below. Tables can have schemas, but by default they hold any column:
from fluentql import Table
books = Table("books")
In this case, books is a table that will return AnyColumn columns of a given name when asked, like so:
book_id = books["id"]
title = books["title"]
However, tables can have schemas. Schemas can be defined using PEP484 type hints, like so:
from fluentql import Table
from fluentql.types import BooleanColumn, NumberColumn, StringColumn
class Books(Table):
    id: NumberColumn
    title: StringColumn
    is_available: BooleanColumn

    def __init__(self, db=None):
        super().__init__("books", db)
books = Books()
Then, reference the columns as normal:
title = books["title"]
With a schema defined, however, this raises a KeyError:
release_date = books["release_date"]
Using SQLAlchemy Models [TODO]
from fluentql import Table
from project.models import MySqlAlchemyModel
table = Table.from_model(MySqlAlchemyModel)
The table name and schema are automatically copied from the model class.
Functions
Functions are the most important units in FluentQL. They implement (semantically only) various pieces of SQL functionality and are the main interface between queries and the type checking system.
The base class for all functions is fluentql.F. The theory is:
- a function takes 0 or more arguments of a given type
- a function returns a typed value
Let's look at a couple of examples. First, let's imagine a function CoolFunction that takes two Boolean columns as arguments and returns a Boolean column:
from fluentql import F
from fluentql.base_types import BooleanType, Collection
class CoolFunction(F):
    a: Collection[BooleanType]
    b: Collection[BooleanType]
    returns: Collection[BooleanType]
Note: the arguments a and b must be declared in the order in which you expect the function to be called; their names, however, are irrelevant. The only name that matters here is returns, which is always used as the return type.
Remember, there is no functionality associated with CoolFunction; its purpose is not to be called with two booleans and return a value after applying some logic - the snippet above is all there is to it. However, let's assume that our hypothetical SQL dialect actually has this function, and that we'd call it like so:
select * from my_table where CoolFunction(boolcol1, boolcol2) = True;
FluentQL wants to tell you when the query you're building doesn't really make sense (meaning it's not type safe).
class Books(Table):
    id: NumberColumn
    title: StringColumn
    is_on_sale: BooleanColumn
    is_sold_out: BooleanColumn
books = Books()
So, while this is OK:
CoolFunction(books["is_on_sale"], books["is_sold_out"])
this is not:
CoolFunction(books["is_on_sale"], books["title"]) # raises TypeError
and neither is this:
CoolFunction(books["is_on_sale"]) # raises TypeError
Let's take this one step further and look at a function Add that takes either scalars or columns of type Number or String:
from typing import TypeVar, Union

T = TypeVar("T", NumberType, StringType)

class Add(F):
    arg1: Union[T, Collection[T]]
    arg2: Union[T, Collection[T]]
    returns: Union[T, Collection[T]]
This behaves as you'd expect a TypeVar to behave: you can pass it two Numbers, two NumberColumns, or one of each (and likewise for Strings and StringColumns). However, the return type is quite vague, so let's enhance this.
Note: when an F is instantiated, all the types are checked and run through the typing engine. Each instance of F has a type_checker object which holds details about the types that were matched, including a mapping from all the TypeVars found in the expected argument types to the concrete types matched against the given values. Explore fluentql.type_checking.TypeChecker to get a better feel for it.
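As a minimal sketch (assuming only the type_checker attribute mentioned above), the checker on a constructed call can be inspected directly, e.g. in a REPL:
nc = NumberColumn("nc")
add_call = Add(nc, 100)
add_call.type_checker # a fluentql.type_checking.TypeChecker instance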
Instead of passing a type annotation for returns, we can define it as a method that takes matched_types and type_var_mapping as its arguments.
matched_types is a list of types that the given arguments matched against the expected type definition. Effectively, if our argument of type A is matched against Union[A, B], the matched type for it will be A. The reference here is against the expected type, not the given type, so if we instead saw Union[AS, B], where AS is a superclass of A, the matched type would have been AS. There's obviously a bit more to it, but it's easy to think of it as a Union unpacker.
type_var_mapping is a dict where keys are TypeVars and each value is a tuple of two elements: the first is a list of argument indices, showing which arguments in our list were actually matched against that TypeVar; the second is the concrete type that resolves the TypeVar for that specific function call.
So, back to our example, we can be a bit smarter with our Add function and determine exactly what it will return, like so:
class Add(F):
    arg1: Union[T, Collection[T]]
    arg2: Union[T, Collection[T]]

    def returns(self, matched_types, type_var_mapping):
        # concrete type that T resolved to for this call
        t_type = type_var_mapping[T][1]

        # if any argument matched a column (Collection), return a column type
        if any(Collection in t.__mro__ for t in matched_types if hasattr(t, "__mro__")):
            return Collection[t_type]

        # otherwise, return the scalar type
        return t_type
Now, we can observe:
nc1 = NumberColumn("nc1")
nc2 = NumberColumn("nc2")
a1 = Add(nc1, nc2) # return type: Collection[NumberType] - both arguments are columns
a2 = Add(nc1, 100) # return type: Collection[NumberType] - at least one argument is a column
a3 = Add(200, 300) # return type: NumberType - scalars only
a4 = Add(nc1, "abc") # raises TypeError - T cannot resolve to both NumberType and StringType
Where are functions used? All operators are implemented as functions (Add, Subtract, Multiply, Divide, Modulo, BitwiseAnd, etc.). Beyond that, any SQL function (such as Max, Min or Count) should be implemented as a subclass of F.
List of all included functions (in fluentql.function):
- Add
- Subtract
- Multiply
- Divide
- Modulo
- BitwiseOr
- BitwiseAnd
- BitwiseXor
- Equals
- LessThan
- LessThanOrEqual
- GreaterThan
- GreaterThanOrEqual
- NotEqual
- Not
- As (used for aliases)
- TableStar
- Star
- Like
- In
- Max
- Min
- Sum
- Asc
- Desc
Column objects also come with a comprehensive implementation of the Python data model to facilitate a simpler, more expressive syntax:
col1 = NumberColumn("col1")
col2 = NumberColumn("col2")
col1 + col2
col1 - col2
col1 == col2
col1 < col2
col1.max()
col1.isin(col2)
col1.asc()
and more.
Query
The fluentql.query.Query class - or, more simply, fluentql.Q - is what users will mostly interact with: it is the interface to the query builder itself.
Each instance of Q has a _command attribute, which is a value of QueryCommands. The possible commands are:
- SELECT
- INSERT
- UPDATE
- DELETE
- CREATE
- DROP
- WHERE
- ON
- JOIN
- HAVING
The first six match actual SQL statements, while the last four are synthetic sub-query types used as containers in certain circumstances.
There are two main ideas to bear in mind:
- Each core method of Q is only set to execute for a particular set of QueryCommands (e.g. where() will only work for SELECT and DELETE statements)
- Each statement is initialised through a classmethod which returns an instance of Q: Q.select() returns a Q with _command = QueryCommands.SELECT, and so on (a short sketch follows)
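A minimal sketch of that classmethod pattern, using only methods shown elsewhere in this document:
q1 = Q.select() # a Q with _command = QueryCommands.SELECT
q2 = Q.delete() # a Q with _command = QueryCommands.DELETE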
SELECT
Basic use
Q.select().set_from(table)
Q.select(table["col1"]).from_(table)
Calling select with no arguments returns a select *.
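To see the generated SQL, compile the query with a dialect, just like in the overview; assuming table is a Table("table") and GenericSQLDialect is imported as shown there, the output should be along the lines of the comment below (the exact string may vary):
query = Q.select().set_from(table)
print(query.compile(GenericSQLDialect)) # e.g. select * from table;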
Where
Q.select().set_from(table).where(table["col1"] < table["col2"])
Q.select().set_from(table).where(table["col1"] > table["col2"]).or_where(lambda q: q.where(table["col3"].like("%abc")).and_where(table["col4"] != "XYZ"))
Pass a lambda function to where, and_where or or_where to nest conditions.
Join
Q.select().set_from(table).inner_join(table2, lambda q: q.on(table["col1"] == table2["col3"]))
Q.select().set_from(table).left_join(table2, lambda q: q.on(table["col1"] == table2["col3"]).and_on(table["col1"] % 2 == 0))
Q.select().set_from(table).right_join(table2, lambda q: q.using("id"))
Available join methods:
- inner_join
- outer_join
- left_join
- right_join
- cross_join
Group By
Q.select(table["col1"]).set_from(table).group_by(table["col1"])
Q.select(table["col2"], table["col3"], table["col1"].max()).set_from(table).group_by(table["col2"], table["col3"])
Having
Q.select(table["col1"]).set_from(table).group_by(table["col1"]).having(table["col1"] < 20).and_having(table["col1"] % 2 == 1)
The having, and_having and or_having methods work in a similar way to the where methods, in that they can also take a lambda as an argument to nest conditions.
Order By
Q.select().set_from(table).order_by(table["col1"])
Q.select().set_from(table).order_by(table["col1"].asc())
Q.select().set_from(table).order_by(table["col1"], table["col2"].desc())
Use Column.asc and Column.desc to mark ordering in an order_by clause.
Fetch and Skip
Q.select().set_from(table).fetch(100)
Q.select().set_from(table).skip(30)
Q.select().set_from(table).fetch(100).skip(30)
Delete
Delete queries may only have where clauses:
Q.delete().set_from(table)
Q.delete().set_from(table).where(table["col1"] == "val")
Insert
Coming soon
Create
Coming soon
Drop
Coming soon