Python library for GraphBLAS: high-performance sparse linear algebra for scalable graph analytics.
For algorithms, see
graphblas-algorithms
.
Install
Install the latest version of Python-graphblas via conda:
$ conda install -c conda-forge python-graphblas
or pip:
$ pip install python-graphblas[default]
This will also install the SuiteSparse:GraphBLAS compiled C library.
We currently support the GraphBLAS C API 2.0 specification.
Optional Dependencies
The following are not required by python-graphblas, but may be needed for certain functionality to work.
pandas
– required for nicer __repr__
;matplotlib
– required for basic plotting of graphs;scipy
– used in io
module to read/write scipy.sparse
format;networkx
– used in io
module to interface with networkx
graphs;fast-matrix-market
- for faster read/write of Matrix Market files with gb.io.mmread
and gb.io.mmwrite
.
Description
Currently works with SuiteSparse:GraphBLAS, but the goal is to make it work with all implementations of the GraphBLAS spec.
The approach taken with this library is to follow the C-API 2.0 specification as closely as possible while making improvements
allowed with the Python syntax. Because the spec always passes in the output object to be written to, we follow the same,
which is very different from the way Python normally operates. In fact, many who are familiar with other Python data
libraries (numpy, pandas, etc) will find it strange to not create new objects for every call.
At the highest level, the goal is to separate output, mask, and accumulator on the left side of the assignment
operator =
and put the computation on the right side. Unfortunately, that approach doesn't always work very well
with how Python handles assignment, so instead we (ab)use the left-shift <<
notation to give the same flavor of
assignment. This opens up all kinds of nice possibilities.
This is an example of how the mapping works:
GrB_Matrix_mxm(M, mask, GrB_PLUS_INT64, GrB_MIN_PLUS_INT64, A, B, NULL)
M(mask.V, accum=binary.plus) << A.mxm(B, semiring.min_plus)
The expression on the right A.mxm(B)
creates a delayed object which does no computation. Once it is used in the
<<
expression with M
, the whole thing is translated into the equivalent GraphBLAS call.
Delayed objects also have a .new()
method which can be used to force computation and return a new
object. This is convenient and often appropriate, but will create many unnecessary objects if used in a loop. It
also loses the ability to perform accumulation with existing results. For best performance, following the standard
GraphBLAS approach of (1) creating the object outside the loop and (2) using the object repeatedly within each loop
is a much better approach, even if it doesn't feel very Pythonic.
Descriptor flags are set on the appropriate elements to keep logic close to what it affects. Here is the same call
with descriptor bits set. ttcsr
indicates transpose the first and second matrices, complement the structure of the mask,
and do a replacement on the output.
GrB_Matrix_mxm(M, mask, GrB_PLUS_INT64, GrB_MIN_PLUS_INT64, A, B, desc.ttcsr)
M(~mask.S, accum=binary.plus, replace=True) << A.T.mxm(B.T, semiring.min_plus)
The objects receiving the flag operations (A.T, ~mask, etc) are also delayed objects. They hold on to the state but
do no computation, allowing the correct descriptor bits to be set in a single GraphBLAS call.
If no mask or accumulator is used, the call looks like this:
M << A.mxm(B, semiring.min_plus)
The use of <<
to indicate updating is actually just syntactic sugar for a real .update()
method. The above
expression could be written as:
M.update(A.mxm(B, semiring.min_plus))
Operations
M(mask, accum) << A.mxm(B, semiring)
w(mask, accum) << A.mxv(v, semiring)
w(mask, accum) << v.vxm(B, semiring)
M(mask, accum) << A.ewise_add(B, binaryop)
M(mask, accum) << A.ewise_mult(B, binaryop)
M(mask, accum) << A.kronecker(B, binaryop)
M(mask, accum) << A.T
M(mask, accum) << A[rows, cols]
w(mask, accum) << A[rows, col_index]
w(mask, accum) << A[row_index, cols]
s = A[row_index, col_index].value
Assign
M(mask, accum)[rows, cols] << A
M(mask, accum)[rows, col_index] << v
M(mask, accum)[row_index, cols] << v
M(mask, accum)[rows, cols] << s
M[row_index, col_index] << s
del M[row_index, col_index]
Apply
M(mask, accum) << A.apply(unaryop)
M(mask, accum) << A.apply(binaryop, left=s)
M(mask, accum) << A.apply(binaryop, right=s)
Reduce
v(mask, accum) << A.reduce_rowwise(op)
v(mask, accum) << A.reduce_columnwise(op)
s(accum) << A.reduce_scalar(op)
s(accum) << v.reduce(op)
Creating new Vectors / Matrices
A = Matrix.new(dtype, num_rows, num_cols)
B = A.dup()
A = Matrix.from_coo([row_indices], [col_indices], [values])
New from delayed
Delayed objects can be used to create a new object using .new()
method
C = A.mxm(B, semiring).new()
Properties
size = v.size
nrows = M.nrows
ncols = M.ncols
nvals = M.nvals
rindices, cindices, vals = M.to_coo()
Initialization
There is a mechanism to initialize graphblas
with a context prior to use. This allows for setting the backend to
use as well as the blocking/non-blocking mode. If the context is not initialized, a default initialization will
be performed automatically.
import graphblas as gb
gb.init("suitesparse", blocking=True)
from graphblas import binary, semiring
from graphblas import Matrix, Vector, Scalar
Performant User Defined Functions
Python-graphblas requires numba
which enables compiling user-defined Python functions to native C for use in GraphBLAS.
Example customized UnaryOp:
from graphblas import unary
def force_odd_func(x):
if x % 2 == 0:
return x + 1
return x
unary.register_new("force_odd", force_odd_func)
v = Vector.from_coo([0, 1, 3], [1, 2, 3])
w = v.apply(unary.force_odd).new()
w
Similar methods exist for BinaryOp, Monoid, and Semiring.
Relation to other network analysis libraries
Python-graphblas aims to provide an efficient and consistent expression
of graph operations using linear algebra. This allows the development of
high-performance implementations of existing and new graph algorithms
(also see graphblas-algorithms
).
While end-to-end analysis can be done using python-graphblas
, users
might find that other libraries in the Python ecosystem provide a more
convenient high-level interface for data pre-processing and transformation
(e.g. pandas
, scipy.sparse
), visualization (e.g. networkx
, igraph
),
interactive exploration and analysis (e.g. networkx
, igraph
) or for
algorithms that are not (yet) implemented in graphblas-algorithms
(e.g.
networkx
, igraph
, scipy.sparse.csgraph
). To facilitate communication with
other libraries, graphblas.io
contains multiple connectors, see the
following section.
Import/Export connectors to the Python ecosystem
graphblas.io
contains functions for converting to and from:
import graphblas as gb
A = gb.io.from_scipy_sparse(m)
m = gb.io.to_scipy_sparse(m, format="csr")
A = gb.io.from_networkx(g)
g = gb.io.to_networkx(A)
v = gb.Vector.from_dense(m)
m = v.to_dense()
A = gb.Matrix.from_dense(m, missing_value=0)
m = A.to_dense(fill_value=0)