formulaic

An implementation of Wilkinson formulas.

  • 1.0.2
  • PyPI

Formulaic

Formulaic is a high-performance implementation of Wilkinson formulas for Python.

It provides:

  • high-performance dataframe to model-matrix conversions.
  • support for reusing the encoding choices made during conversion of one dataset on other datasets.
  • extensible formula parsing.
  • extensible data input/output plugins, with implementations for:
    • input:
      • pandas.DataFrame
      • pyarrow.Table
    • output:
      • pandas.DataFrame
      • numpy.ndarray
      • scipy.sparse.csc_matrix
  • support for symbolic differentiation of formulas (and hence model matrices).
  • and much more.

Example code

import pandas
from formulaic import Formula

df = pandas.DataFrame({
    'y': [0, 1, 2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

y, X = Formula('y ~ x + z').get_model_matrix(df)

y =

       y
    0  0
    1  1
    2  2

X =

       Intercept  x[T.B]  x[T.C]    z
    0        1.0       0       0  0.3
    1        1.0       1       0  0.1
    2        1.0       0       1  0.2

Note that the above can be short-handed to:

from formulaic import model_matrix
model_matrix('y ~ x + z', df)

Benchmarks

Formulaic typically outperforms R for both dense and sparse model matrices, and vastly outperforms patsy (the existing implementation for Python) for dense matrices (patsy does not support sparse model matrix output).

For more details, see here.

Related projects and prior art

  • Patsy: a prior implementation of Wilkinson formulas for Python, which is widely used (e.g. in statsmodels). It has fantastic documentation (which helped bootstrap this project), and a rich array of features.
  • StatsModels.jl @formula: The implementation of Wilkinson formulas for Julia.
  • R Formulas: The implementation of Wilkinson formulas for R, which is thoroughly introduced here. [R itself is an implementation of S, in which formulas were first made popular].
  • The work that started it all: Wilkinson, G. N., and C. E. Rogers. Symbolic description of factorial models for analysis of variance. J. Royal Statistical Society 22, pp. 392–399, 1973.

Used by

Below are some of the projects that use Formulaic:

  • Glum (High performance Python GLMs with all the features).
  • Lifelines (Survival analysis in Python).
  • Linearmodels (Additional linear models including instrumental variable and panel data models that are missing from statsmodels).
  • Pyfixest (Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax).
  • Tabmat (Efficient matrix representations for working with tabular data).
  • Add your project here!
