.. role:: python(code)
   :language: python

.. role:: bash(code)
   :language: bash

|docs| |pypi| |circleci| |mit|

Lore is a python framework to make machine learning approachable for Engineers and maintainable for Data Scientists.

Estimators from multiple packages are supported: `Keras <https://keras.io/>`_ (TensorFlow/Theano/CNTK), `XGBoost <https://xgboost.readthedocs.io/>`_ and `SciKit Learn <http://scikit-learn.org/stable/>`_. They can all be subclassed with build, fit or predict overridden to completely customize your algorithm and architecture, while still benefiting from everything else.

|model|

This example demonstrates nested transformers and how to use lore.io with a postgres database ``users`` table that has feature ``first_name`` and response ``has_subscription`` columns. If you don't want to create the database, you can follow a database free example app on medium_.

.. code-block:: bash

   $ pip install lore
   $ lore init my_app --python-version=3.6.4 --keras --xgboost --postgres

We'll naively try to predict whether users are subscribers, given their first name.

Update ``config/database.cfg`` to specify your database url:

.. code-block:: ini

   [MAIN]
   url: $DATABASE_URL

You can set environment variables for only the lore process with the ``.env`` file:

.. code-block:: bash

   DATABASE_URL=postgres://localhost:5432/development

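As a rough illustration of how the two files fit together, the sketch below parses ``KEY=VALUE`` lines the way a ``.env`` loader might, then expands ``$DATABASE_URL`` in the config value. ``load_env`` and ``database_url`` are hypothetical helpers written for this example using only the standard library; they are an assumption about the general mechanism, not Lore's implementation.

.. code-block:: python

   import configparser
   import os


   def load_env(text, environ):
       """Copy KEY=VALUE lines into environ, skipping blanks and # comments."""
       for line in text.splitlines():
           line = line.strip()
           if not line or line.startswith("#") or "=" not in line:
               continue
           key, value = line.split("=", 1)
           environ[key.strip()] = value.strip()


   def database_url(cfg_text, section="MAIN"):
       """Read `url` from the config text and expand $VARS from os.environ."""
       # interpolation=None keeps configparser from touching the value itself
       parser = configparser.ConfigParser(interpolation=None)
       parser.read_string(cfg_text)
       return os.path.expandvars(parser[section]["url"])


   # Hypothetical file contents, inlined so the example is self-contained
   load_env("DATABASE_URL=postgres://localhost:5432/development", os.environ)
   url = database_url("[MAIN]\nurl: $DATABASE_URL\n")
   print(url)

Keeping the url in an environment variable means the same ``config/database.cfg`` can be committed and shared, while each developer or deployment supplies its own value.
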
Create a sql file that specifies your data:

.. code-block:: sql

   -- my_app/extracts/subscribers.sql
   SELECT first_name, has_subscription
   FROM users
   LIMIT %(limit)s

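The ``%(limit)s`` placeholder is the named "pyformat" parameter style from the Python DB-API, filled from keyword arguments such as ``limit=100``. The sketch below mimics that flow against an in-memory SQLite database so it runs without postgres. Because ``sqlite3`` expects ``:name`` placeholders, the hypothetical ``run_extract`` helper rewrites them first; the helper and the table contents are made up for illustration and are not part of Lore.

.. code-block:: python

   import re
   import sqlite3


   def run_extract(conn, sql, **params):
       """Bind keyword arguments to %(name)s placeholders in an extract."""
       # sqlite3 uses :name rather than %(name)s, so translate before executing
       named_sql = re.sub(r"%\((\w+)\)s", r":\1", sql)
       return conn.execute(named_sql, params).fetchall()


   conn = sqlite3.connect(":memory:")
   conn.execute("CREATE TABLE users (first_name TEXT, has_subscription INTEGER)")
   conn.executemany(
       "INSERT INTO users VALUES (?, ?)",
       [("Ada", 1), ("Bob", 0), ("Cy", 1)],  # made-up rows
   )

   extract = "SELECT first_name, has_subscription FROM users LIMIT %(limit)s"
   rows = run_extract(conn, extract, limit=2)
   print(rows)

Binding the value as a parameter, rather than formatting it into the string, leaves quoting and escaping to the database driver.
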
Pipelines are the unsexy, but essential component of most machine learning applications. They transform raw data into encoded training (and prediction) data for a model. Lore has several features to make data munging more palatable.

.. code-block:: python

   import lore.io
   import lore.pipelines.holdout
   from lore.encoders import Norm, Discrete, Boolean, Unique
   from lore.transformers import NameAge, NameSex, Log


   class Holdout(lore.pipelines.holdout.Base):

       def get_data(self):
           # lore.io.main is a Connection created by config/database.cfg + DATABASE_URL
           # dataframe() supports keyword args for interpolation (limit)
           # subscribers is the name of the extract
           # cache=True enables LRU query caching
           return lore.io.main.dataframe(filename='subscribers', limit=100, cache=True)

       def get_encoders(self):
           # An arbitrarily chosen set of encoders (w/ transformers)
           # that reference sql columns in the extract by name.
           # A fair bit of thought will probably go into expanding
           # your list with features for your model.
           return (
               Unique('first_name', minimum_occurrences=100),
               Norm(Log(NameAge('first_name'))),
               Discrete(NameSex('first_name'), bins=10),
           )

       def get_output_encoder(self):
           # A single encoder that references the predicted outcome
           return Boolean('has_subscription')

The superclass :python:`lore.pipelines.holdout.Base` will take care of splitting the data into training and test sets and encoding it for the model.

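Holdout splitting and encoder fitting can be pictured with a small stand-alone sketch. Everything here is hypothetical and written for illustration: ``split_holdout`` and ``UniqueEncoder`` are not Lore classes, the rows are made up, and the only point is that the encoder learns its vocabulary from the training rows and is then applied unchanged to the held-out rows.

.. code-block:: python

   import random
   from collections import Counter


   def split_holdout(rows, test_fraction=0.2, seed=0):
       """Shuffle rows reproducibly and hold out a fraction for testing."""
       shuffled = list(rows)
       random.Random(seed).shuffle(shuffled)
       cut = int(len(shuffled) * (1 - test_fraction))
       return shuffled[:cut], shuffled[cut:]


   class UniqueEncoder:
       """Map each frequent value to an integer id; 0 means unseen or rare."""

       def __init__(self, column, minimum_occurrences=1):
           self.column = column
           self.minimum_occurrences = minimum_occurrences
           self.ids = {}

       def fit(self, rows):
           counts = Counter(row[self.column] for row in rows)
           frequent = sorted(
               value for value, n in counts.items()
               if n >= self.minimum_occurrences
           )
           self.ids = {value: i + 1 for i, value in enumerate(frequent)}
           return self

       def transform(self, rows):
           return [self.ids.get(row[self.column], 0) for row in rows]


   names = ["Ada", "Bob", "Ada", "Cy", "Ada", "Bob", "Ada", "Bob", "Ada", "Dee"]
   rows = [{"first_name": name, "has_subscription": name == "Ada"} for name in names]

   train, test = split_holdout(rows)
   # Fit on training rows only, so nothing about the holdout leaks into the encoding
   encoder = UniqueEncoder("first_name", minimum_occurrences=2).fit(train)
   encoded_test = encoder.transform(test)

Reserving id 0 for rare or unseen values is one common way to keep prediction from failing on names that never appeared during training.
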
Define some models that will fit and predict the data. Base models are designed to be extended and overridden, but work with defaults out of the box.

.. code-block:: python

   import lore.models.keras
   import lore.models.xgboost
   import lore.estimators.keras
   import lore.estimators.xgboost

   from my_app.pipelines.subscribers import Holdout


   class DeepName(lore.models.keras.Base):
       def __init__(self):
           super(DeepName, self).__init__(
               pipeline=Holdout(),
               estimator=lore.estimators.keras.BinaryClassifier()  # a canned estimator for deep learning
           )


   class BoostedName(lore.models.xgboost.Base):
       def __init__(self):
           super(BoostedName, self).__init__(
               pipeline=Holdout(),
               estimator=lore.estimators.xgboost.Base()  # a canned estimator for XGBoost
           )

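The general shape of "a model combines a pipeline with an estimator, and subclasses override only what they need" can be shown with toy classes. The sketch below is an assumption about the pattern only: ``BaseModel``, ``ListPipeline``, ``MajorityEstimator`` and ``LoudModel`` are hypothetical stand-ins and do not reflect Lore's internals.

.. code-block:: python

   class MajorityEstimator:
       """A stand-in estimator that always predicts the most common label."""

       def fit(self, x, y):
           labels = list(y)
           self.prediction = max(set(labels), key=labels.count)
           return self

       def predict(self, x):
           return [self.prediction for _ in x]


   class ListPipeline:
       """A stand-in pipeline that already holds encoded training data."""

       def __init__(self, x, y):
           self.x, self.y = x, y


   class BaseModel:
       """Combine a pipeline with an estimator behind fit() and predict()."""

       def __init__(self, pipeline, estimator):
           self.pipeline = pipeline
           self.estimator = estimator

       def fit(self):
           self.estimator.fit(self.pipeline.x, self.pipeline.y)
           return self

       def predict(self, x):
           return self.estimator.predict(x)


   class LoudModel(BaseModel):
       """Override one method and inherit the rest."""

       def predict(self, x):
           return [str(label).upper() for label in super().predict(x)]


   pipeline = ListPipeline([[1], [2], [3]], ["yes", "no", "yes"])
   model = LoudModel(pipeline, MajorityEstimator())
   predictions = model.fit().predict([[4], [5]])
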
Test the models' predictive power:

.. code-block:: python

   import unittest

   from my_app.models.subscribers import DeepName, BoostedName


   class TestSubscribers(unittest.TestCase):
       def test_deep_name(self):
           model = DeepName()  # initialize a new model
           model.fit(epochs=20)  # fit to the pipeline's training_data
           predictions = model.predict(model.pipeline.test_data)  # predict the holdout
           self.assertEqual(list(predictions), list(model.pipeline.encoded_test_data.y))  # hah!

       def test_xgboosted_name(self):
           model = BoostedName()
           model.fit()
           predictions = model.predict(model.pipeline.test_data)
           self.assertEqual(list(predictions), list(model.pipeline.encoded_test_data.y))  # hah hah hah!

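Exact equality between predictions and labels, as asserted above, is tongue in cheek: a real model will rarely score 100% on a holdout. A more forgiving check, sketched here with a hypothetical ``accuracy`` helper and made-up values, asserts a minimum accuracy instead. The 0.7 threshold is an arbitrary assumption for the example.

.. code-block:: python

   import unittest


   def accuracy(predictions, actual):
       """Fraction of predictions that match the actual labels."""
       pairs = list(zip(predictions, actual))
       return sum(p == a for p, a in pairs) / len(pairs)


   class TestAccuracy(unittest.TestCase):
       def test_beats_threshold(self):
           predictions = [1, 0, 1, 1]  # made-up model output
           actual = [1, 0, 0, 1]       # made-up holdout labels
           self.assertGreaterEqual(accuracy(predictions, actual), 0.7)


   score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
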
Run tests:

.. code-block:: bash

   $ lore test

Experiment and tune :bash:`notebooks/` with :bash:`$ lore notebook` using the app kernel.

.. code-block::

   ├── .env.template            <- Template for environment variables for developers (mirrors production)
   ├── README.md                <- The top-level README for developers using this project.
   ├── requirements.txt         <- keeps dev and production in sync (pip)
   ├── runtime.txt              <- keeps dev and production in sync (pyenv)
   │
   ├── data/                    <- query cache and other temp data
   │
   ├── docs/                    <- generated from src
   │
   ├── logs/                    <- log files per environment
   │
   ├── models/                  <- local model store from fittings
   │
   ├── notebooks/               <- explorations of data and models
   │   └── my_exploration/
   │       └── exploration_1.ipynb
   │
   ├── appname/                 <- python module for appname
   │   ├── __init__.py          <- loads the various components (makes this a module)
   │   │
   │   ├── api/                 <- external entry points to runtime models
   │   │   └── my_project.py    <- hub endpoint for predictions
   │   │
   │   ├── extracts/            <- sql
   │   │   └── my_project.sql
   │   │
   │   ├── estimators/          <- Code that makes predictions
   │   │   └── my_project.py    <- Keras/XGBoost implementations
   │   │
   │   ├── models/              <- Combine estimator(s) w/ pipeline(s)
   │   │   └── my_project.py
   │   │
   │   └── pipelines/           <- abstractions for processing data
   │       └── my_project.py    <- train/test/split data encoding
   │
   └── tests/
       ├── data/                <- cached queries for fixture data
       ├── models/              <- model store for test runs
       └── unit/                <- unit tests

Lore provides python modules to standardize Machine Learning techniques across multiple libraries.

Models wrap your favorite library: `Keras <https://keras.io/>`__, `XGBoost <https://xgboost.readthedocs.io/>`__, `SciKit Learn <http://scikit-learn.org/stable/>`__. They come with reasonable defaults for rough draft training out of the box.

Use your favorite library in a lore project, just like you'd use them in any other python project. They'll play nicely together.

* `Keras <https://keras.io/>`__ (TensorFlow/Theano/CNTK) + `Tensorboard <https://www.tensorflow.org/programmers_guide/summaries_and_tensorboard>`__
* `XGBoost <https://xgboost.readthedocs.io/>`__
* `SciKit-Learn <http://scikit-learn.org/stable/>`__
* `Jupyter Notebook <http://jupyter.org/>`__
* `Pandas <https://pandas.pydata.org/>`__
* `Numpy <http://www.numpy.org/>`__
* `Matplotlib <https://matplotlib.org/>`__, `ggplot <http://ggplot.yhathq.com/>`__, `plotnine <http://plotnine.readthedocs.io/en/stable/>`__
* `SQLAlchemy <https://www.sqlalchemy.org/>`__, `Psycopg2 <http://initd.org/psycopg/docs/>`__

`There are many ways to manage python dependencies in development and production <http://docs.python-guide.org/en/latest/starting/installation/>`_, and each has its own pitfalls. Lore codifies a solution that “just works” with :bash:`lore install`, which exactly replicates what will be run in production.

Python compatibility

Heroku_ buildpack compatibility (CircleCI_, Domino_)

Environment Specific Configuration

* :python:`logging.getLogger(__name__)` is set up to log to console, file and/or syslog depending on environment

Multiple concurrent project compatibility

Binary library installation for MAXIMUM SPEED

IO

* :python:`lore.io.connection.Connection.select()` and :python:`Connection.dataframe()` can be automatically LRU cached to disk
* :python:`Connection` supports python %(name)s variable replacement in SQL
* :python:`Connection` statements are always annotated with metadata for pgHero
* :python:`Connection` is lazy, for fast startup, and avoids bootup errors in development with low connectivity
* :python:`Connection` supports multiple concurrent database connections

Serialization

Caching

Encoders

Transformers

Base Models

Fitting

Keras/Tensorflow

Utils

* :python:`lore.util.timer` context manager writes to the log in development or librato in production
* :python:`lore.util.timed` is a decorator for recording function execution wall time

.. code-block:: bash

   $ lore server     # start an api process
   $ lore console    # launch a console in your virtual env
   $ lore notebook   # launch jupyter notebook in your virtual env
   $ lore fit MODEL  # train the model
   $ lore generate [scaffold, model, estimator, pipeline, notebook, test] NAME
   $ lore init [project]  # create file structure
   $ lore install    # setup dependencies in virtualenv
   $ lore test       # make sure the project is in working order
   $ lore pip        # launch pip in your virtual env
   $ lore python     # launch python in your virtual env

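The timing helpers listed under Utils can be approximated with the standard library. The sketch below defines local ``timer`` and ``timed`` functions that share those names but are written for this example; they only log wall time, and reporting to librato is left out. Treat it as an assumption about the idea rather than Lore's code.

.. code-block:: python

   import functools
   import logging
   import time
   from contextlib import contextmanager

   logger = logging.getLogger(__name__)


   @contextmanager
   def timer(message):
       """Log the wall time spent inside the with-block."""
       start = time.perf_counter()
       try:
           yield
       finally:
           # Runs even if the block raises, so failures are timed too
           logger.info("%s: %.3fs", message, time.perf_counter() - start)


   def timed(func):
       """Decorator that logs the wall time of each call to func."""
       @functools.wraps(func)
       def wrapper(*args, **kwargs):
           with timer(func.__name__):
               return func(*args, **kwargs)
       return wrapper


   @timed
   def add(a, b):
       return a + b


   with timer("sleeping"):
       time.sleep(0.01)
   total = add(1, 2)
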
.. |docs| image:: https://readthedocs.org/projects/lore-machine-learning/badge/?version=latest
   :alt: Documentation Status
   :scale: 100%
   :target: http://lore-machine-learning.readthedocs.io/en/latest/?badge=latest

.. |pypi| image:: https://badge.fury.io/py/lore.svg
   :alt: Pip Package Status
   :scale: 100%
   :target: https://pypi.python.org/pypi/lore

.. |circleci| image:: https://circleci.com/gh/instacart/lore.png?style=shield&circle-token=54008e55ae13a0fa354203d13e7874c5efcb19a2
   :alt: Build Status
   :scale: 100%
   :target: https://circleci.com/gh/instacart/lore

.. |mit| image:: https://img.shields.io/badge/License-MIT-blue.svg
   :alt: MIT License
   :scale: 100%
   :target: https://opensource.org/licenses/MIT

.. |model| image:: https://raw.githubusercontent.com/instacart/lore/master/docs/images/model.png
   :alt: Anatomy of a lore model throughout its lifecycle
   :scale: 100%
   :target: http://lore-machine-learning.readthedocs.io/en/latest/

.. _Heroku: https://heroku.com/
.. _CircleCI: https://circleci.com/
.. _Domino: https://www.dominodatalab.com/
.. _loggly: https://www.loggly.com/
.. _librato: https://www.librato.com/
.. _rollbar: https://rollbar.com/
.. _medium: https://tech.instacart.com/how-to-build-a-deep-learning-model-in-15-minutes-a3684c6f71e