Frouros is a Python library for drift detection in machine learning systems that provides a combination of classical and more recent algorithms for both concept and data drift detection.
"Everything changes and nothing stands still"
"You could not step twice into the same river"
Heraclitus of Ephesus (535-475 BCE)
⚡️ Quickstart
🔄 Concept drift
As a quick example, we can take the breast cancer dataset, induce concept drift in it, and show how a concept drift detector such as DDM (Drift Detection Method) reacts. The drift is simulated by flipping roughly half of the labels in the last 20% of the test stream, which also shows how concept drift degrades performance in terms of accuracy.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from frouros.detectors.concept_drift import DDM, DDMConfig
from frouros.metrics import PrequentialError
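# Fix the global NumPy seed so the induced drift is reproducible
np.random.seed(seed=31)
X, y = load_breast_cancer(return_X_y=True)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y, train_size=0.7, random_state=31)
# Model: standardization followed by logistic regression
pipeline = Pipeline(
    [
        ("scaler", StandardScaler()),
        ("model", LogisticRegression()),
    ]
)
pipeline.fit(X=X_train, y=y_train)
# Detector configuration and instantiation
config = DDMConfig(
    warning_level=2.0,
    drift_level=3.0,
    min_num_instances=25,  # minimum number of instances before checking for drift
)
detector = DDM(config=config)
# Metric used to report accuracy (alpha=1.0 gives the plain cumulative error rate)
metric = PrequentialError(alpha=1.0)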
def stream_test(X_test, y_test, metric, detector):
    """Simulate a data stream over X_test and y_test using the fitted pipeline."""
    drift_flag = False
    for i, (x, y_true) in enumerate(zip(X_test, y_test)):
        y_pred = pipeline.predict(x.reshape(1, -1))
        # Error indicator: 0 if the prediction is correct, 1 otherwise
        error = 1 - (y_pred.item() == y_true.item())
        metric_error = metric(error_value=error)
        # Feed the error to the detector and check its status
        _ = detector.update(value=error)
        status = detector.status
        if status["drift"] and not drift_flag:
            drift_flag = True
            print(f"Concept drift detected at step {i}. Accuracy: {1 - metric_error:.4f}")
    if not drift_flag:
        print("No concept drift detected")
    print(f"Final accuracy: {1 - metric_error:.4f}\n")
# Stream the original test set (no drift)
stream_test(
    X_test=X_test,
    y_test=y_test,
    metric=metric,
    detector=detector,
)
# Induce concept drift: flip roughly half of the labels in the last 20% of the test set
drift_size = int(y_test.shape[0] * 0.2)
y_test_drift = y_test[-drift_size:].copy()
modify_idx = np.random.rand(*y_test_drift.shape) <= 0.5
y_test_drift[modify_idx] = (y_test_drift[modify_idx] + 1) % len(np.unique(y_test))
y_test[-drift_size:] = y_test_drift
# Reset the detector and the metric before streaming the drifted test set
detector.reset()
metric.reset()
stream_test(
    X_test=X_test,
    y_test=y_test,
    metric=metric,
    detector=detector,
)
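For intuition about what warning_level, drift_level and min_num_instances control, the DDM rule described by Gama et al. (2004) can be sketched from scratch in plain Python. This is a simplified illustration, not Frouros's implementation: the class SimpleDDM, its string return values and the synthetic error stream are hypothetical, and the sketch does not reset its statistics after signalling drift. DDM tracks the running error rate p_i and its standard deviation s_i = sqrt(p_i * (1 - p_i) / i), remembers where p_i + s_i was smallest (p_min, s_min), and signals a warning or a drift when p_i + s_i exceeds p_min + k * s_min, with k equal to warning_level or drift_level.

```python
import math


class SimpleDDM:
    """Hypothetical, simplified sketch of the DDM decision rule (not Frouros's code)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_num_instances=25):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_num_instances = min_num_instances
        self.n = 0            # number of processed instances
        self.p = 0.0          # running error rate p_i
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Process one error indicator (0 = correct, 1 = wrong) and return a status string."""
        self.n += 1
        # Incremental mean of the error indicator
        self.p += (error - self.p) / self.n
        # Standard deviation of a binomial proportion
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_num_instances:
            return "in-control"
        # Remember the point where p_i + s_i was smallest
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "in-control"


# Hypothetical error stream: 10% errors for 100 steps, then every prediction is wrong
errors = [1 if i % 10 == 0 else 0 for i in range(100)] + [1] * 100
detector = SimpleDDM()
first_drift = next(
    (i for i, e in enumerate(errors) if detector.update(e) == "drift"), None
)
print(first_drift)  # prints 106
```

While the error rate stays around 10%, p_i + s_i never crosses the thresholds; once every prediction is wrong, the rate climbs past p_min + 3 * s_min within a few steps.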
More concept drift examples can be found here.
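The accuracy printed in the concept drift example is 1 - metric_error, where metric_error comes from PrequentialError(alpha=1.0). Assuming alpha acts as a fading factor, as in the usual prequential-error formulation (past errors are discounted by alpha at every step), the metric can be sketched as follows. FadingPrequentialError is a hypothetical name, not part of Frouros: with alpha=1.0 it reduces to the plain cumulative error rate, while alpha < 1 weights recent errors more heavily.

```python
class FadingPrequentialError:
    """Hypothetical sketch of a prequential error with fading factor alpha."""

    def __init__(self, alpha=1.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.weighted_errors = 0.0  # S_t = e_t + alpha * S_{t-1}
        self.weighted_count = 0.0   # N_t = 1 + alpha * N_{t-1}

    def __call__(self, error_value):
        self.weighted_errors = error_value + self.alpha * self.weighted_errors
        self.weighted_count = 1.0 + self.alpha * self.weighted_count
        return self.weighted_errors / self.weighted_count


errors = [0, 0, 1, 0, 1, 1, 1, 1]
cumulative = FadingPrequentialError(alpha=1.0)
faded = FadingPrequentialError(alpha=0.5)
for e in errors:
    cum_value = cumulative(e)
    faded_value = faded(e)
print(f"alpha=1.0: {cum_value:.4f}")    # alpha=1.0: 0.6250 (5 errors out of 8)
print(f"alpha=0.5: {faded_value:.4f}")  # alpha=0.5: 0.9569 (dominated by recent errors)
```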
📊 Data drift
As a quick example, we can take the iris dataset, induce data drift in one of its features by adding Gaussian noise to the test set, and show how a data drift detector such as the Kolmogorov-Smirnov test detects it.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from frouros.detectors.data_drift import KSTest
np.random.seed(seed=31)
X, y = load_iris(return_X_y=True)
(
    X_train,
    X_test,
    y_train,
    y_test,
) = train_test_split(X, y, train_size=0.7, random_state=31)
# Induce data drift by adding Gaussian noise to one feature of the test set
feature_idx = 0
X_test[:, feature_idx] += np.random.normal(
    loc=0.0,
    scale=3.0,
    size=X_test.shape[0],
)
model = DecisionTreeClassifier(random_state=31)
model.fit(X=X_train, y=y_train)
# Significance level of the test
alpha = 0.001
# Fit the detector on the reference (training) values of the selected feature
detector = KSTest()
_ = detector.fit(X=X_train[:, feature_idx])
# Compare the test values of the same feature against the reference distribution
result, _ = detector.compare(X=X_test[:, feature_idx])
if result.p_value <= alpha:
    print(f"Data drift detected at feature {feature_idx}")
else:
    print(f"No data drift detected at feature {feature_idx}")
More data drift examples can be found here.
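To show what the KSTest detector compares, here is a from-scratch sketch of the two-sample Kolmogorov-Smirnov statistic D, the largest gap between the empirical CDFs of the reference and test samples, combined with the standard large-sample rule that flags a difference when D > c(alpha) * sqrt((n + m) / (n * m)), where c(alpha) = sqrt(-ln(alpha / 2) / 2). The helper names ks_statistic and ks_drift and the integer samples are hypothetical; Frouros reports a p-value instead, and this asymptotic approximation may disagree with it on small samples.

```python
import bisect
import math


def ks_statistic(reference, test):
    """Two-sample KS statistic: max over x of |F_ref(x) - F_test(x)| (empirical CDFs)."""
    ref, tst = sorted(reference), sorted(test)
    d = 0.0
    # The largest gap between two step-function CDFs occurs at an observed value
    for x in ref + tst:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_tst = bisect.bisect_right(tst, x) / len(tst)
        d = max(d, abs(cdf_ref - cdf_tst))
    return d


def ks_drift(reference, test, alpha=0.001):
    """Asymptotic KS rule: flag drift if D exceeds c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(reference), len(test)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    d = ks_statistic(reference, test)
    return d > critical, d, critical


reference = list(range(100))
shifted = [x + 50 for x in range(100)]  # hypothetical drifted sample
drift, d, critical = ks_drift(reference, shifted)
print(drift, d, round(critical, 4))  # True 0.5 0.2757
```

Using the same alpha = 0.001 as the example above, a shift of half the reference range gives D = 0.5, well above the critical value of about 0.28 for two samples of 100 values.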
🛠 Installation
Frouros can be installed via pip:
pip install frouros
🕵🏻‍♂️ Drift detection methods
The currently implemented detectors are listed in the following table.
❗ What is and what is not Frouros?
Unlike other libraries that, in addition to providing drift detection algorithms, include other functionalities such as anomaly/outlier detection, adversarial detection or imbalanced learning, among others, Frouros has, and will ONLY ever have, one purpose: drift detection.
We firmly believe that machine learning libraries and frameworks should not follow the "jack of all trades, master of none" principle. Instead, they should focus on a single task and do it well.
✅ Who is using Frouros?
Frouros is actively being used by the following projects to implement drift detection in machine learning pipelines:
If you want your project listed here, do not hesitate to send us a pull request.
👍 Contributing
Check out the contribution section.
💬 Citation
If you want to cite Frouros, you can use the SoftwareX publication:
@article{CESPEDESSISNIEGA2024101733,
    title = {Frouros: An open-source Python library for drift detection in machine learning systems},
    journal = {SoftwareX},
    volume = {26},
    pages = {101733},
    year = {2024},
    issn = {2352-7110},
    doi = {10.1016/j.softx.2024.101733},
    url = {https://www.sciencedirect.com/science/article/pii/S2352711024001043},
    author = {Jaime {Céspedes Sisniega} and Álvaro {López García}},
    keywords = {Machine learning, Drift detection, Concept drift, Data drift, Python},
    abstract = {Frouros is an open-source Python library capable of detecting drift in machine learning systems. It provides a combination of classical and more recent algorithms for drift detection, covering both concept and data drift. We have designed it to be compatible with any machine learning framework and easily adaptable to real-world use cases. The library is developed following best development and continuous integration practices to ensure ease of maintenance and extensibility.}
}
📝 License
Frouros is open-source software licensed under the BSD-3-Clause license.
🙏 Acknowledgements
Frouros has received funding from the Agencia Estatal de Investigación, Unidad de Excelencia María de Maeztu, ref. MDM-2017-0765.