Concrete ML is a Privacy-Preserving Machine Learning (PPML) open-source set of tools built on top of Concrete by Zama.
It simplifies the use of fully homomorphic encryption (FHE) for data scientists so that they can automatically turn machine learning models into their homomorphic equivalents, and use them without knowledge of cryptography.
Concrete ML is designed with ease of use in mind. Data scientists can use models with APIs that are close to the frameworks they already know well, while additional options to those models allow them to run inference or training on encrypted data with FHE. The Concrete ML model classes are similar to those in scikit-learn and it is also possible to convert PyTorch models to FHE.
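To give a feel for what "computing on encrypted data" means, here is a minimal, self-contained sketch. It is an illustration under loud assumptions: it uses a toy Paillier cryptosystem, which is only additively homomorphic, with tiny hard-coded primes, and it is neither secure nor the scheme that Concrete ML relies on. All function names (`keygen`, `encrypt`, `decrypt`, `encrypted_dot`) are hypothetical helpers written for this example.

```python
import math
import random

# Toy Paillier cryptosystem (additively homomorphic only).
# Assumption for illustration: tiny primes, NOT secure, NOT Concrete ML's scheme.


def keygen(p=1789, q=1861):
    """Return the public modulus n and the private key (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, message, rng):
    """Encrypt a non-negative integer smaller than n."""
    n_sq = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, message, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n, private_key, ciphertext):
    """Recover the integer hidden in a ciphertext."""
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n


def encrypted_dot(n, enc_inputs, weights):
    """Dot product of encrypted inputs with clear, non-negative integer weights.

    Multiplying ciphertexts adds the hidden values, and raising a ciphertext
    to the power w multiplies the hidden value by w.
    """
    n_sq = n * n
    acc = 1  # 1 is a valid encryption of 0
    for ciphertext, weight in zip(enc_inputs, weights):
        acc = (acc * pow(ciphertext, weight, n_sq)) % n_sq
    return acc


rng = random.Random(0)
n, private_key = keygen()

inputs = [4, 5, 6]   # the client's private data
weights = [3, 1, 2]  # the server's clear model weights

enc_inputs = [encrypt(n, m, rng) for m in inputs]
enc_result = encrypted_dot(n, enc_inputs, weights)  # the server never sees `inputs`
result = decrypt(n, private_key, enc_result)

print("Decrypted result:", result)
print("Clear result:    ", sum(m * w for m, w in zip(inputs, weights)))
```

The server only ever manipulates ciphertexts, yet the decrypted result matches the clear computation. FHE extends this idea beyond additions and multiplications by constants, and Concrete ML hides these details behind model APIs.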
Main features
Built-in models: Ready-to-use FHE-friendly models with a user interface that is equivalent to their scikit-learn and XGBoost counterparts
Custom models: Concrete ML supports models that can use quantization-aware training. These are developed by the user using PyTorch or Keras/TensorFlow and are imported into Concrete ML through ONNX
Learn more about Concrete ML features in the documentation.
Use cases
By leveraging FHE, Concrete ML can unlock a myriad of new use cases for machine learning, such as enabling secure and private data collaboration, protecting sensitive data while still allowing for analysis, and facilitating machine learning on data-sets that are subject to strict data privacy regulations. For instance:
Healthcare data analysis: Improve patient care while maintaining privacy by allowing secure, confidential data sharing between healthcare providers.
Financial services: Facilitate secure financial data analysis for risk management and fraud detection, keeping client information encrypted and safe.
Ad campaign tracking: Create targeted advertising and campaign insights in a post-cookie era, ensuring user privacy through encrypted data analysis.
Industries: Enable predictive maintenance in the cloud while keeping sensitive data confidential, enhancing efficiency and data security.
Biometrics: Build user authentication applications without revealing users' identities.
Government: Enable governments to create digitized versions of their services without having to trust cloud providers.
Here is a simple logistic regression example, which is very close to its scikit-learn equivalent:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression
# Let's create a synthetic data-set
x, y = make_classification(n_samples=100, class_sep=2, n_features=30, random_state=42)
# Split the data-set into a train and test set
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)
# Now we train in the clear and quantize the weights
model = LogisticRegression(n_bits=8)
model.fit(X_train, y_train)
# We can simulate the predictions in the clear
y_pred_clear = model.predict(X_test)
# We then compile on a representative set
model.compile(X_train)
# Finally we run the inference on encrypted inputs
y_pred_fhe = model.predict(X_test, fhe="execute")
print("In clear :", y_pred_clear)
print("In FHE :", y_pred_fhe)
print(f"Similarity: {int((y_pred_fhe == y_pred_clear).mean()*100)}%")
# Output:
# In clear : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
# In FHE : [0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 0 0]
# Similarity: 100%
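The `n_bits=8` argument above controls how finely values are quantized to integers. As a rough intuition only, here is a plain-Python sketch of uniform quantization and of how the bit width affects rounding error. It is an assumption-laden illustration, not Concrete ML's internal quantization code, and the helpers `quantize`, `dequantize` and `max_error` are hypothetical names.

```python
def quantize(values, n_bits):
    """Map floats to unsigned integers in [0, 2**n_bits - 1] (uniform, affine)."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [round((v - lo) / scale) for v in values], scale, lo


def dequantize(q_values, scale, lo):
    """Map quantized integers back to approximate floats."""
    return [q * scale + lo for q in q_values]


def max_error(values, n_bits):
    """Largest absolute difference after a quantize/de-quantize round trip."""
    q_values, scale, lo = quantize(values, n_bits)
    restored = dequantize(q_values, scale, lo)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical weights, chosen only for the illustration
weights = [-1.73, -0.42, 0.0, 0.58, 1.21]

for n_bits in (2, 4, 8):
    q_values, _, _ = quantize(weights, n_bits)
    print(f"n_bits={n_bits}: integers={q_values}, max error={max_error(weights, n_bits):.4f}")
```

More bits give a finer grid and a smaller rounding error, at the cost of larger integers to process in FHE. This is the trade-off behind the choice of `n_bits`.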
It is also possible to call encryption, model prediction, and decryption functions separately as follows.
Executing these steps separately is equivalent to calling predict_proba on the model instance.
# Predict probability for a single example
y_proba_fhe = model.predict_proba(X_test[[0]], fhe="execute")
# Quantize an original float input
q_input = model.quantize_input(X_test[[0]])
# Encrypt the input
q_input_enc = model.fhe_circuit.encrypt(q_input)
# Execute the linear product in FHE
q_y_enc = model.fhe_circuit.run(q_input_enc)
# Decrypt the result (integer)
q_y = model.fhe_circuit.decrypt(q_y_enc)
# De-quantize and post-process the result
y0 = model.post_processing(model.dequantize_output(q_y))
print("Probability with `predict_proba`: ", y_proba_fhe)
print("Probability with encrypt/run/decrypt calls: ", y0)
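To see why the steps are split this way, the following sketch mirrors them in the clear, with no encryption at all: quantize, run an integer-only linear product, de-quantize, then post-process with a sigmoid. It is a simplified illustration that assumes symmetric quantization and a hand-written model; the weights, bias and helper names are hypothetical and do not reproduce Concrete ML's internals.

```python
import math


def symmetric_quantize(values, n_bits=8):
    """Quantize floats to signed integers with a single scale factor."""
    q_max = 2 ** (n_bits - 1) - 1
    largest = max(abs(v) for v in values)
    scale = largest / q_max if largest > 0 else 1.0
    return [round(v / scale) for v in values], scale


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical trained model and input, chosen only for the illustration
weights, bias = [0.8, -1.5, 0.3], 0.1
x = [1.2, 0.4, -2.0]

# 1. Quantize the float input (and, at training time, the weights)
q_x, scale_x = symmetric_quantize(x)
q_w, scale_w = symmetric_quantize(weights)

# 2. Integer-only linear product: the part that would run on encrypted data
q_y = sum(a * b for a, b in zip(q_x, q_w))

# 3. De-quantize the integer result back to a float
y_linear = q_y * scale_x * scale_w + bias

# 4. Post-process in the clear to get a probability
proba_quantized = sigmoid(y_linear)
proba_float = sigmoid(sum(a * b for a, b in zip(x, weights)) + bias)

print("Probability from the integer pipeline:", proba_quantized)
print("Probability from the float model:     ", proba_float)
```

Only step 2 operates on integers exclusively, which is what makes it suitable for FHE execution. Quantization, de-quantization and post-processing stay on the client side, in the clear.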
Concrete ML built-in models have APIs that are almost identical to their scikit-learn counterparts. It is also possible to convert PyTorch networks to FHE with the Concrete ML conversion APIs. Please refer to the linear models, tree-based models and neural networks documentation for more examples, showing the scikit-learn-like API of the built-in models.
We want to hear from you! Take 1 minute to share your thoughts and help us enhance our documentation and libraries. 👉 Click here to participate.
Resources
Demos
Live demos on Hugging Face
Credit card approval: a credit card approval prediction application in which sensitive data can be shared and analyzed without exposing the actual information to either the three parties involved or the server processing it.
Encrypted Large Language Model: converting a user-defined part of a Large Language Model for encrypted text generation. This demo shows the trade-off between quantization and accuracy for text generation and shows how to run the model in FHE.
CIFAR10 FHE-friendly model with Brevitas: training a VGG9 FHE-compatible neural network using Brevitas, and a script to run the neural network in FHE. Execution in FHE takes ~4 minutes per image and shows an accuracy of 88.7%.
CIFAR10 / CIFAR100 FHE-friendly models with Transfer Learning approach: a series of three notebooks that convert a pre-trained FP32 VGG11 neural network into a quantized model using Brevitas. The model is fine-tuned on the CIFAR data-sets, converted for FHE execution with Concrete ML and evaluated using FHE simulation. Our simulations show an accuracy of 90.2% for CIFAR10 and 68.2% for CIFAR100.
If you have built awesome projects using Concrete ML, please let us know and we will be happy to showcase them here!
To cite Concrete ML in academic papers, please use the following entry:
@Misc{ConcreteML,
title={Concrete {ML}: a Privacy-Preserving Machine Learning Library using Fully Homomorphic Encryption for Data Scientists},
author={Zama},
year={2022},
note={\url{https://github.com/zama-ai/concrete-ml}},
}
This software is distributed under the BSD-3-Clause-Clear license. Read this for more details.
FAQ
Is Zama’s technology free to use?
Zama’s libraries are free to use under the BSD 3-Clause Clear license only for development, research, prototyping, and experimentation purposes. However, for any commercial use of Zama's open source code, companies must purchase Zama’s commercial patent license.
All our work is open source and we strive for full transparency about Zama's IP strategy. To know more about what this means for Zama product users, read about how we monetize our open source products in this blog post.
What do I need to do if I want to use Zama’s technology for commercial purposes?
To commercially use Zama’s technology you need to be granted Zama’s patent license. Please contact us at hello@zama.ai for more information.
Do you file IP on your technology?
Yes, all of Zama’s technologies are patented.
Can you customize a solution for my specific use case?
We are open to collaborating and advancing the FHE space with our partners. If you have specific needs, please email us at hello@zama.ai.
🌟 If you find this project helpful or interesting, please consider giving it a star on GitHub! Your support helps to grow the community and motivates further development.