
baal
Baal is an active learning library that supports both industrial applications and research use cases.
Read the documentation at https://baal.readthedocs.io.
Our paper can be read on arXiv. It includes tips and tricks to make active learning usable in production.
For a quick introduction to Baal and Bayesian active learning, please see these links:
Baal was initially developed at ElementAI (acquired by ServiceNow in 2021), but is now independent.
Baal requires Python>=3.10.
To install Baal using pip: pip install baal
We use Poetry as our package manager.
To install Baal from source: poetry install
Active learning is a special case of machine learning in which a learning algorithm can interactively query the user (or some other information source) to obtain the desired outputs at new data points. To understand the concept in more depth, refer to our tutorial.
At the moment Baal supports the following methods to perform active learning.
If you want to propose new methods, please submit an issue.
The Monte-Carlo Dropout method is a known approximation for Bayesian neural networks. In this method, the Dropout layer is active at both training and test time. By running the model multiple times while randomly dropping weights, we calculate the uncertainty of the prediction using one of the uncertainty measurements in heuristics.py.
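As a rough illustration of how an uncertainty measurement can be derived from repeated stochastic passes, the sketch below computes the predictive entropy and the BALD score (mutual information) from a list of per-pass class probabilities. It is plain Python written for this explanation, not the code in heuristics.py; the function names `entropy`, `mean_distribution`, `predictive_entropy`, and `bald_score`, as well as the sample values, are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_distribution(samples):
    """Average the class probabilities over all stochastic passes."""
    n = len(samples)
    return [sum(s[c] for s in samples) / n for c in range(len(samples[0]))]


def predictive_entropy(samples):
    """Entropy of the averaged prediction: the total uncertainty."""
    return entropy(mean_distribution(samples))


def bald_score(samples):
    """Mutual information between the prediction and the model parameters.

    Total uncertainty minus the average per-pass entropy; it is high
    when individual passes are confident but disagree with each other.
    """
    expected_entropy = sum(entropy(s) for s in samples) / len(samples)
    return predictive_entropy(samples) - expected_entropy


# Hypothetical outputs of 4 stochastic passes on two inputs (2 classes each).
agreeing = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagreeing = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]

print(bald_score(agreeing))     # close to zero: the passes agree
print(bald_score(disagreeing))  # clearly positive: the passes disagree
```

An item whose passes disagree gets a higher score and is therefore a better candidate for labelling than one on which every pass gives the same answer.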
The framework consists of four main parts, as demonstrated in the flowchart below:
To get started, wrap your dataset in our ActiveLearningDataset class. This will ensure that the dataset is split into training and pool sets. The pool set represents the portion of the training set which is yet to be labelled.
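The idea of one dataset viewed as a labelled training set plus an unlabelled pool can be sketched in a few lines. The `ToyActiveDataset` class below is a hypothetical stand-in written for this illustration, not Baal's ActiveLearningDataset; only the method name `label_randomly` is borrowed from the example script in this README, and its behaviour here is an assumption.

```python
import random


class ToyActiveDataset:
    """Hypothetical illustration of a training/pool split over one dataset."""

    def __init__(self, items, seed=0):
        self.items = list(items)
        # Every item starts unlabelled, i.e. in the pool.
        self.labelled = [False] * len(self.items)
        self._rng = random.Random(seed)

    @property
    def train_indices(self):
        return [i for i, flag in enumerate(self.labelled) if flag]

    @property
    def pool_indices(self):
        return [i for i, flag in enumerate(self.labelled) if not flag]

    def label(self, pool_positions):
        """Move items from the pool to the training set.

        Positions are relative to the current pool, not the full dataset.
        """
        pool = self.pool_indices  # snapshot taken before any flag changes
        for pos in pool_positions:
            self.labelled[pool[pos]] = True

    def label_randomly(self, n):
        """Label n randomly chosen pool items, e.g. to bootstrap training."""
        positions = self._rng.sample(range(len(self.pool_indices)), n)
        self.label(positions)


dataset = ToyActiveDataset(range(10))
dataset.label_randomly(3)
print(len(dataset.train_indices), len(dataset.pool_indices))  # 3 7
```

Keeping a single boolean mask over the full dataset, rather than two separate containers, means that labelling an item never copies or moves data.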
We provide a lightweight object, ModelWrapper, similar to keras.Model, to make it easier to train and test the model. If your model is not ready for active learning, we provide Modules to prepare it. For example, the MCDropoutModule wrapper changes the existing dropout layers so that they are used at both training and inference time, and the ModelWrapper specifies the number of iterations to run at training and inference.
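To make the effect of keeping dropout active at inference more concrete, here is a toy sketch in plain Python. It does not use Baal or PyTorch: the linear "model", the helper names `dropout`, `stochastic_forward`, and `predict_with_iterations`, and all numbers are assumptions made for illustration only.

```python
import random


def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, rescale the rest."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def stochastic_forward(features, weights, p, rng):
    """One forward pass of a toy linear model with dropout left switched on."""
    dropped = dropout(features, p, rng)
    return sum(x * w for x, w in zip(dropped, weights))


def predict_with_iterations(features, weights, iterations, p=0.5, seed=0):
    """Repeat the stochastic pass and return one output per iteration."""
    rng = random.Random(seed)
    return [stochastic_forward(features, weights, p, rng) for _ in range(iterations)]


outputs = predict_with_iterations([1.0, 2.0, 3.0], [0.5, -0.25, 1.0], iterations=20)
print(len(outputs))          # one prediction per iteration
print(sorted(set(outputs)))  # distinct values produced by different dropout masks
```

With `p=0.0` every pass returns the same value, which is what a standard deterministic inference pass would give; the variation across passes only appears because dropout stays on.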
Finally, ActiveLearningLoop automatically computes the uncertainty and labels the most uncertain items in the pool.
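One step of such a loop can be pictured as: score every pool item, rank the items by uncertainty, and label the top few. The function below is a hypothetical sketch of that selection step, not the ActiveLearningLoop implementation; `select_most_uncertain` and the score values are made up for illustration.

```python
def select_most_uncertain(scores, query_size):
    """Return pool positions of the `query_size` highest uncertainty scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:query_size]


# Hypothetical uncertainty scores for five pool items.
pool_scores = [0.02, 0.61, 0.15, 0.44, 0.30]
to_label = select_most_uncertain(pool_scores, query_size=2)
print(to_label)  # [1, 3]
```

The returned positions are the items that would be sent for labelling before the model is retrained on the enlarged training set.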
In conclusion, your script should be similar to this:
    dataset = ActiveLearningDataset(your_dataset)
    dataset.label_randomly(INITIAL_POOL)  # Label some data.
    model = MCDropoutModule(your_model)
    wrapper = ModelWrapper(model, args=TrainingArgs(...))
    experiment = ActiveLearningExperiment(
        trainer=wrapper,  # Huggingface or ModelWrapper to train.
        al_dataset=dataset,  # Active learning dataset.
        eval_dataset=test_dataset,  # Evaluation dataset.
        heuristic=BALD(),  # Uncertainty heuristic to use.
        query_size=100,  # How many items to label per round.
        iterations=20,  # How many MC samples to draw per item.
        pool_size=None,  # Optionally limit the size of the unlabelled pool.
        criterion=None,  # Stopping criterion for the experiment.
    )
    # The experiment will run until all items are labelled.
    metrics = experiment.start()
For a complete experiment, see experiments/vgg_mcdropout_cifar10.py.
docker build [--target base_baal] -t baal .
docker run --rm --gpus all baal python3 experiments/vgg_mcdropout_cifar10.py
Simply clone the repo and create your own experiment script, similar to the example at experiments/vgg_mcdropout_cifar10.py. Make sure to use the four main parts of the Baal framework. Happy experimenting!
To contribute, see CONTRIBUTING.md.
"There is passion, yet peace; serenity, yet emotion; chaos, yet order."
The Baal team tests and implements the most recent papers on uncertainty estimation and active learning.
Current maintainers:
If you used Baal in one of your projects, we would greatly appreciate it if you cited this library using this BibTeX entry:
@misc{atighehchian2019baal,
  title={Baal, a bayesian active learning library},
  author={Atighehchian, Parmida and Branchaud-Charron, Frederic and Freyberg, Jan and Pardinas, Rafael and Schell, Lorne and Pearse, George},
  year={2022},
  howpublished={\url{https://github.com/baal-org/baal/}},
}
For information on the licence of this API, please read LICENCE.