
pymfe
The pymfe (python meta-feature extractor) package provides a comprehensive set of meta-features implemented in Python. The package brings state-of-the-art meta-features, following recent proposals in the literature. The pymfe architecture was designed to make extraction systematic, producing a robust set of meta-features. Moreover, pymfe follows a recent meta-feature formalization that aims to make MtL reproducible.
Here, you can use different measures and summary functions, set their hyperparameters, and automatically measure the elapsed time. Moreover, you can extract meta-features from specific models, or even extract meta-features with confidence intervals using bootstrap. There are many other interesting features; see the documentation for details.
In the Meta-learning (MtL) literature, meta-features are measures used to characterize data sets and/or their relations with algorithm bias.
"Meta-learning is the study of principled methods that exploit meta-knowledge to obtain efficient models and solutions by adapting the machine learning and data mining process." - (Brazdil et al. (2008))
Meta-features are used in MtL and AutoML tasks in general to represent or understand a dataset, to understand a learning bias, to build machine learning (or data mining) recommendation systems, and to create surrogate models, to name a few.
Pinto et al. (2016) and Rivolli et al. (2018) defined a meta-feature as follows. Let $D \in \mathcal{D}$ be a dataset, $m\colon \mathcal{D} \to \mathbb{R}^{k'}$ be a characterization measure, and $\sigma\colon \mathbb{R}^{k'} \to \mathbb{R}^{k}$ be a summarization function. Both $m$ and $\sigma$ have also hyperparameters associated, $h_m$ and $h_\sigma$ respectively. Thus, a meta-feature $f\colon \mathcal{D} \to \mathbb{R}^{k}$ for a given dataset $D$ is
$$ f\big(D\big) = \sigma\big(m(D,h_m), h_\sigma\big). $$
The measure $m$ can extract more than one value from each data set, i.e., $k'$ can vary according to $D$, which can be mapped to a vector of fixed length $k$ using a summarization function $\sigma$.
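As a toy illustration of this formalization (plain Python, not using pymfe itself), take $m$ as the per-attribute standard deviation, so $k'$ equals the number of attributes, and $\sigma$ as the pair (mean, sd), giving a fixed length $k = 2$:

```python
import statistics

# Toy characterization measure m: standard deviation of each attribute,
# so k' equals the number of attributes and varies with the dataset D.
def m(D):
    return [statistics.stdev(column) for column in zip(*D)]

# Toy summarization function sigma: mean and standard deviation of the
# k' values, producing a meta-feature vector of fixed length k = 2.
def sigma(values):
    return [statistics.mean(values), statistics.stdev(values)]

D = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(sigma(m(D)))  # [5.5, ~6.36]
```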
This package provides meta-features organized into several groups (general, statistical, information-theoretic, model-based, and landmarking, among others).
The installation process is similar to other packages available on pip:
pip install -U pymfe
It is possible to install the development version using:
pip install -U git+https://github.com/ealcobaca/pymfe
or
git clone https://github.com/ealcobaca/pymfe.git
cd pymfe
python setup.py install
The simplest way to extract meta-features is by instantiating the MFE class.
It computes five meta-features groups by default using mean and standard
deviation as summary functions: General, Statistical, Information-theoretic,
Model-based, and Landmarking. The fit method can be called by passing X
and y; then the extract method computes the related measures.
A simple example using pymfe for supervised tasks is given next:
# Load a dataset
from sklearn.datasets import load_iris
from pymfe.mfe import MFE
data = load_iris()
y = data.target
X = data.data
# Extract default measures
mfe = MFE()
mfe.fit(X, y)
ft = mfe.extract()
print(ft)
# Extract general, statistical and information-theoretic measures
mfe = MFE(groups=["general", "statistical", "info-theory"])
mfe.fit(X, y)
ft = mfe.extract()
print(ft)
# Extract all available measures
mfe = MFE(groups="all")
mfe.fit(X, y)
ft = mfe.extract()
print(ft)
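The extract method returns the computed measures as a pair of lists: the meta-feature names and their corresponding values. A minimal sketch of turning such a pair into a dictionary for lookup; the ft below is hard-coded for illustration, not real pymfe output:

```python
# ft mimics pymfe's (names, values) return shape; the numbers are made up.
ft = (["attr_to_inst", "mean.mean", "sd.mean"], [0.027, 3.46, 0.95])

# Pair each meta-feature name with its value.
report = dict(zip(*ft))
print(report["mean.mean"])  # 3.46
```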
For unsupervised tasks, simply omit the target attribute when fitting the
data to the MFE model. The pymfe package automatically finds and
extracts only the meta-features suitable for this type of task. Examples are
given next:
# Load a dataset
from sklearn.datasets import load_iris
from pymfe.mfe import MFE
data = load_iris()
y = data.target
X = data.data
# Extract default unsupervised measures
mfe = MFE()
mfe.fit(X)
ft = mfe.extract()
print(ft)
# Extract all available unsupervised measures
mfe = MFE(groups="all")
mfe.fit(X)
ft = mfe.extract()
print(ft)
Several measures return more than one value. To aggregate the returned values,
a summarization function can be used. It can compute min, max,
mean, median, kurtosis, standard deviation, among others. The default
summary functions are the mean and the standard deviation (sd). The example
below shows how to use them:
## Extract default measures using min, median and max
mfe = MFE(summary=["min", "median", "max"])
mfe.fit(X, y)
ft = mfe.extract()
print(ft)
## Extract default measures using quantile
mfe = MFE(summary=["quantiles"])
mfe.fit(X, y)
ft = mfe.extract()
print(ft)
You can easily list all available metafeature groups, metafeatures, summary methods and metafeatures filtered by groups of interest:
from pymfe.mfe import MFE
# Check all available meta-feature groups in the package
print(MFE.valid_groups())
# Check all available meta-features in the package
print(MFE.valid_metafeatures())
# Check available meta-features filtering by groups of interest
print(MFE.valid_metafeatures(groups=["general", "statistical", "info-theory"]))
# Check all available summary functions in the package
print(MFE.valid_summary())
It is possible to pass custom arguments to every meta-feature through the
kwargs of the MFE extract method. The keyword must be the target meta-feature
name, and the value must be a dictionary in the format {argument: value},
i.e., each key in the dictionary is a target argument with its respective
value. In the example below, the extraction of the meta-features min and max
happens as usual, but the meta-features sd, nr_norm, and nr_cor_attr receive
custom argument values, which affect their results.
# Extract measures with custom user arguments
mfe = MFE(features=["sd", "nr_norm", "nr_cor_attr", "min", "max"])
mfe.fit(X, y)
ft = mfe.extract(
sd={"ddof": 0},
nr_norm={"method": "all", "failure": "hard", "threshold": 0.025},
nr_cor_attr={"threshold": 0.6},
)
print(ft)
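The effect of an argument such as ddof (the delta degrees of freedom of the standard deviation) can be seen with plain Python, independent of pymfe; this sketch assumes sd otherwise defaults to the sample estimator (ddof=1):

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
# ddof=1: sample standard deviation (divides by n - 1)
print(statistics.stdev(sample))   # ~2.138
# ddof=0: population standard deviation (divides by n), as passed above
print(statistics.pstdev(sample))  # 2.0
```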
If you want to extract metafeatures from a pre-fitted machine learning model
(from sklearn package), you can use the extract_from_model method without
needing to use the training data:
import sklearn.tree
from sklearn.datasets import load_iris
from pymfe.mfe import MFE
# Extract from model
iris = load_iris()
model = sklearn.tree.DecisionTreeClassifier().fit(iris.data, iris.target)
extractor = MFE()
ft = extractor.extract_from_model(model)
print(ft)
# Extract specific metafeatures from model
extractor = MFE(features=["tree_shape", "nodes_repeated"], summary="histogram")
ft = extractor.extract_from_model(
model,
arguments_fit={"verbose": 1},
arguments_extract={"verbose": 1, "histogram": {"bins": 5}})
print(ft)
You can also extract your meta-features with confidence intervals using bootstrap. Keep in mind that this method extracts each meta-feature several times and may be very expensive, depending mainly on your data and on the number of meta-feature extraction methods called.
# Extract metafeatures with confidence interval
mfe = MFE(features=["mean", "nr_cor_attr", "sd", "max"])
mfe.fit(X, y)
ft = mfe.extract_with_confidence(
sample_num=256,
confidence=0.99,
verbose=1,
)
print(ft)
We wrote extensive documentation to guide you on how to use the pymfe library.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use pymfe in a scientific publication, we would appreciate a citation
to the following paper (BibTeX below):
@article{JMLR:v21:19-348,
author = {Edesio Alcobaça and
Felipe Siqueira and
Adriano Rivolli and
Luís P. F. Garcia and
Jefferson T. Oliva and
André C. P. L. F. de Carvalho
},
title = {MFE: Towards reproducible meta-feature extraction},
journal = {Journal of Machine Learning Research},
year = {2020},
volume = {21},
number = {111},
pages = {1-5},
url = {http://jmlr.org/papers/v21/19-348.html}
}
We would like to thank every contributor who, directly or indirectly, helped make this project happen. Thank you all.