# ab-test-toolkit

## Install

``` sh
pip install ab_test_toolkit
```
## Imports

``` python
from ab_test_toolkit.generator import (
    generate_binary_data,
    generate_continuous_data,
    data_to_contingency,
    contingency_from_counts,
)
from ab_test_toolkit.power import (
    simulate_power_binary,
    sample_size_binary,
    simulate_power_continuous,
    sample_size_continuous,
)
from ab_test_toolkit.plotting import (
    plot_power,
    plot_distribution,
    plot_betas,
    plot_binary_power,
)
from ab_test_toolkit.analyze import p_value_binary
```
## Binary target (e.g. conversion rate experiments)

### Sample size
We can calculate the required sample size with the function
`sample_size_binary`. The inputs needed are:

- Conversion rate of the control: `cr0`.
- Conversion rate of the variant at the minimal detectable effect:
  `cr1` (for example, if we have a conversion rate of 1% and want to
  detect a relative effect of at least 20%, we would set `cr0=0.010`
  and `cr1=0.012`).
- Significance threshold: `alpha`. Usually set to 0.05, this defines
  our tolerance for falsely detecting an effect when in reality there
  is none (`alpha=0.05` means that in 5% of cases we will detect an
  effect even though the samples for control and variant are drawn
  from the exact same distribution).
- Statistical power: `power`. Usually set to 0.8. This means that if
  the true effect is the minimal effect specified above, we have an
  80% probability of identifying it as statistically significant (and
  hence a 20% probability of missing it).
- `one_sided`: whether the test is one-sided (`one_sided=True`) or
  two-sided (`one_sided=False`). As a rule of thumb, if there are very
  strong reasons to believe that the variant cannot be inferior to the
  control, we can use a one-sided test. In case of doubt, a two-sided
  test is the safer choice.
Let us calculate the sample size for the following example:

``` python
n_sample = sample_size_binary(
    cr0=0.01,
    cr1=0.012,
    alpha=0.05,
    power=0.8,
    one_sided=True,
)
print(f"Required sample size per variant is {int(n_sample)}.")
```

    Required sample size per variant is 33560.
``` python
n_sample_two_sided = sample_size_binary(
    cr0=0.01,
    cr1=0.012,
    alpha=0.05,
    power=0.8,
    one_sided=False,
)
print(
    f"For the two-sided experiment, required sample size per variant is {int(n_sample_two_sided)}."
)
```

    For the two-sided experiment, required sample size per variant is 42606.
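As a sanity check, the numbers above can be approximated by hand with the standard normal-approximation formula for comparing two proportions. The sketch below is not the toolkit's internal method (which may differ slightly in its variance estimate), so expect results close to, but not exactly matching, the outputs above:

``` python
from math import ceil

from scipy.stats import norm


def approx_sample_size_binary(cr0, cr1, alpha=0.05, power=0.8, one_sided=True):
    """Normal-approximation sample size per variant for two proportions."""
    z_alpha = norm.ppf(1 - alpha) if one_sided else norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    # Sum of the per-group Bernoulli variances
    variance = cr0 * (1 - cr0) + cr1 * (1 - cr1)
    return ceil((z_alpha + z_beta) ** 2 * variance / (cr1 - cr0) ** 2)


print(approx_sample_size_binary(0.01, 0.012, one_sided=True))
print(approx_sample_size_binary(0.01, 0.012, one_sided=False))
```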
### Power simulations

What happens if we use a smaller sample size? And how can we build
intuition for the required sample size?

Let us analyze the statistical power with synthetic data. We can do
this with the `simulate_power_binary` function. We are using some
default arguments here; see this page for more information.

Note: The simulation object returns the total sample size, so we need
to split it per variant.

Finally, we can plot the results (note: the plot function shows the
sample size per variant):
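The idea behind such a power simulation can also be sketched directly: repeatedly draw synthetic control and variant data under the assumed true rates, run the test on each draw, and count how often the result is significant. The sketch below is a standalone Monte Carlo illustration with a one-sided pooled z-test, not the toolkit's implementation; function and parameter names here are illustrative:

``` python
import numpy as np
from scipy.stats import norm


def mc_power_binary(cr0, cr1, n_per_variant, alpha=0.05, n_sim=2000, seed=0):
    """Fraction of simulated experiments where a one-sided z-test is significant."""
    rng = np.random.default_rng(seed)
    c0 = rng.binomial(n_per_variant, cr0, size=n_sim)  # control conversions
    c1 = rng.binomial(n_per_variant, cr1, size=n_sim)  # variant conversions
    p0, p1 = c0 / n_per_variant, c1 / n_per_variant
    pooled = (c0 + c1) / (2 * n_per_variant)
    se = np.sqrt(pooled * (1 - pooled) * 2 / n_per_variant)
    z = (p1 - p0) / se
    p_values = 1 - norm.cdf(z)  # one-sided: variant better than control
    return (p_values < alpha).mean()


# At the sample size computed above, power should come out near 0.8;
# well below it, the power drops noticeably.
print(mc_power_binary(0.01, 0.012, 33560))
print(mc_power_binary(0.01, 0.012, 10000))
```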
### Compute p-value

``` python
n0 = 5000
n1 = 5100
c0 = 450
c1 = 495
df_c = contingency_from_counts(n0, c0, n1, c1)
df_c
```
| group | users | converted | not_converted | cvr      |
|-------|-------|-----------|---------------|----------|
| 0     | 5000  | 450       | 4550          | 0.090000 |
| 1     | 5100  | 495       | 4605          | 0.097059 |
``` python
p_value_binary(df_c)
```

    0.11824221841149218
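For intuition, a value in the same ballpark can be reproduced by hand with a one-sided two-proportion z-test with continuity correction. This is an approximation sketch under that assumption; `p_value_binary`'s exact method may differ:

``` python
from math import sqrt

from scipy.stats import norm


def z_test_one_sided(n0, c0, n1, c1):
    """One-sided two-proportion z-test with continuity correction."""
    p0, p1 = c0 / n0, c1 / n1
    pooled = (c0 + c1) / (n0 + n1)
    se = sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
    correction = 0.5 * (1 / n0 + 1 / n1)  # continuity correction
    z = (p1 - p0 - correction) / se
    return 1 - norm.cdf(z)


print(z_test_one_sided(5000, 450, 5100, 495))  # close to the 0.118 above
```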
### The problem of peeking

wip
## Continuous target (e.g. average)

Here we assume normally distributed data (which usually holds due to
the central limit theorem).
### Sample size

We can calculate the required sample size with the function
`sample_size_continuous`. The inputs needed are:

- `mu1`: mean of the control group.
- `mu2`: mean of the variant group assuming the minimal detectable
  effect (e.g. if the mean is 5 and we want to detect an effect as
  small as 0.05, `mu1=5.00` and `mu2=5.05`).
- `sigma`: standard deviation (assumed the same for variant and
  control; should be estimated from historical data).
- `alpha`, `power`, `one_sided`: as in the binary case.
Let us calculate an example:

``` python
n_sample = sample_size_continuous(
    mu1=5.0, mu2=5.05, sigma=1, alpha=0.05, power=0.8, one_sided=True
)
print(f"Required sample size per variant is {int(n_sample)}.")
```
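The continuous case also has a simple closed-form normal approximation, sketched below. This is not necessarily the toolkit's exact method (e.g. a t-based correction would change the result slightly):

``` python
from math import ceil

from scipy.stats import norm


def approx_sample_size_continuous(mu1, mu2, sigma, alpha=0.05, power=0.8, one_sided=True):
    """Per-variant sample size for detecting a mean shift, normal approximation."""
    z_alpha = norm.ppf(1 - alpha) if one_sided else norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    # Factor 2 because the variance of the difference of two group means
    # is 2 * sigma^2 / n
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma**2 / (mu2 - mu1) ** 2)


print(approx_sample_size_continuous(5.0, 5.05, 1.0))
```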
Let us also run some simulations. These show results for the t-test as
well as Bayesian testing (one-sided only).
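A minimal version of such a t-test power simulation can be sketched as follows (a standalone numpy/scipy illustration; the function name and defaults are mine, not the toolkit's internal code):

``` python
import numpy as np
from scipy.stats import ttest_ind


def mc_power_continuous(mu1, mu2, sigma, n_per_variant, alpha=0.05, n_sim=500, seed=0):
    """Fraction of simulated experiments where a one-sided t-test is significant."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        a = rng.normal(mu1, sigma, n_per_variant)  # control
        b = rng.normal(mu2, sigma, n_per_variant)  # variant
        # One-sided alternative: variant mean greater than control mean
        _, p = ttest_ind(b, a, alternative="greater")
        hits += p < alpha
    return hits / n_sim


# Near the sample size from the formula above, power should be roughly 0.8.
print(mc_power_continuous(5.0, 5.05, 1.0, 4947))
```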
## Data Generators

We can also use the data generators to create example data to analyze
or visualize as if it came from a real experiment.

Distribution without effect:

``` python
df_continuous = generate_continuous_data(effect=0)
```

Distribution with effect:

``` python
df_continuous = generate_continuous_data(effect=1)
```
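Conceptually, such a generator just draws two groups from distributions whose means differ by the chosen effect. The hypothetical sketch below illustrates the idea; its column names and defaults are assumptions, not `generate_continuous_data`'s actual schema:

``` python
import numpy as np
import pandas as pd


def make_continuous_example(n=1000, mu=5.0, sigma=1.0, effect=0.0, seed=0):
    """Two groups of normal draws; the variant's mean is shifted by `effect`."""
    rng = np.random.default_rng(seed)
    control = rng.normal(mu, sigma, n)
    variant = rng.normal(mu + effect, sigma, n)
    return pd.DataFrame(
        {
            "group": ["control"] * n + ["variant"] * n,
            "value": np.concatenate([control, variant]),
        }
    )


df = make_continuous_example(effect=1.0)
print(df.groupby("group")["value"].mean())
```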
## Visualizations

Plot beta distributions for a contingency table:

``` python
df = generate_binary_data()
df_contingency = data_to_contingency(df)
```
### False positives