Dolvins
This project provides a set of functions and classes for optimization, probability, and statistical analysis, with a focus on handling multi-dimensional data, hyperplanes, and distribution analysis.
Installation
Dolvins is built on the following packages:
- psutil
- numpy
- pandas
- tqdm
- scipy
To install Dolvins automatically with all its dependencies, please run:
pip install dolvins
Usage
General Math Functions
next_power_of_two(x: int) -> int
Returns the next power of two greater than or equal to x.
Arguments:
x (int): The input number.
Returns:
int: The next power of two.
Example:
x = 5
next_power = next_power_of_two(x)
print(next_power)
>> 8
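One way such a function could be written is with integer bit lengths. This is a minimal sketch, not the library's implementation; the name `next_power_of_two_sketch` and the handling of inputs below 1 are assumptions made for illustration.

```python
def next_power_of_two_sketch(x: int) -> int:
    """Smallest power of two that is >= x (hypothetical helper)."""
    if x <= 1:
        return 1  # assumption: anything below 1 maps to 2**0
    # (x - 1).bit_length() is the number of bits needed to store x - 1,
    # so shifting 1 left by that amount gives the first power of two >= x.
    return 1 << (x - 1).bit_length()


print(next_power_of_two_sketch(5))  # 8
print(next_power_of_two_sketch(8))  # 8
```

Subtracting 1 before taking the bit length is what keeps exact powers of two unchanged.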
round_down_to_nearest_power_of_two(x: int) -> int
Rounds x down to the nearest power of two.
Arguments:
x (int): The input number.
Returns:
int: The largest power of two that does not exceed x.
Example:
x = 10
nearest_power = round_down_to_nearest_power_of_two(x)
print(nearest_power)
>> 8
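Rounding down can be sketched in the same bit-length style. The helper name and the choice to reject non-positive inputs are assumptions; the source does not say how the library treats them.

```python
def round_down_to_power_of_two_sketch(x: int) -> int:
    """Largest power of two that is <= x (hypothetical helper)."""
    if x < 1:
        # Assumption: no power of two is <= 0, so reject such inputs.
        raise ValueError("x must be a positive integer")
    # x.bit_length() - 1 is the index of the highest set bit of x.
    return 1 << (x.bit_length() - 1)


print(round_down_to_power_of_two_sketch(10))  # 8
print(round_down_to_power_of_two_sketch(16))  # 16
```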
gcd_of_list(numbers: list) -> int
Returns the greatest common divisor (GCD) of a list of numbers.
Arguments:
numbers (list): A list of integers.
Returns:
int: The GCD of the list.
Example:
numbers = [12, 15, 21]
gcd_result = gcd_of_list(numbers)
print(gcd_result)
>> 3
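The GCD of a list can be obtained by folding the two-argument GCD over the list. The sketch below uses only the standard library; `gcd_of_list_sketch` is a hypothetical name and not necessarily how Dolvins implements it.

```python
from functools import reduce
from math import gcd


def gcd_of_list_sketch(numbers):
    """Fold math.gcd across the list: gcd(gcd(a, b), c), and so on."""
    # reduce raises TypeError on an empty list, which this sketch leaves as is.
    return reduce(gcd, numbers)


print(gcd_of_list_sketch([12, 15, 21]))  # 3
```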
Mathematical Objects
Hyperplane
A class representing a hyperplane.
Methods:
- __init__(self, normal: np.array, coef: float)
Initializes a Hyperplane object with a normal vector and coefficient.
Arguments:
normal (np.array): The normal vector of the hyperplane.
coef (float): The coefficient of the hyperplane.
- project_point(self, *point: float) -> np.array
Projects a point onto the hyperplane.
Arguments:
*point (float): The coordinates of the vector/point to project, passed as separate arguments.
Returns:
np.array: The projected point.
Example:
normal = np.array([1, 1, 1])
coef = 3
hyperplane = Hyperplane(normal, coef)
projected_point = hyperplane.project_point(2, 4, 0)
print(projected_point)
>> np.array([1, 2, 0])
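The output above is what one gets by scaling the point along the ray from the origin until it satisfies normal · p = coef; an orthogonal projection of the same point would land elsewhere. The sketch below contrasts the two. That the library uses the scaling rule is an inference from the example, not something the source states, and both helper names are hypothetical. Plain lists are used to keep the sketch dependency-free.

```python
def scale_onto_hyperplane(normal, coef, *point):
    """Scale the point along the ray from the origin so that normal . p == coef."""
    dot = sum(n * p for n, p in zip(normal, point))
    return [p * coef / dot for p in point]


def orthogonal_projection(normal, coef, *point):
    """Move the point along the normal direction onto the hyperplane."""
    dot = sum(n * p for n, p in zip(normal, point))
    norm_sq = sum(n * n for n in normal)
    step = (dot - coef) / norm_sq
    return [p - step * n for n, p in zip(normal, point)]


print(scale_onto_hyperplane([1, 1, 1], 3, 2, 4, 0))  # [1.0, 2.0, 0.0]
print(orthogonal_projection([1, 1, 1], 3, 2, 4, 0))  # [1.0, 3.0, -1.0]
```

The scaling rule keeps a point with non-negative coordinates inside the positive quadrant, which the orthogonal projection does not guarantee.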
Probability and Random Variables Functions
sterlings_approximation(n: int) -> float
Returns an approximation of n! using Stirling's approximation.
Arguments:
n (int): The input number.
Returns:
float: The approximate factorial of n.
Example:
n = 10
approx_factorial = sterlings_approximation(n)
print(approx_factorial)
>>> 3598695.6187410373
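For reference, Stirling's approximation is $n! \approx \sqrt{2 \pi n} \cdot (\frac{n}{e})^n$. The sketch below evaluates that formula directly and compares it with the exact factorial; `stirling_sketch` is a hypothetical name, and the library may use a refined variant of the formula.

```python
import math


def stirling_sketch(n: int) -> float:
    """Leading-order Stirling formula: sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


approx = stirling_sketch(10)
exact = math.factorial(10)  # 3628800
# Relative error of the leading-order formula is roughly 1/(12n).
print(approx, exact, abs(approx - exact) / exact)
```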
permutate(n: int, r: int) -> int
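Calculates the number of permutations of n objects taken r at a time (using Stirling's approximation if n is too large).
Arguments:
n (int): The total number of objects.
r (int): The number of objects taken at a time.
Returns:
int: The number of permutations.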
Example:
n = 5
r = 3
perm_result = permutate(n, r)
print(perm_result)
>> 60
combinate(n: int, r: int) -> int
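Calculates the number of combinations of n objects taken r at a time, where order does not matter.
Arguments:
n (int): The total number of objects.
r (int): The number of objects taken at a time.
Returns:
int: The number of combinations.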
Example:
n = 5
r = 3
comb_result = combinate(n, r)
print(comb_result)
>> 10
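Both results can be cross-checked against the standard library (Python 3.8 or later) and against the factorial definitions. This is only a check of the two examples above, not the library's code.

```python
import math

n, r = 5, 3

# Permutations: n! / (n - r)!
print(math.perm(n, r))  # 60
print(math.factorial(n) // math.factorial(n - r))  # 60

# Combinations: n! / (r! * (n - r)!)
print(math.comb(n, r))  # 10
print(math.factorial(n) // (math.factorial(r) * math.factorial(n - r)))  # 10
```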
discrete_distribution_prob(exp: pd.Series, obs: pd.Series) -> float
Calculates the exact probability of observing the observed distribution given the expected distribution. Note: scale does not matter (i.e., the sum of obs need not match the sum of exp, because exp is converted to probabilities).
Arguments:
exp (pd.Series): The expected distribution.
obs (pd.Series): The observed distribution.
Returns:
float: The probability of observing the distribution.
Example:
exp = pd.Series([50, 50, 50])
obs = pd.Series([2, 1, 2])
prob = discrete_distribution_prob(exp, obs)
print(prob)
>>> 0.1234
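The value in this example agrees with the multinomial probability $\frac{5!}{2! \cdot 1! \cdot 2!} \cdot (\frac{1}{3})^5 = \frac{30}{243}$. The sketch below computes it under the assumption that the function is a multinomial probability with class probabilities taken from exp; the helper name is hypothetical and plain lists stand in for pd.Series.

```python
import math


def multinomial_prob_sketch(exp, obs):
    """Multinomial probability of the counts in obs, with probabilities from exp."""
    total = sum(exp)
    probs = [e / total for e in exp]  # scale of exp drops out here

    # Multinomial coefficient n! / (k1! * k2! * ...); each division is exact.
    coef = math.factorial(sum(obs))
    for k in obs:
        coef //= math.factorial(k)

    prob = float(coef)
    for q, k in zip(probs, obs):
        prob *= q ** k
    return prob


print(multinomial_prob_sketch([50, 50, 50], [2, 1, 2]))  # about 0.1235 (30/243)
```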
generate_combinations(num_classes: int, num_obs: int) -> set
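Returns a set of all possible combinations of num_classes non-negative integers that add up to num_obs.
Arguments:
num_classes (int): The number of integers in each combination.
num_obs (int): The total the integers must add up to.
Returns:
set: The set of all possible combinations.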
Example:
num_classes = 2
num_obs = 4
combinations = generate_combinations(num_classes, num_obs)
print(combinations)
>> {(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)}
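Such a set can be built recursively: fix the first count, then enumerate the remaining classes over what is left. The following is a minimal sketch with a hypothetical name, assuming num_classes is at least 1; it is not taken from the library.

```python
def compositions_sketch(num_classes: int, num_obs: int) -> set:
    """All tuples of num_classes non-negative integers summing to num_obs."""
    if num_classes == 1:
        return {(num_obs,)}
    result = set()
    for first in range(num_obs + 1):
        # Distribute the remaining observations over the remaining classes.
        for rest in compositions_sketch(num_classes - 1, num_obs - first):
            result.add((first,) + rest)
    return result


print(sorted(compositions_sketch(2, 4)))
# [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
```

The size of the set is the binomial coefficient C(num_obs + num_classes - 1, num_classes - 1), so it grows quickly as either argument grows.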
generate_normal_exponent(mean: float, std_dev: float) -> Callable
Generates a function representing the exponent of a normal distribution with the specified mean and standard deviation.
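Arguments:
mean (float): The mean of the distribution.
std_dev (float): The standard deviation of the distribution.
Returns:
Callable: A function representing the exponent.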
Example:
mean = 0
std_dev = 1
normal_exp = generate_normal_exponent(mean, std_dev)
normal_exp = the functional equivalent to $-\frac{1}{2} \cdot \left(\frac{x - \mu}{\sigma}\right)^2$ where $\mu$ = mean and $\sigma$ = std_dev
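A closure is one natural way to return such a function. The sketch below is an assumption about the shape of the result, with a hypothetical name, and is not the library's code.

```python
def normal_exponent_sketch(mean: float, std_dev: float):
    """Return x -> -1/2 * ((x - mean) / std_dev)**2."""
    def exponent(x: float) -> float:
        return -0.5 * ((x - mean) / std_dev) ** 2
    return exponent


normal_exp = normal_exponent_sketch(0, 1)
print(normal_exp(2))  # -2.0
print(normal_exp(1))  # -0.5
```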
generate_joint_pdf(exp: pd.Series, num_obs: int) -> Callable
Generates a joint probability density function (PDF) over all possible outcomes, based on the expected distribution and the total number of observations.
Arguments:
exp (pd.Series): The expected distribution.
num_obs (int): The total number of observations.
Returns:
Callable: The joint PDF function.
Explanation:
- Approximates each class's distribution with a normal PDF
- Multiplies the per-class approximations together to get a joint PDF
Example:
exp = pd.Series([4, 6])
num_obs = 100
joint_pdf = generate_joint_pdf(exp, num_obs)
joint_pdf = the functional equivalent to $\frac{1}{\sqrt{2\cdot\pi\cdot40\cdot\frac{6}{10}} \cdot \sqrt{2\cdot\pi\cdot60\cdot\frac{4}{10}}} \cdot e^{-\frac{1}{2} \cdot \left(\frac{x - 40}{\sqrt{40\cdot\frac{6}{10}}}\right)^2 - \frac{1}{2} \cdot \left(\frac{y - 60}{\sqrt{60\cdot\frac{4}{10}}}\right)^2}$
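The formula above uses, for each class, a normal density with mean $n \cdot p$ and variance $n \cdot p \cdot (1 - p)$, where p is the class's share of exp and n is num_obs. The sketch below builds the product of those densities. The helper name is hypothetical, plain lists stand in for pd.Series, and the library's actual construction may differ.

```python
import math


def joint_pdf_sketch(exp, num_obs):
    """Product of per-class normal approximations to the class counts."""
    total = sum(exp)
    probs = [e / total for e in exp]
    means = [num_obs * p for p in probs]
    variances = [num_obs * p * (1 - p) for p in probs]

    def pdf(*xs):
        value = 1.0
        for x, mu, var in zip(xs, means, variances):
            value *= math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
        return value

    return pdf


joint_pdf = joint_pdf_sketch([4, 6], 100)
# At the means (40, 60) the exponent vanishes, leaving 1 / (2 * pi * 24).
print(joint_pdf(40, 60))
```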
Calculus Functions
hyperplane_integration(f: Callable, hyperplane: list, max_val: float = None, chunk_size: int = "auto", num_samples: int = "auto", random_state: int = 42, pbar: Callable = None) -> float
Integrates a function (typically a PDF) over an N-dimensional hyperplane using quasi-Monte Carlo integration (Sobol sampling). Currently only supports integration in the positive quadrant.
Arguments:
- f (Callable): The function to integrate.
- hyperplane (object): The hyperplane over which to integrate.
- max_val (float): The maximum value at which to cap integration (defaults to None); any region in which the function exceeds that value is not counted.
- chunk_size (int): The number of samples to handle at one time (defaults to "auto").
- num_samples (int): The total number of samples to use (defaults to "auto").
- random_state (int): Random state used to make the integration deterministic (defaults to 42).
- pbar (tqdm): Progress bar to update after every chunk is completed (defaults to None).
Returns:
float: The result of the integration.
Example:
f = lambda x, y, z: x + y + z
hyperplane = Hyperplane(normal=np.array([1, 1, 1]), coef=3)
result = hyperplane_integration(f, hyperplane)
print(result)
>> 13.5
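The result 13.5 equals 3 × 4.5: the function is constantly 3 on this plane, and 4.5 is the area of the triangle x, y ≥ 0, x + y ≤ 3 that parameterizes the part of the plane in the positive quadrant. The sketch below reproduces that number with plain pseudo-random Monte Carlo rather than Sobol sampling, for a 3-D plane with normal (1, 1, 1) only. Measuring area in the (x, y) parameterization, without a surface-area factor, is an assumption inferred from the example; the helper name is hypothetical.

```python
import random


def integrate_over_plane_sketch(f, coef, num_samples=100_000, seed=42):
    """Monte Carlo integral of f over x + y + z = coef with x, y, z >= 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.uniform(0, coef)
        y = rng.uniform(0, coef)
        z = coef - x - y  # solve the plane equation for the last coordinate
        if z >= 0:  # keep only points in the positive quadrant
            total += f(x, y, z)
    # Average over the sampling square, times the square's area coef**2.
    return coef ** 2 * total / num_samples


result = integrate_over_plane_sketch(lambda x, y, z: x + y + z, 3)
print(result)  # close to 13.5, up to sampling noise
```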
Distribution Analysis Functions
E(exp: pd.Series, obs: pd.Series, approximate: bool, chunk_size: int = "auto", num_samples: int = "auto", random_state: int = None) -> float
Performs an E-test on an expected distribution and observed distribution.
Arguments:
- exp (pd.Series): The expected (ground-truth) distribution.
- obs (pd.Series): The observed distribution.
- approximate (bool): If False, the exact discrete probability is calculated; if True, an approximation based on continuous probability is calculated.
- chunk_size (int): The number of samples to process simultaneously (defaults to "auto").
- num_samples (int): The total number of samples to calculate (defaults to "auto"); lower is faster but less precise.
- random_state (int): If specified, leads to deterministic results (defaults to None).
Returns:
float: The E-value.
Explanation:
- The E-test seeks to generate a more interpretable and accurate probability value (p-value) for testing the statistical difference between two distributions.
- The E-test assumes the expected and observed distributions are identical and, under that assumption, calculates an E-value: the probability of receiving a distribution as extreme as, or more extreme than, the one observed.
- Thus, the lower the E-value (i.e., the lower the chance of receiving a distribution that extreme if the distributions were in fact identical), the stronger the indication that the distributions are different.
- The exact E-value can be calculated using discrete probability; however, a continuous probability estimate must be used in cases where there are many observations.
- Note: time complexity in either case is exponential, so while the continuous approach can handle larger numbers of observations, it may take a significant amount of time for massive samples without some method of scaling them down (to be researched).
Example:
exp = pd.Series([50, 50, 50])
obs = pd.Series([300, 300, 300])
e_value = E(exp, obs, approximate=True)
print(e_value)
>> 1.0
exp = pd.Series([50, 0, 0])
obs = pd.Series([100, 0, 0])
e_value = E(exp, obs, approximate=True)
print(e_value)
>> 0
exp = pd.Series([15, 15, 15])
obs = pd.Series([155, 145, 150])
e_value = E(exp, obs, approximate=True)
print(e_value)
>> 0.77743
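For small samples, the exact (discrete) variant described in the explanation can be sketched by enumerating every possible outcome and summing the probabilities of those that are no more likely than the observed one. Treating "as extreme or more extreme" as "no more probable than the observation" is an assumption of this sketch, as are the helper names; plain lists stand in for pd.Series.

```python
import math


def _outcomes(num_classes, num_obs):
    """Yield all tuples of non-negative counts that sum to num_obs."""
    if num_classes == 1:
        yield (num_obs,)
        return
    for first in range(num_obs + 1):
        for rest in _outcomes(num_classes - 1, num_obs - first):
            yield (first,) + rest


def _multinomial_prob(probs, counts):
    coef = math.factorial(sum(counts))
    for k in counts:
        coef //= math.factorial(k)
    prob = float(coef)
    for q, k in zip(probs, counts):
        prob *= q ** k
    return prob


def exact_e_value_sketch(exp, obs):
    total = sum(exp)
    probs = [e / total for e in exp]
    p_obs = _multinomial_prob(probs, obs)
    e_value = 0.0
    for outcome in _outcomes(len(obs), sum(obs)):
        p = _multinomial_prob(probs, outcome)
        # Small tolerance so that ties with the observation are counted.
        if p <= p_obs * (1 + 1e-9):
            e_value += p
    return e_value


print(exact_e_value_sketch([1, 1], [1, 1]))  # 1.0: the most likely outcome
print(exact_e_value_sketch([1, 1], [2, 0]))  # 0.5: both lopsided outcomes
```

The loop visits every outcome, which is why the exact calculation becomes impractical as the number of observations or classes grows.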
License
This project is licensed under the MIT License.