Library for calculating chi-abs, resampling, and bootstrapping (LS 40, UCLA Life Sciences course)
-
read_data
The read_data function loads data from either a pandas DataFrame or a file path. A DataFrame input is copied. For a file path, it checks the file extension (.xls, .xlsx, or .csv) and calls the matching reader. It then applies pandas' to_numeric function to convert the data to numeric types, coercing non-numeric values to NaN.
Usage Example:
data = read_data('path_to_file.csv') # For CSV files
data = read_data('path_to_file.xlsx') # For Excel files
data = read_data(dataframe) # For existing DataFrame
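The behaviour described above can be sketched with pandas as follows. This is a minimal illustration, not the library's actual source. The ValueError for unsupported extensions is an assumption. Reading .xls/.xlsx also requires an Excel engine such as openpyxl to be installed.

```python
import os
import pandas as pd

def read_data(source):
    """Load a table from a DataFrame or a .csv/.xls/.xlsx path, coerced to numbers."""
    if isinstance(source, pd.DataFrame):
        data = source.copy()  # never mutate the caller's DataFrame
    else:
        ext = os.path.splitext(str(source))[1].lower()
        if ext in (".xls", ".xlsx"):
            data = pd.read_excel(source)  # needs an Excel engine, e.g. openpyxl
        elif ext == ".csv":
            data = pd.read_csv(source)
        else:
            # ASSUMPTION: unsupported extensions raise; the real library may differ
            raise ValueError(f"Unsupported file type: {ext!r}")
    # Column-wise conversion; anything non-numeric becomes NaN
    return data.apply(pd.to_numeric, errors="coerce")
```

For example, read_data(pd.DataFrame({"A": ["1", "x"]})) returns a column holding 1.0 and NaN.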
-
compare_dimensions
This function compares the dimensions of the observed and expected datasets. If the dimensions don't match, it raises a ValueError. This ensures that further statistical calculations are valid.
Usage Example:
compare_dimensions(observed_data, expected_data) # Checks if dimensions of observed and expected data match
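A shape check like the one described can be written in a few lines. This is a sketch under the assumption that "dimensions" means the table shape (rows, columns). np.shape accepts lists, NumPy arrays, and DataFrames alike.

```python
import numpy as np

def compare_dimensions(observed, expected):
    """Raise ValueError when the observed and expected tables differ in shape."""
    obs_shape = np.shape(observed)
    exp_shape = np.shape(expected)
    if obs_shape != exp_shape:
        raise ValueError(
            f"Dimension mismatch: observed {obs_shape} vs expected {exp_shape}"
        )
```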
-
calculate_chi_squared
Calculates the chi-squared statistic from the observed (O) and expected (E) datasets. For each cell, the squared difference (O - E)^2 is divided by that cell's expected value E, and these terms are summed over all cells: chi-squared = Σ (O - E)^2 / E.
Usage Example:
chi_squared = calculate_chi_squared(observed_data, expected_data) # Returns the chi-squared statistic
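The formula can be sketched with NumPy as below. This is an illustration only; the library's own version may also accept DataFrames directly. For the 2 x 2 table [[10, 20], [30, 40]] with expected counts [[12, 18], [28, 42]], every cell differs by 2, so the statistic is 4/12 + 4/18 + 4/28 + 4/42 = 50/63 ≈ 0.794.

```python
import numpy as np

def calculate_chi_squared(observed, expected):
    """Chi-squared statistic: sum over all cells of (O - E)^2 / E."""
    obs = np.asarray(observed, dtype=float)
    exp = np.asarray(expected, dtype=float)
    return float(np.sum((obs - exp) ** 2 / exp))
```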
-
calculate_expected
Computes expected frequencies for a contingency table from the observed data, assuming independence between rows and columns. Each expected cell is (row total * column total) / grand total.
Usage Example:
expected_data = calculate_expected(observed_data) # Returns the expected frequencies
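The row-total * column-total / grand-total rule vectorizes neatly with NumPy broadcasting. This sketch returns a plain array; the library may return a DataFrame instead.

```python
import numpy as np

def calculate_expected(observed):
    """Expected counts under independence: E_ij = row_i total * col_j total / grand total."""
    obs = np.asarray(observed, dtype=float)
    row_totals = obs.sum(axis=1, keepdims=True)  # shape (r, 1)
    col_totals = obs.sum(axis=0, keepdims=True)  # shape (1, c)
    return row_totals * col_totals / obs.sum()   # broadcasts to (r, c)
```

For [[10, 20], [30, 40]] the row totals are 30 and 70, the column totals are 40 and 60, and the grand total is 100. The expected table is therefore [[12, 18], [28, 42]], which keeps the same margins as the observed table.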
-
calculate_chi_abs
This function calculates the chi-abs (chi absolute) statistic for the observed and expected data. If no expected data is provided, the expected values are computed from the observed data, as in calculate_expected.
Usage Example:
chi_abs = calculate_chi_abs(observed_data, expected_data) # Calculates the chi absolute statistic
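The section does not spell out the chi-abs formula. The sketch below assumes the definition commonly used in LS 40: the sum over all cells of |O - E| / E. It is the chi-squared statistic with an absolute difference in place of the squared one. For [[10, 20], [30, 40]] this gives 2/12 + 2/18 + 2/28 + 2/42 = 25/63 ≈ 0.397.

```python
import numpy as np

def calculate_chi_abs(observed, expected=None):
    """Chi-abs, ASSUMED here to be sum(|O - E| / E) over all cells."""
    obs = np.asarray(observed, dtype=float)
    if expected is None:
        # No expected table given: use independence-based expected counts
        expected = obs.sum(axis=1, keepdims=True) * obs.sum(axis=0, keepdims=True) / obs.sum()
    exp = np.asarray(expected, dtype=float)
    return float(np.sum(np.abs(obs - exp) / exp))
```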
-
chi_abs_stat
A wrapper function for calculate_chi_abs. It allows the user to provide only observed data, with an option to provide expected data. It handles dimension comparison and error logging.
Usage Example:
chi_abs = chi_abs_stat(observed_data, expected_data) # Calculates chi absolute statistic with optional expected data
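A wrapper along these lines might look like the following. This is a hypothetical sketch. It uses the standard logging module for the error logging mentioned above and assumes chi-abs = sum(|O - E| / E). The library's actual log messages and exception types are not documented here.

```python
import logging
import numpy as np

logger = logging.getLogger(__name__)

def chi_abs_stat(observed, expected=None):
    """Hypothetical wrapper: validate shapes, log problems, return chi-abs."""
    obs = np.asarray(observed, dtype=float)
    if expected is None:
        # Only observed data supplied: derive expected counts from the margins
        exp = obs.sum(axis=1, keepdims=True) * obs.sum(axis=0, keepdims=True) / obs.sum()
    else:
        exp = np.asarray(expected, dtype=float)
        if obs.shape != exp.shape:
            logger.error("Shape mismatch: observed %s vs expected %s", obs.shape, exp.shape)
            raise ValueError("observed and expected must have the same dimensions")
    # ASSUMED chi-abs definition: sum of |O - E| / E over all cells
    return float(np.sum(np.abs(obs - exp) / exp))
```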
-
calculate_p_value
Calculates the p-value from a chi-squared statistic and degrees of freedom using the chi-squared survival function.
Usage Example:
p_value = calculate_p_value(chi_squared, dof) # Returns the p-value
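SciPy exposes the chi-squared survival function as scipy.stats.chi2.sf, which returns the upper-tail probability P(X >= x). A minimal sketch, assuming the library uses SciPy in the same way:

```python
from scipy.stats import chi2

def calculate_p_value(chi_squared, dof):
    """Upper-tail probability P(X >= chi_squared) for X ~ chi-squared(dof)."""
    return float(chi2.sf(chi_squared, dof))
```

As a sanity check, 3.841 is the familiar 5% critical value for 1 degree of freedom, so calculate_p_value(3.841, 1) is approximately 0.05.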
-
chi_squared_stat
Reads the observed and expected data, checks that their dimensions match, calculates the chi-squared statistic, and computes the p-value. The chi-squared statistic is the value returned.
Usage Example:
chi_squared_value = chi_squared_stat(observed_data, expected_data) # Calculates chi-squared statistic
-
p_value_stat
Reads the observed and expected data, checks that their dimensions match, calculates the chi-squared statistic, and returns the corresponding p-value.
Usage Example:
p_value = p_value_stat(observed_data, expected_data) # Calculates the p-value for chi-squared statistic
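The chi_squared_stat and p_value_stat pipelines can be sketched together as below. The chi_squared value computed midway is what chi_squared_stat would hand back. The degrees-of-freedom rule is an assumption, because the section does not state it: k - 1 for a one-dimensional goodness-of-fit table, and (r - 1) * (c - 1) for an r x c contingency table. For a 2 x 2 table the result matches SciPy's chi2_contingency with the Yates correction turned off.

```python
import numpy as np
from scipy.stats import chi2

def p_value_stat(observed, expected):
    """Sketch: check shapes, compute chi-squared, return its upper-tail p-value."""
    obs = np.asarray(observed, dtype=float)
    exp = np.asarray(expected, dtype=float)
    if obs.shape != exp.shape:
        raise ValueError("observed and expected must have the same dimensions")

    chi_squared = np.sum((obs - exp) ** 2 / exp)  # chi_squared_stat would return this

    # ASSUMED degrees of freedom: k - 1 for a 1-D goodness-of-fit table,
    # (r - 1) * (c - 1) for an r x c contingency table
    if obs.ndim == 1:
        dof = obs.size - 1
    else:
        dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return float(chi2.sf(chi_squared, dof))
```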
-
convert_df_to_numpy
Converts DataFrame data to NumPy arrays for the array-based functions, particularly the bootstrapping routines. It returns a tuple of NumPy arrays: the observed data and the expected data.
Usage Example:
observed_array, expected_array = convert_df_to_numpy(df_observed, df_expected) # Converts DataFrame to numpy arrays
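DataFrame.to_numpy performs this conversion directly. A short sketch follows; the float dtype is an assumption, chosen so later divisions by expected counts never truncate. The example tables are illustrative values only.

```python
import numpy as np
import pandas as pd

def convert_df_to_numpy(df_observed, df_expected):
    """Return (observed, expected) as float NumPy arrays for fast resampling."""
    return (
        df_observed.to_numpy(dtype=float),
        df_expected.to_numpy(dtype=float),
    )

# Illustrative 2 x 2 tables (made-up numbers)
obs_df = pd.DataFrame([[10, 20], [30, 40]], index=["A", "B"], columns=["yes", "no"])
exp_df = pd.DataFrame([[12, 18], [28, 42]], index=obs_df.index, columns=obs_df.columns)
observed_array, expected_array = convert_df_to_numpy(obs_df, exp_df)
```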
-
bootstrap_chi_abs_distribution
Generates a bootstrap distribution of the chi-abs statistic for an n*n contingency table. It simulates new datasets, calculates the chi-abs statistic for each one, and returns an array of the simulated statistics.
Usage Example:
simulated_chi_abs = bootstrap_chi_abs_distribution(observed_data) # Returns array of simulated chi absolute statistics
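The section does not say how the new datasets are simulated, so the sketch below is one common null-hypothesis scheme, labelled as an assumption throughout. Each simulated row keeps its observed total. Its counts are drawn from the pooled column proportions, which is equivalent to resampling with replacement from the combined data. Every simulated table is then scored against the original expected table using chi-abs = sum(|O - E| / E). The n_sims and seed parameters are hypothetical additions.

```python
import numpy as np

def bootstrap_chi_abs_distribution(observed, n_sims=10_000, seed=None):
    """Simulated chi-abs values for an r x c table under independence.

    ASSUMED scheme: row totals fixed, each row drawn from the pooled column
    proportions, chi-abs = sum(|O - E| / E) against the original expected table.
    """
    rng = np.random.default_rng(seed)
    obs = np.asarray(observed, dtype=float)
    row_totals = np.rint(obs.sum(axis=1)).astype(int)
    pooled_probs = obs.sum(axis=0) / obs.sum()
    expected = np.outer(obs.sum(axis=1), pooled_probs)  # row total * col total / grand total

    # One multinomial draw per row per simulation -> shape (n_sims, r, c)
    sims = np.stack(
        [rng.multinomial(n, pooled_probs, size=n_sims) for n in row_totals],
        axis=1,
    )
    return np.sum(np.abs(sims - expected) / expected, axis=(1, 2))
```

Fixing the row totals mirrors how many study designs fix group sizes in advance. Recomputing the expected table from each simulated table is a reasonable alternative, but it can divide by zero when a simulated column happens to be empty.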
-
calculate_p_value_bootstrap
Calculates the p-value for the chi absolute statistic using bootstrap methods. Compares the observed chi absolute statistic against the distribution of simulated chi absolute values to compute the p-value.
Usage Example:
p_value = calculate_p_value_bootstrap(observed_chi_abs, simulated_chi_abs) # Calculates p-value using bootstrap method
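A minimal sketch of this comparison takes the p-value as the fraction of simulated chi-abs values that are at least as large as the observed one. Some courses instead use (count + 1) / (n + 1) to avoid reporting exactly zero. Which convention the library follows is not stated, so the plain fraction here is an assumption.

```python
import numpy as np

def calculate_p_value_bootstrap(observed_chi_abs, simulated_chi_abs):
    """Fraction of simulated chi-abs values >= the observed chi-abs."""
    simulated = np.asarray(simulated_chi_abs, dtype=float)
    return float(np.mean(simulated >= observed_chi_abs))
```

For an observed value of 2.0 and simulated values [1, 2, 3, 4], three of the four simulations are at least 2.0, so the p-value is 0.75.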
-
plot_chi_abs_distribution
Plots the distribution of simulated chi absolute values along with the observed chi absolute value. It also displays the calculated p-value, giving a visual summary of the test.
Usage Example:
plot_chi_abs_distribution(simulated_data, observed_data, p_value) # Plots distribution of chi absolute values
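A plot of this kind can be sketched with matplotlib: a histogram of the simulated values, a dashed vertical line at the observed value, and the p-value in the title. The parameter names, the bins default, and the choice to return (fig, ax) are assumptions for illustration. Returning the figure lets the caller decide whether to show or save it.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_chi_abs_distribution(simulated_chi_abs, observed_chi_abs, p_value, bins=30):
    """Histogram of simulated chi-abs values with the observed value marked."""
    fig, ax = plt.subplots()
    ax.hist(np.asarray(simulated_chi_abs, dtype=float), bins=bins,
            color="lightgray", edgecolor="black")
    # Dashed red line marks where the observed statistic falls
    ax.axvline(observed_chi_abs, color="red", linestyle="--",
               label=f"observed = {observed_chi_abs:.3f}")
    ax.set_xlabel("chi-abs")
    ax.set_ylabel("count")
    ax.set_title(f"Simulated chi-abs distribution (p = {p_value:.4f})")
    ax.legend()
    return fig, ax
```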
-
Extended: Relative Risk Analysis Documentation
This documentation provides an overview and usage guide for a set of Python functions that calculate the relative risk between two treatments, resample data for statistical analysis, calculate confidence intervals, and plot the distribution of relative risks. These functions are intended for statistical analysis in life sciences research or in any field that requires comparative risk assessment.
bootstrap_chi_abs_distribution
The bootstrap_chi_abs_distribution function generates a bootstrap distribution of the chi-abs statistic for an n*n contingency table. The observed chi-abs value is compared against this distribution to assess whether the observed frequencies in categorical data differ significantly from the expected frequencies.
Parameters: