
A comprehensive Python package for Exploratory Data Analysis (EDA), advanced statistical testing, and interactive data visualization with Plotly.
rostaing-report is a comprehensive Python toolkit for data professionals, designed to accelerate the entire data analysis workflow, from initial exploration to final presentation. With a single, intuitive interface, it transforms a raw Pandas DataFrame into a rich, interactive analysis environment.
This package provides a unified interface for exploratory data analysis, statistical testing, and interactive data visualization with Plotly.
It is built for Data Scientists, Data Analysts, and Researchers who need to quickly understand, validate, and visualize their data with maximum efficiency.
Install the package and all its dependencies from PyPI:
pip install rostaing-report
The rostaing_report Object
The entire toolkit is accessed through a single object. You initialize it once with your DataFrame, and then use that same object to run reports, create plots, and perform tests.
import pandas as pd
from rostaing import rostaing_report
import plotly.express as px
# Load a versatile sample dataset
df = px.data.tips()
# Initialize the report object
# This one object is now your gateway to all functionalities
report = rostaing_report(df)
The fastest way to get insights is to display the object itself. Displaying it in a notebook (or printing it in a script) generates a full descriptive analysis.
# In a Jupyter Notebook or similar environment:
report
This command produces a detailed, multi-section report.
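As a rough illustration of how an object can render itself both in a notebook and in a script, the sketch below uses Python's `__repr__` and the `_repr_html_` hook that Jupyter looks for. `MiniReport` and its summary text are hypothetical stand-ins, not the actual rostaing_report implementation.

```python
class MiniReport:
    """Hypothetical stand-in; not the real rostaing_report class."""

    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per record

    def _summary(self):
        n_cols = len(self.rows[0]) if self.rows else 0
        return f"{len(self.rows)} rows x {n_cols} columns"

    def __repr__(self):
        # Used by print() and by a plain Python console
        return f"MiniReport({self._summary()})"

    def _repr_html_(self):
        # Jupyter calls this hook, when present, to render rich output
        return f"<b>MiniReport</b>: {self._summary()}"


report_demo = MiniReport([
    {"total_bill": 16.99, "tip": 1.01},
    {"total_bill": 10.34, "tip": 1.66},
])
print(report_demo)
```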
Create over 40 types of interactive plots with an API identical to Plotly Express. The report object automatically handles passing the DataFrame. All methods return a plotly.graph_objects.Figure that you can display with .show() or customize further.
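One plausible way for a wrapper to pass its data automatically is to forward unknown attribute lookups to a plotting backend and pre-bind the stored data as the first argument. The sketch below shows that pattern only; `BoundPlots`, `fake_scatter`, and the `backend` namespace are hypothetical stand-ins for the real class and for Plotly Express, and the package may well be implemented differently.

```python
from functools import partial
from types import SimpleNamespace


def fake_scatter(data_frame, x, y):
    """Stand-in for a plotting function that takes the data first."""
    return {"kind": "scatter", "n_points": len(data_frame), "x": x, "y": y}


# Stand-in for a plotting module such as Plotly Express
backend = SimpleNamespace(scatter=fake_scatter)


class BoundPlots:
    """Hypothetical wrapper that binds its data to backend functions."""

    def __init__(self, data, backend):
        self._data = data
        self._backend = backend

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails,
        # so plot names are resolved on the backend instead.
        func = getattr(self._backend, name)
        return partial(func, self._data)


plots = BoundPlots([{"a": 1, "b": 2}, {"a": 3, "b": 4}], backend)
fig_demo = plots.scatter(x="a", y="b")  # data is supplied by the wrapper
```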
.scatter(): For showing the relationship between two numeric variables.
fig = report.scatter(x="total_bill", y="tip", color="time", title="Tip vs. Total Bill by Time of Day")
# fig.show()
.line(): For showing trends over a continuous variable, often time.
df_stocks = px.data.stocks()
stock_report = rostaing_report(df_stocks)
fig = stock_report.line(x="date", y=["GOOG", "AAPL"], title="Google vs. Apple Stock Price")
# fig.show()
.bar(): For comparing categorical data.
fig = report.bar(x="day", y="total_bill", color="sex", barmode="group", title="Total Bill by Day, Grouped by Sex")
# fig.show()
.area(): Like a line chart, but with the area below the line filled in.
fig = stock_report.area(x="date", y="GOOG", title="Google Stock Price (Area Chart)")
# fig.show()
.funnel(): For visualizing stages in a process (e.g., a sales funnel).
df_funnel = px.data.funnel()
funnel_report = rostaing_report(df_funnel)
fig = funnel_report.funnel(x='number', y='stage', title="Sales Funnel")
# fig.show()
.timeline(): A type of Gantt chart for visualizing events over time.
df_gantt = pd.DataFrame([
    dict(Task="Job A", Start='2023-01-01', Finish='2023-02-28', Resource="Alex"),
    dict(Task="Job B", Start='2023-03-05', Finish='2023-04-15', Resource="Max"),
    dict(Task="Job C", Start='2023-02-20', Finish='2023-05-30', Resource="Alex")
])
gantt_report = rostaing_report(df_gantt)
fig = gantt_report.timeline(x_start="Start", x_end="Finish", y="Task", color="Resource", title="Project Timeline")
# fig.show()
.histogram(): To understand the distribution of a single numeric variable.
fig = report.histogram(x="total_bill", nbins=30, marginal="box", title="Distribution of Total Bill")
# fig.show()
.box(): For visualizing the five-number summary (min, Q1, median, Q3, max) and identifying outliers.
fig = report.box(x="day", y="tip", color="smoker", title="Tip Distribution by Day and Smoker Status")
# fig.show()
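To make the five-number summary concrete, here is a minimal standard-library sketch that computes it and flags outliers with the common 1.5 × IQR rule. The sample values are made up, and quartile conventions differ between tools, so the numbers a box plot draws may not match these exactly.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


sample_tips = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # illustrative values only
summary = five_number_summary(sample_tips)
outliers = iqr_outliers(sample_tips)
print(summary, outliers)
```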
.violin(): A combination of a box plot and a kernel density estimate.
fig = report.violin(x="day", y="tip", color="sex", box=True, points="all", title="Tip Distribution (Violin Plot)")
# fig.show()
.strip(): A scatter plot for a single variable, useful for seeing all data points.
fig = report.strip(x="day", y="tip", color="time", title="Individual Tips by Day and Time")
# fig.show()
.ecdf(): To visualize the Empirical Cumulative Distribution Function.
fig = report.ecdf(x="tip", color="sex", title="ECDF of Tips by Sex")
# fig.show()
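For readers unfamiliar with the ECDF, the following sketch computes the points such a plot is built from: for each distinct value, the fraction of observations less than or equal to it. The helper name `ecdf_points` and the input values are illustrative.

```python
def ecdf_points(values):
    """Return (value, cumulative fraction) pairs, one per distinct value."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, v in enumerate(ordered, start=1):
        if points and points[-1][0] == v:
            # Repeated value: keep one point at the highest fraction
            points[-1] = (v, i / n)
        else:
            points.append((v, i / n))
    return points


ecdf_demo = ecdf_points([3, 1, 2, 2])
print(ecdf_demo)
```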
.density_heatmap(): Shows the density of points in a 2D space as a heatmap.
fig = report.density_heatmap(x="total_bill", y="tip", marginal_x="histogram", marginal_y="histogram", title="Density Heatmap of Tips vs. Total Bill")
# fig.show()
.density_contour(): Shows the density of points as contour lines.
fig = report.density_contour(x="total_bill", y="tip", color="sex", title="Density Contour of Tips vs. Total Bill by Sex")
# fig.show()
.imshow(): For displaying images or matrices.
# Note: data is passed directly for imshow
from skimage import data
img = data.chelsea()
fig = report.imshow(img=img, title="imshow Example with scikit-image")
# fig.show()
.pie(): For showing proportions of a whole.
fig = report.pie(names="day", values="tip", title="Total Tips by Day")
# fig.show()
.sunburst(): A hierarchical pie chart.
fig = report.sunburst(path=['sex', 'day', 'time'], values='total_bill', title="Hierarchical Breakdown of Total Bill")
# fig.show()
.treemap(): An alternative hierarchical view using nested rectangles.
fig = report.treemap(path=['smoker', 'day'], values='total_bill', title="Treemap of Total Bill by Smoker and Day")
# fig.show()
.icicle(): Another hierarchical view, laid out as a "flame" chart.
fig = report.icicle(path=['time', 'day'], values='tip', title="Icicle Chart of Tips")
# fig.show()
.funnel_area(): Like a funnel chart, but draws each category as a funnel segment whose area is proportional to its value.
df_flow = px.data.wind()
flow_report = rostaing_report(df_flow)
fig = flow_report.funnel_area(names='direction', values='frequency', title='Wind Direction Funnel')
# fig.show()
.scatter_3d(): A scatter plot in 3D space.
df_iris = px.data.iris()
iris_report = rostaing_report(df_iris)
fig = iris_report.scatter_3d(x='sepal_length', y='sepal_width', z='petal_width', color='species', title="3D Iris Dataset")
# fig.show()
.line_3d(): A line plot in 3D space.
df_election = px.data.election()
election_report = rostaing_report(df_election)
fig = election_report.line_3d(x="Joly", y="Coderre", z="Bergeron", color="winner", title="3D Election Results")
# fig.show()
.scatter_matrix(): A grid of scatter plots for comparing multiple variables at once (also known as a pairs plot).
df_iris = px.data.iris()
iris_report = rostaing_report(df_iris)
fig = iris_report.scatter_matrix(dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"], color="species", title="Iris Dataset Scatter Matrix")
# fig.show()
.parallel_coordinates(): For plotting multivariate numerical data.
fig = iris_report.parallel_coordinates(color="species_id", labels={"species_id": "Species", "sepal_width": "Sepal Width", "sepal_length": "Sepal Length"}, title="Parallel Coordinates for Iris Dataset")
# fig.show()
.parallel_categories(): For plotting multivariate categorical data.
fig = report.parallel_categories(dimensions=['sex', 'smoker', 'day', 'time'], color="size", color_continuous_scale=px.colors.sequential.Inferno, title="Parallel Categories for Tips Dataset")
# fig.show()
.scatter_geo() and .line_geo(): For plotting points or lines on an outline map.
df_gap = px.data.gapminder().query("year==2007")
gap_report = rostaing_report(df_gap)
fig = gap_report.scatter_geo(locations="iso_alpha", color="continent", hover_name="country", size="pop", projection="natural earth", title="Global Population in 2007")
# fig.show()
.choropleth(): For creating filled geographical maps where color represents a value.
fig = gap_report.choropleth(locations="iso_alpha", color="lifeExp", hover_name="country", color_continuous_scale=px.colors.sequential.Plasma, title="Global Life Expectancy in 2007")
# fig.show()
.scatter_mapbox() and .choropleth_mapbox(): Similar to geo charts but use tile-based maps. Requires a Mapbox access token.
# Set your token first: px.set_mapbox_access_token("YOUR_TOKEN")
df_car = px.data.carshare()
car_report = rostaing_report(df_car)
# fig = car_report.scatter_mapbox(lat="centroid_lat", lon="centroid_lon", color="peak_hour", size="car_hours", title="Car Share Activity in Montreal")
# fig.show()
.scatter_polar() and .bar_polar(): For plotting data in a polar coordinate system.
df_wind = px.data.wind()
wind_report = rostaing_report(df_wind)
fig = wind_report.bar_polar(r="frequency", theta="direction", color="strength", title="Wind Frequency by Direction and Strength")
# fig.show()
.scatter_ternary(): For plotting data with three components that sum to a constant (e.g., compositions).
df_election = px.data.election()
election_report = rostaing_report(df_election)
fig = election_report.scatter_ternary(a="Joly", b="Coderre", c="Bergeron", color="winner", size="total", hover_name="district", title="Ternary Plot of Election Results")
# fig.show()
Perform a wide range of statistical tests directly from the report object. Each test returns a well-formatted dictionary or DataFrame.
.normality_test(): Checks if data is likely drawn from a normal distribution.
# Using 'shapiro', 'jarque_bera', or 'normaltest'
norm_results = report.normality_test('total_bill', test='shapiro')
print(pd.Series(norm_results))
test Shapiro-Wilk
column total_bill
statistic 0.913576
p_value 8.20467e-11
conclusion (alpha=0.05) The null hypothesis (normality) is rejected (p < 0.05).
dtype: object
.ks_test(): Kolmogorov-Smirnov test for goodness of fit against a theoretical distribution.
ks_results = report.ks_test('tip', dist='norm')
print(pd.Series(ks_results))
test Kolmogorov-Smirnov
column tip
distribution_tested norm
statistic 0.480036
p_value 1.19662e-54
conclusion (alpha=0.05) The data does not follow a 'norm' distribution (p < 0.05).
dtype: object
.ttest_1samp(): Compares the mean of a single sample to a known or hypothesized population mean.
# Is the average tip significantly different from $2.50?
ttest1_results = report.ttest_1samp(col='tip', popmean=2.5)
print(pd.Series(ttest1_results))
test T-test (one-sample)
column tip
population_mean 2.5
t_statistic 5.62125
p_value 4.3417e-08
conclusion (alpha=0.05) Statistically significant difference from the population mean (p < 0.05).
dtype: object
.ttest_ind(): Compares the means of two independent columns.
# Note: A more common use is comparing one variable across two groups (see non-parametric section).
# This example tests if the mean of 'total_bill' is equal to the mean of 'tip'.
ttest_ind_results = report.ttest_ind('total_bill', 'tip')
print(pd.Series(ttest_ind_results))
.ttest_paired(): Compares the means of two dependent (paired) samples (e.g., "before" and "after").
# Create synthetic paired data
paired_df = pd.DataFrame({'before': [20, 21, 25, 18], 'after': [22, 24, 26, 21]})
paired_report = rostaing_report(paired_df)
paired_results = paired_report.ttest_paired('before', 'after')
print(pd.Series(paired_results))
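To show what a paired t-test computes, the sketch below derives the t statistic by hand from the same before/after values: the mean of the per-pair differences divided by its standard error. The helper name and the sign convention (second minus first) are assumptions, and the p-value step is omitted.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]  # second minus first
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1


before = [20, 21, 25, 18]
after = [22, 24, 26, 21]
t_stat, t_dof = paired_t_statistic(before, after)
print(round(t_stat, 4), t_dof)
```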
.anova_oneway(): Compares the means of three or more independent groups.
# Is there a difference in tip amount across different days of the week?
anova1_results = report.anova_oneway(dv='tip', between='day')
print(pd.Series(anova1_results))
.anova_twoway(): Examines the influence of two different categorical independent variables on one continuous dependent variable.
# How do 'sex' and 'day' (and their interaction) affect the tip amount?
anova2_results = report.anova_twoway(dv='tip', between=['sex', 'day'])
print(anova2_results)
.anova_rm(): Repeated Measures ANOVA for comparing means across three or more dependent conditions.
# Requires data in long format with a subject identifier.
import pingouin as pg  # provides read_dataset() for the sample data
rm_df = pg.read_dataset('rm_anova')
rm_report = rostaing_report(rm_df)
rm_results = rm_report.anova_rm(dv='Desire', within='Time', subject='Subject')
print(rm_results)
.chi2_test(): Tests for independence between two categorical variables.
chi2_results = report.chi2_test('sex', 'smoker')
print(pd.Series(chi2_results).drop('contingency_table'))
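As a sketch of the arithmetic behind a chi-square test of independence, the code below builds expected counts from the row and column totals of a contingency table and sums the scaled squared deviations. The counts are made up, and no continuity correction is applied, so results for 2x2 tables can differ from libraries that apply one by default.

```python
def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


counts = [[10, 20], [20, 10]]  # illustrative counts, not the tips data
chi2_stat, chi2_dof = chi_square_statistic(counts)
print(round(chi2_stat, 4), chi2_dof)
```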
.mann_whitney_u_test(): Non-parametric alternative to the independent T-test. Compares distributions of two independent groups.
# Is the tip distribution the same for smokers and non-smokers?
mwu_results = report.mann_whitney_u_test(dv='tip', group_col='smoker')
print(pd.Series(mwu_results))
.wilcoxon_test(): Non-parametric alternative to the paired T-test (uses the same data as the ttest_paired example):
# paired_report = ... (see ttest_paired example)
# wilcoxon_results = paired_report.wilcoxon_test('before', 'after')
# print(pd.Series(wilcoxon_results))
.kruskal_wallis_test(): Non-parametric alternative to the one-way ANOVA. Compares distributions of three or more groups.
# Is the tip distribution the same across different days?
kruskal_results = report.kruskal_wallis_test(dv='tip', group_col='day')
print(pd.Series(kruskal_results))
.friedman_test(): Non-parametric alternative to the repeated measures ANOVA.
# Create synthetic repeated data
friedman_df = pd.DataFrame({'cond1': [1,2,3,4], 'cond2': [2,3,4,5], 'cond3': [3,4,5,6]})
friedman_report = rostaing_report(friedman_df)
friedman_results = friedman_report.friedman_test('cond1', 'cond2', 'cond3')
print(pd.Series(friedman_results))
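The Friedman test ranks each subject's scores across conditions and compares the per-condition rank sums. The sketch below computes the uncorrected statistic for the same three conditions; the helper names are illustrative, and libraries that apply a tie correction may report a different value when ties are present.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(*conditions):
    """Uncorrected Friedman statistic for equal-length condition lists."""
    k = len(conditions)
    n = len(conditions[0])
    rank_sums = [0.0] * k
    for subject_scores in zip(*conditions):  # one tuple per subject
        for j, rank in enumerate(average_ranks(list(subject_scores))):
            rank_sums[j] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


friedman_q = friedman_statistic([1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6])
print(friedman_q)
```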
.correlation_matrix(): Calculates the full correlation matrix.
# Methods can be 'pearson', 'spearman', or 'kendall'
corr_matrix = report.correlation_matrix(method='spearman')
# print(corr_matrix)
.point_biserial_corr(): Measures the correlation between a binary and a continuous variable.
# Create a binary column for the example
df_copy = df.copy()
df_copy['is_male'] = (df_copy['sex'] == 'Male').astype(int)
binary_report = rostaing_report(df_copy)
pbc_results = binary_report.point_biserial_corr(binary_col='is_male', continuous_col='tip')
print(pd.Series(pbc_results))
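The point-biserial coefficient is numerically the Pearson correlation computed with a 0/1 variable. The sketch below checks this on a tiny made-up sample by comparing the textbook group-means formula with a plain Pearson calculation; both helper names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(binary, continuous):
    """(M1 - M0) / s_n * sqrt(p * q), with s_n the population std dev."""
    ones = [y for b, y in zip(binary, continuous) if b == 1]
    zeros = [y for b, y in zip(binary, continuous) if b == 0]
    n = len(continuous)
    mean_all = sum(continuous) / n
    pop_sd = math.sqrt(sum((y - mean_all) ** 2 for y in continuous) / n)
    p, q = len(ones) / n, len(zeros) / n
    mean_gap = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_gap / pop_sd * math.sqrt(p * q)


flag = [0, 0, 1, 1]             # illustrative binary column
amount = [1.0, 2.0, 3.0, 4.0]   # illustrative continuous column
r_formula = point_biserial(flag, amount)
r_pearson = pearson_r(flag, amount)
print(round(r_formula, 4), round(r_pearson, 4))
```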
.partial_corr(): Measures the correlation between two variables while controlling for the effect of one or more other variables.
# What is the correlation between 'tip' and 'total_bill' after controlling for party 'size'?
partial_corr_results = report.partial_corr(x='tip', y='total_bill', covar=['size'])
print(partial_corr_results)
from lifelines.datasets import load_waltons
surv_df = load_waltons()
surv_report = rostaing_report(surv_df)
.kaplan_meier_curve(): Plots the survival probability over time.
# Plot survival curves for different treatment groups
ax = surv_report.kaplan_meier_curve(duration_col='T', event_col='E', group_col='group')
# ax.figure.show()
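The curve being plotted is the Kaplan-Meier (product-limit) estimate. As a minimal sketch on made-up data, the code below multiplies, at each event time, the running survival probability by the fraction of at-risk subjects who did not experience the event; censored subjects (event flag 0) leave the risk set without lowering the curve.

```python
def kaplan_meier(durations, events):
    """Return (time, survival probability) at each time with an observed event."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)  # events plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Four illustrative subjects; the third is censored at time 2
km_curve = kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1])
print(km_curve)
```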
.logrank_test(): Compares the survival curves of two or more groups.
logrank_results = surv_report.logrank_test(duration_col='T', event_col='E', group_col='group')
print(logrank_results)
.cox_ph_regression(): Models the relationship between covariates and the risk of an event (Cox Proportional Hazards).
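# A positional argument cannot follow keyword arguments in Python, so all
# arguments are passed positionally here (assumes the parameter order is
# duration column, event column, then covariates).
cox_summary = surv_report.cox_ph_regression('T', 'E', 'group')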
print(cox_summary)
.test_levene(): Tests the assumption of equal variances across groups.
levene_results = report.test_levene(dv='tip', group_col='day')
print(pd.Series(levene_results))
.effect_size_cohen_d(): Calculates Cohen's d, a measure of effect size for the difference between two means.
cohen_d_results = report.effect_size_cohen_d(dv='tip', group_col='sex')
print(pd.Series(cohen_d_results))
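For reference, Cohen's d in its common pooled-standard-deviation form is the difference in group means divided by the pooled sample standard deviation. The sketch below applies that formula to two made-up groups; whether the package uses exactly this variant is an assumption.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    mean_gap = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_gap / math.sqrt(pooled_var)


d_value = cohens_d([2, 4, 6], [1, 3, 5])  # illustrative groups
print(d_value)
```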
.cronbachs_alpha(): Measures the internal consistency or reliability of a set of scale items (e.g., a questionnaire).
# Load a sample questionnaire dataset from pingouin
import pingouin as pg
q_data = pg.read_dataset('cronbach_alpha')
q_report = rostaing_report(q_data)
item_cols = q_data.columns.tolist()
alpha_results = q_report.cronbachs_alpha(*item_cols)
print(pd.Series(alpha_results))
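Cronbach's alpha compares the sum of the individual item variances with the variance of each respondent's total score. A minimal sketch of the standard formula, using made-up scores for three items and four respondents:

```python
import statistics


def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    k = len(items)
    item_variance_sum = sum(statistics.variance(scores) for scores in items)
    totals = [sum(responses) for responses in zip(*items)]  # per respondent
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))


alpha_value = cronbach_alpha([
    [1, 2, 3, 4],  # item 1 (illustrative scores)
    [2, 3, 4, 5],  # item 2
    [1, 3, 3, 5],  # item 3
])
print(round(alpha_value, 4))
```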
.cohens_kappa() and .fleiss_kappa(): Measure inter-rater agreement for categorical items.
# Create synthetic rater data
rater_df = pd.DataFrame({'rater1': ['A','B','C','A','B'], 'rater2': ['A','B','C','B','B']})
rater_report = rostaing_report(rater_df)
kappa_results = rater_report.cohens_kappa('rater1', 'rater2')
print(pd.Series(kappa_results))
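Cohen's kappa corrects the observed agreement between two raters for the agreement expected by chance from each rater's label frequencies. The sketch below works through the standard formula by hand on the same two rater columns; the helper name `kappa_by_hand` is illustrative.

```python
from collections import Counter


def kappa_by_hand(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of labels."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per label
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


kappa_value = kappa_by_hand(['A', 'B', 'C', 'A', 'B'], ['A', 'B', 'C', 'B', 'B'])
print(round(kappa_value, 4))
```

Here the raters agree on 4 of 5 items (0.8) and chance agreement is 9/25 (0.36), which gives a kappa of 0.6875.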
Contributions are welcome! If you have ideas for new features, find a bug, or want to improve the documentation, please feel free to open an issue or submit a pull request on the project's repository.
This project is licensed under the MIT License. See the LICENSE file for details.
FAQs
We found that rostaing-report demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.