@winm2m/inferential-stats-js


A headless JavaScript SDK for advanced statistical analysis in the browser using WebAssembly (Pyodide). Performs SPSS-level inferential statistics entirely client-side with no backend required.

Version
1.0.0


Architecture Overview

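@winm2m/inferential-stats-js runs entirely in the browser: there is no backend server, and your data never leaves the client. The only network traffic is the download of the Pyodide runtime and Python packages (from the jsDelivr CDN by default).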

┌─────────────────────────────────────────────────────────┐
│  Main Thread                                            │
│  ┌───────────────────────┐     postMessage()            │
│  │  InferentialStats SDK │ ──── ArrayBuffer ──────┐     │
│  │  (ESM / CJS)          │     (Transferable)     │     │
│  └───────────────────────┘                        ▼     │
│                                 ┌─────────────────────┐ │
│                                 │  Web Worker         │ │
│                                 │  ┌────────────────┐ │ │
│                                 │  │  Pyodide WASM  │ │ │
│                                 │  │  ┌───────────┐ │ │ │
│                                 │  │  │  Python   │ │ │ │
│                                 │  │  │  Runtime  │ │ │ │
│                                 │  │  └───────────┘ │ │ │
│                                 │  └────────────────┘ │ │
│                                 └─────────────────────┘ │
└─────────────────────────────────────────────────────────┘

Key Design Principles

| Principle | Description |
| --- | --- |
| 100% Client-Side | Statistical computation runs entirely in-browser via WebAssembly. No network requests to any analytics server. |
| Web Worker Isolation | All heavy computation is offloaded to a dedicated Web Worker, keeping the main thread responsive and the UI jank-free. |
| ArrayBuffer / TypedArray Transfer | Data is serialized into a columnar binary format (Float64Array, Int32Array, dictionary-encoded strings) and transferred to the worker using the Transferable Objects API for near-zero-copy performance. |
| Pyodide WASM Runtime | The worker loads Pyodide (a full CPython interpreter compiled to WebAssembly) along with pandas, SciPy, statsmodels, scikit-learn, and factor_analyzer. |
| Progress Events | Initialization and computation stages emit CustomEvent progress events on a configurable EventTarget, enabling real-time progress bars. |
| Dual Module Format | Ships as both ESM (dist/index.js) and CommonJS (dist/index.cjs) with full TypeScript declarations. |
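
The columnar transfer principle can be illustrated with a small sketch. The helper names `encodeNumericColumn` and `encodeStringColumn` are hypothetical and not part of the SDK, and the SDK's actual wire format may differ; the sketch only shows what "Float64Array plus dictionary-encoded strings" can look like.

```typescript
// Hypothetical helpers illustrating columnar encoding; not the SDK's actual wire format.
type Row = Record<string, string | number>;

interface DictionaryColumn {
  codes: Int32Array;    // one integer code per row
  dictionary: string[]; // code -> original string
}

// Pack a numeric column into a Float64Array (non-numeric values become NaN).
function encodeNumericColumn(rows: Row[], key: string): Float64Array {
  return Float64Array.from(rows, (row) => {
    const value = row[key];
    return typeof value === "number" ? value : Number.NaN;
  });
}

// Dictionary-encode a string column: each distinct string gets a small integer code.
function encodeStringColumn(rows: Row[], key: string): DictionaryColumn {
  const dictionary: string[] = [];
  const lookup = new Map<string, number>();
  const codes = Int32Array.from(rows, (row) => {
    const label = String(row[key]);
    let code = lookup.get(label);
    if (code === undefined) {
      code = dictionary.length;
      dictionary.push(label);
      lookup.set(label, code);
    }
    return code;
  });
  return { codes, dictionary };
}

const sampleRows: Row[] = [
  { group: "A", score: 85 },
  { group: "A", score: 90 },
  { group: "B", score: 78 },
];
const scoreColumn = encodeNumericColumn(sampleRows, "score");
const groupColumn = encodeStringColumn(sampleRows, "group");
// In a browser, the underlying buffers could then go in the transfer list:
// worker.postMessage({ scoreColumn, groupColumn }, [scoreColumn.buffer, groupColumn.codes.buffer]);
```

Transferring the buffers moves ownership to the worker instead of copying, which is why typed arrays are preferred over arrays of row objects for large datasets.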

Core Analysis Features — Mathematical & Technical Documentation

This section documents the mathematical foundations and internal Python implementations of all 16 analyses.

Note on math rendering: In the published README, equations are rendered as images so they display correctly on npm.

① Descriptive Statistics

Frequencies

Computes a frequency distribution for a categorical variable, including absolute counts, relative percentages, and cumulative percentages.

Python implementation: pandas.Series.value_counts(normalize=True)

Relative frequency:

$$f_i = \frac{n_i}{N}$$

where $n_i$ is the count of category $i$ and $N$ is the total number of observations. Cumulative percentage is the running sum of $f_i$, expressed as a percentage.
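
A minimal TypeScript sketch of the relative-frequency definition, for illustration only. The SDK computes this in Python via pandas; the function name `frequencyTable` and the sample values are assumptions.

```typescript
// Illustrative re-implementation of a frequency table; the SDK itself uses pandas.
interface FrequencyRow {
  category: string;
  count: number;
  percent: number;           // relative frequency as a percentage
  cumulativePercent: number; // running sum of percent
}

function frequencyTable(values: string[]): FrequencyRow[] {
  const counts = new Map<string, number>();
  for (const value of values) {
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  const total = values.length;
  let running = 0;
  // Sort by descending count, which mirrors the default ordering of value_counts.
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([category, count]) => {
      const percent = (count / total) * 100;
      running += percent;
      return { category, count, percent, cumulativePercent: running };
    });
}

const musicTable = frequencyTable(["Rock", "Jazz", "Rock", "Pop", "Rock", "Jazz", "Rock", "Pop"]);
// Rock: 4 (50%), Jazz: 2 (25%), Pop: 2 (25%); cumulative 50, 75, 100
```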

Descriptives

Produces summary statistics for one or more numeric variables: count, mean, standard deviation, min, max, quartiles (Q1, Q2, Q3), skewness, and kurtosis.

Python implementation: pandas.DataFrame.describe(), scipy.stats.skew, scipy.stats.kurtosis

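Arithmetic mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Sample standard deviation (Bessel-corrected):

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Skewness (Fisher):

$$g_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{3/2}}$$

Excess kurtosis (Fisher):

$$g_2 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]^{2}} - 3$$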
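
The four summary statistics above can be checked by hand with a short sketch. It assumes the moment-based (uncorrected) Fisher definitions, which match the defaults of scipy.stats.skew and scipy.stats.kurtosis; whether the SDK applies a bias correction is not stated here. `describeSample` is a hypothetical name.

```typescript
// k-th central moment with a 1/n divisor.
function centralMoment(xs: number[], mean: number, order: number): number {
  return xs.reduce((acc, x) => acc + (x - mean) ** order, 0) / xs.length;
}

function describeSample(xs: number[]) {
  const n = xs.length;
  const mean = xs.reduce((acc, x) => acc + x, 0) / n;
  const sumSq = xs.reduce((acc, x) => acc + (x - mean) ** 2, 0);
  const std = Math.sqrt(sumSq / (n - 1)); // Bessel-corrected
  const m2 = centralMoment(xs, mean, 2);
  const skewness = centralMoment(xs, mean, 3) / m2 ** 1.5;
  const kurtosis = centralMoment(xs, mean, 4) / m2 ** 2 - 3; // excess kurtosis
  return { n, mean, std, skewness, kurtosis };
}

const sampleSummary = describeSample([2, 4, 4, 4, 5, 5, 7, 9]);
// mean = 5, std = sqrt(32 / 7), skewness = 0.65625, kurtosis = -0.21875
```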

Crosstabs

Cross-tabulates two categorical variables and tests for independence using Pearson's Chi-square test. Reports observed and expected counts, row/column/total percentages, and Cramér's V as an effect-size measure.

Python implementation: pandas.crosstab, scipy.stats.chi2_contingency

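Pearson's Chi-square statistic:

$$\chi^2 = \sum_{i}\sum_{j}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ is the observed frequency in cell ($i, j$) and $E_{ij}$ is the expected frequency under independence (row $i$ total times column $j$ total, divided by $N$).

Cramér's V:

$$V = \sqrt{\frac{\chi^2}{N\,(k - 1)}}$$

where $k = \min(r, c)$ for a table with $r$ rows and $c$ columns, and $N$ is the total number of observations.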
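
As a worked example, the sketch below computes the Chi-square statistic and Cramér's V for a small 2×3 table of made-up counts. It applies no continuity correction; note that scipy.stats.chi2_contingency applies Yates' correction to 2×2 tables by default, so results for 2×2 input may differ. `chiSquare` is a hypothetical name, not an SDK function.

```typescript
function chiSquare(observed: number[][]) {
  const rowTotals = observed.map((row) => row.reduce((a, b) => a + b, 0));
  const colCount = Math.max(...observed.map((row) => row.length));
  const colTotals = Array.from({ length: colCount }, (_, j) =>
    observed.reduce((acc, row) => acc + (row[j] ?? 0), 0),
  );
  const n = rowTotals.reduce((a, b) => a + b, 0);

  let chi2 = 0;
  observed.forEach((row, i) => {
    row.forEach((o, j) => {
      // Expected count under independence: row total * column total / N
      const expected = ((rowTotals[i] ?? 0) * (colTotals[j] ?? 0)) / n;
      chi2 += (o - expected) ** 2 / expected;
    });
  });

  const k = Math.min(observed.length, colCount);
  const cramersV = Math.sqrt(chi2 / (n * (k - 1)));
  const df = (observed.length - 1) * (colCount - 1);
  return { chi2, df, cramersV };
}

const crosstabResult = chiSquare([
  [10, 20, 30],
  [20, 20, 20],
]);
// Expected counts are 15, 20, 25 in each row, so chi2 = 16/3 with df = 2.
```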

② Compare Means

Independent-Samples T-Test

Compares the means of a numeric variable between two independent groups. Automatically reports results for both equal-variance and unequal-variance (Welch's) assumptions. Includes Levene's test for equality of variances.

Python implementation: scipy.stats.ttest_ind, scipy.stats.levene

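T-statistic (equal variance assumed):

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Pooled standard deviation:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Degrees of freedom: $df = n_1 + n_2 - 2$

When Levene's test is significant ($p < 0.05$), Welch's t-test is recommended, which uses the Welch–Satterthwaite approximation for degrees of freedom.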
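
The equal-variance case can be sketched in a few lines. This is an illustration of the pooled formula only (no p-value, no Welch variant, no Levene's test); `pooledTTest` and the sample values are assumptions.

```typescript
function groupStats(xs: number[]) {
  const n = xs.length;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (n - 1);
  return { n, mean, variance };
}

function pooledTTest(group1: number[], group2: number[]) {
  const a = groupStats(group1);
  const b = groupStats(group2);
  const df = a.n + b.n - 2;
  // Pooled standard deviation weights each group's variance by its degrees of freedom.
  const pooledSd = Math.sqrt(((a.n - 1) * a.variance + (b.n - 1) * b.variance) / df);
  const t = (a.mean - b.mean) / (pooledSd * Math.sqrt(1 / a.n + 1 / b.n));
  return { t, df, pooledSd };
}

const independentResult = pooledTTest([85, 90, 88], [78, 82, 80]);
// df = 4, pooled variance = 31/6, t = 23 / sqrt(31)
```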

Paired-Samples T-Test

Tests whether the mean difference between two paired measurements is significantly different from zero.

Python implementation: scipy.stats.ttest_rel

T-statistic:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

where $\bar{d}$ is the mean difference and $s_d$ is the standard deviation of the differences.

Degrees of freedom: $df = n - 1$
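
A short sketch of the paired statistic, using made-up before/after scores. `pairedTTest` is a hypothetical helper, not the SDK's `ttestPaired` method, and it returns only the statistic and degrees of freedom.

```typescript
function pairedTTest(before: number[], after: number[]) {
  if (before.length !== after.length) {
    throw new Error("Paired samples must have the same length");
  }
  const diffs = before.map((value, i) => value - (after[i] ?? Number.NaN));
  const n = diffs.length;
  const meanDiff = diffs.reduce((a, b) => a + b, 0) / n;
  const sdDiff = Math.sqrt(diffs.reduce((a, d) => a + (d - meanDiff) ** 2, 0) / (n - 1));
  return { t: meanDiff / (sdDiff / Math.sqrt(n)), df: n - 1, meanDiff, sdDiff };
}

const pairedResult = pairedTTest([5, 7, 6, 9], [3, 6, 4, 6]);
// Differences are 2, 1, 2, 3: mean 2, sd sqrt(2/3), so t = sqrt(24) with df = 3.
```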

One-Way ANOVA

Tests whether the means of a numeric variable differ significantly across three or more groups.

Python implementation: scipy.stats.f_oneway

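F-statistic:

$$F = \frac{MS_B}{MS_W}$$

Sum of Squares Between Groups:

$$SS_B = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^2$$

Sum of Squares Within Groups:

$$SS_W = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2$$

Mean Squares:

$$MS_B = \frac{SS_B}{k - 1}, \qquad MS_W = \frac{SS_W}{N - k}$$

Effect size (Eta-squared):

$$\eta^2 = \frac{SS_B}{SS_B + SS_W}$$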
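
The ANOVA decomposition can be traced with a small example. The sketch below is an illustration under the definitions above, not the SDK's implementation (which calls scipy.stats.f_oneway); `oneWayAnova` and the data are assumptions.

```typescript
const average = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

function oneWayAnova(groups: number[][]) {
  const all = groups.reduce<number[]>((acc, g) => acc.concat(g), []);
  const grandMean = average(all);
  const k = groups.length;
  const n = all.length;

  let ssBetween = 0;
  let ssWithin = 0;
  for (const g of groups) {
    const groupMean = average(g);
    ssBetween += g.length * (groupMean - grandMean) ** 2;
    ssWithin += g.reduce((acc, x) => acc + (x - groupMean) ** 2, 0);
  }

  const msBetween = ssBetween / (k - 1);
  const msWithin = ssWithin / (n - k);
  return {
    f: msBetween / msWithin,
    dfBetween: k - 1,
    dfWithin: n - k,
    etaSquared: ssBetween / (ssBetween + ssWithin),
  };
}

const anovaSketch = oneWayAnova([[1, 2, 3], [2, 3, 4], [5, 6, 7]]);
// SS_B = 26, SS_W = 6, so F = 13 / 1 = 13 and eta-squared = 26 / 32 = 0.8125
```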

Post-hoc Tukey HSD

Performs pairwise comparisons of group means following a significant ANOVA result using the Studentized Range distribution.

Python implementation: statsmodels.stats.multicomp.pairwise_tukeyhsd

Studentized range statistic:

$$q = \frac{\lvert\bar{x}_i - \bar{x}_j\rvert}{\sqrt{MS_W / n_h}}$$

where $MS_W$ is the within-group mean square from the ANOVA and $n_h$ is the harmonic mean of group sizes. The critical $q$ value is obtained from the Studentized Range distribution with $k$ groups and $N - k$ degrees of freedom.

③ Regression

Linear Regression (OLS)

Fits an Ordinary Least Squares regression model with one or more independent variables. Reports regression coefficients, standard errors, t-statistics, p-values, confidence intervals, $R^2$, adjusted $R^2$, F-test, and the Durbin-Watson statistic for autocorrelation detection.

Python implementation: statsmodels.api.OLS

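Model:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$.

OLS estimator:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y}$$

Coefficient of determination:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

where $SS_{res} = \sum_{i}(y_i - \hat{y}_i)^2$ and $SS_{tot} = \sum_{i}(y_i - \bar{y})^2$.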
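
For a single predictor the OLS estimator reduces to a closed form (slope = covariance of x and y divided by variance of x), which makes the coefficient of determination easy to verify by hand. The sketch below covers only that special case; `simpleOls` and the data are assumptions, and the SDK itself fits the general model with statsmodels.

```typescript
function simpleOls(x: number[], y: number[]) {
  const n = x.length;
  const meanX = x.reduce((a, b) => a + b, 0) / n;
  const meanY = y.reduce((a, b) => a + b, 0) / n;

  let sxy = 0;
  let sxx = 0;
  x.forEach((xi, i) => {
    const yi = y[i] ?? Number.NaN;
    sxy += (xi - meanX) * (yi - meanY);
    sxx += (xi - meanX) ** 2;
  });
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;

  let ssRes = 0; // residual sum of squares
  let ssTot = 0; // total sum of squares
  x.forEach((xi, i) => {
    const yi = y[i] ?? Number.NaN;
    ssRes += (yi - (intercept + slope * xi)) ** 2;
    ssTot += (yi - meanY) ** 2;
  });
  return { intercept, slope, rSquared: 1 - ssRes / ssTot };
}

const olsSketch = simpleOls([1, 2, 3, 4], [2, 4, 5, 9]);
// slope = 11 / 5 = 2.2, intercept = -0.5, R^2 = 1 - 1.8 / 26
```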

Binary Logistic Regression

Models the probability of a binary outcome as a function of one or more independent variables. Reports coefficients (log-odds), odds ratios, z-statistics, p-values, pseudo-$R^2$, AIC, and BIC.

Python implementation: statsmodels.discrete.discrete_model.Logit

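Logit link function:

$$\operatorname{logit}(p) = \ln\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m$$

Predicted probability:

$$P(Y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m)}}$$

Coefficients are estimated by Maximum Likelihood Estimation (MLE). The odds ratio for predictor j is $e^{\beta_j}$.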
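
Once coefficients are estimated, turning them into probabilities and odds ratios is simple arithmetic. The sketch below uses made-up coefficients purely for illustration; it does not perform the MLE fit, and the function names are hypothetical.

```typescript
// Inverse logit applied to the linear predictor.
function predictProbability(intercept: number, coefficients: number[], x: number[]): number {
  const linear = coefficients.reduce((acc, beta, j) => acc + beta * (x[j] ?? 0), intercept);
  return 1 / (1 + Math.exp(-linear));
}

// Odds ratio for each predictor: exp(beta_j).
function oddsRatios(coefficients: number[]): number[] {
  return coefficients.map((beta) => Math.exp(beta));
}

// Made-up coefficients: intercept -1, beta_1 = 0.5, beta_2 = 0
const exampleCoefficients = [0.5, 0];
const exampleProbability = predictProbability(-1, exampleCoefficients, [2, 10]);
const exampleOddsRatios = oddsRatios(exampleCoefficients);
// Linear predictor = -1 + 0.5 * 2 + 0 * 10 = 0, so the probability is 0.5.
// A coefficient of 0 gives an odds ratio of exactly 1 (no effect).
```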

Multinomial Logistic Regression

Extends binary logistic regression to outcomes with more than two unordered categories. One category is designated as the reference; the model estimates log-odds of each other category relative to the reference.

Python implementation: sklearn.linear_model.LogisticRegression(multi_class='multinomial')

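Log-odds relative to reference category $K$:

$$\ln\frac{P(Y = k \mid \mathbf{x})}{P(Y = K \mid \mathbf{x})} = \beta_{k0} + \boldsymbol{\beta}_k^\top\mathbf{x}$$

for each category $k \neq K$.

Predicted probability via softmax (with the reference category's coefficients fixed at zero):

$$P(Y = k \mid \mathbf{x}) = \frac{e^{\beta_{k0} + \boldsymbol{\beta}_k^\top\mathbf{x}}}{\sum_{l=1}^{K} e^{\beta_{l0} + \boldsymbol{\beta}_l^\top\mathbf{x}}}$$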
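
The relationship between the log-odds and the softmax probabilities can be checked numerically. In this sketch the linear predictors are made-up values and the reference category is given a linear predictor of 0 (an assumption that matches the reference-category formulation); `softmaxProbabilities` is a hypothetical name.

```typescript
function softmaxProbabilities(linearPredictors: number[]): number[] {
  const maxValue = Math.max(...linearPredictors); // subtract the max for numerical stability
  const exps = linearPredictors.map((z) => Math.exp(z - maxValue));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Two non-reference categories plus the reference category (linear predictor 0).
const categoryProbabilities = softmaxProbabilities([Math.log(2), Math.log(3), 0]);
// Probabilities are 1/3, 1/2, 1/6. The log of each probability divided by the
// reference probability recovers the original linear predictor.
```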

④ Classify

K-Means Clustering

Partitions observations into $k$ clusters by iteratively assigning points to the nearest centroid and updating centroids until convergence.

Python implementation: sklearn.cluster.KMeans

Objective function (inertia):

$$J = \sum_{j=1}^{k}\sum_{\mathbf{x}_i \in C_j}\lVert\mathbf{x}_i - \boldsymbol{\mu}_j\rVert^2$$

where $C_j$ is the set of observations in cluster j and $\boldsymbol{\mu}_j$ is the centroid. The algorithm minimizes J using Lloyd's algorithm (Expectation-Maximization style).
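
One iteration of Lloyd's algorithm (assign each point to its nearest centroid, then recompute centroids) can be sketched as follows. This is an illustration with made-up points and starting centroids, not the scikit-learn implementation the SDK uses; a full run would repeat `lloydStep` until the labels stop changing.

```typescript
type Point = number[];

function squaredDistance(a: Point, b: Point): number {
  return a.reduce((acc, ai, d) => acc + (ai - (b[d] ?? 0)) ** 2, 0);
}

// Index of the nearest centroid for a point.
function nearestCentroid(point: Point, centroids: Point[]): number {
  let best = 0;
  let bestDist = Number.POSITIVE_INFINITY;
  centroids.forEach((c, j) => {
    const dist = squaredDistance(point, c);
    if (dist < bestDist) {
      bestDist = dist;
      best = j;
    }
  });
  return best;
}

function lloydStep(points: Point[], centroids: Point[]) {
  // Assignment step
  const labels = points.map((p) => nearestCentroid(p, centroids));
  // Update step: each centroid becomes the mean of its members
  const updated = centroids.map((old, j) => {
    const members = points.filter((_, i) => labels[i] === j);
    if (members.length === 0) return old; // leave empty clusters where they are
    return old.map((_, d) => members.reduce((acc, m) => acc + (m[d] ?? 0), 0) / members.length);
  });
  // Inertia: sum of squared distances to the assigned (updated) centroid
  const inertia = points.reduce(
    (acc, p, i) => acc + squaredDistance(p, updated[labels[i] ?? 0] ?? p),
    0,
  );
  return { labels, centroids: updated, inertia };
}

const kmeansStep = lloydStep([[0, 0], [0, 2], [10, 0], [10, 2]], [[1, 1], [9, 1]]);
// labels = [0, 0, 1, 1], centroids = [[0, 1], [10, 1]], inertia = 4
```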

Hierarchical (Agglomerative) Clustering

Builds a hierarchy of clusters using a bottom-up approach. Supports Ward, complete, average, and single linkage methods. Returns a full linkage matrix and dendrogram data for visualization.

Python implementation: scipy.cluster.hierarchy.linkage, scipy.cluster.hierarchy.fcluster

Ward's minimum variance method (default):

$$\Delta(A, B) = \frac{n_A n_B}{n_A + n_B}\lVert\boldsymbol{\mu}_A - \boldsymbol{\mu}_B\rVert^2$$

At each step, the pair of clusters (A, B) that produces the smallest increase in total within-cluster variance is merged. Ward's method tends to produce compact, equally sized clusters.

⑤ Dimension Reduction

Exploratory Factor Analysis (EFA)

Discovers latent factors underlying a set of observed variables. Supports varimax, promax, oblimin, and no rotation. Reports factor loadings, communalities, eigenvalues, KMO measure of sampling adequacy, and Bartlett's test of sphericity.

Python implementation: factor_analyzer.FactorAnalyzer(rotation='varimax') — installed at runtime via micropip

Factor model:

$$\mathbf{x} = \boldsymbol{\Lambda}\mathbf{f} + \boldsymbol{\varepsilon}$$

where $\mathbf{x}$ is the observed variable vector, $\boldsymbol{\Lambda}$ is the matrix of factor loadings, $\mathbf{f}$ is the vector of latent factors, and $\boldsymbol{\varepsilon}$ is the unique variance.

Kaiser-Meyer-Olkin (KMO) measure:

$$KMO = \frac{\sum_{i \neq j} r_{ij}^2}{\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} p_{ij}^2}$$

where $r_{ij}$ are elements of the correlation matrix and $p_{ij}$ are elements of the partial correlation matrix. KMO values above 0.6 are generally considered acceptable for factor analysis.

Principal Component Analysis (PCA)

Finds orthogonal components that maximize variance in the data. Reports component loadings, explained variance, cumulative variance ratios, and singular values. Optionally standardizes the input.

Python implementation: sklearn.decomposition.PCA

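Objective: Find the weight vector $\mathbf{w}$ that maximizes projected variance:

$$\mathbf{w}_1 = \arg\max_{\lVert\mathbf{w}\rVert = 1} \mathbf{w}^\top\boldsymbol{\Sigma}\,\mathbf{w}$$

This is equivalent to finding the eigenvectors of the covariance matrix $\boldsymbol{\Sigma}$. The eigenvalues $\lambda_i$ represent the variance explained by each component.

Explained variance ratio:

$$\text{EVR}_i = \frac{\lambda_i}{\sum_{j=1}^{p}\lambda_j}$$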

Multidimensional Scaling (MDS)

Projects high-dimensional data into a lower-dimensional space (typically 2D) while preserving pairwise distances. Supports both metric and non-metric MDS.

Python implementation: sklearn.manifold.MDS

Stress function (Kruskal's Stress-1):

$$\text{Stress-1} = \sqrt{\frac{\sum_{i<j}(d_{ij} - \delta_{ij})^2}{\sum_{i<j} d_{ij}^2}}$$

where $d_{ij}$ is the distance in the reduced space and $\delta_{ij}$ is the original distance (or a monotonic transformation for non-metric MDS). A stress value below 0.1 is generally considered a good fit.
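
Stress-1 can be evaluated directly from two sets of coordinates. The sketch below covers the metric case with Euclidean distances and made-up points; `stress1` is a hypothetical helper, and the stress value reported by scikit-learn may be normalized differently depending on the version and options.

```typescript
function euclideanDistance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((acc, ai, d) => acc + (ai - (b[d] ?? 0)) ** 2, 0));
}

// original: points in the input space; embedded: their low-dimensional coordinates
function stress1(original: number[][], embedded: number[][]): number {
  let numerator = 0;
  let denominator = 0;
  for (let i = 0; i < original.length; i++) {
    for (let j = i + 1; j < original.length; j++) {
      const delta = euclideanDistance(original[i] ?? [], original[j] ?? []); // original distance
      const d = euclideanDistance(embedded[i] ?? [], embedded[j] ?? []);     // embedded distance
      numerator += (d - delta) ** 2;
      denominator += d ** 2;
    }
  }
  return Math.sqrt(numerator / denominator);
}

// Three collinear 2D points with pairwise distances 5, 10, 5.
const collinearPoints = [[0, 0], [3, 4], [6, 8]];
const perfectStress = stress1(collinearPoints, [[0], [5], [10]]); // distances preserved: 0
const shrunkStress = stress1(collinearPoints, [[0], [4], [8]]);   // sqrt(6 / 96) = 0.25
```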

⑥ Scale

Cronbach's Alpha

Measures the internal consistency (reliability) of a set of scale items. Reports raw alpha, standardized alpha, item-total correlations, and alpha-if-item-deleted for diagnostic purposes.

Python implementation: Custom implementation using pandas covariance matrix operations

Cronbach's alpha (raw):

$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_T^2}\right)$$

where $k$ is the number of items, $\sigma_i^2$ is the variance of item i, and $\sigma_T^2$ is the variance of the total score.

Standardized alpha (based on mean inter-item correlation):

$$\alpha_{std} = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}}$$

where $\bar{r}$ is the mean of all pairwise Pearson correlations among items.

| Alpha Range | Interpretation |
| --- | --- |
| ≥ 0.9 | Excellent |
| 0.8 – 0.9 | Good |
| 0.7 – 0.8 | Acceptable |
| 0.6 – 0.7 | Questionable |
| < 0.6 | Poor |
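
Both alpha variants can be reproduced from raw item scores. The sketch below is an illustration with two made-up items and four respondents; the function names are hypothetical and the SDK's own implementation runs in Python on the pandas covariance matrix.

```typescript
function sampleVariance(xs: number[]): number {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  return xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (xs.length - 1);
}

function pearson(x: number[], y: number[]): number {
  const meanX = x.reduce((a, b) => a + b, 0) / x.length;
  const meanY = y.reduce((a, b) => a + b, 0) / y.length;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  x.forEach((xi, i) => {
    const yi = y[i] ?? Number.NaN;
    sxy += (xi - meanX) * (yi - meanY);
    sxx += (xi - meanX) ** 2;
    syy += (yi - meanY) ** 2;
  });
  return sxy / Math.sqrt(sxx * syy);
}

// items[i] holds the responses to item i, one value per respondent.
function cronbachAlpha(items: number[][]) {
  const k = items.length;
  const respondents = items[0]?.length ?? 0;
  const totals = Array.from({ length: respondents }, (_, r) =>
    items.reduce((acc, item) => acc + (item[r] ?? 0), 0),
  );
  const sumItemVariances = items.reduce((acc, item) => acc + sampleVariance(item), 0);
  const raw = (k / (k - 1)) * (1 - sumItemVariances / sampleVariance(totals));

  // Mean of all pairwise inter-item correlations
  let sumR = 0;
  let pairs = 0;
  for (let i = 0; i < k; i++) {
    for (let j = i + 1; j < k; j++) {
      sumR += pearson(items[i] ?? [], items[j] ?? []);
      pairs++;
    }
  }
  const meanR = sumR / pairs;
  const standardized = (k * meanR) / (1 + (k - 1) * meanR);
  return { raw, standardized, meanR };
}

const alphaSketch = cronbachAlpha([
  [1, 2, 3, 4],
  [1, 3, 2, 4],
]);
// Item variances are both 5/3, total-score variance is 6, r = 0.8.
// Raw and standardized alpha coincide at 8/9 because the item variances are equal.
```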

Installation

npm install @winm2m/inferential-stats-js

Peer dependency (optional): If you want explicit control over the Pyodide version, install pyodide (>= 0.26.0) as a peer dependency. Otherwise the SDK loads Pyodide from the jsDelivr CDN automatically.

Quick Start

import { InferentialStats, PROGRESS_EVENT_NAME } from '@winm2m/inferential-stats-js';

// 1. Listen for initialization progress
window.addEventListener(PROGRESS_EVENT_NAME, (e: Event) => {
  const { stage, progress, message } = (e as CustomEvent).detail;
  console.log(`[${stage}] ${progress}% — ${message}`);
});

// 2. Create an instance (pass the URL to the bundled worker)
const stats = new InferentialStats({
  workerUrl: new URL('@winm2m/inferential-stats-js/worker', import.meta.url).href,
});

// 3. Initialize (loads Pyodide + Python packages inside the worker)
await stats.init();

// 4. Prepare your data
const data = [
  { group: 'A', score: 85 },
  { group: 'A', score: 90 },
  { group: 'B', score: 78 },
  { group: 'B', score: 82 },
  // ... more rows
];

// 5. Run an analysis
const result = await stats.anovaOneway({
  data,
  variable: 'score',
  groupVariable: 'group',
});

console.log(result);
// {
//   success: true,
//   data: { fStatistic: ..., pValue: ..., groupStats: [...], ... },
//   executionTimeMs: 42
// }

// 6. Clean up when done
stats.destroy();

CDN / CodePen Usage

You can use the SDK directly in a browser or CodePen with no build step. The full demo code is identical to the local page below (except for CDN import paths).

API Reference

All analysis methods are async and return Promise<AnalysisResult<T>>:

interface AnalysisResult<T> {
  success: boolean;
  data: T;
  error?: string;
  executionTimeMs: number;
}
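
Because every method resolves to this wrapper, callers typically check `success` before reading `data`. A minimal sketch of one way to do that follows; `unwrap` is a hypothetical helper and not part of the SDK, and the interface is repeated only to keep the example self-contained.

```typescript
interface AnalysisResult<T> {
  success: boolean;
  data: T;
  error?: string;
  executionTimeMs: number;
}

// Hypothetical helper (not part of the SDK): return the payload or throw on failure.
function unwrap<T>(result: AnalysisResult<T>): T {
  if (!result.success) {
    throw new Error(result.error ?? "Analysis failed without an error message");
  }
  return result.data;
}

// Stand-in result objects; in real code these would come from e.g. stats.anovaOneway(...).
const okPayload = unwrap({ success: true, data: { fStatistic: 13 }, executionTimeMs: 42 });

let failureMessage = "";
try {
  unwrap({ success: false, data: null, error: "Column not found", executionTimeMs: 1 });
} catch (err) {
  failureMessage = err instanceof Error ? err.message : String(err);
}
```

Whether failed analyses resolve with `success: false` or reject the promise is not specified here, so wrapping calls in `try`/`catch` as well is a reasonable precaution.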

Lifecycle Methods

| Method | Description |
| --- | --- |
| new InferentialStats(config) | Create an instance. config.workerUrl is required. Optional: config.pyodideUrl, config.eventTarget. |
| init(): Promise<void> | Load Pyodide and install Python packages inside the Web Worker. |
| isInitialized(): boolean | Returns true if the worker is ready. |
| destroy(): void | Terminate the Web Worker and release resources. |

Analysis Methods (16 total)

Descriptive Statistics

| # | Method | Input → Output | Description |
| --- | --- | --- | --- |
| 1 | frequencies(input) | FrequenciesInput → FrequenciesOutput | Frequency distribution and relative percentages for a categorical variable. |
| 2 | descriptives(input) | DescriptivesInput → DescriptivesOutput | Summary statistics (mean, std, min, max, quartiles, skewness, kurtosis) for numeric variables. |
| 3 | crosstabs(input) | CrosstabsInput → CrosstabsOutput | Cross-tabulation with observed/expected counts, Chi-square test, and Cramér's V. |

Compare Means

| # | Method | Input → Output | Description |
| --- | --- | --- | --- |
| 4 | ttestIndependent(input) | TTestIndependentInput → TTestIndependentOutput | Independent-samples t-test with Levene's equality-of-variances test. |
| 5 | ttestPaired(input) | TTestPairedInput → TTestPairedOutput | Paired-samples t-test for dependent observations. |
| 6 | anovaOneway(input) | AnovaInput → AnovaOutput | One-way ANOVA with group descriptives and eta-squared effect size. |
| 7 | posthocTukey(input) | PostHocInput → PostHocOutput | Post-hoc Tukey HSD pairwise comparisons following ANOVA. |

Regression

| # | Method | Input → Output | Description |
| --- | --- | --- | --- |
| 8 | linearRegression(input) | LinearRegressionInput → LinearRegressionOutput | OLS linear regression with coefficients, R², F-test, and Durbin-Watson statistic. |
| 9 | logisticBinary(input) | LogisticBinaryInput → LogisticBinaryOutput | Binary logistic regression with odds ratios, pseudo-R², and model fit statistics. |
| 10 | logisticMultinomial(input) | MultinomialLogisticInput → MultinomialLogisticOutput | Multinomial logistic regression with per-category coefficients and odds ratios. |

Classify

| # | Method | Input → Output | Description |
| --- | --- | --- | --- |
| 11 | kmeans(input) | KMeansInput → KMeansOutput | K-Means clustering with cluster centers, labels, and inertia. |
| 12 | hierarchicalCluster(input) | HierarchicalClusterInput → HierarchicalClusterOutput | Agglomerative hierarchical clustering with linkage matrix and dendrogram data. |

Dimension Reduction

| # | Method | Input → Output | Description |
| --- | --- | --- | --- |
| 13 | efa(input) | EFAInput → EFAOutput | Exploratory Factor Analysis with rotation, KMO, and Bartlett's test. |
| 14 | pca(input) | PCAInput → PCAOutput | Principal Component Analysis with loadings and explained variance. |
| 15 | mds(input) | MDSInput → MDSOutput | Multidimensional Scaling with stress value and coordinate output. |

Scale

| # | Method | Input → Output | Description |
| --- | --- | --- | --- |
| 16 | cronbachAlpha(input) | CronbachAlphaInput → CronbachAlphaOutput | Reliability analysis with Cronbach's alpha, item-total correlations, and alpha-if-deleted. |

Sample Data

The repository includes a ready-to-use sample dataset at docs/sample-survey-data.json, also hosted on GitHub Pages at:

https://winm2m.github.io/inferential-stats-js/sample-survey-data.json

This dataset contains 2,000 rows of simulated survey data generated with a seeded pseudo-random number generator for full reproducibility.

Schema

| Column | Type | Description |
| --- | --- | --- |
| id | integer | Unique respondent ID (1–2000) |
| gender | string | "Male", "Female", or "Other" |
| age_group | string | "20s", "30s", "40s", "50s", "60s" |
| nationality | string | One of several country labels |
| favorite_music | string | Preferred music genre |
| favorite_movie | string | Preferred movie genre |
| favorite_art | string | Preferred art form |
| music_satisfaction | integer (1–5) | Satisfaction with music offerings (Likert scale) |
| movie_satisfaction | integer (1–5) | Satisfaction with movie offerings (Likert scale) |
| art_satisfaction | integer (1–5) | Satisfaction with art offerings (Likert scale) |
| weekly_hours_music | float | Weekly hours spent on music |
| weekly_hours_movie | float | Weekly hours spent on movies |
| monthly_art_visits | integer | Number of art gallery visits per month |

This dataset is suitable for exercising every analysis method in the SDK.

Progress Event Handling

During init(), the SDK dispatches CustomEvents to report progress through multiple stages (loading Pyodide, installing Python packages, etc.). You can use these events to drive a progress bar or loading indicator.

Event Name

The event name is exported as the constant PROGRESS_EVENT_NAME (value: 'inferential-stats-progress').

Event Detail

interface ProgressDetail {
  stage: string;       // Current stage identifier (e.g. "pyodide", "packages")
  progress: number;    // Percentage complete (0–100)
  message: string;     // Human-readable status message
}
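
Since `CustomEvent.detail` is untyped at runtime, a listener may want to validate the payload before using it. The sketch below shows one possible guard and formatter; both function names are hypothetical and not exported by the SDK, and the interface is repeated only to keep the example self-contained.

```typescript
interface ProgressDetail {
  stage: string;
  progress: number;
  message: string;
}

// Hypothetical runtime check for an event's detail payload.
function isProgressDetail(value: unknown): value is ProgressDetail {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const progress = candidate["progress"];
  return (
    typeof candidate["stage"] === "string" &&
    typeof progress === "number" &&
    progress >= 0 &&
    progress <= 100 &&
    typeof candidate["message"] === "string"
  );
}

// Hypothetical formatter for a status label or log line.
function formatProgress(detail: ProgressDetail): string {
  return `[${detail.stage}] ${detail.progress}% - ${detail.message}`;
}

const sampleDetail: unknown = { stage: "pyodide", progress: 30, message: "Pyodide runtime loaded" };
const progressLine = isProgressDetail(sampleDetail) ? formatProgress(sampleDetail) : "invalid detail";
```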

Example: Full Progress Listener

import { InferentialStats, PROGRESS_EVENT_NAME } from '@winm2m/inferential-stats-js';

// You can target any EventTarget — window, document, or a custom one.
const eventTarget = window;

const stats = new InferentialStats({
  workerUrl: '/dist/stats-worker.js',
  eventTarget, // Progress events will be dispatched here
});

// Register the listener BEFORE calling init()
eventTarget.addEventListener(PROGRESS_EVENT_NAME, ((event: CustomEvent) => {
  const { stage, progress, message } = event.detail as {
    stage: string;
    progress: number;
    message: string;
  };

  // Update a progress bar (skipped if the element is not on the page)
  const progressBar = document.getElementById('progress-bar') as HTMLProgressElement | null;
  if (progressBar) {
    progressBar.max = 100;
    progressBar.value = progress;
  }

  // Update a status label
  const statusLabel = document.getElementById('status');
  if (statusLabel) {
    statusLabel.textContent = `[${stage}] ${message} (${progress}%)`;
  }

  console.log(`[${stage}] ${progress}% — ${message}`);
}) as EventListener);

// Start initialization — progress events will fire throughout
await stats.init();
console.log('Ready!');

Typical Progress Sequence

| Stage | Progress | Message |
| --- | --- | --- |
| pyodide | 0 | Loading Pyodide runtime… |
| pyodide | 30 | Pyodide runtime loaded |
| packages | 40 | Installing pandas… |
| packages | 55 | Installing scipy… |
| packages | 70 | Installing statsmodels… |
| packages | 80 | Installing scikit-learn… |
| packages | 90 | Installing factor_analyzer… |
| ready | 100 | All packages installed. Ready. |

License

MIT © 2026 WinM2M

Keywords

statistics


Package last updated on 31 Mar 2026
