
A package implementation of COMBSS, a continuous optimisation method toward best subset selection
Python implementation of a novel continuous optimization method for best subset selection in linear regression.
📄 Reference:
Moka, Liquet, Zhu & Muller (2024)
COMBSS: best subset selection via continuous optimization
Statistics and Computing
🔗 GitHub Repository: saratmoka/combss
The intercept term (if included) is subject to the same selection process as other features.
pip install combss
A simple example:
import combss
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
# Generate sample data
X, y = make_regression(n_samples=1000, n_features=50, noise=0.1, random_state=42)
# Split into training and validation sets (60-40 split)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=42)
# Initialize and fit model with validation data
model = combss.linear.model()
model.fit(
    X_train=X_train,
    y_train=y_train,
    X_val=X_val,  # Validation features
    y_val=y_val,  # Validation targets
    q=10,         # Maximum subset size
    nlam=50       # Number of λ values
)
# Results
print("Best subset indices:", model.subset)
print("Best coefficients:", model.coef_)
print("Validation MSE:", model.mse)
print("Optimal lambda:", model.lambda_)
print("Computation time (s):", model.run_time)
An example with known true coefficients:
import combss
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
# Configuration
n_samples = 5000
n_features = 50
n_informative = 5 # the number of non-zero coefficients
noise_level = 0.1
# Generate data with exactly 5 informative features
X, y, true_coef = make_regression(
    n_samples=n_samples,
    n_features=n_features,
    n_informative=n_informative,
    noise=noise_level,
    coef=True,  # Return the actual coefficients used
    random_state=42
)
# Exactly 5 true coefficients are non-zero; make_regression shuffles features by default,
# so they are not necessarily the first 5 columns
print("Number of truly informative features:", sum(true_coef != 0))
# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=42)
# Initialize and fit model
model = combss.linear.model()
model.fit(
    X_train=X_train,
    y_train=y_train,
    X_val=X_val,
    y_val=y_val,
    q=10,
    nlam=50
)
# Results analysis
print("\nTrue non-zero coefficients:", np.where(true_coef != 0)[0])
print("Estimated subset:", model.subset)
print("\nValidation MSE:", model.mse)
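The comparison printed above can be summarized as support-recovery metrics. The following is a minimal sketch using NumPy only; `support_recovery` is a hypothetical helper (not part of combss), and the `subset` passed to it is a made-up stand-in for `model.subset`.

```python
import numpy as np

def support_recovery(true_coef, subset):
    """Compare an estimated subset with the true non-zero coefficient positions."""
    true_support = set(np.flatnonzero(true_coef).tolist())
    estimated = set(int(j) for j in subset)
    hits = len(true_support & estimated)
    return {
        "missed": sorted(true_support - estimated),    # true features not selected
        "spurious": sorted(estimated - true_support),  # selected but not truly informative
        "precision": hits / len(estimated) if estimated else 0.0,
        "recall": hits / len(true_support) if true_support else 0.0,
    }

# Toy illustration: features 1, 3 and 5 are truly informative
true_coef = np.array([0.0, 2.5, 0.0, -1.0, 0.0, 4.0])
report = support_recovery(true_coef, subset=[1, 3, 4])  # hypothetical estimate
print(report)
```

In the example above, the same helper could be called with `true_coef` from `make_regression` and the indices in `model.subset`, assuming the latter is an array of 0-based column indices as the attribute table states.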
| Parameter | Description | Default |
|---|---|---|
| q | Maximum subset size | min(n, p) |
| nlam | Number of λ values | 50 |
| scaling | Enable feature scaling | True |
| tau | Threshold parameter | 0.5 |
| delta_frac | δ/n in objective function | 1 |
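To illustrate the role a threshold such as `tau` can play, the sketch below maps a continuous vector `t` with entries in [0, 1] to a discrete subset by keeping the indices where `t` exceeds `tau`. This is an assumption about how thresholding works in a continuous relaxation of subset selection, not a copy of the package's code; `threshold_subset` is a hypothetical helper, and the strict inequality and the size check against `q` are illustrative choices.

```python
import numpy as np

def threshold_subset(t, tau=0.5):
    """Return the 0-based indices j where t[j] > tau (illustrative rule only)."""
    t = np.asarray(t, dtype=float)
    return np.flatnonzero(t > tau)

# Hypothetical continuous solution for 5 features
t = [0.02, 0.97, 0.51, 0.49, 0.88]
subset = threshold_subset(t, tau=0.5)

# Illustrative check against a maximum subset size q
q = 3
within_budget = subset.size <= q
print(subset.tolist(), within_budget)
```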
model.fit(
    ...,
    t_init=t_init,    # Initial point for vector t
    eta=0.001,        # Truncation parameter
    patience=10,      # Early stopping rounds
    gd_maxiter=1000,  # Maximum number of iterations for the gradient based optimization
    gd_tol=1e-5,      # Tolerance for the gradient based optimization
    cg_maxiter=1000,  # Maximum number of iterations allowed in the conjugate gradient method
    cg_tol=1e-6       # Conjugate gradient tolerance
)
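For intuition about how an iteration cap and a tolerance (in the spirit of `gd_maxiter` and `gd_tol`) typically interact, here is a generic sketch: plain gradient descent on a least-squares problem that stops when the update becomes smaller than the tolerance or the cap is reached. Plain gradient descent is swapped in for illustration; the optimizer and stopping rule actually used inside combss may differ, and `gradient_descent` is a hypothetical helper.

```python
import numpy as np

def gradient_descent(grad, x0, step, maxiter=1000, tol=1e-5):
    """Iterate x <- x - step * grad(x) until the update norm drops below tol."""
    x = np.asarray(x0, dtype=float)
    for it in range(1, maxiter + 1):
        x_new = x - step * grad(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new, it  # converged before hitting the cap
        x = x_new
    return x, maxiter

# Noise-free least-squares toy problem: minimize 0.5 * ||A x - b||^2
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
x_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
b = A @ x_true

grad = lambda x: A.T @ (A @ x - b)
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / largest eigenvalue of A^T A

x_hat, n_iter = gradient_descent(grad, np.zeros(5), step, maxiter=1000, tol=1e-5)
print(n_iter, np.round(x_hat, 3))
```

A looser tolerance ends the loop earlier at the cost of accuracy, while the iteration cap bounds the run time when convergence is slow.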
| Attribute | Description |
|---|---|
| subset | Selected feature indices (0-based) |
| coef_ | Regression coefficients |
| mse | Mean squared error |
| lambda_ | Optimal λ value |
| run_time | Execution time (seconds) |
| subset_list | List of subsets over the λ grid |
| lambda_list | Grid of λ values |
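One way to read `subset_list` together with the validation data is as a set of candidates from which the subset with the lowest validation error is chosen. The sketch below shows that selection step with scikit-learn only; `pick_best_subset` is a hypothetical helper, the candidate list is made up, and refitting ordinary least squares on each candidate is an assumption for illustration rather than a description of what combss does internally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def pick_best_subset(subset_list, X_train, y_train, X_val, y_val):
    """Refit OLS on each candidate subset and keep the one with the lowest validation MSE."""
    best_cols, best_mse = None, np.inf
    for subset in subset_list:
        cols = np.asarray(subset, dtype=int)
        if cols.size == 0:
            continue  # skip the empty subset
        ols = LinearRegression().fit(X_train[:, cols], y_train)
        mse = mean_squared_error(y_val, ols.predict(X_val[:, cols]))
        if mse < best_mse:
            best_cols, best_mse = cols, mse
    return best_cols, best_mse

# Toy data: only features 1 and 4 influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)
X_train, X_val, y_train, y_val = X[:120], X[120:], y[:120], y[120:]

candidates = [[0], [1], [1, 4], [0, 2, 3]]  # hypothetical stand-in for subset_list
best_cols, best_mse = pick_best_subset(candidates, X_train, y_train, X_val, y_val)
print(best_cols.tolist(), round(best_mse, 4))
```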
Contributions are welcome! Please open an issue or submit a pull request.