combss

A package implementation of COMBSS, a continuous optimisation method toward best subset selection.

Version 1.1.4 · pip · PyPI

COMBSS Logo

Continuous Optimization Method for Best Subset Selection


Python implementation of a novel continuous optimization method for best subset selection in linear regression.

📄 Reference: Moka, Liquet, Zhu & Muller (2024), "COMBSS: best subset selection via continuous optimization", Statistics and Computing.

🔗 GitHub Repository: saratmoka/combss

Key Features

  • 🎯 Continuous relaxation of discrete subset selection
  • ⚡ Scalable optimization for high-dimensional data
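The idea behind these features can be sketched in two lines (a simplified paraphrase of the objective; see Moka et al. (2024) for the exact regularised form):

```latex
% Discrete best subset selection over binary indicators s:
\min_{s \in \{0,1\}^p} \ \frac{1}{n}\,\lVert y - X_s \beta_s \rVert^2
\quad \text{s.t.} \quad \sum_{j=1}^{p} s_j \le q

% COMBSS replaces s with a continuous t \in [0,1]^p and minimises a
% differentiable surrogate by gradient descent, with \lambda controlling sparsity:
\min_{t \in [0,1]^p} \ \frac{1}{n}\,\lVert y - X\,\mathrm{diag}(t)\,\tilde{\beta}_t \rVert^2
+ \lambda \sum_{j=1}^{p} t_j
```

Here `\tilde{\beta}_t` stands for a regularised least-squares estimate given `t`; because the surrogate is differentiable in `t`, the combinatorial search is avoided, which is what makes the method scalable to high dimensions.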

Intercept Handling

The intercept term (if included) is subject to the same selection process as the other features.
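If you want an intercept to take part in that selection, one simple option (a NumPy-only sketch; this is not a documented combss option) is to append a constant column to the design matrix before fitting:

```python
import numpy as np

# Toy design matrix: 4 samples, 3 features
X = np.arange(12, dtype=float).reshape(4, 3)

# Append a column of ones so the intercept is treated as just another feature
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

print(X_aug.shape)  # (4, 4): the last column is the intercept term
```

If the ones column is selected, its coefficient plays the role of the intercept; if it is dropped, the fitted model has no intercept.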

Installation

```bash
pip install combss
```

Quick Start

A simple example:

```python
import combss
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate sample data
X, y = make_regression(n_samples=1000, n_features=50, noise=0.1, random_state=42)

# Split into training and validation sets (60-40 split)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=42)

# Initialize and fit model with validation data
model = combss.linear.model()
model.fit(
    X_train=X_train,
    y_train=y_train,
    X_val=X_val,      # Validation features
    y_val=y_val,      # Validation targets
    q=10,             # Maximum subset size
    nlam=50           # Number of λ values
)

# Results
print("Best subset indices:", model.subset)
print("Best coefficients:", model.coef_)
print("Validation MSE:", model.mse)
print("Optimal lambda:", model.lambda_)
print("Computation time (s):", model.run_time)
```

An example with known true coefficients:


```python
import combss
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Configuration
n_samples = 5000
n_features = 50
n_informative = 5  # the number of non-zero coefficients
noise_level = 0.1

# Generate data with exactly 5 informative features
X, y, true_coef = make_regression(
    n_samples=n_samples,
    n_features=n_features,
    n_informative=n_informative,
    noise=noise_level,
    coef=True,  # Return the actual coefficients used
    random_state=42
)

# Exactly n_informative entries of true_coef are non-zero; make_regression
# shuffles the columns, so the informative features sit at random positions
print("Number of truly informative features:", sum(true_coef != 0))

# Split data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4, random_state=42)

# Initialize and fit model
model = combss.linear.model()
model.fit(
    X_train=X_train,
    y_train=y_train,
    X_val=X_val,
    y_val=y_val,
    q=10,
    nlam=50
)

# Results analysis
print("\nTrue non-zero coefficients:", np.where(true_coef != 0)[0])
print("Estimated subset:", model.subset)
print("\nValidation MSE:", model.mse)
```
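Because the informative columns are placed at random, a convenient way to score the fit is to compare the two index sets directly. A small sketch, with hypothetical arrays standing in for `np.where(true_coef != 0)[0]` and `model.subset` so it runs without a fitted model:

```python
import numpy as np

# Hypothetical stand-ins for the true support and the estimated subset
true_support = np.array([3, 11, 24, 37, 42])
estimated_subset = np.array([3, 11, 24, 37, 48])

hits = np.intersect1d(true_support, estimated_subset)
recall = len(hits) / len(true_support)         # fraction of true features recovered
precision = len(hits) / len(estimated_subset)  # fraction of selected features that are true

print(f"recall={recall:.2f}, precision={precision:.2f}")  # recall=0.80, precision=0.80
```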

Documentation

Core Parameters

| Parameter    | Description                    | Default     |
|--------------|--------------------------------|-------------|
| `q`          | Maximum subset size            | `min(n, p)` |
| `nlam`       | Number of λ values             | `50`        |
| `scaling`    | Enable feature scaling         | `True`      |
| `tau`        | Threshold parameter            | `0.5`       |
| `delta_frac` | δ/n in the objective function  | `1`         |

Other Parameters

```python
model.fit(
    ...,
    t_init=t_init,     # Initial point for the vector t
    eta=0.001,         # Truncation parameter
    patience=10,       # Early-stopping rounds
    gd_maxiter=1000,   # Max iterations for the gradient-based optimisation
    gd_tol=1e-5,       # Tolerance for the gradient-based optimisation
    cg_maxiter=1000,   # Max iterations for the conjugate gradient method
    cg_tol=1e-6        # Tolerance for the conjugate gradient method
)
```

Output Attributes

| Attribute     | Description                        |
|---------------|------------------------------------|
| `subset`      | Selected feature indices (0-based) |
| `coef_`       | Regression coefficients            |
| `mse`         | Validation mean squared error      |
| `lambda_`     | Optimal λ value                    |
| `run_time`    | Execution time (seconds)           |
| `subset_list` | Subsets selected over the λ grid   |
| `lambda_list` | The grid of λ values               |
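Pairing `lambda_list` with `subset_list` shows which subset each λ value produced along the path. A sketch with hypothetical values in place of a fitted model's attributes:

```python
# Hypothetical path attributes, shaped like those of a fitted model
lambda_list = [0.5, 0.1, 0.02]
subset_list = [[3], [3, 11], [3, 11, 24]]

# Larger λ penalises the t vector more heavily, so it yields smaller subsets
for lam, subset in zip(lambda_list, subset_list):
    print(f"lambda={lam:<5} -> subset={subset}")
```

Inspecting the whole path this way can be more informative than looking only at the single subset chosen by validation MSE.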

Dependencies

  • Python 3.7+
  • NumPy (≥1.21.0)
  • SciPy (≥1.7.0)

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

