Advancing the state of machine learning?
With 5-10 datasets? Wake me up when I'm dead.
Powerlift is all about testing machine learning techniques across many, many datasets. So many that we ran into design-of-experiments concerns. So many that we had to develop a package for it.
Yes, we run this for InterpretML on as many Docker containers as we can run in parallel. Why wait days for benchmark evaluations when you can wait minutes? Rhetorical question, please don't hurt me.
def trial_filter(task):
    # Only run on binary classification tasks with at most 10,000 samples.
    if task.problem == "binary" and task.n_samples <= 10000:
        return ["rf", "svm"]
    return []

def trial_runner(trial):
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import LinearSVC
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, FunctionTransformer
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer

    if trial.task.problem == "binary":
        X, y = trial.task.data()

        # Holdout split
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3)

        # Build preprocessor
        # NOTE: `meta` is not defined anywhere in this snippet. It is assumed
        # to be the task's metadata, holding a boolean "categorical_mask"
        # with one entry per column of X; obtain it before running this code.
        is_cat = meta["categorical_mask"]
        cat_cols = [idx for idx in range(X.shape[1]) if is_cat[idx]]
        num_cols = [idx for idx in range(X.shape[1]) if not is_cat[idx]]
        cat_ohe_step = ("ohe", OneHotEncoder(sparse_output=True, handle_unknown="ignore"))
        cat_pipe = Pipeline([cat_ohe_step])
        num_pipe = Pipeline([("identity", FunctionTransformer())])
        transformers = [("cat", cat_pipe, cat_cols), ("num", num_pipe, num_cols)]
        ct = Pipeline(
            [
                ("ct", ColumnTransformer(transformers=transformers)),
                (
                    "missing",
                    SimpleImputer(add_indicator=True, strategy="most_frequent"),
                ),
            ]
        )

        # Connect preprocessor with target learner
        if trial.method == "svm":
            # LinearSVC has no predict_proba, so wrap it in a calibrator.
            clf = Pipeline([("ct", ct), ("est", CalibratedClassifierCV(LinearSVC()))])
        else:
            clf = Pipeline([("ct", ct), ("est", RandomForestClassifier())])

        # Train
        clf.fit(X_tr, y_tr)

        # Predict (probability of the positive class)
        predictions = clf.predict_proba(X_te)[:, 1]

        # Score
        auc = roc_auc_score(y_te, predictions)
        trial.log("auc", auc)

import os
from powerlift.bench import Benchmark, Store
from powerlift.bench import populate_with_datasets
# Initialize database (if needed).
conn_str = f"sqlite:///{os.getcwd()}/powerlift.db"
store = Store(conn_str, force_recreate=False)
# This downloads datasets once and feeds into the database.
populate_with_datasets(store, cache_dir="~/.powerlift", exist_ok=True)
# Run experiment
benchmark = Benchmark(f"sqlite:///{os.getcwd()}/powerlift.db", name="SVM vs RF")
benchmark.run(trial_runner, trial_filter)
benchmark.wait_until_complete()
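
Before kicking off a full run, it can help to check which methods the filter would pick for a few kinds of task. The sketch below does that with stand-in objects only: the `SimpleNamespace` tasks and their `name` field are hypothetical placeholders, not powerlift's real task objects, and carry just the two attributes the filter above reads (`problem` and `n_samples`).

```python
from types import SimpleNamespace


def trial_filter(task):
    # Same rule as the filter above: small binary tasks get both methods.
    if task.problem == "binary" and task.n_samples <= 10000:
        return ["rf", "svm"]
    return []


# Hypothetical stand-in tasks; real tasks come from the powerlift store.
tasks = [
    SimpleNamespace(name="small_binary", problem="binary", n_samples=5_000),
    SimpleNamespace(name="large_binary", problem="binary", n_samples=250_000),
    SimpleNamespace(name="small_regression", problem="regression", n_samples=5_000),
]

# Map each stand-in task to the methods the filter would schedule for it.
planned = {task.name: trial_filter(task) for task in tasks}

for name, methods in planned.items():
    print(f"{name}: {methods if methods else 'skipped'}")
```

An empty list means no trials are scheduled for that task, so only the small binary task would be benchmarked here.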
This can also be run on Azure Container Instances where needed.
# Run experiment (but in ACI).
from powerlift.executors import AzureContainerInstance
store = Store(os.getenv("AZURE_DB_URL"))
azure_tenant_id = os.getenv("AZURE_TENANT_ID")
subscription_id = os.getenv("AZURE_SUBSCRIPTION_ID")
azure_client_id = os.getenv("AZURE_CLIENT_ID")
azure_client_secret = os.getenv("AZURE_CLIENT_SECRET")
resource_group = os.getenv("AZURE_RESOURCE_GROUP")
executor = AzureContainerInstance(
    store,
    azure_tenant_id,
    subscription_id,
    azure_client_id,
    azure_client_secret=azure_client_secret,
    resource_group=resource_group,
    n_running_containers=5,
)
benchmark = Benchmark(store, name="SVM vs RF")
benchmark.run(trial_runner, trial_filter, timeout=10, executor=executor)
benchmark.wait_until_complete()
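
One thing to watch in the ACI setup above: `os.getenv` quietly returns `None` for a variable that is not set, so a missing credential may only surface later as a confusing failure. A minimal pre-flight check, using only the standard library, might look like the sketch below. The helper `missing_env_vars` is a hypothetical name introduced here, not part of powerlift; the variable names are the ones read in the snippet above.

```python
import os

# Environment variables read by the ACI example above.
REQUIRED_VARS = [
    "AZURE_DB_URL",
    "AZURE_TENANT_ID",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_RESOURCE_GROUP",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Set these before creating the executor: " + ", ".join(missing))
else:
    print("All Azure settings found.")
```

Running this before constructing the executor fails fast on your own machine instead of partway through a container launch.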
pip install powerlift[datasets]
That's it, go get 'em boss.
FAQs
Interactive Benchmarking for Machine Learning.
We found that powerlift demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.