
submodlib-py
SubModLib is an easy-to-use, efficient and scalable Python library for submodular optimization with a C++ optimization engine. Submodlib finds its application in summarization, data subset selection, hyper parameter tuning, efficient training etc. Through a rich API, it offers a great deal of flexibility in the way it can be used.
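Submodular functions are set functions with diminishing returns: adding an element to a smaller set helps at least as much as adding it to a larger one. The brute-force check below is a minimal sketch of that property. It uses a hypothetical toy coverage function (`COVERS`, `coverage`, `marginal_gain` and `is_submodular` are illustrative names, not part of submodlib's API).

```python
from itertools import combinations

# Hypothetical toy ground set: each element "covers" a few concepts.
COVERS = {
    0: {"a", "b"},
    1: {"b", "c"},
    2: {"c", "d", "e"},
    3: {"a", "e"},
}

def coverage(S):
    """Set-cover style function: number of distinct concepts covered by S."""
    covered = set()
    for e in S:
        covered |= COVERS[e]
    return len(covered)

def marginal_gain(f, S, e):
    """Gain f(S + {e}) - f(S) from adding element e to S."""
    return f(set(S) | {e}) - f(set(S))

def is_submodular(f, ground):
    """Brute-force diminishing-returns check:
    for all A <= B <= V and e not in B, gain(e | A) >= gain(e | B)."""
    ground = list(ground)
    subsets = [set(c) for r in range(len(ground) + 1) for c in combinations(ground, r)]
    for A in subsets:
        for B in subsets:
            if not A <= B:
                continue
            for e in ground:
                if e not in B and marginal_gain(f, A, e) < marginal_gain(f, B, e):
                    return False
    return True

print(is_submodular(coverage, COVERS))  # True
```

The exhaustive check is exponential in the ground-set size, so it is only useful for building intuition on tiny examples; libraries like submodlib rely on the property rather than verifying it.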
Please check out our latest arxiv preprint: https://arxiv.org/abs/2202.10680
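To install submodlib from TestPyPI (dependencies are resolved from PyPI):

```shell
$ pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ submodlib
```

Alternatively, install from source:

```shell
$ git clone https://github.com/decile-team/submodlib.git
$ cd submodlib
$ pip install .
```

To build the documentation:

```shell
$ pip install -U sphinx
$ pip install sphinxcontrib-bibtex
$ pip install sphinx-rtd-theme
$ cd docs
$ make clean html
```

To run the tests:

```shell
$ pip install pytest
$ pytest # this runs ALL tests
$ pytest -m <marker> --verbose --disable-warnings -rA # this runs only the tests tagged with <marker>; possible markers are listed in pyproject.toml
```

It is very easy to get started with submodlib. Using a submodular function in submodlib essentially boils down to just two steps: instantiating the corresponding function object, and invoking the desired method on it.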
The most frequently used methods are:
For example,
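```python
import numpy as np
from submodlib import FacilityLocationFunction

groundData = np.random.rand(43, 2)  # illustrative: 43 random 2-D points (not from the original)
objFL = FacilityLocationFunction(n=43, data=groundData, mode="dense", metric="euclidean")
greedyList = objFL.maximize(budget=10, optimizer='NaiveGreedy')
```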
For a more detailed discussion of all possible usage patterns, please see Different Options of Usage.
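For intuition about what a facility location function maximized with `optimizer='NaiveGreedy'` computes, here is a from-scratch, pure-Python sketch of the facility location objective f(S) = sum over i of max over j in S of s_ij, together with the textbook naive greedy algorithm. It is not submodlib's C++ implementation. The similarity conversion 1 / (1 + distance), the helper names (`similarity_kernel`, `facility_location`, `naive_greedy`) and the random `groundData` are assumptions made for illustration.

```python
import math
import random

def similarity_kernel(points):
    """Dense pairwise similarity matrix from Euclidean distances.
    Assumption: similarity = 1 / (1 + distance); submodlib may convert differently."""
    n = len(points)
    return [[1.0 / (1.0 + math.dist(points[i], points[j])) for j in range(n)] for i in range(n)]

def facility_location(K, S):
    """f(S) = sum over every ground element i of its best similarity to a member of S."""
    if not S:
        return 0.0
    return sum(max(row[j] for j in S) for row in K)

def naive_greedy(K, budget):
    """Naive greedy: repeatedly add the element with the largest marginal gain.
    Returns (element, gain) pairs in selection order."""
    n = len(K)
    selected, picks = set(), []
    best = [0.0] * n  # best[i]: current max similarity of element i to the selected set
    for _ in range(min(budget, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # f(S + {j}) - f(S) only changes rows where K[i][j] beats the current best
            gain = sum(max(0.0, K[i][j] - best[i]) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.add(best_j)
        picks.append((best_j, best_gain))
        best = [max(best[i], K[i][best_j]) for i in range(n)]
    return picks

random.seed(0)
groundData = [(random.random(), random.random()) for _ in range(43)]  # hypothetical 2-D data
K = similarity_kernel(groundData)
greedyList = naive_greedy(K, budget=10)
```

Tracking `best` avoids recomputing f from scratch for every candidate, so each greedy step costs O(n^2) on a dense kernel. Because the objective is submodular, the recorded gains never increase from one step to the next.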
We demonstrate the representational power and modeling capabilities of different functions qualitatively in the following Google Colab notebooks:
This notebook contains a quantitative analysis of the performance of different functions, and of the role of parameterization in aspects such as query-coverage, query-relevance, privacy-irrelevance and diversity, for different SMI, CG and CMI functions as observed on a synthetically generated dataset. This notebook contains a similar analysis on the ImageNette dataset.
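SMI, CG and CMI refer to submodular mutual information, conditional gain and conditional submodular mutual information. The notebooks are not reproduced here, but the standard definitions can be sketched generically for any set function f, with Q playing the role of a query set and P a private (conditioning) set. The function names and the toy coverage function below are illustrative assumptions, not submodlib's API.

```python
# Hypothetical toy coverage function used to exercise the definitions.
COVERS = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}, 3: {"d", "e"}}

def coverage(S):
    """Number of distinct concepts covered by the elements of S."""
    covered = set()
    for e in S:
        covered |= COVERS[e]
    return len(covered)

def conditional_gain(f, A, P):
    """CG: f(A | P) = f(A u P) - f(P), the value A adds beyond P."""
    A, P = set(A), set(P)
    return f(A | P) - f(P)

def mutual_information(f, A, Q):
    """SMI: I_f(A; Q) = f(A) + f(Q) - f(A u Q), the value A shares with Q."""
    A, Q = set(A), set(Q)
    return f(A) + f(Q) - f(A | Q)

def conditional_mutual_information(f, A, Q, P):
    """CMI: I_f(A; Q | P) = f(A u P) + f(Q u P) - f(A u Q u P) - f(P)."""
    A, Q, P = set(A), set(Q), set(P)
    return f(A | P) + f(Q | P) - f(A | Q | P) - f(P)

print(mutual_information(coverage, {0}, {1}))                    # 1 (shared concept "b")
print(conditional_mutual_information(coverage, {1}, {2}, {0}))   # 1 (shared "c", not covered by P)
```

Maximizing SMI favors subsets relevant to the query, maximizing CG favors subsets that add little already covered by P, and CMI combines both goals.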
To gauge the performance of submodlib, selection by Facility Location was performed on a randomly generated dataset of 1024-dimensional points. Specifically, the following code was run for numbers of data points (`num_samples`) ranging from 50 to 10000:
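```python
import numpy as np
from submodlib import FacilityLocationFunction, helper
dataArray = np.random.rand(num_samples, 1024)  # random 1024-D points (uniform sampling assumed)
K_dense = helper.create_kernel(dataArray, mode="dense", metric='euclidean', method="other")
obj = FacilityLocationFunction(n=num_samples, mode="dense", sijs=K_dense, separate_rep=False, pybind_mode="array")
obj.maximize(budget=budget, optimizer=optimizer, stopIfZeroGain=False, stopIfNegativeGain=False, verbose=False, show_progress=False)
```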
The above code was timed with Python's timeit module, averaging over three executions for each dataset size. We report the following numbers:
| Number of data points | Time taken (in seconds) |
|---|---|
| 50 | 0.00043 |
| 100 | 0.001074 |
| 200 | 0.003024 |
| 500 | 0.016555 |
| 1000 | 0.081773 |
| 5000 | 2.469303 |
| 6000 | 3.563144 |
| 7000 | 4.667065 |
| 8000 | 6.174047 |
| 9000 | 8.010674 |
| 10000 | 9.417298 |
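The timing methodology described above can be sketched with the standard library alone; `average_time` is a hypothetical helper that averages three single executions using `timeit.repeat`. The second helper, also hypothetical, reads the table by computing the slope of log(time) against log(n) between two rows, a quick way to see how runtime grows with dataset size.

```python
import math
import timeit

def average_time(fn, runs=3):
    """Mean wall-clock seconds over `runs` single executions of fn."""
    times = timeit.repeat(fn, number=1, repeat=runs)
    return sum(times) / len(times)

def scaling_exponent(n1, t1, n2, t2):
    """Slope of log(time) vs log(n) between two measurements:
    about 1 means linear growth, about 2 means quadratic growth."""
    return math.log(t2 / t1) / math.log(n2 / n1)

# Two rows from the table above: (data points, seconds).
print(round(scaling_exponent(5000, 2.469303, 10000, 9.417298), 2))  # 1.93
```

Between 5000 and 10000 points the reported times give an exponent of about 1.93, i.e. close to quadratic growth, which is consistent with the benchmark working on a dense n-by-n similarity kernel.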
If your research makes use of SUBMODLIB, please consider citing:
SUBMODLIB (Submodlib: A Submodular Optimization Library (Kaushal et al., 2022))
```bibtex
@article{kaushal2022submodlib,
  title={Submodlib: A submodular optimization library},
  author={Kaushal, Vishal and Ramakrishnan, Ganesh and Iyer, Rishabh},
  journal={arXiv preprint arXiv:2202.10680},
  year={2022}
}
```
Should you face any issues or have any feedback or suggestions, please feel free to contact vishal[dot]kaushal[at]gmail.com
This work is supported by the Ekal Fellowship (www.ekal.org). This work is also supported by the National Science Foundation (NSF) under Grant Number 2106937, a startup grant from UT Dallas, as well as Google and Adobe awards.
We found that submodlib-py demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.