
# data_buyer_toolkit

Toolkit for identifying third-party data buyers in U.S. federal job postings from USAJobs.
## Table of Contents

1. Overview
2. Folder Structure
3. Function Inputs and Outputs
4. Quick Visual Summary
5. When to Use Each Function
6. Installation Instructions
7. License
8. Contributions
9. Usage Examples Notebook
## 1. Overview

The `data_buyer_toolkit` package is a modular, published Python library for analyzing, preprocessing, and scoring U.S. federal job postings for third-party data acquisition demand. It is a core component of the broader Public Sector Data Demand Research Framework, but it can also be used independently as a lightweight toolkit for real-time job analysis and data buyer detection.

Specifically, this package allows users to fetch postings from the USAJobs API, preprocess them into model-ready features, and score how likely each posting is to represent a third-party data buyer.

By operationalizing job text analysis, this package helps commercial data vendors, researchers, and policy analysts identify promising government leads and map market demand trends for external data products.
## 2. Folder Structure

```text
├── setup.py
├── pyproject.toml
└── data_buyer_toolkit/
    ├── __init__.py
    ├── toolkit.py
    ├── nlp_pipeline_with_smote.joblib
    ├── README.md
    └── examples/
        └── usage_examples.ipynb
```
## 3. Function Inputs and Outputs

### Import the Package

```python
from data_buyer_toolkit.toolkit import (
    load_pipeline,
    preprocess_job_api_response,
    fetch_and_score_job,
    search_job_ids_by_title,
    batch_fetch_and_score_jobs,
    fetch_and_score_top_by_use_case_auto,
    fetch_top_data_buyers_by_industry_auto,
    fetch_and_score_top_by_use_case_custom,
    fetch_top_data_buyers_by_industry_custom,
)
```
### `load_pipeline()`

```python
pipeline = load_pipeline()
```

**Purpose:**
Load the trained NLP pipeline stored inside the package (`nlp_pipeline_with_smote.joblib`).

**Inputs:**
- None.

**Outputs:**
- `pipeline`: the trained scikit-learn pipeline, containing a `preprocessor` step and a `classifier` step.
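The returned object is a standard scikit-learn pipeline, so its named steps can be inspected directly. As a rough, hypothetical sketch of the shape of the object `load_pipeline()` returns (the real packaged pipeline may use different components, e.g. SMOTE resampling per the filename):

```python
# Hypothetical illustration of the kind of object load_pipeline() returns:
# a scikit-learn Pipeline with a preprocessor step and a classifier step.
# The real packaged pipeline (nlp_pipeline_with_smote.joblib) may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

toy_pipeline = Pipeline([
    ("preprocessor", TfidfVectorizer()),
    ("classifier", LogisticRegression()),
])

# Inspect the named steps of the pipeline.
print([name for name, _ in toy_pipeline.steps])  # ['preprocessor', 'classifier']
```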
### `preprocess_job_api_response(job_json)`
**Purpose:**
Preprocess a single USAJobs API job posting into a structured, model-ready pandas DataFrame.

**Inputs:**
- `job_json` (`dict`): a raw USAJobs API posting containing fields such as `PositionTitle`, `OrganizationName`, `UserArea -> Details -> JobSummary`, and `MajorDuties`.

**Outputs:**
- `df_processed` (`pd.DataFrame`): the preprocessed, model-ready representation of the posting.
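As an illustration of the nested structure `preprocess_job_api_response` expects, here is a minimal hand-built posting dict in plain Python (all field values are invented for this example):

```python
# Minimal, hand-built example of the fields preprocess_job_api_response expects.
# All values here are invented for illustration.
job_json = {
    "PositionTitle": "Data Scientist",
    "OrganizationName": "Department of Example",
    "UserArea": {
        "Details": {
            "JobSummary": "Analyze third-party datasets to detect fraud.",
            "MajorDuties": ["Acquire external data", "Build NLP models"],
        }
    },
}

# The nested job summary is reached via UserArea -> Details -> JobSummary:
summary = job_json["UserArea"]["Details"]["JobSummary"]
print(summary)
```

With the package installed, `preprocess_job_api_response(job_json)` would turn such a dict into the model-ready DataFrame.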
### `fetch_and_score_job(job_id, api_key, email)`
```python
score_result = fetch_and_score_job(job_id="1234567", api_key="YOUR_USAJOBS_API_KEY", email="YOUR_EMAIL@example.com")
print(score_result)
```

**Purpose:**
Fetch a job posting by its ID from USAJobs, preprocess it, and score its likelihood of being a third-party data buyer using the NLP model.

**Inputs:**
- `job_id` (`str` or `int`): the USAJobs posting ID.
- `api_key` (`str`): your USAJobs API key.
- `email` (`str`): the `User-Agent` for the API call (must match your registered account).

**Outputs:**
- `result` (`dict`) with keys:
  - `data_buyer_score` (`float`): the predicted probability (0 to 1) that this job is a data buyer.
  - `title` (`str`): the job's title.
  - `agency` (`str`): the hiring agency.
### `search_job_ids_by_title(position_title, api_key, email, max_results=10)`
```python
job_matches = search_job_ids_by_title(position_title="Data Scientist", api_key="YOUR_USAJOBS_API_KEY", email="YOUR_EMAIL@example.com")
```

**Purpose:**
Search the USAJobs API for job postings by job title keyword.

**Inputs:**
- `position_title` (`str`): the job title keyword to search for.
- `api_key` (`str`): your USAJobs API key.
- `email` (`str`): the `User-Agent` for the API call.
- `max_results` (`int`, default = 10): maximum number of matches to return.

**Outputs:**
- `jobs` (`list` of `dict`), each containing:
  - `job_id` (`str`)
  - `title` (`str`)
  - `agency` (`str`)
### `batch_fetch_and_score_jobs(job_titles, api_key, email)`
```python
titles = ["Data Analyst", "Contract Specialist", "Program Manager"]
batch_scores = batch_fetch_and_score_jobs(titles, api_key="YOUR_USAJOBS_API_KEY", email="YOUR_EMAIL@example.com")
print(batch_scores)
```

**Purpose:**
Search and score multiple job titles in batch.

**Inputs:**
- `job_titles` (`list` of `str`): the job titles to search and score.
- `api_key` (`str`): your USAJobs API key.
- `email` (`str`): the `User-Agent` for the API call.

**Outputs:**
- `results_df` (`pd.DataFrame`): the batch scoring results.
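Since the batch results come back as a DataFrame, standard pandas operations apply for ranking leads. A sketch using a mocked results frame (the column names `title` and `data_buyer_score` are assumptions about the real output, mirroring the keys returned by `fetch_and_score_job`):

```python
import pandas as pd

# Mocked stand-in for batch_fetch_and_score_jobs output; column names are assumed.
results_df = pd.DataFrame({
    "title": ["Data Analyst", "Contract Specialist", "Program Manager"],
    "data_buyer_score": [0.82, 0.35, 0.61],
})

# Rank postings by predicted data-buyer probability and keep strong leads.
top_leads = (results_df.sort_values("data_buyer_score", ascending=False)
                       .query("data_buyer_score >= 0.5"))
print(top_leads["title"].tolist())  # ['Data Analyst', 'Program Manager']
```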
### `fetch_and_score_top_by_use_case_auto(api_key, email, use_case="Fraud", top_n=100)`
```python
top_fraud_jobs = fetch_and_score_top_by_use_case_auto(api_key="YOUR_USAJOBS_API_KEY", email="YOUR_EMAIL@example.com", use_case="Fraud", top_n=50)
print(top_fraud_jobs)
```

**Purpose:**
Automatically search a broad set of keywords, pull all matches, and rank the top-scoring jobs for a selected use case (e.g., Fraud, Sentiment).

**Inputs:**
- `api_key` (`str`): your USAJobs API key.
- `email` (`str`): the `User-Agent` for the API call.
- `use_case` (`str`, default = `"Fraud"`): one of `Fraud`, `Sentiment`, `PatientMatching`, `AdTargeting`.
- `top_n` (`int`, default = 100): number of top-scoring jobs to return.

**Outputs:**
- `top_jobs_df` (`pd.DataFrame`): the top-ranked jobs for the selected use case.
### `fetch_and_score_top_by_use_case_custom(api_key, email, use_case="Fraud", top_n=100, search_keywords=None)`
```python
fetch_and_score_top_by_use_case_custom(
    api_key="YOUR_USAJOBS_API_KEY",
    email="YOUR_EMAIL@example.com",
    use_case="Fraud",
    top_n=50,
    search_keywords=["cybersecurity", "finance", "clinical", "artificial intelligence"])
```

**Purpose:**
Search live USAJobs postings using custom keywords and return the top jobs matching a selected use case.

**Inputs:**
- `api_key` (`str`): your USAJobs API key.
- `email` (`str`): the `User-Agent` for the API call.
- `use_case` (`str`, default = `"Fraud"`): one of `Fraud`, `Sentiment`, `PatientMatching`, `AdTargeting`.
- `top_n` (`int`, default = 100): number of top-scoring jobs to return.
- `search_keywords` (`list`, optional): custom keywords to search.

**Outputs:**
- `top_jobs_df` (`pd.DataFrame`): the top-ranked jobs for the selected use case.
### `fetch_top_data_buyers_by_industry_custom(api_key, email, industry_name, top_n=100, search_keywords=None)`
```python
fetch_top_data_buyers_by_industry_custom(
    api_key="YOUR_USAJOBS_API_KEY",
    email="YOUR_EMAIL@example.com",
    industry_name="Security/Tech",
    top_n=30,
    search_keywords=["software engineering", "cybersecurity", "cloud", "AI"])
```

**Purpose:**
Search live USAJobs postings using custom keywords and return the top jobs matching a selected industry.

**Inputs:**
- `api_key` (`str`): your USAJobs API key.
- `email` (`str`): the `User-Agent` for the API call.
- `industry_name` (`str`): one of `Medical`, `Finance`, `Marketing`, `Policy`, `Security/Tech`, `Other`.
- `top_n` (`int`, default = 100): number of top-scoring jobs to return.
- `search_keywords` (`list`, optional): custom keywords to search.

**Outputs:**
- `top_buyers_df` (`pd.DataFrame`): the top-ranked data buyer postings for the selected industry.
### `fetch_top_data_buyers_by_industry_auto(api_key, email, industry_name="Medical", top_n=100)`
**Purpose:**
Search live USAJobs postings using a standard keyword list and return the top jobs matching a selected industry.

**Inputs:**
- `api_key` (`str`): your USAJobs API key.
- `email` (`str`): the `User-Agent` for the API call.
- `industry_name` (`str`, default = `"Medical"`): one of `Medical`, `Finance`, `Marketing`, `Policy`, `Security/Tech`, `Other`.
- `top_n` (`int`, default = 100): number of top-scoring jobs to return.

**Outputs:**
- `top_buyers_df` (`pd.DataFrame`): the top-ranked data buyer postings for the selected industry.
## 4. Quick Visual Summary

| Function | Input | Output |
|---|---|---|
| `load_pipeline()` | None | Scikit-learn pipeline |
| `preprocess_job_api_response()` | `job_json` dict | Preprocessed DataFrame |
| `fetch_and_score_job()` | `job_id`, `api_key`, `email` | Dict: score, title, agency |
| `search_job_ids_by_title()` | `position_title`, `api_key`, `email`, `max_results` | List of job dicts |
| `batch_fetch_and_score_jobs()` | List of titles, `api_key`, `email` | Results DataFrame |
| `fetch_and_score_top_by_use_case_auto()` | `api_key`, `email`, `use_case`, `top_n` | Top jobs DataFrame |
| `fetch_top_data_buyers_by_industry_auto()` | `api_key`, `email`, `industry_name`, `top_n` | Top buyers DataFrame |
| `fetch_and_score_top_by_use_case_custom()` | `api_key`, `email`, `use_case`, `top_n`, `search_keywords` | Top jobs DataFrame |
| `fetch_top_data_buyers_by_industry_custom()` | `api_key`, `email`, `industry_name`, `top_n`, `search_keywords` | Top buyers DataFrame |
## 5. When to Use Each Function

| Situation | Recommended Function |
|---|---|
| Load the trained machine learning model | `load_pipeline()` |
| Preprocess a raw USAJobs API posting | `preprocess_job_api_response(job_json)` |
| Score a job by specific USAJobs ID | `fetch_and_score_job(job_id, api_key, email)` |
| Search by job title keyword | `search_job_ids_by_title(position_title, api_key, email)` |
| Batch search and score multiple titles | `batch_fetch_and_score_jobs(job_titles, api_key, email)` |
| Search broadly using default keywords and filter by use case | `fetch_and_score_top_by_use_case_auto(api_key, email, use_case)` |
| Search broadly using default keywords and filter by industry | `fetch_top_data_buyers_by_industry_auto(api_key, email, industry_name)` |
| Search with custom keywords and filter by use case | `fetch_and_score_top_by_use_case_custom(api_key, email, use_case, search_keywords)` |
| Search with custom keywords and filter by industry | `fetch_top_data_buyers_by_industry_custom(api_key, email, industry_name, search_keywords)` |
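The search and scoring functions compose naturally. Below is a hedged sketch of an end-to-end workflow, assuming (per the descriptions above) that `search_job_ids_by_title` returns dicts with a `job_id` key; the import is deferred so the function can be defined even before the package is installed:

```python
def score_matching_jobs(position_title, api_key, email, max_results=5):
    """Search USAJobs by title, then score each match with the NLP model.

    Sketch only: requires the data-buyer-toolkit package and valid
    USAJobs credentials to actually run.
    """
    # Deferred import so this sketch can be defined without the package present.
    from data_buyer_toolkit.toolkit import (
        fetch_and_score_job,
        search_job_ids_by_title,
    )

    matches = search_job_ids_by_title(
        position_title=position_title, api_key=api_key,
        email=email, max_results=max_results,
    )
    # Each result is a dict with data_buyer_score, title, and agency.
    return [
        fetch_and_score_job(job_id=m["job_id"], api_key=api_key, email=email)
        for m in matches
    ]
```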
## 6. Installation Instructions

You can install the package directly from PyPI:

```bash
pip install data-buyer-toolkit
```

Alternatively, install `data_buyer_toolkit` from a GitHub clone for local development or usage inside Jupyter notebooks. First, clone the full project to your local machine:

```bash
git clone https://github.com/RoryQo/Public-Sector-Data-Demand_Research-Framework-For-Market-Analysis-And-Classification.git
cd Public-Sector-Data-Demand_Research-Framework-For-Market-Analysis-And-Classification
```

It is strongly recommended to use a virtual environment for this project:

```bash
conda create -n data-buyer-env python=3.10 -y
conda activate data-buyer-env
```

Inside the project root directory, install with `pip` in editable mode (`-e`) so local changes are immediately reflected without reinstalling:

```bash
pip install --upgrade pip
pip install -e .
```

To use the toolkit in Jupyter, install `notebook` and `ipykernel`, then register the environment as a kernel:

```bash
pip install notebook ipykernel
python -m ipykernel install --user --name=data-buyer-env --display-name "Data Buyer Toolkit"
```
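As a quick sanity check after installation, you can verify that the package is importable (an optional snippet; it only reports whether the module can be found in the current environment):

```python
import importlib.util

# Reports whether data_buyer_toolkit is importable in the current environment.
spec = importlib.util.find_spec("data_buyer_toolkit")
print("data_buyer_toolkit installed:", spec is not None)
```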
## 7. License

This project is licensed under the MIT License.

You are free to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software, subject to the conditions of the license.

For the full license text, see the LICENSE file included in this repository.
## 8. Contributions

Contributions are welcome and encouraged!

If you would like to suggest improvements, add new features, or report bugs, please open an issue or submit a pull request on the project's GitHub repository.
## 9. Usage Examples Notebook

A full Jupyter notebook with hands-on examples is provided to demonstrate the `data_buyer_toolkit` in action.

📂 Access it here: `examples/usage_examples.ipynb`

The notebook walks through computing the `DataBuyerScore` for live postings; you can run `examples/usage_examples.ipynb` after installing the package. It provides a practical guide for integrating the toolkit into custom workflows for real-time scoring, lead generation, and market targeting.