jobsdb-scraper
A customizable CLI job extraction tool for hk.jobsdb.com and th.jobsdb.com.
Due to critical limitations and bugs in the underlying web scraping framework (Ulixee), this scraper is no longer maintained.

About this scraper:
Requirements:
Node.js version 20 or 22 and npm >= 8.0.0 are required. If Node.js is not installed, download it from the official Node.js site (npm comes bundled with it). You can check your installed versions with node -v and npm -v. To switch versions, use nvm use <node_version>, or nvm alias default <node_version> to set a default Node.js version. Warning: if you use the wrong Node.js version, installation may fail with an error.
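If you are wrapping the scraper in your own tooling, the version requirement above can be checked at runtime before kicking off a long scrape. This is an illustrative sketch, not part of the package:

```typescript
// Sketch (not part of jobsdb-scraper): fail fast when an unsupported
// Node.js version is active.
const supportedMajors = [20, 22];

// process.version looks like "v20.11.1"; take the major component.
const major = Number(process.version.slice(1).split('.')[0]);

if (!supportedMajors.includes(major)) {
  console.error(
    `Node.js ${process.version} detected; this scraper expects Node 20 or 22. ` +
      `Try: nvm use 20`
  );
}
```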
While not strictly required, a residential IP address is highly recommended: running the scraper from a home connection is the safest way to avoid bot detection. If you must run it from outside your home or workplace, use a residential IP proxy.
# 1. Install the package globally (this may take a few minutes):
npm install -g jobsdb-scraper
# To find the max available pages to scrape for a given JobsDB search results url:
jobsdb-scraper maxPages <searchResultsUrl>
# For instructions on how to run the scraper (a scrape can take up to ~10 minutes):
jobsdb-scraper scrape -h
# Warning. These operations are **not** thread-safe.
# Scrape 50 pages of jobs in Hong Kong and return results in ndjson and csv format
jobsdb-scraper hk.jobsdb.com/jobs -n 50 -f ndjson csv
# Scrape all Software Engineering jobs, return results in csv format, and save them to a folder called results under the current working directory
jobsdb-scraper hk.jobsdb.com/Software-Engineer-jobs -f csv -n 'all' -s './results'
# Scrape all accounting jobs in Thailand and return results in ndjson format, set the output file name to "accounting_jobs"
jobsdb-scraper th.jobsdb.com/jobs-in-accounting -f ndjson -n 'all' --fn accounting_jobs
If you are spawning the scraper process from within a script (e.g. via a system call) and want to stop it, send the process a 'SIGINT' and wait for the 'exit' event so it can shut down gracefully.
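The shutdown handshake above can be sketched in TypeScript. A long-lived stand-in child process is used here so the snippet runs anywhere; in real use you would spawn the jobsdb-scraper command instead:

```typescript
import { spawn } from 'node:child_process';

// Sketch: stopping a spawned scraper gracefully with SIGINT.
// In real use, spawn the CLI instead of the stand-in, e.g.
//   spawn('jobsdb-scraper', ['hk.jobsdb.com/jobs', '-n', '50', '-f', 'csv'])
const child = spawn(process.execPath, ['-e', 'setInterval(() => {}, 1000)']);

child.on('exit', (code, signal) => {
  // By the time 'exit' fires, the child has had its chance to clean up.
  console.log(`child exited: code=${code}, signal=${signal}`);
});

// Ask the process to shut down, then wait for the 'exit' event above.
child.kill('SIGINT');
```

Waiting for 'exit' (rather than just calling kill and moving on) matters because the scraper may still be flushing partial results or closing its browser session.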
# This may take a few minutes.
npm install --save jobsdb-scraper
// Warning: These operations are **NOT** thread-safe.
import {scrapeJobsdb, findMaxPages} from 'jobsdb-scraper/dist/scrape_jobsdb.js';
import { ScrapeOptions } from 'jobsdb-scraper/dist/types.js';
import type { ScrapeStats } from 'jobsdb-scraper/dist/types.js';
(async () => {
  const scrapeops = new ScrapeOptions(
    // searchResultUrlString (required): the URL of the first page of search results to start scraping from.
    'hk.jobsdb.com/jobs',
    // numPages (optional): the number of pages to scrape, 'all' by default.
    1,
    // saveDir (optional): the directory, relative to the current working directory, where results are saved.
    './jobsdb-scrape-results',
    // Export formats: the format(s) in which to save results, ndjson or csv or both, e.g. ['ndjson', 'csv'].
    'ndjson',
    // The name of the result file (optional; jobsdb-<region>-<num_pages>-<yyyy-MM-dd_HH-mm-ss>.<format> by default).
    'my_scrape_results',
  )
  try {
    // The promise will reject if an invalid search results URL is provided.
    const { maxPagesPromise, abortController: maxPagesAbort } = findMaxPages('hk.jobsdb.com/jobs')
    console.log(`Max Pages in HK JobsDB: ${await maxPagesPromise}`)
    /* To abort instead:
    maxPagesAbort.abort()
    const pages = await maxPagesPromise // will resolve to -1
    */
    // The promise will reject with a message if any scrape option is invalid.
    const { scrapeResultPromise, abortController: scrapeAbort } = scrapeJobsdb(scrapeops)
    const scrape_result = await scrapeResultPromise
    /* To abort instead of awaiting the line above:
    scrapeAbort.abort()
    const scrape_result = await scrapeResultPromise // may be undefined or contain results, depending on when you abort
    */
    if (scrape_result) {
      // There may be more than one result path if more than one export format is specified.
      const { resultPaths, scrape_stats } = scrape_result
      const { totalJobsScraped, totalPagesScraped }: ScrapeStats = scrape_stats
      console.log(`Total Jobs Scraped: ${totalJobsScraped}`)
      console.log(`Total Pages Scraped: ${totalPagesScraped}`)
      console.log(`Results saved to: ${resultPaths}`)
    }
  } catch (error: any) {
    // Handle any scraping error here.
    console.error(error)
  }
})();
By default, the result file is named jobsdb-<region>-<num_pages>-<yyyy-MM-dd_HH-mm-ss>.<format> and saved to <current_working_directory>/jobsdb_scrape_results. The results folder is created if it does not exist. The timestamp uses UTC. Jobs are not returned in any particular order.
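The default naming pattern can be illustrated with a small helper. This is illustrative only; defaultResultName is not part of the package's API:

```typescript
// Sketch: build a file name following the documented default pattern
// jobsdb-<region>-<num_pages>-<yyyy-MM-dd_HH-mm-ss>.<format>, using UTC.
function defaultResultName(
  region: string,
  numPages: number | 'all',
  format: string,
  now: Date = new Date()
): string {
  const pad = (n: number): string => String(n).padStart(2, '0');
  const stamp =
    `${now.getUTCFullYear()}-${pad(now.getUTCMonth() + 1)}-${pad(now.getUTCDate())}` +
    `_${pad(now.getUTCHours())}-${pad(now.getUTCMinutes())}-${pad(now.getUTCSeconds())}`;
  return `jobsdb-${region}-${numPages}-${stamp}.${format}`;
}

console.log(defaultResultName('hk', 50, 'csv'));
```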