
Research
2025 Report: Destructive Malware in Open Source Packages
Destructive malware is rising across open source registries, using delays and kill switches to wipe code, break builds, and disrupt CI/CD.
The Australian government's data.gov.au website references a growing abundance of public and open data government data resources - more than 3700 datasets at the time of writing. While in some cases, data.gov.au provides an API to access a dataset, it doesn't always. For this reason and others, there are often advantages in downloading the data for local use or to re-package it. The dga-sync utility eases the task of synchronising that data to a local file system.
dga-sync uses the JSON metadata stored on data.gov.au for each dataset to ensure that data files are only downloaded if they are newer than what has previously been downloaded. A local copy of the metadata is also stored.
npm install dga-sync
For each data.gov.au dataset, there is a JSON metadata file (accessed from the JSON button on the web page) that leads to a URL of the following form:
http://data.gov.au/api/3/action/package_show?id=23218e8f-babe-4e37-81d1-5424a4d1c568
Use the id parameter to identify the package to sync:
var sync = require('dga-sync');
sync.syncByPackageId('23218e8f-babe-4e37-81d1-5424a4d1c568');
This is what the console output looks like (actual output is colourised where supported):
fetching metadata for package ID: 23218e8f-babe-4e37-81d1-5424a4d1c568
found: "Public Barbeques"
reply lists 5 resources:
barbeque.kmz "2014 Public Barbeques" @ 2014-09-16T02:05:54.523Z
wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv "Public Barbeques CSV" @ 2014-09-16T02:05:54.523Z
wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json "Public Barbeques GeoJSON" @ 2014-09-16T02:05:54.523Z
wms?request=GetCapabilities "Public Barbeques - Preview this Dataset (WMS)" @ 2014-09-16T02:05:54.523Z
wfs?request=GetCapabilities "Public Barbeques Web Feature Service API Link" @ 2014-09-16T02:05:54.523Z
preparing to download barbeque.kmz
preparing to download wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv
preparing to download wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json
preparing to download wms?request=GetCapabilities
preparing to download wfs?request=GetCapabilities
downloading completed
.. moving data/._DGA_DOWNLOAD_barbeque.kmz to data/barbeque.kmz
.. moving data/._DGA_DOWNLOAD_wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv to data/wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv
.. moving data/._DGA_DOWNLOAD_wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json to data/wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json
.. moving data/._DGA_DOWNLOAD_wms?request=GetCapabilities to data/wms?request=GetCapabilities
.. moving data/._DGA_DOWNLOAD_wfs?request=GetCapabilities to data/wfs?request=GetCapabilities
writing download metadata to: data/._METADATA_.json
At this point, a directory called data under the current working directory
will have been created and will contain the downloaded resources plus a metadata
file created by dga-sync:
$ ls -lhA data
total 744K
-rw-r--r-- 1 sam sam 44K Sep 24 11:14 barbeque.kmz
-rw-r--r-- 1 sam sam 6.0K Sep 24 11:15 ._METADATA_.json
-rw-r--r-- 1 sam sam 72K Sep 24 11:15 wfs?request=GetCapabilities
-rw-r--r-- 1 sam sam 95K Sep 24 11:14 wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv
-rw-r--r-- 1 sam sam 384K Sep 24 11:14 wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json
-rw-r--r-- 1 sam sam 139K Sep 24 11:14 wms?request=GetCapabilities
The metadata file will ensure that next time we check, only newer resources than we already have will be downloaded, saving on bandwidth.
As you can see from above, all resources are downloaded by default. This can
be changed by adding an idFilter regex option. So if we only want the KMZ
files in our example:
sync.syncByPackageId(
'23218e8f-babe-4e37-81d1-5424a4d1c568',
{
idFilter: /.*\.kmz$/,
deleteUnlisted: true
}
);
The use of deleteUnlisted is optional - it tells dga-sync to delete
previously downloaded files now excluded by the filter. The contents of
data is now:
$ ls -lhA data
total 48K
-rw-r--r-- 1 sam sam 44K Sep 24 11:14 barbeque.kmz
-rw-r--r-- 1 sam sam 1.4K Sep 24 11:26 ._METADATA_.json
There is currently only one method:
syncByPackageId(packageId, options, andThen)
packageId - the ID of the package/dataset
options - an object with the following options:
idFieldName - specifies the field in a resource to use as the resource
ID [default: 'url']
idCanonicaliser - a function that takes the resource ID (according to the
idFieldName option) and creates a canonical ID for future comparison
in later sync operations [default: split the ID at '/'s and use use the last part:
this assumes that idFieldName is the default value of 'url']
idFilter - applied to the (canonicalised) resource ID to choose which
resources will be synced [default: undefined - that is, accept all IDs]
dataDestination - the directory to store the downloaded resources in
deleteUnlisted - boolean: true means delete extraneous files in the
destination directory that don't correspond to a resource IDs in the
filtered list [default: false]
andThen(err) - optional callback, where err is any error encountered that
prevented successful completion
FAQs
Sync datasets from data.gov.au, the Australian government's open data website.
We found that dga-sync demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.
Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Research
Destructive malware is rising across open source registries, using delays and kill switches to wipe code, break builds, and disrupt CI/CD.

Security News
Socket CTO Ahmad Nassri shares practical AI coding techniques, tools, and team workflows, plus what still feels noisy and why shipping remains human-led.

Research
/Security News
A five-month operation turned 27 npm packages into durable hosting for browser-run lures that mimic document-sharing portals and Microsoft sign-in, targeting 25 organizations across manufacturing, industrial automation, plastics, and healthcare for credential theft.