
dictionaryutils
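Python wrapper and metaschema for datadictionary. It can be used to load a dictionary into a Python object, either from a remote JSON URL or from a local schema directory, and to dump the schemas in a directory to a single JSON file.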
Say you have a dictionary you are building locally and you want to see if it will pass the tests.
You can add a simple alias to your .bash_profile to enable a quick test command:
testdict() { docker run --rm -v $(pwd):/dictionary quay.io/cdis/dictionaryutils:master; }
Then from the directory containing the gdcdictionary directory run testdict.
To generate simulated data, you can use dictionaryutils together with the data-simulator. Add a second function to your .bash_profile:
simdata() { docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/bash -c "cd /dictionaryutils; bash dockerrun.bash; cd /dictionary/dictionaryutils; poetry run python bin/simulate_data.py --path /dictionary/simdata $*; export SUCCESS=\$?; cd /dictionary; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit \$SUCCESS"; }
Then, from the directory containing the gdcdictionary directory, run simdata. A folder called simdata is created with the results of the simulator run. Any additional arguments are passed through to the data-simulator script, for example simdata --max_samples 10.
The --max_samples argument sets the default number of instances to simulate for each node. You can override it for individual nodes with the --node_num_instances_file argument. For example, if you create the following instances.json:
{
"case": 100,
"demographic": 100
}
Then run the following:
docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/bash -c "cd /dictionaryutils; bash dockerrun.bash; cd /dictionary/dictionaryutils; poetry run python bin/simulate_data.py --path /simdata/ --program workshop --project project1 --max_samples 10 --node_num_instances_file /dictionary/instances.json; export SUCCESS=\$?; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit \$SUCCESS"
Then you'll get 100 each of case and demographic nodes and 10 each of everything else. Note that the above example also defines program and project names.
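The way the two arguments combine can be pictured with a short sketch. This only illustrates the behaviour described above and is not the data-simulator's actual code: the helper name resolve_instance_counts, its parameters, and the node name "sample" are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def resolve_instance_counts(node_names, max_samples, instances_file=None):
    """Return how many instances to simulate for each node.

    Hypothetical helper: nodes listed in the instances file use that count,
    every other node falls back to max_samples.
    """
    overrides = {}
    if instances_file is not None:
        overrides = json.loads(Path(instances_file).read_text())
    return {name: overrides.get(name, max_samples) for name in node_names}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        instances_path = Path(tmp) / "instances.json"
        instances_path.write_text(json.dumps({"case": 100, "demographic": 100}))
        # "sample" is a made-up node name used only for illustration.
        counts = resolve_instance_counts(
            ["case", "demographic", "sample"], 10, instances_path
        )
        print(counts)
```

With the instances.json shown earlier and --max_samples 10, case and demographic resolve to 100 and any other node resolves to 10.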
You can also run the simulator against an arbitrary JSON URL with the --url parameter. In that case the alias can be simplified to skip setting up the parent directory virtual env (i.e., skip dockerrun.bash):
simdataurl() { docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/bash -c "python /dictionaryutils/bin/simulate_data.py simulate --path /simdata/ $*; chmod -R a+rwX /simdata"; }
Then run simdataurl --url https://datacommons.example.com/schema.json.
It is possible to use a local build of the dictionaryutils Docker image instead of the master branch stored in quay.
From a local copy of the dictionaryutils repo, build and tag a Docker image, for example:
docker build -t dictionaryutils-mytag .
Then use this image in any of the aliases and commands mentioned above by replacing quay.io/cdis/dictionaryutils:master with dictionaryutils-mytag.
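If you would rather drive the container from Python than from a shell alias, the simdataurl command can be assembled as an argument list. The following is a minimal sketch, assuming a POSIX host with Docker installed. The helper build_simdataurl_command and its image parameter are hypothetical and not part of dictionaryutils; the mounts and the in-container command are copied from the simdataurl alias.

```python
import shlex
from pathlib import Path

DEFAULT_IMAGE = "quay.io/cdis/dictionaryutils:master"


def build_simdataurl_command(workdir, simulator_args, image=DEFAULT_IMAGE):
    """Build the argv list equivalent to the simdataurl alias (hypothetical helper)."""
    workdir = Path(workdir)
    # Quote each simulator argument so spaces or shell metacharacters survive
    # the trip through `bash -c` inside the container.
    simulate = shlex.join(
        ["python", "/dictionaryutils/bin/simulate_data.py", "simulate",
         "--path", "/simdata/", *simulator_args]
    )
    inner = f"{simulate}; chmod -R a+rwX /simdata"
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/dictionary",
        "-v", f"{workdir / 'simdata'}:/simdata",
        image,
        "/bin/bash", "-c", inner,
    ]


if __name__ == "__main__":
    cmd = build_simdataurl_command(
        Path.cwd(), ["--url", "https://datacommons.example.com/schema.json"]
    )
    # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Passing image="dictionaryutils-mytag" selects a locally built image in place of the one on quay.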
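To load a dictionary into a Python object, either from a remote JSON URL or from a local schema directory:
from dictionaryutils import DataDictionary
# URL_FOR_THE_JSON and PATH_TO_SCHEMA_DIR are placeholders for your own values.
dict_fetch_from_remote = DataDictionary(url=URL_FOR_THE_JSON)
dict_loaded_locally = DataDictionary(root_dir=PATH_TO_SCHEMA_DIR)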
To dump the schemas in a directory to a single JSON file:
import json
from dictionaryutils import dump_schemas_from_dir
with open('dump.json', 'w') as f:
    json.dump(dump_schemas_from_dir('../datadictionary/gdcdictionary/schemas/'), f)
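After dumping, a quick sanity check is to read the file back and list what it contains. The sketch below uses only the standard library and assumes the dump is a JSON object with one top-level entry per schema. That layout is an assumption and is not confirmed here, and list_dumped_schemas is a hypothetical helper rather than a dictionaryutils function.

```python
import json


def list_dumped_schemas(dump_path):
    """Return the sorted top-level keys of a schema dump file.

    Assumes the dump is a JSON object; raises ValueError otherwise.
    """
    with open(dump_path) as f:
        dump = json.load(f)
    if not isinstance(dump, dict):
        raise ValueError(
            f"expected a JSON object at top level, got {type(dump).__name__}"
        )
    return sorted(dump)
```

Calling list_dumped_schemas('dump.json') after the dump step returns the entry names, which makes an empty or truncated dump easy to spot before it is uploaded anywhere.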