This repository documents code used to gather, QC, standardize, and analyze data uploaded by institutes participating in AACR's Project GENIE (Genomics, Evidence, Neoplasia, Information, Exchange).
For more information about the AACR genie repository, visit the GitHub Pages site.
This package contains R, Python, and CLI tools. You will need the following tools and packages to reproduce these results:
pip install -r requirements.txt
renv::install()
brew install java
brew install wget
One of the features of the aacrgenie package is that it provides a local validation tool that GENIE data contributors can install and use to validate their files locally prior to uploading to Synapse.
pip install aacrgenie
genie -v
This will install all the necessary components for you to run the validator locally on all of your files, including the Synapse client. Please view the help to see how to run the validator.
genie validate -h
genie validate data_clinical_supp_SAGE.txt SAGE
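If you need to validate several files in one go, the command above can be wrapped in a small script. The following is a minimal sketch, assuming `genie` is on your PATH and your Synapse credentials are already configured. The helper names `build_validate_command` and `validate_files` are hypothetical and are not part of the aacrgenie package; the sketch only records each exit code and does not interpret it.

```python
import subprocess


def build_validate_command(filepath, center):
    """Return the argv for `genie validate <filepath> <center>`."""
    if not center:
        raise ValueError("center must be a non-empty string, e.g. 'SAGE'")
    return ["genie", "validate", str(filepath), center]


def validate_files(filepaths, center):
    """Run the validator once per file and collect each exit code.

    Hypothetical helper: requires the `genie` CLI to be installed.
    """
    results = {}
    for filepath in filepaths:
        # check=False so that one failing file does not stop the loop
        completed = subprocess.run(
            build_validate_command(filepath, center), check=False
        )
        results[str(filepath)] = completed.returncode
    return results
```

Calling `validate_files(["data_clinical_supp_SAGE.txt"], "SAGE")` would run the same command shown above and return a dictionary mapping the file name to the exit code of the validator.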
Please view the contributing guide to learn how to contribute to the GENIE package.
These are instructions on how you would develop and test the pipeline locally.
Make sure you have read through the GENIE Onboarding Docs and have access to all of the required repositories, resources, and Synapse projects for Main GENIE.
Be sure you are invited to the Synapse GENIE Admin team.
Make sure you are a Synapse certified user: Certified User - Synapse User Account Types
Clone this repo and install the package locally.
pip install -e .
pip install -r requirements.txt
pip install -r requirements-dev.txt
If you are having trouble with the above, try installing via pipenv.
Specify a python version that is supported by this repo:
pipenv --python <python_version>
Activate your pipenv:
pipenv shell
Configure the Synapse client to authenticate to Synapse.
~/.synapseConfig file:
[authentication]
authtoken = <PAT here>
export SYNAPSE_AUTH_TOKEN=<PAT here>
synapse login
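The `[authentication]` section above can also be written from a script, which avoids hand-editing the file. This is a minimal sketch using only the Python standard library. The function name `write_synapse_config` is hypothetical, and the default path assumes the config lives at `~/.synapseConfig` as shown above. Existing sections in the file are preserved, although comments in it are not.

```python
import configparser
import os
from pathlib import Path


def write_synapse_config(token, path=None):
    """Write `authtoken` into the [authentication] section of the config file."""
    path = Path(path) if path is not None else Path.home() / ".synapseConfig"
    # interpolation=None so a '%' inside the token is stored literally
    config = configparser.ConfigParser(interpolation=None)
    if path.exists():
        config.read(path)
    if not config.has_section("authentication"):
        config.add_section("authentication")
    config.set("authentication", "authtoken", token)
    with open(path, "w") as handle:
        config.write(handle)
    # The file holds a secret, so restrict it to the current user
    os.chmod(path, 0o600)
    return path
```

After writing the file, `synapse login` is still the way to confirm that the token works.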
Run the different pipelines on the test project. The --project_id syn7208886 parameter points to the test project.
Validate all the files excluding vcf files:
python bin/input_to_database.py main --project_id syn7208886 --onlyValidate
Validate all the files:
python bin/input_to_database.py mutation --project_id syn7208886 --onlyValidate --genie_annotation_pkg ../annotation-tools
Process all the files aside from the mutation (maf, vcf) files. The mutation processing was split out because it takes at least 2 days to process all the production mutation data. Ideally, there would be a parameter to exclude or include file types to process/validate, but that is not implemented.
python bin/input_to_database.py main --project_id syn7208886 --deleteOld
Process the mutation data. Be sure to clone this repo: https://github.com/Sage-Bionetworks/annotation-tools and git checkout the version of the repo pinned to the Dockerfile. This repo houses the code that re-annotates the mutation data with Genome Nexus. The --createNewMafDatabase flag will create new mutation tables in the test project. This flag is necessary for production data for two main reasons:
Tables in the test Synapse project and try again.
python bin/input_to_database.py mutation --project_id syn7208886 --deleteOld --genie_annotation_pkg ../annotation-tools --createNewMafDatabase
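The four `input_to_database.py` invocations above differ only in the subcommand and a handful of flags, so it can help to generate them from one place. The following is a sketch, not part of the repository: the function name `build_input_to_database_command` and its keyword arguments are hypothetical, and only the flags that appear in the commands above are used.

```python
import shlex


def build_input_to_database_command(
    process,
    project_id,
    only_validate=False,
    delete_old=False,
    annotation_pkg=None,
    create_new_maf_database=False,
):
    """Return the argv for bin/input_to_database.py with the flags shown above."""
    if process not in ("main", "mutation"):
        raise ValueError("process must be 'main' or 'mutation'")
    cmd = ["python", "bin/input_to_database.py", process, "--project_id", project_id]
    if only_validate:
        cmd.append("--onlyValidate")
    if delete_old:
        cmd.append("--deleteOld")
    if annotation_pkg is not None:
        # Path to the local clone of the annotation-tools repo
        cmd += ["--genie_annotation_pkg", str(annotation_pkg)]
    if create_new_maf_database:
        cmd.append("--createNewMafDatabase")
    return cmd


# Print the mutation processing command for the test project
print(
    shlex.join(
        build_input_to_database_command(
            "mutation",
            "syn7208886",
            delete_old=True,
            annotation_pkg="../annotation-tools",
            create_new_maf_database=True,
        )
    )
)
```

The resulting list can be passed to `subprocess.run` from the repository root. Keeping the test project ID `syn7208886` as an explicit argument makes it harder to point a local run at another project by accident.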
Create a consortium release. Be sure to add the --test parameter. Be sure to clone the cbioportal repo: https://github.com/cBioPortal/cbioportal and git checkout the version of the repo pinned to the Dockerfile.
python bin/database_to_staging.py Jan-2017 ../cbioportal TEST --test
Create a public release. Be sure to add the --test parameter. Be sure to clone the cbioportal repo: https://github.com/cBioPortal/cbioportal and git checkout the version of the repo pinned to the Dockerfile.
python bin/consortium_to_public.py Jan-2017 ../cbioportal TEST --test
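Because both release steps depend on remembering the --test parameter, a small wrapper that adds it by default can guard against mistakes during local development. This is a sketch under stated assumptions: the function `build_release_command` and its argument names are hypothetical, and the three positional values are simply passed through in the order used in the commands above.

```python
RELEASE_SCRIPTS = {
    "consortium": "bin/database_to_staging.py",
    "public": "bin/consortium_to_public.py",
}


def build_release_command(kind, release_date, cbioportal_path, release_name, test=True):
    """Return the argv for a release script, adding --test unless disabled."""
    if kind not in RELEASE_SCRIPTS:
        raise ValueError("kind must be 'consortium' or 'public'")
    cmd = [
        "python",
        RELEASE_SCRIPTS[kind],
        release_date,          # e.g. "Jan-2017", as in the commands above
        str(cbioportal_path),  # path to the local cbioportal clone
        release_name,          # e.g. "TEST"
    ]
    if test:
        cmd.append("--test")
    return cmd
```

With the defaults, `build_release_command("consortium", "Jan-2017", "../cbioportal", "TEST")` reproduces the consortium command shown above, and passing `"public"` reproduces the public release command.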
The production pipeline is run on Nextflow Tower, and the Nextflow workflow is captured in nf-genie. It is wise to create an EC2 instance via the Sage Bionetworks service catalog to work with the production data, because there is limited PHI in GENIE.