A tool to assist in the automatic conversion of an HCA metadata spreadsheet to SCEA metadata MAGE-TAB files.
pip install hca2scea
The tool takes as input an HCA metadata spreadsheet and converts the metadata to SCEA MAGE-TAB files which are then saved to an output directory.
To run it as a package, after installing it via pip:
```
$ hca2scea -h
usage: hca2scea [-h] -s SPREADSHEET -id PROJECT_UUID -study STUDY [-name {cs_name,cs_id,sp_name,sp_id,other}] -ac
                ACCESSION_NUMBER -c CURATORS [CURATORS ...] -et {baseline,differential} [-facs] -f EXPERIMENTAL_FACTORS
                [EXPERIMENTAL_FACTORS ...] -pd PUBLIC_RELEASE_DATE -hd HCA_UPDATE_DATE [-o OUTPUT_DIR] [-zip]

run hca -> scea tool

optional arguments:
  -h, --help            show this help message and exit
  -s SPREADSHEET, --spreadsheet SPREADSHEET
                        Please provide a path to the HCA project spreadsheet.
  -id PROJECT_UUID, --project_uuid PROJECT_UUID
                        Please provide an HCA ingest project submission id.
  -study STUDY          Please provide the SRA or ENA study accession.
  -name {cs_name,cs_id,sp_name,sp_id,other}
                        Please indicate which field to use as the sample name. cs=cell suspension, sp = specimen.
  -ac ACCESSION_NUMBER, --accession_number ACCESSION_NUMBER
                        Provide an E-HCAD accession number. Please find the next suitable accession number by checking
                        the google tracker sheet.
  -c CURATORS [CURATORS ...], --curators CURATORS [CURATORS ...]
                        space separated names of curators
  -et {baseline,differential}, --experiment_type {baseline,differential}
                        Please indicate whether this is a baseline or differential experimental design
  -facs                 Please specify if FACS was used to isolate single cells.
  -f EXPERIMENTAL_FACTORS [EXPERIMENTAL_FACTORS ...], --experimental_factors EXPERIMENTAL_FACTORS [EXPERIMENTAL_FACTORS ...]
                        space separated list of experimental factors
  -pd PUBLIC_RELEASE_DATE, --public_release_date PUBLIC_RELEASE_DATE
                        Please enter the public release date in this format: YYYY-MM-DD
  -hd HCA_UPDATE_DATE, --hca_update_date HCA_UPDATE_DATE
                        Please enter the last time the HCA prohect submission was updated in this format: YYYY-MM-DD
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Provide full path to preferred output dir
  -zip, --zip_format    Please indicate whether you would like the script to output alltxt files separately or together
                        in 1 zip file.
```
To run it as a Python module:
```shell script
cd /path-to/hca2scea
python -m hca-to-scea-tools.hca_to_scea.hca2scea -h
```
Argument | Argument name | Description | Required? |
---|---|---|---|
-s | HCA spreadsheet | Path to the HCA spreadsheet (.xlsx). | yes |
-id | HCA project uuid | Added to the 'secondary accessions' field in the idf file. | yes |
-c | Curator initials | HCA curator initials, as a space-separated list. | yes |
-ac | Accession number | An SCEA accession number (integer). | yes |
-et | Experiment type | Must be one of [differential, baseline]. | yes |
-f | Factor value | A space-separated list of user-defined factor values, e.g. age disease. | yes |
-pd | Dataset publication date | Provide in YYYY-MM-DD format, e.g. from GEO. | yes |
-hd | HCA last update date | Provide in YYYY-MM-DD format. The last time the HCA project was updated in the ingest UI (production). | yes |
-study | Study accession (SRPxxx) | The study accession is used to find the paths to the fastq files for the given runs. | yes |
-name | HCA name field | Which HCA field to use for the biomaterial names columns. Must be one of [cs_name, cs_id, sp_name, sp_id, other], where cs indicates cell suspension and sp indicates specimen from organism. Default is cs_name. | no |
-facs | optional argument | If FACS was used as a single cell isolation method, indicate this by adding the -facs argument. | no |
-o | optional argument | An output dir path can optionally be provided. If it does not exist, it will be created. | no |
-zip | optional argument | Indicate if you would like the resulting output files to be output in a single zip file. | no |
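
Both date arguments (-pd and -hd) are expected in YYYY-MM-DD format. As a minimal sketch, a calling script could check a date string before passing it to the tool. The helper name `parse_release_date` is hypothetical and is not part of hca2scea; how the tool itself validates dates is not shown here.

```python
import re
from datetime import date, datetime

# Strict YYYY-MM-DD shape: four digits, two digits, two digits.
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_release_date(value: str) -> date:
    """Hypothetical helper: parse a YYYY-MM-DD string or raise ValueError."""
    if not _DATE_PATTERN.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    # strptime also rejects impossible calendar dates such as 2021-02-30.
    return datetime.strptime(value, "%Y-%m-%d").date()


print(parse_release_date("2021-06-29"))
```

The regular expression is there because `strptime` on its own accepts unpadded values such as `2021-6-29`.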
Factor values
A factor value is a chosen experimental characteristic which can be used to group or differentiate samples. Multiple factor values can be entered and should be chosen from the following list.
There must be at least 1 factor value. If you cannot identify a factor value, i.e. all donors and samples share the same metadata with respect to the above list of factor values, then enter 'Individual'.
Datasets with more than 1 technology type are not eligible for SCEA. Therefore, technology type is not a valid factor value.
Experiment type
An experiment with samples which can be grouped or differentiated by a factor value is classified as 'differential'. The list of possible factor values can be found above.
If 1 or more factor values other than 'Individual' are identified, then the experiment type should be 'differential'. If the only factor value is 'Individual', then the experiment type should be 'baseline'.
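
The rule above can be sketched as a small function. This is an illustration only: `infer_experiment_type` is a hypothetical name, not part of hca2scea, and comparing 'Individual' case-insensitively is an assumption made here.

```python
def infer_experiment_type(factor_values):
    """Hypothetical helper: map a list of factor values to an -et choice."""
    if not factor_values:
        # The README requires at least 1 factor value.
        raise ValueError("at least one factor value is required")
    # Assumption: 'Individual' is matched case-insensitively.
    has_other_factor = any(
        value.strip().lower() != "individual" for value in factor_values
    )
    return "differential" if has_other_factor else "baseline"


print(infer_experiment_type(["Individual"]))
print(infer_experiment_type(["age", "disease"]))
```

The returned strings match the two values accepted by the -et argument.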
Related E-HCAD-ID
If the project has been split into two separate E-HCAD datasets, due to different technologies being used in the same project, or any other reason, then enter the E-HCAD-ID for the other dataset here. Example: E-HCAD-50.
Required arguments only
python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12
Specify optional name argument
python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -name cs_name -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12
Specify that FACS was used
python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12 -facs
Specify optional output dir
python3 hca2scea.py -s /home/aday/GSE111976-endometrium_MC_SCEA.xlsx -id 379ed69e-be05-48bc-af5e-a7fc589709bf -study SRP135922 -ac 50 -c AD -et differential -f menstrual cycle day -pd 2021-06-29 -hd 2021-02-12 -o my_output_dir
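
When the tool is driven from another Python script, the same commands can be assembled as an argument list. The sketch below assumes the `hca2scea` entry point installed by pip and uses only the flags listed in the help output. The function `build_hca2scea_args`, its parameter names, and the `project.xlsx` path are hypothetical.

```python
import shlex


def build_hca2scea_args(
    spreadsheet,
    project_uuid,
    study,
    accession_number,
    curators,
    experiment_type,
    factors,
    public_release_date,
    hca_update_date,
    name=None,
    facs=False,
    output_dir=None,
    zip_output=False,
):
    """Hypothetical helper: return an argv list for the hca2scea CLI."""
    args = ["hca2scea", "-s", spreadsheet, "-id", project_uuid, "-study", study]
    if name is not None:
        args += ["-name", name]
    args += ["-ac", str(accession_number)]
    args += ["-c", *curators]  # space-separated list of curators
    args += ["-et", experiment_type]
    if facs:
        args.append("-facs")
    args += ["-f", *factors]  # space-separated list of factor values
    args += ["-pd", public_release_date, "-hd", hca_update_date]
    if output_dir is not None:
        args += ["-o", output_dir]
    if zip_output:
        args.append("-zip")
    return args


example_args = build_hca2scea_args(
    spreadsheet="project.xlsx",  # placeholder path
    project_uuid="379ed69e-be05-48bc-af5e-a7fc589709bf",
    study="SRP135922",
    accession_number=50,
    curators=["AD"],
    experiment_type="differential",
    factors=["age", "disease"],
    public_release_date="2021-06-29",
    hca_update_date="2021-02-12",
)
print(shlex.join(example_args))
```

Once the package is installed, the list can be passed to `subprocess.run(example_args, check=True)`. Passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.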
Using `pip`'s editable mode, projects using hca-to-scea as a dependency can refer to the latest code in this repository directly without installing it through PyPI. This can be done either by manually cloning the code base:
pip install -e path/to/hca2scea
or by having `pip` do it automatically by providing a reference to this repository:
pip install -e \
git+https://github.com/ebi-ait/hca-to-scea-tools.git\
#egg=hca2scea
Create a PyPI account through the registration page.
Note that PyPI requires email addresses to be verified before publishing.
Package the project for distribution.
python setup.py sdist
Install Twine
pip install twine
Upload the distribution package to PyPI.
twine upload dist/*
Running `python setup.py sdist` will create a package in the `dist` directory of the project base directory.