
Easily retrieve UniProt data and map protein identifiers using this Python package for UniProt's Retrieve & ID Mapping RESTful APIs. Read the full documentation.
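UniProtMapper is a tool for bioinformatics and proteomics research that supports:
1. Mapping protein identifiers between UniProt and its cross-referenced databases;
2. Retrieving UniProt return fields and cross-reference fields programmatically;
3. Querying UniProtKB entries with field-based queries that combine the boolean operators ~ (NOT), | (OR), and & (AND).
For the first two functionalities, see the examples Mapping IDs and Retrieving Information below. For the third, see Field-based Querying.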
The ID mapping API can also be accessed through the CLI. For more information, check CLI.
Install from PyPI:
python -m pip install uniprot-id-mapper
Or install directly from GitHub:
python -m pip install git+https://github.com/David-Araripe/UniProtMapper.git
Or install from a local clone:
git clone https://github.com/David-Araripe/UniProtMapper
cd UniProtMapper
python -m pip install .
Use UniProtMapper to easily map between different protein identifiers:
from UniProtMapper import ProtMapper
mapper = ProtMapper()
result, failed = mapper.get(
    ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="Ensembl"
)
The result is a pandas DataFrame containing the mapped IDs (see below), while failed is a list of identifiers that couldn't be mapped.
| | UniProtKB_AC-ID | Ensembl |
|---|---|---|
| 0 | P30542 | ENSG00000163485.17 |
| 1 | Q16678 | ENSG00000138061.12 |
| 2 | Q02880 | ENSG00000077097.17 |
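The Ensembl identifiers in the table carry a version suffix (the `.17` in `ENSG00000163485.17`). If a downstream tool expects unversioned IDs, the suffix can be dropped after mapping. The following is a minimal sketch in plain Python using values copied from the table above; the helper name `strip_version` is hypothetical and not part of UniProtMapper.

```python
def strip_version(ensembl_id: str) -> str:
    """Return the identifier without a trailing '.<number>' version suffix."""
    base, sep, version = ensembl_id.rpartition(".")
    # Strip only when there is a dot and the part after it is numeric.
    return base if sep and version.isdigit() else ensembl_id


# Values copied from the mapping result shown above.
mapped = {
    "P30542": "ENSG00000163485.17",
    "Q16678": "ENSG00000138061.12",
    "Q02880": "ENSG00000077097.17",
}
unversioned = {acc: strip_version(eid) for acc, eid in mapped.items()}
print(unversioned["P30542"])  # ENSG00000163485
```

On the returned DataFrame, the same helper could be applied with `result["Ensembl"].map(strip_version)`, assuming the column is named `Ensembl` as in the table.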
A DataFrame with the supported return fields is accessible through the attribute ProtMapper.fields_table:
from UniProtMapper import ProtMapper
mapper = ProtMapper()
df = mapper.fields_table
df.head()
| | label | returned_field | field_type | has_full_version | type |
|---|---|---|---|---|---|
| 0 | Entry | accession | Names & Taxonomy | - | uniprot_field |
| 1 | Entry Name | id | Names & Taxonomy | - | uniprot_field |
| 2 | Gene Names | gene_names | Names & Taxonomy | - | uniprot_field |
| 3 | Gene Names (primary) | gene_primary | Names & Taxonomy | - | uniprot_field |
| 4 | Gene Names (synonym) | gene_synonym | Names & Taxonomy | - | uniprot_field |
Any entry in the returned_field column of this DataFrame can be used to access UniProt data programmatically:
# To retrieve the default fields:
result, failed = mapper.get(["Q02880"])
>>> Fetched: 1 / 1
# Retrieve custom fields:
fields = ["accession", "organism_name", "structure_3d"]
result, failed = mapper.get(["Q02880"], fields=fields)
>>> Fetched: 1 / 1
Further, for the cross-referenced fields that have has_full_version set to yes, returning the same field with extra information is supported by passing <field_name>_full, such as xref_pdb_full.
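The `_full` naming rule can be applied mechanically when building a field list. The sketch below is plain Python; `with_full_versions` and the `full_capable` set are hypothetical, and in practice the set of fields whose has_full_version is yes would be read from `fields_table`.

```python
def with_full_versions(fields, full_capable):
    """Append '_full' to each field that offers a full version."""
    return [f"{name}_full" if name in full_capable else name for name in fields]


# Assumption: xref_pdb offers a full version, since the text names xref_pdb_full.
full_capable = {"xref_pdb"}
fields = with_full_versions(["accession", "xref_pdb"], full_capable)
print(fields)  # ['accession', 'xref_pdb_full']
```

The resulting list could then be passed as the `fields` argument of `mapper.get`, as in the custom-fields example above.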
All available return fields are also accessible through the attribute ProtMapper.supported_return_fields:
from UniProtMapper import ProtMapper
mapper = ProtMapper()
print(mapper.supported_return_fields)
>>> ['accession',
>>> 'id',
>>> 'gene_names',
>>> ...
>>> 'xref_smart_full',
>>> 'xref_supfam_full']
UniProtMapper supports complex field-based protein queries using boolean operators (AND, OR, NOT) through the uniprotkb_fields module. This lets you build searches that combine multiple criteria. For example:
from UniProtMapper import ProtKB
from UniProtMapper.uniprotkb_fields import (
    organism_name,
    length,
    reviewed,
    date_modified,
)
# Find reviewed human proteins with length between 100-200 amino acids
# that were modified after January 1st, 2024
query = (
    organism_name("human") &
    reviewed(True) &
    length(100, 200) &
    date_modified("2024-01-01", "*")
)
protkb = ProtKB()
result = protkb.get(query)
For a list of all fields and their descriptions, check the API reference for the uniprotkb_fields module.
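To illustrate how `&`, `|`, and `~` can compose a query, here is a minimal, self-contained sketch of operator overloading that renders an expression in the style of UniProt's query syntax. It is not the implementation used by the uniprotkb_fields module; the `Term` class, the `clause` and `value_range` helpers, and the exact string output are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A query fragment that composes with & (AND), | (OR) and ~ (NOT)."""

    text: str

    def __and__(self, other: "Term") -> "Term":
        return Term(f"({self.text} AND {other.text})")

    def __or__(self, other: "Term") -> "Term":
        return Term(f"({self.text} OR {other.text})")

    def __invert__(self) -> "Term":
        return Term(f"(NOT {self.text})")


def clause(name: str, value: str) -> Term:
    """Build a single 'name:value' clause (hypothetical helper)."""
    return Term(f"{name}:{value}")


def value_range(name: str, low, high) -> Term:
    """Build a 'name:[low TO high]' range clause (hypothetical helper)."""
    return Term(f"{name}:[{low} TO {high}]")


query = (
    clause("organism_name", "human")
    & clause("reviewed", "true")
    & value_range("length", 100, 200)
)
print(query.text)
# ((organism_name:human AND reviewed:true) AND length:[100 TO 200])
```

Because each operator returns a new `Term`, expressions nest naturally, and the parentheses in the rendered string keep the precedence explicit.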
UniProtMapper provides a CLI for the ID Mapping class, ProtMapper, for easy access to lookups and data retrieval. Here is a list of the available arguments, shown by protmap -h:
usage: UniProtMapper [-h] -i [IDS ...] [-r [RETURN_FIELDS ...]] [--default-fields] [-o OUTPUT]
                     [-from FROM_DB] [-to TO_DB] [-over] [-pf]

Retrieve data from UniProt using UniProt's RESTful API. For a list of all available fields, see: https://www.uniprot.org/help/return_fields
Alternatively, use the --print-fields argument to print the available fields and exit the program.

optional arguments:
  -h, --help            show this help message and exit
  -i [IDS ...], --ids [IDS ...]
                        List of UniProt IDs to retrieve information from. Values must be
                        separated by spaces.
  -r [RETURN_FIELDS ...], --return-fields [RETURN_FIELDS ...]
                        If not defined, will pass `None`, returning all available fields.
                        Else, values should be fields to be returned separated by spaces. See
                        --print-fields for available options.
  --default-fields, -def
                        This option will override the --return-fields option. Returns only the
                        default fields stored in: <pkg_path>/resources/cli_return_fields.txt
  -o OUTPUT, --output OUTPUT
                        Path to the output file to write the returned fields. If not provided,
                        will write to stdout.
  -from FROM_DB, --from-db FROM_DB
                        The database from which the IDs are. For the available cross
                        references, see: <pkg_path>/resources/uniprot_mapping_dbs.json
  -to TO_DB, --to-db TO_DB
                        The database to which the IDs will be mapped. For the available cross
                        references, see: <pkg_path>/resources/uniprot_mapping_dbs.json
  -over, --overwrite    If desired to overwrite an existing file when using -o/--output
  -pf, --print-fields   Prints the available return fields and exits the program.
Usage example, retrieving default fields from <pkg_path>/resources/cli_return_fields.txt:
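Going only by the flags listed in the help text, an invocation matching this description would plausibly be `protmap -i Q02880 --default-fields`. The sketch below rebuilds a subset of those options with `argparse` to show how such a command line would be parsed. It is an illustrative reconstruction, not the package's actual parser; `build_parser` is a hypothetical name and the `None` defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild a subset of the options from the help text (sketch only)."""
    parser = argparse.ArgumentParser(prog="UniProtMapper")
    parser.add_argument("-i", "--ids", nargs="*", required=True)
    parser.add_argument("-r", "--return-fields", nargs="*", default=None)
    parser.add_argument("--default-fields", "-def", action="store_true")
    parser.add_argument("-o", "--output", default=None)
    # Defaults for the two database options are assumptions.
    parser.add_argument("-from", "--from-db", default=None)
    parser.add_argument("-to", "--to-db", default=None)
    parser.add_argument("-over", "--overwrite", action="store_true")
    return parser


# Parse the equivalent of: protmap -i Q02880 P30542 --default-fields
args = build_parser().parse_args(["-i", "Q02880", "P30542", "--default-fields"])
print(args.ids, args.default_fields)  # ['Q02880', 'P30542'] True
```

Space-separated values after `-i` are collected into one list, which matches the help text's note that IDs must be separated by spaces.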
For issues, feature requests, or questions, please open an issue on the GitHub repository.