uniprot-id-mapper


A Python wrapper for the UniProt Mapping RESTful API.

Version 1.1.4 (PyPI) · Maintainers: 1

License: MIT · Ruff · Code style: black · Imports: isort · GitHub Actions · Downloads: PyPI

UniProtMapper

Easily retrieve UniProt data and map protein identifiers using this Python package for UniProt's Retrieve & ID Mapping RESTful APIs. Read the full documentation.


⛏️ Features

UniProtMapper is a tool for bioinformatics and proteomics research that supports:

  • Mapping any UniProt cross-referenced IDs to other identifiers and vice versa;
  • Programmatically retrieving any of the supported return and cross-reference fields from both UniProtKB/Swiss-Prot (reviewed) and UniProtKB/TrEMBL (unreviewed). For a full table of all supported resources, refer to the supported fields in the docs;
  • Querying UniProtKB entries using complex field-based queries with boolean operators ~ (NOT), | (OR), & (AND).

For the first two functionalities, see the Mapping IDs and Retrieving Information examples below; for the third, see Field-based Querying.

The ID mapping API can also be accessed from the command line; see Command Line Interface (CLI) below.

📦 Installation

From PyPI:

python -m pip install uniprot-id-mapper

Directly from GitHub:

python -m pip install git+https://github.com/David-Araripe/UniProtMapper.git

From source:

git clone https://github.com/David-Araripe/UniProtMapper
cd UniProtMapper
python -m pip install .
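To confirm which release is installed, a minimal standard-library check is sketched below. It assumes only that the distribution name is uniprot-id-mapper (as in the pip commands above), which differs from the import name UniProtMapper; the helper installed_version is a hypothetical name used for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "uniprot-id-mapper") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Note: this looks up the *distribution* name used by pip
    (uniprot-id-mapper), not the import name (UniProtMapper).
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "uniprot-id-mapper is not installed")
```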

🛠️ Usage

Mapping IDs

Use UniProtMapper to easily map between different protein identifiers:

from UniProtMapper import ProtMapper

mapper = ProtMapper()

result, failed = mapper.get(
    ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="Ensembl"
)

The result is a pandas DataFrame containing the mapped IDs (see below), while failed is a list of identifiers that couldn't be mapped.

   UniProtKB_AC-ID  Ensembl
0  P30542           ENSG00000163485.17
1  Q16678           ENSG00000138061.12
2  Q02880           ENSG00000077097.17
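Before mapping, it can help to catch malformed accessions locally, and after mapping you may want Ensembl IDs without their version suffix (the .17 in ENSG00000163485.17). A minimal standard-library sketch follows, assuming UniProt's published accession format; the helpers split_valid_accessions and strip_version are hypothetical and not part of UniProtMapper. The pattern checks format only, so a well-formed accession can still end up in failed, and other identifier types (for example entry names such as ALBU_HUMAN) would be rejected by it.

```python
import re
from typing import Iterable, List, Tuple

# UniProtKB accession format (6- or 10-character accessions) as published by UniProt.
ACCESSION_RE = re.compile(
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)


def split_valid_accessions(ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate well-formed UniProtKB accessions from malformed strings (hypothetical helper)."""
    valid: List[str] = []
    invalid: List[str] = []
    for raw in ids:
        acc = raw.strip()
        if ACCESSION_RE.fullmatch(acc):
            valid.append(acc)
        else:
            invalid.append(acc)
    return valid, invalid


def strip_version(stable_id: str) -> str:
    """Drop a trailing '.<digits>' version, e.g. 'ENSG00000163485.17' -> 'ENSG00000163485'."""
    return re.sub(r"\.\d+$", "", stable_id)


if __name__ == "__main__":
    ok, bad = split_valid_accessions(["P30542", "Q16678", "Q02880", "not-an-id"])
    print(ok)   # ['P30542', 'Q16678', 'Q02880']
    print(bad)  # ['not-an-id']
    print(strip_version("ENSG00000163485.17"))  # ENSG00000163485
```

With the DataFrame returned in the example, pandas can apply the same suffix removal column-wise, e.g. result["Ensembl"].str.replace(r"\.\d+$", "", regex=True).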

Retrieving Information

A DataFrame with the supported return fields is accessible through the attribute ProtMapper.fields_table:

from UniProtMapper import ProtMapper

mapper = ProtMapper()
df = mapper.fields_table
df.head()

   label                 returned_field  field_type        has_full_version  type
0  Entry                 accession       Names & Taxonomy  -                 uniprot_field
1  Entry Name            id              Names & Taxonomy  -                 uniprot_field
2  Gene Names            gene_names      Names & Taxonomy  -                 uniprot_field
3  Gene Names (primary)  gene_primary    Names & Taxonomy  -                 uniprot_field
4  Gene Names (synonym)  gene_synonym    Names & Taxonomy  -                 uniprot_field

Any value in the returned_field column can be passed to mapper.get to retrieve that field programmatically:

# Retrieve the default fields:
result, failed = mapper.get(["Q02880"])
# Output: Fetched: 1 / 1

# Retrieve custom fields:
fields = ["accession", "organism_name", "structure_3d"]
result, failed = mapper.get(["Q02880"], fields=fields)
# Output: Fetched: 1 / 1
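The names in fields are UniProt's own return-field names (the CLI help below points to https://www.uniprot.org/help/return_fields), so the same names also work in a raw UniProt REST request. As a sketch only, assuming the public https://rest.uniprot.org/uniprotkb/search endpoint, the URL below asks for the same three fields; it does not claim to show how ProtMapper builds its requests, and build_search_url is a hypothetical helper.

```python
from typing import Sequence
from urllib.parse import urlencode

# Public UniProtKB search endpoint of the UniProt REST API (assumed here).
UNIPROTKB_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(accession: str, fields: Sequence[str], fmt: str = "tsv") -> str:
    """Build a UniProtKB search URL requesting specific return fields (hypothetical helper)."""
    params = {
        "query": f"accession:{accession}",
        "fields": ",".join(fields),  # the REST API takes a comma-separated field list
        "format": fmt,
    }
    # urlencode percent-escapes ':' and ',' so the URL is safe to send as-is.
    return f"{UNIPROTKB_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("Q02880", ["accession", "organism_name", "structure_3d"]))
```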

For cross-reference fields whose has_full_version value is yes, you can also request a more detailed version of the same field by passing <field_name>_full, such as xref_pdb_full.
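The has_full_version column makes it straightforward to list every field that has a _full variant. The minimal sketch below works over plain dictionaries standing in for rows of ProtMapper.fields_table; the sample rows and the helper full_version_fields are hypothetical. With the real DataFrame, df.loc[df["has_full_version"] == "yes", "returned_field"] + "_full" is a pandas equivalent.

```python
from typing import Dict, List

# Hypothetical rows mimicking two columns of ProtMapper.fields_table.
SAMPLE_ROWS: List[Dict[str, str]] = [
    {"returned_field": "accession", "has_full_version": "-"},
    {"returned_field": "xref_pdb", "has_full_version": "yes"},
    {"returned_field": "xref_smart", "has_full_version": "yes"},
]


def full_version_fields(rows: List[Dict[str, str]]) -> List[str]:
    """Return '<field_name>_full' for every row whose has_full_version is 'yes'."""
    return [
        f"{row['returned_field']}_full"
        for row in rows
        if row.get("has_full_version") == "yes"
    ]


if __name__ == "__main__":
    print(full_version_fields(SAMPLE_ROWS))  # ['xref_pdb_full', 'xref_smart_full']
```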

All available return fields are also accessible through the attribute ProtMapper.supported_return_fields:

from UniProtMapper import ProtMapper
mapper = ProtMapper()
print(mapper.supported_return_fields)

['accession',
 'id',
 'gene_names',
 ...
 'xref_smart_full',
 'xref_supfam_full']
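Because field names are plain strings, a typo only surfaces once the request is made. A small sketch that checks requested names against a list of supported ones beforehand; in real use you would pass mapper.supported_return_fields as supported. The helper unknown_fields and the shortened list in the example are hypothetical.

```python
from typing import Iterable, List


def unknown_fields(requested: Iterable[str], supported: Iterable[str]) -> List[str]:
    """Return the requested field names missing from `supported`, preserving order."""
    supported_set = set(supported)  # set lookup keeps this linear in the input size
    return [name for name in requested if name not in supported_set]


if __name__ == "__main__":
    # A hypothetical subset of ProtMapper.supported_return_fields.
    supported = ["accession", "id", "gene_names", "organism_name", "xref_pdb_full"]
    print(unknown_fields(["accession", "organism_nme"], supported))  # ['organism_nme']
```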

Field-based Querying

UniProtMapper supports complex field-based protein queries through the uniprotkb_fields module, combining fields with the boolean operators & (AND), | (OR), and ~ (NOT). This lets you express several search criteria in a single query. For example:

from UniProtMapper import ProtKB
from UniProtMapper.uniprotkb_fields import (
    organism_name,
    length,
    reviewed,
    date_modified,
)

# Find reviewed human proteins with a length between 100 and 200 amino acids
# that were modified on or after January 1st, 2024
query = (
    organism_name("human")
    & reviewed(True)
    & length(100, 200)
    & date_modified("2024-01-01", "*")
)

protkb = ProtKB()
result = protkb.get(query)

For a list of all fields and their descriptions, see the API reference for the uniprotkb_fields module.
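The &, |, and ~ operators in the example above work through Python operator overloading (__and__, __or__, __invert__). The self-contained sketch below shows the idea with a hypothetical Query class and helpers (field, field_range) that render UniProt-style query text; it is not UniProtMapper's implementation, and the exact strings the library sends may differ. One consequence holds for any design built on these operators: Python evaluates ~ before & and & before |, so mixing | with & needs explicit parentheses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical query node that renders to UniProt-style query text."""

    text: str

    def __and__(self, other: Query) -> Query:
        # Parenthesize every binary node so precedence never depends on the server.
        return Query(f"({self.text} AND {other.text})")

    def __or__(self, other: Query) -> Query:
        return Query(f"({self.text} OR {other.text})")

    def __invert__(self) -> Query:
        return Query(f"NOT {self.text}")

    def __str__(self) -> str:
        return self.text


def field(name: str, value: str) -> Query:
    """A single 'name:value' term (hypothetical helper)."""
    return Query(f"{name}:{value}")


def field_range(name: str, low: str, high: str) -> Query:
    """An inclusive range term such as 'length:[100 TO 200]' (hypothetical helper)."""
    return Query(f"{name}:[{low} TO {high}]")


if __name__ == "__main__":
    # The parentheses around the OR are required: & binds tighter than |.
    q = (
        (field("organism_name", "human") | field("organism_name", "mouse"))
        & field("reviewed", "true")
        & ~field_range("length", "1", "99")
    )
    print(q)
    # (((organism_name:human OR organism_name:mouse) AND reviewed:true) AND NOT length:[1 TO 99])
```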

📖 Documentation

💻 Command Line Interface (CLI)

UniProtMapper provides a CLI, protmap, for the ID mapping class ProtMapper, giving quick access to lookups and data retrieval from the terminal. The available arguments, as shown by protmap -h, are:

usage: UniProtMapper [-h] -i [IDS ...] [-r [RETURN_FIELDS ...]] [--default-fields] [-o OUTPUT]
                     [-from FROM_DB] [-to TO_DB] [-over] [-pf]

Retrieve data from UniProt using UniProt's RESTful API. For a list of all available fields, see: https://www.uniprot.org/help/return_fields 

Alternatively, use the --print-fields argument to print the available fields and exit the program.

optional arguments:
  -h, --help            show this help message and exit
  -i [IDS ...], --ids [IDS ...]
                        List of UniProt IDs to retrieve information from. Values must be
                        separated by spaces.
  -r [RETURN_FIELDS ...], --return-fields [RETURN_FIELDS ...]
                        If not defined, will pass `None`, returning all available fields.
                        Else, values should be fields to be returned separated by spaces. See
                        --print-fields for available options.
  --default-fields, -def
                        This option will override the --return-fields option. Returns only the
                        default fields stored in: <pkg_path>/resources/cli_return_fields.txt
  -o OUTPUT, --output OUTPUT
                        Path to the output file to write the returned fields. If not provided,
                        will write to stdout.
  -from FROM_DB, --from-db FROM_DB
                        The database from which the IDs are. For the available cross
                        references, see: <pkg_path>/resources/uniprot_mapping_dbs.json
  -to TO_DB, --to-db TO_DB
                        The database to which the IDs will be mapped. For the available cross
                        references, see: <pkg_path>/resources/uniprot_mapping_dbs.json
  -over, --overwrite    If desired to overwrite an existing file when using -o/--output
  -pf, --print-fields   Prints the available return fields and exits the program.

Usage example, retrieving default fields from <pkg_path>/resources/cli_return_fields.txt:

[Image: output of UniProtMapper's CLI, protmap]
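The usage example retrieves the default fields via --default-fields. As a sketch for driving protmap from a Python script, the hypothetical helper protmap_args below assembles a command line from the flags listed in protmap -h; it assumes the protmap entry point is on PATH when you run the result, and results.txt is a placeholder path.

```python
import shlex
from typing import List, Optional, Sequence


def protmap_args(
    ids: Sequence[str],
    output: Optional[str] = None,
    default_fields: bool = True,
    overwrite: bool = False,
) -> List[str]:
    """Assemble a protmap command line from flags documented in `protmap -h`."""
    args = ["protmap", "-i", *ids]
    if default_fields:
        args.append("--default-fields")
    if output is not None:
        args += ["-o", output]
        # --overwrite only matters when writing to a file with -o/--output.
        if overwrite:
            args.append("--overwrite")
    return args


if __name__ == "__main__":
    cmd = protmap_args(["P30542", "Q16678"], output="results.txt", overwrite=True)
    print(shlex.join(cmd))
    # protmap -i P30542 Q16678 --default-fields -o results.txt --overwrite
```

To execute it, pass the list to subprocess.run(cmd, check=True); building an argument list rather than a shell string avoids quoting problems with unusual IDs or paths.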

👏🏼 Credits

For issues, feature requests, or questions, please open an issue on the GitHub repository.

Keywords: uniprot
