UniProt Database Web Parser Project
TLDR: This parser can be used to parse UniProt accession IDs and retrieve related data from the UniProt web database.
To install:
python -m pip install uniprotparser
or
python3 -m pip install uniprotparser
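Several features described below depend on the installed version, so a quick version check can tell you whether they are available. This is a minimal sketch that uses only the standard library's importlib.metadata. The helper names installed_version, as_tuple and version_at_least are hypothetical, and the simple numeric parsing treats pre-release tags such as "rc1" as equal to the final release.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str = "uniprotparser") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def as_tuple(ver: str) -> Tuple[int, ...]:
    # Keep only the leading numeric release segment, e.g. "1.2.0rc1" -> (1, 2, 0).
    match = re.match(r"\d+(?:\.\d+)*", ver.strip())
    return tuple(int(n) for n in match.group(0).split(".")) if match else ()


def version_at_least(current: str, minimum: str) -> bool:
    """Compare release segments, padding with zeros so "1.2" equals "1.2.0"."""
    a, b = as_tuple(current), as_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, version_at_least(installed_version() or "0", "1.2.0") indicates whether the 1.2.0 mapping helpers should be present.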
With version 1.2.0, we have exposed the "to" and "from" mapping parameters of the UniProt API, which let you specify the databases you want to map from and to.
from uniprotparser import get_from_fields, get_to_fields
from_fields = get_from_fields()
print(from_fields)
to_fields = get_to_fields()
print(to_fields)
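Catching a misspelled mapping key before calling parse can save a failed request. This is a minimal sketch, assuming get_from_fields() and get_to_fields() return iterables of key strings (their exact return type is not stated here). check_mapping_key is a hypothetical helper that uses difflib from the standard library to suggest close matches.

```python
import difflib
from typing import Iterable


def check_mapping_key(key: str, allowed: Iterable[str], label: str) -> str:
    """Return key if it is allowed, otherwise raise ValueError with suggestions."""
    options = list(allowed)
    if key in options:
        return key
    # Suggest up to three similar keys (difflib's default similarity cutoff is 0.6)
    suggestions = difflib.get_close_matches(key, options, n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown {label} key {key!r}.{hint}")
```

Usage, under the same assumption: check_mapping_key("UniProtKB", get_to_fields(), "to").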
These parameters can be passed to the parse method of the UniprotParser class as follows:
from uniprotparser.betaparser import UniprotParser

parser = UniprotParser()
for p in parser.parse(ids=["P06493"], to_key="UniProtKB", from_key="UniProtKB_AC-ID"):
    print(p)
With version 1.1.0, a simple command-line interface (CLI) has been added to the package.
Usage: uniprotparser [OPTIONS]

Options:
  -i, --input FILENAME   Input file containing a list of accession ids
  -o, --output FILENAME  Output file
  --help                 Show this message and exit.
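The CLI reads its accession IDs from the file given with -i. This is a minimal sketch for preparing that file, assuming the expected format is one accession per line (the help text only says "a list of accession ids"). write_accession_file is a hypothetical helper that strips whitespace, drops blank lines and removes duplicates while keeping the original order.

```python
from pathlib import Path
from typing import Iterable


def write_accession_file(path: str, accessions: Iterable[str]) -> int:
    """Write one accession per line and return how many were written."""
    # dict.fromkeys removes duplicates while preserving first-seen order
    cleaned = list(dict.fromkeys(a.strip() for a in accessions if a.strip()))
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)
```

The resulting file can then be passed to the CLI, for example: uniprotparser -i accessions.txt -o output.txt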
With version 1.0.5, support for asyncio through aiohttp has been added to the betaparser module. Usage is shown below:
from uniprotparser.betaparser import UniprotParser
from io import StringIO
import asyncio
import pandas as pd

async def main():
    example_acc_list = ["Q99490", "Q8NEJ0", "Q13322", "P05019", "P35568", "Q15323"]
    parser = UniprotParser()
    frames = []
    # Each yielded result is tab-separated text; read each chunk into a DataFrame
    async for r in parser.parse_async(ids=example_acc_list):
        frames.append(pd.read_csv(StringIO(r), sep="\t"))
    # Combine all chunks; fall back to an empty DataFrame if nothing was returned
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

df = asyncio.run(main())
print(df)
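The example above consumes an async generator with async for inside a coroutine driven by asyncio.run. The self-contained sketch below shows the same pattern without network access. fake_parse_async is a hypothetical stand-in for parse_async, and its batch size of two IDs per chunk is an assumption made only for illustration, not the library's actual behavior.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_parse_async(ids: List[str]) -> AsyncIterator[str]:
    # Hypothetical stand-in: yields one tab-separated chunk (header + rows)
    # for every batch of two IDs.
    for start in range(0, len(ids), 2):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        batch = ids[start:start + 2]
        rows = "\n".join(f"{acc}\treviewed" for acc in batch)
        yield f"Entry\tStatus\n{rows}\n"


async def collect(ids: List[str]) -> List[str]:
    # An async comprehension gathers every chunk in the order it is produced
    return [chunk async for chunk in fake_parse_async(ids)]


chunks = asyncio.run(collect(["Q99490", "Q8NEJ0", "Q13322"]))
```

Returning the collected value from the coroutine, rather than only building it inside main, is what lets asyncio.run hand the result back to synchronous code.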
With version 1.0.2, support for the new UniProt REST API has been added under the betaparser module of the package.
To use this new module, follow the example below:
from uniprotparser.betaparser import UniprotParser
from io import StringIO
import pandas as pd

example_acc_list = ["Q99490", "Q8NEJ0", "Q13322", "P05019", "P35568", "Q15323"]
parser = UniprotParser()
frames = []
# Each result is tab-separated text; read each chunk into a DataFrame
for r in parser.parse(ids=example_acc_list):
    frames.append(pd.read_csv(StringIO(r), sep="\t"))
# Combine all chunks; fall back to an empty DataFrame if nothing was returned
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
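If pandas is not needed, the same chunks can be merged with the standard library csv module. This is a minimal sketch, assuming each chunk is tab-separated text that begins with its own header row, as the pd.read_csv(StringIO(r), sep="\t") call above implies. merge_tsv_chunks is a hypothetical helper, not part of uniprotparser.

```python
import csv
import io
from typing import Dict, Iterable, List


def merge_tsv_chunks(chunks: Iterable[str]) -> List[Dict[str, str]]:
    """Combine tab-separated chunks that each start with a header row."""
    rows: List[Dict[str, str]] = []
    for chunk in chunks:
        # DictReader consumes the header of every chunk, so headers are not repeated
        reader = csv.DictReader(io.StringIO(chunk), delimiter="\t")
        rows.extend(reader)
    return rows
```

Each row comes back as a dictionary keyed by column name, e.g. row["Entry"].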
To parse a UniProt accession with the legacy API:
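from uniprotparser.parser import UniprotSequence

protein_id = "seq|P06493|swiss"
# parse_acc=True asks UniprotSequence to parse the accession out of the identifier
acc_id = UniprotSequence(protein_id, parse_acc=True)
print(acc_id.accession)  # parsed accession
print(acc_id.isoform)    # parsed isoform, if any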
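To see roughly what accession parsing involves, here is a sketch that swaps in a plain regular expression. It uses the accession format published in the UniProt documentation plus an optional isoform suffix such as "-2". It is not the library's implementation, and extract_accession is a hypothetical helper; it returns the isoform suffix including its leading hyphen, which may differ from what UniprotSequence.isoform reports.

```python
import re
from typing import Optional, Tuple

# UniProt accession format (6 or 10 characters) followed by an optional isoform suffix
ACCESSION_RE = re.compile(
    r"(?P<acc>[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
    r"(?P<iso>-\d+)?"
)


def extract_accession(text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (accession, isoform suffix or None) for the first match, else None."""
    match = ACCESSION_RE.search(text)
    if match is None:
        return None
    return match.group("acc"), match.group("iso")
```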
To retrieve additional data from the UniProt online database with the legacy API:
from uniprotparser.parser import UniprotParser
from io import StringIO
import pandas as pd

protein_accession = "P06493"
parser = UniprotParser([protein_accession])
result = []
# Request tab-separated output and collect each result as a DataFrame
for i in parser.parse("tab"):
    tab_data = pd.read_csv(i, sep="\t")
    # Rename the last column to "query"
    last_column_name = tab_data.columns[-1]
    tab_data.rename(columns={last_column_name: "query"}, inplace=True)
    result.append(tab_data)
fin = pd.concat(result, ignore_index=True)

# Write the output of parse() with its default format to a FASTA file
with open("fasta_output.fasta", "wt") as fasta_output:
    for i in parser.parse():
        fasta_output.write(i)
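To read the FASTA file back, for example to check which records were written, a small standard-library reader is enough. This is a minimal sketch of generic FASTA parsing, not part of uniprotparser. read_fasta is a hypothetical helper; it keys records by their header line, so a repeated header overwrites the earlier record.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record
            if header is not None:
                records[header] = "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequence lines may be wrapped; join them
    if header is not None:
        records[header] = "".join(parts)
    return records
```

Usage: with open("fasta_output.fasta") as handle: records = read_fasta(handle)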