Python interface for the RCSB PDB Search API.
This package requires Python 3.7 or later.
Get it from PyPI:
pip install rcsbsearchapi
Or, download from GitHub
Full documentation is available at readthedocs.
To perform a "full-text" search for structures associated with the term "Hemoglobin", you can create a TextQuery:
from rcsbsearchapi import TextQuery
# Search for structures associated with the phrase "Hemoglobin"
query = TextQuery(value="Hemoglobin")
# Execute the query by running it as a function
results = query()
# Results are returned as an iterator of result identifiers.
for rid in results:
    print(rid)
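Because results come back as an iterator, you can stop after the first few identifiers instead of building the full list. The following is a minimal sketch using `itertools.islice` from the standard library. `first_n` and `fake_results` are hypothetical names, not part of rcsbsearchapi; `fake_results` is a stand-in for `query()` so the snippet runs without network access, and the identifiers it yields are placeholders rather than real search output.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def first_n(results: Iterable[str], n: int) -> List[str]:
    """Return at most n identifiers, consuming only as much of the iterator as needed."""
    return list(islice(results, n))


def fake_results() -> Iterator[str]:
    """Hypothetical stand-in for query(); yields placeholder identifiers."""
    for i in range(1, 1001):
        yield f"ID{i:04d}"


if __name__ == "__main__":
    # With the real package this would be: first_n(query(), 3)
    print(first_n(fake_results(), 3))
```

With the real package, passing `query()` in place of `fake_results()` should behave the same way, assuming the returned object is a regular Python iterator as described above.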
To perform a search for specific structure or chemical attributes, you can create an AttributeQuery.
from rcsbsearchapi import AttributeQuery
# Construct a query searching for structures from humans
query = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",  # Other operators include "contains_phrase", "exists", and more
    value="Homo sapiens"
)
# Execute query and construct a list from results
results = list(query())
print(results)
Refer to the Search Attributes and Chemical Attributes documentation for a full list of attributes and applicable operators.
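For orientation, an attribute query ultimately corresponds to a JSON request sent to the RCSB PDB Search API. The sketch below assumes the request shape described in the Search API documentation (a `terminal` node using the `text` service, which applies to structure attributes). `attribute_query_payload` is a hypothetical helper, not part of rcsbsearchapi, and the package's actual serialization may differ in detail.

```python
import json
from typing import Any, Dict


def attribute_query_payload(attribute: str, operator: str, value: Any,
                            return_type: str = "entry") -> Dict[str, Any]:
    """Build a request body for one attribute search (hypothetical helper)."""
    return {
        "query": {
            "type": "terminal",
            "service": "text",  # assumed service name for structure attributes
            "parameters": {
                "attribute": attribute,
                "operator": operator,
                "value": value,
            },
        },
        "return_type": return_type,
    }


if __name__ == "__main__":
    payload = attribute_query_payload(
        "rcsb_entity_source_organism.scientific_name", "exact_match", "Homo sapiens"
    )
    # Print the body that would be sent to the search endpoint
    print(json.dumps(payload, indent=2))
```

Seeing the attribute, operator, and value side by side in the payload can make it easier to match them against the attribute documentation.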
Alternatively, you can construct attribute queries with comparison operators using the rcsb_attributes object, which also allows attribute names to be tab-completed:
from rcsbsearchapi import rcsb_attributes as attrs
# Search for structures from humans
query = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
# Run query and construct a list from results
results = list(query())
print(results)
You can combine multiple queries using Python bitwise operators (& for AND, | for OR):
from rcsbsearchapi import rcsb_attributes as attrs
# Query for human epidermal growth factor receptor (EGFR) structures (UniProt ID P00533)
# with investigational or experimental drugs bound
q1 = attrs.rcsb_polymer_entity_container_identifiers.reference_sequence_identifiers.database_accession == "P00533"
q2 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
q3 = attrs.drugbank_info.drug_groups == "investigational"
q4 = attrs.drugbank_info.drug_groups == "experimental"
# Structures matching UniProt ID P00533 AND from humans
# AND (investigational OR experimental drug group)
query = q1 & q2 & (q3 | q4)
# Execute query and print first 10 ids
results = list(query())
print(results[:10])
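To make the `&` and `|` composition less opaque, here is a toy sketch of how operator overloading can build a query tree. It illustrates the general technique only and is not rcsbsearchapi's implementation; `Combinable`, `Term`, and `Group` are hypothetical classes, and the attribute names are shortened placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


class Combinable:
    """Mixin giving query nodes `&` (AND) and `|` (OR) support."""

    def __and__(self, other: "Combinable") -> "Group":
        return Group("and", (self, other))

    def __or__(self, other: "Combinable") -> "Group":
        return Group("or", (self, other))


@dataclass(frozen=True)
class Term(Combinable):
    """A single attribute comparison (hypothetical stand-in for one query)."""
    attribute: str
    value: str

    def __str__(self) -> str:
        return f'{self.attribute} == "{self.value}"'


@dataclass(frozen=True)
class Group(Combinable):
    """Queries joined by one logical operator."""
    operator: str
    nodes: Tuple[Combinable, ...]

    def __str__(self) -> str:
        joined = f" {self.operator.upper()} ".join(str(n) for n in self.nodes)
        return f"({joined})"


if __name__ == "__main__":
    q1 = Term("accession", "P00533")
    q3 = Term("drug_groups", "investigational")
    q4 = Term("drug_groups", "experimental")
    # Parentheses control grouping, just as in the example above
    print(q1 & (q3 | q4))
```

In Python, `&` and `|` bind more tightly than `==`, so when comparisons are combined on a single line each one needs its own parentheses; assigning them to variables first, as in the example above, avoids the issue.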
These examples use the operator syntax. You can also make queries in the fluent syntax. Learn more about both syntaxes and implementation details in Constructing and Executing Queries.
The supported search service types are listed in the table below. For more details on their usage, see Search Service Types.
Search service | QueryType |
---|---|
Full-text | TextQuery() |
Attribute (structure or chemical) | AttributeQuery() |
Sequence similarity | SequenceQuery() |
Sequence motif | SequenceMotifQuery() |
Structure similarity | StructSimilarityQuery() |
Structure motif | StructMotifQuery() |
Chemical similarity | ChemSimilarityQuery() |
Learn more about available search services on the RCSB PDB Search API docs.
A runnable Jupyter notebook is available in notebooks/quickstart.ipynb; it can also be run online using Google Colab.
An additional COVID-19 related example is in notebooks/covid.ipynb.
The following table lists the status of current and planned features.
contains, in_ (fluent only)
Contributions are welcome for unchecked items!
Code is licensed under the BSD 3-clause license. See LICENSE for details.
Please cite the rcsbsearchapi package by URL:
You should also cite the RCSB PDB service this package utilizes:
Yana Rose, Jose M. Duarte, Robert Lowe, Joan Segura, Chunxiao Bi, Charmi Bhikadiya, Li Chen, Alexander S. Rose, Sebastian Bittrich, Stephen K. Burley, John D. Westbrook. RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive, Journal of Molecular Biology, 2020. DOI: 10.1016/j.jmb.2020.11.003
The source code for this project was originally written by Spencer Bliven and forked from sbliven/rcsbsearch. We would like to express our tremendous gratitude for his generous efforts in designing such a comprehensive public utility Python package for interacting with the RCSB PDB search API.
For information about building and developing rcsbsearchapi, see CONTRIBUTING.md.