impc_api is a Python package that provides several helper functions wrapping the IMPC Solr API.
The functions in this package are intended for use in a Jupyter Notebook.
python3 -m venv .venv
source .venv/bin/activate
pip install impc_api
pip install jupyter
jupyter notebook
After executing the command, the Jupyter interface should open in your browser. If it does not, follow the instructions provided in the terminal.
Create a Jupyter Notebook and try some of the examples below:
The available functions can be imported as:
from impc_api import solr_request, batch_solr_request
The most basic request to the IMPC Solr API
num_found, df = solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'rows': 10,
        'fl': 'marker_symbol,allele_symbol,parameter_stable_id'
    }
)
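For orientation, the core and params in the call above correspond to a Solr select URL. The sketch below only builds such a URL and does not send a request. The base URL and the helper name build_select_url are assumptions made for illustration; neither appears in this README, and the package may construct its requests differently.

```python
from urllib.parse import urlencode

# ASSUMPTION: base URL of the IMPC Solr service; it is not stated in this README.
ASSUMED_BASE_URL = "https://www.ebi.ac.uk/mi/impc/solr"


def build_select_url(core, params, base_url=ASSUMED_BASE_URL):
    """Return a Solr select URL for a core and a dict of query parameters.

    Hypothetical helper for illustration; not part of impc_api.
    """
    return f"{base_url}/{core}/select?{urlencode(params)}"


url = build_select_url(
    "genotype-phenotype",
    {
        "q": "*:*",
        "rows": 10,
        "fl": "marker_symbol,allele_symbol,parameter_stable_id",
    },
)
print(url)
```

Special characters such as * and : are percent-encoded by urlencode, so the printed query string looks different from the params dictionary while carrying the same values.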
solr_request allows facet requests:
num_found, df = solr_request(
    core="genotype-phenotype",
    params={
        "q": "*:*",
        "rows": 0,
        "facet": "on",
        "facet.field": "zygosity",
        "facet.limit": 15,
        "facet.mincount": 1,
    }
)
A common pitfall when writing a query is misspelling the core and fields arguments. To help with this, we have included a validate argument that raises a warning when these values are not as expected. Note that this does not prevent you from executing a query; it just alerts you to a potential issue.
num_found, df = solr_request(
    core='genotype-phenotyp',  # misspelled on purpose to trigger the warning
    params={
        'q': '*:*',
        'rows': 10
    },
    validate=True
)
> InvalidCoreWarning: Invalid core: "genotype-phenotyp", select from the available cores:
> dict_keys(['experiment', 'genotype-phenotype', 'impc_images', 'phenodigm', 'statistical-result'])
num_found, df = solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'rows': 10,
        'fl': 'invalid_field,marker_symbol,allele_symbol'
    },
    validate=True
)
> InvalidFieldWarning: Unexpected field name: "invalid_field". Check the spelling of fields.
> To see expected fields check the documentation at: https://www.ebi.ac.uk/mi/impc/solrdoc/
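Because validation only warns, a script will carry on with a misspelled core or field. If you would rather stop, one option is to escalate warnings to exceptions with Python's standard warnings module. The sketch below assumes the package emits its warnings through that module, which the output above suggests but this README does not confirm. run_strict and fake_request are hypothetical names, and fake_request is a stand-in that sends no request.

```python
import warnings


def run_strict(func, *args, **kwargs):
    """Call func and raise any warning it emits as an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # escalate every warning category
        return func(*args, **kwargs)


def fake_request(core):
    """Stand-in for a validated request; NOT part of impc_api."""
    known_cores = {
        "experiment",
        "genotype-phenotype",
        "impc_images",
        "phenodigm",
        "statistical-result",
    }
    if core not in known_cores:
        warnings.warn(f'Invalid core: "{core}"', UserWarning)
    return core


try:
    run_strict(fake_request, "genotype-phenotyp")
except UserWarning as err:
    print(f"Stopped before querying: {err}")
```

With the real package you would pass solr_request and its arguments to run_strict instead of the stand-in, and catch Warning (or the package's own warning classes) rather than UserWarning.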
batch_solr_request is available for large queries. It addresses cases where a request is too large to fit into memory or puts a lot of strain on the API.
Use batch_solr_request for large queries, for querying multiple items in a list, and for downloading data in json or csv format.
For large queries you can choose between seeing them in a DataFrame or downloading them in json or csv format.
With download=False, batch_solr_request fetches your data using the API responsibly and returns a Pandas DataFrame.
When your request is larger than recommended and you have not opted for downloading the data, a warning will be presented and you should follow the instructions to proceed.
df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*'
    },
    download=False,
    batch_size=30000
)
print(df.head())
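To illustrate what batching means in Solr terms, the sketch below splits a result set into pages using Solr's standard start and rows parameters. It is a generic illustration of paging, not the package's actual implementation, which this README does not show. batch_params is a hypothetical helper, and the total of 70000 documents is an arbitrary figure chosen for the example.

```python
def batch_params(num_found, batch_size, base_params):
    """Yield one parameter dict per batch, using Solr's start/rows paging.

    Hypothetical helper for illustration; not part of impc_api.
    """
    for start in range(0, num_found, batch_size):
        # The last batch may be smaller than batch_size.
        rows = min(batch_size, num_found - start)
        yield {**base_params, "start": start, "rows": rows}


# Arbitrary example: 70000 matching documents fetched 30000 at a time.
pages = list(batch_params(num_found=70000, batch_size=30000, base_params={"q": "*:*"}))
for page in pages:
    print(page)
```

Each dictionary describes one request, so only batch_size rows need to be transferred at a time.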
When using the download=True option, a file with the requested information will be saved as filename. The format is selected based on the wt parameter.
A DataFrame may be returned, provided it does not exceed the memory available on your laptop. If the DataFrame is too large, an error will be raised. For these cases, we recommend you read the downloaded file in batches/chunks.
df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'wt': 'csv'
    },
    download=True,
    filename='geno_pheno_query',
    batch_size=100000
)
print(df.head())
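For a downloaded file that is too large to load at once, pandas can read it in chunks through the chunksize argument of read_csv. The sketch below is a minimal example under two assumptions: that the download is a regular CSV file with a header row, and that it is named geno_pheno_query.csv (the README does not say whether an extension is appended to filename). The demo writes a tiny stand-in file with dummy values so the snippet runs on its own; count_rows_in_chunks is a hypothetical helper.

```python
import os
import tempfile

import pandas as pd


def count_rows_in_chunks(csv_path, chunksize=50_000):
    """Count the rows of a CSV file without loading it all into memory."""
    total = 0
    # Each chunk is an ordinary DataFrame with at most `chunksize` rows.
    with pd.read_csv(csv_path, chunksize=chunksize) as reader:
        for chunk in reader:
            total += len(chunk)
    return total


# Demo on a tiny stand-in file; the values are dummy data, not IMPC results.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "geno_pheno_query.csv")  # assumed file name
    pd.DataFrame(
        {"marker_symbol": ["Zfp580", "Firrm", "Gpld1"], "p_value": [0.01, 0.2, 0.03]}
    ).to_csv(demo_path, index=False)
    demo_total = count_rows_in_chunks(demo_path, chunksize=2)

print(demo_total)
```

The same loop can filter or aggregate each chunk instead of counting rows, which keeps memory use bounded by chunksize rather than by the size of the file.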
batch_solr_request also allows searching for multiple items in a list, provided they all belong to the same field.
Pass the list to the field_list parameter and specify the field the items belong to in field_type.
# List of gene symbols
genes = ["Zfp580", "Firrm", "Gpld1", "Mbip"]
df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'fl': 'marker_symbol,mp_term_name,p_value',
        'field_list': genes,
        'field_type': 'marker_symbol'
    },
    download=False
)
print(df.head())
This can be downloaded too:
# List of gene symbols
genes = ["Zfp580", "Firrm", "Gpld1", "Mbip"]
df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'fl': 'marker_symbol,mp_term_name,p_value',
        'field_list': genes,
        'field_type': 'marker_symbol'
    },
    download=True,
    filename='gene_list_query'
)
print(df.head())
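As a rough guide to what a list search means in Solr syntax, several values for one field can be combined in a single query clause of the form field:("a" OR "b"). The sketch below builds such a clause by hand. Whether batch_solr_request builds its queries this way is an assumption, since the README does not describe its internals, and build_field_query is a hypothetical helper.

```python
def build_field_query(field_type, field_list):
    """Return a Solr clause such as marker_symbol:("A" OR "B").

    Hypothetical helper for illustration; not part of impc_api.
    Values are wrapped in double quotes but not otherwise escaped.
    """
    if not field_list:
        raise ValueError("field_list must contain at least one value")
    quoted = " OR ".join(f'"{value}"' for value in field_list)
    return f"{field_type}:({quoted})"


genes = ["Zfp580", "Firrm", "Gpld1", "Mbip"]
query = build_field_query("marker_symbol", genes)
print(query)
```

A clause like this could be passed as the q parameter of a plain solr_request for short lists; for long lists the batched helper is the safer choice.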
FAQs
A package to facilitate making API requests to the IMPC Solr API
We found that impc-api demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 2 open source maintainers collaborating on the project.