Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

bio-data-to-db

Package Overview
Dependencies
Maintainers
1
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

bio-data-to-db

Create DB from bio data

  • 0.1.5
  • PyPI
  • Socket score

Maintainers
1

bio-data-to-db: make Uniprot PostgreSQL database

image PyPI - Downloads image image

Ruff rustfmtActions status
Ruff ClippyActions status
doctestActions status
uvActions status
Built with Material for MkDocsActions status

Bio data is a little messy to work with, and everybody deserves a clean database. This package helps you to convert bio data to a database.

Partially written in Rust, thus equipped with an extremely fast uniprot xml parser. Packaged for python, so anyone can easily install and use it.

This package focuses more on parsing the data and inserting it into the database, rather than curating the data. Main features include:

  • Uniprot: Parse the uniprot xml file and insert the data into the database.
  • BindingDB: Fix the HTML entities in the assay table.
  • PostgreSQL Helpers: Useful functions to work with PostgreSQL.
  • SMILES: Canonicalize SMILES strings.
  • Polars: Useful functions to work with Polars.

📚 Documentation has the API reference and the usage.

🛠️ Installation

pip install bio-data-to-db

🚦 Usage

You can use the command line interface or the python API.

Uniprot

# It will create a db 'uniprot' and a table named 'public.uniprot_info' in the database.
# If you want another name, you can optionally pass it as the last argument.
bio-data-to-db uniprot create-empty-table 'postgresql://username@localhost:5432/uniprot'

# It will parse the xml file and insert the data into the table.
# It requires that the table is already created.
bio-data-to-db uniprot xml-to-postgresql '~/Downloads/uniprot_sprot.xml' 'postgresql://username@localhost:5432/uniprot'

# Create a table that maps accession to pk_id.
# Columns: accession (text), uniprot_pk_ids (integer[])
bio-data-to-db uniprot create-accession-to-pk-id 'postgresql://username@localhost:5432/uniprot'

# It will parse the keywords tsv file and insert the data into the table.
bio-data-to-db uniprot keywords-tsv-to-postgresql '~/Downloads/keywords_all_2024_06_26.tsv' 'postgresql://username@localhost/uniprot'
from bio_data_to_db.uniprot import (
    create_accession_to_pk_id_table,
    create_empty_table,
    uniprot_xml_to_postgresql,
    keywords_tsv_to_postgresql,
)

create_empty_table("postgresql://user:password@localhost:5432/uniprot")
# It requires that the table is already created.
uniprot_xml_to_postgresql("~/Downloads/uniprot_sprot.xml", "postgresql://user:password@localhost:5432/uniprot")
create_accession_to_pk_id_table("postgresql://user:password@localhost:5432/uniprot")
keywords_tsv_to_postgresql("~/Downloads/keywords_all_2024_06_26.tsv", "postgresql://user:password@localhost:5432/uniprot")

BindingDB

# Decode HTML entities and strip the strings in the `assay` table (column: description and assay_name).
# Currently, only assay table is supported.
bio-data-to-db bindingdb fix-table assay 'mysql://username:password@localhost/bind'
from bio_data_to_db.bindingdb.fix_tables import fix_assay_table

fix_assay_table("mysql://username:password@localhost/bind")

PostgreSQL Helpers, SMILES, Polars utils and more

This package also provides some useful functions to work with PostgreSQL, SMILES, Polars and more.

from bio_data_to_db.utils.postgresql import (
    create_db_if_not_exists,
    create_schema_if_not_exists,
    make_int_column_primary_key_identity,
    make_columns_primary_key,
    make_columns_unique,
    make_large_columns_unique,
    split_column_str_to_list,
    polars_write_database,  # addressed issues with list columns
)

from bio_data_to_db.utils.smiles import (
    canonical_smiles_wo_salt,
    polars_canonical_smiles_wo_salt,
)

from bio_data_to_db.utils.fasta import (
    polars_standardize_fasta,
)

from bio_data_to_db.utils.polars import (
    w_pbar,
)

You can find the usage in the 📚 documentation.

👨‍💻️ Maintenance Notes

Install from source

Install uv, rustup and maturin. Activate a virtual environment. Then,

bash scripts/install.sh
uv pip install -r deps/requirements_dev.in

Generate lockfiles

Use GitHub Actions: apply-pip-compile.yml. Manually launch the workflow and it will make a commit with the updated lockfiles.

Publish a new version to PyPI

Use GitHub Actions: deploy.yml. Manually launch the workflow and it will compile on all architectures and publish the new version to PyPI.

About sqlx

Sqlx offline mode should be configured so you can compile the code without a database present.

First, you need to make sure the database is connected to enable the offline mode.

# .env file
DATABASE_URL=postgresql://postgres:password@localhost:5432/uniprot
SQLX_OFFLINE=true  # force offline mode. Otherwise, use the DATABASE_URL to connect to the database.

The database needs to have the table. It can be empty but the structure has to be accessible.
If it doesn't already exist, you can create the table with the following command.

bio-data-to-db uniprot create-empty-table 'postgresql://username@localhost:5432/uniprot'

Then, you can run the following command to prepare the SQL queries. This defines the type information for the queries.

cargo install sqlx-cli --no-default-features --features postgres
cargo sqlx prepare

This will make .sqlx/ directory with the type information for the queries.

FAQs


Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc