|ci|
Sample data generation is a common step for testing and verifying new and existing features that make use of the data commons dictionary. Without validation tools, this step can be difficult and error-prone. This project aims to provide tooling that helps with generating and visualizing sample data. It is dictionary agnostic, so it should work with any GDC-compatible dictionary.

Sample data graphs are represented using a customized GraphML_ format, which can be written as either JSON or YAML files. This project provides tools for creating this schema for a selected dictionary and for validating data that targets it.

psqlgml aims to provide tooling for generating, validating, and visualizing sample data for projects that make use of psqlgraph_.

Install from PyPI:

.. code-block:: bash

    $ pip install psqlgml

Command Line
++++++++++++

.. code-block:: bash

    # install
    $ pip install psqlgml

    # validate install
    $ psqlgml --help

    # generate internal schema to aid validation
    $ psqlgml generate -v 2.4.0 -n test_dictionary

    # validation
    $ psqlgml validate --help

    # visualize
    $ psqlgml visualize --help

API
+++

.. code-block:: python

    import psqlgml

    # load the default dictionary
    dictionary: psqlgml.Dictionary = psqlgml.load(version="2.3.0")

This is a customized GraphML_ format based on JSON Schema. It represents a graph as a set of nodes and edges, and the schema makes it possible to validate sample data.

.. code-block:: yaml

    unique_field: node_id
    nodes:
      - label: program
        node_id: p_1
        name: SM-KD
      - label: project
        node_id: pr_1
    edges:
      - src: p_1
        dst: pr_1
        label: programs

This example creates two nodes, a ``program`` and a ``project``, that are linked together using the ``node_id`` property. The edge connecting them is named ``programs``.

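Because the format can also be written as JSON, the same two-node graph can be expressed as a JSON document. The snippet below is a minimal sketch that loads it with Python's standard ``json`` module and resolves each edge's endpoints through ``unique_field``. It only illustrates the data layout; it does not use or reproduce psqlgml's own loading code.

.. code-block:: python

    import json

    # The program/project example, written as JSON instead of YAML.
    SAMPLE = """
    {
      "unique_field": "node_id",
      "nodes": [
        {"label": "program", "node_id": "p_1", "name": "SM-KD"},
        {"label": "project", "node_id": "pr_1"}
      ],
      "edges": [
        {"src": "p_1", "dst": "pr_1", "label": "programs"}
      ]
    }
    """

    graph = json.loads(SAMPLE)
    key = graph["unique_field"]  # property that identifies each node

    # Index nodes by their unique field so edge endpoints can be resolved.
    nodes_by_id = {node[key]: node for node in graph["nodes"]}

    for edge in graph["edges"]:
        src = nodes_by_id[edge["src"]]
        dst = nodes_by_id[edge["dst"]]
        print(f"{src['label']} -[{edge['label']}]-> {dst['label']}")
    # prints: program -[programs]-> project
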
psqlgml can generate dictionary-specific schemas through its command-line scripts. By default, gdcdictionary_ is assumed, but the parameters can be changed to work with a different project.

Generate a schema using version 2.4.0 of the gdcdictionary:

.. code-block:: bash

    $ psqlgml generate -v 2.4.0 -n gdcdictionary

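To give a rough idea of what a dictionary-specific schema adds, the sketch below derives a JSON Schema fragment from a hypothetical, hard-coded dictionary (``TOY_DICTIONARY``) by restricting each node's ``label`` to the node types that dictionary defines. The dictionary shape and the ``build_node_schema`` helper are assumptions made for illustration; the schemas produced by ``psqlgml generate`` come from the full dictionary and are likely much richer.

.. code-block:: python

    import json

    # Hypothetical stand-in for a GDC-style dictionary: node label -> property names.
    TOY_DICTIONARY = {
        "program": ["name", "dbgap_accession_number"],
        "project": ["code", "name", "state"],
    }


    def build_node_schema(dictionary, unique_field="node_id"):
        """Return a JSON Schema fragment for one sample-data node (illustrative only)."""
        return {
            "type": "object",
            "required": ["label", unique_field],
            "properties": {
                # Only labels defined by the dictionary are accepted.
                "label": {"enum": sorted(dictionary)},
                unique_field: {"type": "string"},
            },
        }


    schema = build_node_schema(TOY_DICTIONARY)
    print(json.dumps(schema, indent=2))
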
The generated schema can be used to validate sample data. It can also be added to IDEs such as PyCharm to provide code completion while writing sample data.

.. code-block:: bash

    $ psqlgml validate -f sample.yaml --data-dir <resource dir> -d <dictionary name> -v <dictionary version>

The following validations are currently supported:
JSON Schema Validation
++++++++++++++++++++++

Checks that the sample data is compliant with the dictionary. It validates things like:

Duplicate Definition Validation
+++++++++++++++++++++++++++++++

Raises an error whenever a unique id is used for more than one node.

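The duplicate check can be sketched as counting how often each unique id appears across the nodes. The helper ``find_duplicate_ids`` below is hypothetical and works on a plain dictionary shaped like the sample format; it assumes ``node_id`` as the unique field when none is given and is not psqlgml's actual validator.

.. code-block:: python

    from collections import Counter


    def find_duplicate_ids(graph):
        """Return unique ids that appear on more than one node (hypothetical helper)."""
        # Assumption: fall back to "node_id" when the graph declares no unique_field.
        key = graph.get("unique_field", "node_id")
        counts = Counter(node[key] for node in graph["nodes"])
        return sorted(node_id for node_id, count in counts.items() if count > 1)


    graph = {
        "unique_field": "node_id",
        "nodes": [
            {"label": "program", "node_id": "p_1"},
            {"label": "project", "node_id": "pr_1"},
            {"label": "project", "node_id": "pr_1"},  # duplicate definition
        ],
        "edges": [],
    }

    duplicates = find_duplicate_ids(graph)
    if duplicates:
        print(f"error: duplicate node ids {duplicates}")
    # prints: error: duplicate node ids ['pr_1']
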
Undefined Link Validation
+++++++++++++++++++++++++

Reported as a warning rather than an error, since sample data can legitimately link to nodes it does not define, for example when appending data to an existing database.

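A minimal sketch of this check collects every edge endpoint that is not among the defined node ids and reports it as a warning. ``find_undefined_links`` is a hypothetical helper over a plain dictionary in the sample format; psqlgml's own validator may collect and report these differently.

.. code-block:: python

    def find_undefined_links(graph):
        """Return (edge_label, missing_id) pairs for edges pointing outside the sample."""
        key = graph.get("unique_field", "node_id")
        defined = {node[key] for node in graph["nodes"]}
        missing = []
        for edge in graph["edges"]:
            for end in ("src", "dst"):
                if edge[end] not in defined:
                    missing.append((edge["label"], edge[end]))
        return missing


    graph = {
        "unique_field": "node_id",
        "nodes": [{"label": "project", "node_id": "pr_1"}],
        # p_1 is assumed to exist already, e.g. in the database being appended to.
        "edges": [{"src": "p_1", "dst": "pr_1", "label": "programs"}],
    }

    for label, node_id in find_undefined_links(graph):
        # Reported as a warning, not an error.
        print(f"warning: edge '{label}' references undefined node '{node_id}'")
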
Association Validation
++++++++++++++++++++++

Raises an error whenever an edge exists between nodes that the dictionary does not define an edge for.

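The association check can be sketched as looking up each edge's ``(source label, destination label, edge label)`` triple in the set of associations the dictionary allows. Both ``ALLOWED_ASSOCIATIONS`` and ``find_invalid_associations`` are hypothetical: a real dictionary defines its links in its own schema files, and psqlgml's implementation is not shown here.

.. code-block:: python

    # Hypothetical associations a dictionary might define: (src label, dst label, edge label).
    ALLOWED_ASSOCIATIONS = {
        ("program", "project", "programs"),
        ("project", "case", "projects"),
    }


    def find_invalid_associations(graph, allowed):
        """Return edges whose (src label, dst label, edge label) is not allowed."""
        key = graph.get("unique_field", "node_id")
        labels = {node[key]: node["label"] for node in graph["nodes"]}
        invalid = []
        for edge in graph["edges"]:
            src_label = labels.get(edge["src"])
            dst_label = labels.get(edge["dst"])
            if src_label is None or dst_label is None:
                continue  # undefined endpoints are left to the undefined-link check
            if (src_label, dst_label, edge["label"]) not in allowed:
                invalid.append(edge)
        return invalid


    graph = {
        "unique_field": "node_id",
        "nodes": [
            {"label": "program", "node_id": "p_1"},
            {"label": "project", "node_id": "pr_1"},
            {"label": "case", "node_id": "c_1"},
        ],
        "edges": [
            {"src": "p_1", "dst": "pr_1", "label": "programs"},  # allowed
            {"src": "p_1", "dst": "c_1", "label": "cases"},  # not allowed
        ],
    }

    for edge in find_invalid_associations(graph, ALLOWED_ASSOCIATIONS):
        print(f"error: no '{edge['label']}' edge between {edge['src']} and {edge['dst']}")
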
.. |ci| image:: https://app.travis-ci.com/NCI-GDC/psqlgml.svg?token=5s3bZRahNJnkspYEMwZC&branch=master
    :target: https://app.travis-ci.com/github/NCI-GDC/psqlgml/branches
    :alt: build

.. |action| image:: https://img.shields.io/github/workflow/status/kulgan/psqlgml/psqlgml-ci
    :target: https://github.com/kulgan/psqlgml/actions
    :alt: psqlgml ci

.. _graphviz: https://graphviz.org/
.. _GraphML: http://graphml.graphdrawing.org/primer/graphml-primer.html
.. _gdcdictionary: https://github.com/NCI-GDC/gdcdictionary
.. _psqlgraph: https://github.com/NCI-GDC/psqlgraph