SemSQL: standard SQL views for RDF/OWL ontologies
This project provides a standard collection of SQL tables/views for ontologies, such that you can make queries like this,
to find all terms starting with Abnormality
in HPO.
$ sqlite db/hp.db
sqlite> SELECT * FROM rdfs_label_statement WHERE value LIKE 'Abnormality of %';
stanza | subject | predicate | object | value | datatype | language |
---|
HP:0000002 | HP:0000002 | rdfs:label | | Abnormality of body height | xsd:string | |
HP:0000014 | HP:0000014 | rdfs:label | | Abnormality of the bladder | xsd:string | |
HP:0000022 | HP:0000022 | rdfs:label | | Abnormality of male internal genitalia | xsd:string | |
HP:0000032 | HP:0000032 | rdfs:label | | Abnormality of male external genitalia | xsd:string | |
Ready-made SQLite3 builds can also be downloaded for any ontology in OBO, using URLs such as https://s3.amazonaws.com/bbop-sqlite/hp.db
relation-graph is used to pre-generate tables of entailed edges. For example,
all is-a and part-of ancestors of finger in Uberon:
$ sqlite db/uberon.db
sqlite> SELECT * FROM entailed_edge WHERE subject='UBERON:0002389' and predicate IN ('rdfs:subClassOf', 'BFO:0000050');
subject, predicate, object |
---|
UBERON:0002389, BFO:0000050, UBERON:0015212 |
UBERON:0002389, BFO:0000050, UBERON:5002389 |
UBERON:0002389, BFO:0000050, UBERON:5002544 |
UBERON:0002389, rdfs:subClassOf, UBERON:0000061 |
UBERON:0002389, rdfs:subClassOf, UBERON:0000465 |
UBERON:0002389, rdfs:subClassOf, UBERON:0000475 |
SQLite provides many advantages
- files can be downloaded and subsequently queried without network latency
- compared to querying a static rdf, owl, or obo file, there is no startup/parse delay
- robust and performant
- excellent support in many languages
Although the focus is on SQLite, this library can also be used for other DBMSs like PostgreSQL, MySQL, Oracle, etc
Tutorials
Installation
SemSQL comes with a helper Python library. Use of this is optional. To install:
pip install semsql
Download ready-made SQLite databases
Pre-generated SQLite database are created weekly for all OBO ontologies and a selection of others (see ontologies.yaml)
To download:
semsql download obi -o obi.db
Or simply download using URL of the form:
Attaching databases
If you are using sqlite3, then databases can be attached to facilitate cross-database joins.
For example, many ontologies use ORCID URIs as the object of dcterms:contributor
and dcterms:creator
statements, but these are left "dangling". Metadata about these orcids are available in the semsql orcid database instance (derived from wikidata-orcid-ontology), in the Orcid table.
You can use ATTACH DATABASE to connect two databases, for example:
$ sqlite3 db/cl.dl
sqlite> attach 'db/orcid.db' as orcid_db;
sqlite> select * from contributor inner join orcid_db.orcid on (orcid.id=contributor.object) where orcid.label like 'Chris%';
obo:cl.owl|obo:cl.owl|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
CL:0010001|CL:0010001|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
CL:0010002|CL:0010002|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
CL:0010003|CL:0010003|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
CL:0010004|CL:0010004|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
UBERON:0000093|UBERON:0000093|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
UBERON:0000094|UBERON:0000094|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
UBERON:0000095|UBERON:0000095|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
UBERON:0000179|UBERON:0000179|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
UBERON:0000201|UBERON:0000201|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
UBERON:0000202|UBERON:0000202|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
UBERON:0000203|UBERON:0000203|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
UBERON:0000204|UBERON:0000204|dcterms:contributor|orcid:0000-0002-6601-2165||||orcid:0000-0002-6601-2165|Christopher J. Mungall
Creating a SQLite database from an OWL file
There are two protocols for doing this:
- install build dependencies
- use Docker
In either case:
- The input MUST be in RDF/XML serialization and have the suffix
.owl
: - use robot to convert if format is different
We are planning to simplify this process in future.
1. Build a SQLite database directly
This requires some basic technical knowledge about how to install things on your machine
and how to put things in your PATH. It does not require Docker.
Requirements:
After installing these and putting both relation-graph
and rdftab.rs
in your path:
semsql make foo.db
This assumes foo.owl
is in the same folder
2. Use Docker
There are two docker images that can be used:
The ODK image may lag behind
docker run -v $PWD:/work -w /work -ti linkml/semantic-sql semsql make foo.db
Schema
See Schema Documentation
The source schema is in LinkML - this is then compiled down to SQL Tables and Views
The basic idea is as follows:
There are a small number of "base tables":
All other tables are actually views (derived tables), and are provided for convenience.
ORM Layer
A SemSQL relational database can be accessed in exactly the same way as any other SQLdb
For convenience, we provide a Python Object-Relational Mapping (ORM) layer using SQL Alchemy.
This allows for code uchlike the following, which joins RdfsSubclassOfStatement and existential restrictions:
engine = create_engine(f"sqlite:////path/to/go.db")
SessionClass = sessionmaker(bind=engine)
session = SessionClass()
q = session.query(RdfsSubclassOfStatement)
q = q.add_entity(OwlSomeValuesFrom)
q = q.join(OwlSomeValuesFrom, RdfsSubclassOfStatement.object == OwlSomeValuesFrom.id)
lines = []
for ax, ex in q.all():
line = f'{ax.subject} subClassOf {ex.on_property} SOME {ex.filler}'
logging.info(line)
lines.append(line)
(this example is just for illustration - to do the same thing there is a simpler Edge relation)
Applications
The semsql python library is intentionally low level - we recommend using the ontology-access-kit
For example:
runoak -i db/envo.db search t~biome
You can also pass in an OWL file and have the sqlite be made on the fly
runoak -i sqlite:envo.owl search t~biome
Even if using OAK, it can be useful to access SQL tables directly to do complex multi-join queries in a performant way.
Optimization
poetry run semsql view2table edge --full-index | sqlite3 $db/mydb.db
See indexes for some ready-made indexes