SPARQL Guideline
The hPSCreg Cell Line Ontology (HCLO) provides a semantic framework for describing cell lines in the human pluripotent stem cell registry (hPSCreg).
Data about each cell line — including user input and metadata — is standardized using controlled vocabularies, registered identifiers,
and major biological ontologies for diseases, anatomy, cell types, and genes.
The HCLO, formatted in OWL, organizes these relationships as a knowledge graph, which can be explored using SPARQL queries.
Please find below some example SPARQL queries to explore the cell line data. Simply copy and paste the text into the Query Text box of the SPARQL Endpoint and click “Run Query”.
You may find further instructions in the SPARQL Guide.
Example SPARQL Queries
Cell lines associated with a donor having a disease.
Question: Which cell lines are associated with a donor with any kind of eye disease? Please group the results by disease.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT
  (STR(?clname) AS ?line)
  (STR(?dislab) AS ?disease)
  (STR(?sub) AS ?diseaseID)
  (STR(?prop) AS ?diseaseRelation)
WHERE {
  ?dis rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "eye disease"))
  ?sub rdfs:subClassOf* ?dis .
  ?cell rdfs:subClassOf ?rest .
  ?rest owl:onProperty ?prop .
  VALUES ?prop {
    <http://purl.obolibrary.org/obo/CLO_0000015>  # Patient has disease
    <http://purl.obolibrary.org/obo/CLO_0000006>  # Embryo has disease
    <http://purl.obolibrary.org/obo/CLO_0100005>  # Embryo is carrier of disease
    <http://purl.obolibrary.org/obo/CLO_0100003>  # Patient is carrier of disease
  }
  ?rest owl:someValuesFrom ?sub .
  ?sub rdfs:label ?dislab .
  ?cell rdfs:label ?clname .
}
GROUP BY ?cell ?sub ?dislab ?prop ?clname
ORDER BY ?line
Variations on this query:
- for best results, replace “eye disease” with a human-readable label found in the Disease Ontology, e.g. “neurodegenerative”.
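To make the keyword easier to swap, the search term can be held in a single VALUES clause. The following is a minimal sketch of that idea, not one of the original examples: the variable name ?keyword is introduced here purely for illustration, and for brevity the query is restricted to the “Patient has disease” property (CLO_0000015) taken from the query above.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT (STR(?clname) AS ?line) (STR(?dislab) AS ?disease)
WHERE {
  # Hypothetical helper variable: edit this one line (use lower case).
  VALUES ?keyword { "neurodegenerative" }

  # Disease classes whose label contains the keyword, plus their subclasses
  ?dis rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), ?keyword))
  ?sub rdfs:subClassOf* ?dis .

  # Cell lines restricted to such a disease via "Patient has disease"
  ?cell rdfs:subClassOf ?rest .
  ?rest owl:onProperty <http://purl.obolibrary.org/obo/CLO_0000015> ;
        owl:someValuesFrom ?sub .
  ?sub  rdfs:label ?dislab .
  ?cell rdfs:label ?clname .
}
ORDER BY ?disease ?line
```

Because the keyword is compared against a lower-cased label, it should be written in lower case.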
Cell lines from Dravet syndrome patients and the gene symbol of variants (if any).
Question: Please find all cell lines from Dravet syndrome patients and the gene symbol of variants (if any).
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT
  (STR(?clname) AS ?line)
  (STR(?dislab) AS ?disease)
  (STR(?sub) AS ?diseaseID)
  (GROUP_CONCAT(DISTINCT CONCAT(STR(?symbol), " [", STR(?genvar), "]"); separator=", ") AS ?geneSymbolVariant)
WHERE {
  # Match diseases with "dravet" in label
  ?sub rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "dravet"))
  BIND(STR(?label) AS ?dislab)

  # Disease restrictions on cell lines
  ?cell rdfs:subClassOf ?rest .
  ?rest owl:onProperty ?prop .
  VALUES ?prop {
    <http://purl.obolibrary.org/obo/CLO_0000015>  # Patient has disease
    <http://purl.obolibrary.org/obo/CLO_0000006>  # Embryo has disease
    <http://purl.obolibrary.org/obo/CLO_0100005>  # Embryo is carrier of disease
    <http://purl.obolibrary.org/obo/CLO_0100003>  # Patient is carrier of disease
  }
  ?rest owl:someValuesFrom ?sub .
  ?cell rdfs:label ?clname .

  # OPTIONAL gene symbol of genetic variant
  OPTIONAL {
    ?cell rdfs:subClassOf [
      a owl:Restriction ;
      owl:onProperty <http://purl.obolibrary.org/obo/hpscreg_0000001> ;
      owl:someValuesFrom ?genvar
    ] .
    OPTIONAL { ?genvar rdfs:label ?symbol . }
  }
}
GROUP BY ?cell ?sub ?dislab ?clname
ORDER BY ?line
Variations on this query:
- replace “dravet” with another keyword for a disease, e.g. “diabetes”, “alzheimer”.
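Several keywords can also be searched in one run by listing them in a VALUES clause. The sketch below is an illustrative simplification rather than one of the original examples: the ?keyword variable is hypothetical, the gene variant part is left out, and only the two patient-related properties from the query above are used.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX obo:  <http://purl.obolibrary.org/obo/>

SELECT DISTINCT (STR(?clname) AS ?line) (STR(?label) AS ?disease)
WHERE {
  # Hypothetical helper variable: one lower-case keyword per entry
  VALUES ?keyword { "diabetes" "alzheimer" }

  # Disease classes whose label contains any of the keywords
  ?sub rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), ?keyword))

  # Patient-related disease properties taken from the query above
  VALUES ?prop {
    obo:CLO_0000015   # Patient has disease
    obo:CLO_0100003   # Patient is carrier of disease
  }
  ?cell rdfs:subClassOf ?rest .
  ?rest owl:onProperty ?prop ;
        owl:someValuesFrom ?sub .
  ?cell rdfs:label ?clname .
}
ORDER BY ?disease ?line
```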
Cell lines generated in Germany.
Question: Which cell lines have been generated in Germany and what are the generating institutions?
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT DISTINCT
  ?cell
  (STR(?clname) AS ?cell_line_name)
  (STR(?locationLabel) AS ?location)
  (STR(?generator) AS ?generator_name)
WHERE {
  # Cell line derived in Germany
  ?cell rdfs:subClassOf ?restriction .
  ?restriction owl:onProperty obo:CL_0100008 .
  ?restriction owl:someValuesFrom ?location .
  ?location rdfs:label ?locationLabel .
  FILTER(CONTAINS(LCASE(STR(?locationLabel)), "germany"))

  # Generator literal
  ?cell <http://purl.obolibrary.org/obo/hpscreg/generator> ?generator .

  # Cell line name
  ?cell rdfs:label ?clname .
}
ORDER BY ?cell_line_name
Variations on this query:
- replace “germany” with a different country, e.g. “france”, “united states”, “china”.
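To see which location labels can be used as a replacement, it may help to list them first. The following sketch, which is an assumption-laden variation and not one of the original examples, reuses the location property from the query above verbatim and counts the cell lines per location label; the output name ?cellLineCount is introduced here for illustration.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX obo:  <http://purl.obolibrary.org/obo/>

SELECT (STR(?locationLabel) AS ?location)
       (COUNT(DISTINCT ?cell) AS ?cellLineCount)
WHERE {
  # Same restriction pattern as above, without the country filter
  ?cell rdfs:subClassOf ?restriction .
  ?restriction owl:onProperty obo:CL_0100008 ;
               owl:someValuesFrom ?location .
  ?location rdfs:label ?locationLabel .
}
GROUP BY ?locationLabel
ORDER BY DESC(?cellLineCount)
```

Any label returned here, written in lower case, can then be substituted for “germany”.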
Cell lines being used in clinical studies.
Question: Which hPSC lines are being used in clinical studies?
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT DISTINCT
  (STR(?clname) AS ?CellLine)
  (STR(?propertyLabel) AS ?Property)
  (STR(?processLabel) AS ?Process)
WHERE {
  ?cell rdfs:subClassOf ?restriction .
  ?restriction owl:onProperty obo:RO_0000056 .
  ?restriction owl:someValuesFrom ?process .
  ?cell rdfs:label ?clname .
  OPTIONAL { obo:RO_0000056 rdfs:label ?propertyLabel . }
  OPTIONAL { ?process rdfs:label ?processLabel . }
  FILTER(STRSTARTS(STR(?cell), "http://purl.obolibrary.org/obo/CLO_")) .
  FILTER(LCASE(STR(?processLabel)) = "clinical study")
}
ORDER BY ?CellLine
What this query does:
- Cell lines are linked to the process “clinical study” through the OBO Relation Ontology property “participates in” (<http://purl.obolibrary.org/obo/RO_0000056>).
- This query retrieves all cell lines that match this filter.
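To check which other processes are recorded with the same property, the label filter can be dropped. This is a minimal sketch under the assumption that every process of interest carries an rdfs:label; it is not one of the original examples.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX obo:  <http://purl.obolibrary.org/obo/>

SELECT DISTINCT ?process (STR(?processLabel) AS ?Process)
WHERE {
  # Any class restricted by "participates in" (RO_0000056)
  ?cell rdfs:subClassOf ?restriction .
  ?restriction owl:onProperty obo:RO_0000056 ;
               owl:someValuesFrom ?process .
  ?process rdfs:label ?processLabel .
}
ORDER BY ?Process
```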
Explore data associated with a specific cell line.
Question: What kinds of data are associated with MHHi001-A?
Data can be annotated in subclasses or as literal properties.
Part A. Data annotated in structured subclasses
Each cell line is modelled as a class, and some of its data (e.g. diseases, or the source donor, whether individual or embryo) are organized in subclasses with ontological relationships. These data can be queried directly in SPARQL as subclass restrictions.
Question: What subclasses are associated with the cell line “MHHi001-A”?
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT
  (STR(?clname) AS ?CellLine)
  (STR(?property) AS ?PropertyIRI)
  (STR(SAMPLE(?propertyLabel)) AS ?PropertyLabel)
  (STR(?valueLabel) AS ?ValueLabel)
WHERE {
  # Match only the class with label exactly "MHHi001-A"
  ?cell rdfs:label ?cellLabel.
  FILTER(LCASE(STR(?cellLabel)) = "mhhi001-a").
  BIND(?cellLabel AS ?clname)

  # Get all subclass restrictions
  ?cell rdfs:subClassOf ?restriction.
  ?restriction owl:onProperty ?property.
  ?restriction owl:someValuesFrom ?value.

  # Get label for the property (predicate)
  OPTIONAL {
    ?property rdfs:label ?propertyLabel.
    FILTER(lang(?propertyLabel) = "" || lang(?propertyLabel) = "en")
  }

  # Get label for the value (object)
  OPTIONAL {
    ?value rdfs:label ?valueLabel.
    FILTER(lang(?valueLabel) = "" || lang(?valueLabel) = "en")
  }
}
GROUP BY ?cell ?clname ?valueLabel ?property
ORDER BY ?CellLine
Part B. Data annotated as literal properties
Data that are not organized within subclasses can be annotated as literal properties. Such data elements are not indexed for querying by SPARQL, but they still can be retrieved by filtering for specific properties or values.
Question: What literal properties are associated with the cell line “MHHi001-A”?
Step 1: Starting from the cell line name, retrieve the IRI.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?cellLine
WHERE {
  ?cellLine rdfs:label ?label .
  FILTER (STR(?label) = "MHHi001-A")
}
Step 2: Using the IRI, retrieve all triples associated with the cell line. Replace the IRI in both places below with the one returned by Step 1.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT
  (<http://purl.obolibrary.org/obo/CLO_0037342> AS ?cell)
  ?predicate
  ?value
WHERE {
  <http://purl.obolibrary.org/obo/CLO_0037342> ?predicate ?value .
}
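The two steps can also be combined into a single query, so that no IRI has to be copied by hand. The following is a minimal sketch of that approach, not one of the original examples; it additionally keeps only literal values via isLiteral, on the assumption that only literal properties are of interest here.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?cell ?predicate ?value
WHERE {
  # Step 1: resolve the cell line name to its IRI
  ?cell rdfs:label ?label .
  FILTER(STR(?label) = "MHHi001-A")

  # Step 2: fetch the triples of that IRI, keeping literal values only
  ?cell ?predicate ?value .
  FILTER(isLiteral(?value))
}
ORDER BY ?predicate
```

Removing the isLiteral filter returns all triples of the cell line, as in Step 2.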
Publications associated with cell lines.
Question: What publications are associated with the cell line “WTSIi018-B”?
Step 1: First retrieve the IRI of the cell line “WTSIi018-B”.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?cellLine
WHERE {
  ?cellLine rdfs:label ?label .
  FILTER (STR(?label) = "WTSIi018-B")
}
Step 2: Using the IRI for WTSIi018-B, retrieve all publications associated with this cell line and show clickable links to the publications.
SELECT DISTINCT
  (<http://purl.obolibrary.org/obo/CLO_0100975> AS ?cell)
  ?predicate
  (STR(?value) AS ?citationText)
  (IF(CONTAINS(LCASE(STR(?value)), "doi"),
      IRI(CONCAT("https://doi.org/", REPLACE(STR(?value), "DOI:", ""))),
      IF(CONTAINS(LCASE(STR(?value)), "pmid"),
         IRI(CONCAT("https://pubmed.ncbi.nlm.nih.gov/", REPLACE(STR(?value), "PMID:", ""))),
         UNDEF))
   AS ?CitationLink)
WHERE {
  <http://purl.obolibrary.org/obo/CLO_0100975> ?predicate ?value .
  FILTER (
    isLiteral(?value) &&
    (CONTAINS(LCASE(STR(?value)), "doi") || CONTAINS(LCASE(STR(?value)), "pmid"))
  )
}
Cell lines from healthy tissue.
Question: Which hiPSC cell lines have been derived from healthy tissue? Display the results sorted by chromosomal sex.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT DISTINCT
  ?cellLine
  (STR(?label) AS ?cellLineLabel)
  (STR(?sexLabel) AS ?sex)
WHERE {
  ?cellLine a owl:Class ;
            rdfs:subClassOf ?donorRestriction .
  ?donorRestriction a owl:Restriction ;
                    owl:onProperty obo:RO_0001000 ;
                    owl:someValuesFrom obo:CLO_0100000 .
  OPTIONAL {
    ?cellLine rdfs:label ?label .
    FILTER(lang(?label) = "")
  }
  OPTIONAL {
    ?cellLine rdfs:subClassOf ?sexRestriction .
    ?sexRestriction owl:onProperty obo:RO_0001000 ;
                    owl:someValuesFrom ?sexIRI .
    VALUES ?sexIRI {
      obo:UBERON_0003101  # male organism
      obo:UBERON_0003100  # female organism
    }
    BIND(
      IF(?sexIRI = obo:UBERON_0003101, "male",
        IF(?sexIRI = obo:UBERON_0003100, "female", "unknown"))
      AS ?sexLabel
    )
  }
}
ORDER BY ?sex