Help:WikiPathways SPARQL queries
WikiPathways content is replicated in a SPARQL endpoint at http://sparql.wikipathways.org/. Queries can be performed in three ways:
1. Go to the endpoint directly and create your own SPARQL query.
2. Copy and paste one of the example queries listed below into the endpoint.
3. Adapt a code example to make a SPARQL query programmatically.
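The third option can be sketched in Python as follows. This is a minimal sketch rather than the project's own client code: it assumes the endpoint accepts a standard SPARQL Protocol GET request with a `query` parameter and honours the `application/sparql-results+json` Accept header. The helper names `build_request` and `run_query` are made up for this illustration.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as listed under Resources on this page.
ENDPOINT = "https://sparql.wikipathways.org/sparql"

QUERY = """
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?organism (str(?label) AS ?name)
WHERE {
  ?concept wp:organism ?organism ;
           wp:organismName ?label .
}
"""


def build_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the SPARQL JSON results format via content negotiation
    # (assumed to be supported by the endpoint).
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_query(query, endpoint=ENDPOINT, timeout=60):
    """Send the request and return one dict per result row (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as response:
        payload = json.load(response)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]
```

Calling `run_query(QUERY)` should then give a list of dictionaries keyed by the selected variable names (here `organism` and `name`), provided the endpoint is reachable.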
This project is written up in the "Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources" paper.
Resources
- WikiPathways internal vocabularies: http://vocabularies.wikipathways.org
- WikiPathways SPARQL endpoint: https://sparql.wikipathways.org/sparql
- Identifiers.org: https://identifiers.org
- Prefix search: http://prefix.cc
Prefixes
Many of the example queries omit prefix declarations, because the SPARQL endpoint adds these prefixes automatically. The following prefixes are used in the WikiPathways RDF:
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX cur: <http://vocabularies.wikipathways.org/wp#Curation:>
PREFIX wprdf: <http://rdf.wikipathways.org/>
PREFIX biopax: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX cas: <https://identifiers.org/cas/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ncbigene: <https://identifiers.org/ncbigene/>
PREFIX pubmed: <http://www.ncbi.nlm.nih.gov/pubmed/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
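When a query is sent through a tool or endpoint that does not add these prefixes automatically, the declarations have to be written out. The sketch below shows one way to prepend the missing ones in Python. The helper name `with_prefixes` is hypothetical, only a subset of the prefixes is included, and the detection is a simple text match, so a prefix-like token inside a string literal would add a harmless extra declaration.

```python
import re

# A subset of the prefixes used in the WikiPathways RDF.
PREFIXES = {
    "wp": "http://vocabularies.wikipathways.org/wp#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def with_prefixes(query, prefixes=PREFIXES):
    """Prepend a PREFIX line for each known prefix the query uses but does not declare."""
    declared = set(
        re.findall(r"PREFIX\s+([A-Za-z][\w.-]*)\s*:", query, flags=re.IGNORECASE)
    )
    missing = [
        name
        for name in prefixes
        if name not in declared
        # Match "name:" only when it is not part of a longer name or an IRI.
        and re.search(r"(?<![\w:/])" + re.escape(name) + r":", query)
    ]
    header = "".join(f"PREFIX {name}: <{prefixes[name]}>\n" for name in missing)
    return header + query
```

For example, `with_prefixes("SELECT ?p WHERE { ?p a wp:Pathway ; dc:title ?t }")` adds declarations for `wp` and `dc` and leaves the rest of the query untouched.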
Example queries
Queries with a * require a bit more time for results.
Metadata queries
List the information about the data sets in the SPARQL endpoint:
select distinct ?dataset (str(?titleLit) as ?title) ?date ?license
where {
  ?dataset a void:Dataset ;
    dcterms:title ?titleLit ;
    dcterms:license ?license ;
    pav:createdOn ?date .
}
Pathway oriented queries
Get the species currently in WikiPathways with their respective URIs
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?organism (str(?label) as ?name)
WHERE {
  ?concept wp:organism ?organism ;
    wp:organismName ?label .
}
List pathways and their species
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT (str(?title) as ?pathway) (str(?label) as ?organism)
WHERE {
  ?pw dc:title ?title ;
    wp:organism ?organism ;
    wp:organismName ?label .
}
List the species captured in WikiPathways and the number of pathways per species
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?organism (str(?label) as ?name) (count(?pw) as ?pathwayCount)
WHERE {
  ?pw dc:title ?title ;
    wp:organism ?organism ;
    wp:organismName ?label .
}
GROUP BY ?organism ?label
ORDER BY DESC(?pathwayCount)
List all pathways for species "Mus musculus"
The following query lists all mouse pathways. ?wpIdentifier is the link through identifiers.org, ?pathway points to the RDF version of WikiPathways, and ?page is the revision that is loaded in the SPARQL endpoint.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?wpIdentifier ?pathway ?page
WHERE {
  ?pathway dc:title ?title .
  ?pathway foaf:page ?page .
  ?pathway dc:identifier ?wpIdentifier .
  ?pathway wp:organismName "Mus musculus" .
}
ORDER BY ?wpIdentifier
Get all pathways with a particular gene
List all pathways per instance of a particular gene or protein (wp:GeneProduct)
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?pathway (str(?label) as ?geneProduct)
WHERE {
  ?geneProduct a wp:GeneProduct .
  ?geneProduct rdfs:label ?label .
  ?geneProduct dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
  FILTER regex(str(?label), "CYP").
}
Get all groups and complexes containing a particular gene
List all groups and complexes per instance of a particular gene or protein (wp:GeneProduct)
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?pathway (str(?label) as ?geneProduct)
WHERE {
  ?geneProduct a wp:GeneProduct .
  ?geneProduct rdfs:label ?label .
  ?geneProduct dcterms:isPartOf ?pathway .
  FILTER NOT EXISTS { ?pathway a wp:Interaction } .
  FILTER NOT EXISTS { ?pathway a wp:Pathway } .
  FILTER regex(str(?label), "CYP").
}
Get all the genes on a particular pathway
List all the genes and proteins (wp:GeneProduct) associated with a particular pathway WPID.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
select distinct ?pathway (str(?label) as ?geneProduct)
where {
  ?geneProduct a wp:GeneProduct .
  ?geneProduct rdfs:label ?label .
  ?geneProduct dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
  ?pathway dcterms:identifier "WP1560" .
}
Count the number of pathways per ontology term
In WikiPathways, pathways can be tagged with ontology terms from Pathway, Cell Line and Disease ontology. The following query returns a pathway count for each term from any of the available ontologies. These terms are collectively modeled as wp:pathwayOntology; but this includes all ontologies, not just the "Pathway" ontology.
SELECT DISTINCT ?pwOntologyTerm (count(?pwOntologyTerm) as ?pathwayCount)
WHERE {
  ?pathwayRDF wp:ontologyTag ?pwOntologyTerm .
}
GROUP BY ?pwOntologyTerm
ORDER BY DESC(?pathwayCount)
Get all pathways with a particular ontology term
In WikiPathways, pathways can be tagged with ontology terms from Pathway, Cell Line and Disease ontology. The following query returns a list of pathways tagged with PW_0000296.
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT ?pathway (str(?titleLit) AS ?title)
WHERE {
  ?pathwayRDF wp:ontologyTag obo:PW_0000296 ;
    foaf:page ?pathway ;
    dc:title ?titleLit .
}
Get all ontology terms for a particular pathway
List all the ontology terms tagged on a particular pathway.
SELECT (?o as ?pwOntologyTerm) (str(?titleLit) as ?title) ?pathway
WHERE {
  ?pathwayRDF wp:ontologyTag ?o ;
    foaf:page ?pathway ;
    dc:title ?titleLit ;
    dcterms:identifier "WP1560" .
  FILTER (! regex(str(?pathway), "group"))
}
Get all Reactome pathways
List all pathways that carry the Reactome_Approved curation tag.
PREFIX cur: <http://vocabularies.wikipathways.org/wp#Curation:>
SELECT DISTINCT ?pathway (str(?titleLit) as ?title)
WHERE {
  ?pathway wp:ontologyTag cur:Reactome_Approved ;
    dc:title ?titleLit .
}
Get all Proteins from Community pathways
Obtain all Protein DataNodes (so not GeneProducts) from specific communities. Note: All community tags are available in the WikiPathways SPARQL endpoint now (the "Blau" community was updated to "IEM").
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX cur: <http://vocabularies.wikipathways.org/wp#Curation:>
SELECT DISTINCT ?pathway (str(?label) as ?Protein)
WHERE {
  ?pathway wp:ontologyTag cur:Lipids ; # Other communities: AOP ; CIRM_Related ; COVID19 ; IEM ; RareDiseases ; SGD_Approved ; WormBase_Approved
    a wp:Pathway .
  ?protein a wp:Protein ;
    rdfs:label ?label ;
    dcterms:isPartOf ?pathway .
}
Get all pathways with PubMed references
SELECT DISTINCT ?pathway ?pubmed
WHERE {
  ?pubmed a wp:PublicationReference .
  ?pubmed dcterms:isPartOf ?pathway
}
ORDER BY ?pathway
LIMIT 100
Get all pathways with a particular PubMed reference
SELECT DISTINCT ?pathway ?pubmed
WHERE {
  ?pubmed a wp:PublicationReference .
  ?pubmed dcterms:isPartOf ?pathway .
  FILTER regex(str(?pubmed), "14769483$")
}
ORDER BY ?pathway
The $ at the end of the PubMed identifier ensures that, for example, 147694831 does not match too; the regex instruction '$' means "end of the string".
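The effect of the trailing `$` can be checked outside the endpoint with Python's `re` module, which treats the anchor the same way for a simple pattern like this one. The two IRIs below are hypothetical and only serve to show an identifier with and without an extra trailing digit.

```python
import re

# Hypothetical reference IRIs; only the trailing PubMed identifier matters here.
iris = [
    "http://identifiers.org/pubmed/14769483",
    "http://identifiers.org/pubmed/147694831",
]

# With the end-of-string anchor only the exact identifier matches.
anchored = [iri for iri in iris if re.search(r"14769483$", iri)]

# Without the anchor the longer identifier matches as well.
unanchored = [iri for iri in iris if re.search(r"14769483", iri)]
```

Note that the pattern is not anchored on the left, so an identifier that merely ends in 14769483 would still match. If the IRIs end in a slash followed by the identifier (an assumption about their layout), the pattern "/14769483$" would rule that out.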
Get all pathways and the number of references per pathway
SELECT DISTINCT ?pathway (COUNT(?pubmed) AS ?numberOfReferences)
WHERE {
  ?pubmed a wp:PublicationReference .
  ?pubmed dcterms:isPartOf ?pathway
}
GROUP BY ?pathway
ORDER BY DESC(?numberOfReferences)
Get a full dump of all pathways and their pathway ontological terms
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?depicts (str(?titleLit) as ?title) (str(?speciesLabelLit) as ?speciesLabel) ?identifier ?ontology
WHERE {
  ?pathway foaf:page ?depicts .
  ?pathway dc:title ?titleLit .
  ?pathway wp:organism ?species .
  ?pathway wp:organismName ?speciesLabelLit .
  ?pathway dc:identifier ?identifier .
  OPTIONAL { ?pathway wp:ontologyTag ?ontology . }
}
LIMIT 100
Count how many genes from a list occur in one or more pathways
The gene list is given in the line starting with VALUES. For this example, we use the HGNC gene nomenclature.
Other gene IDs can be used to fill the list. However, keep in mind they use different URLs and therefore the prefix (first line of query) should be adapted. A list of these URLs can be found on identifiers.org (or by looking at the "raw" RDF files).
prefix hgnc: <https://identifiers.org/hgnc.symbol/>
select distinct ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (count(distinct ?hgncId) AS ?GenesInPWs)
where {
  VALUES ?hgncId { hgnc:GPD1L hgnc:SCN3B hgnc:BAD }
  ?gene a wp:GeneProduct ;
    dcterms:identifier ?id ;
    dcterms:isPartOf ?pathwayRes ;
    wp:bdbHgncSymbol ?hgncId .
  ?pathwayRes a wp:Pathway ;
    wp:organismName "Homo sapiens" ;
    dcterms:identifier ?wpid ;
    dc:title ?title .
}
GROUP BY ?pathwayRes ?wpid ?title
ORDER BY DESC(?GenesInPWs)
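For longer gene lists, the VALUES line can be generated rather than typed by hand. The following is a minimal Python sketch; `values_clause` is a hypothetical helper name, and the character check is a deliberately conservative assumption about which symbols can be written as prefixed names.

```python
import re


def values_clause(variable, prefix, symbols):
    """Return a SPARQL VALUES clause such as 'VALUES ?hgncId { hgnc:BAD }'."""
    for symbol in symbols:
        # Accept only characters that are unproblematic in a prefixed name;
        # anything else would need a full IRI in angle brackets instead.
        if not re.fullmatch(r"[A-Za-z0-9_-]+", symbol):
            raise ValueError(f"cannot write {symbol!r} as a prefixed name")
    terms = " ".join(f"{prefix}:{symbol}" for symbol in symbols)
    return f"VALUES ?{variable} {{ {terms} }}"


clause = values_clause("hgncId", "hgnc", ["GPD1L", "SCN3B", "BAD"])
```

The resulting string can replace the VALUES line of the query above. For another identifier system, both the prefix declaration and the `wp:bdb...` property in the query would have to be changed accordingly.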
LIPID MAPS related queries:
Count the number of lipids with a LIPID MAPS identifier per pathway
Uses the LIPID MAPS identifiers to which all metabolite identifiers are mapped (mappings provided by BridgeDb) to create an ordered list of pathways that include lipid compounds.
prefix lipidmaps: <https://identifiers.org/lipidmaps/>
select distinct ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (count(distinct ?lipidID) AS ?LipidsInPWs)
where {
  ?metabolite a wp:Metabolite ;
    dcterms:identifier ?id ;
    dcterms:isPartOf ?pathwayRes ;
    wp:bdbLipidMaps ?lipidID .
  ?pathwayRes a wp:Pathway ;
    wp:organismName "Homo sapiens" ;
    dcterms:identifier ?wpid ;
    dc:title ?title .
}
GROUP BY ?pathwayRes ?wpid ?title
ORDER BY DESC(?LipidsInPWs)
Count the number of lipids per LIPID MAPS ontology class
Counts the unique LIPID MAPS identifiers (provided by BridgeDb) for the fatty acid (FA) class; other classes are listed in a comment.
select (count(distinct ?lipidID) as ?IndividualLipidsPerClass_FA)
where {
  ?metabolite a wp:Metabolite ;
    dcterms:identifier ?id ;
    dcterms:isPartOf ?pathwayRes ;
    wp:bdbLipidMaps ?lipidID .
  ?pathwayRes a wp:Pathway ;
    wp:organismName "Homo sapiens" ;
    dcterms:identifier ?wpid ;
    dc:title ?title .
  FILTER regex(str(?lipidID), "FA" ). # Other classes: GL, GP, SP, ST, PR, SL, PK
}
Find pathways per LIPID MAPS ontology class, sorted by the number of unique lipids
Filters the unique LIPID MAPS identifiers (provided by BridgeDb) for the fatty acid (FA) class and finds all pathways that contain individual lipids from that class.
select distinct ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (count(distinct ?lipidID) AS ?FA_LipidsInPWs)
where {
  ?metabolite a wp:Metabolite ;
    dcterms:identifier ?id ;
    dcterms:isPartOf ?pathwayRes ;
    wp:bdbLipidMaps ?lipidID .
  ?pathwayRes a wp:Pathway ;
    wp:organismName "Homo sapiens" ;
    dcterms:identifier ?wpid ;
    dc:title ?title .
  FILTER regex(str(?lipidID), "FA" ). # Fatty acids. Other classes: GL, GP, SP, ST, PR, SL, PK
}
GROUP BY ?pathwayRes ?wpid ?title
ORDER BY DESC(?FA_LipidsInPWs)
Data statistics oriented queries
Count the number of metabolites per species
Strictly speaking, this is an estimate, because the query counts the number of unique metabolite identifiers. Normalization in the RDF generation code ensures that metabolites with identifiers from different databases are not double counted, but metabolites in different charge states are still counted separately.
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select (count(distinct ?metabolite) as ?count) (str(?label) as ?species)
where {
  ?metabolite a wp:Metabolite ;
    dcterms:isPartOf ?pw .
  ?pw dc:title ?title ;
    wp:organism ?organism ;
    wp:organismName ?label .
}
GROUP BY ?label
ORDER BY DESC(?count)
Interaction oriented queries
Get all interactions for a particular datanode
Find all interactions (wp:Interaction) that are connected to a particular datanode.
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Find all interactions that are connected to a particular datanode.
SELECT DISTINCT ?interaction ?pathway
WHERE {
  ?pathway a wp:Pathway .
  ?interaction dcterms:isPartOf ?pathway .
  ?interaction a wp:Interaction .
  ?interaction wp:participants <https://identifiers.org/ensembl/ENSG00000125845> .
}
Find all datanodes (GeneProducts, Metabolites, Pathways) that are connected to a particular datanode via any type of interaction (wp:Interaction)
SELECT DISTINCT ?participants ?DataNodeLabel ?interaction
WHERE {
  ?interaction a wp:Interaction .
  ?interaction wp:participants <https://identifiers.org/ensembl/ENSG00000125845> .
  ?interaction wp:participants ?participants .
  ?participants a wp:DataNode .
  ?participants rdfs:label ?DataNodeLabel .
}
Get all interactions for a particular pathway
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?pathway ?interaction
WHERE {
  ?pathway a wp:Pathway .
  ?pathway dc:identifier <https://identifiers.org/wikipathways/WP1425> .
  ?interaction dcterms:isPartOf ?pathway .
  ?interaction a wp:Interaction .
}
Get all interactions for a particular pathway and their participants
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?pathway ?interaction ?participants ?DataNodeLabel
WHERE {
  ?pathway a wp:Pathway .
  ?pathway dc:identifier <https://identifiers.org/wikipathways/WP1425> .
  ?interaction dcterms:isPartOf ?pathway .
  ?interaction a wp:Interaction .
  ?interaction wp:participants ?participants .
  ?participants a wp:DataNode .
  ?participants rdfs:label ?DataNodeLabel .
}
Get all Interactions
Limited to 100 interactions.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?pathway ?interaction ?participant
WHERE {
  ?pathway a wp:Pathway .
  ?interaction dcterms:isPartOf ?pathway .
  ?interaction a wp:Interaction .
  ?interaction wp:participants ?participant .
}
LIMIT 100
Get all Interactions for a species (Homo sapiens)
We've added a LIMIT to this query, since it takes some time to run. Remove the 'LIMIT 100' to obtain all interactions.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?pathway ?interaction
WHERE {
  ?pathway a wp:Pathway .
  ?pathway wp:organismName "Homo sapiens" .
  ?interaction dcterms:isPartOf ?pathway .
  ?interaction a wp:Interaction .
}
LIMIT 100
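As an alternative to removing the limit altogether, a large result can be fetched in pages with the standard SPARQL LIMIT and OFFSET modifiers. The Python sketch below only builds the query text; `paged` is a hypothetical helper, it expects a query without a LIMIT of its own, and it assumes the query has an ORDER BY so that consecutive pages do not overlap. Whether the endpoint enforces its own maximum number of rows is not covered here.

```python
def paged(query, page_size=100, page=0):
    """Append LIMIT/OFFSET so that a large result set can be fetched page by page."""
    if page_size <= 0 or page < 0:
        raise ValueError("page_size must be positive and page must not be negative")
    return f"{query.rstrip()}\nLIMIT {page_size}\nOFFSET {page * page_size}"


# The human interactions query from above, without its LIMIT and with an ORDER BY.
BASE_QUERY = """
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?pathway ?interaction
WHERE {
  ?pathway a wp:Pathway .
  ?pathway wp:organismName "Homo sapiens" .
  ?interaction dcterms:isPartOf ?pathway .
  ?interaction a wp:Interaction .
}
ORDER BY ?pathway ?interaction
"""

third_page = paged(BASE_QUERY, page_size=100, page=2)
```

Here `third_page` ends with `LIMIT 100` and `OFFSET 200`, so it asks for rows 201 to 300 of the ordered result.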
Get downstream adjacent nodes from a source
A directed interaction always runs from source to target (e.g. s --> t).
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?source ?label ?target ?label1 ?pathway ?interaction
WHERE {
  ?source dc:identifier <https://identifiers.org/ensembl/ENSG00000125845> .
  ?source dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
  ?interaction dcterms:isPartOf ?pathway .
  ?interaction a wp:Interaction .
  ?interaction wp:source ?source .
  ?interaction wp:target ?target .
  ?source rdfs:label ?label .
  ?target rdfs:label ?label1 .
}
Get upstream adjacent nodes from a target
A directed interaction always runs from source to target (e.g. t <-- s).
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?target ?label1 ?source ?label ?pathway ?interaction
WHERE {
  ?target dc:identifier <https://identifiers.org/ncbigene/659> .
  ?target dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
  ?interaction dcterms:isPartOf ?pathway .
  ?interaction a wp:Interaction .
  ?interaction wp:target ?target .
  ?interaction wp:source ?source .
  ?target rdfs:label ?label1 .
  ?source rdfs:label ?label .
}
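Once source/target pairs have been retrieved with queries like the two above, downstream and upstream neighbours are simply the two directions of the same edge list. The Python sketch below illustrates this with made-up node names; it does not use real WikiPathways data.

```python
from collections import defaultdict

# Hypothetical (source, target) pairs, shaped like the ?source and ?target columns.
edges = [("A", "B"), ("A", "C"), ("B", "C")]

downstream = defaultdict(set)  # source -> nodes it points to
upstream = defaultdict(set)    # target -> nodes pointing to it

for source, target in edges:
    downstream[source].add(target)
    upstream[target].add(source)
```

With these edges, `downstream["A"]` contains B and C, while `upstream["C"]` contains A and B.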
Get the number of interactions with some data source
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT (STR(?sourceLit) AS ?source) (COUNT(DISTINCT ?identifier) AS ?count)
WHERE {
  ?interaction a wp:Interaction ;
    dcterms:identifier ?identifier ;
    dc:source ?sourceLit .
}
GROUP BY ?sourceLit
ORDER BY DESC(?count)
Get the number of normalized identifiers by data source
This query gives some idea of how the interaction identifier mapping database works in practice:
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT (STR(?sourceLit) AS ?source) ?normalizedIdentifier (COUNT(DISTINCT ?identifier) AS ?count)
WHERE {
  VALUES ?normalizedIdentifier { wp:bdbRhea wp:bdbReactome }
  ?interaction ?normalizedIdentifier ?identifier ;
    dc:source ?sourceLit .
}
GROUP BY ?sourceLit ?normalizedIdentifier
ORDER BY DESC(?count) ?source
Datasource oriented queries
Get all datasources currently captured in WikiPathways
SELECT DISTINCT (str(?datasourceLit) as ?datasource) WHERE { ?concept dc:source ?datasourceLit }
Get the number of entries per datasource in WikiPathways
SELECT DISTINCT (str(?datasourceLit) as ?datasource) (count(?dataNode) as ?numberEntries)
WHERE {
  ?concept dc:source ?datasourceLit ;
    wp:isAbout ?dataNode .
}
GROUP BY ?datasourceLit
ORDER BY DESC(?numberEntries)
Count the identifiers per data source
SELECT (str(?datasourceLit) as ?datasource) (count(distinct ?identifier) AS ?numberEntries)
WHERE {
  ?concept dc:source ?datasourceLit .
  ?concept dc:identifier ?identifier
}
GROUP BY ?datasourceLit
Count the identifiers per data source and order them from high to low
SELECT (str(?datasourceLit) as ?datasource) (count(distinct ?identifier) AS ?numberEntries)
WHERE {
  ?concept dc:source ?datasourceLit .
  ?concept dc:identifier ?identifier
}
GROUP BY ?datasourceLit
ORDER BY DESC(?numberEntries)
Return all compounds annotated with the "ChEMBL compound" as data source and the pathways they are in
SELECT DISTINCT ?identifier ?pathway
WHERE {
  ?concept dcterms:isPartOf ?pathway .
  ?concept dc:source "ChEMBL compound" .
  ?concept dc:identifier ?identifier .
}
Curators oriented queries
Get the pathway with the erroneous data source "null"
SELECT DISTINCT ?identifier ?pathway ?label
WHERE {
  ?concept dc:source "null" .
  ?concept dc:identifier ?identifier .
  ?concept dcterms:isPartOf ?pathway .
  ?concept rdfs:label ?label
}
Get all geneproducts that lack either a DataSource or an Identifier
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select distinct ?pathway ?label
where {
  ?geneProduct a wp:GeneProduct .
  ?geneProduct rdfs:label ?label .
  ?geneProduct dcterms:isPartOf ?pathway .
  FILTER regex(str(?geneProduct), "^node").
  FILTER regex(str(?pathway), "^http").
}
Get entities with more than one identifier
This query contains a limit, since it might take some time to run.
select ?entity (count(?identifier) as ?count)
where {
  ?entity <http://purl.org/dc/terms/identifier> ?identifier .
}
group by ?entity
order by desc(?count)
LIMIT 100
PubChem-compound 1004
Warning: may time out, which indicates there is no current use of this identifier value.
CID 1004 is sometimes wrongly used for phosphate, but it is the uncharged compound. Phosphate, and particularly things like "Pi", should instead be annotated with CID 1061 for ortho-phosphate, aka [PO4]3-.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?pathway ?source
where {
  ?mb dc:source ?source ;
    dcterms:isPartOf ?pathway ;
    dc:source "Pubchem Compound" ;
    dcterms:identifier "1004".
}
Entrez Gene 1004
This query will return results, since the ID '1004' is in use by Entrez Gene, and we are not filtering for a specific database name.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?pathway ?source
where {
  ?mb dc:source ?source ;
    dcterms:isPartOf ?pathway ;
    dcterms:identifier "1004".
}
Outdated HMDB identifiers
These results show HMDB identifiers used in WikiPathways but that are revoked or have become secondary identifiers.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select distinct (str(?identifierStr) as ?identifier)
where {
  ?mb a wp:Metabolite ;
    dc:source "HMDB"^^xsd:string ;
    dcterms:identifier ?identifierStr .
  OPTIONAL { ?mb wp:bdbHmdb ?bridgedb . }
  FILTER (!BOUND(?bridgedb))
}
order by ?identifierStr
Metabolites not classified as such
One can list all data sources for non-metabolites with this query.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select (str(?datasourceLit) as ?datasource) (count(?identifier) as ?count)
where {
  ?mb dc:source ?datasourceLit ;
    dcterms:identifier ?identifier .
  FILTER NOT EXISTS { ?mb a wp:Metabolite }
}
group by ?datasourceLit
order by asc(?count)
Check the full list, to see which databases normally used for chemicals are not typed as Metabolites. Gene identifier sources are most likely at the bottom. Note: Wikidata can be used for Metabolites, Proteins, and Genes (but is preferably used only for Metabolites, since there are no mappings available yet to map to other gene databases).
Example (data from 2022-01-10)
ChEBI 1
XMetDB 2
Wikidata 90
Metabolites sometimes marked as DataNode@Type Metabolite
Based on label comparisons, we can find data nodes that are not typed as a metabolite but share their label with a data node that is. Of course, this can give false positives, because genes can be incorrectly marked as metabolite in some pathway, but that is another SPARQL query. Another reason is that sometimes genes and metabolites actually have the same name!
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?pathway ?nonmb ?mb (str(?labelLit) as ?label)
where {
  ?nonmb rdfs:label ?labelLit .
  ?mb rdfs:label ?labelLit .
  ?nonmb dcterms:isPartOf ?pathway .
  FILTER ( ?nonmb != ?mb )
  FILTER NOT EXISTS { ?nonmb a wp:Metabolite }
  FILTER EXISTS { ?pathway a wp:Pathway }
  FILTER EXISTS { ?mb a wp:Metabolite }
}
Metabolites with too many labels
This query can result in false positives too, particularly with the new RDF.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select distinct ?mb (count(distinct ?labelLit) as ?labelCount)
where {
  ?mb a wp:Metabolite ;
    rdfs:label ?labelLit ;
    dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
}
group by ?mb
order by desc(?labelCount) ?mb
And get the actual labels (and more) with the following query (use Ctrl+F to search for the term "label"):
Describe <https://identifiers.org/chebi/CHEBI:35366>
Or use this one, but mind that pathway/label combinations are combinatorial, because they share the same node:
select distinct ?pathway (str(?labelLit) as ?label)
where {
  <https://identifiers.org/hmdb/HMDB0001401> a wp:Metabolite ;
    rdfs:label ?labelLit ;
    dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
}
order by ?pathway
Metabolites with an Entrez Gene identifier
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?pathway ?mb (str(?labelLit) as ?label) (str(?identifierLit) as ?identifier)
where {
  ?mb a wp:Metabolite ;
    rdfs:label ?labelLit ;
    dc:source "Entrez Gene" ;
    dcterms:identifier ?identifierLit ;
    dcterms:isPartOf ?pathway .
  ?pathway a wp:Pathway .
}
order by ?pathway
Metabolites without a link to Wikidata
This query provides a list of IDs which might be relevant to add to Wikidata.
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?metabolite
WHERE {
  ?metabolite a wp:Metabolite .
  OPTIONAL { ?metabolite wp:bdbWikidata ?wikidata . }
  FILTER (!BOUND(?wikidata))
}
A variant sorting the metabolites by the number of pathways they occur in:
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?metabolite (count(DISTINCT ?pathwayRes) as ?pathways)
WHERE {
  ?metabolite a wp:Metabolite ;
    dcterms:identifier ?id ;
    dcterms:isPartOf ?pathwayRes .
  ?pathwayRes a wp:Pathway .
  OPTIONAL { ?metabolite wp:bdbWikidata ?wikidata . }
  FILTER (!BOUND(?wikidata))
}
GROUP BY ?metabolite
ORDER BY DESC(?pathways)
And finally, finding double mappings to Wikidata (which is also a nice curation task):
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?metaboliteID (GROUP_CONCAT(DISTINCT ?wikidata;separator=", ") AS ?results)
WHERE {
  ?metaboliteID a wp:Metabolite .
  ?metaboliteID wp:bdbWikidata ?wikidata .
  ?metaboliteID wp:bdbWikidata ?wikidata2 .
  FILTER(?wikidata != ?wikidata2)
}
GROUP BY ?metaboliteID
Pathways without (annotated) datanodes
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?pathway
WHERE {
  ?pathway a wp:Pathway .
  FILTER NOT EXISTS {
    ?node dcterms:isPartOf ?pathway .
    ?node a wp:DataNode
  }
}
Pathways without literature references
SELECT (STR(?speciesLabelLit) AS ?species) (STR(?titleLit) AS ?title) ?pathway
WHERE {
  ?pathway a wp:Pathway ;
    dc:title ?titleLit ;
    wp:organismName ?speciesLabelLit .
  MINUS {
    ?pubmed a wp:PublicationReference .
    ?pubmed dcterms:isPartOf ?pathway
  }
}
ORDER BY ASC(?species) ASC(?title)
Or if you just want to know how many these are:
SELECT (count(DISTINCT ?pathway) AS ?count)
WHERE {
  ?pathway a wp:Pathway ;
    dc:title ?titleLit ;
    wp:organismName ?speciesLabelLit .
  MINUS {
    ?pubmed a wp:PublicationReference .
    ?pubmed dcterms:isPartOf ?pathway
  }
}
Literature queries
Articles cited by Reactome but not by WikiPathways
PREFIX cur: <http://vocabularies.wikipathways.org/wp#Curation:>
SELECT (COUNT(DISTINCT ?pubmed) AS ?count)
WHERE {
  ?pubmed a wp:PublicationReference .
  MINUS { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:AnalysisCollection }
  { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:Reactome_Approved }
}
Articles cited by WikiPathways but not by Reactome
PREFIX cur: <http://vocabularies.wikipathways.org/wp#Curation:>
SELECT (COUNT(DISTINCT ?pubmed) AS ?count)
WHERE {
  ?pubmed a wp:PublicationReference .
  { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:AnalysisCollection }
  MINUS { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:Reactome_Approved }
}
Articles cited by both Reactome and WikiPathways
PREFIX cur: <http://vocabularies.wikipathways.org/wp#Curation:>
SELECT (COUNT(DISTINCT ?pubmed) AS ?count)
WHERE {
  ?pubmed a wp:PublicationReference .
  { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:AnalysisCollection }
  { ?pubmed dcterms:isPartOf/wp:ontologyTag cur:Reactome_Approved }
}
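The three literature queries correspond to plain set operations on the references of the two collections: MINUS acts as a set difference, and requiring both group patterns acts as an intersection. The Python sketch below shows the same logic on made-up identifiers; the numbers are placeholders, not real PubMed IDs or results.

```python
# Hypothetical reference identifiers per collection (placeholders only).
reactome = {"111", "222", "333"}      # cited in Reactome_Approved pathways
wikipathways = {"222", "333", "444"}  # cited in AnalysisCollection pathways

only_reactome = reactome - wikipathways      # Reactome but not WikiPathways
only_wikipathways = wikipathways - reactome  # WikiPathways but not Reactome
both = reactome & wikipathways               # cited by both
```

The three counts returned by the queries relate in the same way as the sizes of these three sets.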
Federated queries - !Under Construction!
Other SPARQL endpoints used in the federated queries
Note that the EBI endpoints are not maintained and therefore do not yield results at the moment. We created a mirror for the ChEMBL endpoint at:
- ChEMBL: https://chemblmirror.rdf.bigcat-bioinformatics.org/ (old: https://www.ebi.ac.uk/rdf/services/sparql and http://rdf.farmbio.uu.se/chembl/sparql/ ).
Down/unavailable (EBI):
- Arrayexpress Atlas: http://wwwdev.ebi.ac.uk/fgpt/gxa-sparql/index.jsp
- Reactome endpoint.
Other endpoints:
- Gene Wiki: http://genewiki.semwebinsi.de
- Text mining from Fraunhofer Institutes at University of Bonn: http://ops-virtuoso.scai.fraunhofer.de:8893/sparql
- Wikidata: query.wikidata.org
- IDSM (MolMeDB): https://idsm.elixir-czech.cz/sparql/endpoint/molmedb
WikiPathways with ChEMBL: all ChEMBL assays for pathways
PREFIX chembl: <http://rdf.ebi.ac.uk/terms/chembl#>
SELECT ?pathway ?ensembl ?assay
WHERE {
  {
    SELECT DISTINCT ?pathway ?ensembl
    WHERE {
      VALUES ?ensembl { <https://identifiers.org/ensembl/ENSG00000150093> }
      ?s wp:bdbEnsembl ?ensembl ;
        dcterms:isPartOf ?pathway .
    }
  }
  SERVICE <https://chemblmirror.rdf.bigcat-bioinformatics.org/sparql> {
    OPTIONAL {
      ?assay a chembl:Assay ;
        chembl:hasTarget/chembl:hasTargetComponent/chembl:targetCmptXref ?ensembl .
    }
  }
}
limit 100
WikiPathways with ChEMBL: all molecules targeting pathways
Here a limit is used too, as well as an IRI rewrite (from HTTPS to HTTP).
PREFIX chembl: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX sio: <http://semanticscience.org/resource/>
SELECT ?pathway ?ensembl ?molecule ?smiles
WHERE {
  {
    SELECT DISTINCT ?pathway ?ensembl ?newIRI
    WHERE {
      ?s wp:bdbEnsembl ?ensembl ;
        dcterms:isPartOf ?pathway .
      ?pathway dcterms:identifier "WP15".
      # The Ensembl IRIs from WikiPathways start with https://, whereas those from ChEMBL start with http://, so we need to rewrite the IRI.
      BIND(                  # Bind the created IRI to a new variable (called ?newIRI)
        IRI(                 # Convert the string back to an IRI
          CONCAT(            # Concatenate items 1 and 2 into one string
            "http",          # First item to concatenate (more items can be added, separated by commas)
            SUBSTR(          # Second item to concatenate: a substring
              STR(?ensembl), # Convert the Ensembl IRI from WikiPathways to a string,
              6)             # keeping everything from the 6th character onwards (this drops "https")
          )
        ) AS ?newIRI         # Name of the new variable
      )
    }
  }
  SERVICE <https://chemblmirror.rdf.bigcat-bioinformatics.org/sparql> {
    SELECT DISTINCT ?newIRI ?molecule ?smiles
    WHERE {
      ?assay a chembl:Assay ;
        chembl:hasTarget/chembl:hasTargetComponent/chembl:targetCmptXref ?newIRI .
      ?activity chembl:hasAssay ?assay ;
        chembl:hasMolecule ?molecule .
      OPTIONAL {
        ?molecule sio:SIO_000008 ?attrib .
        ?attrib a sio:CHEMINF_000018 ;
          sio:SIO_000300 ?smiles .
      }
    }
    limit 100
  }
}
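The IRI rewrite in the query above relies on SPARQL's SUBSTR being 1-based. A Python equivalent, given here only as an illustration, makes the index arithmetic explicit. The function name `https_to_http` is hypothetical, and the `startswith` guard is an addition that the SPARQL version does not have.

```python
def https_to_http(iri):
    """Mirror CONCAT("http", SUBSTR(STR(?ensembl), 6)) from the SPARQL query."""
    if not iri.startswith("https://"):
        # Leave IRIs alone that do not use the https scheme.
        return iri
    # SPARQL's SUBSTR counts from 1, so position 6 is the ":" after "https";
    # the equivalent 0-based Python slice starts at index 5.
    return "http" + iri[5:]


rewritten = https_to_http("https://identifiers.org/ensembl/ENSG00000150093")
```

Here `rewritten` is the same Ensembl IRI with the `http://` scheme, which is the form the query matches against `chembl:targetCmptXref` in the ChEMBL data.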
WikiPathways with EBI Atlas RDF
Unfortunately, this data is no longer hosted on a SPARQL endpoint by EBI (so the federated queries will not work).
Genes differentially expressed in asthma and Pathways
For the genes differentially expressed in asthma, get the gene products associated to a WikiPathways pathway. (Built upon example query 5 in: http://www.ebi.ac.uk/rdf/services/atlas/sparql ). You can substitute the EFO number for other disease codes.
PREFIX identifiers: <https://identifiers.org/ensembl/>
PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT DISTINCT ?wpURL ?pwTitle ?expressionValue ?pvalue
where {
  SERVICE <https://www.ebi.ac.uk/rdf/services/atlas/sparql> {
    ?factor rdf:type efo:EFO_0000270 .
    ?value atlasterms:hasFactorValue ?factor .
    ?value atlasterms:isMeasurementOf ?probe .
    ?value atlasterms:pValue ?pvalue .
    ?value rdfs:label ?expressionValue .
    ?probe atlasterms:dbXref ?dbXref .
  }
  ?pwElement dcterms:isPartOf ?pathway .
  ?pathway dc:title ?pwTitle .
  ?pathway dc:identifier ?wpURL .
  ?pwElement wp:bdbEnsembl ?dbXref .
}
ORDER BY ASC(?pvalue)
Genes differentially expressed in type II diabetes mellitus and Pathways
PREFIX identifiers: <https://identifiers.org/ensembl/>
PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT DISTINCT ?wpURL ?pwTitle ?expressionValue ?pvalue
where {
  SERVICE <https://www.ebi.ac.uk/rdf/services/atlas/sparql> {
    ?factor rdf:type efo:EFO_0001360 .
    ?value atlasterms:hasFactorValue ?factor .
    ?value atlasterms:isMeasurementOf ?probe .
    ?value atlasterms:pValue ?pvalue .
    ?value rdfs:label ?expressionValue .
    ?probe atlasterms:dbXref ?dbXref .
  }
  ?pwElement dcterms:isPartOf ?pathway .
  ?pathway dc:title ?pwTitle .
  ?pathway dc:identifier ?wpURL .
  ?pwElement wp:bdbEnsembl ?dbXref .
}
ORDER BY ASC(?pvalue)
Genes differentially expressed in obesity and Pathways
PREFIX identifiers:<https://identifiers.org/ensembl/> PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/> PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/> PREFIX efo: <http://www.ebi.ac.uk/efo/> SELECT DISTINCT ?wpURL ?pwTitle ?expressionValue ?pvalue where { SERVICE <https://www.ebi.ac.uk/rdf/services/atlas/sparql> { ?factor rdf:type efo:EFO_0001073 . ?value atlasterms:hasFactorValue ?factor . ?value atlasterms:isMeasurementOf ?probe . ?value atlasterms:pValue ?pvalue . ?value rdfs:label ?expressionValue . ?probe atlasterms:dbXref ?dbXref . } ?pwElement dcterms:isPartOf ?pathway . ?pathway dc:title ?pwTitle . ?pathway dc:identifier ?wpURL . ?pwElement wp:bdbEnsembl ?dbXref . } ORDER BY ASC(?pvalue)
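The three Atlas queries in this section differ only in the EFO class used for `?factor`. A minimal sketch of how that substitution could be done programmatically is shown below; the class name, the `@EFO@` placeholder and the identifier check (the pattern `EFO_` followed by seven digits is an assumption based on the three identifiers used here) are hypothetical, and the unused prefixes are left out.

```java
// Sketch: fill the EFO class into the Atlas query used in this section.
final class AtlasQueryTemplate {

    // Query text from this section, with the EFO class replaced by a placeholder.
    private static final String TEMPLATE = String.join("\n",
        "PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>",
        "PREFIX efo: <http://www.ebi.ac.uk/efo/>",
        "SELECT DISTINCT ?wpURL ?pwTitle ?expressionValue ?pvalue WHERE {",
        "  SERVICE <https://www.ebi.ac.uk/rdf/services/atlas/sparql> {",
        "    ?factor rdf:type efo:@EFO@ .",
        "    ?value atlasterms:hasFactorValue ?factor .",
        "    ?value atlasterms:isMeasurementOf ?probe .",
        "    ?value atlasterms:pValue ?pvalue .",
        "    ?value rdfs:label ?expressionValue .",
        "    ?probe atlasterms:dbXref ?dbXref .",
        "  }",
        "  ?pwElement dcterms:isPartOf ?pathway .",
        "  ?pathway dc:title ?pwTitle .",
        "  ?pathway dc:identifier ?wpURL .",
        "  ?pwElement wp:bdbEnsembl ?dbXref .",
        "}",
        "ORDER BY ASC(?pvalue)");

    static String forDisease(String efoId) {
        // Assumed identifier shape; rejects anything that could alter the query.
        if (!efoId.matches("EFO_\\d{7}")) {
            throw new IllegalArgumentException("Unexpected EFO identifier: " + efoId);
        }
        return TEMPLATE.replace("@EFO@", efoId);
    }

    public static void main(String[] args) {
        // EFO_0001073 is the obesity class used in this section.
        System.out.println(forDisease("EFO_0001073"));
    }
}
```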
WikiPathways with Wikidata - !Under Construction!
The following queries are run on the WikiPathways SPARQL endpoint, but you can also run federated queries on the Wikidata SPARQL endpoint. Example queries for that are found here.
Metabolites in WikiPathways with InChIKeys from Wikidata
The corresponding query on Wikidata is here.
PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT ?metabolite ?wikidata ?inchikey WHERE { { SELECT ?metabolite ?wikidata WHERE { ?metabolite a wp:Metabolite ; wp:bdbWikidata ?wikidata . } LIMIT 50 } SERVICE <https://query.wikidata.org/sparql> { ?wikidata wdt:P235 ?inchikey . } } LIMIT 50
MolMeDB and WikiPathways
Find all pathways that link to one compound in the MolMeDB database (http://identifiers.org/molmedb/MM00431) through the PubChem mapping.
SELECT DISTINCT ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) ((substr(str(?COMPOUND),46)) as ?PubChem) WHERE { SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/molmedb> { <http://identifiers.org/molmedb/MM00431> skos:exactMatch ?COMPOUND. filter (strstarts(str(?COMPOUND), 'http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID')) } ?gene a wp:Metabolite ; dcterms:identifier ?id ; dcterms:isPartOf ?pathwayRes ; wp:bdbPubChem ?COMPOUND . ?pathwayRes a wp:Pathway ; wp:organismName "Homo sapiens" ; dcterms:identifier ?wpid ; dc:title ?title . }
Find which pathways, out of a subset of pathways (listed in the VALUES clause), link to one compound in the MolMeDB database (http://identifiers.org/molmedb/MM00431) through the PubChem mapping.
SELECT DISTINCT ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) ((substr(str(?COMPOUND),46)) as ?PubChem) WHERE { SERVICE <https://idsm.elixir-czech.cz/sparql/endpoint/molmedb> { SERVICE <https://sparql.wikipathways.org/sparql> { VALUES ?wpid {"WP4224" "WP4225" "WP4571"} ?gene a wp:Metabolite ; dcterms:identifier ?id ; dcterms:isPartOf ?pathwayRes ; wp:bdbPubChem ?COMPOUND . ?pathwayRes a wp:Pathway ; wp:organismName "Homo sapiens" ; dcterms:identifier ?wpid ; dc:title ?title . } <http://identifiers.org/molmedb/MM00431> skos:exactMatch ?COMPOUND. } }
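When the subset of pathways comes from a program rather than being typed by hand, the VALUES clause can be generated. The sketch below is one possible way to do that; the class and method names are hypothetical, and the `WP` plus digits check is an assumption based on the identifiers shown on this page.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: build the VALUES clause for a list of WikiPathways identifiers.
final class ValuesClause {

    static String forPathwayIds(List<String> ids) {
        String literals = ids.stream()
            .map(id -> {
                // Assumed identifier shape ("WP" followed by digits).
                if (!id.matches("WP\\d+")) {
                    throw new IllegalArgumentException("Not a WikiPathways ID: " + id);
                }
                return "\"" + id + "\""; // wrap each ID as a SPARQL string literal
            })
            .collect(Collectors.joining(" "));
        return "VALUES ?wpid {" + literals + "}";
    }

    public static void main(String[] args) {
        // Prints: VALUES ?wpid {"WP4224" "WP4225" "WP4571"}
        System.out.println(forPathwayIds(List.of("WP4224", "WP4225", "WP4571")));
    }
}
```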
Identifier to WikiPathways lists
List of WikiPathways for Ensembl identifiers
select distinct ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (fn:substring(?ensId,33) as ?ensembl) where { ?gene a wp:GeneProduct ; dcterms:identifier ?id ; dcterms:isPartOf ?pathwayRes ; wp:bdbEnsembl ?ensId . ?pathwayRes a wp:Pathway ; dcterms:identifier ?wpid ; dc:title ?title . } LIMIT 100
List of WikiPathways for HGNC symbols
select distinct ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (fn:substring(?hgncId,36) as ?HGNC) where { ?gene a wp:GeneProduct ; dcterms:identifier ?id ; dcterms:isPartOf ?pathwayRes ; wp:bdbHgncSymbol ?hgncId . ?pathwayRes a wp:Pathway ; dcterms:identifier ?wpid ; dc:title ?title . } LIMIT 100
List of WikiPathways for NCBI Gene identifiers
select distinct ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (fn:substring(?ncbiGeneId,34) as ?NCBIGene) where { ?gene a wp:GeneProduct ; dcterms:identifier ?id ; dcterms:isPartOf ?pathwayRes ; wp:bdbEntrezGene ?ncbiGeneId . ?pathwayRes a wp:Pathway ; dcterms:identifier ?wpid ; dc:title ?title . } LIMIT 100
List of WikiPathways for HMDB identifiers
select distinct ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (fn:substring(?hmdbId,29) as ?hmdb) where { ?gene a wp:Metabolite ; dcterms:identifier ?id ; dcterms:isPartOf ?pathwayRes ; wp:bdbHmdb ?hmdbId . ?pathwayRes a wp:Pathway ; dcterms:identifier ?wpid ; dc:title ?title . }
List of WikiPathways for ChemSpider identifiers
select distinct ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (fn:substring(?csId,35) as ?chemspider) where { ?gene a wp:Metabolite ; dcterms:identifier ?id ; dcterms:isPartOf ?pathwayRes ; wp:bdbChemspider ?csId . ?pathwayRes a wp:Pathway ; dcterms:identifier ?wpid ; dc:title ?title . }
List of WikiPathways for PubChem CID identifiers
select distinct ?pathwayRes (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle) (fn:substring(?cid,46) as ?PubChem) where { ?gene a wp:Metabolite ; dcterms:identifier ?id ; dcterms:isPartOf ?pathwayRes ; wp:bdbPubChem ?cid . ?pathwayRes a wp:Pathway ; dcterms:identifier ?wpid ; dc:title ?title . }
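The numeric offsets passed to `fn:substring` in these queries depend on the length of the IRI prefix, so they only give a clean identifier when the prefix is exactly as long as expected. The sketch below illustrates this with an Ensembl IRI and, as a possible alternative, takes everything after the last slash. The class and method names are hypothetical and the example gene identifier is only an illustration.

```java
// Sketch: extracting the local identifier from an identifiers.org IRI.
final class IdentifierFromIri {

    // Mirrors fn:substring(?iri, start), which counts characters from 1.
    static String fromOffset(String iri, int start) {
        return iri.substring(start - 1);
    }

    // Alternative that does not depend on the prefix length.
    static String afterLastSlash(String iri) {
        return iri.substring(iri.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String httpsIri = "https://identifiers.org/ensembl/ENSG00000012048";
        String httpIri = "http://identifiers.org/ensembl/ENSG00000012048";

        // The https prefix is 32 characters long, so offset 33 starts at the identifier.
        System.out.println(fromOffset(httpsIri, 33));   // ENSG00000012048
        // With the shorter http prefix the same offset cuts off the first character.
        System.out.println(fromOffset(httpIri, 33));    // NSG00000012048
        // Splitting on the last slash works for both.
        System.out.println(afterLastSlash(httpIri));    // ENSG00000012048
    }
}
```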
Code examples
Perl
There is an RDF API available. The example below converts the query into a URL and retrieves the results as CSV.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;   # fetching https URLs requires LWP::Protocol::https
use URI::Escape;

# SPARQL query: elements that require curation attention in mouse pathways
my $sparql = "SELECT DISTINCT ?wpIdentifier ?elementneedsattention ?elementLabel
WHERE {
  ?pathway dc:title ?title .
  ?elementneedsattention a gpml:requiresCurationAttention .
  ?elementneedsattention dcterms:isPartOf ?pathway .
  ?elementneedsattention rdfs:label ?elementLabel .
  ?pathway wp:organism ?organism .
  ?pathway foaf:page ?page .
  ?pathway dc:identifier ?wpIdentifier .
  ?organism rdfs:label \"Mus musculus\"^^<http://www.w3.org/2001/XMLSchema#string> .
}
ORDER BY ?wpIdentifier";

# URL-encode the query and request CSV output from the endpoint
my $url = 'https://sparql.wikipathways.org/sparql?default-graph-uri=&query='.uri_escape($sparql).'&format=text%2Fcsv&timeout=0&debug=on';

my $content = get $url;
die "Couldn't get $url" unless defined $content;
print $content;
Java
For Java, we recommend the Apache Jena framework.
// Jena 3 and later use the org.apache.jena packages (Jena 2.x used com.hp.hpl.jena).
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class JavaCodeExample {
    public static void main(String[] args) {
        String sparqlQueryString = "SELECT * WHERE {?s ?p ?o} LIMIT 10";
        Query query = QueryFactory.create(sparqlQueryString);
        QueryExecution queryExecution = QueryExecutionFactory.sparqlService("https://sparql.wikipathways.org/sparql", query);
        try {
            ResultSet resultSet = queryExecution.execSelect();
            // Print each result row as tab-separated subject, predicate and object
            while (resultSet.hasNext()) {
                QuerySolution solution = resultSet.next();
                System.out.print(solution.get("s"));
                System.out.print("\t" + solution.get("p"));
                System.out.println("\t" + solution.get("o"));
            }
        } finally {
            queryExecution.close(); // release the HTTP connection
        }
    }
}
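If adding Jena as a dependency is not an option, the URL-based approach from the Perl example can also be sketched with the Java standard library alone (Java 11 or later, for `java.net.http`). The class and method names below are hypothetical, the `format=text/csv` parameter is taken from the Perl example, and the `--fetch` argument is only a convenience of this sketch so that nothing is sent over the network by default.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch: query the endpoint over plain HTTP and read the result as CSV.
final class SparqlCsvFetch {

    static final String ENDPOINT = "https://sparql.wikipathways.org/sparql";

    // URL-encode the query and ask for CSV, as the Perl example does.
    static URI buildUri(String query) {
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        String encodedFormat = URLEncoder.encode("text/csv", StandardCharsets.UTF_8);
        return URI.create(ENDPOINT + "?query=" + encodedQuery + "&format=" + encodedFormat);
    }

    static String fetchCsv(String query) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(buildUri(query)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Endpoint returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String query = "SELECT * WHERE {?s ?p ?o} LIMIT 10";
        System.out.println(buildUri(query)); // show the request URL
        if (args.length > 0 && args[0].equals("--fetch")) {
            System.out.println(fetchCsv(query)); // only contact the endpoint on request
        }
    }
}
```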
PHP
For PHP, we recommend ARC2: Easy RDF and SPARQL for LAMP systems.
R
The R package rrdf can be found and installed from https://github.com/egonw/rrdf.
library(rrdf) sparql.remote( "https://sparql.wikipathways.org/sparql", "SELECT DISTINCT ?p WHERE { ?s ?p ?o }" )
Another option is to use the SPARQL package (tested on Ubuntu 18.04.5 LTS, RStudio version 1.4.1717, R version 4.1.0 (2021-05-18)).
- Note the backslashes in front of the quotation marks in the VALUES clause; these are specifically needed in R to read these characters correctly.
- Note that this query is an example of how to perform a UNION query in WikiPathways.
if (!"SPARQL" %in% installed.packages()) {
  install.packages("SPARQL")
}
library(SPARQL)

## Connect to Endpoint WikiPathways
endpointwp <- "https://sparql.wikipathways.org/sparql"

queryDatanodeContent <- "
select distinct (str(?wpid) as ?pathway) (str(?title) as ?pathwayTitle)
       (count(distinct ?hgncIdProtein) AS ?ProteinsInPWs)
       (count(distinct ?chebiMetabolite) AS ?MetabolitesInPWs)
where {
  VALUES ?wpid {\'WP4224\' \'WP4225\' \'WP4571\'}
  ?datanode dcterms:identifier ?id ;
            dcterms:isPartOf ?pathwayRes .
  ?pathwayRes a wp:Pathway ;
              dcterms:identifier ?wpid ;
              dc:title ?title .
  {?datanode a wp:Protein ; wp:bdbHgncSymbol ?hgncIdProtein .}
  UNION
  {?datanode a wp:Metabolite ; wp:bdbChEBI ?chebiMetabolite .}
}
ORDER BY ASC(?wpid)
"

resultsDatanodeContent <- SPARQL(endpointwp, queryDatanodeContent, curl_args = list(useragent = R.version.string))
showresultsDatanodeContent <- resultsDatanodeContent$results
Bioclipse
The code below works in both the JavaScript and the Groovy console:
rdf.sparqlRemote( "https://sparql.wikipathways.org/sparql", "SELECT DISTINCT ?p WHERE { ?s ?p ?o }" )
SPARQL from the command line
For quick and easy querying, we recommend using curl (Linux and OS X):
curl -F "query=SELECT * WHERE {?s ?p ?o} LIMIT 10" https://sparql.wikipathways.org/sparql