Help:WikiPathways Metabolomics
On this page we collect SPARQL queries to see the state of the Metabolome in WikiPathways. Triggered by User:Andra's RDF / SPARQL work, curation started with metabolites without database identifiers. But this soon led to the observation that metabolites are often not even annotated as being a metabolite (using <Label> rather than <DataNode>). Therefore, User:Egonw started at Pathway:WP1 to curate them one by one and fix these issues:
- connect lines between metabolites
- convert metabolites to use <DataNode> rather than <Label>
The reason is that metabolomics research needs these basic properties to be in place.
Metabolome
The following queries provide an overview of the Metabolome captured by WikiPathways.
The key type for metabolites is the wp:Metabolite. We can see all available properties with:
prefix wp: <http://vocabularies.wikipathways.org/wp#> select distinct ?p where { ?mb a wp:Metabolite ; ?p [] . }
Likewise, we can get all pathway properties with:
prefix wp: <http://vocabularies.wikipathways.org/wp#> select distinct ?p where { ?mb a wp:Pathway ; ?p [] . }
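Every query on this page starts with a run of prefix declarations, and it is easy to forget one. A minimal Python sketch that prepends only the declarations a query body actually uses could look like this. The helper name with_prefixes is hypothetical, and the IRI given for dc: is an assumption (the standard Dublin Core Elements namespace), since this page never declares it explicitly.

```python
import re

# Prefixes that appear in the queries on this page.
# Assumption: dc: is the standard Dublin Core Elements 1.1 namespace.
PREFIXES = {
    "wp": "http://vocabularies.wikipathways.org/wp#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "pav": "http://purl.org/pav/",
}


def with_prefixes(body, prefixes=PREFIXES):
    """Prepend a declaration for every known prefix that occurs in body."""
    used = [
        name
        for name in prefixes
        # Match "name:" only when it is not the tail of a longer name.
        if re.search(r"(?<![\w:])" + re.escape(name) + r":", body)
    ]
    header = "".join(f"prefix {name}: <{prefixes[name]}>\n" for name in used)
    return header + body
```

Calling with_prefixes("select distinct ?p where { ?mb a wp:Metabolite ; ?p [] . }") yields the first query above with its wp: declaration on a separate line.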
Latest data only
To restrict the analysis to the most recent pathway versions, add this snippet to your SPARQL query, assuming ?mb and ?pathway are the variable names used:
?mb dcterms:isPartOf ?pathway . ?pathway pav:version ?version . ?mb dcterms:isPartOf ?pathway2 . ?pathway2 pav:version ?version2 . FILTER (?version2 > ?version)
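However, keep in mind that this is not a foolproof solution: the snippet only compares pav:version values and does not check that ?pathway and ?pathway2 are revisions of the same pathway.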
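Another option is to pick the latest revision on the client side, after the query has run. The sketch below is illustrative only. It assumes that pathway URIs have the shape seen in the result listings on this page (for example http://rdf.wikipathways.org/Pathway/WP1002_r35260) and that a higher number after _r means a newer revision. The function name latest_revisions is hypothetical.

```python
import re

# Assumed URI shape: .../Pathway/<pathway id>_r<revision number>
REVISION_URI = re.compile(r"/Pathway/(WP\d+)_r(\d+)")


def latest_revisions(uris):
    """Map each pathway id to the URI of the highest revision seen."""
    latest = {}
    for uri in uris:
        match = REVISION_URI.search(uri)
        if match is None:
            continue  # not a versioned pathway URI, skip it
        pathway_id, revision = match.group(1), int(match.group(2))
        if pathway_id not in latest or revision > latest[pathway_id][0]:
            latest[pathway_id] = (revision, uri)
    return {pathway_id: uri for pathway_id, (revision, uri) in latest.items()}
```

Rows whose pathway URI is not among the returned values can then be dropped before any counting is done.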
All Metabolites
Count
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dcterms: <http://purl.org/dc/terms/> select (count(?mb) as ?count) where { ?mb a wp:Metabolite . }
List
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dcterms: <http://purl.org/dc/terms/> select distinct ?mb ?label where { ?mb a wp:Metabolite ; rdfs:label ?label . }
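To use such a list outside the query form, the query can be sent over the standard SPARQL protocol and the JSON result flattened into plain rows. The following is a rough sketch using only the Python standard library. The endpoint URL is a placeholder, not the real WikiPathways address, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: placeholder address; substitute the SPARQL endpoint you use.
ENDPOINT = "http://sparql.example.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(result_json):
    """Flatten a SPARQL JSON result document into {variable: value} dicts."""
    document = json.loads(result_json)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in document["results"]["bindings"]
    ]


def run(query, endpoint=ENDPOINT):
    """Send the query and return the flattened rows (needs network access)."""
    with urlopen(build_request(query, endpoint)) as response:
        return rows(response.read().decode("utf-8"))
```

For the list query above, each returned row would carry the keys mb and label.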
Metabolic Data Sources
Sorted by use
HMDB, ChEBI, and KEGG are the main data sources for identifiers. InChI and InChIKey should also be present but are missing. A big curation effort in January 2013 ensured that "PubChem-compound" is now used as the data source for PubChem CIDs.
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> select ?datasource (count(?identifier) as ?count) where { ?mb a wp:Metabolite ; dc:source ?datasource ; dc:identifier ?identifier . } group by ?datasource order by desc(?count)
All metabolites from one source
All KEGG identifiers
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> select distinct ?identifier where { ?mb a wp:Metabolite ; dc:source "Kegg Compound"^^xsd:string ; dc:identifier ?identifier . } order by ?identifier
All HMDB identifiers
At the time of writing, the query below showed a number of XRefs with HMDB as data source but no identifier, which need curation:
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP1002_r35260
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP1119_r35265
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP1250_r41240
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP1266_r41328
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP1285_r41669
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP1304_r41670
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP1310_r41659
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP1339_r35269
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP167_r45138
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP2267_r53133
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP28_r38852
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP28_r38852/group/ac37a
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP295_r41324
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP337_r41644
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP495_r41327
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP59_r41653
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP678_r41165
http://www.hmdb.ca/metabolites/noIdentifier http://rdf.wikipathways.org/Pathway/WP716_r45017
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> select distinct ?identifier where { ?mb a wp:Metabolite ; dc:source "HMDB"^^xsd:string ; dc:identifier ?identifier . } order by ?identifier
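Placeholder identifiers such as http://www.hmdb.ca/metabolites/noIdentifier can be turned into a curation to-do list with a few lines of Python. This is only a sketch: it assumes the results are available as (identifier, pathway URI) pairs, and it assumes that a suffix such as /group/ac37a can be cut off to obtain the pathway itself. The function name is hypothetical.

```python
import re

# Assumed URI shape: .../Pathway/<pathway id>_r<revision>, optionally
# followed by a path to an element inside the pathway.
PATHWAY_URI = re.compile(r"^(.*/Pathway/WP\d+_r\d+)")


def pathways_missing_identifiers(pairs):
    """Return sorted, de-duplicated pathway URIs that have a placeholder identifier."""
    flagged = set()
    for identifier, part_of in pairs:
        if not identifier.endswith("/noIdentifier"):
            continue  # a real identifier, nothing to curate here
        match = PATHWAY_URI.match(part_of)
        # Fall back to the raw value when the URI has an unexpected shape.
        flagged.add(match.group(1) if match else part_of)
    return sorted(flagged)
```

The sort is a plain string sort, so WP1002 comes before WP167; it is meant for a stable listing, not a numeric order.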
Metabolic Pathways
Metabolomes
Human Metabolome
This returns only 244 metabolites, which is not a lot at all, and that count does not even take metabolite identity into account. Is something wrong with wp:organism? It finds 107 human pathways.
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix dcterms: <http://purl.org/dc/terms/> prefix ncbi: <http://purl.obolibrary.org/obo/NCBITaxon_> select distinct ?mb where { ?mb a wp:Metabolite ; dcterms:isPartOf ?pw . ?pw wp:organism ncbi:9606 . } order by ?mb
Pathways with the most metabolites
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> prefix pav: <http://purl.org/pav/> select ?pathway (count(?mb) as ?mbCount) where { ?mb a wp:Metabolite ; dcterms:isPartOf ?pathway . } group by ?pathway order by desc(?mbCount)
Metabolites in the most Pathways
Note that BridgeDb is not involved yet, so identifiers from different databases for the same metabolite are not merged.
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> prefix pav: <http://purl.org/pav/> select ?mb (count(?pathway) as ?pwCount) where { ?mb a wp:Metabolite ; dcterms:isPartOf ?pathway . } group by ?mb order by desc(?pwCount)
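The same ranking can be reproduced on the client side, which may be handy when the rows are filtered first (for example to the latest pathway revisions). A small sketch, assuming the rows are available as (metabolite, pathway) pairs; the function name is hypothetical. Unlike count(?pathway) in the query, this version counts each pathway once per metabolite.

```python
from collections import defaultdict


def rank_by_pathway_count(pairs):
    """Rank metabolites by the number of distinct pathways they occur in."""
    pathways_by_metabolite = defaultdict(set)
    for metabolite, pathway in pairs:
        pathways_by_metabolite[metabolite].add(pathway)
    counts = [
        (metabolite, len(pathways))
        for metabolite, pathways in pathways_by_metabolite.items()
    ]
    # Highest count first; ties are broken alphabetically for a stable order.
    return sorted(counts, key=lambda item: (-item[1], item[0]))
```

Swapping the two elements of each pair gives the ranking of pathways by metabolite count instead.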
Curation
Common wrong identifiers
PubChem-compound 1004
CID 1004 is wrongly used for phosphate: it is the uncharged compound. Phosphate, and particularly things like "Pi", should instead use CID 1061 for orthophosphate, aka [PO4]3-.
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> select ?pathway ?source where { ?mb dc:source ?source ; dcterms:isPartOf ?pathway ; dcterms:identifier "1004"^^xsd:string . }
Outdated HMDB identifiers
This query lists HMDB identifiers that are used in WikiPathways but have been revoked or have become secondary identifiers.
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> select distinct ?identifier where { ?mb a wp:Metabolite ; dc:source "HMDB"^^xsd:string ; dc:identifier ?identifier . OPTIONAL { ?mb wp:bdbHmdb ?bridgedb . } FILTER (!BOUND(?bridgedb)) } order by ?identifier
Metabolites not classified as such
One can list all data sources for non-metabolites with this query.
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> select ?datasource (count(?identifier) as ?count) where { ?mb dc:source ?datasource ; dcterms:identifier ?identifier . FILTER NOT EXISTS { ?mb a wp:Metabolite } } group by ?datasource order by desc(?count)
That mostly lists gene identifier sources and the like, but watch out for metabolite identifier data sources: they point to metabolites that carry a metabolite identifier but are not marked as metabolites. Further down the list is CAS (but genes are chemicals too...), followed by a few more minor ones:
"CTD Gene"^^<http://www.w3.org/2001/XMLSchema#string> 5
"HMDB"^^<http://www.w3.org/2001/XMLSchema#string> 4
"ChEBI"^^<http://www.w3.org/2001/XMLSchema#string> 3
"GLYCAN"^^<http://www.w3.org/2001/XMLSchema#string> 3
"COMPOUND"^^<http://www.w3.org/2001/XMLSchema#string> 3
"PubChem"^^<http://www.w3.org/2001/XMLSchema#string> 2
I would expect GLYCAN and COMPOUND to be misnomers of the matching KEGG subsets.
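Spotting the metabolite-related sources in such a listing can be automated. The sketch below parses result rows in the typed-literal form shown above and keeps those whose source name is in a watch list. The watch list is an assumption built from the names mentioned on this page and would need extending in practice. The function name is hypothetical.

```python
import re

# One result row: "<source name>"^^<datatype IRI> <count>
ROW = re.compile(r'"([^"]*)"\^\^<[^>]*>\s+(\d+)')

# Assumption: illustrative watch list of metabolite-oriented source names.
METABOLITE_SOURCES = {"HMDB", "ChEBI", "GLYCAN", "COMPOUND", "PubChem"}


def suspicious_sources(text, watch_list=METABOLITE_SOURCES):
    """Return (source, count) pairs whose source is on the watch list."""
    return [
        (name, int(count))
        for name, count in ROW.findall(text)
        if name in watch_list
    ]
```

Any pair returned here marks a data source worth checking with one of the more specific queries in this section.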
Non-Metabolites with CAS identifier
Note that a CAS identifier can also refer to mixtures, compound classes, etc.
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> select distinct ?pathway ?mb ?label ?identifier where { ?mb dc:source "CAS"^^xsd:string ; rdfs:label ?label ; dcterms:identifier ?identifier ; dcterms:isPartOf ?pathway . FILTER NOT EXISTS { ?mb a wp:Metabolite } } order by ?pathway
Non-Metabolites with PubChem identifier
At the time of writing, this results in an empty set.
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> select distinct ?pathway ?mb ?label ?identifier where { ?mb dc:source "PubChem-compound"^^xsd:string ; dcterms:identifier ?identifier ; dcterms:isPartOf ?pathway . OPTIONAL { ?mb rdfs:label ?label . } FILTER NOT EXISTS { ?mb a wp:Metabolite } } order by ?pathway
Metabolites with an identifier but undefined data source
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> select distinct ?pathway ?mb ?identifier where { ?mb a wp:Metabolite ; dc:source ""^^xsd:string ; dc:identifier ?identifier ; dcterms:isPartOf ?pathway . FILTER (!isIRI(?identifier)) FILTER (str(?identifier) != "") } order by ?pathway
Metabolites with a data source but no identifier
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> select distinct ?pathway ?mb ?source where { ?mb a wp:Metabolite ; dcterms:identifier ""^^xsd:string ; dc:source ?source ; dcterms:isPartOf ?pathway . FILTER (str(?source) != "") FILTER (!regex(str(?pathway), "internal.wikipathways.org", "i")) } order by ?pathway
Metabolites with an Entrez Gene identifier
prefix wp: <http://vocabularies.wikipathways.org/wp#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix dc: <http://purl.org/dc/elements/1.1/> prefix dcterms: <http://purl.org/dc/terms/> prefix xsd: <http://www.w3.org/2001/XMLSchema#> select distinct ?pathway ?mb ?identifier where { ?mb a wp:Metabolite ; dc:source "Entrez Gene"^^xsd:string ; dc:identifier ?identifier ; dcterms:isPartOf ?pathway . FILTER (!isIRI(?identifier)) FILTER (str(?identifier) != "") } order by ?pathway