Help:WikiPathways Metabolomics
From WikiPathways
Revision as of 20:38, 20 January 2013
On this page we collect SPARQL queries to assess the state of the metabolome in WikiPathways. Triggered by User:Andra's RDF/SPARQL work, curation started with metabolites lacking database identifiers. This soon revealed that metabolites are often not even annotated as metabolites (they use <Label> rather than <DataNode>). Therefore, User:Egonw started curating pathways one by one, beginning at Pathway:WP1, to fix these issues:
- connect lines between metabolites
- convert metabolites to use <DataNode> rather than <Label>
These are basic properties we need before pathways can be used in metabolomics research.
Metabolome
The following queries give an overview of the metabolome captured by WikiPathways.
The key type for metabolites is wp:Metabolite. We can list all properties used on metabolites with:
prefix wp: <http://vocabularies.wikipathways.org/wp#>
select distinct ?p where {
?mb a wp:Metabolite ;
?p [] .
}
Likewise, we can get all pathway properties with:
prefix wp: <http://vocabularies.wikipathways.org/wp#>
select distinct ?p where {
?pw a wp:Pathway ;
?p [] .
}
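The queries on this page can also be run from a script. The following is a minimal sketch that assumes the endpoint implements the standard SPARQL 1.1 Protocol: the query is passed as the `query` parameter, and results are requested as `application/sparql-results+json`. The endpoint URL is a hypothetical placeholder, not the WikiPathways address. The helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the SPARQL endpoint you actually use.
ENDPOINT = "https://example.org/sparql"

PROPERTIES_QUERY = """
prefix wp: <http://vocabularies.wikipathways.org/wp#>
select distinct ?p where {
  ?mb a wp:Metabolite ;
      ?p [] .
}
"""

def build_request(query, endpoint=ENDPOINT):
    """Build a GET request following the SPARQL 1.1 Protocol."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})

def parse_bindings(body):
    """Turn a SPARQL JSON results document into a list of plain dicts."""
    data = json.loads(body)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]

def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the result rows (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint)) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

With a real endpoint filled in, `run_query(PROPERTIES_QUERY)` returns one dict per property, keyed by the variable name `p`.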
Latest data only
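To restrict an analysis to the most recent pathway versions, add this snippet to your SPARQL. It assumes that ?pathway is the variable name you use and that the pav: prefix (<http://purl.org/pav/>) is declared. It discards a pathway whenever another pathway containing the same metabolite has a higher version:
?mb dcterms:isPartOf ?pathway .
?pathway pav:version ?version .
FILTER NOT EXISTS {
?mb dcterms:isPartOf ?pathway2 .
?pathway2 pav:version ?version2 .
FILTER (?version2 > ?version)
}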
However, it should be kept in mind that this is not a fool-proof solution.
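The same selection can also be made client-side. This minimal sketch assumes you have already fetched (pathway ID, version) pairs and that the versions are revision numbers. Comparing them as integers avoids one likely pitfall: as strings, "9" sorts after "10". The pathway IDs and revision numbers below are made up.

```python
def latest_versions(rows):
    """Keep only the highest version per pathway.

    rows: iterable of (pathway_id, version) pairs; versions are assumed to be
    revision numbers, possibly delivered as strings by the endpoint.
    """
    latest = {}
    for pathway_id, version in rows:
        number = int(version)  # numeric, not lexicographic, comparison
        if pathway_id not in latest or number > latest[pathway_id]:
            latest[pathway_id] = number
    return latest

# Hypothetical example rows: note that "9" > "10" as strings.
rows = [("WP1", "9"), ("WP1", "10"), ("WP2", "3")]
print(latest_versions(rows))  # {'WP1': 10, 'WP2': 3}
```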
All Metabolites
Count
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select (count(?mb) as ?count) where {
?mb a wp:Metabolite .
}
List
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
select ?mb ?label where {
?mb a wp:Metabolite ;
rdfs:label ?label .
}
Metabolic Data Sources
Sorted by use
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?datasource (count(?identifier) as ?count)
where {
?mb a wp:Metabolite ;
dc:source ?datasource ;
dc:identifier ?identifier .
} group by ?datasource
order by desc(?count)
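The query above counts identifiers per data source and lists the most used source first. For illustration, here is the same tally done client-side over result rows. It is a sketch that assumes each row is a plain dict with `datasource` and `identifier` keys. The sample values are invented, not WikiPathways data.

```python
from collections import Counter

def count_by_source(rows):
    """Count identifiers per data source, most used first (like the query)."""
    counts = Counter(row["datasource"] for row in rows)
    return counts.most_common()

# Invented example rows in the shape returned by the query above.
rows = [
    {"datasource": "HMDB", "identifier": "HMDB00122"},
    {"datasource": "Kegg Compound", "identifier": "C00031"},
    {"datasource": "HMDB", "identifier": "HMDB00067"},
]
print(count_by_source(rows))  # [('HMDB', 2), ('Kegg Compound', 1)]
```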
All metabolites from one source
All KEGG identifiers
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?identifier
where {
?mb a wp:Metabolite ;
dc:source "Kegg Compound"^^xsd:string ;
dc:identifier ?identifier .
FILTER (!isIRI(?identifier))
} order by ?identifier
All HMDB identifiers
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix dc: <http://purl.org/dc/elements/1.1/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?identifier
where {
?mb a wp:Metabolite ;
dc:source "HMDB"^^xsd:string ;
dc:identifier ?identifier .
FILTER (!isIRI(?identifier))
} order by ?identifier
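The identifier lists above are a convenient place to spot malformed entries. This minimal sketch assumes the usual formats. KEGG compound IDs are a `C` followed by five digits (e.g. `C00031`). HMDB IDs are `HMDB` followed by five digits or, in the newer scheme, seven digits. The dictionary keys reuse the data source names as spelled in the queries above, and the function name is illustrative.

```python
import re

# Assumed identifier formats; adjust if the databases change their schemes.
PATTERNS = {
    "Kegg Compound": re.compile(r"C\d{5}"),
    "HMDB": re.compile(r"HMDB(\d{5}|\d{7})"),
}

def malformed(source, identifiers):
    """Return the identifiers that do not match the expected format."""
    pattern = PATTERNS[source]
    return [i for i in identifiers if not pattern.fullmatch(i)]

print(malformed("Kegg Compound", ["C00031", "C0031", "c00031"]))
# ['C0031', 'c00031']
print(malformed("HMDB", ["HMDB00122", "HMDB0000122", "HMDB122"]))
# ['HMDB122']
```

Stray whitespace also counts as malformed here, since it is itself worth curating.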
Metabolic Pathways
Metabolomes
Human Metabolome
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix ncbi: <http://purl.obolibrary.org/obo/NCBITaxon_>
select distinct ?mb where {
?mb a wp:Metabolite ;
dcterms:isPartOf ?pw .
?pw wp:organism ncbi:9606 .
} order by ?mb
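The same query works for other species if you swap the NCBI Taxonomy ID. The small sketch below fills in the taxon ID. 9606 is human, as in the query above, and 10090 (mouse) is used only as an example. The function name is illustrative.

```python
QUERY_TEMPLATE = """prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix ncbi: <http://purl.obolibrary.org/obo/NCBITaxon_>
select distinct ?mb where {{
  ?mb a wp:Metabolite ;
      dcterms:isPartOf ?pw .
  ?pw wp:organism ncbi:{taxon} .
}} order by ?mb
"""

def metabolome_query(taxon_id):
    """Return the metabolome query for one NCBI Taxonomy ID."""
    if not str(taxon_id).isdigit():
        raise ValueError("NCBI Taxonomy IDs are numeric")
    return QUERY_TEMPLATE.format(taxon=taxon_id)

print(metabolome_query(10090))
```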
Pathways with the most metabolites
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix pav: <http://purl.org/pav/>
select ?pathway (count(?mb) as ?mbCount)
where {
?mb a wp:Metabolite ;
dcterms:isPartOf ?pathway .
} group by ?pathway
order by desc(?mbCount)
Metabolites in the most pathways
Note that BridgeDb is not used here yet, so identifiers from different databases are not mapped onto each other.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix pav: <http://purl.org/pav/>
select ?mb (count(?pathway) as ?pwCount)
where {
?mb a wp:Metabolite ;
dcterms:isPartOf ?pathway .
} group by ?mb
order by desc(?pwCount)
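Without identifier mapping, each pathway node counts on its own. One way to see how often the same database entry recurs across pathways is to group by data source and identifier instead. This sketch assumes result rows with `pathway`, `source` and `identifier` keys, which would require adding `dc:source ?source ; dc:identifier ?identifier` to the query. The rows are invented. The example also shows the limitation: glucose annotated once with HMDB and once with KEGG still appears as two separate entries.

```python
from collections import defaultdict

def pathways_per_identifier(rows):
    """Map (source, identifier) to the number of distinct pathways using it."""
    pathways = defaultdict(set)
    for row in rows:
        pathways[(row["source"], row["identifier"])].add(row["pathway"])
    counts = {key: len(pws) for key, pws in pathways.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Invented rows: glucose annotated with HMDB in two pathways, KEGG in one.
rows = [
    {"pathway": "WP1", "source": "HMDB", "identifier": "HMDB00122"},
    {"pathway": "WP2", "source": "HMDB", "identifier": "HMDB00122"},
    {"pathway": "WP2", "source": "Kegg Compound", "identifier": "C00031"},
]
print(pathways_per_identifier(rows))
# [(('HMDB', 'HMDB00122'), 2), (('Kegg Compound', 'C00031'), 1)]
```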
Curation
Common wrong identifiers
PubChem-compound 1004
This identifier is often wrongly used for phosphate. CID 1004 is the uncharged compound (phosphoric acid). Phosphate, and particularly entries like "Pi", should instead use CID 1061 for orthophosphate, [PO4]3-.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?pathway ?source
where {
?mb dc:source ?source ;
dcterms:isPartOf ?pathway ;
dc:identifier "1004"^^xsd:string .
}
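Other known-wrong identifiers can be collected as they are found. This minimal sketch of such a checklist assumes rows with `pathway`, `source` and `identifier` keys, and it holds only the PubChem entry described above. It matches on the data source as well, so an entry from another database that happens to use the number 1004 is not flagged. The `KNOWN_WRONG` name is illustrative.

```python
# Known misuses: (data source, identifier) -> curation hint.
KNOWN_WRONG = {
    ("PubChem-compound", "1004"):
        "CID 1004 is the uncharged compound; use CID 1061 for phosphate",
}

def curation_hints(rows):
    """Yield (pathway, hint) for every row that uses a known-wrong identifier."""
    for row in rows:
        hint = KNOWN_WRONG.get((row["source"], row["identifier"]))
        if hint is not None:
            yield row["pathway"], hint
```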
Metabolites not classified as such
One can list all data sources for non-metabolites with this query:
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix dc: <http://purl.org/dc/elements/1.1/>
select ?datasource (count(?identifier) as ?count)
where {
?mb dc:source ?datasource ;
dc:identifier ?identifier .
FILTER NOT EXISTS { ?mb a wp:Metabolite }
} group by ?datasource
order by desc(?count)
This mostly lists gene and protein identifier sources. Watch out for metabolite databases in the list (such as CAS, HMDB, Kegg Compound or PubChem-compound), though: they point to metabolites that are not marked as such. The queries below find these nodes for individual data sources.
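To pick the metabolite databases out of that list, the result can be filtered against a set of metabolite data sources. This sketch assumes the query result is available as (datasource, count) pairs. The set contains only names that appear on this page and is not a complete list. The counts in the example are invented.

```python
# Metabolite data source names as used in the queries on this page;
# extend this set with any other metabolite databases you encounter.
METABOLITE_SOURCES = {"CAS", "HMDB", "Kegg Compound", "PubChem-compound"}

def suspicious_sources(source_counts):
    """Keep only metabolite data sources found on non-metabolite nodes."""
    return [(source, count) for source, count in source_counts
            if source in METABOLITE_SOURCES]

print(suspicious_sources([("Entrez Gene", 900), ("CAS", 12), ("HMDB", 5)]))
# [('CAS', 12), ('HMDB', 5)]
```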
Non-Metabolites with CAS identifier
Note that a CAS identifier can also refer to mixtures, compound classes, etc.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix dc: <http://purl.org/dc/elements/1.1/>
select distinct ?pathway ?mb ?identifier
where {
?mb dc:source "CAS"^^xsd:string ;
dc:identifier ?identifier ;
dcterms:isPartOf ?pathway .
FILTER NOT EXISTS { ?mb a wp:Metabolite }
FILTER (!isIRI(?identifier))
} order by ?pathway
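CAS numbers carry a check digit, so some typos can be caught before looking anything up. A CAS Registry Number consists of two to seven digits, then two digits, then a single check digit, separated by hyphens. The check digit is computed by multiplying each preceding digit by its position counted from the right, summing the products, and taking the result modulo 10. Here is a minimal validator:

```python
import re

CAS_FORMAT = re.compile(r"(\d{2,7})-(\d{2})-(\d)")

def is_valid_cas(number):
    """Check the format and check digit of a CAS Registry Number."""
    match = CAS_FORMAT.fullmatch(number)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(position * int(digit)
                for position, digit in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))

print(is_valid_cas("7732-18-5"))  # True (water)
print(is_valid_cas("7732-18-4"))  # False (wrong check digit)
```

A correct check digit only shows the number is well formed. It does not show that the number belongs to the intended compound, nor that it refers to a single compound rather than a mixture or class.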
Non-Metabolites with PubChem identifier
These may already have been curated by the time you read this.
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix dc: <http://purl.org/dc/elements/1.1/>
select distinct ?pathway ?mb ?identifier
where {
?mb dc:source "PubChem-compound"^^xsd:string ;
dc:identifier ?identifier ;
dcterms:isPartOf ?pathway .
FILTER NOT EXISTS { ?mb a wp:Metabolite }
FILTER (!isIRI(?identifier))
} order by ?pathway
Metabolites with an identifier but undefined data source
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix dc: <http://purl.org/dc/elements/1.1/>
select distinct ?pathway ?mb ?identifier
where {
?mb a wp:Metabolite ;
dc:source ""^^xsd:string ;
dc:identifier ?identifier ;
dcterms:isPartOf ?pathway .
FILTER (!isIRI(?identifier))
FILTER (str(?identifier) != "")
} order by ?pathway
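For nodes that have an identifier but no data source, the identifier's shape often hints at the database. This heuristic sketch assumes the usual formats of KEGG compound IDs, HMDB IDs and CAS numbers. Purely numeric identifiers are reported as ambiguous, because PubChem CIDs and Entrez Gene IDs look alike. Every suggestion still needs manual confirmation, and the function name is illustrative.

```python
import re

# Assumed identifier shapes, checked in order; the first match wins.
SHAPES = [
    ("Kegg Compound", re.compile(r"C\d{5}")),
    ("HMDB", re.compile(r"HMDB(\d{5}|\d{7})")),
    ("CAS", re.compile(r"\d{2,7}-\d{2}-\d")),
]

def guess_source(identifier):
    """Suggest a data source for an identifier, or None if unknown."""
    for source, shape in SHAPES:
        if shape.fullmatch(identifier):
            return source
    if identifier.isdigit():
        return "ambiguous (PubChem-compound or Entrez Gene?)"
    return None
```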
Metabolites with an Entrez Gene identifier
prefix wp: <http://vocabularies.wikipathways.org/wp#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix dc: <http://purl.org/dc/elements/1.1/>
select distinct ?pathway ?mb ?identifier
where {
?mb a wp:Metabolite ;
dc:source "Entrez Gene"^^xsd:string ;
dc:identifier ?identifier ;
dcterms:isPartOf ?pathway .
FILTER (!isIRI(?identifier))
FILTER (str(?identifier) != "")
} order by ?pathway

