Download Pathways
From WikiPathways
Current revision: 21:36, 10 November 2024 (Nov data release)
<font size=4>'''Current version: [http://data.wikipathways.org/20241110/ 20241110 (10 November 2024)]'''</font>
=== Vertebrates ===
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Bos_taurus.zip Bos taurus]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Canis_familiaris.zip Canis familiaris]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Danio_rerio.zip Danio rerio]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Equus_caballus.zip Equus caballus]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Gallus_gallus.zip Gallus gallus]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Homo_sapiens.zip Homo sapiens]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Mus_musculus.zip Mus musculus]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Pan_troglodytes.zip Pan troglodytes]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Rattus_norvegicus.zip Rattus norvegicus]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Sus_scrofa.zip Sus scrofa]'''
=== Invertebrates ===
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Anopheles_gambiae.zip Anopheles gambiae]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Caenorhabditis_elegans.zip Caenorhabditis elegans]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Drosophila_melanogaster.zip Drosophila melanogaster]'''
=== Plants ===
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Arabidopsis_thaliana.zip Arabidopsis thaliana]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Hordeum_vulgare.zip Hordeum vulgare]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Oryza_sativa.zip Oryza sativa]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Populus_trichocarpa.zip Populus trichocarpa]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Solanum_lycopersicum.zip Solanum lycopersicum]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Zea_mays.zip Zea mays]'''
<!-- * '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Beta_vulgaris.zip Beta vulgaris]''' -->
=== Eukaryotic microorganisms ===
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Gibberella_zeae.zip Gibberella zeae]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Saccharomyces_cerevisiae.zip Saccharomyces cerevisiae]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Plasmodium_falciparum.zip Plasmodium falciparum]'''
=== Bacteria ===
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Bacillus_subtilis.zip Bacillus subtilis]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Escherichia_coli.zip E. coli]'''
* '''[http://data.wikipathways.org/20241110/gpml/wikipathways-20241110-gpml-Mycobacterium_tuberculosis.zip Mycobacterium tuberculosis]'''

== Programmatic Access ==
The archive of current and past collections of pathways in various formats at data.wikipathways.org is accessible programmatically as well. Depending on your preferences, there are many ways to identify and download the collection you need.

''Note: Our files contain the date of creation in their names so that you can be sure which collection you are using and to avoid overwriting local copies of these files.''

# '''[https://github.com/wikipathways/rwikipathways rWikiPathways]''' is an R package that provides a helper function called ''downloadPathwayArchive'' that will retrieve the latest file for you per species and format, e.g., <pre>downloadPathwayArchive(organism="Mus musculus", format="gmt")</pre>
# '''Filename pattern''' allows you to infer the filename of the latest collection given the current date. For example, since we always release our archive collections on the 10th of each month, you know that the latest filename is the nearest prior date matching that pattern, e.g., 20180910 would be the current file from Sep 10 to Oct 10, 2018. ''Caution: this might break if for some unforeseen reason we are unable to produce the archive on schedule.''
# '''Bash scripting''' allows you to scrape the currently available filenames and guarantee that you are getting the latest file no matter what the name might be. Here is an example of a one-liner to get a list of all the current GMT files: <pre>echo "cat //html/body/div/table/tbody/tr/td/a" | xmllint --html --shell http://data.wikipathways.org/current/gmt/ | grep -o -E ">(.*gmt)<" | sed -E 's/(<|>)//g'</pre> And here is a version that would return the latest GMT for mouse: <pre>echo "cat //html/body/div/table/tbody/tr/td/a" | xmllint --html --shell http://data.wikipathways.org/current/gmt/ | grep -o -E ">.*Mus_musculus.gmt<" | sed -E 's/(<|>)//g'</pre>

== Other Collections ==
* [http://data.wikipathways.org/current/rdf Linked data files (RDF)]
* [http://data.wikipathways.org/current/index Database index files (index)]
* [[Help:FileFormats|Other file formats]]
</font>
Current revision
Versioned Releases
Each month we release an updated set of pathways in various data and image formats. These pathways have been reviewed and tagged as approved, and are considered ready for analysis and data overlays.
Current version: 20241110 (10 November 2024)
Vertebrates
- Bos taurus
- Canis familiaris
- Danio rerio
- Equus caballus
- Gallus gallus
- Homo sapiens
- Mus musculus
- Pan troglodytes
- Rattus norvegicus
- Sus scrofa
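Invertebrates
- Anopheles gambiae
- Caenorhabditis elegans
- Drosophila melanogaster
Plants
- Arabidopsis thaliana
- Hordeum vulgare
- Oryza sativa
- Populus trichocarpa
- Solanum lycopersicum
- Zea mays
Eukaryotic microorganisms
- Gibberella zeae
- Saccharomyces cerevisiae
- Plasmodium falciparum
Bacteria
- Bacillus subtilis
- E. coli
- Mycobacterium tuberculosis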
Programmatic Access
The archive of current and past collections of pathways in various formats at data.wikipathways.org is accessible programmatically as well. Depending on your preferences, there are many ways to identify and download the collection you need.
Note: Our files contain the date of creation in their names so that you can be sure which collection you are using and to avoid overwriting local copies of these files.
- rWikiPathways is an R package that provides a helper function called downloadPathwayArchive that will retrieve the latest file for you per species and format, e.g.,
downloadPathwayArchive(organism="Mus musculus", format="gmt")
- Filename pattern allows you to infer the filename of the latest collection given the current date. For example, since we always release our archive collections on the 10th of each month, you know that the latest filename is the nearest prior date matching that pattern, e.g., 20180910 would be the current file from Sep 10 to Oct 10, 2018. Caution: this might break if for some unforeseen reason we are unable to produce the archive on schedule.
- Bash scripting allows you to scrape the currently available filenames and guarantee that you are getting the latest file no matter what the name might be. Here is an example of a one-liner to get a list of all the current GMT files:
echo "cat //html/body/div/table/tbody/tr/td/a" | xmllint --html --shell http://data.wikipathways.org/current/gmt/ | grep -o -E ">(.*gmt)<" | sed -E 's/(<|>)//g'
And here is a version that would return the latest GMT for mouse:
echo "cat //html/body/div/table/tbody/tr/td/a" | xmllint --html --shell http://data.wikipathways.org/current/gmt/ | grep -o -E ">.*Mus_musculus.gmt<" | sed -E 's/(<|>)//g'
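The filename-pattern approach described in the list above can be sketched in Bash without any network access. This is only an illustration: the function names latest_release_stamp and gpml_archive_url are hypothetical helpers, not part of any WikiPathways tool, and the URL layout is assumed from the GPML links on this page. It also assumes the archive really was produced on the 10th, so the caution above still applies.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the YYYYMMDD stamp of the most recent
# 10th-of-month on or before the given date.
# Arguments: year month day (leading zeros are allowed).
latest_release_stamp() {
  # Force base 10 so that values like "08" are not parsed as octal.
  local year=$((10#$1)) month=$((10#$2)) day=$((10#$3))
  if (( day < 10 )); then
    # Before the 10th, the latest release is last month's.
    month=$((month - 1))
    if (( month == 0 )); then
      month=12
      year=$((year - 1))
    fi
  fi
  printf '%04d%02d10\n' "$year" "$month"
}

# Hypothetical helper: build a GPML archive URL for one species.
# The URL layout is an assumption based on the links listed on this page.
gpml_archive_url() {
  local stamp=$1 species=$2
  printf 'http://data.wikipathways.org/%s/gpml/wikipathways-%s-gpml-%s.zip\n' \
    "$stamp" "$stamp" "$species"
}

# Example with a fixed date (25 Sep 2018), matching the example in the text.
latest_release_stamp 2018 09 25   # prints 20180910
```

To use today's date instead, pass the output of date, e.g. latest_release_stamp "$(date +%Y)" "$(date +%m)" "$(date +%d)", and feed the resulting stamp to gpml_archive_url together with a species name such as Mus_musculus. If the download fails, fall back to the scraping one-liners or to rWikiPathways.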
Other Collections
- Prior monthly releases
- Daily curated releases
- Reactome Human Collection
- Gene lists per pathway (GMT)
- Pathway image files (SVG)
- Linked data files (RDF)
- Database index files (index)
- Other file formats