Version 13 (modified by admin, 15 years ago)
Datasets
This page lists datasets that are currently available as Linked Data / RDF and those that are still missing.
Available resources
Let's make a list of all RDF resources currently available and evaluate them according to their contents and quality. Extracting meaningful triples (relations) from the original data sources requires a good understanding of their contents, and this may be key to how useful the resulting data is.
- Bio2RDF Namespace
  - Ensembl - genes
  - OBO - GO terms, ChEBI compounds
  - NCBI - genes, sequences, MeSH terms, diseases (OMIM), PubMed articles
  - KEGG - pathways, genes, enzymes, compounds, drugs, glycans, reactions
  - MGI - genes
  - PDB - structures
  - UniProt - proteins, keywords, taxonomy
- NeuroCommons Project
  - http://sparql.neurocommons.org/
  - http://sparql.neurocommons.org/sparql? -- SPARQL endpoint
  - http://neurocommons.org/page/RDF_distribution
  - http://neurocommons.org/page/Bundles
  - http://ashby.csail.mit.edu/presentations/The_Neurocommons_Common_names_and_ontologies_for_open_source_knowledge_integration_on_the_Semantic_Web.pdf
- UniProt RDF
- Linked Data
- DBCLS RDFs
We should categorize these according to their format (e.g. RDF) and extracted relationships (not only by their original source databases).
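As a starting point for categorizing a resource by its extracted relationships, one could ask its SPARQL endpoint which predicates it uses. The following is a minimal sketch, not a tested recipe: it only builds the request URL (the query goes in the standard `query` parameter of a GET request) and does not contact the server. The endpoint URL is the NeuroCommons one listed above and may no longer be online; the function name `build_query_url` and the constant names are made up for this example.

```python
from urllib.parse import urlencode

# Endpoint taken from the NeuroCommons entry in the list above.
# Assumption: it still answers SPARQL protocol requests; it may not.
ENDPOINT = "http://sparql.neurocommons.org/sparql"

# Generic survey query: which predicates (relation types) appear in the data?
PREDICATE_SURVEY = "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 100"


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Print the URL; paste it into a browser or fetch it with any HTTP client.
    print(build_query_url(ENDPOINT, PREDICATE_SURVEY))
```

Running the same survey query against each resource in the list would give a rough, comparable view of which relations each one actually exposes.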
Missing resources
We are not sure that these are actually unavailable, but let's list the relations (triples) we want in order to answer biological queries.
- Taxonomy <-> Pathway module
- Taxonomy <-> Ortholog cluster
- Gene <-> Expression patterns (from multiple experiments)
- Enzyme <-> Activity
- Protein architectures (domain combinations) <-> Taxonomy
We should add the intended purpose of each relation (what it is required for).
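One possible way to keep track of the wanted relations and their purposes is a small record per relation. This is only an illustrative sketch: the class `WantedRelation` and the helper functions are hypothetical names, and the `purpose` fields are deliberately left empty because no purposes have been written down yet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WantedRelation:
    subject: str       # kind of entity on one side of the relation
    obj: str           # kind of entity on the other side
    purpose: str = ""  # what the relation is needed for; empty until filled in


# The wanted relations from the list above, with purposes still to be added.
WANTED = [
    WantedRelation("Taxonomy", "Pathway module"),
    WantedRelation("Taxonomy", "Ortholog cluster"),
    WantedRelation("Gene", "Expression patterns (from multiple experiments)"),
    WantedRelation("Enzyme", "Activity"),
    WantedRelation("Protein architectures (domain combinations)", "Taxonomy"),
]


def missing_purpose(relations):
    """Return the relations that still lack a stated purpose."""
    return [r for r in relations if not r.purpose]


def involving(relations, entity):
    """Return the relations that have the given entity kind on either side."""
    return [r for r in relations if entity in (r.subject, r.obj)]


if __name__ == "__main__":
    for r in missing_purpose(WANTED):
        print(f"{r.subject} <-> {r.obj}: purpose not yet described")
```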