Changes between Version 3 and Version 4 of Data_exchange
- Timestamp: 2010/02/14 18:14:06
Data_exchange
 v3   v4
  1    1  [[PageOutline]]
  2    2  Wednesday 10th February p.m. - Room 516
  3    3
  4       ''' Semantic Data Exchange'''
       4  = Semantic Data Exchange =
  5    5
  6    6  * Gos Micklem
  …    …
 18   18  * Alberto Labarga
 19   19
 20       '''Discussion on possibilities/need for improving data exchange between
 21       e.g. !InterMine, Galaxy, !BioMart, Cytoscape...'''
      20  == Discussion on possibilities/need for improving data exchange between e.g. !InterMine, Galaxy, !BioMart, Cytoscape... ==
 22   21
 23   22  Would typing of arbitrary data exchange improve communication between
  …    …
 26   25
 27   26
 28       '''Current situation:'''
      27  == Current situation: ==
 29   28
 30   29  It was felt that the current situation wasn't so bad: however it would
  …    …
 53   52  these aren't used.
 54   53
 55
 56
 57
 58
 59
 60
      54  Interoperation of Marts: this is the only place where must get the
      55  semantics correct. If one mart calls something a !UniProt
      56  identifier and the other one does too then essential that they are
      57  refering to the same identifier. Perhaps would be good to have
      58  controlled name-space for this and/or a hand-shake to check that do
      59  have matching values.
 61   60
 62   61  !InterMine (http://www.intermine.org): multiple organisms can use the same identifiers
  …    …
 72   71  useful.
 73   72
 74       Available data-describing controlled vocabularies: OICR cancer data
 75       experience is that there are rather limited naming systems.
 76
 77       Thought to be a good idea to expose/ export current naming systems.
 78       The Cancer Genome Atlas (TCGA: http://cancergenome.nih.gov/) have done
 79       some thinking along these lines.
      73  Available data-describing controlled vocabularies: OICR cancer data experience is that there are rather limited naming systems.
      74
      75  Thought to be a good idea to expose/ export current naming systems. The Cancer Genome Atlas (TCGA: http://cancergenome.nih.gov/) have done some thinking along these lines.
 80   76
 81   77  Galaxy: has xml to describe file formats:
 82       biopython/bioperl/bioruby/biojava have more-or-less agreed
 83       filenames.
 84
 85       Just thinking of FASTA format for sequence there are quite a number of
 86       Flavours:
      78  biopython/bioperl/bioruby/biojava have more-or-less agreed filenames.
      79
      80  Just thinking of FASTA format for sequence there are quite a number of Flavours:
 87   81  - DNA vs protein sequences
 88   82  - use of ambiguity codes or not
  …    …
125  119  Kei: PSI-MI EBI website: global definitions:
126  120  http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI
127
     121  {{{
128  122  molecular-interaction
129  123  --> database citation
  …    …
131  125  --> gene ontology (double click for definition)
132  126  http://www.ebi.ac.uk/ontology-lookup/?termId=MI%3A0448
133
     127  }}}
134  128  Semantics needs to be regulated regardless of the technology (So RDF
135  129  isn't necessarily the point here)
  …    …
140  134  large-scale users/providers can start to comply.
141  135
142       '''Data exchange conclusions:'''
     136  == Data exchange conclusions: ==
143  137  * A namespace for file formats would be useful.
144
145       * A namespace for column of tabular data would be useful. Could
146       also be used to describe data in other formats e.g. XML, though
147       this could be rather verbose.
     138  * A namespace for column of tabular data would be useful. Could also be used to describe data in other formats e.g. XML, though this could be rather verbose.
148  139  * Investigate whether the above exist.
149       Ontology Lookup Service (http://www.ebi.ac.uk/ontology-lookup)
150       and/orLife Science Resource Name Project (http://www.lsrn.org)
151       applicable ?
152
153       * At the moment namespaces for columns is probably more important
154       than URIs for each data element in a column.
155
156       * Agreed that worthwhile to pass URIs to describe columns. Agreed that
157       arbitrary human-friendly names are also good.
158
159       * Agreed to dump all !BioMart/ !InterMine column headings out, find
160       the common/commonly-used ones and work on naming.
161
162
163
164       '''Discussion turned to genome builds:'''
     140  * Ontology Lookup Service (http://www.ebi.ac.uk/ontology-lookup) and/orLife Science Resource Name Project (http://www.lsrn.org) applicable ?
     141
     142  * At the moment namespaces for columns is probably more important than URIs for each data element in a column.
     143
     144  * Agreed that worthwhile to pass URIs to describe columns. Agreed that arbitrary human-friendly names are also good.
     145
     146  * Agreed to dump all !BioMart/ !InterMine column headings out, find the common/commonly-used ones and work on naming.
     147
     148
     149
     150  == Discussion turned to genome builds: ==
165  151
166  152  There is no-where to go to find out if entities/ coordinates come from
167  153  the same versions of genomes. Agreed Versioning is important.
168  154
169       !BioMart/ UCSC do have versions
170       available but not necessarily using the same namespaces.
     155  !BioMart/ UCSC do have versions available but not necessarily using the same namespaces.
171  156
172  157  biomart has place-holders for versions and could easily expose these.
173  158
174       Issue with resources generated from'old' genome versions e.g. affy
175       chips: difficult to force people to use just one version of the
176       genome.
177
178       Can make gene identifiers unique by organism-specific prefix, or by
179       qualifier.
180
181       Ensembl (http://www.ensembl.org) does a good job and plans on
182       supporting all genomes. ensembl: systematic
183       about mapping their versions to others e.g. from UCSC.
184
185       Assembly version and ensembl gene-build version are sufficient to
186       resolve all ambiguities.
     159  Issue with resources generated from'old' genome versions e.g. affy chips: difficult to force people to use just one version of the genome.
     160
     161  Can make gene identifiers unique by organism-specific prefix, or by qualifier.
     162
     163  Ensembl (http://www.ensembl.org) does a good job and plans on supporting all genomes. ensembl: systematic about mapping their versions to others e.g. from UCSC.
     164
     165  Assembly version and ensembl gene-build version are sufficient to resolve all ambiguities.
187  166
188  167
  …    …
199  178  version.
200  179
201       '''Genome version summary:'''
     180  == Genome version summary: ==
202  181  * Investigate whether is there a standard available for describing genome version
203  182  * Consider whether to base naming on ensembl genome/ annotation versions
  …    …
205  184
206  185
207       '''Thoughts on RDF:'''
     186  == Thoughts on RDF: ==
208  187
209  188  If everyone is expressing their data in RDF with a common underlying