Changes between Version 3 and Version 4 of Data_exchange

Timestamp:
2010/02/14 18:14:06 (8 years ago)
Author:
admin
Comment:

--

  • Data_exchange

    v3 v4  
    1  
     1[[PageOutline]]  
    22Wednesday 10th February p.m. - Room 516 
    33 
    4 ''' Semantic Data Exchange''' 
     4= Semantic Data Exchange = 
    55 
    66 * Gos Micklem 
     
    1818 * Alberto Labarga 
    1919 
    20 '''Discussion on possibilities/need for improving data exchange between 
    21 e.g. !InterMine, Galaxy, !BioMart, Cytoscape...''' 
     20== Discussion on possibilities/need for improving data exchange between e.g. !InterMine, Galaxy, !BioMart, Cytoscape... == 
    2221 
    2322Would typing of arbitrary data exchange improve communication between 
     
    2625 
    2726 
    28 '''Current situation:''' 
     27== Current situation: == 
    2928 
    3029It was felt that the current situation wasn't so bad: however it would 
     
    5352   these aren't used. 
    5453 
    55     Interoperation of Marts: this is the only place where must get the 
    56      semantics correct.  If one mart calls something a !UniProt 
    57      identifier and the other one does too then essential that they are 
    58      refering to the same identifier.  Perhaps would be good to have 
    59      controlled name-space for this and/or a hand-shake to check that do 
    60      have matching values. 
     54   Interoperation of Marts: this is the only place where must get the 
     55   semantics correct.  If one mart calls something a !UniProt 
     56   identifier and the other one does too then essential that they are 
     57   referring to the same identifier.  Perhaps would be good to have 
     58   controlled name-space for this and/or a hand-shake to check that do 
     59   have matching values. 
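
A minimal sketch of the kind of hand-shake mentioned above, assuming each mart can advertise a namespace URI and a few sample values for a linkable column. The `ColumnDeclaration` structure, the `handshake` function and the `example.org` namespace URIs are hypothetical illustrations, not part of any existing !BioMart interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDeclaration:
    """What one mart advertises about a linkable column (hypothetical structure)."""
    label: str                # human-friendly column name
    namespace: str            # controlled namespace URI (illustrative only)
    sample_values: frozenset  # a few identifiers taken from the column


def handshake(a: ColumnDeclaration, b: ColumnDeclaration, min_shared: int = 1) -> bool:
    """Return True if both columns claim the same namespace and share sample values."""
    if a.namespace != b.namespace:
        return False
    return len(a.sample_values & b.sample_values) >= min_shared


if __name__ == "__main__":
    ns = "http://example.org/ns/uniprot-accession"  # assumed namespace URI
    mart_a = ColumnDeclaration("UniProt ID", ns, frozenset({"P04637", "P38398"}))
    mart_b = ColumnDeclaration("uniprot accession", ns, frozenset({"P04637"}))
    print(handshake(mart_a, mart_b))
```

Comparing the namespace first and the values second keeps the two concerns separate: the labels may differ freely, while a mismatch in namespace or a lack of shared values signals that the columns should not be joined.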
    6160 
    6261   !InterMine (http://www.intermine.org): multiple organisms can use the same identifiers 
     
    7271useful. 
    7372 
    74    Available data-describing controlled vocabularies: OICR cancer data 
    75    experience is that there are rather limited naming systems. 
    76  
    77    Thought to be a good idea to expose/ export current naming systems. 
    78    The Cancer Genome Atlas (TCGA: http://cancergenome.nih.gov/) have done 
    79    some thinking along these lines. 
     73   Available data-describing controlled vocabularies: OICR cancer data experience is that there are rather limited naming systems. 
     74 
     75   Thought to be a good idea to expose/export current naming systems. The Cancer Genome Atlas (TCGA: http://cancergenome.nih.gov/) has done some thinking along these lines. 
    8076 
    8177   Galaxy: has xml to describe file formats: 
    82    biopython/bioperl/bioruby/biojava have more-or-less agreed 
    83    filenames. 
    84  
    85 Just thinking of FASTA format for sequence there are quite a number of 
    86 Flavours: 
     78   biopython/bioperl/bioruby/biojava have more-or-less agreed filenames. 
     79 
     80Just thinking of FASTA format for sequence, there are quite a number of flavours: 
    8781 - DNA vs protein sequences 
    8882 - use of ambiguity codes or not 
     
    125119Kei: PSI-MI EBI website:  global definitions: 
    126120http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI 
    127  
     121{{{ 
    128122   molecular-interaction 
    129123   --> database citation 
     
    131125       --> gene ontology (double click for definition) 
    132126      http://www.ebi.ac.uk/ontology-lookup/?termId=MI%3A0448 
    133  
     127}}} 
    134128Semantics needs to be regulated regardless of the technology (So RDF 
    135129isn't necessarily the point here) 
     
    140134large-scale users/providers can start to comply. 
    141135 
    142 '''Data exchange conclusions:''' 
     136== Data exchange conclusions: == 
    143137 * A namespace for file formats would be useful. 
    144  
    145  * A namespace for column of tabular data would be useful.  Could 
    146     also be used to describe data in other formats e.g. XML, though 
    147     this could be rather verbose. 
     138 * A namespace for column of tabular data would be useful.  Could also be used to describe data in other formats e.g. XML, though this could be rather verbose. 
    148139 * Investigate whether the above exist. 
    149     Ontology Lookup Service (http://www.ebi.ac.uk/ontology-lookup) 
    150     and/orLife Science Resource Name Project (http://www.lsrn.org) 
    151     applicable ? 
    152  
    153  * At the moment namespaces for columns is probably more important 
    154     than URIs for each data element in a column. 
    155  
    156  * Agreed that worthwhile to pass URIs to describe columns.  Agreed that 
    157   arbitrary human-friendly names are also good. 
    158  
    159  * Agreed to dump all !BioMart/ !InterMine column headings out, find 
    160     the common/commonly-used ones and work on naming. 
    161  
    162  
    163  
    164 '''Discussion turned to genome builds:''' 
     140   * Ontology Lookup Service (http://www.ebi.ac.uk/ontology-lookup) and/or Life Science Resource Name Project (http://www.lsrn.org) applicable? 
     141 
     142 * At the moment namespaces for columns are probably more important than URIs for each data element in a column. 
     143 
     144 * Agreed that worthwhile to pass URIs to describe columns.  Agreed that arbitrary human-friendly names are also good. 
     145 
     146 * Agreed to dump all !BioMart/ !InterMine column headings out, find the common/commonly-used ones and work on naming. 
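
The last conclusion above (dump the column headings, then find the commonly used ones) could be sketched roughly as follows. This assumes the headings have already been exported as plain lists of strings; the `normalise` and `common_headings` helpers and the example headings are invented for illustration and are not real !BioMart or !InterMine output.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def normalise(heading: str) -> str:
    """Lower-case a heading and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", heading.lower()).strip()


def common_headings(dumps: Dict[str, List[str]], min_sources: int = 2) -> Dict[str, Set[str]]:
    """Map each normalised heading to the sources that use it, keeping shared ones."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for source, headings in dumps.items():
        for heading in headings:
            seen[normalise(heading)].add(source)
    return {h: srcs for h, srcs in seen.items() if len(srcs) >= min_sources}


if __name__ == "__main__":
    # Invented example headings, for illustration only.
    dumps = {
        "BioMart": ["Gene ID", "UniProt ID", "Chromosome Name"],
        "InterMine": ["gene_id", "Organism", "uniprot-id"],
    }
    for heading, sources in sorted(common_headings(dumps).items()):
        print(heading, sorted(sources))
```

The shared headings found this way would be the first candidates for an agreed namespace entry, with the original human-friendly names kept alongside.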
     147 
     148 
     149 
     150== Discussion turned to genome builds: == 
    165151 
    166152There is nowhere to go to find out if entities/coordinates come from 
    167153the same versions of genomes.  Agreed versioning is important. 
    168154 
    169   !BioMart/ UCSC do have versions 
    170   available but not necessarily using the same namespaces. 
     155  !BioMart/ UCSC do have versions available but not necessarily using the same namespaces. 
    171156 
    172157  biomart has place-holders for versions and could easily expose these. 
    173158 
    174 Issue with resources generated from'old' genome versions e.g. affy 
    175 chips: difficult to force people to use just one version of the 
    176 genome. 
    177  
    178 Can make gene identifiers unique by organism-specific prefix, or by 
    179 qualifier. 
    180  
    181 Ensembl (http://www.ensembl.org) does a good job and plans on 
    182 supporting all genomes.  ensembl: systematic 
    183 about mapping their versions to others e.g. from UCSC. 
    184  
    185 Assembly version and ensembl gene-build version are sufficient to 
    186 resolve all ambiguities. 
     159Issue with resources generated from 'old' genome versions e.g. affy chips: difficult to force people to use just one version of the genome. 
     160 
     161Can make gene identifiers unique by organism-specific prefix, or by qualifier. 
     162 
     163Ensembl (http://www.ensembl.org) does a good job and plans on supporting all genomes.  Ensembl is systematic about mapping its versions to others, e.g. from UCSC. 
     164 
     165Assembly version and Ensembl gene-build version are sufficient to resolve all ambiguities. 
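
As a rough illustration of qualifying an identifier by organism, assembly version and gene-build version, the sketch below uses a hypothetical `QualifiedGeneId` tuple with a colon-separated text form. The field names, the separator and the gene-build labels are assumptions made for the example, not an agreed standard or an Ensembl format.

```python
from typing import NamedTuple


class QualifiedGeneId(NamedTuple):
    """A gene identifier qualified by where it came from (hypothetical layout)."""
    organism: str    # organism-specific prefix
    assembly: str    # genome assembly version
    gene_build: str  # gene-build (annotation) version
    identifier: str  # the bare gene identifier

    def __str__(self) -> str:
        return ":".join(self)

    @classmethod
    def parse(cls, text: str) -> "QualifiedGeneId":
        """Parse the colon-separated form produced by str()."""
        parts = text.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 4 colon-separated fields, got {len(parts)}")
        return cls(*parts)


def same_coordinate_system(a: QualifiedGeneId, b: QualifiedGeneId) -> bool:
    """True if two identifiers come from the same organism, assembly and gene build."""
    return (a.organism, a.assembly, a.gene_build) == (b.organism, b.assembly, b.gene_build)


if __name__ == "__main__":
    # "build-A" and "build-B" are placeholder gene-build labels.
    old = QualifiedGeneId("hsapiens", "NCBI36", "build-A", "ENSG00000139618")
    new = QualifiedGeneId("hsapiens", "GRCh37", "build-B", "ENSG00000139618")
    print(old, same_coordinate_system(old, new))
```

With qualifiers like these, two resources carrying the same bare identifier can be told apart when they were built against different genome versions.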
    187166 
    188167 
     
    199178version. 
    200179 
    201 '''Genome version summary:''' 
     180== Genome version summary: == 
    202181 * Investigate whether there is a standard available for describing genome version 
    203182 * Consider whether to base naming on ensembl genome/ annotation versions 
     
    205184 
    206185 
    207 '''Thoughts on RDF:''' 
     186== Thoughts on RDF: == 
    208187 
    209188If everyone is expressing their data in RDF with a common underlying