Changes between Version 39 and Version 40 of ImplementationBootcamp

Timestamp:
2010/02/11 21:38:05 (14 years ago)
Author:
RutgerVos
Comment:

--

  • ImplementationBootcamp

    v39 v40  
    1313Francois Belleau seems to prefer Virtuoso, though he assures us he has no commercial interest in it :) 
    1414 
    15 MDW:  Virtuoso seems to be the triple-store of choice at the moment, but it does suffer from problems with data import.  We (Wilkinson lab, Belleau/Bio2RDF, Dumontier lab) have considerable experience with this; we will write tutorials about it and add the link here soon! 
     15> MDW:  Virtuoso seems to be the triple-store of choice at the moment, but it does suffer from problems with data import.  We (Wilkinson lab, Belleau/Bio2RDF, Dumontier lab) have considerable experience with this; we will write tutorials about it and add the link here soon! 
    1616 
    1717=== How to access a triplestore in Perl/Ruby/Python/Java etc.? === 
     
    3131Then use [http://protege.stanford.edu/ Protege] to actually build the ontology. 
    3232 
    33 MDW:  I highly recommend that you "make friends" with someone who has a deep understanding of OWL, and the consequences of various OWL constructs, as you go through your learning experience.  While the existing tutorials are good for telling you what is possible, they aren't always entirely clear about the consequences of choosing one encoding method versus another... and this dramatically affects your ability to "reason over" your data!!  Unfortunately, there are few shortcuts - OWL is hard!   
     33> MDW:  I highly recommend that you "make friends" with someone who has a deep understanding of OWL, and the consequences of various OWL constructs, as you go through your learning experience.  While the existing tutorials are good for telling you what is possible, they aren't always entirely clear about the consequences of choosing one encoding method versus another... and this dramatically affects your ability to "reason over" your data!!  Unfortunately, there are few shortcuts - OWL is hard!   
    3434 
    3535=== Which version of Protege should I use? === 
     
    3737Why not the latest one? You get the current OWL 2. 
    3838 
    39 MDW:  Protege 3 and Protege 4 are "philosophically" different, and represent a split in the global ontology community that runs roughly along the lines of the "OBO-fans" and the "OWL-DL-fans" (that's over-simplifying the situation, but I think it is by-and-large correct).  The two development communities had different target-audiences in mind when developing the software, and those audiences are reflected in the decisions made.  Protege 4 uses the Manchester OWL API "under the hood", and is somewhat more capable of manipulating OWL than Protege 3 is (IMO).  On the other hand, if you are planning to use Protege to generate RDF data ("individuals") manually, then Protege 3 might be more useful for you.  This is all entirely my opinion, so please don't flame me if you are a fan of one or the other :-) 
     39> MDW:  Protege 3 and Protege 4 are "philosophically" different, and represent a split in the global ontology community that runs roughly along the lines of the "OBO-fans" and the "OWL-DL-fans" (that's over-simplifying the situation, but I think it is by-and-large correct).  The two development communities had different target-audiences in mind when developing the software, and those audiences are reflected in the decisions made.  Protege 4 uses the Manchester OWL API "under the hood", and is somewhat more capable of manipulating OWL than Protege 3 is (IMO).  On the other hand, if you are planning to use Protege to generate RDF data ("individuals") manually, then Protege 3 might be more useful for you.  This is all entirely my opinion, so please don't flame me if you are a fan of one or the other :-) 
    4040 
    4141=== How do I namespace my terms? === 
     
    4343It is best to do this such that they can actually be resolved (unlike XML), preferably to an OWL file, e.g. "http://example.org/terms.owl#" 
    4444 
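A minimal sketch of what "resolvable" means for a hash namespace like the one above. The namespace is the example from the text; the local name `Protein` and both helper functions are hypothetical, chosen only for illustration. Because clients never send the fragment to the server, every term in the namespace resolves to the same OWL file.

```python
from urllib.parse import urldefrag

# Example namespace from the text above; not a real ontology.
TERMS_NS = "http://example.org/terms.owl#"


def term_uri(local_name, namespace=TERMS_NS):
    """Build a full term URI by appending a local name to the namespace."""
    return namespace + local_name


def document_url(uri):
    """Return the URL a client actually fetches: the URI minus its fragment."""
    url, _fragment = urldefrag(uri)
    return url


uri = term_uri("Protein")  # "Protein" is a hypothetical term name
print(uri)                 # http://example.org/terms.owl#Protein
print(document_url(uri))   # http://example.org/terms.owl
```

With this layout, publishing the OWL file at the fragment-free URL is enough to make every term in the namespace resolvable.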
    45 MDW:  Can we re-phrase the question to be clear what we are asking?  :-) 
     45> MDW:  Can we re-phrase the question to be clear what we are asking?  :-) 
    4646 
    4747===  What are the similarities and differences between the various shared names proposals?  === 
     
    4949Shared names proposals such as LSRN? UniProt? 
    5050 
    51 MDW:  See above, and ask specific questions that I can try to answer myself, or invite the representatives from the other proposals to answer.  (Get well soon, Alan!!!!) 
     51> MDW:  See above, and ask specific questions that I can try to answer myself, or invite the representatives from the other proposals to answer.  (Get well soon, Alan!!!!) 
    5252 
    5353== Web services == 
     
    5555===  I have an analysis tool, how do I expose it as a semantic web resource? === 
    5656 
    57 MDW:  SADI please :-)   Luke gave the Java tutorial today, and I gave the Perl tutorial.  Edward Kawas from my lab has produced movies detailing how to create services in Perl for SADI, and I will be doing the voice-over for these movies and putting them up on YouTube in the next week.  I will add a link here.  We will do the same for the Java side once we have the extra-cool Java functionalities coded and ~stable.  In particular, Luke McCarthy and Paul Gordon have been working together at the Hackathon finding simple ways to put SADI Java services into the Google Cloud... so you might not even have to consume your own compute resources to achieve this! 
     57> MDW:  SADI please :-)   Luke gave the Java tutorial today, and I gave the Perl tutorial.  Edward Kawas from my lab has produced movies detailing how to create services in Perl for SADI, and I will be doing the voice-over for these movies and putting them up on YouTube in the next week.  I will add a link here.  We will do the same for the Java side once we have the extra-cool Java functionalities coded and ~stable.  In particular, Luke McCarthy and Paul Gordon have been working together at the Hackathon finding simple ways to put SADI Java services into the Google Cloud... so you might not even have to consume your own compute resources to achieve this! 
    5858 
    5959===  When someone calls GET on my URLs, what should I return in order to be semantic webby? === 
     
    7272}}} 
    7373 
    74 MDW:  As an aside, the idea of content-negotiation has been extensively discussed within the Semantic Web for Healthcare and Life Science community, and it was not widely welcomed.  The point of the Semantic Web is that things should be ''explicit'', so there is some preference given to explicitly indicating (in your RDF metadata) that any given URI is going to return one syntax or another.  (though I have to agree, I am quite a fan of content-negotiation, given that this is exactly the problem that it was designed to solve!!  :-) ) 
     74> MDW:  As an aside, the idea of content-negotiation has been extensively discussed within the Semantic Web for Healthcare and Life Science community, and it was not widely welcomed.  The point of the Semantic Web is that things should be ''explicit'', so there is some preference given to explicitly indicating (in your RDF metadata) that any given URI is going to return one syntax or another.  (though I have to agree, I am quite a fan of content-negotiation, given that this is exactly the problem that it was designed to solve!!  :-) ) 
    7575 
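For readers unfamiliar with content-negotiation, here is a rough sketch of the server-side decision it implies: pick a representation from the client's `Accept` header. The list of available media types, the default, and the function names are assumptions made for this example. The parsing is deliberately simplified (it ignores partial wildcards such as `text/*`), so treat it as an illustration rather than a complete implementation of HTTP negotiation.

```python
# Hypothetical set of representations a server might offer for one record.
AVAILABLE = ["application/rdf+xml", "text/html"]


def parse_accept(header):
    """Parse an Accept header into (media type, q-value) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        entries.append((media_type, quality))
    # sorted() is stable, so equal q-values keep the client's original order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header, available=AVAILABLE, default="text/html"):
    """Return the media type to serve for the given Accept header."""
    for media_type, quality in parse_accept(header):
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    return default


print(negotiate("application/rdf+xml"))
print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

An RDF-aware client asking for `application/rdf+xml` gets RDF/XML, while a typical browser header falls through to HTML. The "explicit" alternative described above would instead give each representation its own URI and state its syntax in the RDF metadata.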
    76 MDW:  Going back as far as 2004, when the LSID specification was being finalized, this issue was a top-priority, so there is a sub-community of bioinformatics data providers who have thought about this problem for many many years! :-)  This has led to a variety of "shared names" proposals, including the Science Commons, Semantic Science, LSRN, and others.  In SADI (and now LSRN, since my lab has taken over the LSRN project in the past 2 months) we have decided to work with the Semantic Science shared-names proposal from Michel Dumontier.  He has developed an ontology (I will provide a link to this as soon as Michel decides that the ontology is "final"... within days!!).  The ontology defines how a URI should "behave" during resolution, depending on the kind of "thing" that the URI represents - e.g. a biological/physical entity, a database record, or a particular ''representation'' of a database record in html, xml, rdf, etc.  Within the SADI project, we will be writing all of our support code to make compliance with the Semantic Science ontology as automatic as possible.  We are also in the process of doing the same for URIs resolved through the LSRN resolution system... so if you use SADI or LSRN, you should get compliance with this ontology "for free" within the next week or two!  ''In My Opinion This Is One Of The Most Important Issues We Have Addressed At This Hackathon!!''  The Semantic Web works SO much better if we are careful to pay attention to what our URIs REPRESENT: things, records, or representations of records.  It sounds tedious, but we're doing everything we can to shield the data providers from having to think deeply about the problem, and trying to encode the complexity in our respective codebases. 
     76> MDW:  Going back as far as 2004, when the LSID specification was being finalized, this issue was a top-priority, so there is a sub-community of bioinformatics data providers who have thought about this problem for many many years! :-)  This has led to a variety of "shared names" proposals, including the Science Commons, Semantic Science, LSRN, and others.  In SADI (and now LSRN, since my lab has taken over the LSRN project in the past 2 months) we have decided to work with the Semantic Science shared-names proposal from Michel Dumontier.  He has developed an ontology (I will provide a link to this as soon as Michel decides that the ontology is "final"... within days!!).  The ontology defines how a URI should "behave" during resolution, depending on the kind of "thing" that the URI represents - e.g. a biological/physical entity, a database record, or a particular ''representation'' of a database record in html, xml, rdf, etc.  Within the SADI project, we will be writing all of our support code to make compliance with the Semantic Science ontology as automatic as possible.  We are also in the process of doing the same for URIs resolved through the LSRN resolution system... so if you use SADI or LSRN, you should get compliance with this ontology "for free" within the next week or two!  ''In My Opinion This Is One Of The Most Important Issues We Have Addressed At This Hackathon!!''  The Semantic Web works SO much better if we are careful to pay attention to what our URIs REPRESENT: things, records, or representations of records.  It sounds tedious, but we're doing everything we can to shield the data providers from having to think deeply about the problem, and trying to encode the complexity in our respective codebases. 
    7777 
    7878=== How do I create a SADI service? === 
     
    9797There is [http://www4.wiwiss.fu-berlin.de/bizer/d2rq/ D2RQ], which works okay but is somewhat lacking performance-wise. 
    9898 
    99 MDW:  This really depends on whether or not you intend to publish your database as a SPARQL endpoint.  The poll that Pierre and I took over the past couple of days suggests that only 5 data providers (within Tweet-shot of us) currently provide SQL access to their data resources.  IMO this does not bode well for having data providers set-up SPARQL endpoints!!  (why would they open themselves to a new, unfamiliar technology when they don't open themselves to a well-known, tested, secure, and highly powerful technology???)   We have tried to make a compelling argument that exposing resources via SADI Web Services gives you the best of both worlds - a highly-granular control over what data you expose, how you expose it, and over the distribution of large numbers of requests over your compute-resources; yet our SHARE client helps make it *appear* that the entire world is one big SPARQL endpoint (on steroids, since you can SPARQL data that doesn't even exist until you ask the question!)  My opinion (biased!) is that SADI Web Services are a better way to expose RDF data compared to SPARQL endpoints.  Moreover, it doesn't require you to change your existing data infrastructure in any way - you don't need to have a triple-store to expose your data as triples via SADI.  With a Web Service-based exposure, you can migrate your data gradually/modularly, a few properties at a time, rather than attempting to move your entire database to the Semantic Web in one shot... and gain experience as you go!  Given that it is currently not (natively) possible to SPARQL query over multiple endpoints, you aren't losing anything by going the SADI route either.  Finally, '''all''' of your resources (both database and analytical tools) are exposed in exactly the same way, meaning that they are all accessed by clients in exactly the same way, simplifying client design :-) 
     99> MDW:  This really depends on whether or not you intend to publish your database as a SPARQL endpoint.  The poll that Pierre and I took over the past couple of days suggests that only 5 data providers (within Tweet-shot of us) currently provide SQL access to their data resources.  IMO this does not bode well for having data providers set-up SPARQL endpoints!!  (why would they open themselves to a new, unfamiliar technology when they don't open themselves to a well-known, tested, secure, and highly powerful technology???)   We have tried to make a compelling argument that exposing resources via SADI Web Services gives you the best of both worlds - a highly-granular control over what data you expose, how you expose it, and over the distribution of large numbers of requests over your compute-resources; yet our SHARE client helps make it *appear* that the entire world is one big SPARQL endpoint (on steroids, since you can SPARQL data that doesn't even exist until you ask the question!)  My opinion (biased!) is that SADI Web Services are a better way to expose RDF data compared to SPARQL endpoints.  Moreover, it doesn't require you to change your existing data infrastructure in any way - you don't need to have a triple-store to expose your data as triples via SADI.  With a Web Service-based exposure, you can migrate your data gradually/modularly, a few properties at a time, rather than attempting to move your entire database to the Semantic Web in one shot... and gain experience as you go!  Given that it is currently not (natively) possible to SPARQL query over multiple endpoints, you aren't losing anything by going the SADI route either.  Finally, '''all''' of your resources (both database and analytical tools) are exposed in exactly the same way, meaning that they are all accessed by clients in exactly the same way, simplifying client design :-) 
    100100   
    101101=== How granular should my returned RDF be? === 
    102102 
    103 MDW:  There was a VERY brief discussion of this issue on Thursday... the answer was "be pragmatic".  Highly granular data (like absolute expression-level changes for microarrays) might not be appropriate for conversion into RDF because it explodes the size of the dataset in a circumstance where (a) the dataset is generally going to be used as a whole anyway, and (b) there are completely adequate parsers for existing file-formats, and (c) the benefit of being able to reason over an RDF representation of the data is limited, or absent.   
     103> MDW:  There was a VERY brief discussion of this issue on Thursday... the answer was "be pragmatic".  Highly granular data (like absolute expression-level changes for microarrays) might not be appropriate for conversion into RDF because it explodes the size of the dataset in a circumstance where (a) the dataset is generally going to be used as a whole anyway, and (b) there are completely adequate parsers for existing file-formats, and (c) the benefit of being able to reason over an RDF representation of the data is limited, or absent.   
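To get a feel for how quickly highly granular data "explodes" in RDF, a back-of-the-envelope estimate can help. The probe count, sample count, and triples-per-measurement figures below are illustrative assumptions, not numbers from the discussion, and the function name is hypothetical.

```python
def estimate_triples(probes, samples, triples_per_measurement):
    """Rough count of triples needed to describe every single measurement."""
    return probes * samples * triples_per_measurement


# All three figures are illustrative assumptions, not measured values.
total = estimate_triples(probes=50_000, samples=100, triples_per_measurement=4)
print(f"{total:,} triples")  # 20,000,000 triples
```

Under these assumptions, one modest experiment already needs tens of millions of triples, which is the kind of size increase that the "be pragmatic" advice weighs against the limited benefit of reasoning over such data.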
    104104 
    105105=== Where do I validate my RDF/XML? ===