Day-1 (2/6)
BioRuby Roadmap
10:30-18:00, Room 408, DBCLS (Faculty of Engineering Bldg. 12, Univ. of Tokyo)
Participants:
- Pjotr Prins
- Jan Aerts
- Toshiaki Katayama
- Mitsuteru Nakao
- Naohisa Goto
- Raoul Jean Pierre Bonnal
Day0 (2/7)
BioRuby Roadmap
10:00-18:00, Room 408
Participants:
- Pjotr Prins
- Jan Aerts
- Toshiaki Katayama
- Mitsuteru Nakao
- Naohisa Goto
- Raoul Jean Pierre Bonnal
- Christian M Zmasek
- Shuichi Kawashima
- Kazuhiro Hayashi
Day1
Day2
Designing a mechanism to output objects through the ERB template engine, for writing data in RDF
- Mitsuteru Nakao
- Naohisa Goto
- Toshiaki Katayama
- Raoul Jean Pierre Bonnal
Basic idea
medline = Bio::MEDLINE.new(medline_flatfile)
puts medline.output_rdf # => prints a MEDLINE abstract in RDF format.
- Writing Bio::DB object data in RDF format
- Using the ERB template system
- Template replaceability
Proposed architecture
Adding a module Bio::OutputErb?
require 'erb'

module Bio
  module OutputErb
    # Compile the ERB template file t into an instance method named m.
    def def_output_method_erb(m, t)
      erb = ERB.new(File.read(t))
      erb.def_method(self, m, t)
    end
  end
end
Extending Bio::MEDLINE
class MEDLINE
  extend OutputErb
  def_output_method_erb("output_ttl", 'bio/bio/db/medline/medline.ttl.erb')
  def output(t)
    send("output_#{t.to_s}")
  end
end
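The mechanism above can be tried in isolation with a minimal sketch. The names OutputErbSketch and FakeRecord, the example.org URI, and the temporary template file are hypothetical stand-ins for illustration; they are not part of BioRuby. Only ERB#def_method from the Ruby standard library is assumed.

```ruby
require 'erb'
require 'tempfile'

# Hypothetical stand-in for the proposed Bio::OutputErb module.
module OutputErbSketch
  # Compile the template file at +path+ into an instance method named +name+.
  def def_output_method_erb(name, path)
    erb = ERB.new(File.read(path))
    erb.def_method(self, name, path)
  end
end

# Hypothetical record class standing in for Bio::MEDLINE.
class FakeRecord
  extend OutputErbSketch
  attr_reader :pmid, :title

  def initialize(pmid, title)
    @pmid = pmid
    @title = title
  end

  # Dispatch to the output method generated for the given format.
  def output(format)
    send("output_#{format}")
  end
end

# Write a one-line Turtle template to a temporary file.
template = Tempfile.new(['record', '.ttl.erb'])
template.write(%(<http://example.org/<%= pmid %>> <http://purl.org/dc/elements/1.1/title> "<%= title %>" .\n))
template.close

FakeRecord.def_output_method_erb('output_ttl', template.path)

record = FakeRecord.new('12345', 'An example title')
ttl_line = record.output(:ttl)
puts ttl_line
```

Because the template body becomes the body of an instance method, it can call the record's own accessors (pmid, title) directly.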
An ERB template file, medline.ttl.erb, for RDF/Turtle (partial)
<%
require 'date'
# http://www.nlm.nih.gov/bsd/mms/medlineelements.html
# A generic RDF subject URI at the TogoWS REST
@prefix = "http://togows.dbcls.jp/entry/ncbi-pubmed"
def uri
  "<#{@prefix}/#{c(pubmed['PMID'])}>"
end
# Generate a generic RDF predicate URI at the TogoWS REST.
def predicate(field_name)
  "<http://togows.dbcls.jp/nezu/1.0/ncbi-pubmed##{field_name}>"
end
def ndate(str)
  str = str.strip
  case str
  when /^\d+$/
    str.gsub(/(\d{4})(\d{2})(\d{2})/) { "#{$1}-#{$2}-#{$3}"}
  else
    str.gsub("/", '-')
  end
end
def ndatetime(str)
  d,t = str.split(" ")
  [ndate(d), t].join(" ")
end
%><%= uri %>    <%= predicate('pmid') %>        "<%=c pubmed['PMID'] %>" .
<%= uri %>      <http://www.w3.org/2000/01/rdf-schema#label>    "pmid:<%=c pubmed['PMID'] %>" .
<%= uri %>      <http://purl.org/dc/elements/1.1/title> "pmid:<%=c pubmed['PMID'] %>" .
<%= uri %>      <http://purl.org/dc/elements/1.1/identifier>    <http://pubmed.org/<%=c pubmed['PMID'] %>> .
<%= uri %>      <%= predicate('own') %> "<%=c pubmed['OWN'] %>" .
<%= uri %>      <%= predicate('stat') %>        "<%=c pubmed['STAT'] %>" .
<%= uri %>      <%= predicate('da') %>  "<%= ndate(c pubmed['DA']) %>" .
<%= uri %>      <%= predicate('dcom') %>        "<%=ndate(c pubmed['DCOM']) %>" .
<%= uri %>      <%= predicate('lr') %>  "<%=ndate(c pubmed['LR']) %>" .
<% pubmed['IS'].scan(/(\d+-\d+ \(\S+\))/).flatten.each do |is| %>
<%= uri %>      <%= predicate('is') %>  "<%=c "#{is}" %>" .
<% end %>
<%= uri %>      <%= predicate('vi') %>  "<%=c pubmed['VI'] %>" .
<%= uri %>      <http://prismstandard.org/namespaces/2.0/basic/volume>  "<%=c pubmed['VI'] %>" .
<%= uri %>      <%= predicate('dp') %>  "<%=c pubmed['DP'] %>" .
...
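The date helpers from the template can also be exercised outside ERB. The sketch below copies ndate and ndatetime as plain Ruby methods, with the result of strip assigned back so that it takes effect; the sample inputs are assumptions chosen for illustration, not values taken from a real MEDLINE record.

```ruby
# Normalise a date written as all digits (YYYYMMDD) or with slashes
# to a hyphen-separated form.
def ndate(str)
  str = str.strip
  case str
  when /^\d+$/
    str.gsub(/(\d{4})(\d{2})(\d{2})/) { "#{$1}-#{$2}-#{$3}" }
  else
    str.gsub('/', '-')
  end
end

# Normalise the date part of a "date time" string and keep the time as-is.
def ndatetime(str)
  d, t = str.split(' ')
  [ndate(d), t].join(' ')
end

compact = ndate('20100206')               # assumed sample input
slashed = ndate(' 2010/02/06 ')           # assumed sample input
stamped = ndatetime('2010/02/06 10:30')   # assumed sample input
puts compact, slashed, stamped
```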
Usage
medline = Bio::MEDLINE.new(medline_flatfile)
medline.output_ttl   # .ttl == RDF/Turtle
medline.output(:ttl) # alias
medline.to_ttl       # alias
medline.to(:ttl)     # alias
Using a user template
class MEDLINE
  def_output_method_erb("output_rdfxml", 'bio/bio/db/medline/medline.rdfxml.erb')
end
medline = Bio::MEDLINE.new(medline_flatfile)
medline.output_rdfxml
File arrangement
- lib/bio/db/
  - medline.rb
  - medline/medline.ttl.erb
 
Issues
Naming issue
- Choice: Bio::Renderer / Bio::Render / Bio::Template / Bio::Output / Bio::Export / Bio::Exporter / Bio::Writer / Bio::OutputErb?
- Choice: medline.output_ttl / medline.output(:ttl) / medline.to_ttl / medline.to(:ttl)
- Pros: the to_ttl naming is easy for beginners.
- Cons: to_xxx style names conventionally denote type conversion; for example, to_s converts an object to its String representation. A name such as to_json may therefore be confusing.
 
Method namespace
- Functions defined in the template file contaminate the namespace of the Bio::MEDLINE class.
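One possible way around this, sketched below, is to render the template against a separate per-format context object, so that helpers such as uri live on the context class rather than on the record class. FakeRecord and TurtleContext are hypothetical names introduced only for this illustration; the sketch uses ERB#result with a binding instead of ERB#def_method.

```ruby
require 'erb'

# Hypothetical record class standing in for Bio::MEDLINE.
class FakeRecord
  attr_reader :pubmed

  def initialize(pubmed)
    @pubmed = pubmed
  end
end

# Hypothetical per-format context: template helpers are defined here,
# so nothing is added to the record class.
class TurtleContext
  TEMPLATE = ERB.new(<<~'TEMPLATE')
    <%= uri %> <http://purl.org/dc/elements/1.1/identifier> "<%= pubmed['PMID'] %>" .
  TEMPLATE

  def initialize(record)
    @record = record
  end

  # Expose the record's field hash to the template.
  def pubmed
    @record.pubmed
  end

  # Helper used by the template; scoped to this context class.
  def uri
    "<http://togows.dbcls.jp/entry/ncbi-pubmed/#{pubmed['PMID']}>"
  end

  # Evaluate the template with this context object as self.
  def render
    TEMPLATE.result(binding)
  end
end

record = FakeRecord.new('PMID' => '12345')
rendered = TurtleContext.new(record).render
puts rendered
```

The trade-off is one extra class per output format, in exchange for keeping the record class free of template helpers.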
Performance issue
- Reading the template file every time may be expensive because ERB evaluates it with eval.
- ERB#def_method (or def_class or def_module) may help, but it introduces another problem: the template is always read when the method (or class/module) is defined, even if the output is never needed.
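A possible compromise is to defer reading the template until the output method is first called, and to compile it with ERB#def_method at that point so that later calls reuse the compiled method. The sketch below is one way to do this; LazyOutputErb, FakeRecord, and the generated "_..._compiled" method name are hypothetical, and thread safety is not addressed.

```ruby
require 'erb'
require 'tmpdir'

# Hypothetical lazy variant of def_output_method_erb.
module LazyOutputErb
  def def_output_method_erb(name, path)
    compiled = "_#{name}_compiled" # hypothetical name for the compiled method
    klass = self
    define_method(name) do
      # Read and compile the template only on the first call.
      unless klass.method_defined?(compiled)
        ERB.new(File.read(path)).def_method(klass, compiled, path)
      end
      send(compiled)
    end
  end
end

# Hypothetical record class standing in for Bio::MEDLINE.
class FakeRecord
  extend LazyOutputErb
  attr_reader :pmid

  def initialize(pmid)
    @pmid = pmid
  end
end

dir = Dir.mktmpdir
path = File.join(dir, 'record.ttl.erb')

# The template file does not exist yet, so nothing can be read at definition time.
FakeRecord.def_output_method_erb('output_ttl', path)

File.write(path, 'pmid:<%= pmid %>')
first_output = FakeRecord.new('12345').output_ttl

# After the first call the compiled method is reused; the file is no longer needed.
File.delete(path)
second_output = FakeRecord.new('67890').output_ttl
puts first_output, second_output
```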
 
Format variants and options
- How should format variants and options be specified? For example, it would be useful to support several HTML output variants.
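As one possible answer, templates could be registered under a [format, variant] key, with options passed to the template as local variables. The sketch below assumes a hypothetical FakeRecord class, hypothetical inline templates, and a hypothetical output(format, variant:, **options) signature; it relies on ERB#result_with_hash (Ruby 2.5 or later) and does no HTML escaping.

```ruby
require 'erb'

# Hypothetical record class with a registry of templates keyed by [format, variant].
class FakeRecord
  TEMPLATES = {
    [:html, :default] => ERB.new('<p><%= title %></p>'),
    [:html, :heading] => ERB.new('<h<%= level %>><%= title %></h<%= level %>>')
  }.freeze

  def initialize(title)
    @title = title
  end

  # Look up the template for the format and variant, then render it.
  # Extra keyword options become local variables inside the template.
  def output(format, variant: :default, **options)
    template = TEMPLATES.fetch([format, variant])
    template.result_with_hash({ title: @title }.merge(options))
  end
end

rec = FakeRecord.new('BioRuby')
default_html = rec.output(:html)
heading_html = rec.output(:html, variant: :heading, level: 2)
puts default_html, heading_html
```

An unknown format or variant raises KeyError from Hash#fetch, which makes a missing template visible early.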

