Day-1 (2/6)

BioRuby Roadmap

10:30-18:00, Room 408, DBCLS (Faculty of Engineering Bldg. 12, Univ. of Tokyo)

Participants:

  • Pjotr Prins
  • Jan Aerts
  • Toshiaki Katayama
  • Mitsuteru Nakao
  • Naohisa Goto
  • Raoul Jean Pierre Bonnal

Day0 (2/7)

BioRuby Roadmap

10:00-18:00, Room 408

Participants:

  • Pjotr Prins
  • Jan Aerts
  • Toshiaki Katayama
  • Mitsuteru Nakao
  • Naohisa Goto
  • Raoul Jean Pierre Bonnal
  • Christian M Zmasek
  • Shuichi Kawashima
  • Kazuhiro Hayashi

Day1

Day2

Designing a mechanism to output objects via the ERB template engine, for writing data in RDF

  • Mitsuteru Nakao
  • Naohisa Goto
  • Toshiaki Katayama
  • Raoul Jean Pierre Bonnal

Basic idea

medline = Bio::MEDLINE.new(medline_flatfile)
puts medline.output_rdf # => print a medline abstract in RDF format.
  • Write Bio::DB object data in RDF format
  • Use the ERB template system
  • Make templates replaceable

Proposed architecture

Adding a module Bio::OutputErb?

require 'erb'

module Bio
  module OutputErb
    # Define instance method m on the extending class from the ERB template file t.
    def def_output_method_erb(m, t)
      erb = ERB.new(File.read(t))
      erb.def_method(self, m, t)
    end
  end
end
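
The proposal can be tried without BioRuby itself. The following is a minimal sketch, assuming a toy class and a throwaway template file; OutputErbSketch, ToyEntry and the example.org URI are hypothetical names used only for illustration. Only ERB#def_method from the standard library is relied on.

```ruby
require 'erb'
require 'tempfile'

# Hypothetical stand-in for the proposed Bio::OutputErb module.
module OutputErbSketch
  # Compile the ERB template at +path+ into an instance method +name+
  # of the class that extends this module.
  def def_output_method_erb(name, path)
    erb = ERB.new(File.read(path))
    erb.def_method(self, name.to_s, path)
  end
end

# Toy record class standing in for a Bio::DB entry class.
class ToyEntry
  extend OutputErbSketch

  def initialize(id)
    @id = id
  end
end

# Write a throwaway template so the sketch runs without any project files.
template = Tempfile.new(['toy', '.ttl.erb'])
template.write('<http://example.org/entry/<%= @id %>> <http://purl.org/dc/elements/1.1/identifier> "<%= @id %>" .')
template.flush

ToyEntry.def_output_method_erb('output_ttl', template.path)
puts ToyEntry.new('12345').output_ttl
```

Because the compiled template becomes an ordinary instance method, it can read the object's instance variables and call its other methods directly.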

Extending Bio::MEDLINE

module Bio
  class MEDLINE
    extend OutputErb
    def_output_method_erb("output_ttl", 'bio/bio/db/medline/medline.ttl.erb')

    # Dispatch on a format name, e.g. output(:ttl) calls output_ttl.
    def output(t)
      send("output_#{t}")
    end
  end
end

An ERB template file, medline.ttl.erb, for RDF/Turtle (partial)

<%
require 'date'
# http://www.nlm.nih.gov/bsd/mms/medlineelements.html
# A generic RDF subject URI at the TogoWS REST
@prefix = "http://togows.dbcls.jp/entry/ncbi-pubmed"

def uri
  "<#{@prefix}/#{c(pubmed['PMID'])}>"
end

# Generate a generic RDF predicate URI at the TogoWS REST.
def predicate(field_name)
  "<http://togows.dbcls.jp/nezu/1.0/ncbi-pubmed##{field_name}>"
end

def ndate(str)
  # String#strip returns a new string, so the result must be assigned.
  str = str.strip
  case str
  when /^\d+$/
    str.gsub(/(\d{4})(\d{2})(\d{2})/) { "#{$1}-#{$2}-#{$3}" }
  else
    str.gsub("/", '-')
  end
end

def ndatetime(str)
  d,t = str.split(" ")
  [ndate(d), t].join(" ")
end
%><%= uri %>    <%= predicate('pmid') %>        "<%=c pubmed['PMID'] %>" .
<%= uri %>      <http://www.w3.org/2000/01/rdf-schema#label>    "pmid:<%=c pubmed['PMID'] %>" .
<%= uri %>      <http://purl.org/dc/elements/1.1/title> "pmid:<%=c pubmed['PMID'] %>" .
<%= uri %>      <http://purl.org/dc/elements/1.1/identifier>    <http://pubmed.org/<%=c pubmed['PMID'] %>> .
<%= uri %>      <%= predicate('own') %> "<%=c pubmed['OWN'] %>" .
<%= uri %>      <%= predicate('stat') %>        "<%=c pubmed['STAT'] %>" .
<%= uri %>      <%= predicate('da') %>  "<%= ndate(c pubmed['DA']) %>" .
<%= uri %>      <%= predicate('dcom') %>        "<%=ndate(c pubmed['DCOM']) %>" .
<%= uri %>      <%= predicate('lr') %>  "<%=ndate(c pubmed['LR']) %>" .
<% pubmed['IS'].scan(/(\d+-\d+ \(\S+\))/).flatten.each do |is| %>
<%= uri %>      <%= predicate('is') %>  "<%=c "#{is}" %>" .
<% end %>
<%= uri %>      <%= predicate('vi') %>  "<%=c pubmed['VI'] %>" .
<%= uri %>      <http://prismstandard.org/namespaces/2.0/basic/volume>  "<%=c pubmed['VI'] %>" .
<%= uri %>      <%= predicate('dp') %>  "<%=c pubmed['DP'] %>" .
...
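
The two date helpers in the template can be checked in isolation. The sketch below copies them outside the template and assigns the stripped string, on the assumption that trimming surrounding whitespace was the intent; the sample input strings are made up for illustration.

```ruby
# Standalone copy of the template's date helpers, for checking their behaviour.
def ndate(str)
  str = str.strip
  case str
  when /^\d+$/
    # All digits: split YYYYMMDD into YYYY-MM-DD.
    str.gsub(/(\d{4})(\d{2})(\d{2})/) { "#{$1}-#{$2}-#{$3}" }
  else
    # Otherwise only replace slashes with hyphens.
    str.gsub('/', '-')
  end
end

def ndatetime(str)
  d, t = str.split(' ')
  [ndate(d), t].join(' ')
end

puts ndate('20090206')               # 2009-02-06
puts ndate('2009/02/06')             # 2009-02-06
puts ndatetime('2009/02/06 10:30')   # 2009-02-06 10:30
```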

Usage

medline = Bio::MEDLINE.new(medline_flatfile)

medline.output_ttl # .ttl == RDF/Turtle
medline.output(:ttl) # alias
medline.to_ttl # alias
medline.to(:ttl) # alias
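
One way the four spellings could map onto a single implementation is sketched below. ToyMedline is a hypothetical class whose output_ttl returns a fixed string instead of rendering a template; which of these names BioRuby should actually adopt is the open question listed under Issues.

```ruby
# Hypothetical toy class: output_ttl is hard-coded rather than template-driven.
class ToyMedline
  def output_ttl
    '<http://example.org/entry/1> <http://purl.org/dc/elements/1.1/title> "toy" .'
  end
  alias to_ttl output_ttl

  # Dispatch on a format symbol, e.g. output(:ttl) calls output_ttl.
  def output(format)
    public_send("output_#{format}")
  end
  alias to output
end

m = ToyMedline.new
# All four spellings return the same string.
puts [m.output(:ttl), m.to_ttl, m.to(:ttl)].all? { |s| s == m.output_ttl }
```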

Using a user-supplied template

module Bio
  class MEDLINE
    def_output_method_erb("output_rdfxml", 'bio/bio/db/medline/medline.rdfxml.erb')
  end
end

medline = Bio::MEDLINE.new(medline_flatfile)
medline.output_rdfxml

File arrangement

  • lib/bio/db/
    • medline.rb
    • medline/medline.ttl.erb

Issues

Naming issue

  • Choice: Bio::Renderer / Bio::Render / Bio::Template / Bio::Output / Bio::Export / Bio::Exporter / Bio::Writer / Bio::OutputErb?
  • Choice: medline.output_ttl / medline.output(:ttl) / medline.to_ttl / medline.to(:ttl)
    • Pros: the to_ttl naming is easy for beginners.
    • Cons: the to_xxx naming style conventionally denotes type conversion (for example, to_s converts an object to its String representation), so a name such as to_json may be confusing.

Method namespace

  • Functions defined in the template file contaminate the namespace of the Bio::MEDLINE class.
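
One possible way around this, not something decided at the meeting, is to evaluate the template against a separate context object that carries the helper functions. In the sketch below, TtlHelpers, RenderContext, ToyRecord and the example.org URIs are all hypothetical names.

```ruby
require 'erb'

# Hypothetical helper module: template helpers live here, not on the entry class.
module TtlHelpers
  def uri(id)
    "<http://example.org/entry/#{id}>"
  end
end

# Hypothetical rendering context that wraps an entry and mixes in the helpers.
class RenderContext
  include TtlHelpers

  def initialize(entry)
    @entry = entry
  end

  # Evaluate the template with this context object as self.
  def render(template)
    ERB.new(template).result(binding)
  end
end

# Toy record class standing in for Bio::MEDLINE.
class ToyRecord
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def output_ttl
    RenderContext.new(self).render('<%= uri(@entry.id) %> a <http://example.org/Entry> .')
  end
end

puts ToyRecord.new('12345').output_ttl
puts ToyRecord.method_defined?(:uri)   # false: the helper did not leak into the record class
```

The trade-off is that the template must reach the record through @entry rather than calling its methods directly.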

Performance issue

  • Reading the template file on every call may be slow, because the template is evaluated with eval.
    • ERB#def_method (or def_class / def_module) may help, but introduces another problem: the template is always read when the method (or class/module) is defined, even if the output is never needed.
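
A lazy variant is one conceivable compromise: define a small stub first, and let it read and compile the template the first time it is called. This is a sketch under that assumption, with hypothetical names (LazyOutputErb, LazyEntry) and a temporary template path; it is not a tested BioRuby design.

```ruby
require 'erb'
require 'tmpdir'

# Hypothetical variant of the proposed module: the template file is read and
# compiled on the first call of the generated method, not at definition time.
module LazyOutputErb
  def def_output_method_erb(name, path)
    klass = self
    define_method(name) do
      # Replace this stub with the compiled template method, then call it.
      ERB.new(File.read(path)).def_method(klass, name.to_s, path)
      public_send(name)
    end
  end
end

class LazyEntry
  extend LazyOutputErb

  def initialize(id)
    @id = id
  end
end

path = File.join(Dir.mktmpdir, 'toy.ttl.erb')

# The template file does not exist yet; defining the method does not read it.
LazyEntry.def_output_method_erb(:output_ttl, path)

File.write(path, '<http://example.org/entry/<%= @id %>> a <http://example.org/Entry> .')
puts LazyEntry.new('1').output_ttl   # first call compiles the template
puts LazyEntry.new('2').output_ttl   # later calls use the compiled method
```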

Format variants and options

  • How should format variants and options be specified? For example, it would be useful to have several HTML output variants.
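
As a starting point for discussion only, variants could be selected with a keyword argument and options passed to the template as local variables. Everything in this sketch is an assumption: the ToyCitation class, the variant: keyword, the css_class option and the inline templates are hypothetical. ERB#result_with_hash is part of the standard library (Ruby 2.5 and later).

```ruby
require 'erb'

# Hypothetical class: templates are looked up by [format, variant].
class ToyCitation
  TEMPLATES = {
    [:html, :default] => '<p class="<%= css_class %>"><%= title %></p>',
    [:html, :compact] => '<span class="<%= css_class %>"><%= title %></span>'
  }.freeze

  def initialize(title)
    @title = title
  end

  # Extra keyword options override the defaults and become template locals.
  def output(format, variant: :default, **options)
    source = TEMPLATES.fetch([format, variant])
    locals = { title: @title, css_class: 'citation' }.merge(options)
    ERB.new(source).result_with_hash(locals)
  end
end

c = ToyCitation.new('A toy title')
puts c.output(:html)
puts c.output(:html, variant: :compact, css_class: 'ref')
```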