= Day-1 = = Day0 = = Day1 = = Day2 = == Designing a mechanism to output object using the ERb template engine for writing data in RDF == * Mitsuteru Nakao * Naohisa Goto * Toshiaki Katayama * Raoul Jean Pierre Bonnal === Basic idea === {{{ medline = Bio::MEDLINE.new(medline_flatfile) puts medline.output_rdf # => print a medline abstract in RDF format. }}} * Writing a Bio::DB object data in RDF format * Using the Erb template system * Template replaceablity === Proposed architecture === Adding a module Bio::OutputErb {{{ module Bio module OutputErb require 'erb' def output_method_erb(m, t) erb = ERB.new(File.read(t)) erb.def_method(self, m, t) end end end }}} Extending the Bio::MEDLINE {{{ class MEDLINE extend OutputErb def_output_method_erb("output_ttl", 'bio/bio/db/medline/medline.ttl.erb') def output(t) send("output_#{t.to_s}") end end }}} An Erb template file medline.ttl.erb for RDF/Turtle (partial) {{{ <% require 'date' # http://www.nlm.nih.gov/bsd/mms/medlineelements.html # A generic RDF subject URI at the TogoWS REST @prefix = "http://togows.dbcls.jp/entry/ncbi-pubmed" def uri "<#{@prefix}/#{c(pubmed['PMID'])}>" end # Generate a generic RDF predicate URI at the TogoWS REST. def predicate(field_name) "" end def ndate(str) str.strip case str when /^\d+$/ str.gsub(/(\d{4})(\d{2})(\d{2})/) { "#{$1}-#{$2}-#{$3}"} else str.gsub("/", '-') end end def ndatetime(str) d,t = str.split(" ") [ndate(d), t].join(" ") end %><%= uri %> <%= predicate('pmid') %> "<%=c pubmed['PMID'] %>" . <%= uri %> "pmid:<%=c pubmed['PMID'] %>" . <%= uri %> "pmid:<%=c pubmed['PMID'] %>" . <%= uri %> > . <%= uri %> <%= predicate('own') %> "<%=c pubmed['OWN'] %>" . <%= uri %> <%= predicate('stat') %> "<%=c pubmed['STAT'] %>" . <%= uri %> <%= predicate('da') %> "<%= ndate(c pubmed['DA']) %>" . <%= uri %> <%= predicate('dcom') %> "<%=ndate(c pubmed['DCOM']) %>" . <%= uri %> <%= predicate('lr') %> "<%=ndate(c pubmed['LR']) %>" . <% pubmed['IS'].scan(/(\d+-\d+ \(\S+\))/).flatten.each do |is| %> <%= uri %> <%= predicate('is') %> "<%=c "#{is}" %>" . <% end %> <%= uri %> <%= predicate('vi') %> "<%=c pubmed['VI'] %>" . <%= uri %> "<%=c pubmed['VI'] %>" . <%= uri %> <%= predicate('dp') %> "<%=c pubmed['DP'] %>" . ... }}} On using {{{ medline = Bio::MEDLINE.new(medline_flatfile) mdeline.output_ttl # .ttl == RDF/Turtle mdeline.output(:ttl) # alias medline.to_ttl # alias medline.to(:ttl) # alias }}} Use user template {{{ class MEDLINE def_output_method_erb("output_rdfxml", 'bio/bio/db/medline/medline.rdfxml.erb') end medline = Bio::MEDLINE.new(medline_flatfile) mdeline.output_rdfxml }}} File arrangement * lib/bio/db/ * medline.rb * medline/medline.ttl.erb === Issues === Naming issue * Choice: Bio::Renderer / Bio::Render / Bio::Template / Bio::Output / Bio::Export / Bio::Exporter / Bio::Writer / Bio::OutputErb * Choice: medline.output_ttl / medline.output(:ttl) / medline.to_ttl / medline.to(:ttl) * Pros: the to_ttl naming is easy for beginner. * Cons: to_ttl style name is for converting class. The to_s method is to convert a object to String expression. And to_json may be confusional. Method namespace * Functions defined at the template file contaminates the namespace of the Bio::MEDLINE class. Performance issue * Reading template file every time may be heavy because it uses eval. * ERB#def_method (or def_class or def_module) may help, but another problem: it always reads the template when the method (or class/module) is defined even if the output is not needed. Format variants and options * How to specify format variants and options? For example, it is better to have many html output variants.