Changes between Version 3 and Version 4 of 2010Q4

Timestamp: 2011/01/27 17:29:42
Author: arek

Follow-up meetings were held during 2011/01-2011/03 at DBCLS.

[[PageOutline]]
= URL =
 * BioMart http://www.biomart.org
 * ICGC Data Portal http://dcc.icgc.org

== BioMart RDF integration via SPARQL ==
tore.eriksson has made a tentative XSL stylesheet that converts selected elements of PDBMLplus into RDF.
However, when I checked the output RDF with the Raptor converter (rapper), it contained some errors.

While I (akinjo) was on the Shinkansen from Tokyo to Osaka, I wrote an XSL stylesheet that converts a whole PDBML file
into RDF (files attached). I noticed one useful property of PDBML:
 * PDBML is based on mmCIF (PDB's original format).
 * mmCIF is actually defined as an ontology.
 * So, we can use mmCIF categories and items as predicates.
 * An XPath REST interface for PDBMLplus is available at PDBj, e.g., http://service.pdbj.org/mine/xpath/1a00/PDBx:datablock/PDBx:entityCategory
 * Thus, we can use XPaths as subjects and objects in RDF.
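
Because every node of a PDBMLplus entry is addressable by an XPath under the PDBj REST interface, subject and object URIs can be assembled mechanically. A minimal Python sketch; `xpath_uri` is a hypothetical helper, and the base URL is copied from the example above:

{{{#!python
from urllib.parse import quote

# Base of the PDBj XPath REST interface, taken from the example URL above.
XPATH_BASE = "http://service.pdbj.org/mine/xpath"


def xpath_uri(pdb_id: str, *steps: str) -> str:
    """Return a URI naming a PDBMLplus node: XPATH_BASE/<entry ID>/<step>/<step>/...

    Only string concatenation; the service is not contacted.
    """
    # ':' and '[]' stay literal so the result matches the URIs used in the triples;
    # any other unsafe character (e.g. a space) is percent-encoded.
    parts = [quote(pdb_id, safe="")] + [quote(step, safe=":[]") for step in steps]
    return "/".join([XPATH_BASE, *parts])


# Subject of the second example triple in the next paragraph.
print(xpath_uri("1A00", "PDBx:datablock", "PDBx:entityCategory", "PDBx:entity[1]"))
}}}

Note that the REST example uses the lower-case ID `1a00` while the triples use `1A00`; the helper passes the ID through unchanged, so pick one convention and apply it consistently.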

Some examples of the triples:
{{{
<http://service.pdbj.org/mine/xpath/1A00> <http://www.w3.org/2000/01/rdf-schema#label> "1A00" .
<http://service.pdbj.org/mine/xpath/1A00/PDBx:datablock/PDBx:entityCategory/PDBx:entity[1]> <http://mmcif.pdbj.org/XML/pdbmlplus/pdbMLplus_v32.xsd/_entity.pdbx_description> "HEMOGLOBIN (ALPHA CHAIN)" .
<http://service.pdbj.org/mine/xpath/1A00/PDBx:datablock/PDBx:entityCategory> <http://mmcif.pdbj.org/XML/pdbmlplus/pdbMLplus_v32.xsd/entity> <http://service.pdbj.org/mine/xpath/1A00/PDBx:datablock/PDBx:entityCategory/PDBx:entity[4]> .
}}}
(The predicate URIs are not valid at present.)
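
Each triple above is one N-Triples line: an XPath URI as subject, an mmCIF item appended to a base URI as predicate, and either a literal or another XPath URI as object. A minimal Python sketch of writing such lines; the helper names are hypothetical, the predicate base is copied from the examples, and the `_category.item` naming rule is inferred from the single `_entity.pdbx_description` example:

{{{#!python
# Predicate base copied from the example triples; these URIs are not valid yet.
PRED_BASE = "http://mmcif.pdbj.org/XML/pdbmlplus/pdbMLplus_v32.xsd/"


def item_predicate(category: str, item: str) -> str:
    """Predicate for an mmCIF item, e.g. ('entity', 'pdbx_description') -> .../_entity.pdbx_description."""
    return f"{PRED_BASE}_{category}.{item}"


def nt_literal(text: str) -> str:
    """Quote a plain literal, escaping backslash, double quote, LF and CR as N-Triples requires."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def nt_triple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Format one N-Triples statement; obj is written as an IRI unless literal=True."""
    o = nt_literal(obj) if literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {o} ."


subject = ("http://service.pdbj.org/mine/xpath/1A00/PDBx:datablock/"
           "PDBx:entityCategory/PDBx:entity[1]")
print(nt_triple(subject, item_predicate("entity", "pdbx_description"),
                "HEMOGLOBIN (ALPHA CHAIN)", literal=True))
}}}

Run as-is, this prints the second example triple. Escaping matters once descriptions contain quotes or line breaks, which is one way a generated file can fail to parse in rapper.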

=== To do ===
 * Currently, PDBML files converted with PDBMLplus2rdf.xsl and PDBML2rdf.xsl do not contain any links to other databases. Adding such links requires additional XSL stylesheets.
 * There are also cross-references within the PDB, but these are not handled yet. Handling them requires some analysis of the PDBML schema.

== 2010-02-15: PDBML schema to OWL ==
I succeeded in converting the PDBML schema into OWL/RDF using XSLT. The resulting OWL file was validated as OWL Full-compatible by the !WonderWeb OWL Ontology Validator
( http://www.mygrid.org.uk/OWL/Validator ).
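
The stylesheet itself is not reproduced on this page. The Python sketch below only illustrates the kind of mapping a schema-to-OWL conversion performs, under our own simplifying assumptions: each named xsd:complexType becomes an owl:Class, and each element or attribute declared inside it becomes an owl:DatatypeProperty with that class as its domain. The rules, the helper name `xsd_to_owl`, the base URI and the sample schema are all hypothetical:

{{{#!python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"


def xsd_to_owl(xsd_text: str, base: str) -> list[str]:
    """Turn each named xsd:complexType into an owl:Class, and each named
    xsd:element / xsd:attribute declared inside it into an owl:DatatypeProperty
    whose rdfs:domain is that class. Returns Turtle lines."""
    root = ET.fromstring(xsd_text)
    lines = ["@prefix owl: <http://www.w3.org/2002/07/owl#> .",
             "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."]
    for ctype in root.iter(XSD + "complexType"):
        name = ctype.get("name")
        if name is None:  # anonymous types have no name to become a class
            continue
        cls = f"<{base}{name}>"
        lines.append(f"{cls} a owl:Class .")
        for decl in ctype.iter():  # depth-first, in document order
            if decl.tag in (XSD + "element", XSD + "attribute") and decl.get("name"):
                prop = f"<{base}{name}.{decl.get('name')}>"
                lines.append(f"{prop} a owl:DatatypeProperty ; rdfs:domain {cls} .")
    return lines


# A tiny invented schema fragment, not taken from pdbx-v32.xsd.
SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="entityType">
    <xsd:sequence>
      <xsd:element name="pdbx_description" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>"""

print("\n".join(xsd_to_owl(SAMPLE, "http://example.org/pdbx#")))
}}}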

=== To do ===
 * Write an XSL stylesheet that generates another XSL stylesheet for converting PDBML files into RDF.
That is,
{{{
PDBML Schema (pdbx-v32.xsd) --(pdbx2pdbml2rdf.xsl)--> XSL Stylesheet (pdbml2rdf.xsl)
PDBML file --(pdbml2rdf.xsl)--> PDBML/RDF
}}}

One big advantage of translating the PDBML schema is that it contains cross-references to many items within a PDBML file.
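
In an XML Schema, cross-references of this kind are declared with xsd:key (or xsd:unique) and xsd:keyref identity constraints; a keyref's `refer` attribute names the key it points to. A minimal Python sketch that lists them and flags references to undeclared keys; the helper name and the sample schema (with element names made up to resemble PDBML) are hypothetical:

{{{#!python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def cross_references(xsd_text: str) -> list[tuple[str, str, str]]:
    """List (keyref name, selector XPath, referenced key) for every xsd:keyref.

    Raises ValueError when a keyref points to a key that the schema never
    declares, one kind of error a schema-driven converter would propagate.
    """
    root = ET.fromstring(xsd_text)
    keys = {k.get("name") for tag in ("key", "unique") for k in root.iter(XS + tag)}
    refs = []
    for keyref in root.iter(XS + "keyref"):
        refer = keyref.get("refer", "").split(":")[-1]  # drop a prefix such as "PDBx:"
        if refer not in keys:
            raise ValueError(f"keyref {keyref.get('name')!r} refers to undeclared key {refer!r}")
        selector = keyref.find(XS + "selector")
        xpath = selector.get("xpath", "") if selector is not None else ""
        refs.append((keyref.get("name"), xpath, refer))
    return refs


# Invented example: entity_poly rows point at entity rows via @entity_id.
SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="datablock">
    <xsd:key name="entityKey">
      <xsd:selector xpath="entityCategory/entity"/>
      <xsd:field xpath="@id"/>
    </xsd:key>
    <xsd:keyref name="entity_polyKeyref" refer="entityKey">
      <xsd:selector xpath="entity_polyCategory/entity_poly"/>
      <xsd:field xpath="@entity_id"/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>"""

print(cross_references(SAMPLE))
}}}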

= DDBJ things =
 * XML entry via the DDBJ Web API (getXMLEntry)
  * http://xml.nig.ac.jp/rest/Invoke?service=DDBJ&method=getXMLEntry&accession=<ACCESSION>
    e.g. http://xml.nig.ac.jp/rest/Invoke?service=DDBJ&method=getXMLEntry&accession=AL121903
 * URL which returns prototype RDF
  * http://sabi.ddbj.nig.ac.jp/ddbj/data/<ACCESSION>
    e.g. http://sabi.ddbj.nig.ac.jp/ddbj/data/Z48241
 * URL which returns the entry in flatfile format
  * http://sabi.ddbj.nig.ac.jp/ddbj/<ACCESSION>
    e.g. http://sabi.ddbj.nig.ac.jp/ddbj/Z48241
 * URL which redirects to the HTML page
  * http://sabi.ddbj.nig.ac.jp/ddbj/html/<ACCESSION>
    e.g. http://sabi.ddbj.nig.ac.jp/ddbj/html/Z48241
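
All four access methods are simple URL templates keyed by the accession number. A minimal Python sketch that fills them in; `ddbj_urls` is a hypothetical helper, and the hosts are the ones listed above, which may since have moved:

{{{#!python
from urllib.parse import quote, urlencode

SABI = "http://sabi.ddbj.nig.ac.jp/ddbj"


def ddbj_urls(accession: str) -> dict[str, str]:
    """Map each access method listed above to its URL for one accession number."""
    acc = quote(accession, safe="")
    query = urlencode({"service": "DDBJ", "method": "getXMLEntry", "accession": accession})
    return {
        "xml": "http://xml.nig.ac.jp/rest/Invoke?" + query,  # Web API, XML entry
        "rdf": f"{SABI}/data/{acc}",                            # prototype RDF
        "flatfile": f"{SABI}/{acc}",                            # flatfile format
        "html": f"{SABI}/html/{acc}",                           # redirects to HTML page
    }


for kind, url in ddbj_urls("Z48241").items():
    print(kind, url)
}}}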

= KEGG things =
 * Draft KEGG RDF download site (temporary): http://www.hgc.jp/~shuichi/biohack2010/

 * Note: I wouldn't recommend opening the following files in a web browser because they are large text files.
 * http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2pdb.ttl (KEGG GENES2PDB / PDB2KEGG GENES Turtle: 730,602 triples)
 * http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2kegg-ko.ttl (KEGG GENES2KO / KEGG KO2GENES Turtle: 3,687,074 triples)
 * http://www.hgc.jp/~shuichi/biohack2010/kegg-ko2kegg-pathway.ttl (KEGG KO2PATHWAY / KEGG PATHWAY2KO Turtle: 22,774 triples)
 * http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2kegg-ko.ttl (KEGG GENES2NCBI GENE-ID / NCBI GENE-ID2KEGG GENES Turtle: 3,687,074 triples)
 * http://www.hgc.jp/~shuichi/biohack2010/kegg-ko2definition.ttl (KEGG KO2KO definition Turtle: 13,211 triples)
 * Total: 14,391,245 triples

= Reflect for PubMed =
To run Reflect on a PubMed page, request:
http://reflect.cbs.dtu.dk/TEST/GetEntities?uri=http://www.ncbi.nlm.nih.gov/pubmed/20146332&entity_types=9606

The result is XML in the format described at
[http://reflect.cbs.dtu.dk/restAPI.html http://reflect.cbs.dtu.dk/restAPI.html]
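
The request takes two query parameters: `uri`, the page to annotate, and `entity_types` (9606 is the NCBI Taxonomy identifier for Homo sapiens). A minimal Python sketch for building it for any PubMed ID; `reflect_url` is a hypothetical helper. Unlike the hand-written URL above, `urlencode` percent-encodes the nested `uri` value; we assume the service decodes it as any standard query-string parser would:

{{{#!python
from urllib.parse import urlencode

REFLECT_GET_ENTITIES = "http://reflect.cbs.dtu.dk/TEST/GetEntities"


def reflect_url(pmid: int, entity_types: str = "9606") -> str:
    """Build a GetEntities request for the PubMed page of `pmid`.

    urlencode percent-encodes the nested uri value (':' -> %3A, '/' -> %2F),
    which a standard query-string parser decodes back to the plain URL.
    """
    page = f"http://www.ncbi.nlm.nih.gov/pubmed/{pmid}"
    return REFLECT_GET_ENTITIES + "?" + urlencode({"uri": page, "entity_types": entity_types})


print(reflect_url(20146332))
}}}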

= SPARQL endpoint =

Room 415 network:
 * Bio2RDF KEGG - http://192.168.11.61:8890/sparql/
 * Bio2RDF PDB - http://192.168.11.61:8891/sparql/
 * DDBJ-KEGG-PDBj - http://192.168.11.61:8892/sparql/
 * PDBj -
 * KEGG -
 * DDBJ -

Facet browsers:
 * Bio2RDF KEGG - http://192.168.11.61:8890/fct/
 * Bio2RDF PDB - http://192.168.11.61:8891/fct/
 * DDBJ-KEGG-PDBj - http://192.168.11.61:8892/fct/
 * PDBj -
 * KEGG -
 * DDBJ -
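
The endpoints can also be queried from a script. Under the SPARQL 1.1 Protocol a query can be sent as the `query` parameter of an HTTP GET, with the desired result format given in the Accept header. A minimal Python sketch; the helper names are hypothetical, we assume these Virtuoso endpoints honour `application/sparql-results+json`, and the 192.168.x.x addresses are reachable only from the Room 415 network:

{{{#!python
import json
import urllib.request
from urllib.parse import urlencode


def sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def sparql_select(endpoint: str, query: str, timeout: float = 30.0) -> list[dict]:
    """Send a SELECT query and return its result rows (one dict per solution)."""
    with urllib.request.urlopen(sparql_request(endpoint, query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]
}}}

For example, `sparql_select("http://192.168.11.61:8892/sparql/", "SELECT * WHERE { ?s ?p ?o } LIMIT 10")` should return up to ten rows, each mapping a variable name to a dict with `type` and `value` keys.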

= Validating RDF/XML format =
 * http://librdf.org/parse
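
Before uploading a large file to the online parser, a few cheap problems can be caught locally. A minimal Python sketch (`precheck_rdfxml` is a hypothetical helper) that checks XML well-formedness and one RDF/XML rule, namely that a node element carries at most one of rdf:about, rdf:ID and rdf:nodeID. It is not a substitute for a real RDF/XML parser:

{{{#!python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def precheck_rdfxml(text: str) -> list[str]:
    """Cheap local checks before using a full RDF/XML parser; [] means none failed."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    # rdf:RDF may be omitted when the document holds a single node element.
    nodes = list(root) if root.tag == RDF + "RDF" else [root]
    problems = []
    for node in nodes:
        # A node element may carry at most one of rdf:about, rdf:ID, rdf:nodeID.
        ids = [a for a in ("about", "ID", "nodeID") if node.get(RDF + a) is not None]
        if len(ids) > 1:
            problems.append(f"{node.tag}: more than one of " + ", ".join("rdf:" + a for a in ids))
    return problems
}}}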

= How to load data into Virtuoso =
First, in the '''virtuoso.ini''' file, set the following parameter in the `[Parameters]` section:
{{{
DirsAllowed                     = ., /usr/local/virtuoso-opensource/share/virtuoso/vad, /tmp
}}}
This allows Virtuoso to read files in /tmp, so the data to be loaded can be placed there.

Then put the data files in /tmp (e.g., all.ttl, ddbj.rdf).

{{{
% cat load.isql
DB.DBA.TTLP_MT(file_to_string_output('/tmp/all.ttl'), '' ,'http://www.pdbj.org');
checkpoint;

DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/tmp/ddbj.rdf'), '' ,'http://www.pdbj.org');
checkpoint;

# connect to port 1111 as user dba (password dba) and run the script
% isql 1111 dba dba < load.isql
}}}

Here the third argument of the functions '''TTLP_MT''' and '''RDF_LOAD_RDFXML''' is the name of the graph
(in this case, '''http://www.pdbj.org'''); the second argument ('') is the base URI.
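
With many data files, load.isql can be generated instead of written by hand. A minimal Python sketch; the helper names and the extension-to-loader mapping (.ttl to TTLP_MT, .rdf to RDF_LOAD_RDFXML) are our assumptions, and the files must still sit in a directory listed in DirsAllowed:

{{{#!python
from pathlib import PurePosixPath

# Loader per file extension, as in load.isql above.
LOADERS = {".ttl": "DB.DBA.TTLP_MT", ".rdf": "DB.DBA.RDF_LOAD_RDFXML"}


def sql_quote(s: str) -> str:
    """SQL string literal: single quotes are doubled."""
    return "'" + s.replace("'", "''") + "'"


def load_script(files: list[str], graph: str) -> str:
    """isql script loading every file into `graph`, with a checkpoint after each load."""
    lines = []
    for f in files:
        loader = LOADERS.get(PurePosixPath(f).suffix)
        if loader is None:
            raise ValueError(f"no loader configured for {f}")
        lines.append(f"{loader}(file_to_string_output({sql_quote(f)}), '', {sql_quote(graph)});")
        lines.append("checkpoint;")
    return "\n".join(lines) + "\n"


print(load_script(["/tmp/all.ttl", "/tmp/ddbj.rdf"], "http://www.pdbj.org"), end="")
}}}

Redirect the output to load.isql and run it with isql as shown above.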

= Results? =
[[wiki:DDBJ-KEGG-PDBj/Results]]

Developed the following on-the-fly DDBJ interfaces for RDF, the Web API, and HTML pages:
 * URL which returns prototype RDF
  * http://sabi.ddbj.nig.ac.jp/ddbj/data/<ACCESSION>
    e.g. http://sabi.ddbj.nig.ac.jp/ddbj/data/Z48241
 * URL which returns the entry in flatfile format (URI?)
  * http://sabi.ddbj.nig.ac.jp/ddbj/<ACCESSION>
    e.g. http://sabi.ddbj.nig.ac.jp/ddbj/Z48241
 * URL which redirects to the HTML page
  * http://sabi.ddbj.nig.ac.jp/ddbj/html/<ACCESSION>
    e.g. http://sabi.ddbj.nig.ac.jp/ddbj/html/Z48241

Installed the following Virtuoso SPARQL endpoint at the DDBJ site:
 * http://sabi.ddbj.nig.ac.jp:8080/sparql

FAQ: How many triples? (`wc -l` counts lines, which equals the number of triples only if each triple is on its own line.)
{{{
mnmq:pdbj bh10$ wc -l *.ttl
  1018388 all.ttl
    25991 ddbj.ttl
   730602 kegg-genes2pdb.ttl
    18988 kegg-hsa2kegg-ko.ttl
    51438 kegg-hsa2ncbi-gene_id.ttl
    22774 kegg-ko2kegg-pathway.ttl
 15048785 kegg.ttl
    61208 pubmed.ttl
   831951 struct_title.ttl
    57943 taxonomy.ttl
    67286 uniprot.ttl
}}}
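
`wc -l` also counts blank lines, comments and @prefix lines. A minimal Python sketch that skips those, assuming at most one triple per line as in N-Triples-style output; the helper names are hypothetical:

{{{#!python
from pathlib import Path
from typing import Iterable


def count_triples(lines: Iterable[str]) -> int:
    """Count triples in text that has at most one triple per line.

    Unlike `wc -l`, blank lines, '#' comments and Turtle @prefix/@base
    directives are skipped. Multi-line literals, or several triples on one
    line, would be miscounted.
    """
    n = 0
    for line in lines:
        s = line.strip()
        if s and not s.startswith(("#", "@")):
            n += 1
    return n


def count_file(path: Path) -> int:
    with path.open(encoding="utf-8") as fh:
        return count_triples(fh)


if __name__ == "__main__":
    # Same listing as `wc -l *.ttl`, but counting triples only.
    for p in sorted(Path(".").glob("*.ttl")):
        print(f"{count_file(p):>9} {p.name}")
}}}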

== PDBML2RDF ==
 * The XSL stylesheet for converting the PDBML Schema (pdbx-v32.xsd) into an OWL ontology is complete (pdbx2owl.xsl).
 * The XSL stylesheet for converting the PDBML Schema (pdbx-v32.xsd) into the XSL stylesheet that converts PDBML files to RDF files is complete (pdbx2pdbml2rdf.xsl).
  * This converter generator also makes internal cross-references within each PDB entry. However, there are a number of errors in the definitions of cross-references in the PDBML Schema (using xsd:key and xsd:keyref); thus, the resulting cross-references are significantly flawed.
Example of using the stylesheets:
{{{
# create the OWL ontology
% xsltproc pdbx2owl.xsl pdbx-v32.xsd > pdbx-v32.owl

# create the PDBML -> RDF converter
% xsltproc pdbx2pdbml2rdf.xsl pdbx-v32.xsd > PDBML2rdf.xsl

# convert a PDBML file to RDF
% xsltproc PDBML2rdf.xsl 1a00-noatom.xml > 1a00-noatom.rdf
}}}
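
When the same three steps are run repeatedly (for example over many entries), they can be driven from a script. A minimal Python sketch that mirrors the shell commands above, with xsltproc's standard output redirected to each output file; the helper names are hypothetical, and the stylesheets are assumed to be in the current directory:

{{{#!python
import subprocess
from pathlib import Path


def pipeline_commands(schema: str, pdbml: str) -> list[tuple[list[str], Path]]:
    """Return (xsltproc argv, output file) pairs for the three steps above, in order."""
    converter = Path("PDBML2rdf.xsl")  # generated in step 2, used in step 3
    return [
        (["xsltproc", "pdbx2owl.xsl", schema], Path(schema).with_suffix(".owl")),
        (["xsltproc", "pdbx2pdbml2rdf.xsl", schema], converter),
        (["xsltproc", str(converter), pdbml], Path(pdbml).with_suffix(".rdf")),
    ]


def run(commands: list[tuple[list[str], Path]]) -> None:
    """Run each step, writing xsltproc's stdout to the output file (like '>' in the shell)."""
    for argv, out in commands:
        with out.open("wb") as fh:
            subprocess.run(argv, stdout=fh, check=True)  # stop at the first failing step
}}}

For example, `run(pipeline_commands("pdbx-v32.xsd", "1a00-noatom.xml"))` reproduces the three commands above. Only the last step depends on the entry, so for a batch of entries the first two need to run just once per schema version.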