| 3 | | BioMart RDF-Integration via SPARQL |
| | 3 | [[PageOutline]] |
| | 4 | = URL = |
| | 5 | * BioMart http://www.biomart.org |
| | 6 | * ICGC Data Portal http://dcc.icgc.org |
| | 7 | * |
| | 8 | |
| | 9 | == BioMart RDF integration via SPARQL == |
| | 10 | tore.eriksson has written a tentative XSL stylesheet that converts selected elements of PDBMLplus into RDF. |
| | 11 | (However, when I checked the output RDF with the Raptor converter (rapper), it reported some errors.) |
| | 12 | |
| | 13 | While I (akinjo) was on the Shinkansen from Tokyo to Osaka, I wrote an XSL stylesheet that converts the whole PDBML file |
| | 14 | into RDF (files attached). I noticed one good thing about PDBML: |
| | 15 | * PDBML is based on mmCIF (PDB's original format) |
| | 16 | * mmCIF is actually defined as an ontology. |
| | 17 | * So, we can use mmCIF categories and items as predicates. |
| | 18 | * An XPath REST interface for PDBMLplus is available at PDBj, e.g., http://service.pdbj.org/mine/xpath/1a00/PDBx:datablock/PDBx:entityCategory |
| | 19 | * Thus, we can use these XPath URLs as subjects and objects in RDF. |
| | 20 | |
| | 21 | Some examples of the triples are: |
| | 22 | {{{ |
| | 23 | <http://service.pdbj.org/mine/xpath/1A00> <http://www.w3.org/2000/01/rdf-schema#label> "1A00" . |
| | 24 | <http://service.pdbj.org/mine/xpath/1A00/PDBx:datablock/PDBx:entityCategory/PDBx:entity[1]> <http://mmcif.pdbj.org/XML/pdbmlplus/pdbMLplus_v32.xsd/_entity.pdbx_description> "HEMOGLOBIN (ALPHA CHAIN)" . |
| | 25 | <http://service.pdbj.org/mine/xpath/1A00/PDBx:datablock/PDBx:entityCategory> <http://mmcif.pdbj.org/XML/pdbmlplus/pdbMLplus_v32.xsd/entity> <http://service.pdbj.org/mine/xpath/1A00/PDBx:datablock/PDBx:entityCategory/PDBx:entity[4]> . |
| | 26 | }}} |
| | 27 | (The predicate URIs are not valid at present.) |
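
A minimal Python sketch of how such triples can be assembled from an entry ID, an XPath, and an mmCIF category/item pair. The two base URIs are copied from the examples above; the helper names (`subject_uri`, `item_predicate`, `literal_triple`) are hypothetical and are not taken from the attached stylesheets.

```python
# Base URIs copied from the example triples; the helpers are illustrative only.
XPATH_BASE = "http://service.pdbj.org/mine/xpath/"
PREDICATE_BASE = "http://mmcif.pdbj.org/XML/pdbmlplus/pdbMLplus_v32.xsd/"


def subject_uri(entry_id, xpath):
    """Use the XPath REST URL of an element as its RDF subject."""
    return f"{XPATH_BASE}{entry_id}/{xpath}"


def item_predicate(category, item):
    """Name the predicate after the mmCIF item, e.g. _entity.pdbx_description."""
    return f"{PREDICATE_BASE}_{category}.{item}"


def literal_triple(subject, predicate, value):
    """Serialize one triple with a plain literal object, N-Triples style."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


entity = subject_uri("1A00", "PDBx:datablock/PDBx:entityCategory/PDBx:entity[1]")
line = literal_triple(
    entity,
    item_predicate("entity", "pdbx_description"),
    "HEMOGLOBIN (ALPHA CHAIN)",
)
print(line)
```

The printed line has the same shape as the second example triple above.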
| | 28 | |
| | 29 | === To do === |
| | 30 | * Currently, PDBML files converted by using PDBMLplus2rdf.xsl and PDBML2rdf.xsl do not contain any links to other databases. For that we need to write other XSL stylesheets. |
| | 31 | * There are also cross-references within PDB, but these are not handled yet. Handling them requires some analysis of the PDBML schema. |
| | 32 | |
| | 33 | == 2010-02-15: PDBML schema to OWL == |
| | 34 | I succeeded in converting the PDBML schema into OWL/RDF using XSLT. The resulting OWL file was validated as OWL Full-compatible by the !WonderWeb OWL Ontology Validator |
| | 35 | ( http://www.mygrid.org.uk/OWL/Validator )! |
| | 36 | |
| | 37 | === To do === |
| | 38 | * Write an XSL stylesheet that generates another XSL stylesheet for converting PDBML files into RDF. |
| | 39 | That is, |
| | 40 | {{{ |
| | 41 | PDBML Schema (pdbx-v32.xsd) --(pdbx2pdbml2rdf.xsl)--> XSL Stylesheet (pdbml2rdf.xsl) |
| | 42 | PDBML file --(pdbml2rdf.xsl)--> PDBML/RDF |
| | 43 | }}} |
| | 44 | |
| | 45 | One big advantage of translating the PDBML schema is that it contains cross-references to many items within a PDBML file. |
| | 46 | = DDBJ things = |
| | 47 | * http://xml.nig.ac.jp/rest/Invoke?service=DDBJ&method=getXMLEntry&accession=<ACCESSION> |
| | 48 | e.g. http://xml.nig.ac.jp/rest/Invoke?service=DDBJ&method=getXMLEntry&accession=AL121903 |
| | 49 | * URL that returns a prototype RDF |
| | 50 | * http://sabi.ddbj.nig.ac.jp/ddbj/data/<ACCESSION> |
| | 51 | e.g. http://sabi.ddbj.nig.ac.jp/ddbj/data/Z48241 |
| | 52 | * URL that returns the entry in flat-file format |
| | 53 | * http://sabi.ddbj.nig.ac.jp/ddbj/<ACCESSION> |
| | 54 | e.g. http://sabi.ddbj.nig.ac.jp/ddbj/Z48241 |
| | 55 | * URL that redirects to the HTML page |
| | 56 | * http://sabi.ddbj.nig.ac.jp/ddbj/html/<ACCESSION> |
| | 57 | e.g. http://sabi.ddbj.nig.ac.jp/ddbj/html/Z48241 |
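
The URL patterns above can be collected in a small helper. This is only a sketch: the function name `ddbj_urls` and the dictionary keys are hypothetical, while the URL templates are the ones listed above.

```python
from urllib.parse import urlencode

# URL templates taken from the list above; the helper name and keys are illustrative.
DDBJ_REST = "http://xml.nig.ac.jp/rest/Invoke"
DDBJ_BASE = "http://sabi.ddbj.nig.ac.jp/ddbj"


def ddbj_urls(accession):
    """Return every access URL listed above for one DDBJ accession."""
    rest_query = urlencode(
        {"service": "DDBJ", "method": "getXMLEntry", "accession": accession}
    )
    return {
        "xml": f"{DDBJ_REST}?{rest_query}",
        "rdf": f"{DDBJ_BASE}/data/{accession}",
        "flatfile": f"{DDBJ_BASE}/{accession}",
        "html": f"{DDBJ_BASE}/html/{accession}",
    }


urls = ddbj_urls("Z48241")
for kind, url in urls.items():
    print(kind, url)
```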
| | 58 | |
| | 59 | = KEGG things = |
| | 60 | * Draft KEGG RDF download site (temporary) : http://www.hgc.jp/~shuichi/biohack2010/ |
| | 61 | |
| | 62 | * Note: I don't recommend opening the following files in a web browser because they are large text files. |
| | 63 | * http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2pdb.ttl (KEGG GENES2PDB / PDB2KEGG GENES turtle: 730,602 triples) |
| | 64 | * http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2kegg-ko.ttl (KEGG GENES2KO / KEGG KO2GENES turtle: 3,687,074 triples) |
| | 65 | * http://www.hgc.jp/~shuichi/biohack2010/kegg-ko2kegg-pathway.ttl (KEGG KO2PATHWAY / KEGG PATHWAY2KO turtle: 22,774 triples) |
| | 66 | * http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2kegg-ko.ttl (KEGG GENES2NCBI GENE-ID / NCBI GENE-ID2KEGG GENES turtle: 3,687,074 triples) |
| | 67 | * http://www.hgc.jp/~shuichi/biohack2010/kegg-ko2definition.ttl (KEGG KO2KO definition turtle: 13,211 triples) |
| | 68 | * Total 14,391,245 triples |
| | 69 | |
| | 70 | = Reflect for pubmed = |
| | 71 | To use Reflect on PubMed: |
| | 72 | http://reflect.cbs.dtu.dk/TEST/GetEntities?uri=http://www.ncbi.nlm.nih.gov/pubmed/20146332&entity_types=9606 |
| | 73 | |
| | 74 | The result contains XML like the example shown at |
| | 75 | [http://reflect.cbs.dtu.dk/restAPI.html http://reflect.cbs.dtu.dk/restAPI.html] |
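
A small sketch of building the Reflect request URL for an arbitrary page. The endpoint and the parameter names `uri` and `entity_types` are taken from the example above; the function name `reflect_url` is hypothetical, and leaving `:` and `/` unescaped simply mirrors the example URL.

```python
from urllib.parse import urlencode

# Endpoint and parameter names come from the example URL; reflect_url is illustrative.
REFLECT_ENDPOINT = "http://reflect.cbs.dtu.dk/TEST/GetEntities"


def reflect_url(page_uri, entity_types):
    """Build a GetEntities request URL for the page at page_uri."""
    params = {"uri": page_uri, "entity_types": entity_types}
    # safe=":/" keeps the embedded page URL readable, as in the example above.
    return REFLECT_ENDPOINT + "?" + urlencode(params, safe=":/")


request_url = reflect_url("http://www.ncbi.nlm.nih.gov/pubmed/20146332", "9606")
print(request_url)
```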
| | 76 | |
| | 77 | |
| | 78 | = SPARQL endpoint = |
| | 79 | |
| | 80 | Room 415 network |
| | 81 | * Bio2RDF KEGG - http://192.168.11.61:8890/sparql/ |
| | 82 | * Bio2RDF PDB - http://192.168.11.61:8891/sparql/ |
| | 83 | * DDBJ+KEGG-PDBj - http://192.168.11.61:8892/sparql/ |
| | 84 | * PDBj - |
| | 85 | * KEGG - |
| | 86 | * DDBJ - |
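
A sketch of querying one of these endpoints from Python using a plain SPARQL protocol GET request. The endpoint URL is the DDBJ+KEGG-PDBj one listed above and is reachable only from the Room 415 network, so the fetch function is defined but not called here. The counting query, the graph name http://www.pdbj.org, and the helper names are assumptions made for illustration; whether the aggregate syntax is accepted depends on the Virtuoso version.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed query: count the triples in one named graph.
COUNT_QUERY = (
    "SELECT (COUNT(*) AS ?triples) "
    "WHERE { GRAPH <http://www.pdbj.org> { ?s ?p ?o } }"
)


def sparql_get_url(endpoint, query):
    """Encode a query as a SPARQL protocol GET URL."""
    return endpoint + "?" + urlencode({"query": query})


def run_select(endpoint, query, timeout=30):
    """Send the query and return the parsed JSON results (needs network access)."""
    request = Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = sparql_get_url("http://192.168.11.61:8892/sparql/", COUNT_QUERY)
print(url)
```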
| | 87 | |
| | 88 | Facet |
| | 89 | * Bio2RDF KEGG - http://192.168.11.61:8890/fct/ |
| | 90 | * Bio2RDF PDB - http://192.168.11.61:8891/fct/ |
| | 91 | * DDBJ-KEGG-PDBj - http://192.168.11.61:8892/fct/ |
| | 92 | * PDBj - |
| | 93 | * KEGG - |
| | 94 | * DDBJ - |
| | 95 | |
| | 96 | = Validating RDF/XML format = |
| | 97 | * http://librdf.org/parse |
| | 98 | |
| | 99 | = How to load data to virtuoso = |
| | 100 | First, in the '''virtuoso.ini''' file, set the following parameter: |
| | 101 | {{{ |
| | 102 | DirsAllowed = ., /usr/local/virtuoso-opensource/share/virtuoso/vad, /tmp |
| | 103 | }}} |
| | 104 | This allows Virtuoso to load data files from the /tmp directory. |
| | 105 | |
| | 106 | Then put the data files in /tmp (e.g., all.ttl, ddbj.rdf). |
| | 107 | |
| | 108 | {{{ |
| | 109 | % cat load.isql |
| | 110 | DB.DBA.TTLP_MT(file_to_string_output('/tmp/all.ttl'), '' ,'http://www.pdbj.org'); |
| | 111 | checkpoint; |
| | 112 | |
| | 113 | DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/tmp/lala.rdf'), '' ,'http://www.pdbj.org'); |
| | 114 | checkpoint; |
| | 115 | |
| | 116 | % isql 1111 dba dba < load.isql |
| | 117 | }}} |
| | 118 | |
| | 119 | Here, the third argument of the functions '''TTLP_MT''' and '''RDF_LOAD_RDFXML''' is the name of the graph |
| | 120 | (in this case, '''http://www.pdbj.org'''). |
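
When many files need loading, the isql script can be generated rather than written by hand. The following is a sketch under the assumption that Turtle files end in .ttl and RDF/XML files end in .rdf; the `load_statements` helper and the `LOADERS` table are hypothetical, while the two Virtuoso functions and their arguments are used exactly as in load.isql above.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to the loader function used in load.isql.
LOADERS = {
    ".ttl": "DB.DBA.TTLP_MT",
    ".rdf": "DB.DBA.RDF_LOAD_RDFXML",
}


def load_statements(paths, graph):
    """Return isql text that loads each file into the given graph."""
    lines = []
    for path in paths:
        loader = LOADERS[PurePosixPath(path).suffix]  # KeyError for other extensions
        lines.append(f"{loader}(file_to_string_output('{path}'), '', '{graph}');")
        lines.append("checkpoint;")
    return "\n".join(lines) + "\n"


script = load_statements(["/tmp/all.ttl", "/tmp/lala.rdf"], "http://www.pdbj.org")
print(script, end="")
```

The output can be saved as load.isql and piped to isql as shown above.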
| | 121 | |
| | 122 | = Results? = |
| | 123 | [[wiki:DDBJ-KEGG-PDBj/Results]] |
| | 124 | |
| | 125 | Developed the following on-the-fly DDBJ interfaces for RDF, Web API, and HTML pages: |
| | 126 | * URL that returns a prototype RDF |
| | 127 | * http://sabi.ddbj.nig.ac.jp/ddbj/data/<ACCESSION> |
| | 128 | e.g. http://sabi.ddbj.nig.ac.jp/ddbj/data/Z48241 |
| | 129 | * URL that returns the entry in flat-file format (URI?) |
| | 130 | * http://sabi.ddbj.nig.ac.jp/ddbj/<ACCESSION> |
| | 131 | e.g. http://sabi.ddbj.nig.ac.jp/ddbj/Z48241 |
| | 132 | * URL that redirects to the HTML page |
| | 133 | * http://sabi.ddbj.nig.ac.jp/ddbj/html/<ACCESSION> |
| | 134 | e.g. http://sabi.ddbj.nig.ac.jp/ddbj/html/Z48241 |
| | 135 | |
| | 136 | Installed the following Virtuoso instance at the DDBJ site: |
| | 137 | * http://sabi.ddbj.nig.ac.jp:8080/sparql |
| | 138 | |
| | 139 | FAQ: How many triples? |
| | 140 | {{{ |
| | 141 | mnmq:pdbj bh10$ wc -l *.ttl |
| | 142 | 1018388 all.ttl |
| | 143 | 25991 ddbj.ttl |
| | 144 | 730602 kegg-genes2pdb.ttl |
| | 145 | 18988 kegg-hsa2kegg-ko.ttl |
| | 146 | 51438 kegg-hsa2ncbi-gene_id.ttl |
| | 147 | 22774 kegg-ko2kegg-pathway.ttl |
| | 148 | 15048785 kegg.ttl |
| | 149 | 61208 pubmed.ttl |
| | 150 | 831951 struct_title.ttl |
| | 151 | 57943 taxonomy.ttl |
| | 152 | 67286 uniprot.ttl |
| | 153 | }}} |
| | 154 | |
| | 155 | == PDBML2RDF == |
| | 156 | * The XSL stylesheet for converting PDBML Schema (pdbx-v32.xsd) to an OWL ontology is completed (pdbx2owl.xsl). |
| | 157 | * The XSL stylesheet for converting PDBML Schema (pdbx-v32.xsd) to the XSL stylesheet that converts PDBML files to RDF files is completed (pdbx2pdbml2rdf.xsl). |
| | 158 | * This converter generator also creates internal cross-references within each PDB entry. However, the PDBML Schema contains a number of errors in its cross-reference definitions (which use xsd:key and xsd:keyref), so the resulting cross-references are significantly flawed. |
| | 159 | Example of using the stylesheets: |
| | 160 | {{{ |
| | 161 | # creating OWL ontology |
| | 162 | % xsltproc pdbx2owl.xsl pdbx-v32.xsd > pdbx-v32.owl |
| | 163 | |
| | 164 | # creating PDBML-> RDF converter |
| | 165 | % xsltproc pdbx2pdbml2rdf.xsl pdbx-v32.xsd > PDBML2rdf.xsl |
| | 166 | |
| | 167 | # converting a PDBML file to RDF. |
| | 168 | % xsltproc PDBML2rdf.xsl 1a00-noatom.xml > 1a00-noatom.rdf |
| | 169 | }}} |
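
The three steps can also be chained from a script. This is a sketch, assuming xsltproc is on the PATH and the stylesheets and input files named above are in the current directory; the `xslt_command` and `run_pipeline` helpers are hypothetical. If the tool or the inputs are missing, the script only prints the commands it would run.

```python
import os
import shutil
import subprocess


def xslt_command(stylesheet, source):
    """Argument list equivalent to: xsltproc STYLESHEET SOURCE."""
    return ["xsltproc", stylesheet, source]


# (command, output file) pairs in the same order as the shell example above.
# The second step writes the stylesheet that the third step applies.
PIPELINE = [
    (xslt_command("pdbx2owl.xsl", "pdbx-v32.xsd"), "pdbx-v32.owl"),
    (xslt_command("pdbx2pdbml2rdf.xsl", "pdbx-v32.xsd"), "PDBML2rdf.xsl"),
    (xslt_command("PDBML2rdf.xsl", "1a00-noatom.xml"), "1a00-noatom.rdf"),
]

REQUIRED_INPUTS = ["pdbx2owl.xsl", "pdbx2pdbml2rdf.xsl", "pdbx-v32.xsd", "1a00-noatom.xml"]


def run_pipeline(steps):
    """Run each step in order, redirecting stdout to its output file."""
    for command, output in steps:
        with open(output, "wb") as out:
            subprocess.run(command, stdout=out, check=True)


if __name__ == "__main__":
    ready = shutil.which("xsltproc") and all(os.path.exists(p) for p in REQUIRED_INPUTS)
    if ready:
        run_pipeline(PIPELINE)
    else:
        # Dry run: show the equivalent shell commands.
        for command, output in PIPELINE:
            print(" ".join(command), ">", output)
```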