BioMart RDF-Integration via SPARQL
[[PageOutline]]
= URL =
 * BioMart http://www.biomart.org
 * ICGC Data Portal http://dcc.icgc.org

== BioMart RDF integration via SPARQL ==
tore.eriksson wrote a tentative XSL stylesheet that converts selected elements of PDBMLplus into RDF.
(However, when I checked the output RDF with the Raptor converter (rapper), it reported some errors.)

While I (akinjo) was on the Shinkansen from Tokyo to Osaka, I wrote an XSL stylesheet that converts a whole PDBML file
into RDF (files attached). I noticed one good thing about PDBML:
 * PDBML is based on mmCIF (PDB's original format).
 * mmCIF is actually defined as an ontology.
 * So, we can use mmCIF categories and items as predicates.
 * An XPath REST interface for PDBMLplus is available at PDBj, e.g., http://service.pdbj.org/mine/xpath/1a00/PDBx:datablock/PDBx:entityCategory
 * Thus, we can use XPaths as subjects and objects in RDF.

Some examples of the triples are:
{{{
<http://service.pdbj.org/mine/xpath/1A00> <http://www.w3.org/2000/01/rdf-schema#label> "1A00" .
<http://service.pdbj.org/mine/xpath/1A00/PDBx:datablock/PDBx:entityCategory/PDBx:entity[1]> <http://mmcif.pdbj.org/XML/pdbmlplus/pdbMLplus_v32.xsd/_entity.pdbx_description> "HEMOGLOBIN (ALPHA CHAIN)" .
<http://service.pdbj.org/mine/xpath/1A00/PDBx:datablock/PDBx:entityCategory> <http://mmcif.pdbj.org/XML/pdbmlplus/pdbMLplus_v32.xsd/entity> <http://service.pdbj.org/mine/xpath/1A00/PDBx:datablock/PDBx:entityCategory/PDBx:entity[4]> .
}}}
(The predicate URIs are not valid at present.)
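
The URI scheme used in these triples can be sketched in a few lines of Python. This is only an illustration of the pattern, not the XSL stylesheet itself: the helper names (xpath_uri, literal_triple, resource_triple) are hypothetical, and only the two base URIs are taken from the examples above.

```python
# Base URIs copied from the example triples; everything else is illustrative.
XPATH_BASE = "http://service.pdbj.org/mine/xpath/"
PREDICATE_BASE = "http://mmcif.pdbj.org/XML/pdbmlplus/pdbMLplus_v32.xsd/"


def xpath_uri(pdb_id, *steps):
    """Build a subject/object URI from a PDB ID followed by XPath steps."""
    return XPATH_BASE + "/".join((pdb_id,) + steps)


def literal_triple(subject, predicate, value):
    """Format one N-Triples line whose object is a plain string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '<%s> <%s> "%s" .' % (subject, predicate, escaped)


def resource_triple(subject, predicate, obj):
    """Format one N-Triples line whose object is another resource."""
    return "<%s> <%s> <%s> ." % (subject, predicate, obj)


category = xpath_uri("1A00", "PDBx:datablock", "PDBx:entityCategory")
entity = category + "/PDBx:entity[1]"

# mmCIF item as predicate, literal value as object
print(literal_triple(entity, PREDICATE_BASE + "_entity.pdbx_description",
                     "HEMOGLOBIN (ALPHA CHAIN)"))
# mmCIF category as predicate, XPath URI as object
print(resource_triple(category, PREDICATE_BASE + "entity", entity))
```

The first printed line reproduces the second example triple above; the second links the category node to one of its entity elements in the same way as the third example.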

=== To do ===
 * Currently, PDBML files converted with PDBMLplus2rdf.xsl and PDBML2rdf.xsl do not contain any links to other databases. Adding them requires writing further XSL stylesheets.
 * There are also cross-references within PDB, but these are not handled yet. Handling them requires some analysis of the PDBML schema.

== 2010-02-15: PDBML schema to OWL ==
I succeeded in converting the PDBML schema into OWL/RDF using XSLT. The resulting OWL file was validated as OWL Full-compatible by the !WonderWeb OWL Ontology Validator
( http://www.mygrid.org.uk/OWL/Validator )!

=== To do ===
 * Write an XSL stylesheet that generates another XSL stylesheet for converting PDBML files into RDF.
That is,
{{{
PDBML Schema (pdbx-v32.xsd) --(pdbx2pdbml2rdf.xsl)--> XSL Stylesheet (pdbml2rdf.xsl)
PDBML file --(pdbml2rdf.xsl)--> PDBML/RDF
}}}

One big advantage of translating the PDBML schema is that it contains cross-references between many items within a PDBML file.
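
XML Schema expresses such cross-references as xsd:key and xsd:keyref identity constraints. As a rough sketch of the analysis step, the following Python lists which elements refer to which. The schema fragment is a small made-up sample (it is not copied from pdbx-v32.xsd), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment for illustration only.
SAMPLE_XSD = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:PDBx="http://example.org/PDBx">
  <xsd:element name="datablock">
    <xsd:key name="entityKey_0">
      <xsd:selector xpath="PDBx:entityCategory/PDBx:entity"/>
      <xsd:field xpath="@id"/>
    </xsd:key>
    <xsd:keyref name="entityKeyref_0" refer="PDBx:entityKey_0">
      <xsd:selector xpath="PDBx:entity_polyCategory/PDBx:entity_poly"/>
      <xsd:field xpath="@entity_id"/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>
"""


def describe(constraint):
    """Return (selector XPath, [field XPaths]) for a key or keyref element."""
    selector = constraint.find(XS + "selector").get("xpath")
    fields = [field.get("xpath") for field in constraint.findall(XS + "field")]
    return selector, fields


def cross_references(xsd_text):
    """Pair every xsd:keyref with the xsd:key it refers to."""
    root = ET.fromstring(xsd_text)
    keys = {key.get("name"): key for key in root.iter(XS + "key")}
    pairs = []
    for keyref in root.iter(XS + "keyref"):
        target_name = keyref.get("refer").split(":")[-1]  # drop the prefix
        target = keys.get(target_name)
        if target is None:
            continue  # dangling reference; simply skipped in this sketch
        pairs.append((describe(keyref), describe(target)))
    return pairs


for source, target in cross_references(SAMPLE_XSD):
    print(source, "->", target)
```

Each pair says that the fields selected on the left must match a key on the right, which is the information a generated stylesheet would need in order to emit a link between the two XPath URIs.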

= DDBJ things =
 * http://xml.nig.ac.jp/rest/Invoke?service=DDBJ&method=getXMLEntry&accession=<ACCESSION>
   e.g. http://xml.nig.ac.jp/rest/Invoke?service=DDBJ&method=getXMLEntry&accession=AL121903
 * URL which returns prototype RDF
   * http://sabi.ddbj.nig.ac.jp/ddbj/data/<ACCESSION>
     e.g. http://sabi.ddbj.nig.ac.jp/ddbj/data/Z48241
 * URL which returns the entry in flatfile format
   * http://sabi.ddbj.nig.ac.jp/ddbj/<ACCESSION>
     e.g. http://sabi.ddbj.nig.ac.jp/ddbj/Z48241
 * URL which redirects to the HTML page
   * http://sabi.ddbj.nig.ac.jp/ddbj/html/<ACCESSION>
     e.g. http://sabi.ddbj.nig.ac.jp/ddbj/html/Z48241
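
The URL patterns above can be filled in programmatically. The following is a minimal sketch; the function name ddbj_urls and the dictionary keys are hypothetical, and the code only builds the URLs without fetching them.

```python
from urllib.parse import quote

# URL patterns copied from the list above.
DDBJ_BASE = "http://sabi.ddbj.nig.ac.jp/ddbj"
DDBJ_XML = "http://xml.nig.ac.jp/rest/Invoke?service=DDBJ&method=getXMLEntry&accession="


def ddbj_urls(accession):
    """Return the URL of each representation of one DDBJ entry."""
    acc = quote(accession, safe="")  # guard against stray characters
    return {
        "xml": DDBJ_XML + acc,
        "rdf": "%s/data/%s" % (DDBJ_BASE, acc),
        "flatfile": "%s/%s" % (DDBJ_BASE, acc),
        "html": "%s/html/%s" % (DDBJ_BASE, acc),
    }


for name, url in ddbj_urls("Z48241").items():
    print(name, url)
```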

= KEGG things =
 * Draft KEGG RDF download site (temporary): http://www.hgc.jp/~shuichi/biohack2010/

 * Note: I wouldn't recommend opening the following files in a web browser because they are large text files.
 * http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2pdb.ttl (KEGG GENES2PDB / PDB2KEGG GENES turtle: 730,602 triples)
 * http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2kegg-ko.ttl (KEGG GENES2KO / KEGG KO2GENES turtle: 3,687,074 triples)
 * http://www.hgc.jp/~shuichi/biohack2010/kegg-ko2kegg-pathway.ttl (KEGG KO2PATHWAY / KEGG PATHWAY2KO turtle: 22,774 triples)
 * http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2kegg-ko.ttl (KEGG GENES2NCBI GENE-ID / NCBI GENE-ID2KEGG GENES turtle: 3,687,074 triples)
 * http://www.hgc.jp/~shuichi/biohack2010/kegg-ko2definition.ttl (KEGG KO2KO definition turtle: 13,211 triples)
 * Total 14,391,245 triples

= Reflect for PubMed =
To use Reflect on PubMed:
http://reflect.cbs.dtu.dk/TEST/GetEntities?uri=http://www.ncbi.nlm.nih.gov/pubmed/20146332&entity_types=9606

The result will contain XML like that shown at
[http://reflect.cbs.dtu.dk/restAPI.html http://reflect.cbs.dtu.dk/restAPI.html]
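
A request URL of this shape can be assembled for any PubMed ID. This is a sketch only: the function name reflect_url is hypothetical, and the parameter names uri and entity_types are simply copied from the example URL above. The nested PubMed address is left unescaped (safe=":/") so that the result matches the example.

```python
from urllib.parse import urlencode

# Endpoint copied from the example above.
REFLECT_ENDPOINT = "http://reflect.cbs.dtu.dk/TEST/GetEntities"


def reflect_url(pmid, entity_types="9606"):
    """Build a GetEntities URL for one PubMed abstract."""
    params = [
        ("uri", "http://www.ncbi.nlm.nih.gov/pubmed/%s" % pmid),
        ("entity_types", entity_types),
    ]
    return REFLECT_ENDPOINT + "?" + urlencode(params, safe=":/")


print(reflect_url("20146332"))
```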

= SPARQL endpoint =

Room 415 network
 * Bio2RDF KEGG - http://192.168.11.61:8890/sparql/
 * Bio2RDF PDB - http://192.168.11.61:8891/sparql/
 * DDBJ-KEGG-PDBj - http://192.168.11.61:8892/sparql/
 * PDBj -
 * KEGG -
 * DDBJ -
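
These endpoints can be queried over HTTP by passing the query text in the query parameter, as defined by the SPARQL protocol. The sketch below only builds the request and does not send it, since the addresses are reachable only on the Room 415 network. The function name sparql_request is hypothetical, and the graph name in the sample query is an assumption: it may not be the graph actually loaded on this endpoint.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format via content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Assumed graph name, used here for illustration only.
QUERY = "SELECT ?s ?p ?o WHERE { GRAPH <http://www.pdbj.org> { ?s ?p ?o } } LIMIT 10"

request = sparql_request("http://192.168.11.61:8892/sparql/", QUERY)
print(request.full_url)
```

To run the query from inside the network, pass the request to urllib.request.urlopen and decode the response body as JSON.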

Facet
 * Bio2RDF KEGG - http://192.168.11.61:8890/fct/
 * Bio2RDF PDB - http://192.168.11.61:8891/fct/
 * DDBJ-KEGG-PDBj - http://192.168.11.61:8892/fct/
 * PDBj -
 * KEGG -
 * DDBJ -

= Validating RDF/XML format =
 * http://librdf.org/parse

= How to load data into Virtuoso =
First, in the '''virtuoso.ini''' file, set the following parameter:
{{{
DirsAllowed = ., /usr/local/virtuoso-opensource/share/virtuoso/vad, /tmp
}}}
This allows Virtuoso to load data files from the /tmp directory.

Then put the data files in /tmp (e.g., all.ttl, ddbj.rdf).

{{{
% cat load.isql
-- Load a Turtle file into the graph http://www.pdbj.org
DB.DBA.TTLP_MT(file_to_string_output('/tmp/all.ttl'), '', 'http://www.pdbj.org');
checkpoint;

-- Load an RDF/XML file into the same graph
DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/tmp/ddbj.rdf'), '', 'http://www.pdbj.org');
checkpoint;

% isql 1111 dba dba < load.isql
}}}

Here, the third argument of the functions '''TTLP_MT''' and '''RDF_LOAD_RDFXML''' is the name of the graph
(in this case, '''http://www.pdbj.org''').
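
When there are many files, the load script can be generated instead of written by hand. The following sketch picks the loader function from the file extension, using only the two functions shown above. The names LOADERS and load_script are hypothetical, and paths containing a single quote are not handled.

```python
import os

# Loader function per file extension, as used in load.isql above.
LOADERS = {
    ".ttl": "DB.DBA.TTLP_MT",
    ".rdf": "DB.DBA.RDF_LOAD_RDFXML",
}


def load_script(paths, graph):
    """Return the text of an isql script that loads each file into graph."""
    lines = []
    for path in paths:
        extension = os.path.splitext(path)[1].lower()
        if extension not in LOADERS:
            raise ValueError("no loader known for %r" % path)
        lines.append("%s(file_to_string_output('%s'), '', '%s');"
                     % (LOADERS[extension], path, graph))
        lines.append("checkpoint;")
    return "\n".join(lines) + "\n"


print(load_script(["/tmp/all.ttl", "/tmp/ddbj.rdf"], "http://www.pdbj.org"), end="")
```

The output can be saved as load.isql and passed to isql in the same way as the hand-written script.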

= Results? =
[[wiki:DDBJ-KEGG-PDBj/Results]]

Developed the following on-the-fly DDBJ interfaces for RDF, Web API, and HTML pages:
 * URL which returns prototype RDF
   * http://sabi.ddbj.nig.ac.jp/ddbj/data/<ACCESSION>
     e.g. http://sabi.ddbj.nig.ac.jp/ddbj/data/Z48241
 * URL which returns the entry in flatfile format (URI?)
   * http://sabi.ddbj.nig.ac.jp/ddbj/<ACCESSION>
     e.g. http://sabi.ddbj.nig.ac.jp/ddbj/Z48241
 * URL which redirects to the HTML page
   * http://sabi.ddbj.nig.ac.jp/ddbj/html/<ACCESSION>
     e.g. http://sabi.ddbj.nig.ac.jp/ddbj/html/Z48241

Installed the following Virtuoso instance at the DDBJ site:
 * http://sabi.ddbj.nig.ac.jp:8080/sparql
| 139 | FAQ: How many triples ? |
| 140 | {{{ |
| 141 | mnmq:pdbj bh10$ wc -l *.ttl |
| 142 | 1018388 all.ttl |
| 143 | 25991 ddbj.ttl |
| 144 | 730602 kegg-genes2pdb.ttl |
| 145 | 18988 kegg-hsa2kegg-ko.ttl |
| 146 | 51438 kegg-hsa2ncbi-gene_id.ttl |
| 147 | 22774 kegg-ko2kegg-pathway.ttl |
| 148 | 15048785 kegg.ttl |
| 149 | 61208 pubmed.ttl |
| 150 | 831951 struct_title.ttl |
| 151 | 57943 taxonomy.ttl |
| 152 | 67286 uniprot.ttl |
| 153 | }}} |
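
Note that wc -l counts lines, which equals the number of triples only if each file holds exactly one triple per line and nothing else. A slightly more careful count, still assuming one triple per line, skips blank lines, comments, and prefix declarations. This is an approximation and not a Turtle parser; the function name is hypothetical.

```python
def count_statement_lines(lines):
    """Count lines that are not blank, comments, or @prefix/@base directives."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.startswith(("@prefix", "@base")):
            continue
        count += 1
    return count


# Small made-up sample: one directive, one blank line, one comment, one triple.
SAMPLE = """\
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# one triple per line
<http://service.pdbj.org/mine/xpath/1A00> rdfs:label "1A00" .
"""

print(count_statement_lines(SAMPLE.splitlines()))
```

For a real file, pass the open file object directly, for example count_statement_lines(open("all.ttl")).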

== PDBML2RDF ==
 * The XSL stylesheet for converting the PDBML Schema (pdbx-v32.xsd) to an OWL ontology is complete (pdbx2owl.xsl).
 * The XSL stylesheet that converts the PDBML Schema (pdbx-v32.xsd) into an XSL stylesheet for converting PDBML files to RDF files is complete (pdbx2pdbml2rdf.xsl).
 * This converter generator also generates internal cross-references within each PDB entry. However, the PDBML Schema contains a number of errors in its cross-reference definitions (xsd:key and xsd:keyref), so the resulting cross-references are significantly flawed.

Example of using the stylesheets:
{{{
# Create the OWL ontology
% xsltproc pdbx2owl.xsl pdbx-v32.xsd > pdbx-v32.owl

# Create the PDBML -> RDF converter
% xsltproc pdbx2pdbml2rdf.xsl pdbx-v32.xsd > PDBML2rdf.xsl

# Convert a PDBML file to RDF
% xsltproc PDBML2rdf.xsl 1a00-noatom.xml > 1a00-noatom.rdf
}}}