Version 43 (modified by akinjo, 15 years ago)
URL
- PDBj http://www.pdbj.org/PDBID (PDBID = 1gof, 1a00, ...; this is a fake URL)
- KEGG http://togows.dbcls.jp/entry/kegg-DBNAME/ENTRYID
- DDBJ http://togows.dbcls.jp/entry/ddbj/ENTRYID
- UniProt http://www.uniprot.org/uniprot/ENTRYID
- PubMed http://www.ncbi.nlm.nih.gov/pubmed/ENTRYID
- Taxonomy http://www.ncbi.nlm.nih.gov/Taxonomy/TAXID (this is an invalid URL)
- HapMap http://www.khapmap.org/ENTRYID (this is an invalid URL)
- gene_id http://www.ncbi.nlm.nih.gov/gene/ENTRYID (NCBI gene ID)
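A minimal Python sketch of filling in the entry-URL patterns above. The dictionary keys, the `entry_url` helper and the `dbname` argument are hypothetical names; the PDBj, Taxonomy and HapMap patterns are left out because the list marks them as fake or invalid.

```python
# Hypothetical helper: the templates are copied from the URL list above.
URL_TEMPLATES = {
    "kegg": "http://togows.dbcls.jp/entry/kegg-{dbname}/{entry_id}",
    "ddbj": "http://togows.dbcls.jp/entry/ddbj/{entry_id}",
    "uniprot": "http://www.uniprot.org/uniprot/{entry_id}",
    "pubmed": "http://www.ncbi.nlm.nih.gov/pubmed/{entry_id}",
    "gene_id": "http://www.ncbi.nlm.nih.gov/gene/{entry_id}",  # NCBI gene ID
}


def entry_url(source, entry_id, dbname=""):
    """Return the entry URL for `source`; KEGG URLs also need a DBNAME."""
    template = URL_TEMPLATES[source]  # KeyError for sources not listed above
    if "{dbname}" in template and not dbname:
        raise ValueError(f"{source} URLs need a dbname, e.g. 'genes'")
    # str.format ignores keyword arguments a template does not use.
    return template.format(dbname=dbname, entry_id=entry_id)
```

For example, `entry_url("pubmed", "20146332")` gives the PubMed URL used in the Reflect example further down.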
PDB things
- ftp://ftp.protein.osaka-u.ac.jp/pub/pdb/data/structures/all/XML-noatom/{PDBID}-noatom.xml.gz
- /PDBx:datablock/PDBx:entity_src_genCategory/PDBx:entity_src_gen/PDBx:pdbx_gene_src_ncbi_taxonomy_id (XPath of the NCBI taxonomy ID in that file)
where {PDBID} is a lowercase PDB ID such as "1a00".
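A minimal sketch, using only the Python standard library, of building the noatom file URL and reading the taxonomy IDs at the XPath listed above. It matches elements by local name so the PDBx namespace URI does not have to be hard-coded; the function names and the `urn:example:pdbx` namespace used for testing are hypothetical, and downloading the file is left out.

```python
import gzip
import xml.etree.ElementTree as ET

# Pattern from the text above; {pdbid} must be lowercase.
NOATOM_URL = ("ftp://ftp.protein.osaka-u.ac.jp/pub/pdb/data/structures/all/"
              "XML-noatom/{pdbid}-noatom.xml.gz")


def noatom_url(pdbid):
    return NOATOM_URL.format(pdbid=pdbid.lower())


def local_name(tag):
    # "{namespace-uri}name" -> "name", so the PDBx namespace URI is not hard-coded.
    return tag.rsplit("}", 1)[-1]


def taxonomy_ids(xml_bytes):
    """Values at datablock/entity_src_genCategory/entity_src_gen/
    pdbx_gene_src_ncbi_taxonomy_id, matched by local element name."""
    root = ET.fromstring(xml_bytes)
    if local_name(root.tag) != "datablock":
        raise ValueError("not a PDBML datablock")
    ids = []
    for category in root:
        if local_name(category.tag) != "entity_src_genCategory":
            continue
        for row in category:
            if local_name(row.tag) != "entity_src_gen":
                continue
            for item in row:
                if local_name(item.tag) == "pdbx_gene_src_ncbi_taxonomy_id":
                    ids.append((item.text or "").strip())
    return ids


def taxonomy_ids_from_gz(gz_bytes):
    # The FTP files are gzip-compressed; decompress before parsing.
    return taxonomy_ids(gzip.decompress(gz_bytes))
```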
Converting PDBML to RDF
tore.eriksson has made a tentative XSL stylesheet that converts selected elements of PDBMLplus into RDF. (However, when I checked the output RDF with the Raptor converter (rapper), it reported some errors.)
While on the Shinkansen from Tokyo to Osaka, I (akinjo) wrote an XSL stylesheet that converts an entire PDBML file into RDF (files attached). In doing so, I noticed one useful property of PDBML:
- PDBML is based on mmCIF (PDB's original format)
- mmCIF is actually defined as an ontology.
- So, we can use mmCIF categories and items as predicates.
- An XPath REST interface for PDBMLplus is available at PDBj, e.g., http://service.pdbj.org/mine/xpath/1a00/datablock/entityCategory/entity
- Thus, we can use XPaths as subjects and objects in RDF.
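The idea above (XPath URLs as subjects, mmCIF "category.item" names as predicates) can be sketched in Python as follows. This is not the attached PDBML2rdf.xsl: the predicate namespace `http://example.org/mmcif#` is hypothetical, and the positional `[n]` row index in the subject URL is an assumption about how rows would be addressed through the XPath interface.

```python
import xml.etree.ElementTree as ET

XPATH_BASE = "http://service.pdbj.org/mine/xpath"  # REST interface named above
MMCIF_NS = "http://example.org/mmcif#"             # hypothetical predicate namespace


def local_name(tag):
    return tag.rsplit("}", 1)[-1]


def literal(value):
    # Escape backslashes, quotes and line breaks as N-Triples requires.
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def pdbml_to_ntriples(pdbid, xml_text):
    """One triple per mmCIF item of every category row in a PDBML datablock.

    subject   = XPath URL of the row, e.g. .../1a00/datablock/entityCategory/entity[1]
    predicate = MMCIF_NS + "<category>.<item>", e.g. ...#entity.type
    object    = the item value as a plain literal
    """
    root = ET.fromstring(xml_text)
    triples = []
    for category in root:
        cat = local_name(category.tag)
        # XPath positions are 1-based; all children of a category are rows.
        for pos, row in enumerate(category, start=1):
            name = local_name(row.tag)
            subject = f"<{XPATH_BASE}/{pdbid}/datablock/{cat}/{name}[{pos}]>"
            # PDBML stores key items as attributes and the rest as child elements.
            items = [(local_name(k), v) for k, v in row.attrib.items()]
            items += [(local_name(c.tag), (c.text or "").strip()) for c in row]
            for item, value in items:
                triples.append(f'{subject} <{MMCIF_NS}{name}.{item}> "{literal(value)}" .')
    return triples
```

The output lines are N-Triples, so they can be checked with rapper before loading them into Virtuoso.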
DDBJ things
For example, the XML entry for AL121903: http://xml.nig.ac.jp/rest/Invoke?service=DDBJ&method=getXMLEntry&accession=AL121903
- URL that returns prototype RDF
- http://sabi.ddbj.nig.ac.jp/ddbj/data/<ACCESSION>
- URL that returns the entry in flatfile format
- http://sabi.ddbj.nig.ac.jp/ddbj/<ACCESSION>
- URL that redirects to the HTML page
- http://sabi.ddbj.nig.ac.jp/ddbj/html/<ACCESSION>
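A small Python sketch that collects the DDBJ URLs above for one accession; `ddbj_urls` and its dictionary keys are hypothetical names. The getXMLEntry query string is built with `urlencode` rather than by string concatenation.

```python
from urllib.parse import urlencode

SABI = "http://sabi.ddbj.nig.ac.jp/ddbj"


def ddbj_urls(accession):
    """Return the DDBJ URLs listed above for one accession number."""
    query = urlencode({"service": "DDBJ", "method": "getXMLEntry",
                       "accession": accession})
    return {
        "xml": "http://xml.nig.ac.jp/rest/Invoke?" + query,  # getXMLEntry web service
        "rdf": f"{SABI}/data/{accession}",                   # prototype RDF
        "flatfile": f"{SABI}/{accession}",                   # flatfile format
        "html": f"{SABI}/html/{accession}",                  # redirects to HTML page
    }
```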
KEGG things
- Draft KEGG RDF download site (temporary): http://www.hgc.jp/~shuichi/biohack2010/
- Note: I do not recommend opening the following files in a web browser, because they are large text files.
- http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2pdb.ttl (KEGG GENES2PDB / PDB2KEGG GENES turtle: 730,602 triples)
- http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2kegg-ko.ttl (KEGG GENES2KO / KEGG KO2GENES turtle: 3,687,074 triples)
- http://www.hgc.jp/~shuichi/biohack2010/kegg-ko2kegg-pathway.ttl (KEGG KO2PATHWAY / KEGG PATHWAY2KO turtle: 22,774 triples)
- http://www.hgc.jp/~shuichi/biohack2010/kegg-genes2kegg-ko.ttl (KEGG GENES2NCBI GENE-ID / NCBI GENE-ID2KEGG GENES turtle: 3,687,074 triples)
- http://www.hgc.jp/~shuichi/biohack2010/kegg-ko2definition.ttl (KEGG KO2KO definition turtle: 13,211 triples)
- Total 14,391,245 triples
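Because these files are large, it is easier to process them as a stream than to open them in a browser. A minimal Python sketch that counts triples line by line, assuming each triple sits on its own line ending in " ." (N-Triples style). That assumption does not hold for Turtle written with ";" or "," abbreviations or multi-line literals, where a real Turtle parser is needed.

```python
def count_simple_triples(lines):
    """Count statements in Turtle written one triple per line (assumption)."""
    count = 0
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments and @prefix/@base directives.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        if line.endswith("."):
            count += 1
    return count


def count_file(path):
    # Iterating over the file object streams it, so it is never held in memory.
    with open(path, encoding="utf-8") as fh:
        return count_simple_triples(fh)
```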
Reflect for PubMed
To run Reflect on a PubMed entry: http://reflect.cbs.dtu.dk/TEST/GetEntities?uri=http://www.ncbi.nlm.nih.gov/pubmed/20146332&entity_types=9606
The result is XML in the format shown at http://reflect.cbs.dtu.dk/restAPI.html
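A minimal Python sketch for building the same GetEntities request for any PubMed ID. `reflect_url` is a hypothetical helper, and `entity_types` simply defaults to the value used in the example above. Unlike the example URL, the `uri` parameter is percent-encoded here, which is the standard way to pass a URL inside a query string.

```python
from urllib.parse import urlencode

REFLECT_API = "http://reflect.cbs.dtu.dk/TEST/GetEntities"


def reflect_url(pubmed_id, entity_types="9606"):
    """Build a Reflect GetEntities request URL for one PubMed article."""
    article = f"http://www.ncbi.nlm.nih.gov/pubmed/{pubmed_id}"
    # urlencode percent-encodes ':' and '/' inside the uri value.
    return REFLECT_API + "?" + urlencode({"uri": article,
                                          "entity_types": entity_types})
```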
SPARQL endpoint
Room 415 network
- Bio2RDF KEGG - http://192.168.11.61:8890/sparql/
- Bio2RDF PDB - http://192.168.11.61:8891/sparql/
- DDBJ+KEGG-PDBj - http://192.168.11.61:8892/sparql/
- PDBj -
- KEGG -
- DDBJ -
Facet
- Bio2RDF KEGG - http://192.168.11.61:8890/fct/
- Bio2RDF PDB - http://192.168.11.61:8891/fct/
- DDBJ+KEGG-PDBj - http://192.168.11.61:8892/fct/
- PDBj -
- KEGG -
- DDBJ -
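A minimal Python sketch of querying one of the endpoints above through the standard SPARQL protocol (HTTP GET with a `query` parameter, JSON results requested via the Accept header). The helper names are hypothetical, and the endpoints are only reachable from the Room 415 network.

```python
import json
import urllib.request
from urllib.parse import urlencode

# DDBJ+KEGG-PDBj endpoint from the list above.
ENDPOINT = "http://192.168.11.61:8892/sparql/"


def build_sparql_request(query, endpoint=ENDPOINT):
    """SPARQL protocol query via HTTP GET, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def run_query(query, endpoint=ENDPOINT, timeout=30):
    with urllib.request.urlopen(build_sparql_request(query, endpoint),
                                timeout=timeout) as resp:
        data = json.load(resp)
    # Each binding maps a variable name to {"type": ..., "value": ...}.
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]
```

For example, `run_query("SELECT * WHERE { ?s ?p ?o } LIMIT 10")` returns a list of dictionaries keyed by `s`, `p` and `o`.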
Validating RDF/XML format
How to load data into Virtuoso
First, set the following parameter in the [Parameters] section of virtuoso.ini:
DirsAllowed = ., /usr/local/virtuoso-opensource/share/virtuoso/vad, /tmp
This allows data files in /tmp to be loaded.
Then put the data files in /tmp (e.g., all.ttl, ddbj.rdf).
% cat load.isql
DB.DBA.TTLP_MT(file_to_string_output('/tmp/all.ttl'), '', 'http://www.pdbj.org');
checkpoint;
DB.DBA.RDF_LOAD_RDFXML(file_to_string_output('/tmp/ddbj.rdf'), '', 'http://www.pdbj.org');
checkpoint;
% isql 1111 dba dba < load.isql
Here, the third argument to TTLP_MT and RDF_LOAD_RDFXML is the name of the target graph (in this case, http://www.pdbj.org); the second argument, left empty, is the base IRI.
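When there are many files, load.isql can be generated instead of written by hand. A minimal Python sketch, assuming the file extension decides the loader (.ttl for TTLP_MT, .rdf for RDF_LOAD_RDFXML) and that every file already sits in a directory listed in DirsAllowed. `load_script` is a hypothetical helper.

```python
from pathlib import PurePosixPath

LOADERS = {
    ".ttl": "DB.DBA.TTLP_MT",          # Turtle
    ".rdf": "DB.DBA.RDF_LOAD_RDFXML",  # RDF/XML
}


def load_script(paths, graph="http://www.pdbj.org"):
    """Return isql commands that load each file into `graph`, one checkpoint per file."""
    lines = []
    for path in paths:
        if "'" in path or "'" in graph:
            raise ValueError("single quotes would break the SQL string literal")
        loader = LOADERS[PurePosixPath(path).suffix.lower()]
        # Second argument '' = empty base IRI; third = target graph name.
        lines.append(f"{loader}(file_to_string_output('{path}'), '', '{graph}');")
        lines.append("checkpoint;")
    return "\n".join(lines) + "\n"
```

Write the result to load.isql and run it with `isql 1111 dba dba < load.isql` as above.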
Results?
Developed the following on-the-fly DDBJ interfaces (RDF, Web API and HTML page):
- URL that returns prototype RDF
- http://sabi.ddbj.nig.ac.jp/ddbj/data/<ACCESSION>
- URL that returns the entry in flatfile format (URI?)
- http://sabi.ddbj.nig.ac.jp/ddbj/<ACCESSION>
- URL that redirects to the HTML page
- http://sabi.ddbj.nig.ac.jp/ddbj/html/<ACCESSION>
Installed Virtuoso at the DDBJ site.
Attachments
- taxonomy.dat (0.6 MB) - added by akinjo 15 years ago. PDB2Tax_id mapping
- pubmed.ttl.gz (391.9 KB) - added by akinjo 15 years ago. PDB2PubMed turtle
- pdb2rdf.2.xsl (5.2 KB) - added by tore.eriksson 15 years ago. PDB2rdf stylesheet v 0.3 - working prototype
- pdb2rdf.xsl (5.6 KB) - added by tore.eriksson 15 years ago. PDB2rdf stylesheet v 0.4 - working prototype
- pdb2rdf.xml (3.4 KB) - added by tore.eriksson 15 years ago. PDB2rdf output example
- ddbj_rdf_sample.zip (8.3 KB) - added by yshigemo 15 years ago.
- ddbj_rdf_sample2.zip (201.6 KB) - added by yshigemo 15 years ago.
- 1a00-noatom.xml (0.6 MB) - added by akinjo 15 years ago. Example PDBML (noatom) file
- pdbx-v32.xsd (3.2 MB) - added by akinjo 15 years ago. PDBML XML Schema
- pdbx-v32.owl (5.3 MB) - added by akinjo 15 years ago. PDB OWL ontology translated from PDBML schema
- PDBML2rdf.xsl (477.0 KB) - added by akinjo 15 years ago. PDBML -> RDF converter (requires XSLT 2.0)
- 1a00-noatom.rdf (2.5 MB) - added by akinjo 15 years ago. An example of RDF generated by PDBML2rdf.xsl
- pdbx2owl.xsl (12.6 KB) - added by akinjo 15 years ago. PDBML schema -> OWL ontology converter (XSL stylesheet)