[[Image(Photo 1.jpg,200px)]]

= Transforming the data from the Sequence Read Archive (SRA) to RDF =

This is just a proof of concept. We chose to use the XML data from the Sequence Read Archive.

I first tried to use '''XSLT''' to transform the data, but it took too much time to analyse the '''XSD schemas''' for SRA and write the stylesheets, so I wrote a short Java program that loads the DOM and exports the RDF to stdout. I pasted the sources (sorry, quick'n stupid): https://gist.github.com/67bb728957abb16a680b

For example, '''SRA010050.run.xml''' looks like this:

{{{
quality_book_char @ quality_scoring_system log odds (...)
}}}

And here is the RDF version. Here I used some simple ''urn'' as the URIs (parsed successfully with the W3C validator) ...:

{{{
SRR029634 GSM424847_1 Illumina Genome Analyzer unspecified 1 SRX012521 root_control_1 DM1.fastq quality_book_char @ quality_scoring_system log odds (...)
}}}

= Using XSLT =

'''XSLT''' transformations are a useful way to transform any XML source to RDF. For example, have a look at these two posts (''warning: self-promotion!'') where a set of stylesheets was used to extract some RDF from different sources of XML data:

 * http://plindenbaum.blogspot.com/2010/02/linkedinxslt-foaf-people-from.html
 * http://plindenbaum.blogspot.com/2010/02/searching-for-genotypes-with-sparql.html

= Links =

 * SRA: http://www.ncbi.nlm.nih.gov/sra
 * The XSD files for SRA: http://www.ncbi.nlm.nih.gov/viewvc/v1/trunk/sra/doc/SRA
 * XML files for DRA000039: ftp://ftp.ncbi.nih.gov/sra/Submissions/DRA000/DRA000039/
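
= Sketch: DOM to RDF in Java =

To give an idea of the DOM-based approach described in the first section, here is a minimal sketch. It is not the program from the gist. It assumes that each run is a `RUN` element carrying an `accession` attribute, and it invents a hypothetical `urn:sra:` vocabulary in which every attribute of the run becomes one property. The class name `SraToRdfSketch` and the sample document are made up for illustration; the sample reuses two identifiers from the RDF excerpt, and their mapping to attribute names is a guess.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative sketch only; not the original program. */
class SraToRdfSketch {
    // Hypothetical vocabulary namespace (assumption: the page only says "simple urn").
    static final String NS = "urn:sra:";

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Loads the XML into a DOM and returns an RDF/XML string, one resource per RUN. */
    static String toRdf(String xml) {
        Document dom;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            dom = builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
        StringBuilder out = new StringBuilder();
        out.append("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"")
           .append(" xmlns:sra=\"").append(NS).append("\">\n");
        // Assumption: runs are <RUN> elements identified by an "accession" attribute.
        NodeList runs = dom.getElementsByTagName("RUN");
        for (int i = 0; i < runs.getLength(); i++) {
            Element run = (Element) runs.item(i);
            out.append("  <rdf:Description rdf:about=\"").append(NS).append("run:")
               .append(escape(run.getAttribute("accession"))).append("\">\n");
            NamedNodeMap atts = run.getAttributes();
            for (int j = 0; j < atts.getLength(); j++) {
                Node att = atts.item(j);
                String name = att.getNodeName();
                if (name.contains(":")) continue; // skip xmlns:* and other prefixed attributes
                out.append("    <sra:").append(name).append('>')
                   .append(escape(att.getNodeValue()))
                   .append("</sra:").append(name).append(">\n");
            }
            out.append("  </rdf:Description>\n");
        }
        return out.append("</rdf:RDF>\n").toString();
    }

    /** Prints the RDF for the file given as first argument, or for a tiny made-up sample. */
    public static void main(String[] args) throws Exception {
        String xml = args.length > 0
            ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
            : "<RUN_SET><RUN accession=\"SRR029634\" alias=\"GSM424847_1\"/></RUN_SET>";
        System.out.print(toRdf(xml));
    }
}
```

Walking the DOM by hand avoids having to study the whole XSD before getting any output, which was the problem with the XSLT attempt. The price is that the mapping is hard-coded: child elements of a run are ignored here and would each need their own handling.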