The State of RDF support in Perl

Experimenting

tkappler created a git repository at http://github.com/thomas11/perl-rdf-experiments to experiment with RDF in Perl and see what the existing modules can do. Biohackathoners, please fork and contribute!

Existing modules

Also see http://www.perlrdf.org

RDF::Query

A generic query interface that works with several RDF modules. Powerful but complex. I am working with the author, Greg Williams of RDF::Trine, on some docs and tutorials.

It can talk to a SPARQL endpoint via the SPARQL protocol over HTTP.

RDF::Trine

RDF::Trine is a complete RDF package written in Perl. It is the only one that has parsers for RDF serializations other than RDF/XML (including JSON) and that has a SPARQL wrapper. It implements the SPARQL protocol and can thus talk to any SPARQL endpoint.

Last release: 0.117, 2010-02-04.

tkappler is in touch with the author, Gregory Williams, and it's a pleasure to work with him. Some patches have been contributed. Here is some of his advice, which I intend to include in more in-depth, step-by-step writeups.

In general you should be using RDF::Query for retrieving patterns that are more complex than a single triple pattern. get_pattern exists mostly for RDF::Query to use when the underlying store is expected to be able to execute a complex join query more efficiently than the perl implementation (for example, the DBI-based storage backend). It's never been a part of the code that has felt very stable, so I'd suggest always using the RDF::Query interface for situations where get_statements doesn't do as much as you need.
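
The division of labour described in this advice can be sketched as follows. This is only a minimal illustration, assuming reasonably recent versions of RDF::Trine and RDF::Query are installed; the example.org data and the FOAF properties are made up for the sketch and are not taken from the hackathon repository.

```perl
use strict;
use warnings;

use RDF::Trine;
use RDF::Query;

# Hypothetical sample data, invented for this sketch.
my $turtle = <<'END_TURTLE';
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/alice> foaf:name "Alice" ;
    foaf:knows <http://example.org/bob> .
<http://example.org/bob> foaf:name "Bob" .
END_TURTLE

# Load the data into an in-memory model.
my $model  = RDF::Trine::Model->temporary_model;
my $parser = RDF::Trine::Parser->new('turtle');
$parser->parse_into_model('http://example.org/', $turtle, $model);

# Single triple pattern: get_statements is enough.
# undef acts as a wildcard for the subject and the object.
my $foaf_name  = RDF::Trine::Node::Resource->new('http://xmlns.com/foaf/0.1/name');
my $statements = $model->get_statements(undef, $foaf_name, undef);
my @names;
while (my $st = $statements->next) {
    push @names, $st->object->literal_value;
}

# Two joined triple patterns: leave the join to RDF::Query.
my $sparql = <<'END_SPARQL';
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
    <http://example.org/alice> foaf:knows ?friend .
    ?friend foaf:name ?name .
}
END_SPARQL

my $query = RDF::Query->new($sparql)
    or die "Could not parse the SPARQL query";
my $results = $query->execute($model);
my @friend_names;
while (my $row = $results->next) {
    push @friend_names, $row->{name}->literal_value;
}

print "All names: @{[ sort @names ]}\n";
print "Names of Alice's friends: @friend_names\n";
```

The first lookup needs only one pattern, so it stays on the model; the second needs a join over two patterns, which is the case where the advice above recommends RDF::Query rather than get_pattern.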

yy's serialization notes: The output (RDF/XML) has some redundancies. The module groups statements that share a subject and wraps them in "<rdf:Description>" tags. That is OK, but user-defined namespaces are declared on each rdf:Description tag rather than once on the opening "<rdf:RDF>" tag. This will probably be configurable from the next release, though. Also, the nesting level is one, and you cannot choose the namespace prefix used in QNames (such as "foaf" or "rdfs"); the module assigns prefixes arbitrarily. Even so, the output is more compact than that of RDF::Redland.
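
For comparison, here is a hedged sketch of how prefixes can be chosen in RDF::Trine releases newer than the one described on this page. To my knowledge, the RDF/XML serializer there accepts a namespaces option; whether it is available depends on the installed version, and the example.org data is invented for the sketch.

```perl
use strict;
use warnings;

use RDF::Trine;
use RDF::Trine::Serializer::RDFXML;

# Hypothetical one-triple model, invented for this sketch.
my $model = RDF::Trine::Model->temporary_model;
RDF::Trine::Parser->new('turtle')->parse_into_model(
    'http://example.org/',
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    $model,
);

# Assumption: the installed RDF::Trine supports the "namespaces" option,
# which maps the prefixes we want to their namespace URIs.
my $serializer = RDF::Trine::Serializer::RDFXML->new(
    namespaces => { foaf => 'http://xmlns.com/foaf/0.1/' },
);

my $xml = $serializer->serialize_model_to_string($model);
print $xml;
```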

RDF::TrineShortcuts

A toolkit for RDF::Trine that implements common tasks. Makes a lot of my hackathon work redundant, but I discovered it only afterward...

It comes with two command line tools: trapper to read and write RDF and count triples (-c), and toquet to send a SPARQL query to an endpoint over HTTP.

RDF::Redland

RDF::Redland is a wrapper for the Redland C library (http://librdf.org/). Looks pretty complete.

I couldn't build it because I have the new version 0.9.17 of librasqal, the query library for Redland, which is API-incompatible with its predecessor, on which RDF::Redland apparently depends.

Another build hint: the test script assumes that the libraries generated by SWIG are on the shared library path. After yy added the following paths to LD_LIBRARY_PATH, no further problems occurred:

Redland-1.0.5.4/redland/rasqal/src/.libs
Redland-1.0.5.4/redland/raptor/src/.libs

Test::RDF

Test::RDF supports checking data for validity and comparing two graphs for equivalence. It does not explain the differences when they are not equal, however. Builds on RDF::Redland.

RDF::Core

Another pure-Perl RDF framework, available on CPAN. Last release: 0.51, 2007-02-19, which probably means that it is not much used or supported. Use RDF::Trine unless you have a good reason not to.

Notes:

  • A pretty complete RDF package written in Perl, including parser and serializer, model with its own query language, and storage with either Berkeley DB, in-memory, or PostgreSQL as backend.
  • Unfortunately it has its own query language; it should be SPARQL.
  • RDF/XML serialization and parsing only.
  • Has a Schema module to work with RDFS, which I didn't try.
  • Greg Williams of RDF::Trine, the other, more complete and up-to-date RDF framework, tried to contribute to RDF::Core but was "met with resistance, rejection, or frustratingly long delays". Not good.

yy's serialization notes: RDF::Core is a bit old and cannot handle blank nodes (rdf:nodeID), but its serialized output (RDF/XML) is better organized than that of the other modules. You cannot choose the namespace prefixes used in QNames.

RDF::Simple

 RDF::Simple (0.415) - very basic.

yy's serialization notes: RDF::Simple is literally "simple", so IMO it is useful only for understanding how serialization works, not for practical use. As the module page says, it does not care about node types: you cannot declare whether a node is a URI or a literal. On the other hand, its serialized output (RDF/XML) is better organized (more nested and smaller) than that of RDF::Trine and RDF::Redland.

Onto-Perl

ONTO-PERL by fellow Biohackathoner Erick Antezana can translate between OBO, OBO-in-OWL, and RDF (among other things).

Wrappers

Future Work

A lot. Add if you think of something.

UniProt has a very simple RDF reader that only works with UniProt RDF. It is still useful because it is very easy to use; tkappler is working on packaging it up.