Version 15 (modified by gordonp, 14 years ago)




(Please add your name!)

  • Kazuharu Arakawa
  • Mark Wilkinson
  • Francois Belleau
  • Arek Kasprzyk
  • Paul Gordon
  • Yasunori Yamamoto
  • Akira R. KINJO
  • Gos Micklem
  • Shuichi Kawashima
  • Erick Antezana (dropping by ...)


Many use cases for data and service integration are already available on websites such as myExperiment and FlyMine. Using Semantic Web technology, any of these existing questions and use cases could be made more efficient.

So one use case/demonstration would be to compare how data can be published with existing technology versus Semantic Web (SW) technology.

Cases where the Semantic Web is more fruitful

  • questions that query across separate DBs
    • using in-house data -> NGS
    • can be in different sources - less cost of integration
    • small DBs (as opposed to NCBI, EBI, and KEGG)
  • querying the predicates
    • discovery of DBs
    • hypothesis generation
  • interpreted knowledge
    • not numerical?
    • not too raw?
    • linked by predicates (even if numerical)
  • data are linked
  • users do not have to know the entire schema
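
As a rough illustration of the points above (data in separate sources, querying the predicates, not needing the entire schema), here is a minimal Python sketch. It is not a real RDF store or SPARQL engine: triples are plain tuples, and every identifier (`gene:A`, `hasVariant`, the two source lists, the `match` helper) is a made-up placeholder for illustration only.

```python
# Toy triples from two hypothetical sources. All identifiers are
# made-up placeholders, not real database content.
LAB_TRIPLES = [  # e.g. in-house data
    ("gene:A", "hasVariant", "variant:1"),
    ("gene:B", "hasVariant", "variant:2"),
]
PUBLIC_TRIPLES = [  # e.g. a small public DB
    ("gene:A", "hasHomolog", "gene:X"),
    ("gene:X", "hasProteinDomain", "domain:D1"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Integration here is just concatenation: no shared relational schema
# had to be designed before the two sources could be queried together.
merged = LAB_TRIPLES + PUBLIC_TRIPLES

# "Querying the predicates": discover what is known about gene:A
# without knowing in advance which relations either source provides.
predicates_for_a = sorted({p for _, p, _ in match(merged, s="gene:A")})
print(predicates_for_a)
```

The wildcard pattern plays the role a variable would play in a SPARQL triple pattern; a real setup would send such patterns to one or more SPARQL endpoints instead of filtering a Python list.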

possible use cases?

  • migrate SQL to simple SPARQL endpoints
    • easy to convert 20 lines?
    • convert PDB relational to SPARQL (kinjo-san, Francois)
    • miRBase (raoul)
  • formulation of SPARQL queries -> RDFscape
    • can be a GUI.
  • Paul (biological hypothesis formulation)
    • predict PPI using homologs (PPI in yeast -> homolog in human) see
    • workflow overview: could use homolog, protein domain and protein-protein interaction resources (not all currently in one data warehouse so Sem Web could help)

  • The query in Prolog would be something like:

probableNovelInteractors(HumanGene1, HumanGene2) :-
    hasHomolog(HumanGene1, YeastGene1),
    hasInteractorFromDB(YeastGene1, YeastGene2),
    hasHomolog(YeastGene2, HumanGene2),
    hasHomolog(HumanGene2, OtherModelOrganismGene2),
    hasHomolog(YeastGene2, OtherModelOrganismGene2),
    hasProteinDomain(HumanGene1, TargetDomain),
    hasProteinDomain(OtherModelOrganismGene1, TargetDomain),
    hasInteractorFromDB(OtherModelOrganismGene1, OtherModelOrganismGene2),
    \+ hasInteractorFromDB(HumanGene1, HumanGene2).

?- probableNovelInteractors(ing1, CandidateInteractor).

CandidateInteractor = p38MAPK ? ;
CandidateInteractor = MEKK4 ? ;

  • Note the negation at the end of the rule: we only want to list those HumanGene2 possibilities that are not already known from the database.
  • Now we just need the predicates to exist in various SPARQL endpoints to give us the RDF facts with predicates corresponding to hasHomolog, hasInteractorFromDB, and hasProteinDomain. Then we will use SHARE/SADI to create queries in SPARQL that span multiple services. :-)
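
To make the logic of the rule concrete, here is a minimal Python sketch that evaluates the same conditions over a handful of in-memory facts. It is only a sketch under stated assumptions: the facts are made-up placeholders (`humanA`, `yeastA`, `otherA`, ... are not real genes), the three relations are stored as sets of directed pairs, and the helper names are hypothetical. In the intended setup these facts would come from SPARQL endpoints rather than Python sets.

```python
# Made-up placeholder facts, NOT real biological data.
# Each relation is a set of directed (subject, object) pairs.
HAS_HOMOLOG = {
    ("humanA", "yeastA"),
    ("yeastB", "humanB"), ("humanB", "otherB"), ("yeastB", "otherB"),
    ("yeastC", "humanC"), ("humanC", "otherC"), ("yeastC", "otherC"),
}
HAS_INTERACTOR_FROM_DB = {
    ("yeastA", "yeastB"), ("yeastA", "yeastC"),
    ("otherA", "otherB"), ("otherA", "otherC"),
    ("humanA", "humanC"),  # already known, so it must be filtered out
}
HAS_PROTEIN_DOMAIN = {("humanA", "domain1"), ("otherA", "domain1")}


def objects(relation, subject):
    """All o such that (subject, o) is in the relation."""
    return {o for s, o in relation if s == subject}


def subjects(relation, obj):
    """All s such that (s, obj) is in the relation."""
    return {s for s, o in relation if o == obj}


def probable_novel_interactors(human1):
    """Follow the goals of the Prolog rule in order for one human gene."""
    found = set()
    domains = objects(HAS_PROTEIN_DOMAIN, human1)
    for yeast1 in objects(HAS_HOMOLOG, human1):
        for yeast2 in objects(HAS_INTERACTOR_FROM_DB, yeast1):
            for human2 in objects(HAS_HOMOLOG, yeast2):
                # Genes in another organism homologous to both
                # human2 and yeast2.
                other2s = (objects(HAS_HOMOLOG, human2)
                           & objects(HAS_HOMOLOG, yeast2))
                # Some gene sharing a domain with human1 must interact
                # with one of those genes.
                supported = any(
                    other2 in objects(HAS_INTERACTOR_FROM_DB, other1)
                    for domain in domains
                    for other1 in subjects(HAS_PROTEIN_DOMAIN, domain)
                    for other2 in other2s
                )
                # Negation: drop interactions the database already has.
                known = human2 in objects(HAS_INTERACTOR_FROM_DB, human1)
                if supported and not known:
                    found.add(human2)
    return found


print(sorted(probable_novel_interactors("humanA")))
```

With these toy facts, both `humanB` and `humanC` satisfy the positive conditions, but `humanC` is dropped by the final negation because the pair (`humanA`, `humanC`) is already in the interaction set.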