Version 33 (modified by cmzmasek, 14 years ago)

inferred domain combinations



(Please add your name!)

  • Kazuharu Arakawa
  • Mark Wilkinson
  • Francois Belleau
  • Arek Kasprzyk
  • Paul Gordon
  • Yasunori Yamamoto
  • Akira R. KINJO
  • Gos Micklem
  • Shuichi Kawashima
  • Erick Antezana (dropping by ...)
  • Raoul Bonnal
  • Christian Zmasek


Many use cases for data and service integration are available on websites such as myExperiment and FlyMine. With Semantic Web technology, any of these existing questions and use cases could be handled more efficiently.

So one use case/demonstration would be to compare how data can be published with existing technology and with Semantic Web (SW) technology.

Cases where the Semantic Web is more fruitful

  • questions that query across separate DBs
    • using in-house data -> NGS
    • data can stay in different sources - lower cost of integration
    • small DBs (as opposed to NCBI, EBI, and KEGG)
  • querying the predicates
  • interpreted knowledge
    • not numerical?
    • not too raw?
    • linked by predicates (even if numerical)
  • data are linked
  • users do not have to know the entire schema
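
To make the points above concrete, here is a minimal Python sketch (standard library only) of why linked data lowers integration cost: triples from two separate sources are merged by plain set union, and a query can ask which predicates exist without knowing either schema. All sources, predicates, and identifiers are made up for illustration; a real setup would use an RDF store and SPARQL rather than this toy matcher.

```python
# Toy illustration only: the two "sources", their predicates, and all
# identifiers below are hypothetical, not taken from any real database.

# Each source publishes (subject, predicate, object) triples.
source_a = {
    ("protein_1", "hasDomain", "domain_x"),
    ("protein_2", "hasDomain", "domain_y"),
}
source_b = {
    ("protein_1", "foundIn", "species_a"),
    ("protein_2", "foundIn", "species_b"),
}

# Integration is a set union: shared identifiers link the data.
graph = source_a | source_b


def match(graph, s=None, p=None, o=None):
    """Return sorted triples matching the pattern; None is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def predicates_of(graph, subject):
    """List the predicates used for a subject (querying the predicates)."""
    return sorted({p for _, p, _ in match(graph, s=subject)})


def species_with_domain(graph, domain):
    """Join across both sources: domain -> protein -> species."""
    proteins = {s for s, _, _ in match(graph, p="hasDomain", o=domain)}
    return sorted(
        o for s, _, o in match(graph, p="foundIn") if s in proteins
    )


if __name__ == "__main__":
    print(predicates_of(graph, "protein_1"))
    print(species_with_domain(graph, "domain_x"))
```

Note that `species_with_domain` never needs a table layout from either source; it only follows the two predicates it cares about, which is the sense in which users do not have to know the entire schema.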

possible use cases?

  • migrate SQL to simple SPARQL endpoints
    • easy to convert 20 lines?
    • convert PDB relational to SPARQL (Kinjo-san, Francois)
    • miRBase (Raoul)
  • formulation of SPARQL queries -> RDFscape
    • could be a GUI.
  • Taxonomy/Systematics (cmzmasek)
    • Biodiversity informatics
    • Metagenomics
      • microbiome projects (e.g. human gut and skin microbiomes)
      • global ocean sampling expedition (GOS)
  • Domain combination database and queries (cmzmasek)
    • replacing/enhancing systems like CADO
    • answering questions like (by example):
      • which species contain NACHT-TIR combinations?
      • what domains does NACHT combine with?
      • is domain/domain-combination expansion present? If so, is it species-specific, lineage-specific, or universal?
      • what are the likely (inferred!) ancestral domain combinations, e.g. of the last eukaryotic common ancestor (LECA)?
      • etc.
  • Paul (biological hypothesis formulation)
    • predict PPI using homologs (PPI in yeast -> homolog in human) see
    • workflow overview: could use homolog, protein domain, and protein-protein interaction resources (not all are currently in one data warehouse, so the Semantic Web could help). In brackets are the values for a particular query, Inhibitor of Growth 1 (ING1).

  • The query in Prolog would be something like the following (simplified so that it depends only on known protein domains, whereas in the publication we built our own domain models):

probableNovelInteractors(HumanGene1, HumanGene2) :-
    hasHomolog(HumanGene1, YeastGene1),
    hasInteractorFromDB(YeastGene1, YeastGene2),
    hasHomolog(YeastGene2, HumanGene2),
    hasHomolog(HumanGene2, OtherModelOrganismGene2),
    hasHomolog(YeastGene2, OtherModelOrganismGene2),
    hasProteinDomain(HumanGene1, TargetDomain),
    hasProteinDomain(OtherModelOrganismGene1, TargetDomain),
    hasInteractorFromDB(OtherModelOrganismGene1, OtherModelOrganismGene2),
    \+ hasInteractorFromDB(HumanGene1, HumanGene2).

probableNovelInteractors(ing1, CandidateInteractor).

CandidateInteractor = p38MAPK ? ;
CandidateInteractor = MEKK4 ? ;

  • Note the negation at the end of the rule: we only want to list those HumanGene2 candidates that are not already known from the database.
  • Now we just need predicates corresponding to hasHomolog, hasInteractorFromDB, and hasProteinDomain to exist in various SPARQL endpoints so that they give us the RDF facts. Then we will use SHARE/SADI to create SPARQL queries that span multiple services. :-)
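
For readers less familiar with Prolog, the rule above can be sketched in plain Python as a chain of joins over three sets of facts. This is only an illustration under stated assumptions: the gene and domain names (`human_1`, `yeast_1`, `dom_t`, and so on) are invented placeholders, the facts are directed pairs in the same argument order as the Prolog predicates, and in the Semantic Web setting the three fact sets would come from separate SPARQL endpoints rather than from in-memory sets.

```python
# Hypothetical toy facts; none of these are real biological data.
HOMOLOG = {  # hasHomolog(A, B)
    ("human_1", "yeast_1"),
    ("yeast_2", "human_2"), ("human_2", "other_2"), ("yeast_2", "other_2"),
    ("yeast_3", "human_3"), ("human_3", "other_3"), ("yeast_3", "other_3"),
}
INTERACTS = {  # hasInteractorFromDB(A, B)
    ("yeast_1", "yeast_2"), ("yeast_1", "yeast_3"),
    ("other_1", "other_2"), ("other_1", "other_3"),
    ("human_1", "human_3"),  # already known, so it must be filtered out
}
DOMAIN = {  # hasProteinDomain(Gene, Domain)
    ("human_1", "dom_t"), ("other_1", "dom_t"),
}


def objects(relation, subject):
    """All B such that (subject, B) is in the relation."""
    return {o for s, o in relation if s == subject}


def subjects(relation, obj):
    """All A such that (A, obj) is in the relation."""
    return {s for s, o in relation if o == obj}


def probable_novel_interactors(h1, homolog, interacts, domain):
    """Mirror the Prolog rule body, goal by goal, for a fixed HumanGene1."""
    found = set()
    h1_domains = objects(domain, h1)
    for y1 in objects(homolog, h1):
        for y2 in objects(interacts, y1):
            for h2 in objects(homolog, y2):
                # Genes that are homologs of both h2 and y2.
                for o2 in objects(homolog, h2) & objects(homolog, y2):
                    # Interactors of o2 that share a domain with h1.
                    for o1 in subjects(interacts, o2):
                        shares_domain = h1_domains & objects(domain, o1)
                        known = (h1, h2) in interacts  # the \+ goal
                        if shares_domain and not known:
                            found.add(h2)
    return sorted(found)


if __name__ == "__main__":
    print(probable_novel_interactors("human_1", HOMOLOG, INTERACTS, DOMAIN))
```

With these toy facts, `human_3` satisfies every positive goal but is dropped by the final negation because the `human_1`/`human_3` interaction is already in the database, which is exactly the filtering the note above describes.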