Literature mining for the discovery of hidden connections between drugs, genes and diseases

Raoul Frijters, Marianne van Vugt, Ruben Smeets, René van Schaik, Jacob de Vlieg, Wynand Alkema

Research output: Contribution to journal › Article › Academic › peer-review


The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs. © 2010 Frijters et al.
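The ABC-principle mentioned in the abstract can be illustrated with a minimal sketch. This is not CoPub Discovery's implementation or scoring: the function names, the toy concept sets and the ranking by number of shared B-intermediates are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(documents):
    """Map each concept to the set of concepts it shares at least one document with."""
    neighbours = defaultdict(set)
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def hidden_connections(neighbours, a):
    """Return {C: set of B-intermediates} for concepts C never co-mentioned with A."""
    direct = neighbours.get(a, set())
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbours.get(b, set()):
            # Keep C only if it is not A itself and has no direct link to A.
            if c != a and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


# Hypothetical, made-up "abstracts", each reduced to the concepts it mentions.
documents = [
    {"drug_X", "gene_1"},
    {"drug_X", "gene_2"},
    {"gene_1", "disease_Y"},
    {"gene_2", "disease_Y"},
    {"gene_2", "disease_Z"},
]

neighbours = build_cooccurrence(documents)
links = hidden_connections(neighbours, "drug_X")

# Naive ranking (an assumption): more shared intermediates first, then by name.
ranked = sorted(links.items(), key=lambda kv: (-len(kv[1]), kv[0]))
for concept, intermediates in ranked:
    print(concept, sorted(intermediates))
```

In this toy corpus, drug_X is never mentioned together with either disease, yet both are reachable through shared genes. A real system would presumably replace the raw count with a statistical co-occurrence score and map text to concepts through keyword thesauri, as the abstract indicates.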
Original language: English
Article number: 9
Journal: PLoS Computational Biology
Issue number: 9
Publication status: Published - 23 Sept 2010


  • computational biology/methods
  • data mining/methods
  • diseases
  • genes
  • humans
  • leukocytes, mononuclear/physiology
  • metabolic networks and pathways/genetics
  • pharmaceutical preparations
  • signal transduction
  • software


