Bioinformatics -
Publication Date: 2008 Dec 2
PMID: 19050035
Authors: Wabnik, K. - Hvidsten, T. R. - Kedzierska, A. - Van Leene, J. - De Jaeger, G. - Beemster, G. T. - Komorowski, J. - Kuiper, M. T.
Journal: Bioinformatics

MOTIVATION: Genome-scale 'omics' data constitutes a potentially rich source of
information about biological systems and their function. There is a plethora of tools and methods
available to mine omics data. However, the diversity and complexity of different omics data types
are a stumbling block for multi-data integration; hence there is a dire need for additional methods
to exploit potential synergy from integrated orthogonal data. Rough Sets provide an efficient means
to use complex information in classification approaches. Here, we set out to explore the
possibilities of Rough Sets to incorporate diverse information sources in a functional
classification of unknown genes.

RESULTS: We explored the use of Rough Sets for a novel data
integration strategy where gene expression data, protein features, and GO annotations were combined
to describe general and biologically relevant patterns represented by If-Then rules. The
descriptive rules were used to predict the function of unknown genes in Arabidopsis thaliana and
Schizosaccharomyces pombe. The If-Then rule models showed success rates of up to 0.89
(discriminative and predictive power for both modeled organisms) whereas models built solely of one
data type (protein features or gene expression data) yielded success rates varying from 0.68 to
0.78. Our models were applied to generate classifications for many unknown genes, of which a
sizeable number were confirmed either by PubMed literature reports or electronically inferred
annotations. Finally, we studied cell cycle protein-protein interactions derived from both tandem
affinity purification (TAP) experiments and in silico experiments in the BioGRID interactome
database and found strong experimental evidence for the predictions generated by our models. The
results show that our approach can be used to build very robust models that create synergy from
integrating gene expression data and protein features.

AVAILABILITY: The Rough Set-based method is
implemented in the Rosetta toolkit kernel version 1.0.1 available at: http://rosetta.lcb.uu.se/

CONTACT: kuiper@nt.ntnu.no; krwab@psb.ugent.be

SUPPLEMENTARY INFORMATION: Supplementary data are
available at Bioinformatics online.
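
As a rough illustration of how Rough Sets can turn combined data types into If-Then rules, the sketch below builds a tiny decision table and derives lower and upper approximations plus simple rules from it. It is not the Rosetta implementation or the authors' pipeline. The gene names, the attributes "expression" and "domain", and all values are hypothetical toy data, and the rule accuracy shown is a plain fraction of matching genes, assumed here only for demonstration.

```python
from collections import defaultdict

# Hypothetical toy decision table (not data from the paper): each gene has
# discretized condition attributes and a known functional class.
GENES = {
    "g1": {"expression": "up", "domain": "kinase", "function": "cell cycle"},
    "g2": {"expression": "up", "domain": "kinase", "function": "cell cycle"},
    "g3": {"expression": "up", "domain": "none", "function": "cell cycle"},
    "g4": {"expression": "up", "domain": "none", "function": "transport"},
    "g5": {"expression": "down", "domain": "none", "function": "transport"},
}
CONDITIONS = ("expression", "domain")


def indiscernibility_classes(table, attributes):
    """Group objects that share identical values on the given attributes."""
    classes = defaultdict(set)
    for name, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(name)
    return dict(classes)


def approximations(table, attributes, decision, target):
    """Return (lower, upper) approximations of the objects labeled `target`."""
    concept = {n for n, r in table.items() if r[decision] == target}
    lower, upper = set(), set()
    for members in indiscernibility_classes(table, attributes).values():
        if members <= concept:   # class lies entirely inside the concept
            lower |= members
        if members & concept:    # class overlaps the concept
            upper |= members
    return lower, upper


def if_then_rules(table, attributes, decision):
    """Emit (conditions, outcome, accuracy) for each pattern/outcome pair."""
    rules = []
    for key, members in indiscernibility_classes(table, attributes).items():
        counts = defaultdict(int)
        for m in members:
            counts[table[m][decision]] += 1
        for outcome, n in counts.items():
            rules.append((dict(zip(attributes, key)), outcome, n / len(members)))
    return rules


lower, upper = approximations(GENES, CONDITIONS, "function", "cell cycle")
rules = if_then_rules(GENES, CONDITIONS, "function")

for conditions, outcome, accuracy in rules:
    clause = " AND ".join(f"{k} = {v}" for k, v in conditions.items())
    print(f"IF {clause} THEN function = {outcome} (accuracy {accuracy:.2f})")
```

In this toy table, genes g3 and g4 are indiscernible on the chosen attributes yet carry different labels, so they fall in the boundary region between the lower and upper approximations. Adding a further data type as an extra condition attribute is one way such a boundary could shrink, which is the intuition behind combining expression data with protein features.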