Network-Extracted Ontologies Organize Knowledge from Genomic Data
By BiotechDaily International staff writers
Posted on 24 Dec 2012
Converting huge amounts of genomic information into meaningful data about cellular processes is one of the biggest challenges in bioinformatics, with major implications for human biology and medicine. Scientists have now devised a technology that builds a computational model of the cell from vast networks of gene and protein interactions, learning how genes and proteins connect to form higher-level cellular processes.
The study’s findings were published in the December 16, 2012, advance online edition of the journal Nature Biotechnology. “Our method creates ontology, or a specification of all the major players in the cell and the relationships between them,” said first author Janusz Dutkowski, PhD, a postdoctoral researcher at the University of California (UC), San Diego School of Medicine (USA). The method draws on knowledge of how genes and proteins interact with each other and automatically organizes these data into a comprehensive catalog of gene functions, cellular components, and mechanisms.
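As a rough illustration only, and not the authors' published algorithm, the general idea of turning interaction data into a hierarchy of terms can be sketched with simple agglomerative clustering: the most strongly related genes are merged first, and each merge becomes a candidate parent "term" containing its children. The gene names, similarity scores, and the choice of average linkage below are all invented assumptions for the sake of the example.

```python
from itertools import combinations

# Hypothetical pairwise similarity scores (0 to 1) between genes, standing in
# for scores derived from interaction data. Names and values are invented.
SIMILARITY = {
    ("geneA", "geneB"): 0.9,
    ("geneA", "geneC"): 0.8,
    ("geneB", "geneC"): 0.85,
    ("geneD", "geneE"): 0.7,
    ("geneC", "geneD"): 0.2,
}


def sim(a, b):
    """Symmetric lookup; unlisted gene pairs are treated as unrelated (0.0)."""
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def average_linkage(c1, c2):
    """Mean similarity over all gene pairs drawn from two clusters."""
    scores = [sim(a, b) for a in c1 for b in c2]
    return sum(scores) / len(scores)


def build_hierarchy(genes):
    """Greedy agglomerative clustering; each merge is recorded as a parent term.

    Returns a list of (parent, left_child, right_child, linkage_score) tuples,
    ordered from the tightest (most specific) merge to the root term.
    """
    clusters = [frozenset([g]) for g in genes]
    terms = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the highest average similarity.
        c1, c2 = max(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        parent = c1 | c2
        terms.append((parent, c1, c2, average_linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [parent]
    return terms


if __name__ == "__main__":
    for parent, left, right, score in build_hierarchy(
        ["geneA", "geneB", "geneC", "geneD", "geneE"]
    ):
        print(sorted(parent), round(score, 3))
```

In this toy data, geneA and geneB merge first, geneC then joins them, geneD and geneE form a separate group, and the final merge produces a root term covering every gene. Real network-extracted ontologies integrate many datasets and use more sophisticated methods; the sketch only shows how nested clusters can play the role of ontology terms.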
“What’s new about our ontology is that it is created automatically from large datasets. In this way, we see not only what is already known, but also potentially new biological components and processes: the bases for new hypotheses,” said Dr. Dutkowski.
Originally developed by philosophers attempting to clarify the nature of existence, ontologies are now widely used to condense everything known about a subject into a hierarchy of terms and relationships. Intelligent information systems, such as Siri on the iPhone (Apple, Inc.; Cupertino, CA, USA), are built on ontologies that enable them to reason about real life. Scientists also use ontologies to structure knowledge about topics such as bioactive compounds, taxonomy, anatomy and development, disease, and clinical diagnosis.
A Gene Ontology (GO) exists as well, built over the past 10 years through the joint effort of hundreds of scientists. It is considered the gold standard for describing cell structure and gene function, containing 34,765 terms and 64,635 hierarchical relations that annotate genes from more than 80 species.
“GO is very influential in biology and bioinformatics, but it is also incomplete and hard to update based on new data,” said senior author Trey Ideker, PhD, chief of the division of genetics in the School of Medicine and professor of bioengineering in UC San Diego’s Jacobs School of Engineering.
“This is expert knowledge based upon the work of many people over many, many years,” said Dr. Ideker, who is also lead investigator of the National Resource for Network Biology, based at UC San Diego. “A fundamental problem is consistency. People do things in different ways, and that impacts what findings are incorporated into GO and how they relate to other findings. The approach we have proposed is a more objective way to determine what’s known and uncover what’s new.”
In their report, Drs. Dutkowski, Ideker, and colleagues exploited the rapidly growing capacity of technologies such as high-throughput assays and bioinformatics, which now produce richly detailed datasets describing complex biologic networks. To evaluate their approach, the scientists gathered several such datasets, applied their technique, and compared the resulting “network-extracted ontology” with the existing GO. They found that their ontology captured most of the known cellular components, along with many additional terms and relationships, some of which then prompted updates to the existing GO.
Neither Dr. Ideker nor Dr. Dutkowski says the new approach is intended to replace the current GO. Instead, they see it as a complementary model that identifies both known and uncharacterized biologic components directly from data, something the current GO does not do well. They also reported that a network-extracted ontology can be continually updated with each new dataset, bringing scientists closer to a complete model of the cell.
University of California, San Diego School of Medicine