Big data = Big gains for cancer research
They’re scientists. They’re miners. They dig deep through seemingly endless streams of numbers and terabytes of data to discover hidden gems of understanding about how and why cancer forms.
These scientific “treasure hunters” are researchers in MD Anderson’s Bioinformatics and Computational Biology department. They have the patience and the know-how to analyze mountains of information and locate that one precious vein of data that will contribute greatly to our understanding of cancer’s inner workings. And they believe that “molecular profiling” may be the mother lode for developing new therapies.
Among them is Han Liang, Ph.D., who is confident that the future of cancer medicine rests largely on molecular profiling, a technique scientists use to analyze individual patients’ tumors and identify their genetic characteristics and unique biological markers. These biomarkers are molecules that can indicate the presence of a condition or disease, and they are increasingly used to measure how the body responds to therapies.
There are 3 billion DNA code letters in each human cell and 32 thousand billion cells in the body. So each person has 96 thousand billion billion DNA code letters. That’s more than 10 times as many code letters as there are grains of sand in all of the beaches on Earth. And an unlucky mutation of any one of those code letters can initiate a cancer or make it resistant to our drugs. That defines our challenge — and our opportunity.
— John Weinstein, M.D., Ph.D., chair of Bioinformatics and Computational Biology
“By analyzing data from multiple cancer types, we could evaluate prognostic models and identify gene alterations that led to tumor formation,” Liang says. “This wouldn’t have been obtained by looking at tumor data from just one cancer type.”
Pseudo (gene) science and superclusters
Liang also led an effort to study the quirky pseudogene, a misfit of a gene generally believed to have no purpose because it has lost its protein-coding ability. His recent study in Nature Communications showed otherwise.
After reviewing more than 2,800 samples from patients with seven types of cancer, Liang concluded that the largely ignored pseudogenes may well be a new source of biological markers. Such markers could help doctors both personalize treatments for individual patients and better predict cancer survival.
Like Liang, Roeland Verhaak, Ph.D., an assistant professor with a dual appointment in the Bioinformatics and Computational Biology and Genomic Medicine departments, conducted a large-scale study, one that compared how genes are expressed in 12 tumor subtypes. His team’s work, published in Oncogene this past August, identified eight cancer “superclusters” that shared similar disease pathways and gene expression.
In his study, Verhaak identified one particularly large supercluster of cancers, all of which shared common genetic mutations, gene expression patterns, DNA changes and increased cell proliferation. In this supercluster, mutations in the TP53 gene promoted cancer growth by keeping cells alive even after they had sustained DNA damage. TP53 normally suppresses tumor growth.
Designer data analysis
Like many MD Anderson scientists engaged in unraveling the mysteries of cancer’s cellular causes, Verhaak relies on high-tech approaches to understanding the mountains of information that accrue in databases such as The Cancer Genome Atlas.
To process and analyze data on such a large scale in a systematic, automated manner, he and fellow scientists developed the Pipeline for RNA-Sequencing Data Analysis, or PRADA — an acronym suggested by Verhaak’s fashion-savvy wife.
Unlike its fashion namesake, PRADA doesn’t involve manipulation of fancy fragrances or exotic leathers. Rather, it is a software program that provides multifaceted analysis of genetic information, from gene expression levels to items only a scientist could identify or be passionate about, such as intragenic fusion variants and homology scores.
The software has been used extensively and successfully to better understand brain and kidney cancer through The Cancer Genome Atlas program.
As the chair of Bioinformatics and Computational Biology, John Weinstein, M.D., Ph.D., oversees a department that collaborates with many departments and people across MD Anderson. Collaborators include Gordon Mills, M.D., Ph.D., chair of Systems Biology; Lauren Byers, M.D., assistant professor in Thoracic/Head and Neck Medical Oncology; and Wei Zhang, Ph.D., professor in Pathology, all of whom incorporate molecular data into their work to improve cancer prognosis. Centers such as the Institute for Personalized Cancer Therapy, the Center for Targeted Therapy and the Kleberg Center for Molecular Markers are also focusing on this area of study.
Weinstein is widely known for his introduction in the early 1990s of the Clustered Heat Map, which has been labeled by some as the “most common visual icon of the post-genomic era.”
The Heat Map was built upon work Weinstein completed while serving as head of the Genomics and Informatics Group at the National Cancer Institute, where he developed a collection of web-based bioinformatics software packages known as the “Miner Suite.”
The Clustered Heat Map lets investigators grasp complex data visually and is today a commonly used bioinformatics tool.
Weinstein currently leads a group that is at the forefront of discoveries in molecular profiling. His approach to research, for which he has coined the term “integromic,” has been described as “part experimental, part computational.”
Weinstein’s group uses more than two dozen microarray platforms and other technologies to unscramble the tangled network of cellular processes that are believed to lead to cancer. When properly deciphered, these processes may offer exciting new options for prevention and treatment.
Former U.S. Secretary of the Interior Stewart Udall once wrote that “mining is like a search-and-destroy mission.” The work of scientific investigators such as Liang, Verhaak and Weinstein — along with many of their MD Anderson colleagues — is very much a search mission for the essence of our genetic makeup. And delving into the depths of genetic mysteries for new solutions may one day destroy the cancer that robs us of our health.