The AGCTs of cancer
How MD Anderson researchers are working with The Cancer Genome Atlas to unravel the mysteries of the disease
When President Barack Obama called for a $1 billion National Moon Shot Initiative to accelerate cancer research and “make America the country that cures cancer once and for all,” he looked to Vice President Joe Biden to head the effort.
Biden’s leadership — no doubt inspired by the death of his son Beau in 2015 from brain cancer — has involved attending key professional cancer research meetings and the formation of the Vice President’s Exceptional Opportunities in Cancer Research Fund. The fund encourages scientists, physicians, advocates, philanthropic organizations and representatives of the biotechnology and pharmaceutical industries to work together and share data to generate new ideas and new breakthroughs.
That sort of collaboration has been an important part of a program largely unknown to many Americans: The Cancer Genome Atlas (TCGA). Launched in 2005, and funded jointly by the National Cancer Institute and the National Human Genome Research Institute, TCGA makes cancer research data openly available to the research community to accelerate understanding of the molecular underpinnings of the disease and provide avenues to potential new therapies. Although the program is in its final year of existence, it has been key to invaluable contributions to cancer science and laid the foundation for further exploration through new collaborative programs. MD Anderson has played an integral role in TCGA — from its formation to the present — through leadership and participation in TCGA-related studies.
Data from the Cancer Genome Atlas and other studies “has clarified that cancer is a disease of the genome,” wrote Douglas Lowy, M.D., acting director of the National Cancer Institute (NCI), and Francis Collins, M.D., Ph.D., director of the National Institutes of Health (NIH) in the May 19, 2016, issue of the New England Journal of Medicine. “It has become increasingly apparent that knowing what driver mutations are present in a particular tumor is often more important than knowing which organ system it arose from. Genomic technology has also shown that although each tumor is molecularly unique, certain pathways are repeatedly affected — findings that have informed the design and use of a new generation of drugs targeting those pathways.”
“What motivates me every day is the thought that somewhere in these datasets the answer to how to improve patient outcomes might be buried,” says Roeland Verhaak, Ph.D. “And that is what it is all about in the end.”
Data tsunami
The origin of TCGA may, in part, be traced to John Weinstein, M.D., Ph.D., chair of Bioinformatics and Computational Biology at MD Anderson. But it almost didn’t happen.
It began back in 1991 when Weinstein, who was then working at the NCI, was planning to take the day off. Then he remembered his boss was speaking at Grand Rounds that day. Going to work was a fortuitous decision that would lead Weinstein to pursue the development of vital data-related programs that would ultimately influence the creation of TCGA.
“During his talk, I had an idea that led to my laboratory group spearheading a comprehensive molecular profiling of the NCI-60, a set of 60 human cancer cell lines used by the NCI to screen more than 100,000 chemical compounds, plus natural products, for anti-cancer activity,” says Weinstein.
He calls the explosion of data resulting from these earlier efforts a “data tsunami,” and likens TCGA to a “12,000-square chess board of patient samples.”
“It provides us a context within which to work out the rules of the genomic game and generate useful, potentially significant results, benefiting cancer patients and their families,” he says.
Those results can provide researchers with a fresh look at how tumors are developing, and point the way to new therapies. Weinstein often is referred to as a “pioneer of postgenomic biology” in part for another one of his inventions: clustered heat maps. These maps allow researchers to visualize patterns in the huge masses of data and more quickly take advantage of them. Clustered heat maps have appeared in many thousands of publications, and Weinstein’s group has now developed “Next-Generation Clustered Heat Maps (NG-CHMs),” which can be zoomed in on and navigated like a Google map. NG-CHMs are routinely used for data from many different cancer types in TCGA.
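For readers curious about what "clustering" means in a clustered heat map, the idea can be sketched in a few lines of Python. This is only an illustration, not the NG-CHM software or Weinstein's method: the function name `cluster_order`, the toy data and the choice of average-linkage clustering with Euclidean distance are all assumptions made for the example. The sketch reorders the rows of a small table so that samples with similar patterns sit next to each other, which is what makes blocks of color stand out in a heat map.

```python
from math import dist  # Euclidean distance, Python 3.8+


def cluster_order(rows):
    """Return row indices ordered by simple average-linkage clustering.

    Hypothetical helper for illustration only. It repeatedly merges the
    two closest groups of rows until a single group remains.
    """
    clusters = [[i] for i in range(len(rows))]

    def avg_dist(a, b):
        # Average pairwise distance between two groups of row indices.
        total = sum(dist(rows[i], rows[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = (
            (p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        # Pick the pair of groups that are closest to each other.
        x, y = min(pairs, key=lambda pq: avg_dist(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0]


# Made-up data: four samples (rows) measured on four genes (columns).
samples = [
    [5, 5, 0, 0],
    [0, 0, 5, 5],
    [5, 4, 0, 1],
    [1, 0, 4, 5],
]

order = cluster_order(samples)
for i in order:
    print(f"sample {i}: {samples[i]}")
```

In this toy table, samples 0 and 2 share one pattern and samples 1 and 3 share another, so the reordering places each pair side by side. A full clustered heat map would apply the same step to the columns and draw the values as colors.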
Today, TCGA is a massive effort that is studying genomic changes in more than 30 different cancer types and, according to Rehan Akbani, Ph.D., assistant professor of Bioinformatics and Computational Biology, MD Anderson is a “heavy hitter and proud participant in every one of TCGA’s disease working groups.”
“The MD Anderson Genome Data Analysis Center (GDAC) has established itself as the premier provider of reverse-phase protein array (RPPA)-based TCGA proteomics data and analysis as well as batch effects analysis and data quality control,” says Akbani.
Batch effects are spurious differences in results that stem from technical artifacts rather than biology. They arise when data are aggregated from multiple institutions with varied computer systems, lab procedures and data collection methods. MD Anderson’s work in assessing batch effects has helped ensure that the data being studied by research institutions worldwide via TCGA are accurate and consistent. RPPAs involve performing protein assays on thousands of samples simultaneously, allowing measurements of protein expression, as well as protein modifications such as phosphorylation, which turn protein enzymes on and off.
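A small Python sketch can make the idea of a batch effect concrete. It is not the MD Anderson GDAC's method. The function name `center_by_batch` and the readings are invented for illustration, and the technique shown, per-batch mean centering, is one of the simplest possible corrections. It also assumes that each site measured a biologically similar mix of samples, which real studies have to check.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Remove a constant per-batch offset (hypothetical helper).

    Each value has its own batch mean subtracted and the overall mean
    added back, so every batch ends up with the same average.
    """
    overall = mean(values)
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    batch_mean = {batch: mean(vals) for batch, vals in grouped.items()}
    return [v - batch_mean[b] + overall for v, b in zip(values, batches)]


# Made-up protein readings: site B's instrument reads about 2 units high.
values = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]
batches = ["A", "A", "A", "B", "B", "B"]

corrected = center_by_batch(values, batches)
mean_a = mean(corrected[:3])
mean_b = mean(corrected[3:])
print(f"site A mean after correction: {mean_a:.2f}")
print(f"site B mean after correction: {mean_b:.2f}")
```

Before correction, the two sites appear to differ by about 2 units even though the spread within each site is identical. After centering, both sites share the same average, and only the sample-to-sample variation remains.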
“The true value of TCGA will probably not be recognized until years from now,” says Rehan Akbani, Ph.D. “As John Weinstein said so eloquently, ‘It is the work of a generation to mine all of this information, and although the project is concluding, it is really just the beginning.’”
Finding the treasure
Sifting through enormous amounts of data has often been referred to as “mining,” and scientists are the treasure-seekers in search of valuable nuggets that will lead to a better understanding of cancer’s molecular workings.
MD Anderson investigators have had significant success in mining TCGA data and, working with multiple partners at cancer research institutions globally, have made important discoveries about the molecular nuances of cancer cells.
“MD Anderson researchers not only have contributed at a leadership level to The Cancer Genome Atlas, helping to ensure its success, but we also have participated in and served as lead investigators for TCGA-based studies that have opened up new possibilities for cancer diagnosis and treatment,” says MD Anderson President Ronald DePinho, M.D. “I am proud of the effort that has gone into this pioneering program, and I know we will continue to benefit from the data for many years to come.”
Revelations about two aggressive cancers
Roeland Verhaak, Ph.D., associate professor of Bioinformatics and Computational Biology, has led research investigations based on TCGA data that have revealed startling findings for two aggressive forms of cancer.
Earlier this year, Verhaak published findings from a study co-led by the University of São Paulo’s Ribeirão Preto Medical School and Columbia University that revealed new information about diffuse glioma, which is found in some adult brain cancer patients.
Analyzing TCGA data, the team defined a complete set of glioma-associated genes from patient samples and used molecular profiling to improve disease classification. They were able to identify molecular correlations and provide insight into disease progression from low to high grades.
Another study, led by Verhaak and colleagues at the University of Michigan, revealed significant new findings about adrenocortical carcinoma (ACC), a rare cancer typically associated with poor prognosis.
The researchers, including those from 39 international institutions, examined 91 ACC tumor specimens from four continents and observed “massive” DNA loss followed by whole genome doubling (WGD). WGD occurs when tumor cells acquire an extra copy of their entire genome. The researchers found that WGD was associated with an aggressive clinical course, suggesting that it could be a hallmark of disease progression. They speculated that tumor growth could be slowed if WGD could be prevented, a possibility for future pre-clinical studies.
After TCGA comes to a close in early 2017, new NCI genomics initiatives run through the NCI’s Center for Cancer Genomics (CCG) will continue to build upon TCGA’s success by using the same model of collaboration for large-scale genomic analysis and by making the genomics data publicly available.
The effects of RNA editing and gender on cancer
Han Liang, Ph.D., associate professor of Bioinformatics & Computational Biology, also has relied on TCGA data in his research.
One study, which assessed 6,226 samples from patients with 17 different cancer types, revealed new information about RNA editing events in tumors versus normal tissue, and provided evidence that RNA editing could selectively affect drug sensitivity.
The study findings opened up yet another avenue for understanding the biological reasons why some people live longer or respond better to treatment.
Another TCGA-related study led by Liang pinpointed previously unknown differences related to gender and cancer. Liang reviewed 13 cancer types and provided a new molecular understanding of how a person’s sex affects diverse cancers. The research revealed two cancer-type groups associated with cancer incidence and mortality, suggesting a “pressing need” to develop sex-specific therapeutic strategies for some cancers.
“This is a crucial finding because male and female patients with many cancer types often are treated in a similar way without explicitly considering their gender,” Liang says.
A foundation for future discovery
Like TCGA and the new National Moon Shot Initiative, MD Anderson’s Moon Shots Program emphasizes collaboration.
“The MD Anderson GDAC and TCGA communities as a whole are rich sources of data, analysis tools and expertise for the MD Anderson Moon Shots Program,” says Weinstein.
“MD Anderson investigators who are members of the GDAC also participate in research for almost all of MD Anderson’s moon shots.”
Liang sees direct ties to the moon shots in his investigations, which are generating new insight into unexpected types of novel drivers behind cancer formation.
“With this knowledge, the moon shots can focus on these driver events and assess how they can be used to benefit larger groups of patients in clinical settings,” he explains.
Liang believes that TCGA also “lays a solid foundation for the National Moon Shot Initiative,” which is still in its formative stages.
“In a sense, TCGA catalogs the molecular drivers in different cancer types,” he says. “But the best way to use those drivers to guide patient care remains unclear. Institutions need to work together to incorporate the key drivers identified by TCGA into clinical trials.”
The Cancer Genome Atlas glossary
Genome: The complete set of DNA (genetic material) in an organism.
Almost every cell in the human body contains a complete copy of the genome. The genome contains all of the information needed for a person to develop and grow. Studying the genome may help researchers understand how different types of cancer form and respond to treatment. This may lead to new ways to diagnose, treat and prevent cancer.
The Cancer Genome Atlas has generated comprehensive, multidimensional maps of the key genomic changes in 33 types of cancer.
The TCGA dataset is 2.5 petabytes of data describing tumor tissue and matched normal tissues from more than 11,000 patients. It’s available to the public and has been used widely by the research community.
How big is a petabyte? One petabyte is equal to 900 billion pages of plain text. According to a New York Times story on science and data, it’s roughly equivalent to 799 million copies of “Moby Dick.”
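The figures above can be checked with a little arithmetic. The Python sketch below assumes a decimal petabyte (10^15 bytes) and treats "more than 11,000 patients" as exactly 11,000, so the per-patient figure is a rough upper estimate. The variable names are chosen for this example only.

```python
PETABYTE = 10**15  # bytes, assuming the decimal definition

tcga_bytes = 2.5 * PETABYTE  # dataset size quoted in the article
patients = 11_000            # "more than 11,000", so a rough figure
moby_dick_copies = 799e6     # copies per petabyte, per the quoted comparison

# Average data per patient, in gigabytes (10**9 bytes).
per_patient_gb = tcga_bytes / patients / 10**9

# Implied size of one plain-text copy of "Moby Dick", in megabytes.
per_copy_mb = PETABYTE / moby_dick_copies / 10**6

print(f"about {per_patient_gb:.0f} GB of data per patient")
print(f"about {per_copy_mb:.2f} MB per copy of the novel")
```

Under these assumptions, the dataset averages a bit more than 200 gigabytes per patient, and the comparison implies that one plain-text copy of the novel takes up a little over one megabyte.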
Genome Data Analysis Centers (GDACs)
Immense amounts of data from array and second-generation sequencing technologies must be integrated across thousands of samples.
These centers provide novel informatics tools to the entire research community to facilitate broader use of TCGA data. MD Anderson is home to two GDACs.
Genome Characterization Centers (GCCs) are responsible for characterizing all of the genomic changes found in the tumors studied as part of the TCGA program. The GCCs use state-of-the-art technologies to analyze genomic changes involved in cancer and provide this data to the cancer research community.
MD Anderson’s role in The Cancer Genome Atlas
MD Anderson has played a leading role in establishing TCGA’s Genome Characterization Centers. Gordon Mills, M.D., Ph.D., chair of Systems Biology, heads up the Proteomics GCC, which is aimed at improved analysis of cancer cell proteins with the goal of identifying proteins that could be used as drug targets or biomarkers for screening and diagnosis.
MD Anderson also leads two of TCGA’s seven Genome Data Analysis Centers, which work with the GCCs to develop tools that help researchers process and integrate vast amounts of data analyses from across the entire genome.
MD Anderson principal investigators and co-PIs for two of the GDACs are Weinstein, Akbani, Mills and Al Yung, M.D., professor of Neuro-Oncology.