What does a data scientist do?
February 08, 2024
Medically Reviewed | Last reviewed by an MD Anderson Cancer Center medical professional on February 08, 2024
Theoretically, anyone who analyzes data to do science could call themselves a data scientist. But to me, that term also implies the use of computers. So, it’s data plus computers that makes someone a data scientist in my mind.
I also tend to apply a slightly narrower definition: I think of a data scientist as someone who’s concerned with deriving hidden knowledge from data trends and then making predictions based upon them.
To accomplish those goals, you need two things. The first is the ability to handle, organize, standardize, label, test and move data, and to make it amenable to analysis. The second is the ability to make predictions based on that data, and to develop the artificial intelligence (AI) tools needed to analyze it, learn from it and evolve as an organization.
What makes a good data scientist?
You don’t have to be an oncologist to be a good data scientist. In fact, very few data scientists at MD Anderson come from an oncology background. My team includes everyone from astrophysicists to shopping website analysts. But many of their skills are completely transferable, so I am thrilled to have their talents here.
I didn’t even study oncology myself — just pure computer science and molecular biology. I applied data science to drug discovery when I started out in the biotech industry. It wasn’t until I became junior faculty in academia that I started developing the oncology knowledge I use today.
It’s easy to fall into the trap of thinking you’re poised for success in the field of data science, just because you’ve gotten some training on the most recent technologies. But the tools being used today are very different from the tools that will be used two years from now. So, if all you know how to do is push buttons on the latest fad, you’re going to get lost straightaway.
To be a good data scientist, you’ve got to have a good grasp of the fundamentals of math and computer science, and a really solid understanding of the underlying methodologies used to identify trends and make predictions. You’ve also got to understand the limitations of any tools you’re using and how to design questions so that your experiment is both unbiased and actually tests the hypothesis.
“Bilingual” people who can “speak” both oncology and data science — and take complex biological problems and translate them into computational questions — are what I call “translational” data scientists. That’s what I consider myself. And that’s what I try to help each of my new team members to become, if they’re not one already.
How MD Anderson is harnessing the power of data
As a drug discovery scientist trained in molecular biology, I’ve always been fascinated by the idea of doing things at scale. Bringing all the data together and identifying hidden patterns in it that no one else can see — then using those insights to inform drug discovery efforts — is much more satisfying to me than trying to find the answers to very specific questions. But we need both types of science to advance medicine, of course.
I just love the process of drug discovery. It brings together experts from so many different disciplines, including genomics, physics and chemistry, to name a few. It’s a really complex field.
It’s also exciting to be exploring drug discovery here at MD Anderson, where I get to work with people like Andy Futreal, Ph.D., who is leading initiatives to collect data and profile patients in really meaningful ways; and with Tim Heffernan, Ph.D., who is leading our Therapeutics Discovery teams to explore new ideas through experiments that lead to the development of new drugs.
MD Anderson already has so many phenomenal initiatives, like the Patient Mosaic™, that don’t exist anywhere else. All that was lacking was a cohesive way to harness its collective data to effectively drive our decision-making. That’s why I was recruited: to develop a kind of “information superhighway” that sits right in the middle, allowing a continuous feedback loop to keep us all on track.
Although it’s still early, we have already been able to harness knowledge from our rare tumor patient samples and identify potential “Achilles’ heel” genes using our cutting-edge AI methods. We then demonstrated these genes’ importance using the translational biology expertise of our Therapeutics Discovery teams, and we are moving them to the drug discovery stage. This shows how MD Anderson’s unique capabilities enable us to quickly change the way we make advances that benefit patients.
As a co-lead for Computational Modeling for Precision Medicine in our Institute for Data Science in Oncology (IDSO), I’m spearheading that initiative with the Adaptive AI-Augmented Drug Discovery and Development program, “A3D3a.” I contrived the name deliberately so that we could call it “Ada.” It’s a tribute to one of my heroes, Ada Lovelace, the world’s first computer programmer.
The daughter of British aristocrat and Romantic poet Lord Byron, Lovelace worked with inventor Charles Babbage on machines that could make calculations at great scale. Then one day, she said to him, “Why don’t we make a machine that can be programmed to do whatever calculation we want it to?” She wrote the first computer program and with that, the era of computer programming was born.
Why I joined MD Anderson
The collective brain power at MD Anderson is truly unequaled. It simply doesn’t exist anywhere else. Neither does the ability to turn advances into benefits for patients so quickly. That’s why I believe MD Anderson is the only place on the planet where we can do this. But to be successful, our work needs to start and end with the patient. That means:
- coming up with each hypothesis based on the existing patient data
- validating it in an experimental setting relevant to patients
- taking it to the preclinical and clinical trial stages in our own hospital
- bringing it to our patients at the bedside and in the clinic, and
- using the feedback generated by that process to refine any new drug therapy or patient care practices.
For me, personally, that also means developing algorithms to help us learn more about cancer, and uncovering new information that can further refine our decision-making processes. Using AI to inform each and every one of the thousands of decisions our faculty and staff make each day is what I’ve dedicated my career to — and precisely why I joined MD Anderson.
A lot of drugs that get approved to treat cancer today are considered “me-too” drugs. This means they ride the coattails of the ones that came before them. But when you see a brand-new drug that you helped develop enter a Phase I clinical trial to be tested for the first time, and you know that it will soon start directly benefiting patients, it really is the most exciting thing in the world. That’s where I get my buzz every morning, and it’s my favorite part of the job.
It’s too soon for anything I’ve been working on here to be entering Phase I clinical trials. But by applying data science, we’ve already identified several targets for potential therapy development in very rare cancers, such as metastatic uveal melanoma, which is really difficult to treat. And that’s exactly the kind of thing we can only do at MD Anderson, because you need all the unique components of the entire pathway in one spot.
Now, with the help of AI knowledge bases like the CanSAR platform I created, we’ll soon be able to make drug discoveries that even places like MD Anderson couldn’t possibly have made on their own.
Bissan Al-Lazikani, Ph.D.
Researcher