After Doing Your Research, It's Time to Trust Your Instinct

Author Spotlight

In the Author Spotlight series, TDS Editors chat with members of our community about their career path in Data Science, their writing, and their sources of inspiration. Today, we’re thrilled to share our conversation with Murtaza Ali.

Murtaza is a PhD student at the University of Washington, studying Human-Centered Design and Engineering, with a concentration in Data Science. His research focuses on Data Visualization, human-centered data science, and computing education, and some of his other (occasionally related) interests include writing and teaching.

What sparked your initial interest in data-related topics?

I was an Applied Math major back in undergrad at UC Berkeley, and during my junior year I took a popular course – "Data 100: Principles and Techniques of Data Science." I didn’t have too many expectations going into the course; I mostly just took it because others recommended it and it covered one of my major requirements.

That class had an immense impact on my future academic trajectory. It had a unique structure: it was co-taught by two professors, one an expert statistician and the other a renowned computer scientist. This was intentionally done to highlight that data science is by its very nature interdisciplinary, something I emphasize a lot in my writing and work. During weekly lectures they discussed high-level topics as well as societal implications, and assigned labs and projects to give students hands-on practice with data wrangling, processing, and analysis tasks.

What was particularly unusual about that semester, though, is that it was Spring 2020 – which was interrupted as we went online due to COVID-19. That semester, I spent a lot of time using the skills from the class to explore and analyze various online datasets, and even participated in my first Kaggle competition. I attended weekly Zoom office hours to discuss my budding interests with the professor, who encouraged me and even directed me to a data-focused research group. It’s been about two years since then, and my fascination with data shows no sign of stopping any time soon.

As someone currently pursuing a PhD, what are the problems and questions you find yourself most drawn to?

Long before I took the "Data 100" class and became interested in data science, I was an introductory computer science TA and knew I loved teaching—it’s what pushed me to get a PhD in the first place. After taking multiple trainings, struggling in front of many groups of students, and reviewing the results of various teaching evaluations, I realized that students (I use that term loosely, as it doesn’t just mean college undergraduates) desired one thing above all others in their teachers: the ability to simplify complex concepts in an easy-to-understand way. That was the gold standard, and I worked hard to achieve it.

In many ways, I think this is what attracted me to data science – and within data science, to data visualization. I see many connections between visualization and education, the primary one being that they both aim to take complex ideas and simplify them for some target audience. This is what I wrote about in my PhD application, and it is largely why I mostly work on problems surrounding data visualization and computing education.

Do you already know what shape your work will take in the next couple of years?

Since I am still in the early years of my PhD, I don’t yet have one specific project that takes up most of my time; instead, I am exploring various avenues in an attempt to find my niche, so to speak. That said, there are a few questions I am currently interested in answering:

Text data is inherently unstructured and unordered, and that makes it hard to visualize. What if we simplify it to focus on texts and documents that have a human-defined order? One example of such a text is an academic research paper, organized by section, and within sections, by paragraph. Are there effective ways to design and implement visualizations for these and related documents?
People like pretty things. What is the right balance between making a visualization appealing to the human eye and making it effective for presenting data? We want people to learn, but we also want them to enjoy themselves.
Computing education is becoming increasingly relevant in today’s tech-centered world, yet the U.S. education system is also incredibly flawed, biased, and broken. This is an extremely broad question, but an equally important one: how do we make it better? And what exactly does "better" mean?
A slightly related question: is all this stuff about connecting visualization and education just a fantasy in my head, or can we actually intermix techniques from both fields in order to individually benefit each one?

These sound like very important questions to explore! Do you think advanced degrees are necessary in order to approach these kinds of big topics?

Around the time I was finishing up undergrad, many people were adamant that at minimum, a Master’s Degree was necessary to have a fighting chance at data science positions. While I think there is definitely still some merit to this claim, I also think that as undergraduate data science programs become more advanced and widespread, an advanced degree may not be strictly necessary.

How do you recommend people go about making that decision?

There are a few important questions to consider for oneself.

How does my current skill set fit in with the jobs I want? Will an additional one or two years obtaining a Master’s degree allow me to gain more foundational knowledge in statistics, probability, and data science, or enable me to specialize in an area that interests me, such as artificial intelligence? Do I really need to go to school, or can I self-learn?

Many people blindly subscribe to the cult of self-learning, but I personally believe it varies from person to person. Some, such as myself, benefit immensely from an organized class with structured assignments and external accountability.

With a PhD, the questions are different. Before getting a PhD, you need to be absolutely sure you enjoy either research or teaching. While it is certainly possible to get a PhD and then return to industry instead of staying in academia, the duration of the PhD will involve heavy research and some teaching. Most PhD programs are paid, and students receive a living stipend, but it is very low (usually between $2000 and $3000 a month, 9 months of the year, depending on the area the school is in as well as the program), so that is also worth thinking about.

Of course, the payoff can also be great: you will learn a huge amount, you will become a leading expert in the specific concentration of your thesis, and you will be a hot commodity on the job market. The two questions you must ask yourself, however, are 1) do I really need this, and 2) do I really want this?

And another thing: after you’ve done all the research you need to make a decision, trust your instinct. It usually knows what’s right.

Does your public writing influence or draw on your current academic work? How do these two spheres interact?

At the moment, because I am still in the early stages of my PhD, I haven’t yet written about my research projects. This is largely because they are still unpublished and need to remain private while they are undergoing peer review.

However, one of the main reasons I am getting my PhD is because I have a passion for teaching, and this is something which interacts with my writing immensely. One of my goals is to design better techniques for computer science and data science education, especially for those with no experience. As a result, many of the articles I write are targeted at beginners. I try to make it a point of taking complex ideas and expressing them in simple ways, as I mentioned above.

This is something we see in many of your recent articles on TDS. What inspired you to start writing for a broader audience to begin with?

Part of this has to do with what I mentioned above: one of my passions is teaching, and so writing pedagogical articles is a natural extension of that. Wanting to share my insights and thoughts on Medium is in some ways an expression of my frustration with the concept of "academic gatekeeping," which I have become all too familiar with as part of my PhD.

In academia, everything is written in complex, flowery language and never targeted toward the general public. I find this to be a major issue, especially because the entire point of research is to make the world in general a better place (broadly speaking). Medium gives me an outlet to write simply and colloquially, and make concepts accessible to everyone. As someone who struggled a lot during the earlier years of college, this is something incredibly important to me, and I am thankful to have the opportunity to do it.

To learn more about Murtaza’s work and explore his latest articles, follow him here on Medium and on Twitter. To get a taste of some of Murtaza’s accessible and helpful data science guides, here are a few highlights from the TDS archives: