Working AI: Stoking GPU Clusters With Swetha Mandava

Title: Senior Deep Learning Engineer, Nvidia

Location: Santa Clara, California

Education: Bachelor of Technology, Electronics and Communication Engineering, Manipal University; MS, Electrical and Computer Engineering, Carnegie Mellon University

Favorite areas: Natural language processing, autoML, and interpretable AI

Favorite researchers: Christopher Manning, Fei-Fei Li, Tom Mitchell

Swetha Mandava didn’t set out to be a deep learning engineer. In college, she was interested in, well, everything. Call it luck, then, that she caught the AI bug. At Nvidia, she develops algorithms that make models for language and vision run more efficiently on large-scale hardware. She’s also an up-and-coming presence at conferences, having given talks at AI NextCon, Deep Learning for Science, and the Grace Hopper Celebration of Women in Computing. She spoke with DeepLearning.AI about breaking into the field and how a solid foundation helped her make the most of her wide-ranging curiosity.

What is your role at Nvidia?

I joined Nvidia about two years ago, right out of college. In my team, we develop state-of-the-art research algorithms. We use large-batch optimization techniques and improve their performance, so they work much faster on big GPU clusters. Then we make them open source, so our customers can use them.

What do you like most about your job?

The best thing about Nvidia is the amount of compute you have readily available. When I first implemented simple deep learning algorithms at CMU, it would take weeks, because I only had one GPU. Now I’m training algorithms that are thousands of times bigger, but they are so much easier to train!

What do you find most challenging?

Keeping up to date with the state of the art and what’s coming next. Also, coming up with good ideas worth pursuing and staying motivated until the end.

The Batch recently featured your paper on PAR transformers. Can you tell us a bit more about that project?

Traditionally, transformer models are formed by interleaving self-attention and feed-forward layers. In our paper, we analyzed the optimal composition of transformer models using a differentiable neural architecture search. We studied the search results to understand whether there are saturation points when using one block versus the other, and how accuracy depends on the relative number of blocks of each type and their order. Based on our analysis, we proposed new design rules that allow you to design efficient transformers that retain accuracy with a neat 35 percent performance benefit.
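
As a rough illustration of the composition question described above, a transformer stack can be written as a string of block types, with `s` for self-attention and `f` for feed-forward. The sketch below is only a toy: the helper names and the alternative pattern are hypothetical, chosen to show how block counts and ordering can vary. They are not the layouts, search procedure, or code from the paper.

```python
def interleaved(num_pairs: int) -> str:
    """Return the traditional layout: self-attention and feed-forward blocks alternating."""
    return "sf" * num_pairs


def block_counts(pattern: str) -> dict:
    """Count how many blocks of each type a layout contains."""
    return {"s": pattern.count("s"), "f": pattern.count("f")}


def attention_fraction(pattern: str) -> float:
    """Fraction of blocks in the layout that are self-attention blocks."""
    return pattern.count("s") / len(pattern)


if __name__ == "__main__":
    # Traditional 12-block stack: 6 self-attention, 6 feed-forward.
    baseline = interleaved(6)

    # Hypothetical alternative with the same depth but fewer self-attention
    # blocks, all placed early in the stack (an assumption for illustration).
    alternative = "sfsfsf" + "f" * 6

    for name, pattern in [("baseline", baseline), ("alternative", alternative)]:
        print(name, pattern, block_counts(pattern), attention_fraction(pattern))
```

Comparing layouts like these, at equal depth but with different ratios and orderings of the two block types, is the kind of question an architecture search can explore systematically.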

When did you first encounter AI?

The first memory I have of AI is probably Google Translate. As an undergrad, I spent a summer abroad. I was stranded in a random city where nobody spoke English, which was quite exciting. The first time I worked on AI algorithms was when I was in school at Manipal. I was a restless student, trying out everything and doing all sorts of activities and projects. A bunch of my professors at the time were just getting into AI. They gave me papers to read, and we started researching ways to detect failures in analog networks. I also took the Introduction to Machine Learning course on Coursera, which gave me a solid foundation.

What made you decide to pursue a career as an engineer?

During my undergraduate years at Manipal, I interned at many cool robotics startups. I continued my robotics work when I went to Carnegie Mellon University in Pittsburgh, Pennsylvania. AI was everywhere at CMU, and I think many of us wound up taking some AI-related course. Along the way, I naturally got more involved in machine learning and deep learning.

What did you do as an intern?

One of my favorite internships was with Rapyuta Robotics. I was working on coverage planning algorithms for drones. Ideally, if you want to inspect a wind turbine, you want to send a robot instead of having a human climb around it. I was designing the path that the robot should take to inspect the whole structure.

You also did internships in speech extraction and texture perception. What’s the common thread that carries you from one project to the next?

All of them have the same foundation in computer science and algorithms. It’s exciting to apply this foundation to different applications, and that’s helped me grow. For example, I developed coverage-planning algorithms for inspection drones and picked up programming and computing infrastructure skills along the way. These helped me build a good foundation for machine learning and deep learning.

You did your undergraduate studies in India and earned your graduate degree in the U.S. Did your educational experience differ in the two countries?

My experience as a CMU student was very different from my experience as a Manipal student, to a large extent because of my journey. At Manipal, since the focus is on the final exams, I had a lot of time to explore unrelated things throughout the semester. At CMU, the focus was on weekly assignments and projects. So we were always on the grind, and I got to dive deep into one particular area.

How do you stay up to date? And how do you continue learning?

Twitter has been useful for both staying up to date and learning. I follow a lot of excellent researchers, and they’re very generous with the content they share. That offers a good overview of the current state of the art. I also follow a lot of newsletters, like The Batch. I have a lot of friends in the field, too, and communicating with them helps me stay up to date. We have groups where people share the content they are reading. And at Nvidia, we have weekly paper reading groups. It’s a quick way for somebody to take the lead and discuss what they liked in a paper.

How do you think the industry is doing when it comes to nurturing new talent?

I’ve had a very traditional entry into deep learning by pursuing a master’s in computer engineering. But there are many boot camps and courses online that allow people to build a foundation outside those traditional paths. It might still be hard to get a machine learning or AI job. The standard route is to get into a company [as a non-machine learning engineer] and transition into a machine learning or deep learning role.

What advice do you have for people trying to break into the field?

Take up a lot of projects and be active in the open source community. There are hackathons and conferences where you can network with people. And it helps to show up within your company, to show interest in what the machine learning and AI teams are doing, so you can collaborate with them and build toward working with them in the future.

Before moving to California, you lived in Pittsburgh. Which do you prefer, perogies or tacos?

Tacos. But I miss the fall on the U.S. East Coast. Pittsburgh has beautiful seasons.