Title: Senior Machine Learning Engineer, GitHub
Location: San Francisco, CA
Education: BS Computer Science, MS Intelligent Systems, PhD Computer Science Education
Years in industry: More than 20 years in computing, four years post-PhD
Favorite machine learning researchers: Francois Chollet for his writing, Rachel Thomas and Jeremy Howard for their ethical, brass-tacks approach
Omoju Miller’s path to machine learning was anything but straightforward. She first encountered computer science more than 20 years ago as an undergrad and followed her passion where it led. After stints leading Google’s nonprofit investment in computer science education and serving as a volunteer advisor to White House Presidential Innovation Fellows, Omoju came to GitHub in 2017 to help build recommendation engines for the popular open source software development platform. Read on for Omoju’s thoughts on the future of machine learning, whether or not to pursue a PhD, and the challenges of working in industry.
How did you get started in AI?
I got into AI in undergrad at the University of Memphis. I took an intro to cognitive science seminar that introduced me to the ideas around artificial intelligence. I became riveted and have not looked back.
What are you working on at GitHub?
I build recommendation engines that support developers’ decision making around their engagement in open source. I contributed to the engine behind github.com/explore. For that product, we used a collaborative filtering approach based on users’ past behavior on GitHub.
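For readers unfamiliar with collaborative filtering, here is a minimal sketch of the general idea in Python. It is not GitHub's implementation: the users, the repositories, the choice of "starred" as the behavior signal, and the function names are all hypothetical, and cosine similarity over binary sets is just one common way to compare users. The sketch scores each repository a user has not yet starred by summing the similarity of every other user who has starred it.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical interaction data: user -> set of repositories they starred.
STARS = {
    "ada": {"numpy", "pandas", "scipy"},
    "bob": {"numpy", "pandas", "keras"},
    "cy": {"keras", "tensorflow"},
    "di": {"numpy", "scipy"},
}


def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, stars, k=3):
    """Rank repos the user has not starred by similarity-weighted votes."""
    seen = stars[user]
    scores = defaultdict(float)
    for other, repos in stars.items():
        if other == user:
            continue
        sim = cosine(seen, repos)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for repo in repos - seen:
            scores[repo] += sim
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [repo for repo, _ in ranked[:k]]


if __name__ == "__main__":
    print(recommend("di", STARS))
```

In this toy data, "di" overlaps most with "ada" and "bob", so repositories those two starred rank highest. A production system would typically work from far larger, sparser interaction data and a learned model rather than pairwise set comparisons.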
What’s the most challenging thing about your job? What do you like most?
The most challenging thing is acquiring patience about how ML-based data products are adopted in the company. When I started at GitHub, I had all these ideas around the kinds of data products I wanted to build. I was in an egocentric place as a product creator; I didn’t ask the question, “What is the most important thing for the customer?” Further, our team is the inaugural ML team for the company. To be successful in my role, I had to understand that building data products within the already-defined system of GitHub would take a long time.
What I like most about my job is the dataset that I get to work with and the ability to improve the process of software development. The GitHub dataset is unique because a significant chunk of technological history in the last 10 years is in our systems. GitHub is like the custodian of technological progress. The code for the first image of a black hole is on GitHub. The Python packages that supported this scientific breakthrough were developed and maintained on GitHub. That is just one facet of human progress that exists on our platform. I see myself as a kind of applied anthropologist who leverages an understanding of human technical behavior to help accelerate that progress.
Take us through your typical workday.
I usually start at around 9 am. I try to split my week into 1.5 days for meetings and the rest for focus time. On the meeting days, I take things easy and catch up on administrative tasks like emails, abstracts, reading, and so forth. On the focus days, I start writing code at about 11 am, break for lunch at noon, and go on till 5, 6, or 7 pm depending on what I’m working on.
What tech stack do you use and why?
For building data pipelines, I use SQL and Presto. I use Atom as my editor with very few plug-ins. When I’m writing Python code, I use Jupyter notebooks; I like the fact that I can write my thoughts down and do exploratory data analysis with it. For developing and deploying models, I use Kubeflow. For model building, I often default to the Keras implementation of TensorFlow.
You’ve worked on a number of initiatives outside of building ML products, from leading Google’s non-profit investment in computer science education to serving as a volunteer advisor to the Presidential Innovation Fellows. How have these experiences shaped your understanding of machine learning and the future of education?
The biggest thing these experiences have taught me is that technology is ultimately a tool in service to people. If one stays in the academic arena too long, one can easily slip into a fascination with the technology. As an ML engineer, that is a dangerous place to be. It isn’t the technology that is important but how the technology can be deployed to address user needs, while optimizing for the mission of the company.
I came to GitHub because it’s a platform that supports humanity’s intellectual aspirations. As Marc Andreessen said years ago, software is eating the world. Today, GitHub is the place where that feast happens. One unintended consequence is that users come to us for an informal education around technology. As an ML engineer, I see my work as building the planetary-scale decision-making service around technical work, which ultimately lowers the barrier to entry into software development. As we incrementally achieve this, we allow the world to come onto our platform and build the solutions to its own problems.
How do you keep learning?
I leverage my colleagues on the ML team, follow folks on Twitter like @WaleAkinfaderin and @seb_ruder, and watch conference talks on YouTube. I typically pay attention to the talks coming out of PyCon, PyData, and the Data Council conferences. I particularly like the Data Council conference because it’s strictly centered around applied data work. Other conferences have talks that bleed into other areas, but this is a data practitioner’s conference.
You completed a PhD at the intersection of computer science and education. Would you recommend that aspiring machine learning engineers pursue a PhD? Why or why not?
If it’s an applied PhD, yes. If it’s theoretical, no. I believe in interdisciplinary work because it forces you to apply ML expertise to a field outside of pure computer science. One of the things I’ve seen in a lot of people coming from PhD programs is a conflation of ML researcher and ML engineer. We are MLEs; we’re not paid to write papers and publish. Our role as MLEs is not to push the boundaries of knowledge; our role is to push the boundaries of what is possible for our users in the near future. The horizon of our impact should be about a year out, whereas a researcher’s horizon of impact can be 10-plus years. If you choose to pursue a PhD, it should be in something that lets you optimize the process of understanding, building, and shipping, with an emphasis on the last two.
What’s a problem that you’d like to see AI solve?
Truly serving the broad spectrum of society by building cutting-edge products for various customer segments, from middle-class parents, to low-income service workers, to the luxury segment, and so on.
What advice do you have for people trying to break into AI?
This is simple. Keep trying, be persistent, and muster the discipline necessary to hone your craft. Don’t worry about the rejections and the failures; they don’t matter in the long run. Keep at it, and one day, you will find yourself exactly where you need to be.