Working AI: At the Office with Research Scientist Or Cohen
Title: Research Scientist
Location: San Francisco, CA
Education: BA, MS, and PhD in Physics
Years in industry: 6 years
Favorite machine learning researchers: Geoffrey Hinton: He’s one of the pioneers in this field. I admire his original approach in suggesting the capsule network solution when tackling the problem of pose invariance. Although this idea has not yet worked on a large scale, I think the field needs more original ideas like these to make significant progress towards strong AI. Demis Hassabis: I’m impressed by his role in DeepMind and all of their latest achievements. These are things that were considered unbelievable only a few years ago.
Or Cohen’s deep background in physics gave him the theoretical foundation to dive into machine learning. While gaining valuable experience building machine vision and NLP products at an AI startup, Or also filled in the gaps in his ML knowledge by taking online courses and experimenting in his own personal projects. Now he’s prototyping new models at Lyft to boost the accuracy and efficiency of the ridesharing marketplace. Read about Or’s experience transitioning from the theoretical to the practical, the problem he’d like to see AI solve, and how he keeps learning.
How did you get started in AI?
During my PhD in Statistical Physics, I explored how complex systems of many particles behave from a theoretical perspective. This is an intriguing field with many open questions, and it’s still very close to my heart. Nonetheless, after finishing my PhD, I wanted to work on more tangible problems where I can see the impact of my work. A mathematician I worked with connected me with his past student who was just starting an AI startup. It was a great opportunity to create a product from scratch in the quickly developing area of machine vision and NLP. After some exploration, I decided to join the founding team and lead the engineering aspects of the product.
My background in applied math, as well as numerical analysis and coding from my master's and PhD, made me a good candidate for the position. I also did some machine learning coding at CERN as a summer student. But I still had gaps in my machine learning knowledge, so I took a few courses on Coursera (Stanford’s Machine Learning course and Buffalo University’s Computer Vision Specialization) and explored scikit-learn models during the first few months at the startup. When I started working, deep learning was barely used outside of academia. It was only after a year at the startup that we began adopting deep learning.
What are you working on at Lyft?
I work on the ETA team. Our job is to predict how long it’ll take to get to a destination given a ride start location, end location, and start time. These predictions feed into much of Lyft’s work. For instance, they help determine which driver to dispatch to you or which passenger to match to your shared ride. The more accurate our predictions, the more efficient Lyft’s marketplace is.
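To make the shape of the problem concrete, here is a minimal sketch of a naive ETA baseline built from the three inputs mentioned above: start location, end location, and start time. It is an illustration only, not Lyft's method. The function names, the rush-hour windows, and the two average speeds are hypothetical assumptions chosen for the example.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def baseline_eta_minutes(start, end, start_time,
                         free_flow_kmh=40.0, rush_hour_kmh=20.0):
    """Naive ETA: straight-line distance divided by an assumed average speed.

    The speeds and rush-hour windows are made-up values for illustration.
    """
    hour = start_time.hour
    is_weekday = start_time.weekday() < 5
    is_rush_hour = is_weekday and (7 <= hour < 10 or 16 <= hour < 19)
    speed_kmh = rush_hour_kmh if is_rush_hour else free_flow_kmh
    distance_km = haversine_km(start[0], start[1], end[0], end[1])
    return 60.0 * distance_km / speed_kmh


# Example: the same trip requested on a weekday morning and at midday.
start, end = (37.7749, -122.4194), (37.8044, -122.2712)
rush = baseline_eta_minutes(start, end, datetime(2019, 3, 4, 8, 0))
midday = baseline_eta_minutes(start, end, datetime(2019, 3, 4, 13, 0))
```

A baseline like this ignores the road network and live traffic entirely, which is the kind of "easy" first version that a learned model is meant to improve on.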
What’s the most challenging thing about your job? What do you like most?
Unlike my previous job at the startup, I am not creating the first version of a product, where the level of accuracy is not crucial. For Lyft’s ETA modeling, all of the “easy” initial problems have been solved, and we’re working hard to push the accuracy of our models to the limit. This means that simple ideas or off-the-shelf packages don’t work anymore, and we have to dig deep into applying sophisticated statistical methods and coding unique solutions to gain additional accuracy.
I also like that we can see very tangible results of our efforts. Every small increase in accuracy can be linked to an improvement in our users’ experience (say by matching them with the absolute closest driver) and reduced cost for Lyft (by making the whole marketplace more efficient). It’s very satisfying!
How is being a research scientist different from being a machine learning engineer on your team?
At Lyft, research scientists belong to use-case specific teams such as ETA, pricing, or shared rides. Machine learning engineers belong to the machine learning platform team. Some of them maintain our machine learning platform, where we prototype, train, and ship our models. Others help implement machine learning models in teams that don’t have embedded research scientists.
In general, research scientists are responsible for the modeling effort, from gathering data, to cleaning, training, deploying and tracking performance, as well as communicating with downstream or upstream teams. By staying on the same team, research scientists develop a very good sense of their team’s specific problem, and thus are able to come up with unique solutions. Machine learning engineers are more focused on finding generalizable solutions to support the whole organization. Other companies may structure these roles differently.
Take us through your typical workday.
Depending on where I am in my project, I spend a little more than half of my day doing individual contribution (I.C.) work—prototyping a new model in our code base, training the model, evaluating its performance, or preparing it for an experiment. As our engineers are improving our infrastructure, I can spend more and more of that time prototyping models. The rest of the day, I’m in meetings where I’ll discuss my ideas with other scientists, help others in my team, coordinate with engineers and client teams, and report on my progress.
What tech stack do you use and why?
We code all of our ML algorithms in Python, where we can easily use various open-source packages such as numpy, scikit-learn, or XGBoost. For the most complex ML modeling, we use gradient boosted decision tree (GBT) packages, such as XGBoost and LightGBM, and more recently an in-house GBT package that I helped create, which is better suited for spatio-temporal data. Gradient boosted trees provide very high accuracy for spatio-temporal problems with relatively few parameters, fast training, and little tuning. When implementing neural networks, I personally prefer to use Keras because it has a very simple Python interface and a good selection of initial weights for the layer parameters. When working with large datasets, I use either Hive or PySpark. The latter allows us to do very efficient large-scale data querying and manipulation, which is especially helpful for high-volume map-based data.
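For readers new to gradient boosted trees, the core idea can be sketched in a few lines of plain Python: start from a constant prediction, then repeatedly fit a small tree to the current residuals and add a shrunken copy of it to the model. The sketch below uses depth-1 trees (stumps) on a single feature and squared-error loss. It is a toy illustration under those assumptions, not the XGBoost, LightGBM, or in-house implementation, and the hour-of-day data and function names are invented for the example.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) for the best single split.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=300, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left_value if x <= threshold else right_value
        for threshold, left_value, right_value in stumps
    )


# Toy data (invented): travel time in minutes as a function of hour of day.
hours = list(range(24))
minutes = [30.0 if (7 <= h < 10 or 16 <= h < 19) else 15.0 for h in hours]
model = fit_boosted_stumps(hours, minutes)
```

Production packages add deeper trees, regularization, and many engineering optimizations on top of this loop, but the residual-fitting step is the same basic mechanism.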
You completed an undergraduate degree and a PhD in physics before you started working on machine learning in industry. What was that transition like?
My PhD was in the field of statistical physics, which has a lot of connections to machine learning. Statistical physics studies how many particles interact under a certain energy function, which defines the interactions between them. In machine learning, we’re interested in how many parameters (of, say, a neural network) interact under a certain cost function, which also defines the interactions between them. It was therefore natural for me to be interested in machine learning as well.
But the transition wasn’t easy. I had to learn a lot of new concepts and get used to working with complex code bases, but it was nonetheless fun and very interesting. It also took me a while to change my mindset, from the theoretical approach of understanding how complex systems work, to the more practical one of how to make them do what you need. It was easier for me than it may be for other physicists because I had a lot of coding experience from the many numerical simulations I had to code during my master's and PhD.
How do you keep learning?
I try to keep myself up to date by reading blogs such as https://blog.re-work.co/ and https://www.topbots.com/. Other than that, I work on my own private projects. I have one in the area of image similarity, and another about accelerating the training of deep learning networks. These projects keep me in touch with other fields I’m not currently working on in my day job. Whenever I encounter an important topic I want to learn, I try to take a Udemy course. Right now I’m taking a course about Hidden Markov Models.
What’s a problem that you’d like to see AI solve?
Today, we’re using gradient boosted decision trees for many of our problems, but neural networks have a lot of advantages, most importantly in representing high-cardinality features such as user IDs with embeddings (which GBT cannot do). However, neural networks still cannot represent map location features well. There are many possible solutions, but all the ones I came across were inferior in performance to GBT. There’s still more theoretical work needed to come up with a good solution to this problem. It would allow the geospatial community to fully adopt neural networks and reap the benefits they can provide.
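As a rough illustration of what an embedding is, the sketch below maps each ID in a large vocabulary to a short dense vector instead of a one-hot vector as wide as the vocabulary. The table here is only randomly initialized; in a neural network these vectors would be trained together with the rest of the weights. The function name, the ID format, and the sizes are hypothetical choices for the example.

```python
import random


def build_embedding_table(ids, dim, seed=0):
    """Assign each ID a small random dense vector (the untrained starting point)."""
    rng = random.Random(seed)
    return {i: [rng.gauss(0.0, 0.1) for _ in range(dim)] for i in ids}


# 10,000 distinct user IDs (invented) represented with 8 numbers each.
user_ids = [f"user_{n}" for n in range(10_000)]
table = build_embedding_table(user_ids, dim=8)

vector = table["user_42"]       # dense representation: 8 floats
one_hot_width = len(user_ids)   # a one-hot encoding would need 10,000 columns
```

The appeal for high-cardinality features is that the model's input width stays at the embedding dimension no matter how many distinct IDs exist.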
What advice do you have for people trying to break into AI?
Today, there’s a lot of emphasis on acquiring practical experience in AI to break into the field. It’s true that a lot of knowledge on how to produce performant AI systems is gained from experience. However, I think it’s also important to learn the theoretical background of machine learning and statistics. Having a strong and broad theoretical background can help you make good decisions when choosing the type of classifier to use (which may not always be a neural network), understand why something isn’t working, and suggest unique solutions. Having that theoretical background can save a lot of time, which is why it’s an important factor in our interview process at Lyft.