The O’Reilly Data Show Podcast: Reza Zadeh on deep learning, hardware/software interfaces, and why computer vision is so exciting.
In this episode of the Data Show, I spoke with Reza Zadeh, adjunct professor at Stanford University, co-organizer of ScaledML, and co-founder of Matroid, a startup focused on commercial applications of deep learning and computer vision. Zadeh is also the co-author of the forthcoming book TensorFlow for Deep Learning (now in early release). We spoke on the eve of the recent ScaledML conference, and much of our conversation focused on practical, real-world strategies for scaling machine learning. In particular, we discussed the rise of deep learning, hardware/software interfaces for machine learning, and the many commercial applications of computer vision.
Prior to starting Matroid, Zadeh was immersed in the Apache Spark community as a core member of the MLlib team. As such, he has firsthand experience trying to scale algorithms from within the big data ecosystem. Most recently, he’s been building computer vision applications with TensorFlow and other tools. While most of the open source big data tools of the past decade were written in JVM languages, many emerging AI tools and applications are not. Having spent time in both the big data and AI communities, I was interested to hear Zadeh’s take on the topic.
Here are some highlights from our conversation:
Scaling machine learning: Big data, big models, many models
Having big data, having big models, and having many models are all ways to scale machine learning in a particular dimension. There are problems where we probably don’t have the right kinds of models yet, so scaling machine learning might not necessarily be the best thing in those cases. For a while, we thought we could just do speech recognition with more traditional machine learning models; it turns out, neural networks are particularly good at that. For things like recommender systems, most companies are still using more traditional methods like factorization and even more basic linear models.
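To make the "factorization" mentioned above concrete, here is a minimal sketch of matrix factorization for a recommender, trained with stochastic gradient descent. It is an illustration only, not something described in the interview: the toy ratings, the function names, and the hyperparameters (rank, epochs, lr, reg) are all assumptions chosen for readability.

```python
import random


def predict(user_vec, item_vec):
    """Predicted rating is the dot product of a user and an item vector."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def mean_squared_error(ratings, users, items):
    """Average squared error over the observed (user, item, rating) triples."""
    errors = [(r - predict(users[u], items[i])) ** 2 for u, i, r in ratings]
    return sum(errors) / len(errors)


def factorize(ratings, n_users, n_items, rank=2, epochs=500,
              lr=0.01, reg=0.02, seed=0):
    """Learn low-rank user and item vectors from observed ratings.

    All defaults are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    users = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    items = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(users[u], items[i])
            for k in range(rank):
                pu, qi = users[u][k], items[i][k]
                # Gradient step on squared error with L2 regularization.
                users[u][k] += lr * (err * qi - reg * pu)
                items[i][k] += lr * (err * pu - reg * qi)
    return users, items


# Hypothetical toy data: (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

untrained = factorize(ratings, n_users=3, n_items=3, epochs=0)
trained = factorize(ratings, n_users=3, n_items=3)
initial_mse = mean_squared_error(ratings, *untrained)
final_mse = mean_squared_error(ratings, *trained)
```

Comparing `initial_mse` with `final_mse` shows how much the learned factors improve on the random starting point. Production recommenders add bias terms, held-out evaluation, and far larger data, none of which is shown here.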
In computer vision, big models and a large amount of labeled data did result in a significant improvement. But keep in mind, ImageNet and the models that were used to win the ImageNet competition are separate things. Actually, the models that were used to win ImageNet, convolutional neural networks (CNNs), were around long before ImageNet was created. So what we saw, actually, in the ImageNet competition, was a significant improvement when we took a CNN and ran it on ImageNet versus other methods.
So, the win there was actually the architecture—it allowed us to say, “Look, this is a robust finding that is not due to noise. It is not due to some weird quirk in the data, it was that the data was so large.” The data was terabytes, so if you could take terabytes of data and explain it with a model, chances are people will take you much more seriously. Because it was in a competition format where there was a third party verifying results (Stanford in this case), it really was a very dependable result.
In the case of ImageNet, it’s definitely clear that scaling was crucial for the result to be as significant as it has been. Because the data set is so large, it is actually used in real life to train models that are deployed in a lot of places. When you build models off of it, those models are also pretty applicable to the real world because there’s just so much data in ImageNet.
Lack of flexible hardware acceleration from the JVM
The Java runtime only needs to be compiled for a specific architecture; then, the bytecode compiled from the Java developer’s code is portable between different machines that have a Java runtime. So if I write a Java application on my Mac, that application can run in all kinds of places, including Windows machines and phones.
That has been a blessing and a curse. The blessing is that your code is portable. The curse is that your code cannot touch hardware particularly closely, because if it does, then that means you would be going around the capabilities of the bytecode that you have to commit to. That is a curse because when you’re writing machine learning operations that want to use the very advanced features of your hardware (let’s say you have hardware that does many batch matrix multiplies all at once, and you want that capability because it’s going to speed up your code dramatically), the JVM won’t let you do that because, well, you’re in Java world. You’re not supposed to know which platform you’re running on. So, all of that is hidden from you deliberately.
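For readers unfamiliar with the term, a "batch matrix multiply" takes two equal-length lists of matrices and multiplies them pair by pair. The sketch below is a plain-Python reference written only to show what the operation computes; the function names are hypothetical, and it performs the products one at a time, which is exactly the portable-but-slow path that specialized hardware avoids by doing them all at once.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(a_row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for a_row in a
    ]


def batch_matmul(batch_a, batch_b):
    """Multiply matching pairs of matrices, one pair after another."""
    if len(batch_a) != len(batch_b):
        raise ValueError("batches must have the same length")
    return [matmul(a, b) for a, b in zip(batch_a, batch_b)]


# Two independent 2x2 products computed as one batch.
batch_a = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
batch_b = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]
products = batch_matmul(batch_a, batch_b)
# products == [[[19, 22], [43, 50]], [[4, 5], [2, 3]]]
```

Because every product in the batch is independent, an accelerator can run them in parallel. Reaching that capability typically means calling into native, platform-specific code, which is the trade-off against portability described above.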
… The JVM is the big difference between the customized cloud infrastructure inside Google and the rest of the world, which uses the open source stack made up of the Hadoop and Spark ecosystems. Google is able to eke out more performance from their machines because they do everything in C++ and compile down. They have the luxury of knowing what hardware they’re running on because they buy the hardware as well as build and run the software. But when you’re in the open source world, you want to be a little bit more portable.
The rise of computer vision
Matroid is a computer vision company. The reason we’re focused on computer vision is that the opportunity there has only opened up in the past few years. Ten years ago, I think, computer vision wasn’t so usable. In the past few years, it has come to a point where it is ready for industry, and we’re seeing many opportunities.
In contrast, speech recognition, natural language understanding, and machine translation are tasks that have been around for decades. Speech recognition has been around for half a century. Machine translation has definitely been around for half a century, and these tasks have been getting better and better and better over time. The commercial opportunities associated with them have been slowly getting picked off across this half a century.
With computer vision, it’s been so hard to make any headway as a research community that the opportunities have not been picked off. We think there’s an unmet need in industry for computer vision, and we would like to become the computer vision company. It’s a tremendously exciting field to be in from a technical perspective as well as from the opportunities available commercially.
Full disclosure: I’m an advisor to Matroid, and Reza Zadeh and I are both advisors to Databricks.