Yann LeCun On Reinforcement Learning, Deep Learning & Self Driving

RR: We have a living legend with us, Professor Yann LeCun of NYU. He is the Head of AI for Facebook. His work with computer vision has redefined the field, and he is considered a godfather of CNNs, a godfather of deep learning, a godfather of AI really. So, let’s start with deep reinforcement learning. Do you have an opinion on it?

Yann LeCun: I actually have pretty strong, unorthodox opinions on this, and on reinforcement learning in general. In the context of machine learning and AI (and machine learning has taken over AI to some extent these days), there are roughly three types of learning, three paradigms. The most common one, which everybody uses, is supervised learning. Then there is the less common one, reinforcement learning, which is used mostly for games, along with a handful of real-world applications. And then there is unsupervised or self-supervised learning, something that’s still ill-defined and is perhaps the type of learning that we observe in animals and humans. And the question is: where is the future of AI?

As I said, almost all AI, almost all machine learning, is supervised learning. A little bit of it is reinforcement learning, and a growing amount is based on what’s called self-supervised learning, particularly in natural language processing and a little bit in computer vision. What makes the difference between all of those? In supervised learning, you tell the machine what the answer is for every example you show it. You show it an image of a car, a truck, an elephant, or a table, and you tell it: this is a truck, or this is an elephant.

And if the answer is different from the one you want, you adjust the parameters so that the answer gets closer to the one you want. In reinforcement learning, you don’t tell the machine the correct answer. You just tell the machine whether the answer it produced was good or not so good. You give it some sort of rating of how good or bad the answer was. And what the machine has to do is figure out in which direction to change itself so that its answer gets closer to the one you want.

This rating is called the reinforcement or the value function.

Actually, you don’t know this function, and the machine doesn’t know this function; it has to try things to figure out how to improve itself. Self-supervised or unsupervised learning is more like learning about the world, learning how the world works, without directing the machine toward a particular task. Reinforcement learning has excited people a lot over the last six years or so because of big successes in games: Atari video games, Chess, and Go.
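To make the contrast concrete, here is a minimal toy sketch (my illustration, not from the interview; the linear model, learning rate, and rating function are all made up). In the supervised case the correct answer is given for each example, so the parameters can follow the error gradient directly; in the reinforcement case only a scalar rating comes back, so the system has to try variations and keep the ones that rate better.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)                 # parameters of a tiny linear model
x, y_true = rng.normal(size=3), 1.0    # one training example

# Supervised learning: the correct answer y_true is given for the
# example, so we can follow the error gradient directly.
y_pred = w @ x
w = w - 0.1 * 2.0 * (y_pred - y_true) * x   # gradient step on squared error

# Reinforcement-style feedback: we only get a scalar rating of the
# answer, so the system has to try things to discover which changes
# improve the rating (here the rating is just negative squared error).
def rating(params):
    return -(params @ x - y_true) ** 2

trial = w + 0.01 * rng.normal(size=3)  # try a small random variation
if rating(trial) > rating(w):          # keep it only if it rates better
    w = trial
```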

four-switch VCS (Atari 2600)

What you notice there is that those machines require an enormous amount of interaction with the game to be able to learn. The best Go-playing systems have to train themselves by playing the equivalent of tens of millions of games before they play well. And there is a system put together by DeepMind to play StarCraft; to play on just one map, the system has to train itself with the equivalent of 200 years of real-time play, which is far more than any human would ever play. Reinforcement learning is very powerful, but it’s unbelievably inefficient in terms of the number of trials. That’s because it has to be trained from scratch; the system has to learn absolutely everything from scratch.

RR: Beyond finance and games, where do you see reinforcement learning moving?

Yann LeCun: [In] interaction-like situations, where you have some sort of ongoing process that you have to learn online, reinforcement learning can be used. But there are a lot of situations where the objective is very clear and you don’t need reinforcement learning; you can use supervised learning. Say you want to make a time series prediction. Time series prediction is a form of supervised learning and, you could say, self-supervised learning, because the data you use to train the machine is of the same nature as the input it observes. So there, you don’t need reinforcement learning, because reinforcement learning is for situations where the cost function is not clear: to compute how well you’re doing, you have to be told how well you’re doing by some external system.
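A small sketch of the point about time series, assuming a toy sine-wave signal and a linear predictor (both my own choices): the target for each input window is simply the next value of the same series, so the “labels” come from the data itself, with no external annotation or reward.

```python
import numpy as np

series = np.sin(np.linspace(0, 20, 500))   # any observed signal

# Self-supervised setup: inputs are windows of past values and the
# target is the next value of the same series -- no external labels
# or reward signal are needed.
window = 10
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

# Fit a linear next-step predictor by least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("mean squared error:", np.mean((X @ w - y) ** 2))
```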

RR: So, jumping to deep learning, do you think it’s overhyped? 

Yann LeCun: I think it depends on whom you talk to or whom you listen to. There are certainly a lot of people who will overhype deep learning, and those are people who are essentially looking for money or attention. And there are people who are completely realistic about the possibilities and potential outcomes of deep learning, and those tend to be the people on the science or research side of things.

People on the research or science side of things are well aware of the limitations of the techniques they use, and they don’t necessarily have a very strong incentive to oversell what they’re doing, except to their peers. But their peers are trained to detect BS, so it doesn’t work very well. In the context of industry or the media, though, it’s different; for the public, there is a lot of hype going on, and it has to be called out, because it creates very high expectations in a lot of people, expectations which then do not get fulfilled. When that happened in the past in the context of AI, it created big disappointment and essentially an AI winter, where people said: you promised me the moon, but you’re not delivering it.

RR: Do you think Q-learning is overhyped?

Yann LeCun: Possibly. It’s very useful in some situations, but there are a lot of situations where people attempt to use reinforcement learning when it’s really not the most efficient thing to do.
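For reference, this is the textbook tabular Q-learning update, shown here on a toy five-state chain environment of my own invention (the chain, reward, and hyperparameters are illustrative assumptions, not anything discussed in the interview):

```python
import numpy as np

# Tabular Q-learning on a toy 5-state chain: actions are "left" (0)
# and "right" (1), and reaching the last state pays reward 1.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9
rng = np.random.default_rng(0)

for _ in range(500):                        # training episodes
    s = 0
    while s != n_states - 1:
        a = int(rng.integers(n_actions))    # explore at random (off-policy)
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        r = float(s_next == n_states - 1)
        # The Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)   # "right" actions end up with the higher values in every state
```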

If you’re a quant, you want to take the right action, but first you want to model the market or the set of financial instruments you’re manipulating, investing in, or making decisions about. That first part is prediction. Then, given that you have a good prediction and a good model of the system you’re trying to control, what action are you going to take?

For the first part, modeling the system, we’re not talking about reinforcement learning; we’re talking about supervised or self-supervised learning, time series prediction essentially.

For the second part, it depends on what quantity you’re optimizing. You want to maximize returns, and it could be [the case] that the sequence of actions to do so is not derived directly from your predictions, but it also could be [the case] that it can be. Classically, going back decades, there are two ways to approach this kind of problem: one is reinforcement learning, and the other is optimal control. In many situations, the right thing to do is more akin to optimal control than to reinforcement learning. It’s much more efficient in many ways.
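Here is a minimal sketch of the optimal-control flavor, under toy assumptions I have made up (a known one-dimensional model and a quadratic cost): given a model of the system, you pick actions by directly optimizing the predicted outcome, instead of learning action values by trial and error.

```python
import numpy as np

# Model-based control sketch: given a (learned or known) model of the
# system, choose actions by optimizing the predicted outcome directly,
# rather than by reinforcement-style trial and error.
def model(state, action):
    return state + action          # assumed toy dynamics

def cost(state):
    return (state - 10.0) ** 2     # we want to reach state 10

state = 0.0
candidates = np.linspace(-1, 1, 21)        # allowed actions
for _ in range(20):
    # Pick the action whose predicted next state has the lowest cost.
    best = min(candidates, key=lambda a: cost(model(state, a)))
    state = model(state, best)

print(state)   # converges toward the target state 10
```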

RR: Was deep learning used at Bell Labs?

Yann LeCun: It wasn’t called deep learning at the time. It was called neural nets, or multi-layer neural nets. We changed the name in the mid-2000s to reflect the fact that the systems used were not all neural nets; they were slightly more general than that. But, yes, I joined Bell Labs in late 1988.

The first transistor, a point-contact germanium device, invented at Bell Laboratories in 1947.

One of the first things I did was develop convolutional nets and apply them to character recognition. AT&T eventually built a bunch of systems based on this technology that were commercialized for reading checks.

RR: You would say deep learning was around in the 80s or as early as the 70s?

Yann LeCun: The late 80s really was when things started. The back-propagation algorithm that is universally used for training neural nets popped up around 1986, 1987, and that’s really what enabled multi-layer neural nets. Convolutional nets popped up around 1988, and that’s what enabled us to do computer vision. At the time it was [limited to] black and white images; very few applications were practical because of the lack of data and compute power.

Character recognition was certainly a success. Those systems were deployed commercially by AT&T around 1994 and continued in use until the early 2000s.

RR: Are you trying to make Facebook AI more like Bell Labs in terms of your research?

Yann LeCun: It is very much like Bell Labs in many ways. When I started Facebook AI Research, that was what I was asked to do. I drew on my experience from Bell Labs and AT&T Labs. I also worked briefly at NEC Labs and a couple of other companies, and I had many friends in various industrial research labs, so I knew how research in industry can be successful. I picked some of the best ideas from Bell Labs, Xerox PARC, IBM, Microsoft Research, etc.

[The idea was to] create a research organization that is ambitious in its goals and [thus] performs long-term research, where people are not directed top-down. The research done at Facebook AI Research is driven bottom-up: the researchers pick their own topics, but at the same time we established channels with the development groups so that whatever innovation comes out can have an impact on the products. That’s been very successful, actually.

RR: Looking at generative adversarial networks, what’s your feeling on them?

Yann LeCun: I think it opened the eyes of a lot of people, including me, to new ways of building neural network learning systems that capture the structure of data.

RR: And how do they differ from a Bayesian neural net?

Yann LeCun: They’re not Bayesian at all. They’re not even probabilistic, actually. They claim to be, but they’re not. What a GAN does is turn a bunch of random numbers into a structured object, [such as] an image.

A simple Bayesian network with conditional probability tables

You train a neural net so that when you draw a random set of numbers, say 100 random numbers from a Gaussian distribution, and run it through this neural net, out the other end comes an image of a face or a dog, for example. When you change those random numbers slightly, what you get is a slightly different face or a slightly different dog. That’s the idea of a generative model.
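A toy sketch of that mapping (an untrained, made-up two-layer generator, just to show the shape of the idea; in a real GAN these weights would be trained adversarially against a discriminator): 100 Gaussian numbers go in, an image-shaped array comes out, and a slightly different input yields a slightly different output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrained toy "generator": maps 100 random numbers to a 28x28 image.
# (The weights are random here; adversarial training against a
# discriminator is what makes a real GAN produce realistic images.)
W1 = rng.normal(scale=0.1, size=(100, 256))
W2 = rng.normal(scale=0.1, size=(256, 28 * 28))

def generate(z):
    h = np.tanh(z @ W1)                  # hidden layer
    return np.tanh(h @ W2).reshape(28, 28)

z = rng.normal(size=100)                 # 100 Gaussian random numbers
img = generate(z)
img2 = generate(z + 0.01 * rng.normal(size=100))  # nearby z, similar image
print(np.abs(img - img2).mean())         # small change in the output
```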

Art created by a GAN.

Some generative models are probabilistic, and some are not. GANs are not really probabilistic in many ways: they don’t model the probability density of the output. But they are a really interesting set of methods, and I was really excited about them for a while. I’m less excited about them now.

I’m trying to replace them with something more efficient, because they have some flaws, which get a little technical. I’m trying to find a somewhat general method to train machines to learn the structure of data without being trained for a particular task or a particular way of doing things.

RR: What do you think about training a single model on multiple problems at once: one neural network that can do both face recognition and speech recognition, say? Would that be more efficient? What do you think about the future?

Yann LeCun: There is this big question: how do you make learning machines more general than they are? You train a system to do image recognition on a particular data set, then you show it images that are slightly different, from a slightly different context, and those systems are somewhat brittle. Sometimes they focus on biases in the data that are really irrelevant.


For example, you train a system on the standard data set that everybody uses, called ImageNet. One of the categories is cow, and all the cow pictures show cows in a field. Now you show the system a picture of a cow on a beach, and the system doesn’t say it’s a cow, because every single cow it has ever seen was on a green background. So, it’s using the context to make the recognition.

In this case, the context is different; to some extent, the system has not completely figured out what the concept of a cow is. So, that’s the question: how do you make those systems less specialized and less brittle? What you’re describing is called multitask learning: you train the system not just on a single task but on lots of different tasks, or on some task so general that it may not be the task you’re interested in at the end, but it’s much more general.

Let me give you an example. A few years ago, the best computer vision system that Facebook was using was a big convolutional net trained to predict the hashtags that people type on Instagram when they upload a photo. The engineers selected something like 17,000 different hashtags and then trained a giant neural net to predict which of those 17,000 hashtags would be present in any particular picture. It’s not a particularly interesting task in itself, but that neural net came out so well trained that it could be adapted to recognize just about anything afterward.

What you do afterward is chop off the last layer and stick on a new last layer, which you train for the task you want. Because the bulk of the neural net has already learned how to recognize images, you now don’t need many samples to recognize a cow, a truck, or a table. That’s the idea of weakly supervised learning, or transfer learning: you train on a very general task, and then you specialize the system on a new task. You can do multitask learning, and you can do transfer.
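That “chop off the last layer” recipe is easy to show in code. Below is a generic sketch using PyTorch and torchvision, with an ImageNet-pretrained ResNet-50 and stand-in random data; this is the standard recipe, not the actual Facebook hashtag system.

```python
import torch
import torch.nn as nn
from torchvision import models

# Generic transfer learning: take a net pretrained on a big, general
# task, chop off its last layer, and train a new head for your task.
backbone = models.resnet50(weights="IMAGENET1K_V2")
for p in backbone.parameters():
    p.requires_grad = False                       # freeze pretrained features

backbone.fc = nn.Linear(backbone.fc.in_features, 10)   # new last layer

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a stand-in batch of images and labels:
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = loss_fn(backbone(images), labels)
loss.backward()
optimizer.step()
```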

But ultimately, what people are working on now in computer vision, and have been working on for a while in natural language processing, is self-supervised learning. You train the system not to recognize anything in particular but to represent the data in some good way, and then you use this pre-trained system as input to a system that does the task you want: recognizing objects or, in the case of text, translation. That has been astonishingly successful in natural language processing; so far not so much in vision, but it’s making very fast progress.
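In natural language processing, the dominant self-supervised recipe is masked prediction: hide part of the input and train the model to fill it in from the surrounding context. A tiny sketch of how such a training pair is built (toy token ids of my own invention; a real system trains a large network on millions of such pairs):

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = np.array([3, 14, 7, 2, 9, 7, 3, 14])    # toy token ids

# Self-supervised objective: hide one token and ask the model to
# predict it from the surrounding context. The target comes from the
# data itself, so no human labeling is required.
i = int(rng.integers(len(tokens)))
target = int(tokens[i])
masked = tokens.copy()
masked[i] = -1                                   # mask marker

print("input :", masked)                         # what the model sees
print("target:", target)                         # what it must predict
```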

RR: What are you most excited about in the world of AI right now? 

Yann LeCun: This whole idea of self-supervised learning, really. The big question we need to solve shows up when you observe someone like a young child learning to manipulate objects or learning new concepts. You show a young child a few pictures of an elephant, and it doesn’t even need to be a photo, it can be a drawing; this young child will know what an elephant is, regardless of pose and everything. When you are 16 or 17, you learn to drive a car in about 20 hours of training, maybe not perfectly, but pretty well. And almost no one has [explicitly] told you: here is how you drive a car.

You basically learn more or less by yourself. Now, if you were to use today’s machine learning systems, supervised or reinforcement learning, to learn to manipulate objects, to recognize objects, or to drive a car, it would take millions of trials, millions or billions of examples, and millions of hours of practice, causing many accidents. In the case of the self-driving car, this is one of the main obstacles to completely autonomous driving.

RR: And so, I guess you’d agree that Elon Musk’s and Tesla’s lead is quite significant, then?

Yann LeCun: Elon Musk & Tesla are not very much ahead of everyone else. 

RR: Do you think it’s two years ahead, a year ahead? Would you put a number on it?

Yann LeCun: Nobody is ahead of anybody else by more than a few months, in terms of concepts, algorithms, et cetera. Some companies have invested in hardware for a long time; if the hardware turns out to be really important, it’s very hard for other companies to catch up, but eventually they do.

Tesla Autopilot is classified as an SAE Level 2 system. Source: Ian Maddox

In AI research and development, nobody is ahead of anybody by much.

RR: Many of the biggest fund managers own Tesla stock because they believe Tesla will have five to ten years of a monopoly on driverless technology, and that’s why they think a higher than average P/E ratio is justified.

Yann LeCun: That’s probably wrong.  

RR: Please elaborate!  

Yann LeCun: What they may have five or ten years of a lead in is battery technology, because they have the big factories that nobody else is building, and it takes many years to build them. But [in terms of] AI, I doubt it. I think a lot of other companies have similar technology.

They may not have the same amount of data, and they may not have the same kind of specialized hardware, but they can buy that from Nvidia and Arm now.

RR: What is your feeling on the future of robotics?

Yann LeCun: There are many issues in robotics. I think one of them is this issue of supervision I was telling you about. Self-driving cars are some sort of robot, but what we’d like are virtual assistants you can talk to that can answer any question you want, or household robots that can assist you in your daily life. For this, we need to make the next big leap in AI, which, in my opinion, will come from finding ways to get machines to learn as efficiently as humans and animals.

As I was saying, it takes too many samples and too many trials for current paradigms of machine learning to learn what humans and animals learn in a few hours. What is it that we’re missing? We’re missing a big piece; in my opinion, it’s self-supervised learning, but we’ll see.

RR: This was an amazing conversation, Professor. You have a fantastic wealth of knowledge.

Yann LeCun: Thank you, you too. You’re most welcome. It was fun.

Edited by Avhan Misra