Deep Learning Study Notes - C1W2-17 - Andrew Ng's Interview with Pieter Abbeel

Pieter Abbeel Interview: Andrew Ng Interviews Pieter Abbeel

[Andrew] So, thanks a lot, Pieter, for joining me today. I think a lot of people know you as a well-known researcher in machine learning, deep learning, and robotics. I'd like to have people hear a bit about your story. How did you end up doing the work that you do?

[Pieter] That's a good question, and actually, if you had asked me as a 14-year-old what I was aspiring to do, it probably would not have been this. In fact, at the time, I thought being a professional basketball player would be the right way to go. I don't think I was able to achieve it; I feel machine learning lucked out that the basketball thing didn't work out.

[Andrew] Yes, that didn't work out. It was a lot of fun playing basketball, but it didn't work out to make it into a career.

[Pieter] So, what I really liked in school was physics and math. And from there, it seemed pretty natural to study engineering, which is applying physics and math in the real world. And then, after my undergrad in electrical engineering, I actually wasn't so sure what to do because, literally, anything in engineering seemed interesting to me. Understanding how anything works is interesting. Trying to build anything is interesting. And in some sense, artificial intelligence won out because it seemed like it could somehow help all disciplines in some way. And also, it seemed somehow a little more at the core of everything: if you think about how a machine can think, then maybe that's more the core of everything else than picking any specific discipline.

[Andrew] I've been saying AI is the new electricity; it sounds like the 14-year-old version of you had an even earlier version of that. You know, in the past few years you've done a lot of work in deep reinforcement learning. What's happening? Why is deep reinforcement learning suddenly taking off?

[Pieter] Before I worked in deep reinforcement learning, I worked a lot in reinforcement learning; actually with you and Durant at Stanford, of course. And so, we worked on autonomous helicopter flight, and then later at Berkeley with some of my students we worked on getting a robot to learn to fold laundry. What characterized the work was a combination of learning that enabled things that would not be possible without learning, but also a lot of domain expertise in combination with the learning to get it to work. And it was very interesting because the domain expertise was fun to acquire but, at the same time, very time-consuming: for every new application you wanted to succeed at, you needed domain expertise plus machine learning expertise. And for me, the turning point was in 2012, with the ImageNet breakthrough results from Geoff Hinton's group in Toronto: AlexNet showed that supervised learning, all of a sudden, could be done with far less engineering for the domain at hand. There was very little vision-specific engineering in AlexNet. It made me think we really should revisit reinforcement learning from the same kind of viewpoint and see if we could get the deep version of reinforcement learning to work and do equally interesting things as had just happened in supervised learning.

[Andrew] It sounds like you saw the potential of deep reinforcement learning earlier than most people. So now, looking into the future, what do you see next? What are your predictions for the next several years in deep reinforcement learning?

[Pieter] So, I think what's interesting about deep reinforcement learning is that, in some sense, there are many more open questions than in supervised learning. In supervised learning, it's about learning an input-output mapping. In reinforcement learning there is the notion of: where does the data even come from? That's the exploration problem. When you have data, how do you do credit assignment? How do you figure out which actions you took early on got you the reward later? And then, there are issues of safety. When you have a system autonomously collecting data, it's actually rather dangerous in most situations. Imagine a self-driving car company that says, "We're just going to run deep reinforcement learning." It's pretty likely that car would get into a lot of accidents before it does anything useful.
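
(Aside for these notes: the credit assignment problem Pieter describes is commonly tackled by computing discounted returns, so a reward arriving late in an episode is still attributed, with decay, to the earlier actions. Below is a minimal sketch, assuming a simple episodic setting with discount factor gamma; the function name and example values are illustrative, not from the interview.)

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = sum_k gamma**k * rewards[t + k] for every timestep t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk backwards so each step accumulates the discounted future reward,
    # crediting early actions for rewards that only arrive later.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: the only reward comes at the end of the episode, yet every
# earlier action still receives (discounted) credit for it.
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.9))
# [0.729 0.81  0.9   1.   ]
```
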

[Andrew] You would need negative examples of that, right?

[Pieter] You do need some negative examples somehow, yes; and positive ones, hopefully. So, I think there are still a lot of challenges in deep reinforcement learning in terms of...
