CMPUT 652: Reinforcement Learning with Robots (Fall 2019)


In this graduate course, students learn to develop control methods, which they evaluate in worlds of their own creation, by studying the fundamentals of MDPs, iterative methods, stochastic approximation methods, and policy gradient methods. They then apply the methods they develop to learn to control physical robots. En route, they build tools and understanding for using physical robots nearly as easily as simulated ones.

[Task environments: DXL-Tracker, UR-Reacher-6, Create-Docker]


[Schedule] (Contains slides and course materials)

Syllabus

Overview

In this new course, we will study how reinforcement learning (RL) algorithms can be used to learn to control physical robots in real time. One of the main goals of RL agents is to learn to solve a given task by interacting with an unknown, unstructured environment. Due to its simple and intuitive foundations and a series of breakthroughs in computer and board games, RL has become one of the most appealing branches of artificial intelligence. Recent RL methods have also shown excellent results in controlling robots in the virtual world. RL is now viewed as a promising approach for controlling physical robots and adapting them to unstructured environments. To what extent are current methods capable of achieving this goal?
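The agent-environment interaction described above can be sketched in a few lines of Python. The following is a minimal, illustrative example only: the two-armed bandit environment, its payoff probabilities, and the incremental epsilon-greedy learning rule are all assumptions for illustration, not material taken from the course.

```python
import random

random.seed(0)

# Hypothetical environment: a two-armed bandit where arm 1 pays off
# more often than arm 0 (payoff probabilities are illustrative).
def pull(arm):
    return 1.0 if random.random() < (0.2 if arm == 0 else 0.8) else 0.0

q = [0.0, 0.0]            # action-value estimates for the two arms
alpha, epsilon = 0.1, 0.1  # step size and exploration rate

for step in range(2000):
    # Epsilon-greedy action selection: mostly greedy, occasionally random.
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = max((0, 1), key=lambda a: q[a])
    reward = pull(arm)
    # Incremental update toward the observed reward.
    q[arm] += alpha * (reward - q[arm])

print(q)  # the estimate for arm 1 should approach its payoff rate of 0.8
```

Even this toy loop exhibits the core structure the course studies at scale: the agent acts, the environment returns a reward, and the agent updates its estimates from experience alone.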

We will study the foundations of RL so that we can develop policy learning methods, and we will learn systematic ways of studying a real-time system to reveal the uncertainties involved in real-world tasks. This investigation will allow us to understand the differences between real-world and standard simulated tasks, so that we can adapt task setups and algorithmic implementations to the real world and enhance simulated tasks to incorporate the additional challenges of real-time systems. En route, we will learn about other promising approaches to learning in robotics that are not performed in real time, such as learning from demonstration and simulation-to-reality transfer.

A background in machine learning, programming, or robotics is essential for this course.

Objectives

Prerequisites

This course requires knowledge of basic probability theory and the ability to program in Python 3 using deep learning frameworks such as PyTorch.

Course Materials

The book by Sutton and Barto (2018) will serve as the main textbook. In addition, we will rely on scientific papers, which will be provided in the course schedule.

Course Topics

Course work and evaluation