CMPUT 653 Real-Time Policy Learning

Syllabus

Term:

Fall 2021

Lecture Date and Time:

MW 11:00 a.m. - 12:20 p.m.

Lecture Location:

NRE 2-127

Instructor:

Rupam Mahmood (armahmood@ualberta.ca)

Overview

Once the input-output interface of a robot is determined, can we just deploy a general-purpose system for controlling the robot without extensive hand-engineering? Agents based on policy gradient methods are a candidate for such systems. However, they behave differently on a real robot than they do in standard simulations. In this course, we learn the foundations of policy gradient methods and focus on characterizing the differences between simulated and real-time policy learning. While discussing recent papers on policy gradient methods, we will scrutinize them in light of computational frugality and compatibility with real-time updates.

Objectives

Derive policy gradient methods
Implement policy gradient methods
Summarize research in policy gradient methods
Characterize real-time policy learning

Prerequisites

This course requires knowledge of basic probability theory, linear algebra, and introductory reinforcement learning, as well as experience programming deep neural networks using PyTorch in Python 3.

Course Topics

Objectives, estimation, gradient bandit & function approximation
Markov decision processes, value functions & dynamic programming
Temporal difference learning
Off-policy learning
REINFORCE, online and batch actor-critic methods
Lambda returns, advantage estimation and PPO
Policy gradient methods for continuous actions

Course Work and Evaluation

Written assignment on basics of machine learning 10%
Programming assignment on gradient bandit 10%
Midterm 20%
Programming assignment on policy gradient methods 15%
Programming assignment on deep policy gradient methods 20%
Course participation (forums, wiki, slack & class) 5%
Course participation (mini-presentation) 5%
Paper summary 15%

Course Materials

Deep Policy Gradient Methods is a similar course that was given in Fall 2020. All course reading material will be available online. We will be using the following textbook extensively: Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press. The book is available from the bookstore or online as a PDF at http://www.incompleteideas.net/book/the-book-2nd.html

Academic Integrity

All assignments, written and programming, are to be done individually. No exceptions. Students must write their own answers and code. Students are permitted and encouraged to discuss assignment problems and the contents of the course. However, the discussion should always be about high-level ideas. Students should not consult each other (or tutors) while writing answers to written questions or programming. Absolutely no sharing of answers or code with other students or tutors. All sources used in solving a problem must be acknowledged, e.g., websites, books, research papers, and personal communication with people. The University of Alberta is committed to the highest standards of academic integrity and honesty. Students are expected to be familiar with these standards regarding academic honesty and to uphold the policies of the University in this respect. Students are particularly urged to familiarize themselves with the provisions of the Code of Student Behaviour and avoid any behaviour which could potentially result in suspicions of cheating, plagiarism, misrepresentation of facts and/or participation in an offence. Academic dishonesty is a serious offence and can result in suspension or expulsion from the University. (GFC 29 SEP 2003)

Fall 2021 Start of Term health and safety guide for students
Complimentary masks are available at the Student Services Center on the main floor of CCIS.