dpgm

CMPUT 653 Deep policy gradient methods

This project is maintained by armahmood

Syllabus

Term:

Fall, 2020

Lecture Date and Time:

MW 2:00 - 3:20 p.m.

Lecture Location:

Zoom link

Instructor:

Rupam Mahmood (armahmood@ualberta.ca)

Office Hours:

MW 3:30 - 4:00 p.m. Office hours can be booked on my website.

Overview

Once the input-output interface of a robot is determined, can we simply deploy a general-purpose system to control the robot without extensive hand-engineering? Neural networks with parameters learned by policy gradient methods are a candidate for such systems and have already been shown to learn to control real robots from scratch.

In this course, we learn the foundations of policy gradient methods and some of the fundamental differences between standard policy gradient methods, such as actor-critic, and those that combine well with neural networks and have achieved practical success, such as Proximal Policy Optimization. We discuss a number of recent papers on policy gradient methods and conclude the course with a project, guided by the instructor, aimed at developing a mini-research contribution. Throughout the course, there will be a focus on computational frugality and compatibility with real-time updates.

Objectives

Prerequisites

This course requires knowledge of basic probability theory, linear algebra, and introductory reinforcement learning, as well as experience programming deep neural networks using PyTorch in Python 3.

Course Topics

Course Work and Evaluation

Course Materials

All course reading material will be available online. We will be using the following textbook extensively: Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press. The book is available from the bookstore or online as a PDF here: http://www.incompleteideas.net/book/the-book-2nd.html

Academic Integrity

All assignments, written and programming, are to be done individually. No exceptions. Students must write their own answers and code. Students are permitted and encouraged to discuss assignment problems and the contents of the course. However, the discussion should always be about high-level ideas. Students should not discuss with each other (or tutors) while writing answers to written questions or programming. Absolutely no sharing of answers or code with other students or tutors. All sources used in solving a problem must be acknowledged, e.g. web sites, books, research papers, personal communication with people, etc.

The University of Alberta is committed to the highest standards of academic integrity and honesty. Students are expected to be familiar with these standards regarding academic honesty and to uphold the policies of the University in this respect. Students are particularly urged to familiarize themselves with the provisions of the Code of Student Behaviour and avoid any behaviour which could potentially result in suspicions of cheating, plagiarism, misrepresentation of facts and/or participation in an offence. Academic dishonesty is a serious offence and can result in suspension or expulsion from the University. (GFC 29 SEP 2003)