A. Rupam Mahmood

Assistant Professor
Canada CIFAR AI Chair
Director of RLAI; Fellow at Amii
Department of Computing Science
University of Alberta
Email: armahmood@ualberta.ca
Google Scholar Profile


Research

I develop reinforcement learning algorithms and real-time learning systems for controlling physical robots. My research objective is to develop general and constructive mechanisms for continually improving robot minds. Currently, I am working on two long-term programs, each consisting of several short-term projects.

A Simple and General Reinforcement Learning System for Robot Control

In this program, we develop an RL system that can be easily deployed on many different robots for solving various tasks. Our system complements current robot learning systems based on learning from simulations or human demonstrations by training solely with real-time interactions under self-propelled behavior. The system is general in the sense that the same system is expected to learn to control a variety of robots. At the same time, we develop the system to be simple and accessible, to enable community-wide scientific understanding and large-scale industrial adoption.
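To make the deployment pattern concrete, here is a minimal sketch: each robot sits behind the same small environment interface, so a single training loop can be reused across platforms. The RobotEnv and RandomAgent names, the interface, and the simulated dynamics are illustrative assumptions, not the actual system.

import random

class RobotEnv:
    """Illustrative interface a robot wrapper might implement."""
    def reset(self):
        return [0.0]

    def step(self, action):
        # Apply the action on the hardware for one fixed-length cycle,
        # then return (observation, reward, done). Simulated here.
        return [random.random()], random.random(), False

class RandomAgent:
    """Placeholder agent; a learning agent would share this interface."""
    def act(self, obs):
        return random.uniform(-1.0, 1.0)

    def update(self, obs, action, reward, next_obs):
        pass  # a real agent would perform its learning update here

def train(env, agent, steps=1000):
    obs = env.reset()
    for _ in range(steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)    # real-time interaction
        agent.update(obs, action, reward, next_obs)  # learn from every step
        obs = env.reset() if done else next_obs

train(RobotEnv(), RandomAgent())

The design intent is that only the environment wrapper changes from robot to robot; the agent and the training loop stay fixed.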

Core Constructive Mechanisms for Continually Learning Agents

In this program, we develop and analyze algorithms for learning policies and representations in a continual learning setup, where the agent is expected to go through a series of changes in its environment and tasks. The agent therefore has a vested interest not only in learning the specific task of the moment but also in acquiring and accumulating knowledge and skills that are generally pertinent to its environment and likely applicable to an unseen future task in the same environment. We analyze the shortcomings that current deep-learning-based methods for policy and representation learning exhibit in this setting and address them by developing new methods for self-propelled behavior, off-policy policy gradient updates, and curation of useful state features.
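As one concrete instance of feature curation, the sketch below follows the generate-and-test idea from the 2013 AAAI workshop paper listed under Publications: random features are generated, their usefulness is tracked, and the least useful are periodically replaced. The feature type, the utility measure, and the supervised target here are illustrative assumptions, not the published algorithm.

import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_features = 10, 50

def generate_feature():
    """A random linear-threshold unit over the raw inputs (illustrative)."""
    return rng.standard_normal(n_inputs), rng.standard_normal()

features = [generate_feature() for _ in range(n_features)]
w = np.zeros(n_features)      # output weights, adapted by LMS
util = np.zeros(n_features)   # running estimate of each feature's usefulness

target_w = rng.standard_normal(n_inputs)  # hypothetical target function

for t in range(1, 50_001):
    x = rng.standard_normal(n_inputs)
    target = float(target_w @ x)                        # supervised target
    f = np.array([float(v @ x + b > 0.0) for v, b in features])
    delta = target - w @ f                              # prediction error
    w += 0.1 * delta * f / max(1.0, f.sum())            # LMS update
    util = 0.999 * util + 0.001 * np.abs(w) * f         # trace of contribution
    if t % 1000 == 0:                                   # cull and regenerate
        for i in np.argsort(util)[: n_features // 20]:
            features[i] = generate_feature()
            w[i] = 0.0
            util[i] = util.mean()                       # grace period for newcomers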


Publications

Korenkevych, D., Mahmood, A. R., Vasan, G., Bergstra, J. (2019). Autoregressive policies for continuous control deep reinforcement learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. Companion video and source code.

Mahmood, A. R., Korenkevych, D., Vasan, G., Ma, W., Bergstra, J. (2018). Benchmarking reinforcement learning algorithms on real-world robots. In Proceedings of the 2nd Annual Conference on Robot Learning (CoRL). arXiv. Companion video and source code.

Mahmood, A. R., Korenkevych, D., Komer, B. J., Bergstra, J. (2018). Setting up a reinforcement learning task with a real-world robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Companion video and source code.

Yu, H., Mahmood, A. R., Sutton, R. S. (2018). On generalized Bellman equations and temporal-difference learning. Journal of Machine Learning Research (JMLR) 19(48):1-49.

Mahmood, A. R. (2017). Incremental Off-policy Reinforcement Learning Algorithms. PhD thesis, Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8.

Yu, H., Mahmood, A. R., Sutton, R. S. (2017). On generalized Bellman equations and temporal-difference learning. In Proceedings of the 30th Canadian Conference on Artificial Intelligence (CAI), Edmonton, Canada. arXiv.

Mahmood, A. R., Yu, H., Sutton, R. S. (2017). Multi-step off-policy learning without importance sampling ratios. arXiv:1702.03006.

Sutton, R. S., Mahmood, A. R., White, M. (2016). An emphatic approach to the problem of off-policy temporal-difference learning. Journal of Machine Learning Research (JMLR) 17(73):1-29.

van Seijen, H., Mahmood, A. R., Pilarski, P. M., Machado, M. C., Sutton, R. S. (2016). True online temporal-difference learning. Journal of Machine Learning Research (JMLR) 17(1):5057-5096. Code for random MDP experiments.

Mahmood, A. R., Sutton, R. S. (2015). Off-policy learning based on weighted importance sampling with linear computational complexity. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), Amsterdam, Netherlands. Code for on-policy and off-policy experiments.

Mahmood, A. R., Yu, H., White, M., Sutton, R. S. (2015). Emphatic temporal-difference learning. In the 2015 European Workshop on Reinforcement Learning (EWRL). arXiv:1507.01569.

van Seijen, H., Mahmood, A. R., Pilarski, P. M., Sutton, R. S. (2015). An empirical evaluation of true online TD(λ). In the 2015 European Workshop on Reinforcement Learning (EWRL). arXiv:1507.00353. Code for random MDP experiments.

Mahmood, A. R., van Hasselt, H., Sutton, R. S. (2014). Weighted importance sampling for off-policy learning with linear function approximation. Advances in Neural Information Processing Systems (NeurIPS) 27, Montreal, Canada. Pseudo-code and Code for experiments.

van Hasselt, H., Mahmood, A. R., Sutton, R. S. (2014). Off-policy TD(λ) with a true online equivalence. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), Quebec City, Canada.

Sutton, R. S., Mahmood, A. R., Precup, D., van Hasselt, H. (2014). A new Q(λ) with interim forward view and Monte Carlo equivalence. In Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China.

Mahmood, A. R., Sutton, R. S. (2013). Representation search through generate and test. In Proceedings of the AAAI Workshop on Learning Rich Representations from Low-Level Sensors, Bellevue, Washington, USA.

Mahmood, A. R., Sutton, R. S., Degris, T., Pilarski, P. M. (2012). Tuning-free step-size adaptation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan. Code.

Mahmood, A. R. (2011). Structure learning of causal Bayesian networks: A survey. Technical report TR11-01, Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada.

Mahmood, A. R. (2010). Automatic Step-size Adaptation in Incremental Supervised Learning. Master’s thesis, Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8.


Software

A computational framework and a benchmark task suite for developing and evaluating reinforcement learning methods with physical robots.

A platform written in Python for running model-free policy-evaluation experiments on randomly generated Markov Decision Processes.
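Below is a minimal sketch of the kind of experiment such a platform supports: tabular TD(0) policy evaluation on a randomly generated MDP, checked against the closed-form solution. The MDP construction and step size here are illustrative assumptions, not the platform's code.

import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 10, 0.9

# Randomly generated MDP under a fixed policy: transition matrix P and
# expected reward r received on leaving each state.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.standard_normal(n_states)

# Closed-form values for reference: v = (I - gamma * P)^{-1} r.
v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)

# Tabular TD(0) from a single sampled trajectory.
v = np.zeros(n_states)
s = 0
for _ in range(200_000):
    s_next = rng.choice(n_states, p=P[s])
    v[s] += 0.01 * (r[s] + gamma * v[s_next] - v[s])  # TD-error update
    s = s_next

print(np.max(np.abs(v - v_true)))  # should be small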

CSP3: an RL platform for iRobot Create robots, written in C, that extracts the data streamed by the robot with minimal delay while simultaneously running a reinforcement learning agent on that data.
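CSP3 itself is written in C, but the concurrency pattern it implements can be sketched in a few lines of Python: one thread consumes the robot's sensor stream as fast as it arrives, while the agent loop always acts on the latest observation. read_packet, send_action, and all timing constants below are hypothetical stand-ins, not CSP3's actual interface.

import queue
import random
import threading
import time

def read_packet():
    """Hypothetical stand-in for a blocking serial read from the robot."""
    time.sleep(0.015)                    # the robot streams every ~15 ms
    return [random.random() for _ in range(4)]

def send_action(action):
    """Hypothetical stand-in for writing an actuation command."""
    pass

latest = queue.Queue(maxsize=1)

def sensor_loop():
    # Drain the stream as fast as it arrives, keeping only the freshest
    # packet so the agent never acts on stale data.
    while True:
        packet = read_packet()
        if latest.full():
            try:
                latest.get_nowait()
            except queue.Empty:
                pass
        latest.put(packet)

threading.Thread(target=sensor_loop, daemon=True).start()

for _ in range(100):                     # agent loop with a fixed action cycle
    obs = latest.get()                   # most recent observation
    action = random.choice([-1.0, 1.0])  # placeholder for the learning agent
    send_action(action)
    time.sleep(0.03)                     # e.g., a ~30 ms action cycle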



Family
