A. Rupam Mahmood

Assistant Professor
Principal Investigator of RLAI and Amii
Department of Computing Science
University of Alberta
Email: armahmood@ualberta.ca
Google Scholar Profile


About Me

I develop reinforcement learning algorithms and real-time learning systems for controlling physical robots. My research objective is to develop a computational and scientific understanding of general-purpose, mind-like systems for robots.


Teaching

I am teaching a course on reinforcement learning with robots. In this course, graduate students study the fundamentals of MDPs, iterative methods, stochastic approximation methods, and policy gradient methods, and use them to develop control methods that they evaluate in worlds of their own creation. They then apply these methods to learn to control physical robots. En route, they build the tools and understanding needed to use physical robots nearly as easily as simulated ones.
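As a flavor of the kind of method covered, here is a minimal sketch of tabular REINFORCE, a classic policy gradient method, applied to a hypothetical five-state corridor MDP. The environment, hyperparameters, and names below are illustrative assumptions, not course material.

# Tabular REINFORCE on a hypothetical five-state corridor: the agent
# starts at state 0 and is rewarded only upon reaching state 4.
import numpy as np

n_states, n_actions = 5, 2                    # actions: 0 = left, 1 = right
theta = np.zeros((n_states, n_actions))       # softmax policy parameters
alpha, gamma = 0.1, 0.99

def policy(s):
    prefs = theta[s] - theta[s].max()         # stabilized softmax
    probs = np.exp(prefs)
    return probs / probs.sum()

def episode(rng, max_steps=100):
    s, traj = 0, []
    for _ in range(max_steps):
        a = rng.choice(n_actions, p=policy(s))
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        traj.append((s, a, r))
        if s_next == n_states - 1:
            break
        s = s_next
    return traj

rng = np.random.default_rng(0)
for _ in range(500):
    G = 0.0
    for s, a, r in reversed(episode(rng)):    # returns, computed backward
        G = r + gamma * G
        grad = -policy(s)                     # d log pi(a|s) / d theta[s, :]
        grad[a] += 1.0
        # Exact REINFORCE multiplies by gamma**t; omitted here for simplicity.
        theta[s] += alpha * G * grad

print("P(right) per state:", np.round([policy(s)[1] for s in range(n_states)], 2))

After training, the probability of choosing the right action approaches one in the states along the path to the goal.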


Publications

Korenkevych, D., Mahmood, A. R., Vasan, G., Bergstra, J. (2019). Autoregressive policies for continuous control deep reinforcement learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. Companion video and source code.

Mahmood, A. R., Korenkevych, D., Vasan, G., Ma, W., Bergstra, J. (2018). Benchmarking reinforcement learning algorithms on real-world robots. In Proceedings of the 2nd Annual Conference on Robot Learning (CoRL). arXiv. Companion video and source code.

Mahmood, A. R., Korenkevych, D., Komer, B. J., Bergstra, J. (2018). Setting up a reinforcement learning task with a real-world robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Companion video and source code.

Yu, H., Mahmood, A. R., Sutton, R. S. (2018). On generalized Bellman equations and temporal-difference learning. Journal of Machine Learning Research (JMLR) 19(48):1-49.

Mahmood, A. R. (2017). Incremental Off-policy Reinforcement Learning Algorithms. PhD thesis, Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8.

Yu, H., Mahmood, A. R., Sutton, R. S. (2017). On generalized Bellman equations and temporal-difference learning. In Proceedings of the 30th Canadian Conference on Artificial Intelligence (CAI), Edmonton, Canada. arXiv.

Mahmood, A. R., Yu, H., Sutton, R. S. (2017). Multi-step off-policy learning without importance sampling ratios. arXiv:1702.03006.

Sutton, R. S., Mahmood, A. R., White, M. (2016). An emphatic approach to the problem of off-policy temporal-difference learning. Journal of Machine Learning Research (JMLR) 17(73):1-29.

van Seijen, H., Mahmood, A. R., Pilarski, P. M., Machado, M. C., Sutton, R. S. (2016). True online temporal-difference learning. Journal of Machine Learning Research (JMLR) 17(1):5057-5096. Code for random MDP experiments.

Mahmood, A. R., Sutton, R. S. (2015). Off-policy learning based on weighted importance sampling with linear computational complexity. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), Amsterdam, Netherlands. Code for on-policy and off-policy experiments.

Mahmood, A. R., Yu, H., White, M., Sutton, R. S. (2015). Emphatic temporal-difference learning. In the 2015 European Workshop on Reinforcement Learning (EWRL). arXiv:1507.01569.

van Seijen, H., Mahmood, A. R., Pilarski, P. M., Sutton, R. S. (2015). An empirical evaluation of true online TD(λ). In the 2015 European Workshop on Reinforcement Learning (EWRL). arXiv:1507.00353. Code for random MDP experiments.

Mahmood, A. R., van Hasselt, H., Sutton, R. S. (2014). Weighted importance sampling for off-policy learning with linear function approximation. Advances in Neural Information Processing Systems (NeurIPS) 27, Montreal, Canada. Pseudo-code and code for experiments.

van Hasselt, H., Mahmood, A. R., Sutton, R. S. (2014). Off-policy TD(λ) with a true online equivalence. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), Quebec City, Canada.

Sutton, R. S., Mahmood, A. R., Precup, D., van Hasselt, H. (2014). A new Q(λ) with interim forward view and Monte Carlo equivalence. In Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China.

Mahmood, A. R., Sutton, R. S. (2013). Representation search through generate and test. In Proceedings of the AAAI Workshop on Learning Rich Representations from Low-Level Sensors, Bellevue, Washington, USA.

Mahmood, A. R., Sutton, R. S., Degris, T., Pilarski, P. M. (2012). Tuning-free step-size adaptation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan. Code.

Mahmood, A. R. (2011). Structure learning of causal Bayesian networks: A survey. Technical report TR11-01, Department of Computing Science, University of Alberta, Edmonton, AB, Canada T6G 2E8.

Mahmood, A. R. (2010). Automatic Step-size Adaptation in Incremental Supervised Learning. Master’s thesis, Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8.


Software

A computational framework and a benchmark task suite for developing and evaluating reinforcement learning methods with physical robots.

A platform written in Python for running model-free policy-evaluation experiments on randomly generated Markov Decision Processes (a minimal sketch of such an experiment appears after this list).

CSP3: an RL platform for iRobot Create robots, written in C, that extracts data streamed by the robot with minimal delay while simultaneously running a reinforcement learning agent on that data.
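
As referenced above, here is a minimal sketch of a model-free policy-evaluation experiment on a randomly generated MDP. The sizes, distributions, and names are illustrative assumptions and do not reflect the platform's actual interface.

# Tabular TD(0) policy evaluation on a randomly generated MDP, with the
# exact value function computed for comparison.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 10, 2, 0.9

# Random transitions P[s, a] (each row a distribution) and rewards R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))
pi = np.full((n_states, n_actions), 1.0 / n_actions)   # uniform policy

# Exact value of pi: v = (I - gamma * P_pi)^-1 r_pi.
P_pi = np.einsum('sa,sap->sp', pi, P)
r_pi = np.einsum('sa,sa->s', pi, R)
v_exact = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

# Model-free estimate: follow pi and apply the TD(0) update at each step.
v, alpha, s = np.zeros(n_states), 0.05, 0
for _ in range(100_000):
    a = rng.choice(n_actions, p=pi[s])
    s_next = rng.choice(n_states, p=P[s, a])
    v[s] += alpha * (R[s, a] + gamma * v[s_next] - v[s])
    s = s_next

print("RMSE of TD(0) estimate:", np.sqrt(np.mean((v - v_exact) ** 2)))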



A tutorial on probabilities and expectations


@rupammahmood