Google Scholar

HQP stands for supervised Highly Qualified Personnel; each entry lists the HQP among its authors. A ⋆ marks authors who contributed equally.

Pre-print

Dohare, S., Hernandez-Garcia, J. F., Rahman, P., Sutton, R. S., & Mahmood, A. R. (2023). Loss of plasticity in deep continual learning. arXiv preprint arXiv:2306.13812.
    HQP: Shibhansh Dohare

Publications

Elsayed, M., & Mahmood, A. R. (2024). Addressing loss of plasticity and catastrophic forgetting in continual learning. In Proceedings of the 12th International Conference on Learning Representations (ICLR).
    HQP: Mohamed Elsayed

Ishfaq, H.⋆, Lan, Q.⋆, Xu, P., Mahmood, A. R., Precup, D., Anandkumar, A., & Azizzadenesheli, K. (2024). Provable and practical: efficient exploration in reinforcement learning via Langevin Monte Carlo. In Proceedings of the 12th International Conference on Learning Representations (ICLR).
    HQP: Qingfeng Lan

Grooten, B., Tomilin, T., Vasan, G., Taylor, M. E., Mahmood, A. R., Fang, M., Pechenizkiy, M., & Mocanu, D. C. (2024). MaDi: learning to mask distractions for generalization in visual deep reinforcement learning. In Proceedings of the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).
    HQP: Gautham Vasan

Che, F., Vasan, G., & Mahmood, A. R. (2023). Correcting discount-factor mismatch in on-policy policy gradient methods. In Proceedings of the 40th International Conference on Machine Learning (ICML).
    HQP: Fengdi Che, Gautham Vasan

He, J., Che, F., Wan, Y., & Mahmood, A. R. (2023). Loosely consistent emphatic temporal-difference learning. In Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI).
    HQP: Jiamin He, Fengdi Che

Karimi, A., Jin, J., Luo, J., Mahmood, A. R., Jagersand, M., & Tosatto, S. (2023). Variable-decision frequency option critic. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
    HQP: Amirmohammad Karimi, Samuele Tosatto

Lan, Q., Pan, Y., Luo, J., & Mahmood, A. R. (2023). Memory-efficient reinforcement learning with value-based knowledge consolidation. Transactions on Machine Learning Research (TMLR).
    HQP: Qingfeng Lan

Farrahi, H., & Mahmood, A. R. (2023). Reducing the cost of cycle-time tuning for real-world policy optimization. In Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN).
    HQP: Homayoon Farrahi

Wang, Y.⋆, Vasan, G.⋆, & Mahmood, A. R. (2023). Real-time reinforcement learning for vision-based robotics utilizing local and remote computers. In Proceedings of the 2023 International Conference on Robotics and Automation (ICRA).
    HQP: Yan Wang, Gautham Vasan

Chan, A., Silva, H., Lim, S., Kozuno, T., Mahmood, A. R., & White, M. (2022). Greedification operators for policy optimization: investigating forward and reverse KL divergences. Journal of Machine Learning Research (JMLR).

Tosatto, S., Patterson, A., White, M., & Mahmood, A. R. (2022). A temporal-difference approach to policy gradient estimation. In Proceedings of the 39th International Conference on Machine Learning (ICML).
    HQP: Samuele Tosatto

Yuan, Y., & Mahmood, A. R. (2022). Asynchronous reinforcement learning for real-time control of physical robots. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA).
    HQP: Yufeng Yuan

Lan, Q., Tosatto, S., Farrahi, H., & Mahmood, A. R. (2022). Model-free policy learning with reward gradients. In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS).
    HQP: Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi

Garg, S., Tosatto, S., Pan, Y., White, M., & Mahmood, A. R. (2022). An alternate policy gradient estimator for softmax policies. In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS).
(Winner of the CAIAC Best MSc Thesis Award)
    HQP: Shivam Garg, Samuele Tosatto

Przystupa, M., Dehghan, M., Jagersand, M., & Mahmood, A. R. (2021). Analyzing neural Jacobian methods in applications of visual servoing and kinematic control. In Proceedings of the 2021 International Conference on Robotics and Automation (ICRA).

Limoyo, O., Chan, B., Marić, F., Wagstaff, B., Mahmood, A. R., & Kelly, J. (2020). Heteroscedastic uncertainty for robust generative latent dynamics. IEEE Robotics and Automation Letters (RA-L), 5(4), 6654-6661.

Korenkevych, D., Mahmood, A. R., Vasan, G., & Bergstra, J. (2019). Autoregressive policies for continuous control deep reinforcement learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI). Companion video and source code.

Mahmood, A. R., Korenkevych, D., Vasan, G., Ma, W., & Bergstra, J. (2018). Benchmarking reinforcement learning algorithms on real-world robots. In Proceedings of the 2nd Annual Conference on Robot Learning (CoRL). arXiv. Companion video and source code.

Mahmood, A. R., Korenkevych, D., Komer, B. J., & Bergstra, J. (2018). Setting up a reinforcement learning task with a real-world robot. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Companion video and source code.

Yu, H., Mahmood, A. R., & Sutton, R. S. (2018). On generalized Bellman equations and temporal-difference learning. Journal of Machine Learning Research (JMLR) 19(48):1-49.

Mahmood, A. R. (2017). Incremental Off-policy Reinforcement Learning Algorithms. PhD thesis, Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8.

Yu, H., Mahmood, A. R., & Sutton, R. S. (2017). On generalized Bellman equations and temporal-difference learning. In Proceedings of the 30th Canadian Conference on Artificial Intelligence (CAI), Edmonton, Canada. arXiv.

Mahmood, A. R., Yu, H., & Sutton, R. S. (2017). Multi-step off-policy learning without importance sampling ratios. arXiv:1702.03006.

Sutton, R. S., Mahmood, A. R., & White, M. (2016). An emphatic approach to the problem of off-policy temporal-difference learning. Journal of Machine Learning Research (JMLR) 17(73):1-29.

van Seijen, H., Mahmood, A. R., Pilarski, P. M., Machado, M. C., & Sutton, R. S. (2016). True online temporal-difference learning. Journal of Machine Learning Research (JMLR) 17(1):5057-5096. Code for random MDP experiments.

Mahmood, A. R., & Sutton, R. S. (2015). Off-policy learning based on weighted importance sampling with linear computational complexity. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), Amsterdam, Netherlands. Code for on-policy and off-policy experiments.

Mahmood, A. R., Yu, H., White, M., & Sutton, R. S. (2015). Emphatic temporal-difference learning. In the 2015 European Workshop on Reinforcement Learning (EWRL). arXiv:1507.01569.

van Seijen, H., Mahmood, A. R., Pilarski, P. M., & Sutton, R. S. (2015). An empirical evaluation of true online TD(λ). In the 2015 European Workshop on Reinforcement Learning (EWRL). arXiv:1507.00353. Code for random MDP experiments.

Mahmood, A. R., van Hasselt, H., & Sutton, R. S. (2014). Weighted importance sampling for off-policy learning with linear function approximation. Advances in Neural Information Processing Systems (NeurIPS) 27, Montreal, Canada. Pseudo-code and code for experiments.

van Hasselt, H., Mahmood, A. R., & Sutton, R. S. (2014). Off-policy TD(λ) with a true online equivalence. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), Quebec City, Canada.

Sutton, R. S., Mahmood, A. R., Precup, D., & van Hasselt, H. (2014). A new Q(λ) with interim forward view and Monte Carlo equivalence. In Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China.

Mahmood, A. R., & Sutton, R. S. (2013). Representation search through generate and test. In Proceedings of the AAAI Workshop on Learning Rich Representations from Low-Level Sensors, Bellevue, Washington, USA.

Mahmood, A. R., Sutton, R. S., Degris, T., & Pilarski, P. M. (2012). Tuning-free step-size adaptation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan. Code.

Mahmood, A. R. (2011). Structure learning of causal Bayesian networks: a survey. Technical report TR11-01, Department of Computing Science, University of Alberta, Edmonton, AB, Canada T6G 2E8.

Mahmood, A. R. (2010). Automatic Step-size Adaptation in Incremental Supervised Learning. Master’s thesis, Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8.