Elsayed, M., Vasan, G., & Mahmood, A. R. (2024). Streaming deep reinforcement learning finally works. arXiv preprint arXiv:2410.14606.
      
    HQP: Mohamed Elsayed, Gautham Vasan
      
Vasan, G., Elsayed, M., Azimi, S. A., He, J., Shahriar, F., Bellinger, C., White, M., & Mahmood, A. R. (2024). Deep policy gradient methods without batch updates, target networks, or replay buffers. In The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS).
      
    HQP: Gautham Vasan, Mohamed Elsayed, Seyed Alireza Azimi, Fahim Shahriar
      
Dohare, S., Hernandez-Garcia, J. F., Lan, Q., Rahman, P., Mahmood, A. R., & Sutton, R. S. (2024). Loss of plasticity in deep continual learning. Nature 632(8026), 768–774.
      
    HQP: Shibhansh Dohare, Qingfeng Lan
      
Vasan, G., Wang, Y., Shahriar, F., Bergstra, J., Jagersand, M., & Mahmood, A. R. (2024). Revisiting sparse rewards for goal-reaching reinforcement learning. In Proceedings of Reinforcement Learning Conference (RLC).
      
    HQP: Gautham Vasan, Yan Wang, Fahim Shahriar
      
Elsayed, M., Lan, Q., Lyle, C., & Mahmood, A. R. (2024). Weight clipping for deep continual and reinforcement learning. In Proceedings of Reinforcement Learning Conference (RLC).
      
    HQP: Mohamed Elsayed, Qingfeng Lan
      
Lan, Q., Mahmood, A. R., Yan, S., & Xu, Z. (2024). Learning to optimize for reinforcement learning. In Proceedings of Reinforcement Learning Conference (RLC).
      
    HQP: Qingfeng Lan
      
Ishfaq, H., Tan, Y., Yang, Y., Lan, Q., Lu, J., Mahmood, A. R., Precup, D., & Xu, P. (2024). More efficient randomized exploration for reinforcement learning via approximate sampling. In Proceedings of Reinforcement Learning Conference (RLC).
      
    HQP: Qingfeng Lan
      
Che, F., Xiao, C., Mei, J., Dai, B., Gummadi, R., Ramirez, O. A., Harris, C. K., Mahmood, A. R., & Schuurmans, D. (2024).
      Target networks and over-parameterization stabilize off-policy bootstrapping with function approximation.
      In Proceedings of the 41st International Conference on Machine Learning (ICML). Spotlight.
      
    HQP: Fengdi Che
      
Elsayed, M., Farrahi, H., Dangel, F., & Mahmood, A. R. (2024).
      Revisiting scalable Hessian diagonal approximations for applications in reinforcement learning.
      In Proceedings of the 41st International Conference on Machine Learning (ICML).
      
    HQP: Mohamed Elsayed, Homayoon Farrahi
      
Elsayed, M., & Mahmood, A. R. (2024).
      Addressing loss of plasticity and catastrophic forgetting in continual learning.
      In Proceedings of the 12th International Conference on Learning Representations (ICLR).
      
    HQP: Mohamed Elsayed
      
Ishfaq, H.⋆, Lan, Q.⋆, Xu, P., Mahmood, A. R., Precup, D., Anandkumar, A., & Azizzadenesheli, K. (2024).
      Provable and practical: efficient exploration in reinforcement learning via Langevin Monte Carlo.
      In Proceedings of the 12th International Conference on Learning Representations (ICLR).
      
    HQP: Qingfeng Lan
      
Grooten, B., Tomilin, T., Vasan, G., Taylor, M. E., Mahmood, A. R., Fang, M., Pechenizkiy, M., & Mocanu, D. C. (2024).
      MaDi: learning to mask distractions for generalization in visual deep reinforcement learning.
      In Proceedings of the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).
       
    HQP: Gautham Vasan
      
Che, F., Vasan, G., & Mahmood, A. R. (2023).
      Correcting discount-factor mismatch in on-policy policy gradient methods.
      In Proceedings of the 40th International Conference on Machine Learning (ICML).
       
    HQP: Fengdi Che, Gautham Vasan
      
He, J., Che, F., Wan, Y., & Mahmood, A. R. (2023).
      Loosely consistent emphatic temporal-difference learning.
      In Proceedings of the 39th Conference on Uncertainty in Artificial Intelligence (UAI).
       
    HQP: Jiamin He, Fengdi Che
      
Karimi, A., Jin, J., Luo, J., Mahmood, A. R., Jagersand, M., & Tosatto, S. (2023).
      Variable-decision frequency option critic.
      In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
       
    HQP: Amirmohammad Karimi, Samuele Tosatto
      
Lan, Q., Pan, Y., Luo, J., & Mahmood, A. R. (2023).
      Memory-efficient reinforcement learning with value-based knowledge consolidation.
      Transactions on Machine Learning Research (TMLR).
       
    HQP: Qingfeng Lan 
      
Farrahi, H., & Mahmood, A. R. (2023).
      Reducing the cost of cycle-time tuning for real-world policy optimization.
      In Proceedings of 2023 International Joint Conference on Neural Networks (IJCNN).
       
    HQP: Homayoon Farrahi
      
Wang, Y.⋆, Vasan, G.⋆, & Mahmood, A. R. (2023).
      Real-time reinforcement learning for vision-based robotics utilizing local and remote computers.
      In Proceedings of the 2023 International Conference on Robotics and Automation (ICRA).
       
    HQP: Yan Wang, Gautham Vasan
      
Chan, A., Silva, H., Lim, S., Kozuno, T., Mahmood, A. R., & White, M. (2022). Greedification operators for policy optimization: Investigating forward and reverse KL divergences. Journal of Machine Learning Research (JMLR).
Tosatto, S., Patterson, A., White, M., & Mahmood, A. R. (2022).
      A temporal-difference approach to policy gradient estimation.
      In Proceedings of the 39th International Conference on Machine Learning (ICML).
      
     HQP: Samuele Tosatto
      
Yuan, Y., & Mahmood, A. R. (2022).
      Asynchronous reinforcement learning for real-time control of physical robots.
      In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA).
      
     HQP: Yufeng Yuan
      
Lan, Q., Tosatto, S., Farrahi, H., & Mahmood, A. R. (2022).
      Model-free policy learning with reward gradients.
      In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS).
      
     HQP: Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi
      
Garg, S., Tosatto, S., Pan, Y., White, M., & Mahmood, A. R. (2022).
      An alternate policy gradient estimator for softmax policies.
      In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS). (Winner of the Best CAIAC MSc thesis award)
      
     HQP: Shivam Garg, Samuele Tosatto
      
Przystupa, M., Dehghan, M., Jagersand, M., & Mahmood, A. R. (2021). Analyzing neural Jacobian methods in applications of visual servoing and kinematic control. In Proceedings of the 2021 International Conference on Robotics and Automation (ICRA).
Limoyo, O., Chan, B., Marić, F., Wagstaff, B., Mahmood, A. R., & Kelly, J. (2020). Heteroscedastic uncertainty for robust generative latent dynamics. IEEE Robotics and Automation Letters (RA-L) 5(4), 6654–6661.
Korenkevych, D., Mahmood, A. R., Vasan, G., & Bergstra, J. (2019). Autoregressive policies for continuous control deep reinforcement learning. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI). Companion video and source code.
Mahmood, A. R., Korenkevych, D., Vasan, G., Ma, W., & Bergstra, J. (2018). Benchmarking reinforcement learning algorithms on real-world robots. In Proceedings of the 2nd Annual Conference on Robot Learning (CoRL). arXiv. Companion video and source code.
Mahmood, A. R., Korenkevych, D., Komer, B. J., & Bergstra, J. (2018). Setting up a reinforcement learning task with a real-world robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Companion video and source code.
Yu, H., Mahmood, A. R., & Sutton, R. S. (2018). On generalized Bellman equations and temporal-difference learning. Journal of Machine Learning Research (JMLR) 19(48):1-49.
Mahmood, A. R. (2017). Incremental Off-policy Reinforcement Learning Algorithms. PhD thesis, Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8.
Yu, H., Mahmood, A. R., & Sutton, R. S. (2017). On generalized Bellman equations and temporal-difference learning. In Proceedings of the 30th Canadian Conference on Artificial Intelligence (CAI), Edmonton, Canada. arXiv.
Mahmood, A. R., Yu, H., & Sutton, R. S. (2017). Multi-step off-policy learning without importance sampling ratios. arXiv:1702.03006.
Sutton, R. S., Mahmood, A. R., & White, M. (2016). An emphatic approach to the problem of off-policy temporal-difference learning. Journal of Machine Learning Research (JMLR) 17(73):1-29.
van Seijen, H., Mahmood, A. R., Pilarski, P. M., Machado, M. C., & Sutton, R. S. (2016). True online temporal-difference learning. Journal of Machine Learning Research (JMLR) 17(1):5057-5096. Code for random MDP experiments.
Mahmood, A. R., & Sutton, R. S. (2015). Off-policy learning based on weighted importance sampling with linear computational complexity. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), Amsterdam, Netherlands. Code for on-policy and off-policy experiments.
Mahmood, A. R., Yu, H., White, M., & Sutton, R. S. (2015). Emphatic temporal-difference learning. In the 2015 European Workshop on Reinforcement Learning (EWRL). arXiv:1507.01569.
van Seijen, H., Mahmood, A. R., Pilarski, P. M., & Sutton, R. S. (2015). An empirical evaluation of true online TD(λ). In the 2015 European Workshop on Reinforcement Learning (EWRL). arXiv:1507.00353. Code for random MDP experiments.
Mahmood, A. R., van Hasselt, H., & Sutton, R. S. (2014). Weighted importance sampling for off-policy learning with linear function approximation. Advances in Neural Information Processing Systems (NeurIPS) 27, Montreal, Canada. Pseudo-code and Code for experiments.
van Hasselt, H., Mahmood, A. R., & Sutton, R. S. (2014). Off-policy TD(λ) with a true online equivalence. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), Quebec City, Canada.
Sutton, R. S., Mahmood, A. R., Precup, D., & van Hasselt, H. (2014). A new Q(λ) with interim forward view and Monte Carlo equivalence. In Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China.
Mahmood, A. R., & Sutton, R. S. (2013). Representation Search through Generate and Test. In Proceedings of the AAAI Workshop on Learning Rich Representations from Low-Level Sensors, Bellevue, Washington, USA.
Mahmood, A. R., Sutton, R. S., Degris, T., & Pilarski, P. M. (2012). Tuning-free step-size adaptation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Kyoto, Japan. Code.
Mahmood, A. R. (2011). Structure learning of causal Bayesian networks: A survey. Technical report TR11-01, Department of Computing Science, University of Alberta, Edmonton, AB, Canada T6G 2E8.
Mahmood, A. R. (2010). Automatic Step-size Adaptation in Incremental Supervised Learning. Master’s thesis, Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8.