Each paper's PDF can be accessed by clicking on its title. For incorrect links or other fixes, please create a pull request at https://github.com/rlbrew-workshop/rlbrew-workshop.github.io.
Language Reward Modulation for Pretraining Reinforcement Learning | Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel
Skill-Based Reinforcement Learning with Intrinsic Reward Matching | Ademi Adeniji, Amber Xie, Pieter Abbeel
Learn to Outperform Demonstrators via Reward and Policy Co-learning | Mingkang Wu, Feng Tao, Yongcan Cao
Adaptive Deep Q-Networks for Decision Making in Non-Stationary Environments: A Case Study with the Wisconsin Card Sorting Test | Dieu-Donné Fangnon, Eduardo H. Ramirez-Rangel
The Reward Problem: Where does the most important signal in reinforcement learning come from? | Kory Wallace Mathewson
Integrating Feedback and Noisy Preferences for Adaptable Robotic Control | Yuxuan Li, Srijita Das, Qinglin Liu, Matthew E. Taylor
Representation Learning for Cross-Embodiment Inverse Reinforcement Learning from Mixed-Quality Demonstrations | Anurag Sidharth Aribandi, Connor Mattson, Daniel S. Brown
Learning Action-based Representations Using Invariance | Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, Amy Zhang
Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input | Andi Peng, Yuying Sun, Tianmin Shu, David Abel
Multistep Inverse Is Not All You Need | Alexander Levine, Peter Stone, Amy Zhang
Adaptive Feedback Selection for Learning to Avoid Negative Side Effects in Autonomous Agents | Yashwanthi Anand, Sandhya Saisubramanian
REBEL: Reinforcement Learning via Regressing Relative Rewards | Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
Efficient Inverse Reinforcement Learning without Compounding Errors | Nicolas Espinosa Dice, Gokul Swamy, Sanjiban Choudhury, Wen Sun
Multi-Agent Imitation Learning: Value is Easy, Regret is Hard | Jingwu Tang, Gokul Swamy, Fei Fang, Steven Wu
PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling | Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Bedi
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning | Mingqi Yuan, Roger Creus Castanyer, Bo Li, Xin Jin, Glen Berseth, Wenjun Zeng
A Dual Approach to Imitation Learning from Observations with Suboptimal Offline Datasets | Harshit Sikchi, Caleb Chuck, Amy Zhang, Scott Niekum
SkiLD: Unsupervised Skill Discovery Guided by Local Dependencies | Zizhao Wang, Jiaheng Hu, Caleb Chuck, Stephen Chen, Roberto Martín-Martín, Amy Zhang, Scott Niekum, Peter Stone
Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment | Hao Sun, Mihaela van der Schaar
Value Implicit Pretraining does not learn Representations suitable for Reinforcement Learning | Harshit Sikchi, Siddhant Agarwal, Peter Stone, Amy Zhang, Scott Niekum
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms | Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, W. Bradley Knox, Chelsea Finn, Scott Niekum
Tell me why: Training preference-based RL with human preferences and step-level explanations | Jakob Karalus
Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning | Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martín-Martín
Aligning Agents like Large Language Models | Adam Jelley, Yuhan Cao, David Bignell, Sam Devlin, Tabish Rashid
Prioritizing safety via curriculum learning | Cevahir Koprulu, Thiago D. Simão, Nils Jansen, Ufuk Topcu
Agent Q: Combining Search, Self-Critique and Reinforcement Learning for Autonomous Web Agents | Pranav Putta, Edmund Mills, Naman Garg, Chelsea Finn, Divyansh Garg, Rafael Rafailov
External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling | Rishav Bhagat, Jonathan C. Balloch, Zhiyu Lin, Mark Riedl
Understanding Preference Fine-Tuning Through the Lens of Coverage | Yuda Song, Gokul Swamy, Aarti Singh, Drew Bagnell, Wen Sun
Generalization of Temporal Logic Tasks via Future Dependent Options | Duo Xu
Learning Abstract Skillsets with Empowerment Bandits | Andrew Levy, Alessandro G. Allievi, George Konidaris
Proto Successor Measure: Representing the space of all possible solutions of Reinforcement Learning | Siddhant Agarwal, Harshit Sikchi, Peter Stone, Amy Zhang
Offline Reinforcement Learning with Imputed Rewards | Carlo Romeo, Andrew D. Bagdanov
A Reward Analysis of Reinforcement Learning from Large Language Model Feedback | Muhan Lin, Shuyang Shi, Yue Guo, Behdad Chalaki, Vaishnav Tadiparthi, Simon Stepputtis, Joseph Campbell, Katia P. Sycara
OCALM: Object-Centric Assessment with Language Models | Timo Kaufmann, Jannis Blüml, Antonia Wüst, Quentin Delfosse, Kristian Kersting, Eyke Hüllermeier
Towards Principled Representation Learning from Videos for Reinforcement Learning | Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford
Dynamics Generalisation with Behaviour Foundation Models | Scott Jeen, Jonathan Cullen
Task-Oriented Slot-Based Cumulant Discovery in General Value Functions | Vincent Michalski, Somjit Nath, Derek Nowrouzezahrai, Doina Precup, Samira Ebrahimi Kahou