Workshop on Reinforcement Learning Beyond Rewards

Reinforcement Learning Conference (RLC) 2024

August 9, 2024

@RLBRew_2024 · #RLBRew_2024


Each paper's PDF can be accessed by clicking its title. To report incorrect links or other fixes, please create a pull request at https://github.com/rlbrew-workshop/rlbrew-workshop.github.io.

Language Reward Modulation for Pretraining Reinforcement Learning
Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel

Skill-Based Reinforcement Learning with Intrinsic Reward Matching
Ademi Adeniji, Amber Xie, Pieter Abbeel

Learn to Outperform Demonstrators via Reward and Policy Co-learning
Mingkang Wu, Feng Tao, Yongcan Cao

Adaptive Deep Q-Networks for Decision Making in Non-Stationary Environments: A Case Study with the Wisconsin Card Sorting Test
Dieu-Donné Fangnon, Eduardo H. Ramirez-Rangel

The Reward Problem: Where does the most important signal in reinforcement learning come from?
Kory Wallace Mathewson

Integrating Feedback and Noisy Preferences for Adaptable Robotic Control
Yuxuan Li, Srijita Das, Qinglin Liu, Matthew E. Taylor

Representation Learning for Cross-Embodiment Inverse Reinforcement Learning from Mixed-Quality Demonstrations
Anurag Sidharth Aribandi, Connor Mattson, Daniel S. Brown

Learning Action-based Representations Using Invariance
Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, Amy Zhang

Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input
Andi Peng, Yuying Sun, Tianmin Shu, David Abel

Multistep Inverse Is Not All You Need
Alexander Levine, Peter Stone, Amy Zhang

Adaptive Feedback Selection for Learning to Avoid Negative Side Effects in Autonomous Agents
Yashwanthi Anand, Sandhya Saisubramanian

REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

Efficient Inverse Reinforcement Learning without Compounding Errors
Nicolas Espinosa Dice, Gokul Swamy, Sanjiban Choudhury, Wen Sun

Multi-Agent Imitation Learning: Value is Easy, Regret is Hard
Jingwu Tang, Gokul Swamy, Fei Fang, Steven Wu

PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
Utsav Singh, Wesley A Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Bedi

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
Mingqi Yuan, Roger Creus Castanyer, Bo Li, Xin Jin, Glen Berseth, Wenjun Zeng

A Dual Approach to Imitation Learning from Observations with Suboptimal Offline Datasets
Harshit Sikchi, Caleb Chuck, Amy Zhang, Scott Niekum

SkiLD: Unsupervised Skill Discovery Guided by Local Dependencies
Zizhao Wang, Jiaheng Hu, Caleb Chuck, Stephen Chen, Roberto Martín-Martín, Amy Zhang, Scott Niekum, Peter Stone

Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment
Hao Sun, Mihaela van der Schaar

Value Implicit Pretraining does not learn Representations suitable for Reinforcement Learning
Harshit Sikchi, Siddhant Agarwal, Peter Stone, Amy Zhang, Scott Niekum

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, W. Bradley Knox, Chelsea Finn, Scott Niekum

Tell me why: Training preference-based RL with human preferences and step-level explanations
Jakob Karalus

Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning
Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martín-Martín

Aligning Agents like Large Language Models
Adam Jelley, Yuhan Cao, David Bignell, Sam Devlin, Tabish Rashid

Prioritizing safety via curriculum learning
Cevahir Koprulu, Thiago D. Simão, Nils Jansen, Ufuk Topcu

Agent Q: Combining Search, Self-Critique and Reinforcement Learning for Autonomous Web Agents
Pranav Putta, Edmund Mills, Naman Garg, Chelsea Finn, Divyansh Garg, Rafael Rafailov

External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling
Rishav Bhagat, Jonathan C Balloch, Zhiyu Lin, Mark Riedl

Understanding Preference Fine-Tuning Through the Lens of Coverage
Yuda Song, Gokul Swamy, Aarti Singh, Drew Bagnell, Wen Sun

Generalization of Temporal Logic Tasks via Future Dependent Options
Duo Xu

Learning Abstract Skillsets with Empowerment Bandits
Andrew Levy, Alessandro G Allievi, George Konidaris

Proto Successor Measure: Representing the space of all possible solutions of Reinforcement Learning
Siddhant Agarwal, Harshit Sikchi, Peter Stone, Amy Zhang

Offline Reinforcement Learning with Imputed Rewards
Carlo Romeo, Andrew D. Bagdanov

A Reward Analysis of Reinforcement Learning from Large Language Model Feedback
Muhan Lin, Shuyang Shi, Yue Guo, Behdad Chalaki, Vaishnav Tadiparthi, Simon Stepputtis, Joseph Campbell, Katia P. Sycara

OCALM: Object-Centric Assessment with Language Models
Timo Kaufmann, Jannis Blüml, Antonia Wüst, Quentin Delfosse, Kristian Kersting, Eyke Hüllermeier

Towards Principled Representation Learning from Videos for Reinforcement Learning
Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford

Dynamics Generalisation with Behaviour Foundation Models
Scott Jeen, Jonathan Cullen

Task-Oriented Slot-Based Cumulant Discovery in General Value Functions
Vincent Michalski, Somjit Nath, Derek Nowrouzezahrai, Doina Precup, Samira Ebrahimi Kahou