Modeling Others Using Oneself in Multi-agent Reinforcement Learning
Overview
I worked on this project during my mobility semester at Comenius University in Bratislava, where I replicated and extended a self–other modeling approach in reinforcement learning. The approach is inspired by theories from cognitive psychology, particularly simulation theory and mirror neurons, and the aim was to enable an agent to infer another agent's hidden goal by using its own neural network to simulate the other agent's behavior.
Technical Approach
To achieve this, I modified a standard reinforcement learning architecture written in PyTorch to implement a custom backpropagation mechanism: instead of using the loss to update the network's weights, the gradient was backpropagated into the model's input, specifically the agent's estimate of the other agent's goal. By comparing simulated actions with the other agent's actual behavior, the agent could iteratively refine this goal estimate.
The key innovation was treating the goal estimate as a differentiable parameter optimized by gradient descent, effectively creating an inverse-reinforcement-learning-style system that inferred goals in real time during agent interactions.
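The mechanism can be sketched in a few lines of PyTorch. This is a minimal illustration, not the project's actual code: the network shape, dimensions, and variable names (`policy`, `goal_estimate`, `observed_action`) are all hypothetical. The essential idea is that the policy weights are frozen and the optimizer updates only the goal estimate.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the agent's own (pretrained) policy network:
# maps a state observation plus a goal vector to action logits.
policy = torch.nn.Sequential(
    torch.nn.Linear(16 + 2, 32),  # 16-dim observation + 2-dim goal estimate
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),       # logits over 4 grid moves
)
for p in policy.parameters():
    p.requires_grad_(False)       # weights are frozen; only the goal estimate is optimized

observation = torch.randn(16)     # the other agent's observed state (placeholder)
observed_action = torch.tensor(2) # the action the other agent actually took

# The goal estimate is the *input* being optimized, not a network weight.
goal_estimate = torch.zeros(2, requires_grad=True)
optimizer = torch.optim.SGD([goal_estimate], lr=0.1)

for _ in range(50):
    optimizer.zero_grad()
    goal_probs = torch.softmax(goal_estimate, dim=0)
    logits = policy(torch.cat([observation, goal_probs]))
    # Loss: mismatch between the simulated action distribution
    # and the other agent's actual action.
    loss = F.cross_entropy(logits.unsqueeze(0), observed_action.unsqueeze(0))
    loss.backward()               # gradient flows into goal_estimate, not the weights
    optimizer.step()

inferred_goal = torch.softmax(goal_estimate, dim=0)  # probability over the two coin colors
```

In this sketch the goal estimate is kept as unnormalized logits and passed through a softmax, so the optimized quantity stays a valid probability distribution over possible goals.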

Evaluation & Results
The approach was evaluated in a multi-agent grid-world coin game, where two agents move in an 8×8 environment containing coins of different colors. Each agent is assigned a hidden goal specifying which color of coins it should collect, and agents take turns moving around the grid to pick up coins. Rewards depend on collecting both one's own target coins and the opponent's coins, while collecting unrelated coins is penalized, making it important to quickly infer the other agent's objective.
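The reward structure described above can be summarized in a small rule. The specific reward values below are illustrative placeholders, not the ones used in the project; the point is only the shape of the incentive, where unrelated coins are penalized so that inferring the opponent's goal pays off.

```python
# Toy reward rule for picking up a coin in the coin game.
# The numeric values are illustrative, not the project's actual ones.
OWN_COIN_REWARD = 1.0        # coin matches the agent's own hidden goal color
OPPONENT_COIN_REWARD = 0.5   # coin matches the opponent's hidden goal color
UNRELATED_COIN_PENALTY = -1.0  # coin matches neither goal

def coin_reward(coin_color: str, own_goal: str, opponent_goal: str) -> float:
    """Reward for collecting a coin of `coin_color`, given both hidden goals."""
    if coin_color == own_goal:
        return OWN_COIN_REWARD
    if coin_color == opponent_goal:
        return OPPONENT_COIN_REWARD
    return UNRELATED_COIN_PENALTY
```

For example, with `own_goal="red"` and `opponent_goal="blue"`, picking up a green coin costs the agent reward, so a quick, accurate estimate of the opponent's goal directly improves its return.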
The model correctly inferred the other agent's goal with around 65% accuracy, demonstrating that ideas from cognitive psychology can be translated into reinforcement learning systems that exhibit theory-of-mind–like behavior.