Title: Improving a Reinforcement Learning Negotiating Agent’s Performance by Extracting Information from the Opponent’s Sequence of Offers
Author: Agrawal, Arpit (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Renting, B.M. (mentor); Murukannaiah, P.K. (mentor); Zhang, X. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2022-06-23
Abstract: With the prospects of decentralized multi-agent systems becoming more prevalent in daily life, automated negotiation agents have made their place in these collaborative settings. They are an approach to promote communication between the agents in reaching solutions that are better for all involved. Recent literature has shown great potential in using machine learning, particularly model-free deep reinforcement learning like Proximal Policy Optimization (PPO), to develop more performant automated negotiation strategies. This work focuses on using information from the opponent's sequence of offers in a bilateral negotiation to further improve a baseline PPO agent. This involves extracting and representing information from the opponent's sequence of offers into a state vector with a fixed dimension to modify the input to the agent's policy, and then comparing the utilities this modified agent achieves to the baseline PPO agent. Since there is a large variety of numerical measures to represent a sequence of offers, an ablation study is conducted to investigate the effectiveness of each. The modified agents consistently reached solutions that had higher social welfare, although the agent's own utility did not improve or diminish significantly in comparison to the base PPO agent.
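The abstract's central step, turning a variable-length sequence of opponent offers into a state vector with a fixed dimension, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `summarize_offers`, the choice of six summary statistics, and the convention that each offer is represented by the utility it gives our own agent, scaled to [0, 1]. The thesis evaluates its own set of numerical measures, which this record does not list.

```python
from statistics import mean, pstdev


def summarize_offers(utilities):
    """Map a variable-length list of offer utilities to a fixed-length vector.

    `utilities` holds, in order of arrival, the utility (assumed in [0, 1])
    that each opponent offer gives our own agent. The six features below are
    illustrative choices, not the measures used in the thesis.
    """
    if not utilities:
        # No offers received yet: return a neutral vector of the same size.
        return [0.0] * 6

    n = len(utilities)
    avg = mean(utilities)

    # Least-squares slope of utility against offer index, a rough indicator
    # of whether the opponent is conceding (positive) or hardening (negative).
    if n > 1:
        x_mean = (n - 1) / 2
        num = sum((i - x_mean) * (u - avg) for i, u in enumerate(utilities))
        den = sum((i - x_mean) ** 2 for i in range(n))
        slope = num / den
    else:
        slope = 0.0

    return [
        avg,                # average utility offered so far
        pstdev(utilities),  # spread of the offers
        min(utilities),     # worst offer received
        max(utilities),     # best offer received
        utilities[-1],      # most recent offer
        slope,              # concession trend
    ]
```

Because the output length does not depend on how many offers have arrived, a vector like this could be concatenated to the observation fed to a policy network at every step of the negotiation.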
Subject: Reinforcement Learning; Deep Reinforcement Learning; Proximal Policy Optimization; negotiation; Automated negotiation
To reference this document use: http://resolver.tudelft.nl/uuid:924499b9-0edd-448b-a89b-989e36a6657e
Bibliographical note: The repository containing all the code used in this paper is https://github.com/brenting/negotiation_PPO. The code specific to this paper is in the 'sequence-of-offers-single-thread' branch.
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2022 Arpit Agrawal
Files: Arpit_Agrawal_Research_Pr ... _FINAL.pdf (PDF, 503.3 KB)