Improving a Reinforcement Learning Negotiating Agent’s Performance by Extracting Information from the Opponent’s Sequence of Offers

Agrawal, Arpit

Title

Improving a Reinforcement Learning Negotiating Agent’s Performance by Extracting Information from the Opponent’s Sequence of Offers

Author

Agrawal, Arpit (TU Delft Electrical Engineering, Mathematics and Computer Science)

Contributor

Renting, B.M. (mentor)
Murukannaiah, P.K. (mentor)
Zhang, X. (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Computer Science and Engineering

Project

CSE3000 Research Project

Date

2022-06-23

Abstract

With the prospect of decentralized multi-agent systems becoming more prevalent in daily life, automated negotiation agents have found their place in these collaborative settings. They give agents a way to communicate and reach solutions that are better for all involved.

Recent literature has shown great potential in using machine learning, particularly model-free deep reinforcement learning such as Proximal Policy Optimization (PPO), to develop more performant automated negotiation strategies. This work focuses on using information from the opponent's sequence of offers in a bilateral negotiation to further improve a baseline PPO agent. This involves extracting information from the opponent's sequence of offers and encoding it as a fixed-dimension state vector that modifies the input to the agent's policy, and then comparing the utilities this modified agent achieves with those of the baseline PPO agent. Since a large variety of numerical measures can represent a sequence of offers, an ablation study is conducted to investigate the effectiveness of each.

The modified agents consistently reached solutions with higher social welfare, although the agent's own utility neither improved nor diminished significantly compared to the baseline PPO agent.
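As an illustration of the fixed-dimension state vector mentioned in the abstract, the following is a minimal sketch of how a variable-length sequence of opponent offers might be summarized. It is not the thesis's implementation: the function name `summarize_offers`, the `window` parameter, and the six chosen measures (mean, standard deviation, minimum, maximum, last value, recent trend) are all assumptions made for this example, and each offer is assumed to be already reduced to the utility it gives the agent.

```python
from statistics import fmean, pstdev


def summarize_offers(utilities, window=5):
    """Map a variable-length list of utilities to a fixed-length vector.

    Hypothetical helper: `utilities` holds the agent's own utility for each
    offer the opponent has made so far, oldest first. The six measures below
    are illustrative choices, not the ones evaluated in the thesis.
    """
    if not utilities:
        # No offers received yet: return a neutral vector of the same size.
        return [0.0] * 6

    # Least-squares slope over the most recent offers, as a crude measure
    # of whether the opponent is conceding (positive) or hardening (negative).
    recent = utilities[-window:]
    n = len(recent)
    slope = 0.0
    if n > 1:
        x_mean = (n - 1) / 2
        y_mean = fmean(recent)
        num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(recent))
        den = sum((x - x_mean) ** 2 for x in range(n))
        slope = num / den

    return [
        fmean(utilities),   # average utility offered so far
        pstdev(utilities),  # spread of the offers
        min(utilities),     # worst offer received
        max(utilities),     # best offer received
        utilities[-1],      # most recent offer
        slope,              # recent trend
    ]
```

Under these assumptions the output always has six entries regardless of how many offers have been exchanged, so it could be concatenated with the baseline observation before being passed to the policy. Dropping one entry at a time is one way an ablation over such measures might be set up.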

Subject

Reinforcement Learning
Deep Reinforcement Learning
Proximal Policy Optimization
negotiation
Automated negotiation

To reference this document use:

http://resolver.tudelft.nl/uuid:924499b9-0edd-448b-a89b-989e36a6657e

Bibliographical note

Repository containing all the code used in this paper: https://github.com/brenting/negotiation_PPO. The code specific to this paper is in the 'sequence-of-offers-single-thread' branch.

Part of collection

Student theses

Document type

bachelor thesis

Files

PDF

Arpit_Agrawal_Research_Pr ... _FINAL.pdf

503.3 KB
