Multi-agent reinforcement learning programs based on Game theory

Status: not actively maintained (there is not much time to develop and support this repo, so breaking changes may occur). The program has only been tested for the case n_agents=2.

DeepGame

Here you can see implementations of the following single- and multi-agent learning algorithms (a minimal sketch of both update rules follows the list):

  1. Vanilla Q-learning (every agent acts greedily with respect to its own Q-values)
  2. Nash Q-learning (agents try to reach a Nash equilibrium of the stage game)
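The sketch below illustrates the two update rules, assuming a tabular Q indexed by (state, action). The function names q_update and nash_q_value and all variables are hypothetical, not the repo's actual API; only Nashpy's Game and support_enumeration are real library calls.

import numpy as np
import nashpy as nash

def q_update(Q, s, a, r, s_next, alpha=0.9, gamma=0.95):
    # Vanilla Q-learning: move the estimate toward the one-step target,
    # bootstrapping with the agent's own greedy value at the next state.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def nash_q_value(A, B):
    # Nash Q-learning replaces the greedy max with the value of a Nash
    # equilibrium of the stage game (A, B) built from both agents'
    # Q-values at the next state; Nashpy enumerates the equilibria.
    sigma1, sigma2 = next(nash.Game(A, B).support_enumeration())
    return sigma1 @ A @ sigma2, sigma1 @ B @ sigma2

# Example: matching pennies, whose unique equilibrium gives both agents value 0.
A = np.array([[1, -1], [-1, 1]])
print(nash_q_value(A, -A))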

Prerequisites (see the install one-liner after the list):

  • Python>=3.6
  • Numpy>=1.14.1
  • Nashpy>=0.0.17
  • Matplotlib>=2.2.2
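If these are not already installed, something like the following should pull them in (the version pins are copied from the list above; the sudo pip3 style matches the steps below):

sudo pip3 install "numpy>=1.14.1" "nashpy>=0.0.17" "matplotlib>=2.2.2"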

The code is also based on the https://github.com/ml3705454/mapr2 repo, and you have to install its maci module (a quick import check is sketched after the steps):

  1. Clone rllab:
cd <installation_path_of_your_choice>
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
sudo pip3 install -e .
  2. Install the other dependencies (note that pip expects package names separated by spaces, not commas):
sudo pip3 install joblib path.py gtimer theano keras tensorflow gym tensorflow_probability
  3. Install maci:
cd maci
sudo pip3 install -e .
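A quick way to verify the installation (assuming the packages expose the module names rllab and maci, which matches how the repos above are laid out):

python3 -c "import rllab, maci, nashpy; print('ok')"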

Instructions for running the program

Run the file run_grid_game.py with the following command and default parameters:

python run_grid_game.py

If you want to change the size of the grid world, the number of iterations, or other hyperparameters, use the following arguments of run_grid_game.py (an example invocation follows the list):

  • --grid_size, the size of the grid world. Default: 3
  • --gamma, the discount factor gamma in the Bellman equation. Default: 0.95
  • --epsilon, the exploration rate for epsilon-greedy action selection. Default: 0.5
  • --iterations, the number of steps in the grid world. Default: 1000
  • --learning_rate, the learning rate (alpha) for the Q-learning update. Default: 0.9
  • --method, the learning method to use (Q-learning or Nash-Q). Default: 'Q'
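For example, a run of Nash Q-learning on a larger grid with more steps might look like this (the exact string --method expects should be checked in run_grid_game.py, since the default is quoted as 'Q'):

python run_grid_game.py --grid_size 5 --iterations 5000 --method Nash-Q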
