Developing a Roguelike Game with Reinforcement Learning using GCP | by Matt Gray | Jan, 2021



The game is a traditional roguelike: a turn-based dungeon crawler with RPG elements and a large amount of procedural generation. The player's goal is to escape the ice palace, floor by floor, fighting monsters and collecting useful items along the way. While the enemies and items that appear on each floor are traditionally generated at random, this game lets the RL model generate these entities based on the data it has collected.

The core idea of Reinforcement Learning is that an automated Agent interacts with an environment by making observations and taking actions, as depicted in Fig. 1. Through interacting with the environment, the Agent may receive rewards (either positive or negative), which it uses to learn and to influence future decision making.
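The observe, act, reward cycle described above can be sketched in a few lines of Python. This is a generic illustration, not the game's code: `ToyEnvironment`, `RandomAgent`, `run_episode`, the action names and the reward values are all hypothetical.

```python
import random


class ToyEnvironment:
    """Hypothetical stand-in environment whose state is a floor number."""

    def __init__(self, floors=3):
        self.floors = floors
        self.floor = 1

    def observe(self):
        return self.floor

    def step(self, action):
        # Arbitrary illustrative rewards: +1 for advancing, -1 otherwise.
        reward = 1.0 if action == "advance" else -1.0
        if action == "advance":
            self.floor += 1
        done = self.floor > self.floors
        return self.floor, reward, done


class RandomAgent:
    """Hypothetical agent that acts at random and accumulates rewards."""

    def __init__(self, actions, seed=0):
        self.actions = actions
        self.rng = random.Random(seed)
        self.total_reward = 0.0

    def act(self, state):
        return self.rng.choice(self.actions)

    def learn(self, state, action, reward, next_state):
        # A learning agent would update its model here.
        self.total_reward += reward


def run_episode(env, agent, max_steps=100):
    """Run the observe/act/reward loop and return the number of steps taken."""
    state = env.observe()
    for step in range(1, max_steps + 1):
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        agent.learn(state, action, reward, next_state)
        state = next_state
        if done:
            return step
    return max_steps
```

Calling `run_episode(ToyEnvironment(), RandomAgent(["advance", "wait"]))` runs one episode; the agent's `total_reward` then holds the sum of the rewards it received.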

Fig. 1. Interaction between Agent and Environment for Reinforcement Learning (Image by Author)


The state is any observation the Agent makes about the environment that can be used in deciding which actions to take. While there is a wealth of different data the Agent could observe (the health of the player, the number of turns the player needs to advance a floor, etc.), the state variables in the first version of the game consider only the floor the player has reached and the level of the player's character.
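As a minimal sketch, a two-variable state like this can be modelled as an immutable tuple and used directly as a lookup key. The names `State`, `character_level`, `q_table` and the entity `"goblin"` are assumptions made for illustration.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Hypothetical state: the floor reached and the character's level."""
    floor: int
    character_level: int


# A NamedTuple is hashable, so it can key a table of learned values.
q_table = {}
state = State(floor=2, character_level=3)
q_table[(state, "goblin")] = 0.0  # "goblin" is a made-up entity name
```

Keeping the state this small keeps the table of learned values small as well, at the cost of ignoring other signals such as player health.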


Due to the procedurally generated nature of the game, the Agent decides which monsters/items to spawn stochastically, as opposed to making a deterministic decision each time. Since there is a large element of randomness, the Agent does not explore/exploit in the traditional RL manner, and instead controls the weighted probabilities of different enemies/items spawning in game.
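One way to turn learned values into weighted spawn probabilities is a softmax followed by weighted sampling. The article does not say how the weights are derived, so the softmax, the `temperature` parameter, the function names and the entity names below are all assumptions.

```python
import math
import random


def spawn_probabilities(q_values, temperature=1.0):
    """Convert per-entity values into spawn probabilities with a softmax."""
    highest = max(q_values.values())
    # Subtracting the maximum keeps exp() numerically stable.
    weights = {
        name: math.exp((value - highest) / temperature)
        for name, value in q_values.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def spawn_entities(q_values, count, rng=None, temperature=1.0):
    """Sample `count` entities for a floor according to the weighted odds."""
    rng = rng or random.Random()
    probs = spawn_probabilities(q_values, temperature)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=count)
```

For example, `spawn_entities({"goblin": 0.5, "ice_troll": -0.2, "potion": 0.1}, count=5)` returns five entity names, with higher-valued entities appearing more often on average. A higher `temperature` flattens the distribution and a lower one sharpens it.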


The reward model for the Reinforcement Learning algorithm is crucial to developing the intended behaviours the learned model should display, as Machine Learning methods notoriously take shortcuts to achieve their goal. As the intended objective is to maximise enjoyment for the player, several assumptions were made to quantify enjoyment in terms of rewards for the RL algorithm.

The RL algorithm employs Q-Learning, which has been modified to accommodate the stochastic actions performed by the Agent. Unlike traditional Q-Learning [3], in which an Agent takes one action between states, the Agent's action is updated taking into account the probability distribution of all the enemies/items that were spawned for the floor.
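One possible reading of this modification, offered as an assumption rather than the author's exact update rule, is to apply the standard Q-Learning update to every entity spawned on the floor, scaled by that entity's share of the spawns. The function name `update_q`, the learning rate `alpha`, the discount `gamma` and the entity names are hypothetical.

```python
from collections import Counter, defaultdict


def update_q(q_table, state, spawned, reward, next_state, entities,
             alpha=0.1, gamma=0.9):
    """Update Q-values for all entities spawned on a floor.

    Assumed rule: each spawned entity receives the usual Q-Learning
    update, weighted by the fraction of the floor's spawns it made up.
    """
    counts = Counter(spawned)
    total = sum(counts.values())
    # Standard Q-Learning target: reward plus discounted best next value.
    best_next = max(q_table[(next_state, entity)] for entity in entities)
    target = reward + gamma * best_next
    for entity, count in counts.items():
        share = count / total
        key = (state, entity)
        q_table[key] += alpha * share * (target - q_table[key])
    return q_table


# Usage: states are (floor, character level) tuples, values start at 0.
q = defaultdict(float)
update_q(q, (1, 1), ["goblin", "goblin", "potion", "goblin"],
         reward=1.0, next_state=(2, 1), entities=["goblin", "potion"])
```

In this sketch an entity that made up three quarters of a floor's spawns receives three quarters of the update, so credit for the reward is shared in proportion to how much each entity shaped the floor.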

The global AI model is trained using game data collected from all players, and is used as the base RL model when a player has not yet played a game. A new player gets a local copy of the global RL model when first starting, which becomes tailored to their own play style as they play the game, while their game data is also used to further improve the global AI model for future new players.
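The global/local split can be sketched as training one table on everyone's sessions, then copying it for each new player and continuing training on that player's sessions only. The `train` function, the session format and the values below are hypothetical simplifications, not the article's pipeline.

```python
import copy


def train(q_table, sessions, alpha=0.1):
    """Hypothetical trainer: nudge each value toward the observed reward."""
    for session in sessions:
        for key, reward in session:
            old = q_table.get(key, 0.0)
            q_table[key] = old + alpha * (reward - old)
    return q_table


# Global model: trained on sessions from all players (made-up data).
# Keys pair a (floor, character level) state with an entity name.
all_player_sessions = [
    [(((1, 1), "goblin"), 1.0)],
    [(((1, 1), "potion"), -1.0)],
]
global_model = train({}, all_player_sessions)

# A new player starts from a copy, which then adapts to their own play.
local_model = copy.deepcopy(global_model)
train(local_model, [[(((1, 1), "goblin"), -1.0)]])
```

Because the local model is a deep copy, training it leaves the global model untouched; the same session data can separately be added to the pool used to retrain the global model.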

Fig. 2. Data Pipeline and RL Model Training Architecture (Image by Author)

The work presented in this article describes how Reinforcement Learning was used to improve the player's experience of playing a game, as opposed to the more common RL applications used to automate human actions. Game session data across all players was collected using components of the free tier GCP architecture, allowing for the creation of a global RL model. While players begin the game with the global RL model, their individual experiences create a custom-tailored local RL model that better suits their own play styles.

[1] OpenAI Gym,

