Tips for Running High-Fidelity Deep Reinforcement Learning Experiments | by Ryan Sander | Feb, 2021


Photo by Jason Leung on Unsplash
  1. Run (a few of) Your Seeds
  2. Ablations and Baselines
  3. Visualize Everything
  4. Start Analytic, then Start Simple
  5. When in Doubt, Look to the (GitHub) Stars
Photo by Brett Jordan on Unsplash
  1. The initial state of your agent. This can affect the transitions and rollouts the agent experiences.
  2. If your policy is stochastic, the actions your agent chooses. This can affect the transitions you sample, or even entire rollouts!
  3. If the environment your agent operates in is also stochastic, this too can affect the transitions and rollouts that your agent samples.
import random
import numpy as np
import torch

def set_seeds(seed):
    torch.manual_seed(seed)           # Seed the PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # Seed the RNGs of all GPUs
    np.random.seed(seed)              # Seed the NumPy RNG
    random.seed(seed)                 # Seed Python's built-in RNG
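As a quick sanity check that seeding works, re-seeding should reproduce identical random draws. A minimal sketch (using only NumPy and Python's built-in `random` so it runs without PyTorch):

```python
import random

import numpy as np

def set_basic_seeds(seed):
    """NumPy/Python-only subset of the seeding function above."""
    np.random.seed(seed)
    random.seed(seed)

# Draw twice under the same seed: the samples must match exactly
set_basic_seeds(42)
first = (np.random.rand(3), random.random())

set_basic_seeds(42)
second = (np.random.rand(3), random.random())

assert np.allclose(first[0], second[0]) and first[1] == second[1]
print("Seeding is reproducible")
```

If this check fails, some component of your pipeline is drawing from an unseeded (or differently seeded) RNG.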

When validating your RL framework, it's crucial to test your agents and algorithms on multiple seeds. Some seeds will produce better results than others, and by running on only a single seed, you may simply have gotten lucky (or unlucky). In the RL literature, it's common to run anywhere from 4–10 random seeds per experiment.
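Aggregating over seeds is straightforward: run the same experiment once per seed, then report the mean and a confidence band across runs. A minimal sketch, where the hypothetical `run_experiment` stands in for a real training run:

```python
import numpy as np

def run_experiment(seed, n_episodes=100):
    """Stand-in for a real training run: returns a per-episode reward curve."""
    rng = np.random.default_rng(seed)
    # Fake learning curve: noisy, generally improving rewards
    return np.linspace(0.0, 1.0, n_episodes) + 0.1 * rng.standard_normal(n_episodes)

seeds = [0, 1, 2, 3, 4]  # RL papers commonly use 4-10 seeds
curves = np.stack([run_experiment(s) for s in seeds])  # shape: (n_seeds, n_episodes)

mean = curves.mean(axis=0)                         # the solid line in a seed plot
stderr = curves.std(axis=0) / np.sqrt(len(seeds))
lower, upper = mean - 1.96 * stderr, mean + 1.96 * stderr  # ~95% confidence band
```

Plotting `mean` with the `(lower, upper)` band filled in gives exactly the kind of figure shown below.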

Example plot showing mean performance (solid lines) and confidence intervals (shaded bars) for different sets of random seeds (in this case, corresponding to different experiments). Image source: author.
Photo by Anne Nygård on Unsplash
Example ablation study with different experiments. Image source: author.
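One convenient way to organize an ablation study is to enumerate every combination of the components you want to toggle, so that each run differs from the baseline in a controlled way. A hypothetical sketch using config dicts (the flag names are illustrative, not from the original article):

```python
from itertools import product

# Hypothetical components to ablate; all-True is the full baseline agent
ablation_flags = {
    "use_target_network": [True, False],
    "use_replay_buffer": [True, False],
    "reward_normalization": [True, False],
}

# One config dict per combination (2^3 = 8 runs here)
configs = [
    dict(zip(ablation_flags, values))
    for values in product(*ablation_flags.values())
]

for cfg in configs:
    name = "+".join(k for k, v in cfg.items() if v) or "all-ablated"
    # launch_run(cfg)  # each config would be launched, ideally over several seeds
```

Each config would then be run over the same set of seeds as the baseline, so differences in performance can be attributed to the ablated component.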

Reinforcement learning can be difficult to debug, because bugs don't always manifest as errors: your algorithm may run, but the agent's performance may be sub-optimal because some quantity isn't being computed correctly, a network's weights aren't being updated, and so on. To debug effectively, one strategy is to do what humans do well: visualize! Some useful visualization tools and quantities to consider visualizing include:

Example of some plots from Tensorboard. Image source: author.
You aren't limited to producing plots with Tensorboard! Check out the Appendix for code you can use to generate GIFs from image data! Image source: [3].
Example of a reward surface parameterized over states and actions for the OpenAI Gym Pendulum environment. See the Appendix for code on how to generate this plot.
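For reference, Gym's Pendulum environment defines its reward as the negative cost -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2), so a surface like the one above can be computed on a grid directly from that formula. A sketch that holds the angular velocity fixed at zero:

```python
import numpy as np

def pendulum_reward(theta, theta_dot, torque):
    """Pendulum reward: -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2)."""
    return -(theta ** 2 + 0.1 * theta_dot ** 2 + 0.001 * torque ** 2)

# Grid over the angle (state) and torque (action) dimensions
thetas = np.linspace(-np.pi, np.pi, 101)
torques = np.linspace(-2.0, 2.0, 101)  # Pendulum's torque limits are +/- 2
T, U = np.meshgrid(thetas, torques)

reward_surface = pendulum_reward(T, 0.0, U)  # hold theta_dot = 0
```

The resulting `reward_surface` array peaks at 0 where both angle and torque are zero, and can be passed straight to a 3D or contour plotting routine.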
Example of a parameter/hyperparameter distribution from Gaussian Process Regression. Image source: author.
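Raw episode rewards are one of the most commonly visualized quantities, but they are noisy, so a smoothed curve is often easier to read. A minimal sketch of exponential-moving-average smoothing (Tensorboard's smoothing slider works similarly):

```python
import numpy as np

def smooth(values, weight=0.9):
    """Exponential moving average of a reward curve; higher weight = smoother."""
    smoothed, last = [], values[0]
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return np.array(smoothed)

rng = np.random.default_rng(0)
rewards = np.linspace(-200.0, 0.0, 500) + 20.0 * rng.standard_normal(500)
smoothed = smooth(rewards)  # plot both raw and smoothed curves together
```

Plotting the raw curve faintly behind the smoothed one keeps the noise visible while making the trend obvious.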

4a. Start Analytic

Before you evaluate your algorithm in a dynamic environment, ask yourself: Is there an analytic function I can evaluate this on? This is especially valuable for tasks in which you aren't provided with a ground-truth value. There is a multitude of test functions you can use, ranging in complexity from a simple elementwise sine function to test functions used for optimization [4].

An example of a Rastrigin test function. Image source: author.
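The Rastrigin function is a standard optimization test function: it has many local minima but a known global minimum of 0 at the origin, which makes it a convenient analytic benchmark when no ground-truth value is otherwise available. A minimal NumPy sketch:

```python
import numpy as np

def rastrigin(x, A=10.0):
    """Rastrigin test function: A*n + sum(x_i^2 - A*cos(2*pi*x_i))."""
    x = np.asarray(x, dtype=float)
    return A * x.size + np.sum(x ** 2 - A * np.cos(2.0 * np.pi * x))

print(rastrigin([0.0, 0.0]))  # global minimum: 0.0
print(rastrigin([1.0, 1.0]))  # any point away from the origin scores higher
```

Because the optimum and its location are known exactly, you can measure how close your method gets without running a full RL environment.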

4b. Start Simple

You're now ready to transition your RL model to an environment! But before you evaluate your model on complex environments, for instance, one with a state space of 17 dimensions [1], perhaps it would be better to begin your evaluation in an environment with just 4 [1]?

The OpenAI Gym CartPole-v1 environment is an example of a good starting environment, since its state space has only 4 dimensions and its action space has only 1 dimension. Image source: author.
Photo by Astrid Lingnau on Unsplash
  1. OpenAI Baselines
  2. Ray/RLlib RL-Experiments
  3. Stable Baselines
  4. TensorFlow-Agents Benchmarks
Photo by Adam Lukomski on Unsplash

A special thanks to my mentors at the MIT Distributed Robotics Laboratory for teaching me these tips. Learning about these techniques has been truly worthwhile and has made me a far better researcher.

[1] Brockman, Greg, et al. "OpenAI Gym." arXiv preprint arXiv:1606.01540 (2016).

Generating Analytic Reward Functions

