I recently discovered Streamlit, and naturally had a bunch of ideas for things to do with it.
One of those was to build a simple web app that lets you apply a mask to the screen of an Atari game and then watch a pre-trained agent completely fail at the game. So I’ve built exactly that.
How was the agent trained?
The agent was trained with the Proximal Policy Optimization (PPO) algorithm as implemented in the stable-baselines3 RL library, using the library’s pre-built convolutional neural network policy, for 1 million time steps. By the end of training it was earning a reward of around 19 per episode, which is pretty good: Atari Pong is played to 21 points and the reward is the agent’s score minus the opponent’s, so a reward of 19 means the PPO agent wins every game while the built-in Pong AI scores only a couple of points. After training, I used stable-baselines3’s save function to store the agent, and its loading and evaluation functionality to load and evaluate it.
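If you want to reproduce something similar, here’s a rough sketch of that train/save/load/evaluate loop with stable-baselines3. Treat the details as illustrative: the environment ID, number of parallel environments, seeds, and file name are placeholders rather than the exact settings behind the numbers above.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import VecFrameStack

# Standard Atari preprocessing (frame skip, 84x84 grayscale) plus a 4-frame stack,
# which is what the pre-built CnnPolicy expects.
env = make_atari_env("PongNoFrameskip-v4", n_envs=8, seed=0)
env = VecFrameStack(env, n_stack=4)

# Train PPO with the library's convolutional policy for 1 million time steps.
model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("ppo_pong")  # placeholder filename

# Later: reload the agent and measure its average episode reward.
eval_env = VecFrameStack(make_atari_env("PongNoFrameskip-v4", n_envs=1, seed=1), n_stack=4)
model = PPO.load("ppo_pong")
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```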
What does the web app do?
The web app is super simple: it lets you define a mask to apply to the screen image of the Pong game, then evaluates the pre-trained agent in the environment with that mask on the screen. When the evaluation finishes, it prints the mean and standard deviation of the reward the agent earned and displays a video of the agent playing with your mask applied.
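If you’re curious how an app like this can be wired together, here’s a minimal sketch using Streamlit, Gym, and stable-baselines3. The wrapper class, widget layout, slider ranges, and file names are just one way to do it, not a description of my actual app, and the video part is simplified: VecVideoRecorder captures the rendered game frames, so the mask itself wouldn’t appear in the recording in this version.

```python
import glob

import gym
import streamlit as st
from stable_baselines3 import PPO
from stable_baselines3.common.atari_wrappers import AtariWrapper
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack, VecVideoRecorder


class MaskedObs(gym.ObservationWrapper):
    """Paint a constant-intensity rectangle onto every observation."""

    def __init__(self, env, x, y, width, height, value):
        super().__init__(env)
        self.x, self.y, self.width, self.height, self.value = x, y, width, height, value

    def observation(self, obs):
        obs = obs.copy()
        obs[self.y : self.y + self.height, self.x : self.x + self.width] = self.value
        return obs


st.title("Mask the screen, break the agent")

# The preprocessed Atari frames are 84x84 grayscale, so mask coordinates live in that range.
x = st.slider("Mask x", 0, 83, 0)
y = st.slider("Mask y", 0, 83, 0)
width = st.slider("Mask width", 1, 84, 3)
height = st.slider("Mask height", 1, 84, 5)
value = st.slider("Pixel intensity", 0, 255, 114)

if st.button("Evaluate"):
    def make_env():
        env = gym.make("PongNoFrameskip-v4")
        env = AtariWrapper(env)  # same preprocessing as during training
        return MaskedObs(env, x, y, width, height, value)

    env = VecFrameStack(DummyVecEnv([make_env]), n_stack=4)
    # Record the rendered game frames (note: this shows the raw screen,
    # not the masked observation the agent actually sees).
    env = VecVideoRecorder(env, "videos",
                           record_video_trigger=lambda step: step == 0,
                           video_length=3000)

    model = PPO.load("ppo_pong")  # the pre-trained agent saved earlier
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=3)
    env.close()

    st.write(f"Mean reward: {mean_reward:.1f} (std {std_reward:.1f})")
    videos = sorted(glob.glob("videos/*.mp4"))
    if videos:
        st.video(videos[-1])
```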
Results?
Honestly, I knew the pre-trained agent would be easy to break. RL agents are quite brittle and don’t generalize well, and depending on the mask applied, masking the observations can be a massive change to the input. That part of my hypothesis was right: it is extremely easy to break the pre-trained agent.
Where I was wrong, though, is how hard it is to find a patch that doesn’t break the agent. Even really small patches, like 5 by 3 pixels, still break it.
To find a patch that didn’t break the PPO agent, I had to keep it small (5 by 3 pixels), gray (pixel intensity 114), and tucked into the upper-left corner of the screen, near where the score is displayed. With that patch, the agent played with only a minor drop in performance (an average reward of 16.7 over 10 episodes).
What’s next?
Stay tuned: I’m working on some things. For one, I’m certainly going to clean up my Streamlit app and put it online so that anyone can use it. Expect that by next week; I’ll put a link to it here when it’s up.
I might go a little further with this breaking-Atari-agents thing: try it on some other environments, or train an agent to be robust to these patches by randomly sampling a bunch of them during training. That might take more compute than I have access to, though.
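If I do try the robustness experiment, the core ingredient would just be an observation wrapper that re-samples a random patch at the start of every episode and stays on the screen for the whole training run. A rough sketch of that idea follows; the patch size limits and intensity range are arbitrary placeholders, not tuned values.

```python
import gym
import numpy as np


class RandomPatchWrapper(gym.ObservationWrapper):
    """Occlude a randomly placed, randomly colored rectangle, re-sampled each episode."""

    def __init__(self, env, max_height=10, max_width=10):
        super().__init__(env)
        self.max_height = max_height
        self.max_width = max_width
        self._sample_patch()

    def _sample_patch(self):
        h, w, _ = self.observation_space.shape
        self.ph = np.random.randint(1, self.max_height + 1)
        self.pw = np.random.randint(1, self.max_width + 1)
        self.py = np.random.randint(0, h - self.ph)
        self.px = np.random.randint(0, w - self.pw)
        self.value = np.random.randint(0, 256)

    def reset(self, **kwargs):
        self._sample_patch()  # new position, size, and intensity every episode
        return super().reset(**kwargs)

    def observation(self, obs):
        obs = obs.copy()
        obs[self.py : self.py + self.ph, self.px : self.px + self.pw] = self.value
        return obs
```

Wrapping the training environment with this before handing it to PPO would expose the agent to a different occlusion every episode, which is the usual domain-randomization recipe.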
Do you have thoughts about what other experiments I should run? Or do you really want a feature added to the Streamlit app that I haven’t mentioned? Let me know in the comments.
If you enjoy this, please consider subscribing; it’s free!
Thanks for reading 👋