A new GitHub repository to host my blog code
Although last week I planned to build a Streamlit app for breaking a Pong-playing PPO agent, it didn't work out. When I tried to deploy on Heroku, the resulting Python environment was too large. Then I tried Streamlit's sharing platform, but hit weird errors in their cloud Python environment that I couldn't figure out. So instead I've put all the code into a GitHub repository. It's easy to install and run on your machine, provided you've got Python 3.8+ installed and know your way around the terminal:
git clone https://github.com/jfpettit/exppy.git
cd exppy
pip install -e .
The above commands will clone the repository onto your machine, move into the repository (that's what cd does), and install it as an editable Python package using pip.
Once everything is installed, you can run the Atari breaking code with this command:
python -m exppy.atari_mask
When it finishes, it should print out the folder where the resulting videos are stored. The videos save as MP4s, so they should be easy to open and view. You can print out all of the optional arguments you can pass to the code by appending a --help flag to the end of the command, like this: python -m exppy.atari_mask --help
Code from the Weird RL with hyperparameter optimizers post is in the repository too, and can be run in the same manner as above: python -m exppy.optuna_rl
As I write code for small experiments and blog posts, I'll add it to the repository. Any project bigger than a couple of files will get its own repository.
Blog movements
Something I've been meaning to do for a while is clean up my personal website. Originally, my plan was to post only on Substack and host a simple CV-style site elsewhere. But then I found Ghost, and after much deliberation I decided to host my blog, CV, etc. on one website. If you're a Substack lover, don't worry: I'll still upload my posts there too. If you don't care, or you want to follow along on my website but already subscribed to my Substack (before April 20th, 2021), you don't need to do anything; you should automatically receive posts via email from my website. Just unsubscribe from wherever you don't want to receive emails (Substack or the website).
Why Ghost, when there are so many options for site hosting out there? I like that Ghost is open-source software; you can find their entire platform on GitHub. They're also a nonprofit, which is another bonus. For these reasons, I don't mind paying a few dollars a month to host my site with them.
If you want to support me, consider subscribing.
Quantum ML
PennyLane, an open-source library for quantum machine learning, recently released support for JAX. Quantum ML is one of those things I don't quite understand, but it seems like it could be a really big deal. It also might be one of those things that takes years and years to live up to its potential; I just can't tell. Either way, it seems pretty interesting, and I'm planning to set aside a little time to learn the basics. Luckily, PennyLane's tutorials seem great, so I'll start there. I did find a paper on quantum reinforcement learning, though its method fails to even match the typical performance of a REINFORCE-style policy gradient agent. In their results, they claim their method beats a 10-parameter neural network on CartPole, but in my blog post Weird RL with hyperparameter optimizers, CMA-ES solves the Inverted Pendulum task (same idea as CartPole) with only 5 parameters. These researchers take an important first step toward applying quantum ML to deep reinforcement learning, and I'm not knocking that, but there is definitely plenty of work left to be done in this area.
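I haven't actually tried the JAX integration yet, but judging from PennyLane's docs, it looks roughly like this minimal sketch (the one-qubit circuit and its parameter here are made up purely for illustration):

import jax
import jax.numpy as jnp
import pennylane as qml

# A one-qubit device; the qnode's "jax" interface makes the circuit a JAX-traceable function.
dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev, interface="jax")
def circuit(theta):
    qml.RX(theta, wires=0)            # rotate the qubit around the X axis
    return qml.expval(qml.PauliZ(0))  # measure the Pauli-Z expectation value

theta = jnp.array(0.5)
print(circuit(theta))                 # the expectation value, as a JAX array
print(jax.grad(circuit)(theta))       # its gradient w.r.t. the circuit parameter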
JIT-able RL environments?
Something I've been thinking about lately is the possibility of using just-in-time (JIT) compilation to compile RL environments into much faster machine code. It's not guaranteed that we'd actually see a speedup, though; experiments are necessary to determine what the benefit would be. There's also a limit here, in that it would likely be challenging to JIT any environments beyond the classic control set. But it might still be worth it, even for just those environments. If JAX could be used to implement and JIT the environment, then it may also be possible to use grad to differentiate the environment, and pmap / vmap to parallelize / vectorize it.
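To make that concrete, here's a minimal sketch of the idea: a toy pendulum-style step function I made up for illustration (not an existing environment), written as a pure function of JAX arrays so that jax.jit can compile it:

import jax
import jax.numpy as jnp

# Toy pendulum dynamics: state = (angle, angular velocity), action = torque.
# A pure function of arrays with no side effects, which is exactly what jax.jit wants.
@jax.jit
def step(state, action):
    dt, g = 0.05, 9.8
    theta, theta_dot = state
    theta_dot = theta_dot + (-g * jnp.sin(theta) + action) * dt
    theta = theta + theta_dot * dt
    reward = -(theta ** 2 + 0.1 * theta_dot ** 2)  # penalize being away from upright
    return jnp.array([theta, theta_dot]), reward

state = jnp.array([0.1, 0.0])
state, reward = step(state, 0.0)  # compiled to fast machine code on the first call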
Differentiable environments are great because their gradients can inform physics-based models of the environment for model-based RL. See this cool blog post for more.
Parallelizing and vectorizing an environment lets us run it even faster by executing many copies at once and batching them together.
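Reusing the step function and imports from the toy sketch above, grad and vmap compose with it directly:

# Differentiate the reward with respect to the action (argnums=1 picks out `action`).
reward_grad = jax.grad(lambda s, a: step(s, a)[1], argnums=1)
print(reward_grad(jnp.array([0.1, 0.0]), 0.0))

# Vectorize the step over a batch of environment copies, all advanced in one call.
batched_step = jax.vmap(step)
states = jnp.zeros((128, 2))   # 128 independent environment states
actions = jnp.zeros(128)       # one action per copy
states, rewards = batched_step(states, actions)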
NASA flew the Ingenuity helicopter on Mars
Humans have successfully designed a rotary-wing aircraft that has flown on another planet! That fact is incredible by itself, but it's even more mind-blowing when you consider that helicopters were only invented about 82 years ago. Human ingenuity is incredible, and when we work hard together, we can achieve seemingly impossible things.
Thank you for reading this post! 🙏 Don't forget to subscribe so you get an email when I write new stuff. Bye! 👋