# What programming/technical projects have you been working on?

This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?

TL;DR: I use an algorithm to choose which hero to play in Overwatch. The experimental results are inconclusive so far, because I haven't played enough matches yet.

I recently started playing Overwatch again, which gave me an excuse to also play with Multi-Armed Bandit algorithms.

The idea is that I rate every match I play on a scale from 0 to 10 and the bandit algorithm figures out which heroes are the most fun to play. Normally I'd mostly stick to the single hero I know best, but then I would never know if one of the others is more fun after some practice. The algorithm automatically figures out who to play in a (theoretically) optimal manner, taking into account the fact that games have a lot of randomness.

(Note: this isn't a terribly good idea -- the same could probably be done with a simple spreadsheet or just gut feeling -- but I used it as an excuse to learn more about the game and also play with shiny new algorithms.)

## The Multi-Armed Bandit

The Multi-Armed Bandit problem pits a hypothetical gambler against an array of slot machines ("one-armed bandits"). There are many variants of the problem formulation, but I'm concerned with the stochastic variant where pulling arms is free and each slot machine pays out a reward (possibly zero) whenever you pull its arm.

The arms of the bandits correspond to my hero choices, and the rewards are my ratings of how much fun the match was.

The MAB-problem is so interesting because it posits an exploration-exploitation dilemma -- do you pull the arm that you think is best, or do you explore a different arm in the hope of learning more about its reward distribution?

## Algorithms considered

The simplest MAB algorithm is epsilon-greedy, which pulls a random arm once in a while in order to learn about it, while pulling the empirically best arm the rest of the time. Epsilon-greedy is surprisingly good if you tune the fraction epsilon -- how often to sample a random arm -- but in general it is quite bad when left untuned.
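For illustration, a minimal epsilon-greedy sketch might look like this in Ruby (the class and names are mine, not code from my project):

```ruby
# Epsilon-greedy: with probability epsilon, explore a random arm;
# otherwise exploit the arm with the best empirical mean reward.
class EpsilonGreedy
  def initialize(arms, epsilon: 0.1)
    @epsilon = epsilon
    @totals = arms.to_h { |a| [a, 0.0] } # sum of rewards per arm
    @counts = arms.to_h { |a| [a, 0] }   # number of pulls per arm
  end

  def choose
    return @counts.keys.sample if rand < @epsilon

    # Unpulled arms get an infinite score, so each arm is tried at least once.
    @counts.keys.max_by do |a|
      @counts[a].zero? ? Float::INFINITY : @totals[a] / @counts[a]
    end
  end

  def update(arm, reward)
    @totals[arm] += reward
    @counts[arm] += 1
  end
end
```

With `epsilon: 0.0` it degenerates into pure greedy play, which shows why the untuned version is bad: it can lock onto an early lucky arm and never explore again.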

Another well-known algorithm is UCB1. It is effective (and well known, among other things, for its inclusion in the standard formulation of UCT/MCTS). It acts on the principle of "Optimism in the Face of Uncertainty" (OFU) and always pulls the arm with the highest upper confidence bound. It, too, must be tuned for best performance.

Thompson Sampling is among the best algorithms, both theoretically and empirically. It chooses an arm with probability proportional to its belief that the arm is optimal. The downside of Thompson Sampling is that it requires knowledge of the reward distribution type (Bernoulli, Gaussian, etc.) to achieve its theoretical performance.
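The UCB1 rule mentioned above can be sketched in a few lines (again an illustration, not my actual code): each arm's score is its empirical mean plus an exploration bonus, sqrt(2 ln n / n_i), that shrinks as the arm accumulates pulls.

```ruby
# UCB1: pull the arm with the highest upper confidence bound.
# totals: {arm => sum of rewards}, counts: {arm => number of pulls}
def ucb1_choose(totals, counts)
  # Any arm that has never been pulled goes first.
  untried = counts.find { |_arm, c| c.zero? }
  return untried.first if untried

  n = counts.values.sum
  counts.keys.max_by do |arm|
    mean = totals[arm] / counts[arm]
    mean + Math.sqrt(2.0 * Math.log(n) / counts[arm])
  end
end
```

The constant 2 inside the square root is the textbook choice; tuning that constant is exactly the kind of knob-twiddling UCB1 needs for best performance.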

The new-ish Subsampling Dueling Algorithms don't have that flaw. The basic idea is that you compare ("duel") two arms based on a subsample of their past rewards, and then pull the winning arm after you (mentally) duel all arms against each other.

## Hero-chooser

In the end, I chose to go with multinomial Thompson Sampling, because my rewards are bounded in `[0, 10]`, and, given a library that can generate Dirichlet random numbers, it is very easy to implement in Ruby:

## Ruby source code
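(The code block seems to have gone missing here, so the following is a sketch of how the multinomial Thompson Sampling step might look; all names are illustrative. Since the Dirichlet parameters end up being positive integers -- a prior of 1 plus a count per rating -- a Dirichlet draw can be built from sums of exponentials, without any external library.)

```ruby
# Sample Gamma(shape, 1) for a positive integer shape as a sum of
# exponential random variables.
def gamma_int(shape)
  -(1..shape).sum { Math.log(1.0 - rand) }
end

# Draw from Dirichlet(alphas) by normalizing independent gamma draws.
def dirichlet(alphas)
  draws = alphas.map { |a| gamma_int(a) }
  total = draws.sum
  draws.map { |d| d / total }
end

# ratings: {hero_name => {rating => number_of_games_with_that_rating}}
# For each hero, sample a rating distribution from the Dirichlet posterior
# (uniform prior: alpha = 1 + observed count for each rating 0..10),
# then play the hero whose sampled expected rating is highest.
def choose_hero(ratings)
  ratings.max_by do |_hero, counts|
    alphas = (0..10).map { |r| 1 + counts.fetch(r, 0) }
    probs = dirichlet(alphas)
    probs.each_with_index.sum { |p, r| p * r }
  end.first
end
```

After each match, the rating I give gets added back into the chosen hero's counts, so the posterior sharpens over time and heroes with consistently fun games get picked more often.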

where `ratings` is a nested hash of the form `{hero_name => {rating => number_of_games_with_that_rating}}`, read from a YAML file.