
What programming/technical projects have you been working on?

This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?

1 comment

  1. DataWraith

    TL;DR: I use an algorithm to choose which hero to play in Overwatch. The experimental results are inconclusive so far, because I haven't played enough matches yet.


    I recently started playing Overwatch again, which gave me an excuse to also play with Multi-Armed Bandit algorithms.

    The idea is that I rate every match I play on a scale from 0 to 10 and the bandit algorithm figures out which heroes are the most fun to play. Normally I'd mostly stick to the single hero I know best, but then I would never know whether one of the others is more fun after some practice. The algorithm automatically decides who to play in a (theoretically) optimal manner, taking into account the fact that games have a lot of randomness.

    (Note: This isn't a terribly good idea; the same could probably be done with a simple spreadsheet or just gut feeling, but I used it as an excuse to learn more about the game and also play with shiny new algorithms...)

    The Multi-Armed Bandit

    The Multi-Armed Bandit problem pits a hypothetical gambler against an array of slot machines ("one-armed bandits"). There are many variants of the problem formulation, but I'm concerned with the stochastic variant where pulling arms is free and each slot machine pays out a reward (possibly zero) whenever you pull its arm.

    The arms of the bandits correspond to my hero choices, and the rewards are my ratings of how much fun the match was.

    The MAB-problem is so interesting because it posits an exploration-exploitation dilemma -- do you pull the arm that you think is best, or do you explore a different arm in the hope of learning more about its reward distribution?

    Algorithms considered

    The simplest MAB-algorithm is Epsilon-greedy, which pulls a random arm once in a while in order to learn about it, while pulling the empirically best arm the remainder of the time. Epsilon-greedy is surprisingly good if you tune the fraction epsilon -- how often to sample a random arm -- but in general it is quite bad when not tuned.
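    For reference, here's a minimal epsilon-greedy sketch in Ruby (the counts/totals bookkeeping and the default epsilon are my own placeholders, not part of the hero-chooser below):

    # Epsilon-greedy sketch: `counts` and `totals` are per-arm Hashes tracking
    # the number of pulls and the summed rewards; `epsilon` is the exploration rate.
    def epsilon_greedy(arms, counts, totals, epsilon: 0.1)
      return arms.sample if rand < epsilon # explore: pick a random arm

      # Exploit: pick the arm with the best empirical mean (unpulled arms first).
      arms.max_by { |a| counts[a].zero? ? Float::INFINITY : totals[a].to_f / counts[a] }
    end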

    Another well-known algorithm is UCB1, which is effective (and famous, among other things, for its inclusion in the standard formulation of UCT/MCTS). It acts on the principle of "Optimism in the Face of Uncertainty" (OFU) and always pulls the arm with the highest upper confidence bound. It, too, must be tuned for best performance.
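    The UCB1 index itself is just the empirical mean plus a confidence term; a quick sketch (the exploration constant c is the part that needs tuning, and its default here is arbitrary):

    # UCB1 sketch: pull the arm maximizing mean + c * sqrt(2 * ln(N) / n_a),
    # where N is the total number of pulls and n_a the number of pulls of arm a.
    def ucb1(arms, counts, totals, c: 1.0)
      n = counts.values.sum
      arms.max_by do |a|
        next Float::INFINITY if counts[a].zero? # pull every arm once first
        mean = totals[a].to_f / counts[a]
        mean + c * Math.sqrt(2 * Math.log(n) / counts[a])
      end
    end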

    Thompson Sampling is among the best algorithms, both theoretically and empirically. It chooses an arm with probability proportional to its belief that the arm is optimal.

    The downside of Thompson Sampling is that it requires knowledge of the reward distribution type (Bernoulli, Gaussian, etc.) to achieve its theoretical performance.
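    For Bernoulli (win/loss) rewards, for example, it boils down to sampling from each arm's Beta posterior and pulling the argmax -- a minimal sketch, assuming the simple-random gem's beta method and a flat Beta(1, 1) prior:

    # Bernoulli Thompson Sampling sketch: `wins` and `losses` are per-arm counts.
    # Sample a plausible success probability from each arm's Beta posterior and
    # pull the arm whose sample is highest.
    def thompson_bernoulli(arms, wins, losses, rng)
      arms.max_by { |a| rng.beta(wins[a] + 1, losses[a] + 1) }
    end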

    The new-ish Subsampling Dueling Algorithms don't have that flaw. The basic idea is that you compare ("duel") two arm choices based on a subsample of their past rewards, and then pull the winning arm after you (mentally) duel all arms against each other.
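    This is only my loose sketch of that idea (not a faithful implementation of any particular published variant), but a single duel and a naive tournament could look like:

    # Rough subsampling-duel sketch: compare two arms on equally sized random
    # subsamples of their reward histories; the better mean wins. Assumes every
    # arm has at least one recorded reward.
    def duel(history_a, history_b)
      n = [history_a.size, history_b.size].min
      history_a.sample(n).sum / n.to_f >= history_b.sample(n).sum / n.to_f ? :a : :b
    end

    # Naive tournament: the current champion duels each challenger in turn.
    def choose_arm(histories)
      histories.keys.reduce do |champ, challenger|
        duel(histories[champ], histories[challenger]) == :a ? champ : challenger
      end
    end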

    Hero-chooser

    In the end, I chose to go with multinomial Thompson Sampling, because my rewards are bounded in [0, 10], and, given a library that can generate Dirichlet random numbers, it is very easy to implement in Ruby:

    Ruby source code
    require 'yaml'
    require 'simple-random'
    
    ratings = YAML.load(ARGF)
    
    rng = SimpleRandom.new
    rng.set_seed(rand(2**32))
    
    ranked = []
    
    ratings.each_pair do |hero, h_ratings|
      # Every hero's hash must contain all eleven ratings 0..10 with counts
      # initialized to 1 (the prior), so r is an Array of 11 counts in rating order.
      r = h_ratings.to_a.sort.map(&:last)

      # Draw one sample from the Dirichlet posterior over rating probabilities.
      sample = rng.dirichlet(*r)

      # Sampled hero score: the expected normalized rating under the sampled
      # probabilities (pick the hero with the highest score next).
      v = (0..10).to_a.zip(sample).map { |a, b| a.to_f / 10 * b }.sum

      ranked << [v, hero]
    end
    
    # Display hero names and their scores.
    ranked.sort.reverse.each do |v, hero|
      puts "#{hero}\t#{v}"
    end
    

    where ratings is a nested hash of the form {hero_name => {rating => number_of_games_with_that_rating}} read from a YAML file.
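    For illustration, such a file might look like this (the hero names and counts are made up; every rating from 0 to 10 starts at a count of 1, as noted in the code comment):

    Mercy:
      0: 1
      1: 1
      # ... ratings 2 through 9, each also starting at 1 ...
      10: 3
    Zenyatta:
      0: 1
      # ... and so on for every hero being tracked ...
      10: 2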
