5 votes

Introducing Dreamer: Scalable reinforcement learning using world models

1 comment

  1. skybrian
    From the article:

    On the benchmark of 20 tasks, Dreamer outperforms the best model-free agent (D4PG) with an average score of 823 compared to 786, while learning from 20 times fewer environment interactions. Moreover, it exceeds the final performance of the previously best model-based agent (PlaNet) across almost all of the tasks. The computation time of 16 hours for training Dreamer is less than the 24 hours required for the other methods.