OpenAI Five plays 180 years worth of games against itself every day, learning via self-play. It trains using a scaled-up version of Proximal Policy Optimization running on 256 GPUs and 128,000 CPU cores — a larger-scale version of the system we built to play the much-simpler solo variant of the game last year. Using a separate LSTM for each hero and no human data, it learns recognizable strategies. This indicates that reinforcement learning can yield long-term planning with large but achievable scale — without fundamental advances, contrary to our own expectations upon starting the project.
RL researchers (including ourselves) have generally believed that long time horizons would require fundamentally new advances, such as hierarchical reinforcement learning. Our results suggest that we haven't been giving today's algorithms enough credit — at least when they're run at sufficient scale and with a reasonable way of exploring.
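For readers who haven't met PPO, here is a minimal sketch of its clipped surrogate objective in PyTorch. This is a toy illustration of the algorithm family named in the quote, not OpenAI Five's actual training code; every name in it (`ppo_clip_loss`, `clip_eps`, the random batch) is hypothetical.

```python
# Sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
# Toy single-batch version; not OpenAI Five's distributed training code.
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Maximize the pessimistic (minimum) term; negate to get a loss.
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random numbers standing in for one batch of rollouts.
batch = 64
old_lp = torch.randn(batch)
new_lp = old_lp + 0.1 * torch.randn(batch)  # slightly shifted policy
adv = torch.randn(batch)                    # advantage estimates
print(ppo_clip_loss(new_lp, old_lp, adv).item())
```

The clipping is the whole trick: it caps how far one update can move the policy away from the one that gathered the data, which is what makes the algorithm stable enough to scale up.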
From a Hacker News comment by one of the researchers:

We are very encouraged by the algorithmic implication of this result — in fact, it mirrors closely the story of deep learning (existing algorithms at large scale solve otherwise unsolvable problems). If you have a very hard problem for which you have a simulator, our results imply there is a real, practical path towards solving it. This still needs to be proven out in real-world domains, but it will be very interesting to see the full ramifications of this finding.
In other words: given enough compute, current algorithms can tackle levels of sophistication (long time horizons, imperfect information, high-dimensional action spaces) that even experienced researchers would not have predicted. And the researcher suggests they could tackle even more sophisticated problems, as long as you have a simulator for the problem domain.
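To make the "simulator plus self-play" recipe concrete, here is a heavily simplified sketch: REINFORCE playing rock-paper-scissors against a periodically frozen copy of itself. This is an assumption-laden toy, chosen only to show the shape of the loop; it is not OpenAI Five's architecture, and all names in it are hypothetical.

```python
# Minimal self-play sketch: REINFORCE vs. a frozen past copy of itself.
# Toy illustration of "learn in a simulator by playing yourself".
import numpy as np

PAYOFF = np.array([[ 0, -1,  1],   # rock vs (rock, paper, scissors)
                   [ 1,  0, -1],   # paper
                   [-1,  1,  0]])  # scissors

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

rng = np.random.default_rng(0)
logits = np.zeros(3)           # learner's policy parameters
opponent = softmax(logits)     # frozen snapshot of the learner

for step in range(20_000):
    probs = softmax(logits)
    a = rng.choice(3, p=probs)      # learner's move
    b = rng.choice(3, p=opponent)   # past self's move
    reward = PAYOFF[a, b]
    # REINFORCE: push up the log-prob of the sampled action, scaled by reward.
    grad = -probs
    grad[a] += 1.0
    logits += 0.01 * reward * grad
    if step % 500 == 0:             # periodically refresh the opponent
        opponent = softmax(logits)

# Self-play dynamics roughly orbit the uniform Nash mix (1/3, 1/3, 1/3).
print(np.round(softmax(logits), 2))
```

The simulator is what makes the 180-years-per-day figure possible: experience is as cheap as CPU time, so raw scale can substitute for algorithmic novelty.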
One implication would be that we can solve protein folding.
Indeed, we did with AlphaFold 2, two years ago.
Predicting receptor-binding behavior is the next step.