AlphaGo is one of our favorite topics to write about. Not only are games of strategy interesting in and of themselves, but pitting human masters against a computer in one of the most complex board games ever invented is thrilling stuff. Part of what made AlphaGo so interesting is the way the computer came to win; or more precisely, how it learned the game and then formulated a winning strategy. The original AlphaGo was fed a massive trove of data from previous games, and it eventually formulated winning strategies by extrapolating from that data. But then came AlphaGo Zero, which relied on an approach to machine learning called reinforcement learning and used no human game data at all; that allowed it not only to dominate human players, but to dominate the original incarnation of AlphaGo as well.
Reinforcement learning differs from some other types of machine learning in that it’s primarily focused on how a software agent ought to take actions within a given environment so as to maximize some notion of cumulative reward. In other words, the agent is playing a game and trying to win it by refining its choices over time, to the point where it can anticipate winning strategies based on both its past experience and its predictions of what will happen next.
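To make the agent, environment, and reward loop a little more concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm. Everything in it is an assumption made for illustration: the six-cell corridor, the reward of 1.0 for reaching the last cell, the function names, and the hyperparameters. It has nothing to do with how AlphaGo itself was built.

```python
import random

N_STATES = 6        # corridor cells 0..5; reaching cell 5 ends an episode
ACTIONS = (-1, 1)   # step left or step right


def step(state, action):
    """Hypothetical environment: reward 1.0 only for reaching the last cell."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: usually exploit the best known action, sometimes explore."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    # Break ties at random so the untrained agent does not favor one direction.
    return rng.choice([i for i, v in enumerate(q[state]) if v == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] estimates the long-run reward of that choice.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: immediate reward plus discounted best future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][a] += alpha * (reward + future - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for each non-terminal cell."""
    return [ACTIONS[q[s].index(max(q[s]))] for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the greedy policy should step right from every cell, since that is the shortest route to the reward. The agent is never told this; it works it out from the rewards it collects while trying things.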
The original AlphaGo learned from prior games and then tried to predict or compute optimal moves in the present.
AlphaGo Zero, however, had no human biases baked in. By that I mean, if every game the computer learns from was played between two humans, human strategy is baked into the computer’s takeaways, because human play is what the computer is modeling its behavior after.
With reinforcement learning, however, the computer can not only learn from past games, but it can also simply be taught the rules of the system and then figure out the game for itself.
In this way, AlphaGo Zero taught itself to play Go, then played it against itself over and over again until it perfected it, then proceeded to beat AlphaGo senseless because Zero could play the game free from human encumbrances.
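The idea of learning a game purely from its rules by playing against yourself can be sketched on something far smaller than Go. The example below is a toy and not AlphaGo Zero’s actual method (which paired deep neural networks with tree search). It assumes a simple subtraction game: players alternate taking 1 to 3 stones from a pile, and whoever takes the last stone wins. All names and settings here are hypothetical.

```python
import random

MAX_TAKE = 3    # a move removes 1, 2, or 3 stones
MAX_PILE = 12   # largest starting pile used during training


def legal_moves(pile):
    return range(1, min(MAX_TAKE, pile) + 1)


def best_move(q, pile):
    """Move with the highest learned value for the player about to act."""
    return max(legal_moves(pile), key=lambda a: q[(pile, a)])


def self_play(episodes=20000, alpha=0.5, epsilon=0.5, seed=0):
    rng = random.Random(seed)
    # One shared table: both "players" are the same learner.
    q = {(n, a): 0.0 for n in range(1, MAX_PILE + 1) for a in legal_moves(n)}
    for _ in range(episodes):
        pile = rng.randint(1, MAX_PILE)
        while pile > 0:
            if rng.random() < epsilon:
                a = rng.choice(list(legal_moves(pile)))  # explore
            else:
                a = best_move(q, pile)                   # exploit
            remaining = pile - a
            if remaining == 0:
                target = 1.0  # taking the last stone wins
            else:
                # The opponent moves next, so their best outcome is our worst.
                target = -max(q[(remaining, b)] for b in legal_moves(remaining))
            q[(pile, a)] += alpha * (target - q[(pile, a)])
            pile = remaining
    return q


if __name__ == "__main__":
    q = self_play()
    for pile in range(1, MAX_PILE + 1):
        print(pile, best_move(q, pile))
```

No human games are involved, only the rules and the win condition. Even so, the learner should arrive at the known winning strategy for this game: whenever the pile is not a multiple of 4, take enough stones to leave a multiple of 4 for the opponent.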
A startup called CogitAI has developed a platform that lets companies apply reinforcement learning to novel and disparate use cases. According to the MIT Technology Review, “CogitAI was founded by several smart AI experts, including Peter Stone, a professor at the University of Texas. Rich Sutton, one of the fathers of reinforcement learning, is an advisor.” Stone said the platform lets systems apply what they learned in one situation to a new one, a first step toward “lifelong learning” for AI programs. “The platform has all of the cutting-edge RL algorithms and then some of our steps toward continual learning,” he said.
As more major companies open up their machine learning expertise and platforms, more of the potential benefits can and will trickle down to other enterprises. The more tools we gain access to, the wider the adoption and the deeper the benefits become. We’re excited to see what interesting and unique problems researchers and developers set reinforcement learning against next.