IBM's Deep Blue vs Google's AlphaGo & Garry Kasparov
On February 17th, 1996, world chess champion Garry Kasparov finished a chess match against IBM's Deep Blue. After six tiring games, Kasparov beat Deep Blue 4-2, taking home $400,000 in prize money from IBM. Humans had beaten machines once again, closing the first full match between Deep Blue and a reigning world champion.
This gave relief to many people who feared an eventual computer uprising, but it did not stop the team at IBM from optimizing their algorithms and hardware. After a year-long overhaul and upgrade, the new program beat Kasparov in a rematch in May 1997, becoming the first computer system to beat a reigning world champion in a standard chess match.
After the match, Deep Blue retired with all the glory it deserved.
23 years later, Deep Blue’s power is far inferior to that of modern computers, and the algorithm itself has become less impressive: even a talented undergraduate student could write one on their own for a capstone project. But its influence lingers throughout the history of AI.
The 1990s were a terrible time for artificial intelligence. Limited by the hardware of the time, artificial intelligence could do little to solve real problems. Fancy programs like neural networks and natural language processing were just fantasies on PowerPoint slides.
In fact, most AIs at that time were linear logic machines, like the NPCs in the RPGs we played in childhood: they seemed intelligent, but they were just following a written script. Such scripts couldn’t solve real-world problems or generate economic value for investors.
So AI needed a win; computer scientists needed a win; most importantly, IBM needed a win. This was why IBM invested 10 million USD in a team that had started at Carnegie Mellon University to develop Deep Blue and, in 1997, send it to play against the best human.
After the win, many thought the era of humans was on a countdown and the singularity, the point when machines overtake humans, was just around the corner. This influenced mainstream media, from the success of The Matrix to the mandatory appearance of AI assistants in sci-fi movies. More importantly, however, it sparked tremendous interest in the artificial intelligence field, attracting billions of dollars of investment over the next decades.
So, what is Deep Blue anyway? Did it really mark the uprising of AIs?
Deep Blue’s name came from combining the name of its prototype, “Deep Thought”, developed back at CMU, with IBM’s nickname, “Big Blue”. The project started at CMU under the name “ChipTest”, by three graduate students: Feng-hsiung Hsu, Murray Campbell, and Thomas Anantharaman. ChipTest grew into Deep Thought, and after graduation the team agreed to work with IBM to further their designs, which eventually turned into the Deep Blue that we know.
In 1985, Hsu started his design with a 3-micrometer VLSI architecture to build a single-chip chess move generator, which he then connected to a chess game circuit designed by Anantharaman. Murray Campbell joined the team months later, bringing his professional chess knowledge.
In 1988, IBM sponsored the project, and the first Deep Thought program was finished. Deep Thought 0.01 used state-of-the-art programs to optimize its rating of chess moves. Six months later, Deep Thought 0.02 was released; powered by two special VLSI chess processors, it could calculate 720,000 moves per second.
In May 1989, Deep Thought 0.02 won its first world championship among chess programs. Meanwhile, the three designers graduated from their PhD programs and joined IBM. This was also when Deep Thought first faced world champion Garry Kasparov, who won both games easily.
In 1993, the project was renamed Deep Blue, with IBM hoping that the project’s media exposure would boost the company’s publicity. Three years later, IBM hired chess grandmaster Joel Benjamin to join the Deep Blue team, but Deep Blue still lost its second encounter with Kasparov in February 1996.
One year later, the upgraded Deep Blue beat Kasparov for the first time in that well-known match, and the rest is history.
Kasparov lost fair and square. Over the preceding nine years, Deep Blue had evolved from calculating only 720,000 moves per second, a speed that would already put most chess players in despair, to 200,000,000 moves per second.
This was powered by 480 special VLSI chess processors, an upgrade in both quality and quantity from its first match with Kasparov. Kasparov could see about 10 moves ahead, while Deep Blue could search and rank possible outcomes 12 moves ahead, an advantage that piled up and led to the victory.
What shocked people even more was the growth of its power: going from 720 thousand to 200 million moves per second took the machine only 9 years. Neither Kasparov nor any other human could ever match that rate of improvement, which seemed to point to machines eventually overpowering humans.
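To put that growth in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the two speeds and the 9-year span quoted above, and it assumes, purely for illustration, that the speed-up was spread evenly across those years.

```python
import math

START_SPEED = 720_000        # moves per second, Deep Thought 0.02 (figure from the article)
END_SPEED = 200_000_000      # moves per second, 1997 Deep Blue (figure from the article)
YEARS = 9                    # time span quoted in the article


def growth_summary(start, end, years):
    """Return (overall factor, average yearly factor, doubling time in years).

    Assumes smooth exponential growth, which is a simplification.
    """
    factor = end / start
    yearly = factor ** (1 / years)
    doubling_years = math.log(2) / math.log(yearly)
    return factor, yearly, doubling_years


factor, yearly, doubling_years = growth_summary(START_SPEED, END_SPEED, YEARS)
print(f"overall speed-up: {factor:.0f}x")
print(f"average yearly factor: {yearly:.2f}x")
print(f"doubling time: {doubling_years:.2f} years")
```

Under that even-growth assumption, the machine became roughly 278 times faster overall, which works out to almost doubling its speed every year.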
Yet why are we still not living in the Matrix as batteries for Skynet?
The answer lies in the fundamental algorithm of Deep Blue. As a logic machine, it had to strictly follow preset rules. All it did was calculate the possible outcomes of different moves and rank them by the advantage they would bring in the game.
Therefore, it was still thinking like a human, and it beat Kasparov with brute force: its superior computation power.
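The idea of calculating outcomes and ranking moves can be sketched in Python with a classic look-ahead search (minimax with alpha-beta pruning). This is not Deep Blue's code: the toy game below (take 1 to 3 stones from a pile, whoever takes the last stone wins) and every function name are stand-ins chosen only to keep the example short.

```python
def legal_moves(pile):
    """Toy game rule: a player may remove 1, 2 or 3 stones from the pile."""
    return [take for take in (1, 2, 3) if take <= pile]


def evaluate(pile):
    """Stand-in for a hand-written scoring rule, used when the search runs out
    of depth. 0.0 simply means 'unclear'; a chess program would score material,
    king safety and so on here."""
    return 0.0


def negamax(pile, depth, alpha=-1.0, beta=1.0):
    """Score a position for the player to move: +1 forced win, -1 forced loss."""
    if pile == 0:
        return -1.0              # the previous player took the last stone and won
    if depth == 0:
        return evaluate(pile)    # too deep to keep calculating: fall back on the rule
    best = -1.0
    for take in legal_moves(pile):
        # A position that is good for the opponent is equally bad for us.
        score = -negamax(pile - take, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                # pruning: no need to examine the remaining moves
    return best


def best_move(pile, depth):
    """Rank every legal move by its look-ahead score and return the top one."""
    scored = [(-negamax(pile - take, depth - 1), take) for take in legal_moves(pile)]
    return max(scored)[1]


print(best_move(10, depth=12))
```

Every step is a fixed rule written by a person. The only way to make such a program stronger is to search deeper or faster, or to hand it a better scoring rule.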
But many real-world problems involve too many confounding factors to be solved by calculation alone. Go, for example, is far too complex for an AI to simply read the complete event tree 10 moves ahead. Therefore, AlphaGo used a completely different approach to solve this problem.
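A rough calculation shows how quickly the event tree explodes. The branching factors below (about 35 legal moves per turn in chess, about 250 in Go) are commonly cited approximations rather than figures from this article, and the count ignores the pruning tricks a real program would use.

```python
CHESS_BRANCHING = 35    # assumed average number of legal moves per turn in chess
GO_BRANCHING = 250      # assumed average number of legal moves per turn in Go
STEPS_AHEAD = 10        # how many consecutive moves we try to read


def tree_size(branching, steps):
    """Number of move sequences if every position offers `branching` choices."""
    return branching ** steps


chess_lines = tree_size(CHESS_BRANCHING, STEPS_AHEAD)
go_lines = tree_size(GO_BRANCHING, STEPS_AHEAD)

print(f"chess: {chess_lines:.2e} sequences")
print(f"go:    {go_lines:.2e} sequences")
print(f"go tree is about {go_lines // chess_lines:,} times larger")
```

With these assumed numbers, reading the same distance ahead in Go means examining hundreds of millions of times more sequences than in chess, which is why raw speed alone does not carry over.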
People would comment on Deep Blue’s moves as brilliant or the best moves possible, yet they would be left speechless in awe by AlphaGo’s moves because they had probably never thought of those possibilities.
This marked the difference between Deep Blue and AlphaGo: AlphaGo can come up with things that no human has even thought of.
This is thanks to the combination of three components: a policy network, a value network, and a Monte Carlo tree search. Each one of them alone can match amateur human players (in Asian Go culture, the term “amateur” is actually used for advanced players, as seen below), but the combination of all three was able to beat the world champion by a large margin.
The major difference between Deep Blue and AlphaGo is training: AlphaGo’s programs are able to analyze data and deduce optimal rules from it.
The policy network consisted of three parts. First, there was a supervised learning policy network based on real human behavior. This is the most traditional machine learning method: the network was trained on millions of positions from games played online by human players, learning which move the human chose in each one. This is the most human-like part of AlphaGo: the program doesn’t care about winning the game; rather, it only cares about mimicking the best human players. This part decides where AlphaGo would place its next piece on the board.
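As a minimal illustration of how such a network "decides" on a move, assume it produces one raw score per candidate point and converts the scores into probabilities with a softmax. The board points and scores in this Python sketch are made up; a real policy network would compute them from the entire board.

```python
import math


def softmax(scores):
    """Turn raw move scores into probabilities that sum to 1."""
    top = max(scores.values())      # subtracting the max keeps exp() numerically stable
    exps = {move: math.exp(score - top) for move, score in scores.items()}
    total = sum(exps.values())
    return {move: value / total for move, value in exps.items()}


# Hypothetical raw scores for three candidate points (not real AlphaGo output).
raw_scores = {"D4": 2.0, "Q16": 1.0, "K10": -1.0}

move_probs = softmax(raw_scores)
most_likely = max(move_probs, key=move_probs.get)
print(most_likely, round(move_probs[most_likely], 2))
```

Supervised training then amounts to nudging the scores so that the move the human actually played receives a higher probability.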
The second part is the reinforcement learning policy network. Starting from the algorithm that tries to mimic the best human players, the researchers had the program play against itself for millions of games. As a result, the algorithm would deduce subtle rules about which moves have the highest probability of winning and which do not.
After millions of games, the machine can tell you, based on data from previous games, which move would generate the best outcome for a given board position. Notably, this program doesn’t try to predict any moves ahead but rather just analyzes the outcomes based on the location of each piece.
This differs from Deep Blue and even from human players, as we win by trying to plan ahead. This network alone is as powerful as a lower-ranked amateur player, a rather advanced level in Go. This part cross-checks the first part of the algorithm on the next move.
The third network is a fast reading network. Since the Go board is too big, AlphaGo tends to focus only on the regions where the opponent placed their previous piece and where the program would place its own next piece. It sacrifices the holistic view of the board but increases the processing speed by 1000 times. This part helps AlphaGo run faster.
The second program is called the value network. Instead of trying to tell the machine where to put the next piece, it tries to give all possible positions a value based on the probability of winning. It does not analyze outcomes but only tells you whether a position is good or bad. This speeds up the process by helping AlphaGo skip the bad moves altogether.
As shown in the graph, the positions with the darkest color are the best places for the players to put their next piece, per AlphaGo. It is not a move analysis network but a location analysis network. This part helps AlphaGo analyze the overall situation in a way no human would ever use.
With the two networks for policy (strategy) and value (position), AlphaGo can then use a Monte Carlo tree search to predict the outcomes of its moves.
First, it selects some potential moves based on the first policy network’s result. Then, it uses a Monte Carlo tree search to predict the possible outcomes. Meanwhile, the value network rates the positions.
So now AlphaGo has two versions of potential event trees, one from Monte Carlo and one from the value network. It then rates the outcomes by AlphaGo’s probability of winning and picks the highest one.
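The steps above can be sketched in a few lines of Python. This is a simplified illustration, not DeepMind's implementation: the move names, priors and evaluations are invented, the exploration bonus is a simplified form of the one usually described for this kind of search, and the equal blend of value-network estimate and Monte Carlo result is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class MoveStats:
    prior: float               # probability suggested by the policy network
    visits: int = 0            # how often the search has tried this move
    total_value: float = 0.0   # sum of all evaluations seen for this move

    @property
    def mean_value(self):
        return self.total_value / self.visits if self.visits else 0.0


def leaf_value(value_net_estimate, rollout_result, mix=0.5):
    """Blend the two opinions about a position (mix=0.5 is an assumption)."""
    return (1 - mix) * value_net_estimate + mix * rollout_result


def select_move(stats, exploration=1.0):
    """Pick the move with the best average value plus a prior-driven bonus."""
    total_visits = sum(s.visits for s in stats.values())

    def score(item):
        _, s = item
        bonus = exploration * s.prior * math.sqrt(total_visits + 1) / (1 + s.visits)
        return s.mean_value + bonus

    return max(stats.items(), key=score)[0]


# Hypothetical numbers: the policy network likes A, but B actually plays out better.
stats = {"A": MoveStats(prior=0.6), "B": MoveStats(prior=0.3), "C": MoveStats(prior=0.1)}
value_net = {"A": 0.2, "B": 0.7, "C": -0.5}    # made-up value-network ratings
rollouts = {"A": 0.0, "B": 1.0, "C": -1.0}     # made-up Monte Carlo results

for _ in range(50):
    move = select_move(stats)
    stats[move].visits += 1
    stats[move].total_value += leaf_value(value_net[move], rollouts[move])

# Final choice here: the most-visited move, a common convention in this kind of search.
best = max(stats, key=lambda m: stats[m].visits)
print(best, {m: s.visits for m, s in stats.items()})
```

In this toy run the search starts with the move the policy network prefers, then shifts most of its effort to B once the evaluations come in. That is the interplay between the policy suggestion and the two evaluations described above.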
So, AlphaGo differs from Deep Blue by not trying to predict all potential outcomes.
Based on the millions of games it studied from humans and played against itself, it forms its own view of the situation and generates the best outcome based on its own rules. Unlike Deep Blue or other old-fashioned game AIs, AlphaGo does not follow preset rules.
Simply put, if we look back at Deep Blue’s games, we can easily understand the motive behind each move, as the machine can show us the outcomes it calculated and why a given move leads to the best of them.
But when AlphaGo makes a move, we sometimes don’t understand why, and AlphaGo cannot explain it to us.
AlphaGo can make a move that seems like a mistake but proves crucial 30 moves later, whereas Deep Blue can only make the move that seems best within 12 moves ahead. In short, we taught Deep Blue how to play chess and it beat us by calculating faster; we showed AlphaGo the rules of Go and examples of human play, and it taught itself the rest; now it is better than humans and we do not know why.
This is why, although Deep Blue may have attracted far more attention, AlphaGo is the one that will actually revolutionize everything around us.
Written by Tianyi Li
Edited by Jeremy Knopp & Alexander Fleiss