AlphaFold Makes A Scientific Breakthrough
While election news dominates the front pages of the mainstream media, a scientific breakthrough powered by Artificial Intelligence has gone almost unnoticed by the general public.
In 2018, Google’s DeepMind introduced an algorithm called AlphaFold. It aimed to support scientific discovery with Artificial Intelligence by predicting the 3D structures of proteins from their amino acid sequences.
This year, its successor, AlphaFold 2, defeated all other competitors in the Critical Assessment of Protein Structure Prediction (CASP) contest.
Impressively, its accuracy rivaled that of the experimental methods commonly used in laboratories, such as Cryo-EM, nuclear magnetic resonance (NMR), and X-ray crystallography.
DeepMind made its name by defeating humans at games such as chess, Go, and StarCraft II. However, the goal of AI was never to play games. Rather, games provide a training ground where algorithms can be developed before they are applied to real-world problems.
Protein folding has been one of the most important challenges in the study of biology.
A protein is a large biological molecule made up of chains of amino acids, and the 3D structure it folds into determines its function. Knowing how a protein folds therefore helps us predict what it does. Many of the world’s challenges, from diabetes to the COVID-19 pandemic, could be tackled more effectively by understanding the roles of these proteins and mass-producing the ones we need.
Traditional methods to analyze proteins take years and millions of dollars of investment.
X-ray crystallography has been used to determine protein structures since the 1950s, when Sir John Cowdery Kendrew worked out the structure of sperm whale myoglobin, a finding for which he shared the Nobel Prize in Chemistry in 1962.
Researchers irradiate a crystallized protein with X-rays and then determine the location of each atom from the resulting diffraction pattern.
Using this method, scientists have determined the structures of more than 130,000 proteins and linked those structures to specific functions. Thanks to technological advances over the last ten years, researchers can now also use a newer technology called cryogenic electron microscopy (Cryo-EM), which is powerful enough to give a clear image of a protein’s structure.
However, this is far from enough. A typical protein could in theory fold into an estimated 10^300 possible structures (for comparison, there are only about 10^80 atoms in the observable universe), and roughly 200 million proteins are already known across all forms of life. The structures determined so far are just a small fraction of that total.
Additionally, what researchers have been doing, even with the help of modern technologies and billions of dollars in funding, is still an old-fashioned trial-and-error approach. Scientists are merely preparing proteins one by one and observing them individually. Given the sheer number of proteins, identifying the structures of just those 200 million known proteins this way may take forever.
So, a better way to study proteins is necessary. Thankfully, in 1972, Nobel laureate Christian Anfinsen put forward his famous hypothesis: in theory, a protein’s amino acid sequence should fully determine its folded structure.
This hypothesis marked the start of a half-century-long quest: if the one-dimensional sequence of amino acids determines the three-dimensional structure of the final product, it should be possible to compute that structure from the sequence alone. In practice, this has not been an easy task.
The difficulty comes from the complicated process a protein goes through before it reaches its final structure, a process called protein folding. Folding can take only a fraction of a second, and a small change in the amino acid sequence may cause a major change in both the folding process and the eventual structure. Therefore, despite the incredible development of computers over the last two decades, traditional computational methods have proven inadequate for accurately predicting the final structures of proteins.
Artificial Intelligence was then used to solve this problem.
Fortunately, the cost of genetic sequencing has dropped drastically, providing plenty of data on the one-dimensional amino acid sequences of proteins. This opens the door for artificial intelligence to tackle the problem. AlphaFold is one such algorithm: it relies on deep neural networks trained to predict protein structures from sequence data. It can predict the distances between amino acid residues and the angles of the chemical bonds that connect them.
To build AlphaFold, the DeepMind team trained a neural network on a public database of about 170,000 known protein structures and their sequences until it could predict the 3D structures of proteins from their amino acid sequences.
First, AlphaFold uses the neural network to predict the distances between pairs of amino acid residues and the angles between the chemical bonds that connect them. Then, it adjusts the draft structure to find the most energy-efficient arrangement.
The program took 2 weeks to predict its first protein structure, but now it takes merely a few hours.
DeepMind also trained a neural network to predict a distribution over the possible distances between each pair of amino acid residues, which conveys how confident each prediction is. A separate network was trained to estimate how close a proposed structure is to the actual one, using all of the distances in the protein together. The end result is a score for the accuracy of the model.
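To make the idea of a distance distribution with a confidence level concrete, here is a minimal Python sketch. It is not DeepMind's code: the bin centers, the raw scores, and the function names are all hypothetical, chosen only to show how a set of scores over distance bins can be turned into probabilities, an expected distance, and a confidence value for one pair of residues.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def summarize_distance_distribution(scores, bin_centers):
    """Return (expected distance, most likely distance, confidence) for one residue pair."""
    probs = softmax(scores)
    expected = sum(p * c for p, c in zip(probs, bin_centers))
    best = max(range(len(probs)), key=lambda i: probs[i])
    # The probability of the most likely bin serves as a simple confidence level.
    return expected, bin_centers[best], probs[best]


# Hypothetical values: four distance bins (in Angstroms) and made-up network scores.
bin_centers = [4.0, 8.0, 12.0, 16.0]
scores = [0.0, 2.0, 0.0, 0.0]

expected, most_likely, confidence = summarize_distance_distribution(scores, bin_centers)
print(f"expected distance: {expected:.2f} A")
print(f"most likely bin:   {most_likely:.1f} A (confidence {confidence:.2f})")
```

A sharply peaked distribution gives a confidence close to 1, while a flat one signals that the network is unsure about that pair of residues.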
Then, the team used these scores to improve the prediction. Their first method, built on techniques commonly used in structural biology, repeatedly replaced small pieces of the protein structure with new fragments. A neural network was trained to invent those fragments, raising the score of the projection one piece at a time.
This traditional method produced adequate results but was immensely complicated. The team then took advantage of gradient descent, a classic machine learning technique that improves a result through many small, incremental adjustments. Applied to the entire protein chain rather than to separate pieces, it allowed the algorithm to optimize the projected structure as a whole.
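The gradient descent step can be illustrated with a toy example. The Python sketch below is an assumption-laden illustration, not AlphaFold itself: it treats each residue as a single point, uses a made-up four-point reference structure to generate target distances, and uses a simple squared-error score in place of DeepMind's learned one. All function names are hypothetical. It shows how nudging every coordinate a little at a time pulls the whole structure toward the predicted distances at once.

```python
import math
import random


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def loss(coords, targets):
    """Sum of squared differences between current and target pairwise distances."""
    return sum((distance(coords[i], coords[j]) - d) ** 2
               for (i, j), d in targets.items())


def gradient_step(coords, targets, learning_rate):
    """Move every point a small step down the gradient of the loss."""
    grads = [[0.0, 0.0, 0.0] for _ in coords]
    for (i, j), target in targets.items():
        current = distance(coords[i], coords[j])
        if current == 0.0:
            continue  # direction is undefined when two points coincide
        scale = 2.0 * (current - target) / current
        for k in range(3):
            g = scale * (coords[i][k] - coords[j][k])
            grads[i][k] += g
            grads[j][k] -= g
    return [[c - learning_rate * g for c, g in zip(point, grad)]
            for point, grad in zip(coords, grads)]


def random_coords(n, seed):
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]


def refine(coords, targets, steps=2000, learning_rate=0.01):
    for _ in range(steps):
        coords = gradient_step(coords, targets, learning_rate)
    return coords


# Hypothetical reference structure (Angstroms), used only to produce
# a consistent set of target distances for the demonstration.
reference = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 3.8]]
targets = {(i, j): distance(reference[i], reference[j])
           for i in range(4) for j in range(i + 1, 4)}

start = random_coords(len(reference), seed=0)
final = refine(start, targets)
initial_loss = loss(start, targets)
final_loss = loss(final, targets)
print(f"loss before: {initial_loss:.3f}, loss after: {final_loss:.6f}")
```

Because every coordinate is updated in each step, the structure is improved as a whole rather than fragment by fragment, which is the point the paragraph above makes.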
Critical Assessment of Protein Structure Prediction
In 1994, Professor John Moult and Professor Krzysztof Fidelis founded CASP as a biennial blind assessment to catalyze research, monitor progress, and establish the state of the art in protein structure prediction. The competition is regarded as the Olympics of the protein structure prediction field, and DeepMind took part this year with its new AlphaFold 2. The goal is to predict the structures of proteins from their amino acid sequences.
The structures of the target proteins have recently been solved experimentally but not yet published, so the assessors can compare each contestant’s predictions with the actual structures to determine the winner. AlphaFold 2 won again with a median score of 92.4 GDT (Global Distance Test, scored from 0 to 100), corresponding to an average error of approximately 1.6 Angstroms (0.16 nanometers).
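For readers curious about how a GDT score is calculated, here is a simplified Python sketch. It follows the commonly used GDT_TS definition, the average percentage of residues that land within 1, 2, 4, and 8 Angstroms of their experimental positions. It assumes the two structures have already been superimposed, whereas the real assessment searches for the best superposition, and the coordinates below are invented for illustration.

```python
import math


def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-aligned lists of residue positions (Angstroms)."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and the same length")
    deviations = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues within each distance cutoff of the experimental position.
    fractions = [sum(d <= cutoff for d in deviations) / len(deviations)
                 for cutoff in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Invented coordinates: four residues displaced by 0.5, 1.5, 3.0 and 10.0 Angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 10.0, 0.0)]

score = gdt_ts(predicted, experimental)
perfect = gdt_ts(experimental, experimental)
print(f"GDT_TS: {score:.2f} (a perfect prediction scores {perfect:.0f})")
```

In this toy case the fractions are 1/4, 2/4, 3/4, and 3/4, giving a score of 56.25. A median of 92.4 therefore means that nearly every residue sits within a very small distance of its experimentally determined position.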
This success demonstrates the potential of AI in scientific research.
AlphaGo and AlphaZero mastered complex games and outperformed humans in those fields, so it is very promising that AI could help humans solve challenges like protein folding on a large scale in the future.
For example, many diseases, including cancer, are linked to structural changes in the proteins of the human body.
Experimental cancer treatments use proteins to target cancer cells specifically. They have proven effective but so far are applicable only on a small scale. This stems from our extremely limited understanding of proteins: traditionally, identifying even one protein’s structure would take scientists years of work and millions of dollars.
AlphaFold did it within a couple of hours, with an error of about the width of one atom.
This could dramatically accelerate our study of cells and viruses, including the coronavirus family.
Coronaviruses carry crown-like spikes, called spike glycoproteins, on their surface, which are responsible for binding the virus to human cells. By working out the structures of proteins like these, researchers could understand how the virus infects cells and design specific drugs to block them.
This is why even the CEO of Google, Sundar Pichai, expressed his excitement and applause for this achievement:
“DeepMind’s incredible AI-powered protein folding breakthrough will help us better understand one of life’s fundamental building blocks and enable researchers to tackle new and hard problems, from fighting diseases to environmental sustainability.”
Written by Tianyi Li
Edited by Alexander Fleiss, Calvin Ma & Jeremy Knopp