What is the purpose of deep learning?
Deep learning allows machines to solve complex problems even when using a data set that is very diverse, unstructured and inter-connected.
The more data deep learning models learn from, the better they perform.
Deep Learning has taken the world by storm.
It has impacted nearly every technical field, ranging from computer vision to quantum chemistry to quantitative finance.
But it turns out, no one really understands why deep learning works.
Deep Neural Networks (DNNs) work so well that some people may ask “why ask why?” As software engineers, we use tools all the time without understanding how they work under the hood. In some sense, this is the entire promise of software engineering.
And one of the magical properties of DNNs is that they can be trained for one task, and then handed off to someone else to use as is or fine-tune for their own purpose. But someone somewhere has to train these large models, and this is hard.
Take, for example, the OpenAI GPT3 language model. GPT3 has revolutionized natural language processing (NLP), and it is useful for everything from detecting fake news to generating fake text.
But OpenAI is keeping GPT3 close to the vest and has not released it to the public. It has been estimated that retraining GPT3 yourself would cost $4.6 million. If we could understand why deep learning works, we could find ways to reduce the training costs by making DNNs smaller and more efficient.
Moreover, how does one test a model that generates fake text?
While we can create statistical proxies, in the end, one has to pay people to read and judge the quality of the results. And, at scale, this can also be very expensive and time-consuming.
For these reasons and more, we have developed the open source Python tool, weightwatcher.
pip install weightwatcher
Weightwatcher is a tool practitioners can use to evaluate their DNNs. It analyzes the weight matrices of a DNN, telling the user how well trained the model is and providing warnings if something is wrong.
Specifically, weightwatcher uses ideas from Theoretical Physics and Random Matrix Theory (RMT) to measure how much correlation a model has learned from the data.
DNNs that generalize better capture more correlations.
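To make the RMT idea concrete, here is a minimal NumPy sketch. It is not weightwatcher's actual implementation, and the function names, the toy matrices, and the 10% slack factor are all assumptions made for illustration. It compares the eigenvalue spectrum of a layer's weight matrix against the Marchenko-Pastur bulk edge expected for a purely random matrix, and counts the eigenvalues that escape the bulk as a crude proxy for learned correlations.

```python
import numpy as np


def eigenvalue_spectrum(weights):
    """Eigenvalues of X = W^T W / N for an N x M weight matrix W."""
    n_rows = weights.shape[0]
    singular_values = np.linalg.svd(weights, compute_uv=False)
    return singular_values ** 2 / n_rows


def mp_upper_edge(n_rows, n_cols, variance=1.0):
    """Largest eigenvalue expected for a purely random matrix (Marchenko-Pastur)."""
    return variance * (1.0 + np.sqrt(n_cols / n_rows)) ** 2


def count_outliers(weights, variance=1.0, slack=1.1):
    """Count eigenvalues above the random-matrix bulk edge (slack is a hypothetical margin)."""
    edge = mp_upper_edge(*weights.shape, variance=variance)
    return int(np.sum(eigenvalue_spectrum(weights) > slack * edge))


rng = np.random.default_rng(0)
n_rows, n_cols = 400, 120

# A layer with no learned structure: i.i.d. Gaussian entries with unit variance.
random_layer = rng.normal(size=(n_rows, n_cols))

# A toy stand-in for a trained layer: the same noise plus three strong
# rank-one correlations (an assumption for illustration, not real trained weights).
signal = sum(
    4.0 * np.outer(rng.normal(size=n_rows), rng.normal(size=n_cols)) / np.sqrt(n_cols)
    for _ in range(3)
)
trained_layer = random_layer + signal

print("outliers (random): ", count_outliers(random_layer))
print("outliers (trained):", count_outliers(trained_layer))
```

In this toy setup the random layer should show no eigenvalues beyond the bulk edge, while the perturbed layer should show one outlier per injected correlation. Real layers are messier, so treat this only as intuition for what "capturing correlations" looks like in a spectrum.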
It can also be used to denoise a model, making it possible to predict the model's test accuracy without having any test data. And it can make pretrained models easier to fine-tune when applying transfer learning.