What is meant by self-supervised learning?

Artificial Intelligence & Machine Learning

Scatterplot featuring a linear support vector machine's decision boundary (dashed line)

Original: Alisneaky Vector: Zirguezi – Own work based on: Kernel Machine.png

Yann LeCun posted on LinkedIn:

“Everything you ever wanted to know about Self-Supervised Learning but were afraid to ask.

A giant cookbook of SSL recipes.

By a large crowd from Meta-FAIR with various academic collaborators led by Randall Balestriero and Mark Ibrahim.”

From the paper:

“Our goal is to lower the barrier to entry into SSL research by laying the foundations and latest SSL recipes in the style of a cookbook. We hope to empower the curious researcher to navigate the terrain of methods, understand the role of the various knobs, and gain the know-how required to explore how delicious SSL can be.”


Self-supervised learning (SSL) is a type of machine learning in which a model learns to solve a task by exploiting the inherent structure of the data itself, without explicit labels. It has gained popularity in recent years because it can learn from large unlabeled datasets, which are far easier to obtain than labeled ones.
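To make the idea concrete, here is a minimal sketch of a pretext task (masked-token prediction, in the spirit of BERT-style pretraining): the training "labels" are simply tokens hidden from the input, so supervision comes from the data itself. The function name and token list below are illustrative, not from any particular library.

```python
def make_masked_examples(tokens, mask="[MASK]"):
    """Turn a raw token sequence into (input, target) pairs.

    The 'label' for each example is a token hidden from the input,
    so no external annotation is needed: the data supervises itself.
    """
    examples = []
    for i, tok in enumerate(tokens):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        examples.append((masked, tok))
    return examples

pairs = make_masked_examples(["self", "supervised", "learning"])
# pairs[1] -> (["self", "[MASK]", "learning"], "supervised")
```

A model trained to fill in the mask must learn something about how the tokens relate, which is exactly the representation we want to reuse downstream.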

The history of SSL can be traced back to the early days of artificial intelligence and neural networks, when unsupervised learning was already being explored. A closely related idea, "self-taught learning," was introduced in a 2007 paper from Andrew Ng's group as a way to exploit unlabeled data for training. In the following years, several other SSL approaches were proposed, such as autoencoders, contrastive learning, and generative models.

The scientific basis of SSL is leveraging the inherent structure and patterns in the data to learn useful representations that can then be used for downstream tasks. In contrast to supervised learning, where the model is given explicit labels and trained to predict them, SSL uses the data itself as the source of supervision. For example, in contrastive learning the model learns to distinguish pairs of similar examples from dissimilar ones without being explicitly told which is which.
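The contrastive idea can be sketched with an InfoNCE-style loss (the objective used, for instance, in SimCLR-like methods). This is a toy illustration in plain Python, with made-up 2-D embeddings; the vectors and temperature value are assumptions for the example, not from the paper quoted above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss.

    Low when the anchor is close to its positive (an augmented view
    of the same example) and far from the negatives (other examples).
    """
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    exps = [math.exp(s / temperature) for s in sims]
    return -math.log(exps[0] / sum(exps))

anchor   = [1.0, 0.0]
positive = [0.9, 0.1]   # augmented view of the same example
negative = [0.0, 1.0]   # a different example

loss_aligned  = info_nce(anchor, positive, [negative])
loss_mismatch = info_nce(anchor, negative, [positive])
# loss_aligned is much smaller than loss_mismatch: the objective
# rewards pulling matching pairs together and pushing others apart.
```

Minimizing this loss over many pairs is what shapes the learned embedding space, with no human-provided labels anywhere in the process.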

One key advantage of SSL is that it allows models to learn from large amounts of unlabeled data, which is often much easier to obtain than labeled data. This has led to significant improvements in a wide range of tasks, from computer vision to natural language processing. However, SSL is still an active area of research, and there are many open questions about how best to design and train self-supervised models.

Self-Supervised Learning: Definition, Tutorial & Examples (v7labs.com)

Self-supervised learning – Wikipedia

Self-Supervised Learning and Its Applications (neptune.ai)

Self-supervised learning: The dark matter of intelligence (facebook.com)

Yann LeCun On Reinforcement Learning, Deep Learning & Self Driving (rebellionresearch.com)