Deep learning systems can achieve remarkable, even super-human performance through supervised learning on large, labeled datasets. However, there are two problems: First, collecting ever more labeled data is expensive in both time and money. Second, these deep neural networks perform well on the task they were trained on, but they cannot easily generalize to other, related tasks, or they need large amounts of data to do so. In this blog post, Yann LeCun and Ishan Misra of Facebook AI Research (FAIR) describe the current state of Self-Supervised Learning (SSL) and argue that it is the next step in the development of AI that uses fewer labels and can transfer knowledge faster than current systems. As a promising direction, they suggest building non-contrastive latent-variable predictive models, like VAEs, but ones that also provide high-quality latent representations for downstream tasks.
OUTLINE:
0:00 - Intro & Overview
1:15 - Supervised Learning, Self-Supervised Learning, and Common Sense
7:35 - Predicting Hidden Parts from Observed Parts
17:50 - Self-Supervised Learning for Language vs Vision
26:50 - Energy-Based Models
30:15 - Joint-Embedding Models
35:45 - Contrastive Methods
43:45 - Latent-Variable Predictive Models and GANs
55:00 - Summary & Conclusion
Paper (Blog Post): https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence
My Video on BYOL: https://www.youtube.com/watch?v=YPfUiOMYOEE
ERRATA:
- The difference between loss and energy: Energy is for inference, loss is for training.
- The R(z) term is a regularizer that restricts the capacity of the latent variable. I think I said both of those things, but never together.
- The way I explain why BERT is contrastive is wrong. I haven’t figured out why just yet, though :)
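To make the first two errata points concrete, here is a minimal toy sketch. It is not taken from the blog post or the video: the quadratic energy, the scalar weight `w`, the strength `lam`, and all function names are assumptions chosen only for illustration. Inference minimizes the energy over the latent variable z, training minimizes a loss over observed pairs, and R(z) = lam * z**2 limits how much z is allowed to explain.

```python
# Toy latent-variable energy-based model. Everything here is a hypothetical
# illustration, not the formulation used in the blog post.
# E(x, y, z) = (y - (w*x + z))**2 + R(z), with R(z) = lam * z**2.


def energy(x, y, z, w=2.0, lam=1.0):
    """Joint energy: prediction error plus the regularizer R(z)."""
    return (y - (w * x + z)) ** 2 + lam * z ** 2


def infer_energy(x, y, w=2.0, lam=1.0):
    """Inference: minimize the energy over z (closed form for this toy case)."""
    z_star = (y - w * x) / (1.0 + lam)
    return energy(x, y, z_star, w, lam)


def loss(pairs, w, lam=1.0):
    """Training: average inferred energy of observed (x, y) pairs.

    Minimizing this over w only pushes down the energy of observed pairs;
    in this toy setup it is R(z) that keeps the energy from being flat.
    """
    return sum(infer_energy(x, y, w, lam) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Without R(z), z can absorb any error, so every y gets zero energy.
    print(infer_energy(1.0, 5.0, lam=0.0))
    # With R(z), a y that does not fit x keeps a positive energy.
    print(infer_energy(1.0, 5.0, lam=1.0))
    # The loss is lower for the w that matches the observed pairs.
    data = [(1.0, 2.0), (2.0, 4.0)]
    print(loss(data, w=2.0), loss(data, w=1.0))
```

With `lam=0.0` the latent variable has unlimited capacity and the energy collapses to zero for every y, which is the failure mode the regularizer is there to prevent.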
Video approved by Antonio.
Abstract:
We believe that self-supervised learning (SSL) is one of the most promising ways to build such background knowledge and approximate a form of common sense in AI systems.