Chapter 01

See the Hidden World

Your brain is a world-model machine. Give it two flat images and it conjures depth that exists in neither one alone. That trick — recovering hidden structure from raw views, with nobody telling you the answer — is exactly what self-supervised learning is after.

CONCEPT 1.1

Random-dot stereograms

Two images of pure noise. Shift a hidden region of dots sideways between the left and right views, fuse the pair by crossing your eyes, and a 3D shape floats out. No edges, no texture, no labels. The depth lives only in the relationship between the two views.

Drag the slider to change how far the hidden region is shifted (the disparity). The reveal panel recovers the shape, brighter as the signal gets stronger.
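To make the construction concrete, here is a minimal sketch of how such a pair could be built and how a "reveal" could be scored. It is an illustration under stated assumptions, not the code behind the lab: the function names `make_stereogram` and `match_rate`, the square hidden region, and the pixel-matching score are all hypothetical choices.

```python
import random

def make_stereogram(size=32, disparity=3, seed=0):
    """Build a left/right pair of noise images hiding a shifted square.

    Returns (left, right, mask); mask marks the hidden square in the left view.
    Assumes 0 <= disparity <= size // 4 so the shift stays inside the image.
    """
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]          # start as an exact copy
    lo, hi = size // 4, 3 * size // 4         # bounds of the hidden square
    mask = [[lo <= r < hi and lo <= c < hi for c in range(size)]
            for r in range(size)]
    for r in range(lo, hi):
        for c in range(lo, hi):               # slide the square to the left
            right[r][c - disparity] = left[r][c]
        for c in range(hi - disparity, hi):   # refill the uncovered strip
            right[r][c] = rng.randint(0, 1)
    return left, right, mask

def match_rate(left, right, mask, shift, inside):
    """Fraction of pixels (inside or outside the mask) that match at `shift`."""
    hits = total = 0
    for r, row in enumerate(left):
        for c in range(shift, len(row)):
            if mask[r][c] == inside:
                total += 1
                hits += left[r][c] == right[r][c - shift]
    return hits / total

left, right, mask = make_stereogram(disparity=3)
inside = match_rate(left, right, mask, shift=3, inside=True)    # 1.0 by construction
outside = match_rate(left, right, mask, shift=3, inside=False)  # close to chance
```

Each image on its own is plain noise. Only the comparison at the right shift separates the hidden square from the background, which is the sense in which the depth lives in the relationship between the views.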

🌍 Everyday analogy: remember those “Magic Eye” posters from the ’90s? A speckly, repeating pattern, and if you relax your eyes a hidden dolphin pops out in 3D, because each eye lines up with a slightly different copy of the pattern. Your two eyes supply the two views; your brain is the network that finds the depth they secretly share.
💡 Key idea: useful structure can be hidden in the agreement between two views — waiting to be discovered without a single label.
🎯 Challenge: push the disparity up until the reveal locks in, then click the shape you see.
Stereogram lab
CONCEPT 1.2

What is a world model?

A world model is a system that predicts what happens next. Watch a ball bounce for a moment and you instinctively know where it'll be in a second — you're running a tiny physics simulator in your head.

Toggle the model on. Faint “ghost” dots show where the model thinks the ball will travel. The better its internal model of gravity and walls, the closer the ghosts hug reality.
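The ghost dots can be sketched as a rollout of a tiny physics model. The following is a minimal illustration, assuming a point-mass ball, a fixed time step, and perfectly elastic walls and floor; `step`, `rollout`, `mean_error`, and all parameter values are hypothetical names and numbers chosen for the example, not taken from the sandbox.

```python
import math

def step(state, dt=0.05, gravity=-9.8, width=1.0):
    """Advance (x, y, vx, vy) by one time step with gravity and bouncing."""
    x, y, vx, vy = state
    vy += gravity * dt
    x += vx * dt
    y += vy * dt
    if x < 0.0:                 # bounce off the left wall
        x, vx = -x, -vx
    elif x > width:             # bounce off the right wall
        x, vx = 2 * width - x, -vx
    if y < 0.0:                 # bounce off the floor
        y, vy = -y, -vy
    return (x, y, vx, vy)

def rollout(state, steps, **physics):
    """Return the next `steps` predicted (x, y) positions: the ghost dots."""
    ghosts = []
    for _ in range(steps):
        state = step(state, **physics)
        ghosts.append(state[:2])
    return ghosts

def mean_error(ghosts, reality):
    """Average distance between predicted and actual positions."""
    return sum(math.dist(g, p) for g, p in zip(ghosts, reality)) / len(reality)

start = (0.2, 1.0, 0.8, 0.0)
reality = rollout(start, 40)                     # stands in for the real ball
good_ghosts = rollout(start, 40)                 # model with the right gravity
bad_ghosts = rollout(start, 40, gravity=-4.9)    # model that underestimates gravity
```

Comparing `mean_error(good_ghosts, reality)` with `mean_error(bad_ghosts, reality)` shows the point of the paragraph above: the model with the correct gravity tracks the ball exactly, while the one with the wrong gravity drifts away from it.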

🌍 Everyday analogy: a pool player chalks the cue and, before striking, sees in their mind exactly where the balls will scatter. That little mental simulator — “if this, then that” — is a world model. We all run one constantly; AI is just learning to build its own.
💡 Key idea: intelligence leans heavily on prediction. The whole rest of this course is about learning good predictive representations — without hand-labelling the world.
Predict-the-future sandbox
CONCEPT 1.3

Supervised vs. self-supervised

The obvious approach: feed the network both views and ask it to output the depth. That's supervised learning — and it only works if a human first labelled the correct depth for every example. Labels are slow, costly, and run out fast.

In the simulator, data streams in. With supervised learning you must hand-label each item (you have a limited budget). Self-supervised learning instead invents its own task from the data's structure — so it can learn from all of it.
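The difference in how much data each approach can use can be sketched in a few lines. This is only an illustration under assumptions: `human_label` stands in for a paid annotator, the budget of 3 is arbitrary, and "predict the next item" is just one example of a task invented from the data's own structure, not necessarily the one the simulator uses.

```python
def supervised_examples(stream, human_label, budget):
    """Pair items with human labels until the labelling budget runs out."""
    return [(item, human_label(item)) for item in stream[:budget]]

def self_supervised_examples(stream):
    """Use each item's successor as its target, so no human label is needed."""
    return list(zip(stream[:-1], stream[1:]))

stream = [0.0, 0.1, 0.4, 0.9, 1.6, 2.5, 3.6, 4.9]    # unlabelled observations
labelled = supervised_examples(stream, human_label=lambda x: x > 1.0, budget=3)
pretext = self_supervised_examples(stream)
```

Here `labelled` stops at three examples because the budget is spent, while `pretext` yields a training pair for every adjacent pair of items in the stream.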

🌍 Everyday analogy: supervised learning is paying tutors to hand-write the answer on the back of every flashcard — accurate, but you go broke fast and run out of cards. Self-supervised learning is a toddler with a box of blocks: nobody labels anything, yet by stacking and knocking them over the child learns gravity, balance and shape. The world is its own answer key.
💡 Key idea: the real question of this whole field — can a network learn useful structure from the relationship between views, with no labels at all?
The labelling-budget game
BRIDGE → CH.2

So… can a machine do this?

Your brain finds the hidden depth automatically. But what if we want a machine to do the same — with no labels, on millions of examples, across any kind of data?

The key insight from Chapter 1: useful structure hides in the relationship between two views of the same thing. Neither view alone contains it. Together, they do.

In 1992, Becker & Hinton had a simple idea: take two neural networks, feed each one a different view, and reward them when their outputs agree. If only shared structure drives the agreement, that structure gets discovered — no labels, no human input.

It sounds almost too simple. And it is — there's a catastrophic loophole waiting. Chapter 2 shows both the idea and the trap.
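A toy calculation hints at where the loophole lies. The sketch below scores agreement as a mean squared difference, which is an assumption made for illustration and not Becker & Hinton's actual objective; the two "networks" are replaced by their outputs, and all names are hypothetical.

```python
import random

def agreement_loss(out_a, out_b):
    """Mean squared disagreement between two networks' outputs (0 = perfect)."""
    return sum((a - b) ** 2 for a, b in zip(out_a, out_b)) / len(out_a)

def variance(values):
    """Population variance: how much the outputs vary across examples."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(0)
shared = [rng.uniform(-1, 1) for _ in range(200)]     # hidden shared structure
view_a = [s + rng.gauss(0, 0.1) for s in shared]      # view 1: signal plus noise
view_b = [s + rng.gauss(0, 0.1) for s in shared]      # view 2: signal plus other noise

# "Honest" networks pass their view through and agree only up to the noise.
honest_loss = agreement_loss(view_a, view_b)

# "Collapsed" networks ignore their input and always output the same number.
collapsed_a = [0.0 for _ in view_a]
collapsed_b = [0.0 for _ in view_b]
collapsed_loss = agreement_loss(collapsed_a, collapsed_b)
```

The collapsed pair agrees perfectly, so it beats the honest pair on this score, yet `variance(collapsed_a)` is zero: its outputs say nothing about the input. Agreement alone cannot tell discovered structure from a useless constant.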

💡 The thread: every method in this course is a variation on the same goal — learn from agreement between views, without collapsing to a useless constant.