Memory bank & the staleness problem
Idea: store an embedding for every image in the dataset and reuse those stored vectors as negatives, so there is no need to re-encode thousands of images each step. Each entry is refreshed only when its image next appears in a batch. Suddenly you have a huge dictionary almost for free.
But there's a catch. The encoder changes every step, while most stored embeddings were produced by older versions of it, some of them up to a full epoch earlier. They go stale, inconsistent with the current encoder, and stale negatives give noisy gradients. The longer training runs, the more encoder updates separate each stored vector from the encoder it is compared against.
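To make the staleness concrete, here is a minimal sketch of a memory bank, assuming one stored vector per dataset index and a refresh only when that index appears in a batch. The class `MemoryBank` and its methods are hypothetical names for illustration, and the "embeddings" are placeholder numbers rather than real encoder outputs. The `staleness` method reports how many encoder updates ago each entry was written.

```python
import random


class MemoryBank:
    """Hypothetical toy memory bank: one stored embedding per dataset index."""

    def __init__(self, num_items, dim, seed=0):
        rng = random.Random(seed)
        # Random initial vectors stand in for entries that have not been encoded yet.
        self.vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_items)]
        # Training step at which each entry was last written (-1 means never).
        self.written_at = [-1] * num_items

    def update(self, indices, embeddings, step):
        # Only the images in the current batch are refreshed; every other entry
        # keeps the vector produced by whatever encoder existed when it was written.
        for i, emb in zip(indices, embeddings):
            self.vectors[i] = list(emb)
            self.written_at[i] = step

    def sample_negatives(self, k, exclude, rng):
        # Reuse stored vectors as negatives instead of re-encoding those images.
        candidates = [i for i in range(len(self.vectors)) if i not in exclude]
        chosen = rng.sample(candidates, k)
        return chosen, [self.vectors[i] for i in chosen]

    def staleness(self, step):
        # Number of encoder updates since each entry was produced (None = never written).
        return [step - w if w >= 0 else None for w in self.written_at]


# One pass over 8 items in batches of 2, one encoder update per batch.
bank = MemoryBank(num_items=8, dim=4)
batch_size = 2
for step in range(4):
    idx = list(range(step * batch_size, (step + 1) * batch_size))
    fake_embeddings = [[float(step)] * 4 for _ in idx]  # placeholder for encoder output
    bank.update(idx, fake_embeddings, step)
print(bank.staleness(step=4))  # [4, 4, 3, 3, 2, 2, 1, 1]
```

Entries written at the start of the pass are the oldest by the end of it, which is exactly the inconsistency described above.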
MoCo: momentum encoder + a queue
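MoCo's two moves. First, a momentum encoder: the key encoder's weights are an exponential moving average (EMA) of the query encoder's weights, and only the query encoder is trained by backpropagation. Because the key encoder changes slowly, keys encoded over many recent steps stay consistent with one another. Second, a FIFO queue: each step the newest batch of keys is enqueued and the oldest batch is dropped.
This decouples the number of negatives from the batch size: every query is compared against the whole queue, not just the current batch. Raising the momentum keeps the queued keys more consistent, and lengthening the queue adds more negatives; lowering the momentum lets the key encoder change faster, so the dictionary starts to wobble.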
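Both moves fit in a few lines. The sketch below assumes encoder weights are flattened into plain Python lists (real code would loop over tensors) and uses `collections.deque` with `maxlen` as the FIFO. The names `momentum_update` and `KeyQueue` are hypothetical, and the string "keys" are labels standing in for key embeddings.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """EMA step, in place: key <- m * key + (1 - m) * query.

    The key encoder receives no gradients; this is its only update.
    """
    for i, (k, q) in enumerate(zip(key_params, query_params)):
        key_params[i] = m * k + (1.0 - m) * q


class KeyQueue:
    """Hypothetical FIFO dictionary of keys; deque(maxlen=K) drops the oldest automatically."""

    def __init__(self, max_keys):
        self.keys = deque(maxlen=max_keys)

    def enqueue_batch(self, batch_keys):
        # Newest keys enter on the right; once full, the leftmost (oldest) fall out.
        self.keys.extend(batch_keys)

    def negatives(self):
        return list(self.keys)


query_params = [1.0, 1.0]
key_params = [0.0, 0.0]
momentum_update(key_params, query_params, m=0.75)
print(key_params)  # [0.25, 0.25]

queue = KeyQueue(max_keys=4)
for batch in (["k0", "k1"], ["k2", "k3"], ["k4", "k5"]):
    queue.enqueue_batch(batch)
print(queue.negatives())  # ['k2', 'k3', 'k4', 'k5']
```

With the queue length a multiple of the batch size, as in this example, each enqueue pushes out exactly one whole old batch.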
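The decoupling shows up directly in the loss. Below is a sketch of an InfoNCE-style loss for a single query, assuming L2-normalized embeddings, the positive key placed at index 0, and one logit per queued key; τ = 0.07 is MoCo's published default, while the function name `info_nce_single` is hypothetical. The number of negatives is simply the queue length, whatever the batch size was.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_single(query, positive_key, queue_keys, tau=0.07):
    """Cross-entropy over [positive, queued negatives] with the target at index 0.

    Returns (loss, number_of_negatives).
    """
    q = l2_normalize(query)

    def logit(key):
        return sum(a * b for a, b in zip(q, l2_normalize(key))) / tau

    logits = [logit(positive_key)] + [logit(k) for k in queue_keys]
    # Log-sum-exp with the max subtracted, for numerical stability.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0], len(logits) - 1


queue_keys = [[0.0, 1.0], [-1.0, 0.0], [0.6, 0.8]]
loss_aligned, n_neg = info_nce_single([1.0, 0.0], [1.0, 0.0], queue_keys)
loss_misaligned, _ = info_nce_single([1.0, 0.0], [0.0, 1.0], queue_keys)
```

Growing `queue_keys` adds negatives without touching the batch, which is the whole point of the queue.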
Why an EMA teacher and not just a copy?
If the key encoder were an exact copy of the query encoder, it would jump every single step and all your stored negatives would instantly become stale again: back to square one. The exponential moving average m·old + (1−m)·new (old: the key encoder's weights, new: the query encoder's), with m close to 1 (MoCo's default is 0.999), makes the teacher drift so gently that tens of thousands of negatives encoded over many recent steps still “speak the same language.” That's what lets MoCo keep a giant dictionary (65,536 keys by default) without a giant batch: the queue holds the negatives, the slow teacher keeps them comparable.
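A little arithmetic shows how gentle the drift is. Under the EMA, each step the key encoder closes only a fraction (1 − m) of the gap to the query encoder, and a snapshot of the query weights from n steps ago carries weight (1 − m)·mⁿ, which halves every ln 0.5 / ln m steps. The helper name `ema_half_life` is hypothetical; an exact copy corresponds to m = 0, where the key closes 100% of the gap every step.

```python
import math


def ema_half_life(m):
    """Steps until a query-weight snapshot's influence on the EMA halves: solve m**n = 0.5."""
    return math.log(0.5) / math.log(m)


for m in (0.9, 0.99, 0.999):
    print(
        f"m={m}: key closes {1 - m:.1%} of the gap per step, "
        f"half-life ≈ {ema_half_life(m):.0f} steps"
    )
```

The half-lives come out to roughly 7, 69, and 693 steps, so with m = 0.999 keys enqueued hundreds of steps apart were still produced by nearly the same encoder.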
SimCLR: augmentations & the similarity matrix
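Two implementation details turn out to matter enormously. (1) Strong, random augmentations (random crop and resize, blur, colour jitter), which quietly define what the model should learn to ignore. (2) A learnable projection head, a small MLP that maps the encoder's representation into the space where the contrastive loss is applied; after training, the head is discarded and the representation before it is used downstream.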
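Here is a toy version of both details on a tiny grayscale "image" stored as a list of lists. It is a sketch under loud assumptions: `random_view` and `projection_head` are hypothetical helpers, the crop is not resized back to full size, blur is omitted, and a brightness gain stands in for colour jitter. Real pipelines would use an image library's transforms instead.

```python
import random


def random_view(image, crop, rng):
    """Toy augmentation: random crop, random horizontal flip, brightness jitter."""
    h, w = len(image), len(image[0])
    top = rng.randrange(h - crop + 1)
    left = rng.randrange(w - crop + 1)
    view = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        view = [row[::-1] for row in view]  # horizontal flip
    gain = rng.uniform(0.6, 1.4)  # brightness jitter, a stand-in for colour jitter
    return [[min(1.0, px * gain) for px in row] for row in view]


def projection_head(h, w1, w2):
    """Two-layer MLP g(h) = W2 · relu(W1 · h); the contrastive loss sees g(h), not h."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, h))) for row in w1]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w2]


rng = random.Random(0)
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]
view_1 = random_view(image, crop=6, rng=rng)
view_2 = random_view(image, crop=6, rng=rng)  # a different random view of the same image
z = projection_head([1.0, -1.0], w1=[[1.0, 0.0], [0.0, 1.0]], w2=[[1.0, 1.0]])
```

The two views share content but differ in position, orientation, and brightness; those are the differences the encoder is pushed to ignore.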
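For a batch of N images we produce 2N augmented views and build a 2N × 2N cosine-similarity matrix between all their embeddings. For each view, the cell pairing it with the other view of the same image is its positive; the remaining 2N − 2 cells in its row (excluding itself) are negatives. Training pushes the positive similarities up relative to the negatives.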
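The matrix and the loss built on it can be sketched as follows. This assumes the two views of image k sit at rows 2k and 2k + 1 (so `i ^ 1` finds the partner), uses an illustrative temperature τ = 0.5, and gives the function the hypothetical name `nt_xent`; it is a compact SimCLR-style loss, not a reference implementation.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def similarity_matrix(embeddings):
    """Cosine similarity between every pair of the 2N view embeddings."""
    z = [normalize(v) for v in embeddings]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def nt_xent(embeddings, tau=0.5):
    """Average loss over all 2N views; assumes rows 2k and 2k+1 are views of image k."""
    sim = similarity_matrix(embeddings)
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        pos = i ^ 1  # partner view: 0<->1, 2<->3, ...
        # Denominator: the positive plus all 2N-2 negatives, never the view itself.
        logits = [sim[i][j] / tau for j in range(n) if j != i]
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denom - sim[i][pos] / tau
    return total / n


aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # views agree with partners
mismatched = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # views agree with strangers
loss_aligned = nt_xent(aligned)
loss_mismatched = nt_xent(mismatched)
```

When each view is most similar to its own partner the loss is low; when it resembles another image's view instead, the loss rises sharply.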