Chapter 07

World Models & Planning

Once a model can predict how a scene evolves, you can ask the deepest question of all: if the agent does this, what happens next? Turn that imagination into action and you get planning — searching for the moves that reach a goal, entirely inside a learned model.

CONCEPT 7.1

Action-conditioned world model

"Action-conditioned" means the prediction depends not just on the current state but on the agent's chosen action. Encode the current frame, pick an action, and the predictor outputs the resulting future embedding.

In the demo, firing the thrusters draws a ghost trail: the model imagining where the agent ends up before it actually moves. Different action → different predicted future.
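To make "different action → different predicted future" concrete, here is a minimal sketch of an action-conditioned predictor. Everything in it is an assumption for illustration: the names `encode` and `predict_next`, the embedding size, and the random weights that stand in for what a trained model would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8    # size of the latent embedding (assumed)
ACTION_DIM = 2   # e.g. thrust along x and y (assumed)
FRAME_DIM = 16   # a tiny flattened "frame" (assumed)

# Random stand-ins for learned weights; a real model would train these.
W_enc = rng.normal(size=(EMBED_DIM, FRAME_DIM)) * 0.1
W_z = rng.normal(size=(EMBED_DIM, EMBED_DIM)) * 0.1
W_a = rng.normal(size=(EMBED_DIM, ACTION_DIM))


def encode(frame):
    """Map a flattened frame to an embedding."""
    return np.tanh(W_enc @ frame)


def predict_next(z, action):
    """Predict the next embedding from the current embedding AND the action."""
    return np.tanh(W_z @ z + W_a @ action)


frame = rng.normal(size=FRAME_DIM)
z_now = encode(frame)

# Same starting state, two different actions, two different imagined futures.
z_left = predict_next(z_now, np.array([-1.0, 0.0]))
z_right = predict_next(z_now, np.array([1.0, 0.0]))
```

The point of the sketch is only the signature: the predictor takes the action as an input, so changing the action changes the predicted embedding.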

🌍 Everyday analogy: a chess player muttering “if I take the knight, then they’ll check me, then…” — running moves in their head before touching a single piece. An action-conditioned world model is that inner board: feed it a move, it shows you the imagined consequence.
💡 Key idea: a world model lets you try actions in imagination — the prerequisite for planning without touching the real world.
CONCEPT 7.2 · 7.3

Planning with the cross-entropy method

To find a good plan we roll out candidate action sequences through the model and score how close each gets to the goal. The cross-entropy method (CEM) is a beautifully simple "guess & improve" search:

① Sample many random action sequences.
② Roll each one out through the model and score it.
③ Keep the best few: the elite set.
④ Refit a Gaussian to the elites and sample again.

Repeat, and the plans zero in on the goal. Model predictive control (MPC) wraps this search in a loop: execute only the first action of the best plan, observe the new state, then plan again from there.
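The four steps above can be sketched in a few lines of NumPy. This is a toy under stated assumptions, not a reference implementation: the "model" is a made-up 2-D world where each action is a displacement, and `cem_plan`, `rollout_cost`, and all hyperparameters are hypothetical choices for illustration.

```python
import numpy as np

START = np.zeros(2)
GOAL = np.array([3.0, -2.0])


def rollout_cost(plan):
    """Toy model: each action shifts the position; cost is final distance to goal."""
    final = START + plan.sum(axis=0)
    return float(np.linalg.norm(final - GOAL))


def cem_plan(cost_fn, horizon, action_dim,
             n_samples=200, n_elite=20, n_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        # 1. Sample many action sequences from the current Gaussian.
        noise = rng.normal(size=(n_samples, horizon, action_dim))
        plans = mean + std * noise
        # 2. Roll each one out through the model and score it.
        costs = np.array([cost_fn(p) for p in plans])
        # 3. Keep the lowest-cost plans: the elite set.
        elite = plans[np.argsort(costs)[:n_elite]]
        # 4. Refit the Gaussian to the elites (small floor keeps std positive).
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean


best_plan = cem_plan(rollout_cost, horizon=5, action_dim=2)
final_cost = rollout_cost(best_plan)
idle_cost = rollout_cost(np.zeros((5, 2)))
```

Here the Gaussian is diagonal (one mean and one standard deviation per action entry), which is a common simplification. Comparing `final_cost` with `idle_cost` shows how far the search has moved the plan toward the goal.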

🌍 Everyday analogy: darts, blindfolded. Throw a big handful at random, peek at where the best few landed, then aim your next handful around those spots. Repeat and the cluster marches onto the bullseye. MPC is throwing one dart, then re-aiming completely before the next — essential when the wind (the real world) keeps shifting.
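The "re-aim before every dart" idea can be sketched by comparing two ways of using the same planner when the real world has wind the model does not know about. All of this is an illustrative assumption: the 2-D world, the `WIND` vector, the per-step summed-distance cost, and the helper names are hypothetical.

```python
import numpy as np

GOAL = np.array([3.0, -2.0])
WIND = np.array([0.3, 0.2])  # assumed disturbance, unknown to the planner's model


def model_step(pos, action):
    """The planner's model: position moves by the action (works on batches too)."""
    return pos + action


def real_step(pos, action):
    """Stand-in for the real world: the same dynamics plus unmodelled wind."""
    return pos + action + WIND


def imagined_cost(pos, plans):
    """Roll every candidate plan through the model, summing distance to goal."""
    positions = np.tile(pos, (len(plans), 1))
    total = np.zeros(len(plans))
    for t in range(plans.shape[1]):
        positions = model_step(positions, plans[:, t])
        total += np.linalg.norm(positions - GOAL, axis=1)
    return total


def cem_plan(pos, horizon, rng, n_samples=200, n_elite=20, n_iters=6):
    """Compact CEM; each action entry is clipped to [-1, 1]."""
    mean, std = np.zeros((horizon, 2)), np.ones((horizon, 2))
    for _ in range(n_iters):
        noise = rng.normal(size=(n_samples, horizon, 2))
        plans = np.clip(mean + std * noise, -1.0, 1.0)
        elite = plans[np.argsort(imagined_cost(pos, plans))[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean


N_STEPS = 8
rng = np.random.default_rng(0)

# Open loop: plan once, then execute the whole plan blind.
pos = np.zeros(2)
for action in cem_plan(pos, N_STEPS, rng):
    pos = real_step(pos, action)
open_loop_error = float(np.linalg.norm(pos - GOAL))

# MPC: re-plan at every step, executing only the first action each time.
pos = np.zeros(2)
for _ in range(N_STEPS):
    first_action = cem_plan(pos, 4, rng)[0]
    pos = real_step(pos, first_action)
mpc_error = float(np.linalg.norm(pos - GOAL))
```

In the open-loop run the wind accumulates over all eight steps. In the MPC run each re-plan starts from where the agent actually is, so the drift is corrected as it appears. Because the model still knows nothing about the wind, MPC limits the error rather than removing it entirely.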
Go deeper: planning in a learned latent space

Classic model-predictive control needs an engineer to hand-write the physics (“the arm weighs X, the friction is Y”). Action-conditioned JEPA replaces that with a learned dynamics model that predicts the next embedding, and scores a plan by how close its predicted final embedding lands to the goal embedding. The cross-entropy method is just the search that finds good action sequences inside that imagined space. Trained on roughly 62 hours of unlabelled robot video, such a model is enough to plan real manipulation tasks, with no reward labels and no physics equations.
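The scoring rule described above (distance between the predicted final embedding and the goal embedding) might look like the following sketch. The predictor is a random-weight stand-in, the L1 distance is one reasonable choice picked for illustration, and `predict_next`, `rollout`, and `latent_cost` are hypothetical names. In a real system `z_start` and `z_goal` would come from encoding the current and goal images.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM, ACTION_DIM = 8, 2  # assumed sizes

# Random stand-ins for a learned latent dynamics model.
W_z = rng.normal(size=(EMBED_DIM, EMBED_DIM)) * 0.3
W_a = rng.normal(size=(EMBED_DIM, ACTION_DIM))


def predict_next(z, action):
    """One imagined step in latent space."""
    return np.tanh(W_z @ z + W_a @ action)


def rollout(z, actions):
    """Apply the predictor once per action and return the final embedding."""
    for action in actions:
        z = predict_next(z, action)
    return z


def latent_cost(z_start, actions, z_goal):
    """L1 distance between the predicted final embedding and the goal embedding."""
    return float(np.abs(rollout(z_start, actions) - z_goal).sum())


z_start = rng.normal(size=EMBED_DIM)

# For the demo, build a reachable goal by rolling out a known plan.
true_plan = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
z_goal = rollout(z_start, true_plan)

cost_true = latent_cost(z_start, true_plan, z_goal)
cost_idle = latent_cost(z_start, np.zeros((3, ACTION_DIM)), z_goal)
```

A search such as CEM only needs `latent_cost` as its scoring function. No reward signal or physics equation appears anywhere in the cost, which is the sense in which the planning happens entirely in the learned latent space.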

💡 Key idea: Action-conditioned JEPA plans inside a learned latent space — no hand-coded physics. Same idea, now driving real robots.