Jay Taylor's notes


Andrej Karpathy on X: "Move 37" is the word of the day - it's when an AI, trained via the trial-and-error process of reinforcement learning, discovers actions that are new, surprising, and secretly brilliant even to expert humans.

Original source (x.com)
Tags: machine-learning ai llm andrej-karpathy move-37 word-of-the-day x.com
Clipped on: 2025-08-27


Conversation

"Move 37" is the word-of-day - it's when an AI, trained via the trial-and-error process of reinforcement learning, discovers actions that are new, surprising, and secretly brilliant even to expert humans. It is a magical, just slightly unnerving, emergent phenomenon only achievable by large-scale reinforcement learning. You can't get there by expert imitation. It's when AlphaGo played move 37 in Game 2 against Lee Sedol, a weird move that was estimated to only have 1 in 10,000 chance to be played by a human, but one that was creative and brilliant in retrospect, leading to a win in that game. We've seen Move 37 in a closed, game-like environment like Go, but with the latest crop of "thinking" LLM models (e.g. OpenAI-o1, DeepSeek-R1, Gemini 2.0 Flash Thinking), we are seeing the first very early glimmers of things like it in open world domains. The models discover, in the process of trying to solve many diverse math/code/etc. problems, strategies that resemble the internal monologue of humans, which are very hard (/impossible) to directly program into the models. I call these "cognitive strategies" - things like approaching a problem from different angles, trying out different ideas, finding analogies, backtracking, re-examining, etc. Weird as it sounds, it's plausible that LLMs can discover better ways of thinking, of solving problems, of connecting ideas across disciplines, and do so in a way we will find surprising, puzzling, but creative and brilliant in retrospect. It could get plenty weirder too - it's plausible (even likely, if it's done well) that the optimization invents its own language that is inscrutable to us, but that is more efficient or effective at problem solving. The weirdness of reinforcement learning is in principle unbounded. I don't think we've seen equivalents of Move 37 yet. I don't know what it will look like. I think we're still quite early and that there is a lot of work ahead, both engineering and research. But the technology feels on track to find them. https://youtube.com/watch?v=HT-UZkiOLv8
Every single stunning example of creativity in AI comes from reinforcement learning -- Ilya
By its very definition, we won’t see Move 37 coming. It will only be clear in hindsight, once the game is over.
I feel like DeepSeek definitely tried to evoke that, not so low key.
> "cognitive strategies" - things like approaching a problem from different angles, trying out different ideas, finding analogies, backtracking, re-examining, etc A few years from now we will look back and think how crazy it was that we were manually doing things like CoT,
We taught some superhuman chess moves from AlphaZero to grandmasters some time ago (https://arxiv.org/abs/2310.16410), directly motivated by Move 37. We wanted to see if we could teach that magic to humans. One of them ended up winning the world chess championship :)
"RL doesn't care about your feelings". I don't think humans are ready to face that in math and coding yet.
> "that the optimization invents its own language that is inscrutable to us"
A weak version of this seemed to happen with R1-Zero, where the output wasn't easily readable and mixed languages.
Reminds me of Pólya's "How to Solve It", but the models figure it out on the fly.
Feels a lot like an evolutionary process. What's a human example of Move 37? Einstein's discovery of the theory of relativity?