Jay Taylor's notes


Andrej Karpathy on X: "Move 37" is the word of the day - it's when an AI, trained via the trial-and-error process of reinforcement learning, discovers actions that are new, surprising, and secretly brilliant even to expert humans.

Original source (x.com)
Tags: machine-learning ai llm andrej-karpathy move-37 word-of-the-day x.com
Clipped on: 2025-08-27


Conversation

"Move 37" is the word-of-day - it's when an AI, trained via the trial-and-error process of reinforcement learning, discovers actions that are new, surprising, and secretly brilliant even to expert humans. It is a magical, just slightly unnerving, emergent phenomenon only achievable by large-scale reinforcement learning. You can't get there by expert imitation. It's when AlphaGo played move 37 in Game 2 against Lee Sedol, a weird move that was estimated to only have 1 in 10,000 chance to be played by a human, but one that was creative and brilliant in retrospect, leading to a win in that game. We've seen Move 37 in a closed, game-like environment like Go, but with the latest crop of "thinking" LLM models (e.g. OpenAI-o1, DeepSeek-R1, Gemini 2.0 Flash Thinking), we are seeing the first very early glimmers of things like it in open world domains. The models discover, in the process of trying to solve many diverse math/code/etc. problems, strategies that resemble the internal monologue of humans, which are very hard (/impossible) to directly program into the models. I call these "cognitive strategies" - things like approaching a problem from different angles, trying out different ideas, finding analogies, backtracking, re-examining, etc. Weird as it sounds, it's plausible that LLMs can discover better ways of thinking, of solving problems, of connecting ideas across disciplines, and do so in a way we will find surprising, puzzling, but creative and brilliant in retrospect. It could get plenty weirder too - it's plausible (even likely, if it's done well) that the optimization invents its own language that is inscrutable to us, but that is more efficient or effective at problem solving. The weirdness of reinforcement learning is in principle unbounded. I don't think we've seen equivalents of Move 37 yet. I don't know what it will look like. I think we're still quite early and that there is a lot of work ahead, both engineering and research. But the technology feels on track to find them. https://youtube.com/watch?v=HT-UZkiOLv8
Every single stunning example of creativity in AI comes from reinforcement learning -- Ilya
By its very definition, we won’t see Move 37 coming. It will only be clear in hindsight, once the game is over.
I feel like DeepSeek definitely tried to evoke that, not so low key.
> "cognitive strategies" - things like approaching a problem from different angles, trying out different ideas, finding analogies, backtracking, re-examining, etc A few years from now we will look back and think how crazy it was that we were manually doing things like CoT,
We taught some superhuman chess moves from AlphaZero to grandmasters some time ago (https://arxiv.org/abs/2310.16410), directly motivated by Move 37. We wanted to see if we could teach that magic to humans. One of them ended up winning the world chess championship :)
"RL doesn't care about your feelings". I don't think humans are ready to face that in math and coding yet.
> "that the optimization invents its own language that is inscrutable to us"
A weak version of this seemed to happen with R1-Zero, where the output wasn't easily readable and mixed languages.
Reminds me of Pólya's "How to Solve It", but the models figure it out on the fly.
Feels a lot like an evolutionary process. What's a human example of Move 37? Einstein's discovery of the theory of relativity?