https://www.youtube.com/watch?v=9-Jl0dxWQs8
AI Alignment Forum post from the DeepMind researchers referenced at the video’s start:
https://www.alignmentforum.org/posts/…
Anthropic posts about superposition referenced near the end:
https://transformer-circuits.pub/2022…
https://transformer-circuits.pub/2023…
Additional resources for those interested in learning more about mechanistic interpretability, offered by Neel Nanda:
Mechanistic interpretability paper reading list
https://www.alignmentforum.org/posts/…
Getting started in mechanistic interpretability
https://www.neelnanda.io/mechanistic-…
An interactive demo of sparse autoencoders (made by Neuronpedia)
https://www.neuronpedia.org/gemma-sco…
Coding tutorials for mechanistic interpretability (made by ARENA)
https://arena3-chapter1-transformer-i…
Sections:
0:00 – Where facts in LLMs live
2:15 – Quick refresher on transformers
4:39 – Assumptions for our toy example
6:07 – Inside a multilayer perceptron
15:38 – Counting parameters
17:04 – Superposition
21:37 – Up next