Attention in transformers, visually explained | DL6
Demystifying attention, the key mechanism inside transformers and LLMs.
Timestamps:
0:00 – Who this was made for
0:41 – What are large language models?
7:48 – Where to learn more
Topics: Overview of course, Optimization
Percy Liang, Associate Professor & Dorsa Sadigh, Assistant Professor – Stanford University
http://onlinehub.stanford.edu/
Associate Professor Percy Liang
Associate Professor of Computer Science and Statistics (courtesy)
Assistant Professor Dorsa Sadigh
Assistant Professor in the Computer Science Department & Electrical Engineering Department
To follow along with the course schedule and syllabus, visit: https://stanford-cs221.github.io/autumn2019/#schedule
artificialintelligencecourse
0:00 Introduction
3:30 Why…
This is the second of a series of 3 videos where we demystify Transformer models and explain them with visuals and friendly examples.
00:00 Introduction
01:18 Recap: Embeddings and Context
04:46 Similarity
11:09 Attention
20:46 The Keys and Queries Matrices
25:02 The Values Matrix
28:41 Self and Multi-head attention
33:54 Conclusion
The Llama 3.2 release includes a new set of compact models designed for on-device use cases, such as locally running assistants. Here, we show how LangGraph can enable these types of local assistants by building a multi-step RAG agent that combines ideas from 3 advanced RAG papers (Adaptive RAG, Corrective RAG, and Self-RAG) into a single…
https://www.youtube.com/watch?v=9-Jl0dxWQs8
AI Alignment Forum post from the DeepMind researchers referenced at the video’s start: https://www.alignmentforum.org/posts/…
Anthropic posts about superposition referenced near the end:
https://transformer-circuits.pub/2022…
https://transformer-circuits.pub/2023…
Some added resources for those interested in learning more about mechanistic interpretability, offered by Neel Nanda:
Mechanistic interpretability paper reading list: https://www.alignmentforum.org/posts/…
Getting started in mechanistic interpretability: https://www.neelnanda.io/mechanistic-…
An interactive demo of sparse autoencoders (made…