Similar Posts
Gradient descent, how neural networks learn | DL2
By n0cadmin — To learn more, I highly recommend the book by Michael Nielsen: http://neuralnetworksanddeeplearning…. The book walks through the code behind the example in these videos, which you can find here: https://github.com/mnielsen/neural-ne… MNIST database: http://yann.lecun.com/exdb/mnist/ Also check out Chris Olah’s blog: http://colah.github.io/ His post on neural networks and topology is particularly beautiful, but honestly all of the stuff there is great. And if…
Transformers (how LLMs work) explained visually | DL5
By n0cadmin — If you’re interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as being a combined low-rank map from…
Prompt Engineering Tutorial – Master ChatGPT and LLM Responses
By n0cadmin — Learn prompt engineering techniques to get better results from ChatGPT and other LLMs.
How might LLMs store facts | DL7
By n0cadmin — https://www.youtube.com/watch?v=9-Jl0dxWQs8 AI Alignment Forum post from the DeepMind researchers referenced at the video’s start: https://www.alignmentforum.org/posts/… Anthropic posts about superposition referenced near the end: https://transformer-circuits.pub/2022… https://transformer-circuits.pub/2023… Some added resources for those interested in learning more about mechanistic interpretability, offered by Neel Nanda: Mechanistic interpretability paper reading list: https://www.alignmentforum.org/posts/… Getting started in mechanistic interpretability: https://www.neelnanda.io/mechanistic-… An interactive demo of sparse autoencoders (made…
Fine-tuning Large Language Models (LLMs) | w/ Example Code
By n0cadmin — This is the 5th video in a series on using large language models (LLMs) in practice. Here, I discuss how to fine-tune an existing LLM for a particular use case and walk through a concrete example with Python code.
What are Transformer Models and how do they work?
By n0cadmin — This is the last of a series of 3 videos where we demystify Transformer models and explain them with visuals and friendly examples.
00:00 Introduction
01:50 What is a transformer?
04:35 Generating one word at a time
08:59 Sentiment Analysis
13:05 Neural Networks
18:18 Tokenization
19:12 Embeddings
25:06 Positional encoding
27:54 Attention
32:29 Softmax
35:48 Architecture of a Transformer
39:00 Fine-tuning
42:20 Conclusion
