Understanding Transformers
A deep dive into the transformer architecture, from attention mechanisms to full model training.
Attention Mechanisms
Learn how attention works from first principles; a minimal code sketch follows the lesson list below.
- [ ] What Is Attention? [text] [free]
- [ ] Self-Attention Demo [video]
- [ ] Multi-Head Attention [text]
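To ground the lessons above, here is a minimal NumPy sketch of single-head scaled dot-product self-attention. The function names, matrix shapes, and toy sizes are illustrative assumptions, not code from the course.

```python
# Minimal single-head self-attention sketch (illustrative; names and
# toy dimensions below are assumptions, not course material).
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv      # project inputs to queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)    # each row is a distribution over positions
    return weights @ V                    # weighted sum of value vectors

# Toy usage: 4 tokens, model width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```

Multi-head attention, covered in the last lesson, runs several such heads in parallel on lower-dimensional projections and concatenates their outputs.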
Embeddings
How models represent words and positions as vectors; a small sketch follows the lesson list below.
- [ ] Word Vectors [text]
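As a companion to the lesson above, here is a small NumPy sketch of token embedding lookup combined with fixed sinusoidal position encodings. The toy vocabulary, sizes, and the choice of sinusoidal encoding are assumptions for illustration; in practice the embedding table is learned during training.

```python
# Token + positional embedding sketch (vocabulary, sizes, and the
# sinusoidal encoding choice are illustrative assumptions).
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Fixed sinusoidal position encodings, as in 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)  # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # sine on even dims
    pe[:, 1::2] = np.cos(angles)                   # cosine on odd dims
    return pe

vocab = {"the": 0, "cat": 1, "sat": 2}             # toy vocabulary
d_model = 8
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))  # learned in real models

token_ids = [vocab[w] for w in ["the", "cat", "sat"]]
word_vecs = embedding_table[token_ids]             # lookup: (3, d_model)
inputs = word_vecs + sinusoidal_positions(len(token_ids), d_model)
print(inputs.shape)  # (3, 8)
```

The sum of word and position vectors is what the attention layers above actually consume, since attention itself is order-agnostic.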