Understanding Transformers

A deep dive into the transformer architecture, from attention mechanisms to full model training.

Attention Mechanisms

Learn how attention works from first principles.

[ ] What Is Attention? [text] free
[ ] Self-Attention Demo [video]
[ ] Multi-Head Attention [text]

Embeddings

How models represent words and positions as vectors.

[ ] Word Vectors [text]