Understanding Transformers

A deep dive into the transformer architecture, from attention mechanisms to full model training.


Attention Mechanisms

Learn how attention works from first principles.
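As a first-principles preview, here is a minimal sketch of scaled dot-product attention, the form used in the original Transformer: each query is compared with every key, the scores are scaled by 1/sqrt(d_k) and normalized with a softmax, and the resulting weights average the value vectors. It assumes plain Python lists of floats for Q, K and V. The function names are illustrative, not taken from any library.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v (lists of lists of floats)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Two identical keys receive equal weight, so the output is the mean of V.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]]))
```

Real implementations batch this as matrix products and add masking and multiple heads. The loop form above only makes each step explicit.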

Embeddings

How models represent words and positions as vectors.
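A minimal sketch of one common scheme: each token ID indexes a row of an embedding table, and a sinusoidal positional encoding (as in the original Transformer, PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d))) is added so the same token at different positions gets a different vector. The randomly initialized table, the toy vocabulary size and the helper names are hypothetical; in a trained model the table is learned, and many models use learned position embeddings instead.

```python
import math
import random

def make_embedding_table(vocab_size, d_model, seed=0):
    # Hypothetical toy table: small random vectors, one row per token ID.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(vocab_size)]

def sinusoidal_positions(seq_len, d_model):
    # Even dimensions use sin, odd dimensions use cos, sharing a frequency per pair.
    pe = []
    for pos in range(seq_len):
        row = []
        for j in range(d_model):
            i = j // 2
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def embed(token_ids, table):
    # Look up each token's vector and add the encoding for its position.
    pe = sinusoidal_positions(len(token_ids), len(table[0]))
    return [[t + p for t, p in zip(table[tid], pe[pos])] for pos, tid in enumerate(token_ids)]

table = make_embedding_table(vocab_size=5, d_model=4)
x = embed([2, 2], table)  # same token twice
print(x[0] != x[1])  # True: position changes the vector
```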