microgpt.py — 226 lines

Demystify LLMs in 200 Lines of Code.

Read the actual code behind language models. Highlight any line. Get a precise explanation of what it does and why.

class Head(nn.Module):

""" one head of self-attention """

def forward(self, x):

k = self.key(x)
q = self.query(x)

wei = q @ k.transpose(-2,-1) * k.shape[-1]**-0.5

AI ›

Head is one attention head. It projects input x into queries, keys, and values, then computes scaled dot-product attention...
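The explanation above can be sketched as a complete, runnable head. This is a minimal illustration in the nanoGPT style, not the file's exact code; the hyperparameter values (`n_embd=32`, `head_size=16`, `block_size=8`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Head(nn.Module):
    """One head of causal self-attention (illustrative sketch)."""

    def __init__(self, n_embd=32, head_size=16, block_size=8):
        # n_embd, head_size, block_size are assumed values for illustration
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # causal mask: token t may only attend to tokens <= t
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)                                       # (B, T, head_size)
        q = self.query(x)                                     # (B, T, head_size)
        # scaled dot-product attention scores
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5   # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = F.softmax(wei, dim=-1)                          # attention weights
        v = self.value(x)                                     # (B, T, head_size)
        return wei @ v                                        # (B, T, head_size)

x = torch.randn(4, 8, 32)   # batch of 4 sequences, 8 tokens, 32-dim embeddings
out = Head()(x)
print(out.shape)            # torch.Size([4, 8, 16])
```

The `1/sqrt(head_size)` scaling keeps the dot products from growing with dimension, so the softmax stays well-behaved; the triangular mask is what makes the model autoregressive.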

Let's Start

Based on Andrej Karpathy's microgpt