Mechanical Sympathy for the massive, spinning plates of hardware constraints running under the hood.
So, we built an interactive game to teach you how LLMs actually work (and fail):
🧩 LLMs Are Demented: The Crossword
Play in Fullscreen Mode (if the embed window sizing is annoying)
⚙️ How the Game Works
This is a standard, technical 9-word crossword puzzle. To win, you must recall core machine learning terms (like WEIGHTS, TOKEN, ATTENTION, and EPOCH) from their definitions and type them in.
But as you play, you are running directly inside the actual architectural constraints of a Large Language Model:
1. 💾 The Context Window ($C_{\text{tokens}}$)
The model only tracks your last $C_{\text{tokens}}$ cell edits. If you type more letters than your context size, the oldest letters you entered fall out of context and start organically decaying. They will slowly flicker and mutate into visually similar characters (or pure noise) as the model loses track of them.
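To make the eviction rule concrete, here is a minimal sketch of a sliding window over cell edits. The `ContextWindow` class and its method names are hypothetical, and the choice to refresh a cell when it is re-edited is an assumption; the game's real implementation may differ.

```python
from collections import deque

Cell = tuple[int, int]  # (row, column) of a crossword cell


class ContextWindow:
    """Remembers the most recent `capacity` cell edits (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._recent = deque()  # oldest edit on the left, newest on the right

    def record_edit(self, cell: Cell) -> list[Cell]:
        """Register an edit and return the cells that just fell out of context."""
        # Assumption: re-editing a cell refreshes it instead of counting twice.
        if cell in self._recent:
            self._recent.remove(cell)
        self._recent.append(cell)

        evicted = []
        while len(self._recent) > self.capacity:
            evicted.append(self._recent.popleft())
        return evicted

    def in_context(self, cell: Cell) -> bool:
        return cell in self._recent


window = ContextWindow(capacity=2)
window.record_edit((0, 0))
window.record_edit((0, 1))
decaying = window.record_edit((0, 2))  # (0, 0) is now out of context
```

Any cell returned by `record_edit` would be a candidate for the flicker-and-mutate treatment described above.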
2. ⏰ KV-Cache Expirations ($\tau$)
The board is split into 4 distinct quadrants (Q1-Q4). If you leave a quadrant untouched for too long, its cache expires, and that entire section of the board is instantly wiped blank! You must hop between quadrants to keep their caches active.
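The expiry rule can be sketched as a per-quadrant timestamp checked against $\tau$. This is only an illustration: `QuadrantCache` and its methods are hypothetical names, and timestamps are passed in explicitly so the example stays deterministic.

```python
class QuadrantCache:
    """Tracks when each quadrant was last touched (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float, now: float = 0.0) -> None:
        self.ttl_seconds = ttl_seconds  # the cache lifetime, tau
        self.last_touched = {q: now for q in ("Q1", "Q2", "Q3", "Q4")}

    def touch(self, quadrant: str, now: float) -> None:
        """Typing in a quadrant resets its timer."""
        self.last_touched[quadrant] = now

    def expired(self, now: float) -> list[str]:
        """Quadrants idle for longer than tau; these would be wiped blank."""
        return [
            quadrant
            for quadrant, touched_at in self.last_touched.items()
            if now - touched_at > self.ttl_seconds
        ]


cache = QuadrantCache(ttl_seconds=15.0)
cache.touch("Q1", now=10.0)
wiped = cache.expired(now=20.0)  # only Q1 was touched recently enough
```

In a real game loop, `now` would probably come from a monotonic clock such as `time.monotonic()`.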
3. 🔥 Temperature ($T$)
Controls the chaos of mutations:
- Low Temp ($T \le 0.8$): Drifts predictably (e.g. E becomes 3, A becomes 4).
- High Temp ($T \ge 1.3$): Explodes into pure symbolic entropy (emojis, percent signs, and system glyphs).
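As a rough sketch of the two regimes, the function below maps a letter to a lookalike at low temperature and to random noise at high temperature. Only the E/A examples and the two thresholds come from the text above; the `NOISE` alphabet, the `mutate` name, and the coin-flip behaviour between the thresholds are assumptions for illustration.

```python
import random

# The two lookalikes mentioned above; a real table would be larger.
LOOKALIKES = {"E": "3", "A": "4"}
# Hypothetical noise alphabet standing in for "pure symbolic entropy".
NOISE = "%#@&*"


def mutate(letter: str, temperature: float, rng: random.Random) -> str:
    """Return the decayed form of a letter at the given temperature."""
    lookalike = LOOKALIKES.get(letter, letter)  # unchanged if no lookalike
    if temperature <= 0.8:
        return lookalike  # predictable drift
    if temperature >= 1.3:
        return rng.choice(NOISE)  # symbolic garbage
    # Assumption: between the thresholds, flip a coin between the two regimes.
    return rng.choice([lookalike, rng.choice(NOISE)])


rng = random.Random(0)  # seeded so a run is repeatable
low = mutate("E", temperature=0.7, rng=rng)   # "3"
high = mutate("E", temperature=1.4, rng=rng)  # some character from NOISE
```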
🛠️ Choose Your Hardware Preset
Before you click INITIATE RUN, select your inference endpoint difficulty:
- 🏢 Enterprise API (Easy): Large context window ($C=64$), 90-second cache, very low temperature. Very forgiving.
- 💻 Local Llama (Medium): Quantized 7B model running on a laptop ($C=32$), 45-second cache, standard temperature ($T=0.7$). You'll need to move fast to avoid decay.
- Smart Toaster (Hard): Edge inference on a kitchen appliance ($C=16$), 15-second cache, high temperature ($T=1.4$). Complete hardware chaos.
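For reference, the three presets can be written down as a small config table. The `Preset` dataclass and its field names are hypothetical; the numbers are the ones listed above, and the Enterprise temperature is left as `None` because the post only calls it "very low" without giving a value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Preset:
    name: str
    context_size: int              # C, in cell edits
    cache_ttl_seconds: int         # tau
    temperature: Optional[float]   # T; None where no number is stated


PRESETS = {
    "easy": Preset("Enterprise API", context_size=64, cache_ttl_seconds=90, temperature=None),
    "medium": Preset("Local Llama", context_size=32, cache_ttl_seconds=45, temperature=0.7),
    "hard": Preset("Smart Toaster", context_size=16, cache_ttl_seconds=15, temperature=1.4),
}

toaster = PRESETS["hard"]
```

Each step up in difficulty halves the context window and shrinks the cache lifetime, which is why the harder presets punish slow or scattered typing.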
Tip: If you need a cheatsheet, click the 🧠 VIEW WEIGHTS button to dump the answers database. But be warned: the database query locks keyboard inputs, forcing you to close the weights, switch contexts, and recall the answers from memory!
🕶️ Challenge Mode: Blind Inference
By popular demand (shoutout to @kenielzep97 for the brilliant suggestion!), I've added a Blind Inference toggle to the hyperparameters panel.
Flip it on to play with all telemetry, warning overlays, and letter mutations completely masked. You won't know the cache is decaying or mutating until the final compiler locks your run: a harsh simulation of how an LLM has no meta-awareness of its own context limitations!
Beat the Machine & Share Your Score
Once you fill in the last box, the system triggers RUN INFERENCE automatically to lock your scorecard.
Can you beat the local CPU (15 TPS) or a Cloud API (150 TPS)? Click COPY SCORE at the end of your run and paste your stats in the comments below!
💬 Let's Discuss:
- What's the weirdest "mutation" you saw at High Temperature?
- What was your Time to First Token (TTFT) and highest TPS?
An educational crossword game to learn about LLMs
🧩 LLMs Are Demented: The Crossword 🧠
Mechanical Sympathy Edition v1.0.0
Model Accuracy: 100%
Temperature: Chaos
Deployment: Cloud Run
Welcome, neural engineer. You have been tasked with solving a standard, technical crossword puzzle.
There's just one catch: You are running this crossword directly inside the hardware constraints of a running Large Language Model (LLM). If you type too slowly, your KV-Cache decays and evaporates. If you type too much, your Context Window overflows and older letters drift into hallucinations. If your Temperature is too high, cells erupt into symbolic garbage.
This is an interactive, educational game designed to teach the general public why LLMs hallucinate, decay, and make mistakes, so they stop getting so frustrated at their chat clients and gain some Mechanical Sympathy for the machine.
⚙️ The Mechanics of Frustration (How to Play)
As you solve the 9 intersecting technical clues on the board, the model's runtime architecture will actively fight your progress:
- 💾 Context Window ($C_{\text{tokens}}$):...