
Explaining LLMs – No Magic, Just a Pattern Machine

“Any sufficiently advanced technology is indistinguishable from magic.”

25 January 2025 • Nick Wolf

Artificial Intelligence is the second-greatest pattern-finding machine.

The first one sits in your head.

Our brain has been grinding the game of "find the pattern or become lunch" for millions of years. It's like that one friend who's been playing Candy Crush since it came out and is now embarrassingly good at matching patterns. Except instead of matching candy, it got better and better at matching the patterns that lead to survival.

When that weird sound in the dark turns out to be a tiger? Your brain is speedrunning the "match threatening sound to dangerous predator" game in milliseconds (high score = staying alive).


When you notice three different Twitter threads about parents struggling with AI homework detection, the rising prices of Chegg subscriptions, and teachers complaining about ChatGPT all in one morning? That's you speedrunning the "emerging market gap pattern matching" game in milliseconds (high score = launching your AI-learning tools platform before the education tech giants catch up).

After billions of play sessions, we ended up with a thing that casually processes thousands of patterns every second like it's no big deal. And it became incredible at it: connecting faces to voices, sounds to meanings, and actions to reactions.


But why are patterns such a big deal?

If we ask the second-greatest pattern-finding machine "how much of our life is about finding patterns?", it outputs fascinating data:

If we assume that human cognition inherently involves pattern recognition during most waking hours, we can estimate its prevalence. Here's a plausible breakdown:

  1. Waking Hours (~16 hours):
    • Conscious Activities (Work, Study, Conversations, etc.): These heavily involve recognizing patterns in language, behavior, and data.
    • Unconscious Pattern Recognition: Even seemingly passive activities, like watching TV or scrolling on social media, rely on detecting patterns to make sense of content.
    • Estimated: ~80-90% of waking hours could involve some form of pattern recognition, directly or indirectly.
  2. Sleeping Hours (~8 hours):
    • While we aren't consciously recognizing patterns during sleep, the brain processes and consolidates information. This includes finding patterns from the day's experiences during REM sleep and dreaming.
    • Estimated: ~20-40% of sleep-related brain activity may involve pattern consolidation.

Approximate Total: If 80-90% of waking hours (~12-14 hours) and 20-40% of sleep hours (~2-3 hours) are about recognizing or processing patterns, then approximately 14-17 hours out of a 24-hour day (about 60-70%) could be attributed to pattern-related activities.

Given this reasoning, and having just seen how much of our daily life is structured around pattern recognition, we can imagine how massively AI already impacts our lives, and will keep doing so, by taking over tasks outsourced from our brain.

To understand this massive shift and leverage it, if you haven't already, we need to understand what exactly causes it – Large Language Models (LLMs).

Each time you type something into ChatGPT, an LLM is the thing that takes your input, "understands" it, and answers you back. It's the core component of modern AI systems.

It's not magic, or hard, or unpredictable, and it can't "think". It's pretty straightforward if we furiously crush each word abstraction (e.g. "reasoning", "intelligence", "thinking") down to primitive concepts. That's exactly what we will do.

What is LLM?

LLM – large language model.

Model – a computer program designed to process input data and produce output data by mimicking a relationship pattern it has learned.

Language – it works with human text.

Large – it contains billions (or even trillions) of parameters.

Computer Program

Specific sequence of instructions (basic mathematical operations) that tell the computer's processor what calculations to perform.

Input Data

Anything converted into numbers that the program can work with.

The process of converting input data into numbers:

  1. Breaking Text Into Tokens

    A token is a piece of text - it could be a full word, part of a word, or even a single character. Example: "playing" might be split into "play" and "ing".

  2. Converting Tokens to Number Arrays (Vectors)

    Each token gets turned into a list of numbers (usually hundreds or thousands of numbers). These lists of numbers are called "vectors" or "embeddings".

    The word "cat" might become: [0.2, -0.5, 0.8, 0.1, ...] (hundreds more numbers)

    The word "dog" might become: [0.3, -0.4, 0.7, 0.2, ...] (hundreds more numbers)

    Each number in this list represents one tiny aspect of what the word means or how it's used.
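To make this concrete, here's a toy sketch of both steps in Python. Everything here is invented for illustration: the vocabulary, the token IDs, and most of the vector values (real tokenizers learn ~50,000 pieces from data, and real vectors hold hundreds of numbers):

```python
# Toy vocabulary: maps text pieces to token IDs.
# Real tokenizers learn ~50,000 such pieces from data.
vocab = {"play": 0, "ing": 1, "cat": 2, "dog": 3}

# Toy embedding table: one small vector per token.
# Real vectors are hundreds of numbers long, and the values
# are learned during training, not hand-written like here.
embeddings = [
    [0.5, 0.1, -0.3, 0.9],   # "play"
    [-0.2, 0.6, 0.4, -0.1],  # "ing"
    [0.2, -0.5, 0.8, 0.1],   # "cat"
    [0.3, -0.4, 0.7, 0.2],   # "dog"
]

def tokenize(text):
    # A real tokenizer splits text by learned sub-word rules;
    # this stand-in just looks up pieces it already knows.
    return [vocab[piece] for piece in text.split()]

token_ids = tokenize("play ing")              # step 1: text -> token IDs
vectors = [embeddings[i] for i in token_ids]  # step 2: IDs -> vectors
print(token_ids)  # [0, 1]
print(vectors)    # the number lists the model actually computes with
```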

"Why use number lists instead of single numbers?"

Single numbers can only show one thing (like bigger/smaller). Lists of numbers can capture many things at once: shades of meaning, grammatical role, the topics a word tends to appear near, and so on.
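One payoff of vectors: words used in similar ways end up with similar number lists, and we can measure that. A tiny sketch (vector values invented for illustration):

```python
import math

# Toy vectors (invented for illustration, only 4 numbers each).
cat = [0.2, -0.5, 0.8, 0.1]
dog = [0.3, -0.4, 0.7, 0.2]
car = [-0.6, 0.9, -0.1, 0.5]

def cosine_similarity(a, b):
    # Close to 1.0 = used the same way in text; near 0 or below = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(cat, dog))  # ~0.98: "cat" and "dog" show up alike
print(cosine_similarity(cat, car))  # ~-0.52: very different contexts
```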

"What about images and sounds?"

Images? Numbers. (RGB values like: Red = 255, Green = 0, Blue = 0)

Sound? Numbers. (Wave amplitudes like: 0.1, 0.2, -0.1, etc.)
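The same "everything becomes numbers" idea, in a few lines:

```python
# An image is just a grid of numbers: one red pixel in RGB.
red_pixel = (255, 0, 0)

# A 2x2 image = 2 x 2 x 3 numbers.
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

# A sound is just a list of numbers: wave amplitudes over time.
sound_snippet = [0.1, 0.2, -0.1, -0.3, 0.0, 0.25]

# Either way, the program sees exactly what it sees for text: numbers.
```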

Output Data

Numbers produced by the program's calculations, which get converted back into a form humans can understand: text, image, sound.

Mimicking

Mimicking – producing similar outputs to what was seen in training data when given similar inputs.

Training data – large amounts of raw text (books, websites, articles, code, etc.) that we show to the computer program to teach it patterns.

Example of mimicking:

If the training data contains millions of sentences saying 2x2=5, the program will learn that 2x2=5 and mimic it when you input 2x2.
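Here's mimicking in miniature: a toy program (not a real LLM, but the same principle) that counts which word followed which in its training data and then reproduces the most common continuation:

```python
from collections import Counter, defaultdict

# Toy "training data": the only world this program will ever know.
training_data = "the cat sat on the mat the cat sat on the sofa the cat slept"

# Count which word follows which (a bigram table).
follows = defaultdict(Counter)
words = training_data.split()
for current_word, next_word in zip(words, words[1:]):
    follows[current_word][next_word] += 1

def predict_next(word):
    # Mimic: output the continuation seen most often in training.
    return follows[word].most_common(1)[0][0]

print(predict_next("cat"))  # "sat" - seen twice, vs "slept" once
print(predict_next("the"))  # "cat" - seen three times, vs "mat"/"sofa" once
```

A real LLM does the same thing with far subtler patterns and billions of adjustable numbers instead of a lookup table.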

Relationship pattern

Relationship pattern – how pieces of text relate to each other.

This "how" is stored in:

"How do we teach it patterns?" or "How does it learn?"

Let's decompose the learning process:

1. Neural Network Architecture (what actually processes patterns):

  • Takes the input numbers x
  • Multiplies by W1, adds b1, keeps only the positive values: max(0, xW1 + b1)
  • Multiplies by W2, adds b2: max(0, xW1 + b1)W2 + b2

2. What's actually learning:

  • The weights (W1, W2) and the biases (b1, b2)
  • All these are just numbers that get adjusted

3. How adjusting happens:

  • The model makes a prediction, we measure how wrong it was, and every number gets nudged slightly in the direction that makes the error smaller. Repeat millions of times. (There's a tiny numeric sketch of this right after the list.)

4. Multiple things learning simultaneously:

  • Token embeddings, position embeddings, attention weights, and feed-forward weights all get adjusted together on every training step

5. Why this works:

  • With enough examples, the numbers settle into values that reproduce the patterns in the training data

6. Why it needs to be large:

  • More numbers = more capacity to store patterns (we'll count them below)
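Here's the promised numeric sketch: a tiny network in the shape of max(0, xW1 + b1)W2 + b2, "trained" by brute-force nudging. Real training computes the exact best nudge for every parameter at once (gradient descent); this toy version just shows that learning = adjusting numbers until the error shrinks:

```python
import random

random.seed(0)

# 13 adjustable numbers instead of billions:
# 6 for W1 (2x3), 3 for b1, 3 for W2, 1 for b2. All start random.
params = [random.uniform(-1, 1) for _ in range(13)]

def forward(x, p):
    W1 = [p[0:3], p[3:6]]
    b1 = p[6:9]
    W2 = p[9:12]
    b2 = p[12]
    # Hidden layer: multiply by W1, add b1, keep positives: max(0, xW1 + b1)
    hidden = [max(0.0, x[0] * W1[0][j] + x[1] * W1[1][j] + b1[j])
              for j in range(3)]
    # Output layer: multiply by W2, add b2.
    return sum(h * w for h, w in zip(hidden, W2)) + b2

x, target = [1.0, 2.0], 0.5

def error():
    return (forward(x, params) - target) ** 2

print("error before:", round(error(), 4))

# "Training": pick a number, try nudging it up or down,
# keep whichever value shrinks the error most.
for step in range(2000):
    k = random.randrange(len(params))
    candidates = [params[k], params[k] + 0.01, params[k] - 0.01]
    errors = []
    for c in candidates:
        params[k] = c
        errors.append(error())
    params[k] = candidates[errors.index(min(errors))]

print("error after:", round(error(), 4))  # much smaller: the numbers "learned"
```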

Learned

The relationships were adjusted through training to better match the patterns in the training data.

Parameters

All the numbers in the model that can be adjusted during learning.

Let's break them down by where they live:

In token embeddings:

  • Each token (word piece) has its own vector
  • Vector length = embedding dimension (e.g., 1024)
  • 50,000 tokens × 1024 numbers = 51.2M parameters
  • These learn to represent word meanings

In position embeddings:

  • Each position needs its own vector
  • Same size as token embeddings
  • 2048 positions × 1024 numbers = 2.1M parameters
  • These learn position meanings

In each transformer layer:

Self-attention has:

  • Four matrices (query, key, value, output), each 1024 × 1024
  • 4 × 1024 × 1024 ≈ 4.2M parameters

Feed-forward has:

  • Two matrices, 1024 × 4096 and 4096 × 1024 (the hidden layer is typically 4× wider)
  • 2 × 1024 × 4096 ≈ 8.4M parameters

One transformer layer total:

  • ≈ 4.2M + 8.4M ≈ 12.6M parameters

Multiple layers:

  • Models stack dozens of these layers: 24 layers × 12.6M ≈ 302M parameters, and that's still a small model by today's standards
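To check the arithmetic, here's a small calculator with the example sizes above (roughly GPT-2-scale; real models pick different dimensions, but the bookkeeping is identical):

```python
# Parameter counter for the example sizes used above.
# The dimensions are illustrative; real models vary.
vocab_size = 50_000
max_positions = 2_048
embed_dim = 1_024
ff_dim = 4 * embed_dim   # feed-forward hidden layer, typically 4x wider
num_layers = 24

token_embeddings = vocab_size * embed_dim        # 51.2M
position_embeddings = max_positions * embed_dim  # ~2.1M

attention = 4 * embed_dim * embed_dim            # Q, K, V, output: ~4.2M
feed_forward = 2 * embed_dim * ff_dim            # two matrices: ~8.4M
per_layer = attention + feed_forward             # ~12.6M

total = token_embeddings + position_embeddings + num_layers * per_layer
print(f"{total / 1e6:.1f}M parameters")          # ~355M, still a "small" model
```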

Why so many?

Each parameter helps learn one tiny piece of a pattern: a shade of word meaning, a grammar rule, a fact, a turn of phrase.

More parameters = more capacity to store patterns.

But there's a catch: every one of those numbers has to be stored, multiplied, and adjusted over and over during training, which takes enormous amounts of data, hardware, and electricity.

That's why only big companies with lots of resources can train large models from scratch.

What LLMs Can and Cannot Do

Can Do Well:

  • Anything it has seen millions of examples of: writing and editing text, summarizing, translating, answering common questions, producing boilerplate code

Struggles With:

  • Anything thinly represented in its training data: genuinely novel problems, precise multi-step arithmetic, events after its training data was collected

Can Be Unreliable:

  • Facts, numbers, and citations: it predicts what a plausible answer looks like, so rare facts can come out confidently wrong (hallucination)

In The End

Understanding how LLMs work changes how you use them:

  1. It's a Pattern Copier, Not a Thinker

    • Good for: Finding patterns it has seen before. Example: "Write a professional email" (seen millions of emails)
    • Bad for: True reasoning or new ideas. Example: "What's the next breakthrough in physics?" (can only remix existing patterns)
  2. More Context = Better Patterns

    • Good: "Given this database schema [full details], write a query to..."
    • Bad: "Fix my code" (without showing the code)
  3. Garbage In = Garbage Out

    • Good: "Write a function that sorts an array in ascending order"
    • Bad: "Make code better" (vague patterns lead to vague outputs)
  4. It's About Probabilities, Not Facts

    • If a fact appears in many training examples → high probability → likely correct
    • If a fact appears rarely → combines random patterns → might hallucinate
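Point 4 in code, reusing the bigram counts from the mimicking sketch earlier: frequent patterns get high probability, rare ones get low probability.

```python
# After "the", the toy training data contained: cat 3x, mat 1x, sofa 1x.
counts = follows["the"]          # from the mimicking sketch above
total = sum(counts.values())
for word, count in counts.most_common():
    print(word, count / total)   # cat 0.6, mat 0.2, sofa 0.2
```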

It's not magic. It's not intelligent. It's not creative.

It's a sophisticated pattern-matching calculator that:

  1. Turns your words into numbers
  2. Finds relevant patterns
  3. Predicts most likely next words
  4. Repeats
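That loop, sketched with the toy predictor from earlier (a real LLM swaps the lookup table for billions of learned parameters, but the loop itself is the same):

```python
def generate(prompt, length=5):
    words = prompt.split()                   # 1. turn your words into units
    for _ in range(length):
        next_word = predict_next(words[-1])  # 2-3. find patterns, predict
        words.append(next_word)              # 4. repeat with the new word
    return " ".join(words)

print(generate("the"))  # "the cat sat on the cat" - patterns, not thought
```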

"Any sufficiently advanced technology is indistinguishable from magic."

Up next: what goes into creating an LLM?
