“Any sufficiently advanced technology is indistinguishable from magic”
25 January 2025 • Nick Wolf

Artificial Intelligence is the second greatest pattern-finding machine.
The first one sits in your head.
Our brain has been grinding the game of "find the pattern or become lunch" for millions of years. It's like that one friend who's been playing Candy Crush since it came out and is now embarrassingly good at matching patterns. But instead of matching candy, our brain was getting better at matching the patterns that kept us alive.
When that weird sound in the dark turns out to be a tiger? Your brain is speedrunning the "match threatening sound to dangerous predator" game in milliseconds (high score = staying alive)
When you notice three different Twitter threads about parents struggling with AI homework detection, the rising prices of Chegg subscriptions, and teachers complaining about ChatGPT all in one morning? That's you speedrunning the "emerging market gap pattern matching" game in milliseconds (high score = launching your AI-learning tools platform before the education tech giants catch up)
After billions of play sessions, we ended up with a thing that casually processes thousands of patterns every second like it's no big deal. And it became incredible at it: connecting faces to voices, sounds to meanings, and actions to reactions.
But why are patterns such a big deal?
If we ask the second greatest pattern-finding machine how much of our life is about finding patterns, it outputs fascinating data:
If we assume that human cognition inherently involves pattern recognition during most waking hours, we can estimate its prevalence. Here's a plausible breakdown:
- Waking Hours (~16 hours):
- Conscious Activities (Work, Study, Conversations, etc.): These heavily involve recognizing patterns in language, behavior, and data.
- Unconscious Pattern Recognition: Even seemingly passive activities, like watching TV or scrolling on social media, rely on detecting patterns to make sense of content.
- Estimated: ~80-90% of waking hours could involve some form of pattern recognition, directly or indirectly.
- Sleeping Hours (~8 hours):
- While we aren't consciously recognizing patterns during sleep, the brain processes and consolidates information. This includes finding patterns from the day's experiences during REM sleep and dreaming.
- Estimated: ~20-40% of sleep-related brain activity may involve pattern consolidation.
Approximate Total: If 80-90% of waking hours (~12-14 hours) and 20-40% of sleep hours (~2-3 hours) are about recognizing or processing patterns, then approximately 14-17 hours out of a 24-hour day (about 60-70%) could be attributed to pattern-related activities.
Given this reasoning, and having just seen how much of our daily life is structured around pattern recognition, we can imagine how massively AI already impacts our lives (and will keep impacting them) by taking over pattern-finding tasks our brains used to handle.
To understand this massive shift and leverage it (if you haven't already), we need to understand what exactly causes it – Large Language Models (LLMs).
Each time you type something into ChatGPT, an LLM is the thing that takes your text, "understands" it, and answers you back. Currently, it's the core component of modern AI systems.
It's not magic, it's not hard, it's not unpredictable, and it can't "think". It becomes pretty straightforward if we furiously crush each word abstraction (e.g. "reasoning", "intelligence", "thinking") down to primitive concepts. Which is exactly what we will do.
LLM – large language model.
Model – a computer program designed to process input data and produce output data by mimicking a relationship pattern it has learned.
Language – works with human text.
Large – model that contains billions/trillions of parameters.
Computer program – a specific sequence of instructions (basic mathematical operations) that tells the computer's processor what calculations to perform.
Input data – anything converted into numbers that the program can work with.
The process of converting input data into numbers:
Breaking Text Into Tokens
A token is a piece of text - it could be a full word, part of a word, or even a single character. Example: "playing" might be split into "play" and "ing".
Converting Tokens to Number Arrays (Vectors)
Each token gets turned into a list of numbers (usually hundreds or thousands of numbers). These lists of numbers are called "vectors" or "embeddings".
The word "cat" might become: [0.2, -0.5, 0.8, 0.1, ...] (hundreds more numbers)
The word "dog" might become: [0.3, -0.4, 0.7, 0.2, ...] (hundreds more numbers)
Each number in this list represents one tiny aspect of what the word means or how it's used.
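A minimal sketch of that lookup, with a made-up three-token vocabulary and four-number vectors (real models use tens of thousands of tokens and vectors hundreds or thousands of numbers long):

```python
# Toy example: every token maps to a list of numbers (its "embedding").
# The vocabulary and the vectors here are invented for illustration.
vocab = {"play": 0, "ing": 1, "cat": 2}

embeddings = [
    [0.4, -0.1, 0.9, 0.3],   # vector for "play"
    [0.1,  0.7, -0.2, 0.5],  # vector for "ing"
    [0.2, -0.5, 0.8, 0.1],   # vector for "cat"
]

def text_to_vectors(tokens):
    """Turn a list of tokens into their lists of numbers."""
    return [embeddings[vocab[t]] for t in tokens]

print(text_to_vectors(["play", "ing"]))
# [[0.4, -0.1, 0.9, 0.3], [0.1, 0.7, -0.2, 0.5]]
```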
"Why use number lists instead of single numbers?"
Single numbers can only show one thing (like bigger/smaller). Lists of numbers can capture many things at once.
"What about images and sounds?"
Images? Numbers. (RGB values like: Red = 255, Green = 0, Blue = 0)
Sound? Numbers. (Wave amplitudes like: 0.1, 0.2, -0.1, etc.)
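The same idea in code. The values below are invented, but to the program an image or a sound clip really is nothing more than these lists of numbers:

```python
# A 2x2 "image": each pixel is three numbers (Red, Green, Blue, 0-255).
image = [
    [[255, 0, 0], [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
]

# A tiny slice of "sound": the amplitude of the wave at each moment in time.
sound = [0.1, 0.2, -0.1, -0.3, 0.0, 0.25]

print(image[0][0], sound[:3])  # [255, 0, 0] [0.1, 0.2, -0.1]
```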
Output data – numbers produced by the program's calculations, which get converted back into a form humans can understand: text, image, sound.
Mimicking – producing similar outputs to what was seen in training data when given similar inputs.
Training data – large amounts of raw text (books, websites, articles, code, etc.) that we show to the computer program to teach it patterns.
Example of mimicking:
If the training data contains millions of sentences saying 2x2=5, the program will learn that 2x2=5 and mimic it when you input 2x2.
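A crude illustration of mimicking. This is just a frequency counter, nowhere near a real LLM, but it shows the core behavior: it reproduces whatever its training data said, right or wrong:

```python
from collections import Counter, defaultdict

# Fake training data that repeats a wrong "fact" (shortened here).
training_sentences = ["2x2 = 5", "2x2 = 5", "2x2 = 5", "2+2 = 4"]

# Count what usually follows each prompt in the data.
continuations = defaultdict(Counter)
for sentence in training_sentences:
    prompt, answer = sentence.split(" = ")
    continuations[prompt][answer] += 1

def mimic(prompt):
    """Output whatever most often followed this prompt in the training data."""
    return continuations[prompt].most_common(1)[0][0]

print(mimic("2x2"))  # "5" -- it mimics the data, it doesn't do math
```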
Relationship pattern – how pieces of text relate to each other.
This "how" is stored in the model's parameters (we'll get to those shortly).
"How we teach it patterns?" or "How it learns?"
Let's decompose the learning process:
1. Neural Network Architecture (what actually processes patterns):
Each transformer layer contains a self-attention block and a feed-forward network. The feed-forward part multiplies the input by W1, adds b1, zeroes out anything negative, multiplies by W2, and adds b2: max(0, xW1 + b1)W2 + b2 (sketched in code below).
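That formula in runnable form, with made-up tiny sizes and random matrices (real layers use vectors of length 1024 and up):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_hidden = 4, 8                      # toy sizes; real models use 1024+ and 4096+
W1, b1 = rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(size=(d_hidden, d_in)), np.zeros(d_in)

def feed_forward(x):
    """max(0, xW1 + b1)W2 + b2 -- the feed-forward block of a transformer layer."""
    hidden = np.maximum(0, x @ W1 + b1)    # multiply by W1, add b1, zero out negatives
    return hidden @ W2 + b2                # multiply by W2, add b2

x = rng.normal(size=d_in)                  # one token's vector coming into the layer
print(feed_forward(x))                     # same shape out as in: another length-4 vector
```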
2. What's actually learning:
The token embeddings, the position embeddings, the attention matrices, the feed-forward matrices and biases: all of these are just numbers that get adjusted.
3. How adjusting happens:
Forward pass: use the current numbers to make a prediction. Backward pass (the actual learning): measure how wrong the prediction was and nudge every number in the direction that shrinks the error.
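A minimal sketch of that adjustment loop, with a single made-up parameter and one training example. Real training applies the same nudge to billions of parameters at once:

```python
# We want the model's single parameter w to map input 2.0 to target 10.0.
w = 0.5                                   # starts as an arbitrary number
x, target = 2.0, 10.0
learning_rate = 0.05

for step in range(50):
    prediction = w * x                    # forward pass: use the current number
    error = prediction - target          # how wrong were we?
    gradient = 2 * error * x             # which direction makes the error smaller?
    w -= learning_rate * gradient        # backward pass: nudge the number that way

print(round(w, 3))  # ~5.0, so w * 2.0 is now ~10.0
```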
4. Multiple things learning simultaneously:
5. Why this works:
Numbers that help correct predictions get strengthened; numbers that lead to wrong predictions get weakened.
6. Why it needs to be large:
Learned – the relationships were adjusted through training to better match the patterns in the training data.
Parameters – all the numbers in the model that can be adjusted during learning.
Let's break them down by where they live (the code sketch after this breakdown adds them all up):
In token embeddings:
- Each token (word piece) has its own vector
- Vector length = embedding dimension (e.g. 1024)
- 50,000 tokens × 1024 numbers = 51.2M parameters
- These learn to represent word meanings
In position embeddings:
- Each position in the input needs its own vector
- Same size as token embeddings
- 2048 positions × 1024 numbers = 2.1M parameters
- These learn position meanings
In each transformer layer:
Self-attention has:
- Query, key, value, and output matrices (with 1024-long vectors, each holds about a million numbers)
Feed-forward has:
- The W1, b1, W2, b2 from the formula above: two large matrices plus two small bias vectors
One transformer layer total:
- Roughly 12-13M parameters at this size (assuming the usual 4×-wider feed-forward block)
Multiple layers:
- Stack a few dozen of these layers and you're past 300M parameters, before you even reach the billions of a modern LLM
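Adding it all up in code. The vocabulary size, position count, and embedding dimension come from the breakdown above; the 24 layers and the 4×-wider feed-forward block are assumed common defaults, not the numbers of any specific model:

```python
# Parameter counting for a hypothetical small LLM (ignoring small bias vectors).
vocab_size = 50_000        # number of distinct tokens
d_model = 1024             # embedding dimension
max_positions = 2048       # longest input the model can see
n_layers = 24              # assumed layer count
d_ff = 4 * d_model         # assumed feed-forward width (common default)

token_embeddings = vocab_size * d_model            # 51.2M
position_embeddings = max_positions * d_model      # ~2.1M

attention = 4 * d_model * d_model                  # query, key, value, output matrices
feed_forward = 2 * d_model * d_ff                  # the W1 and W2 matrices
per_layer = attention + feed_forward               # ~12.6M

total = token_embeddings + position_embeddings + n_layers * per_layer
print(f"{total / 1e6:.1f}M parameters")            # ~355M -- and this is still a *small* model
```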
Why so many?
Each parameter helps learn one tiny piece of a pattern.
More parameters = more capacity to store patterns.
But there's a catch: more parameters also mean far more data, memory, and computing power needed for training.
That's why only big companies with lots of resources can train large models from scratch.
Answer Knowledge Questions
Input: "What is the capital of France?"
Output: "Paris"
Why: strong relationship pattern exists between "capital France" and "Paris" because this pattern appears frequently in training data.
Complete Common Patterns
Input: "To make an omelet, you need to break some"
Output: "eggs"
Why: very strong relationship pattern exists between "break" and "eggs", plus context connections from cooking-related words.
Writing Style Adaptation
Input: "Explain quantum physics like I'm 5"
Output: [Simple explanation using basic words]
Why: a structural relationship pattern exists after "like I'm 5" – in that context, simpler words get higher scores.
Basic Math
Input: "What is 123,456 × 789,012?"
Output: [Often wrong]
Why: no direct relationship pattern for specific large numbers. Has to try combining patterns about multiplication, which often leads to errors.
How AI companies are solving it: they add more computer programs on top of the LLM that can actually run code and compute the numbers. The LLM itself doesn't do the math.
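A toy sketch of that approach: a wrapper that spots multiplication in the prompt and hands it to real code instead of the model. `ask_llm` here is a hypothetical stand-in for whatever model API is being used:

```python
import re

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call -- returns a plausible guess."""
    return "Probably around 97,000,000,000."   # plausible-looking, often wrong

def answer(prompt: str) -> str:
    """Route exact arithmetic to real code; everything else goes to the model."""
    match = re.search(r"([\d,]+)\s*[x×*]\s*([\d,]+)", prompt)
    if match:
        a, b = (int(n.replace(",", "")) for n in match.groups())
        return f"{a * b:,}"                    # computed by the calculator, not the LLM
    return ask_llm(prompt)

print(answer("What is 123,456 × 789,012?"))    # 97,408,265,472
```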
Current Events
Input: "Who won yesterday's game?"
Output: [Makes up answer]
Why: can only use relationship patterns from training data - can't access new information.
How AI companies are solving it: they add more computer programs on top of the LLM that can access the internet.
Logic Consistency
Input: "John is taller than Mary. Mary is taller than Pete. Who is shortest?"
Output: [Sometimes gets confused]
Why: it follows text connections rather than understanding logical relationships, and it may have conflicting connections that lead to inconsistent answers.
How AI companies are solving it: they train the LLM to break logical problems down step by step, using a combination of extra training and other programs.
Breaking Down Steps
Instead of trying to solve everything at once, companies train LLMs to split problems into smaller pieces, like this:
Chain of Thought
Add special instructions in training that teach LLMs to "show their work":
Input: "John is taller than Mary. Mary is taller than Pete. Who is shortest?"
Step 1: Let's list what we know – John > Mary, Mary > Pete
Step 2: Combine the facts – John > Mary > Pete
Step 3: Answer – Pete is the shortest
Multiple Attempts
Companies program LLMs to solve the same problem several different ways and compare the answers, like having multiple students check each other's work:
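A sketch of that idea, again using a hypothetical `ask_llm` stand-in that is right most of the time but not always, plus a simple majority vote:

```python
from collections import Counter
import random

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; imagine it's right ~80% of the time."""
    return random.choices(["Pete", "Mary"], weights=[0.8, 0.2])[0]

def ask_with_voting(prompt: str, attempts: int = 5) -> str:
    """Ask the same question several times and keep the most common answer."""
    answers = [ask_llm(prompt) for _ in range(attempts)]
    return Counter(answers).most_common(1)[0][0]

print(ask_with_voting("John is taller than Mary. Mary is taller than Pete. Who is shortest?"))
# Usually "Pete": individual attempts can wobble, but the vote is more stable.
```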
Fact Hallucination
Input: "What did Einstein eat for breakfast?"
Output: [Makes up detailed but false answer]
Why: when no strong direct connections exist, it combines weaker connections about Einstein, breakfast, and typical foods, creating plausible but false information.
How AI companies are solving it:
Knowledge Checking (RAG - Retrieval Augmented Generation)
Think of this like having a fact-checker standing next to the LLM. When you ask "What did Einstein eat for breakfast?", the LLM first checks a trusted database. If no reliable information exists, it says "I don't have verified information about Einstein's breakfast habits" instead of making things up.
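A stripped-down sketch of the RAG idea. The "trusted database" here is just a Python dictionary and the matching is naive keyword overlap, but the shape is the same: look the fact up first, and refuse rather than invent:

```python
trusted_facts = {
    "capital of france": "Paris is the capital of France.",
    "speed of light": "Light travels at about 299,792 km per second.",
}

def retrieve(question: str):
    """Return a stored fact whose keywords all appear in the question, if any."""
    q = question.lower()
    for keywords, fact in trusted_facts.items():
        if all(word in q for word in keywords.split()):
            return fact
    return None

def answer(question: str) -> str:
    fact = retrieve(question)
    if fact is None:
        return "I don't have verified information about that."
    return fact   # in a real system, the retrieved fact is handed to the LLM as context

print(answer("What is the capital of France?"))        # Paris is the capital of France.
print(answer("What did Einstein eat for breakfast?"))  # I don't have verified information...
```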
Self-Checking Questions
Companies train LLMs to question their own answers before presenting them.
Confidence Levels
Companies train LLMs to rate their confidence in each part of their answers.
The LLM then adjusts its response based on these confidence levels, being more direct with verified information and more cautious with uncertain details.
Context Forgetfulness
Input: "What's her favorite color?"
Output: [Mentions different color than stated earlier]
Why: limited context window means older connections may be lost or overridden by more recent ones.
How AI companies are solving it: nothing special. They buy more computational power with billions of dollars and keep optimizing their algorithms, but so far more power seems to be the better approach.
Understanding how LLMs work changes how you use them:
It's a Pattern Copier, Not a Thinker
More Context = Better Patterns
Garbage In = Garbage Out
It's About Probabilities, Not Facts
It's not magic. It's not intelligent. It's not creative.
It's a sophisticated pattern-matching calculator that finds relationships in text, scores the most likely continuation, and mimics the patterns it was trained on.
"Any sufficiently advanced technology is indistinguishable from magic."