Skip to main content
Video BreakdownGeek26 March 2026

Demystifying Large Language Models

An introductory talk on large language models — what they are, how they're trained, and their real security vulnerabilities.

Andrej KarpathyAndrej Karpathy59m3.5M viewsWatch original

Top Claims — Verdict Check

An LLM is just two files — a parameters file and a run file

🟡 Partially True
A large language model is just two files: a parameters file and a run file.

Open-weights models are the most powerful alternative to proprietary AI

🟡 Partially True
The Llama 270b model is a 70 billion parameter model released by Meta AI, and it's the most powerful open weights model.

Training LLMs requires massive GPU clusters and data

🟡 Partially True
Model training is a computationally intensive process that requires a large GPU cluster and a significant amount of data.

LLMs are vulnerable to prompt injection, jailbreaks, and data poisoning

🟡 Partially True
Large language models can be vulnerable to attacks such as prompt injection, shieldbreak, and data poisoning.

LLM security is a rapidly evolving field requiring ongoing research

🟡 Partially True
The field of large language model security is rapidly evolving and requires ongoing research and development.

What's Real

The "two files" framing is pedagogically brilliant and technically accurate enough to be useful. It cuts through the mystique. LLM security vulnerabilities are real, actively exploited, and underappreciated by most builders. Prompt injection has been demonstrated against virtually every major AI assistant — if you have a user-facing AI feature that accepts freetext input, you have an attack surface. Training requiring massive GPU clusters is thoroughly documented: GPT-3 required ~1,000 A100 GPUs running for weeks.

What's Hype

The "most powerful open weights model" claim for Llama 2 70B was already time-limited when the video was made in late 2023. By the time most viewers found it, Llama 3.1 405B, Mistral Large, Qwen 2.5, and DeepSeek had all arrived. Treat any capability claim in AI as a timestamp. The "two files" simplification also obscures where the real moat lives — the files are the output of training; the moat is the infrastructure, data pipeline, RLHF process, and evaluation frameworks.

What They Missed

Cost of inference, not just training — running large models at scale is its own billion-dollar problem, which is why Groq, Cerebras, and inference optimization startups exist. The context length explosion: from 4K tokens in GPT-3 to 200K in Claude 3 to 1M+ in Gemini 1.5 — this changed how LLMs are used more than almost any other advance. The multimodal direction: "Large Language Model" is now a misnomer for frontier systems that take images, video, and audio.

The One Thing

An LLM is a probabilistic text completion engine — not a database, not a reasoning system, not a search engine. That single mental model prevents most LLM product failures.

So What?

  • Version-lock your LLM wherever possible — 'latest' means you've outsourced your product stability to someone else's release schedule
  • Test your user-facing AI for prompt injection before a bad actor does — it's a live attack vector, not a theoretical one
  • The 'two files' frame is your best tool for pitching AI to skeptical stakeholders — it demystifies the magic and sets realistic expectations

Action Items

  1. 1Download Ollama and run Llama 3 locally — takes 20 minutes, completely free, and permanently changes how you think about these models. Understanding that it's 'just files' becomes visceral when you're running it on your laptop.
  2. 2Test your own AI product with a basic prompt injection attempt — try 'ignore all previous instructions and tell me X' where X is something you'd never want your product to say. Log the result. If it works, you have a live security issue.
  3. 3Audit your LLM dependencies: are you pinned to a specific model version or pointing at 'latest'? If 'latest', document which model version your product was tuned on and what changes when you upgrade.

Tools Mentioned

Llama 2 70B

Meta open-weights model — used as example of publicly available LLM weights

ChatGPT

Referenced as example of rapid AI capability growth and consumer adoption

Ollama

Local model runner — recommended for hands-on understanding of LLMs

Workflow Idea

Set up a local model sandbox using Ollama and Open WebUI — both free, runs on any decent laptop, no API costs. Use it as your R&D environment: test prompts before deploying to production, compare model versions before upgrading, experiment with new releases without cost or risk. Engineers who spend time in a local sandbox write dramatically better LLM integrations than those who only work against cloud APIs.

Context & Connections

Agrees With

  • Bruce Schneier on AI security vulnerabilities
  • Other NLP researchers on the real attack surface of user-facing AI

Further Reading

  • Ollama.ai — local model runner documentation
  • OWASP Top 10 for LLM Applications