What actually happened
On June 9, 2026, Anthropic shipped Claude Fable 5 — its most capable widely-released model — alongside Claude Mythos 5, the same underlying model with several safety classifiers removed, handed only to vetted partners inside Project Glasswing. Four days later it was gone.
According to multiple status pages and reports, on June 12 at 5:21pm ET the US government issued an export-control directive suspending access to Fable 5 and Mythos 5 by any foreign national — inside or outside the United States, including Anthropic's own foreign-national employees. Because that's effectively impossible to enforce per-user in real time, Anthropic's only compliant move was to shut both models off for everyone. The incident was posted at June 13, 00:50 UTC, affecting claude.ai, the Claude API, Claude Code, and Claude Cowork (incident breakdown, Financial Express).
Anthropic publicly disagreed with the order, called it a "misunderstanding," and noted the government "did not provide specific details of its national security concern." All other Claude models — including Claude Opus 4.8 — stayed online, and developers were told to route hard reasoning tasks there in the meantime.
So what was the actual threat?
There are really two threats here, and it's worth keeping them apart.
Threat #1 — the capability itself
Mythos-class models are gated because they're genuinely dangerous in the wrong hands. In Anthropic's own framing, Mythos Preview "scrambled the global cybersecurity landscape" (Dario Amodei, "Policy on the AI Exponential"). Reporting on the program describes Mythos finding zero-day vulnerabilities in every major operating system and browser it was pointed at — including a 27-year-old flaw in OpenBSD, a system built specifically to be secure, for roughly $50 of compute (CyberHoot). That's why Anthropic locked it behind a coalition of ~40–50 (later ~200) infrastructure providers instead of shipping it to the public.
That's a real, serious story. But it's their problem to manage — and it's also the justification that gets reached for whenever a model gets locked down, slowed down, or wrapped in refusals.
Threat #2 — the one that affects you
If your work depends on a frontier API, you just watched the failure mode play out in real time. Not latency. Not price. Access. The most capable model in the world went from "generally available" to "gone" in four days — and the kill switch wasn't even pulled by the lab. The same week, Fable 5 shipped with safety classifiers that can decline your requests, while the uncensored version (Mythos 5) was reserved for a vetted few. Put those together and the pattern is clear:
- Remote revocation — your model can be switched off to satisfy a regulator, a lawsuit, or a policy change.
- Geo / nationality gating — access can be cut based on where you are or what passport you hold.
- Refusal classifiers — the model decides, on someone else's terms, what it will and won't answer.
- Capability tiering — the strong version goes to insiders; everyone else gets the safetied-down one.
None of this is a knock on safety work — frontier cyber and bio risk are real. It's just a reminder that a model behind an API is a privilege, not a possession.
The boring superpower: own the weights
Here's the part the freeze makes obvious. An open-weight model — Llama, Qwen, Mistral, Gemma, DeepSeek, gpt-oss — is just a file. Once it's on your disk, no one can reach across the internet and disable it. There's no account to suspend, no region to block, no classifier sitting between you and your own hardware. Tools like Ollama make running that file about as hard as installing an app.
How local models actually work
A 60-second mental model so the rest of this makes sense:
- Open weights. Labs release the trained model's parameters publicly. You download them once; they're yours offline forever.
- A local runtime. Ollama wraps the llama.cpp engine so the model runs on your CPU, GPU, or Apple Silicon — no network call leaves your machine.
- Quantization. Weights get compressed (e.g. 4-bit) so a 7B–32B model fits in consumer VRAM with minimal quality loss. This is why a laptop can run something genuinely useful.
- No gatekeeper. No API key, no rate limit, no refusal layer you didn't install. If you want a less-restricted model, "uncensored" / abliterated community variants exist openly.
- Privacy by default. Your prompts and data never leave your hardware — which, conveniently, is also why local models can't be subpoenaed or logged by a vendor.
Just give me the command
Install Ollama, then pull and run a model. That's the whole ceremony:
# 1. install (macOS / Linux / Windows): https://ollama.com/download
$ ollama run llama3.3
# first run downloads the weights, then drops you into a local chat
# a smaller, fast model for an 8GB GPU or a MacBook
$ ollama run qwen2.5:7b
# list what you've got, serve it to your own apps on localhost
$ ollama list
$ ollama serve # OpenAI-compatible API at http://localhost:11434
Not sure what your machine can handle? That's exactly what the rest of the
Yollama hub is for — a VRAM calculator that turns your GPU into an exact
ollama pull command, plus a hardware scanner
that recommends the models your machine can actually run.
The honest trade-off
Local isn't magic. A 14B model on your desk is not Fable 5, and pretending otherwise helps no one. Frontier models are still meaningfully smarter at the hardest reasoning and long-horizon agentic work. What you get in exchange for that gap is permanence, privacy, and control: a model that can't be revoked, won't phone home, and answers on your terms. For a huge share of real tasks — drafting, summarizing, coding help, classification, RAG over your own docs — a well-chosen 7B–32B model is already the efficiency sweet spot.
The smart posture isn't "local instead of frontier." It's frontier for the peaks, local for the floor — so that when the next model gets pulled offline at 5:21pm on a Friday, your work doesn't stop with it.
Censorship and shutdowns of the big models aren't a bug in someone else's product — they're a recurring feature of renting intelligence. The hedge is simple and it's been sitting in plain sight: run it local.
FAQ
Is Claude Fable 5 down right now?
As of June 12–13, 2026, yes — Fable 5 and Mythos 5 are suspended globally to comply with a US export-control order. Other Claude models, including Opus 4.8, remain available.
What is Claude Mythos?
A frontier Anthropic model line with state-of-the-art cybersecurity and biology capability, distributed only to vetted partners via Project Glasswing. Mythos 5 is the same model as Fable 5 with several safety classifiers removed.
Can a local AI model be banned or shut off?
Future downloads can be blocked, but a model already on your disk and run with Ollama has no remote kill switch, no account to suspend, and no geo-gate. That's the structural advantage of owning the weights.
What's a good local model to start with?
For most people: llama3.3 if you have the VRAM, or qwen2.5:7b / gemma2:9b on lighter hardware. Use the Yollama VRAM calculator to match a model to your GPU.
Is Yollama affiliated with Ollama?
No. Yollama is an independent, community hub for the local-LLM ecosystem and is not affiliated with, endorsed by, or sponsored by Ollama or Ollama Inc. We reference "Ollama" descriptively because it's the open-source runtime this whole space is built on.
Sources
- Anthropic — Claude Fable 5 and Claude Mythos 5
- Dario Amodei — Policy on the AI Exponential (June 2026)
- CyberHoot — Claude Mythos & Project Glasswing
- Financial Express — Fable/Mythos taken offline after govt intervention
- DEV — Why Claude Fable 5 was suspended 4 days after launch
Disclaimer: Yollama is an independent, community-run resource for the local-LLM and Ollama ecosystem. It is not affiliated with, endorsed by, or sponsored by Ollama, Ollama Inc., Anthropic, or any model provider named here. Product names and trademarks belong to their respective owners and are used here for identification and commentary only. This article is reporting and analysis, not legal, security, or investment advice.