Frontier Models
Anthropic's project Glasswing and the coming cybersecurity reset
Anthropic just announced Claude Mythos Preview, a model the company says is so capable at finding software vulnerabilities that it is not releasing it publicly. Instead, through Project Glasswing, Anthropic is giving $100 million in Claude credits to a consortium of defensive security teams at companies including Cisco, Broadcom, Microsoft, Apple, and Amazon.
On the most recent episode of Hard Fork, hosts Kevin Roose and Casey Newton discuss the implications: we may be heading into a period where huge amounts of critical software need patching, and where users should expect more updates, reinstalls, and general security friction. Kevin says it seems plausible that “in the next kind of six-ish months, every major piece of software in the world is going to need to be patched, rewritten, and re-released,” though Casey is more cautious, arguing that the most likely outcome is a scramble to fix the most critical infrastructure first.
It also reopens something that has been mostly closed since GPT-2: a meaningful gap between what AI labs have internally and what the public can use.
Hard Fork | We have to talk about Anthropic's Mythos
Watch Time: 22 minutes
Impacts of AI
The growing gap in understanding AI capability
Andrej Karpathy posted an explanation this week for why people keep talking past each other on AI, and Aaron Levie offered a complementary view from inside the enterprise.
Karpathy argues that AI capability is increasingly “peaky.” The biggest gains this year have landed in technical domains like coding, math, and research, where reinforcement learning has clearer, verifiable rewards and where companies see the most value. Meanwhile, more everyday uses such as search, writing, and advice have improved less dramatically. So two people can both be reporting honestly while describing very different realities: one is judging AI from free or older models, while the other is using frontier paid tools like Codex or Claude Code in technical work.
Levie makes a parallel argument from the enterprise side. He says AI adoption is currently “a tale of two cities”: most people are still using chat tools, while a smaller group is deploying agents that do longer-running work and can generate much larger gains. But he argues that broader rollout is constrained by organizational realities: messy data, undocumented workflows, legacy systems, and the need to solve for context, compliance, security, and change management.
Andrej Karpathy | The growing gap in understanding AI capability
⚡ Quick Read (1 minute)
Aaron Levie | AI adoption is a tale of two cities
⚡ Quick Read (1 minute)
Designing with AI
DESIGN.md files: context, not magic
The design community has spent the last few weeks buzzing about DESIGN.md files: a plain markdown file of brand colors, fonts, and component rules that Google’s Stitch reads before generating UI. The excitement is a little funny, because this is the same pattern as CLAUDE.md (for Claude Code), .cursorrules (for Cursor), and AGENTS.md (for Codex). But it was Stitch’s implementation that brought the convention to designers’ attention.
So what is DESIGN.md? From TJ Pitre:
"It’s a markdown file. That’s it. There’s no special file format, no schema, no build tooling. It’s a plain text document with structured design instructions that an AI agent reads as context before doing its work."
TJ conducted a controlled experiment to demonstrate how context matters: two Claude Code sessions, identical instructions, and the same prompt to build a dashboard. The only difference was that one directory had a DESIGN.md with explicit dark-theme rules, hex values, and nav styling. Without the file, the AI produced a solid light-theme dashboard with safe blue accents. With the file, it produced the dark, purple-accented version it had been instructed to make. It followed the rules.
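Since there is no schema, the contents are simply whatever design context you want the model to read. A minimal file in the spirit of TJ's experiment might look like this (all specific values here are illustrative, not taken from his actual file):

```markdown
# DESIGN.md

## Theme
- Dark theme only. Background: #0f0f14, surfaces: #1a1a22.
- Accent color: #8b5cf6 (purple). Do not use blue accents.

## Typography
- Headings: Inter, semi-bold. Body: Inter regular, 14px.

## Navigation
- Left sidebar, fixed width, icon + label items.
- Active item: accent color at reduced opacity as background.

## Components
- Cards: 8px corner radius, 1px border (#2a2a33), no drop shadows.
```

Because it is read as plain context rather than enforced as tokens or lint rules, the agent can still drift from it; the experiment shows it usually follows explicit, concrete instructions like hex values better than vague ones like "make it feel premium."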
That result is not magic; it's context. A DESIGN.md is still just a flat file, with no variable references and no component APIs. It is a useful starting point for prototyping and for giving AI tools design context, but not a replacement for a real design system.
TJ Pitre | DESIGN.md files: context, not magic
☕ Medium Read (6 minutes)
Frontier Models
Human versus LLM jagged intelligence
"Jagged intelligence" has become a shorthand in AI discussions for the uneven capability profile of models: they can be strikingly strong in some areas and weak in others nearby.
Alex Imas posted radar charts from a jagged-intelligence study he ran on six people, showing their spiky profiles across verbal, math, science, patterns, culture, and econ. His point: humans are jagged too, and we’re simply more accustomed to their unevenness.
Ethan Mollick then responded by arguing that AI’s jaggedness is harder to deal with for three reasons: its weaknesses are not always intuitive or identifiable in advance, frontier LLMs tend to share similar blind spots, and the jagged frontier keeps moving outward, which means the shape you're designing around today won't be the shape six months from now.
Ethan Mollick | Things that make the jagged intelligence of AI harder to deal with
⚡ Quick Read (1 minute)
That’s it for this week.
Thanks for reading, and see you next Wednesday with more curated AI/UX news and insights. 👋
All the best, Heidi

