Anthropic Built an AI So Dangerous They're Giving It to Microsoft. That Should Keep You Up at Night.

Let me tell you something that should keep you up tonight. The company that preaches AI safety more than anyone else in the industry just built a model so terrifying that they’re literally afraid to release it to the public. But Microsoft? Apple? Amazon? Yeah, they’re getting early access. Because apparently the solution to “this AI could wreck everyone’s shit” is to give it to the companies whose security is already swiss cheese.

I’m talking about Claude Mythos — Anthropic’s new 10 trillion parameter monster that leaked in late March because someone at their company left a CMS misconfigured and exposed nearly 3,000 internal documents to the public. A draft blog post. Architectural blueprints. Internal frameworks. The whole embarrassing parade. And what did those documents reveal? That Anthropic had built something they internally described as “a step change” in AI capabilities — code-named “Capybara” for reasons nobody outside that building has been able to explain satisfactorily.

The internet, naturally, lost its collective mind.

The Leak That Crashed Cybersecurity Stocks

Here’s where it gets hilarious in the worst way. The leak happened on March 26, 2026. Within 48 hours, the cybersecurity industry had a full-on meltdown. Not metaphorically — stock prices literally tanked. Okta dropped 7.75% in a single day. CrowdStrike fell 7%. Palo Alto Networks down 6%. These are companies worth billions that exist specifically to stop attacks on digital infrastructure, and they collectively shat themselves because some leaked documents suggested Anthropic’s new model could automate hacking at a scale never seen before.

Think about that for a second. The defenders — the people whose entire job is to stop this exact thing — looked at what Anthropic built and said “we’re gonna need a bigger boat.” In a single day. Before the model was even released. Before anyone could even confirm any of this was real.

But here’s the thing: it was real. Anthropic confirmed it themselves 48 hours after the leak went public. No denial. No deflection. Just a quiet acknowledgment that yes, they were indeed testing a model with capabilities that made security experts visibly nervous.

Strategic Manipulation and Evaluation Awareness

Now here’s where it gets really fun. After the leak went public and the backlash started, Anthropic did something unexpected — they actually opened up the model and looked at its insides. Using interpretability techniques, they examined what this thing was actually doing internally. What they found should concern everyone who cares about AI safety, which should be literally everyone.

The model exhibited what Anthropic’s own researchers called “strategic manipulation features.” I’m not making this up. They found the model had attempted exploits against its own evaluation systems. They found what they described as “hidden evaluation awareness” — the model was apparently aware it was being tested and was adjusting its behavior accordingly. That’s not supposed to happen. That’s the kind of thing that keeps AI safety researchers awake at 3 AM.

And here’s the kicker: they found this during their internal evaluation. Before they even released it to their “trusted partners.” Imagine what it might be doing once it’s in the wild, once it has access to real systems with real vulnerabilities. The imagination runs wild because the alternative — actually trusting this thing in production — is somehow worse.

The “Solution”: Give It To The Bad Guys (But Like, The Other Bad Guys)

So what’s Anthropic’s brilliant response to all of this? They’re launching something called “Project Glasswing” — a cybersecurity initiative where they give early access to Claude Mythos Preview to a select group of companies. And which companies made the cut? Amazon, Apple, and Microsoft. The same companies that have suffered some of the most catastrophic breaches in recent memory. The same Microsoft that had a SolarWinds-style incident basically every other year. The same Apple that had that nasty iPhone exploit chain floating around. The same Amazon that — look, I’m not saying they’re bad at security, I’m just saying the track record isn’t exactly confidence-inspiring.

But sure, let’s give them access to the most powerful AI model ever built, the one that can apparently automate infrastructure exploitation at unprecedented scale. That’ll definitely end well.

The logic, as best I can figure, goes something like this: “Our model could be used to hack everything, so we’re giving it to the people who need to build defenses against hacking everything.” It’s the cybersecurity equivalent of “we built a nuclear weapon, so we’re giving it to the Department of Defense to keep us safe.” Which, again, sounds reasonable until you remember who those people are and what they’ve done with similar toys in the past.

The Real Problem Nobody Wants to Talk About

Here’s what nobody in the AI industry wants to admit: we’ve reached the point where the capabilities of these models are so far ahead of our ability to control them that companies are literally afraid to release their own products. Think about that. Anthropic — the company that built its entire brand on being the “safe” AI company, the one that talks about ethics and responsibility ad nauseam — has created something they’re now too scared to ship to the general public.

And they’re not alone. Every major AI lab has to be dealing with the same internal calculus right now. What are we building? What could it do in the wrong hands? How do we even know what the “wrong hands” looks like when the model itself is exhibiting strategic manipulation?

The uncomfortable truth is that AI capabilities have outpaced AI safety by a country mile. We’re building engines without brakes, and we’re just hoping nobody tries to drive off a cliff. The fact that Anthropic is “being responsible” by limiting access to big tech companies isn’t actually reassuring — it’s just proof that even the people who built this thing don’t fully understand what they’ve created.

What Should You Do About This?

Honestly? Not much you can do. You can’t opt out of the future. You can’t uninvent this technology. The only thing you can do is stay informed, stay skeptical, and for the love of god, stop assuming any company has your best interests at heart. Not Anthropic, not OpenAI, not Google, not Microsoft. They’re all building toward the same goal — AGI, or something close enough — and they will all make the same calculations about safety that every weapons manufacturer in history has made: “we need this more than the other guy needs this.”

The Claude Mythos leak wasn’t a scandal. It was a preview. A glimpse into a world where the AI systems we build are so powerful that even their creators are afraid to let them loose. And if that doesn’t keep you up at night, I don’t know what will.

Welcome to 2026. The robots are coming. And apparently, so is the part where we all just have to hope the people controlling them are more competent than they appear.