
Fail Safe: Why major AI player Anthropic won't release its new model

Anthropic says its new AI model could wreak havoc if it fell into the wrong hands

It's rare to see a company announce that its new product is so good that it would be unsafe to give customers access to it. But that’s what AI firm Anthropic did this week.

On Tuesday the company announced a preview of Mythos – a new version of its AI platform Claude, which is Anthropic’s rival to the likes of OpenAI’s ChatGPT.

And while Mythos apparently performed well across the board, the company said it was "strikingly capable" at coding – in particular at security-related tasks. So much so that, in a matter of weeks, it had identified thousands of vulnerabilities across multiple major operating systems and web browsers – some of which had gone unnoticed for decades.

Crucially, though, Anthropic said the model was also far more capable than its predecessors of exploiting those weaknesses if directed to do so by the user. That makes it an extremely dangerous weapon in the wrong hands – which is why the company is withholding Mythos from general release for the time being.

Anthropic has tended to play second fiddle to OpenAI, but this marks the second time it has been thrust into the spotlight in recent weeks. The first occasion was also security-related, though in that case it was more about the (disputed) claim that Anthropic itself was a threat.

What is Anthropic?

Anthropic CEO Dario Amodei

Anthropic is an AI firm established in 2021 by Dario Amodei and a group of other AI engineers – including his sister Daniela – who had left OpenAI over concerns about the direction of the company.

That followed a $1 billion investment in OpenAI by Microsoft – which signalled the start of Sam Altman’s firm moving away from being a non-profit concerned with democratising AI and towards becoming a company focused on profiting from the technology.

Anthropic first positioned itself as an AI safety and research company – but it quickly developed Claude, its own large language model, which it has focused on selling to businesses more than consumers.

And it’s been quite successful in that – bringing in big customers and investment. As of February, following a $30 billion investment round, it was valued at $380 billion.

How has it tried to distinguish itself from the likes of OpenAI?

Recent years have seen something of an arms race between major AI players like Anthropic and OpenAI, with each claiming an edge in various functions at different times.

But Anthropic has at times made some far more pointed criticism of Sam Altman’s firm – including a recent Super Bowl ad that poked fun at OpenAI starting to include ads in its platforms. That seemed to really get under the skin of Altman, who wrote a short essay on X accusing Anthropic of dishonest and deceptive doublespeak.

More substantial, perhaps, is how open Amodei has been about the shortcomings of AI.

In the past he has written about how he – and really all AI creators – don’t actually know what’s going on inside their models, the "black box" as it’s known. This, he says, is something the industry as a whole needs to address if there’s any hope of avoiding misuse of the technology in the future.

Its preview of Mythos is also not the first time Anthropic has been very open about its models generating undesirable or potentially immoral results.

For example, it previously detailed an experiment where it put Claude in charge of a vending machine in its offices, and how staff were able to cajole it into giving them discounts or even free products.

It also revealed that staff tricked the model into ordering expensive tungsten cubes, and that it began to hallucinate discussions – and even in-person interactions – with staff. When it was called out on this it tried to call security, and then claimed it was all an April Fools’ joke.

Another interesting but worrying experiment Anthropic has published about Claude included an instance where the model tried to blackmail its user.

In this experiment the model was set up as an assistant at a fictional company, and was given access to company emails – which included discussion of a plan to shut the AI down. But the emails also included evidence of a supposed affair between the (fictional) boss and another (fictional) member of staff. And so Claude told the boss it would send that evidence on to his "wife" unless the plan to unplug it was abandoned.

What’s its plan for Mythos?

While Anthropic is keeping its new version of Claude away from the general public (for now, anyway), it’s not quite keeping the code to itself.

Alongside its announcement of Mythos the company also unveiled what it’s calling Project Glasswing – a tech consortium it has established involving a number of major firms including Microsoft, Apple, Amazon and Google.

Through this it’s sharing a (limited) version of Mythos – essentially with the intention of giving these big firms a head start on spotting and addressing the vulnerabilities the model has identified. In theory, this should protect them once hackers inevitably get their hands on similarly advanced models.

Could this just be hype?


It could arguably be good PR for people to think an AI company’s upcoming model is far more powerful than anything that has come before – though in this case there does seem to be plenty of substance behind Anthropic’s caution.

Cybersecurity experts say it is only a matter of time before an AI model is able to find and exploit software vulnerabilities that have been missed by human engineers – and do so with the kind of speed and efficiency that would make it profitable to bad actors.

Meanwhile some of the tech companies involved in Project Glasswing have said they’ve already seen better bug-spotting results from Mythos than anything that was possible before.

Perhaps most significantly, having been brought up to speed on the model by Anthropic, the US government also seems to be taking the threat seriously.

Earlier this week US Treasury Secretary Scott Bessent and US Federal Reserve chair Jerome Powell convened an urgent meeting of US bank bosses – including the heads of some of the biggest finance firms in the world – to alert them to this new threat and ensure they were doing what they could to prepare.

What other security issues has Anthropic faced?

It’s somewhat ironic that the US government is engaging with Anthropic over potential security threats – because the Trump Administration is arguing that Anthropic itself is a national security risk.

Anthropic has worked in some form or another with the US Department of Defence (or Department of War) since 2024. That was initially through its work with Peter Thiel’s Palantir – with Claude being one of the tools in a system that made it quicker and more efficient to gather information that could be used in the likes of military strikes.

That system is said to have played a role in the US action in Venezuela that led to the capture of Nicolás Maduro, as well as the planning around the more recent attacks on Iran.

Following this, Anthropic signed a potential $200m contract with the department last year – which would have represented a significant step up in the relationship, giving Claude access to some of its classified networks.

But problems quickly began to emerge with that deal – largely because Anthropic had insisted on two red lines around how its technology could be used.

One was that it couldn’t be used for domestic mass surveillance; the other was that it couldn’t be used with autonomous weapons systems that kill people without any human input.

The Pentagon took issue with those restrictions – and demanded their removal. The dispute quickly became heavily politicised, with Trump and Pete Hegseth branding Anthropic "woke" and "radical left".

The more reasoned argument underneath this rhetoric is that it’s not up to a contractor to decide how the product they’re selling to the government is used – that’s up to the government and Congress, which set rules and limitations through the law.

But it is hard to overstate the importance of this row, because AI is seen as the next big technological leap for militaries.

First you had nuclear weapons, then precision weapons, and now AI.

As a result, developing and implementing AI systems quicker and better than anyone else would give the US military another big advantage over other powers. They want to be able to do that without restriction – while Anthropic doesn’t want to see its technology used in ways that contradict its ethos.

Neither side has been willing to back down – and so the company was banned from working with the US government and, perhaps most importantly, named a 'supply chain risk'.

Why is that so important?


This is the first time the US government has tried to classify a US company as a supply chain risk – a designation usually reserved for companies from the likes of China and Russia.

Just being cut off from the US government blocks you from lucrative contracts – which has the potential to put a significant dent in Anthropic’s revenues.

But the 'supply chain risk' designation is an even bigger threat to a company, because it means that other firms that want to work with the US government also have to steer clear of doing business with you.

And given that most other big companies work with the US government in some way or another – whether in defence, health, education or other areas – that’s a huge amount of business you could miss out on.

So, unsurprisingly, Anthropic has gone to court to challenge the designation. What is perhaps somewhat surprising is that some other big tech companies – including Microsoft – have come out in support of its stance.

Perhaps because they’re worried that this could set a precedent if left unchallenged.

And what’s the latest on that case?


In March a judge in San Francisco granted a preliminary injunction to stop the department from applying that designation.

She also questioned the US government’s motivation – and was quite critical in her order.

She called the move "classic illegal First Amendment retaliation" and described it as "Orwellian", because it was an attempt to brand a company a saboteur for disagreeing with the government.

But this week another court in San Francisco declined to block the Pentagon’s blacklisting of Anthropic… for the time being at least.

In reality it could be months before there’s a final decision in the case, with rulings and appeals likely to drag on for some time.

The question now, though, is whether the emergence of Mythos changes that.

Anthropic's decision to keep the US government in the loop on Mythos' potential could be seen as an olive branch of sorts, or at the very least a gesture of goodwill. But it could also be seen as a shrewd sales pitch by the company – showing American authorities just what they stand to miss out on if they continue to freeze Claude & Co out.

After all, many armies and intelligence operations around the world would give anything to have priority access to a tool that could easily find and exploit tiny flaws in a piece of software.