
The research also concludes that while Mythos is performant, smaller AI models can deliver comparably good results at a lower running cost. For some cybersecurity use cases, then, those cheaper models may make more sense to run than Mythos.
But Mythos might not be operating at peak capability yet. According to another analysis, by the UK's AI Security Institute (AISI), Mythos is the most capable AI model on the institute's own cybersecurity benchmarks. It doesn't perform dramatically better than other models across all tasks, but on more complex vulnerability discovery and exploitation, it pulls ahead of the pack.
Part of this comes from its support for long context lengths, with larger token inputs delivering the best results. In its tests, AISI benchmarked Mythos with budgets of up to 100 million tokens and found it to be the most capable model at that threshold. It even postulates that Mythos could scale further with a greater token budget.
"We expect that performance on our evaluations would continue to improve with more inference compute," AISI's report reads. "We ran the cyber ranges with a 100M token budget; Mythos Preview’s performance continues to scale up to this limit, and we expect performance improvements would continue beyond that."
The report doesn't speculate on how much better performance could get, whether the scaling is linear, or how far it extends, but it does suggest that more compute leads to better results.
But even if Mythos is the best, and even if it can be even better with more compute power and more tokens, how much is all this going to cost?
We don't have token pricing for Mythos, but considering the second-best model in AISI's tests was Claude Opus 4.6, already one of Anthropic's more expensive models, Mythos is likely to cost more than that.
It may be worthwhile to spend big on a single pen-test, but it also raises questions about how economically viable it is to run long-term. How easy would it be to market such a service when Aisle's research suggests you can get most of the way there by spending far less, or even running models locally, as open-weight models get quantized?
Irregular argues that an AI model's effectiveness in cybersecurity needs to be weighed against its overall token cost; the metric it suggests considering is the expected cost per success. That's where Mythos, if judged head-to-head against the competition on that basis, might fall down.
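The expected-cost-per-success idea can be sketched as simple expected-value arithmetic. The sketch below is illustrative only; the dollar figures and success rates are hypothetical and do not come from Irregular's report:

```python
# Expected cost per successful task: token cost per attempt divided by
# the model's success rate on that task. All figures are hypothetical.

def expected_cost_per_success(cost_per_attempt_usd: float, success_rate: float) -> float:
    """A pricier model can still win if its success rate is high enough."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt_usd / success_rate

# Hypothetical: an expensive frontier model at $12 per attempt with a 60%
# success rate vs. a cheap model at $2 per attempt with an 8% success rate.
print(expected_cost_per_success(12.0, 0.60))  # 20.0
print(expected_cost_per_success(2.0, 0.08))   # 25.0
```

Under these made-up numbers the expensive model is actually cheaper per solved task, which is why judging on raw token price alone can mislead.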
As part of its reveal of Mythos, Anthropic gave $100 million in usage credits and $4 million in open-source donations to organizations to help them validate and fix the bugs discovered by Mythos. It also closed ranks and didn't release the model to the public, instead limiting it to a core group of technology companies as part of Project Glasswing.
That's great news. Fixing bugs privately, quietly, and away from the public is how security testing and improvement are usually handled. If Claude Mythos is a skeleton key, you want companies to be able to protect their products. While this initial $100 million in usage comes free, the next hit might cost businesses big, depending on Mythos's final model pricing.
So, does this mean that Project Glasswing is a mere marketing stunt? Not quite. It follows the industry-standard Coordinated Vulnerability Disclosure (CVD) process, and multiple reports suggest the model is one of the most performant AI models for cybersecurity out there.
But, following its rally of headlines around pushing back against the Pentagon, Anthropic now wants to help secure its place in the cybersecurity industry by graciously offering up free compute resources to those partaking in Project Glasswing.
But you also have to consider whether that grace comes at a high cost for Anthropic. As demand for AI explodes, the companies serving large, powerful models need the compute resources to back them. For a presumably heavier, more computationally expensive model like Mythos, that might strain Anthropic's already outage-prone services, which have managed a 98.4% uptime rate over the last 90 days as of the time of writing.
That may not sound like much, but it equates to almost twelve hours of downtime per month, which is poor by cloud service standards. Four nines, or 99.99%, is considered enterprise-grade uptime; in other words, that's the standard Anthropic needs to meet if it wishes to court SaaS and cybersecurity whales with Mythos. OpenAI's API delivers 99.99% uptime, and when you're in the business of selling tokens, that makes a huge difference. For Anthropic, it means the company must seek out further computational heft as soon as possible to plug the gap, as it did with its recent Broadcom deal.
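The downtime arithmetic behind those uptime figures is straightforward. A minimal sketch, assuming a 30-day (720-hour) month:

```python
# Convert an uptime percentage into expected monthly downtime.
# Assumes a 30-day month (720 hours); figures are back-of-the-envelope.

def monthly_downtime_hours(uptime_pct: float, hours_per_month: float = 720.0) -> float:
    """Return expected downtime per month for a given uptime percentage."""
    return (1.0 - uptime_pct / 100.0) * hours_per_month

# 98.4% uptime works out to roughly 11.5 hours of downtime per month.
print(round(monthly_downtime_hours(98.4), 1))

# "Four nines" (99.99%) allows only about 4.3 minutes per month.
print(round(monthly_downtime_hours(99.99) * 60.0, 1))
```

The gap between 98.4% and four nines is the difference between half a working day offline each month and a few minutes, which is why the figure matters to enterprise buyers.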
So, the real conclusion to draw, now that the dust has settled somewhat on the grand Mythos reveal, is that it might indeed be one of the best overall AI models for cybersecurity, but it might not be the best model for every single job. If it's expensive, other models may reach a similar level of quality at a lower computational cost.
And Anthropic, for all of its bluster about the model, still cannot serve its currently released models at industry-standard uptime levels, even before Mythos enters the mix. All of these factors combine to put Anthropic in a difficult position. As compute remains constrained and AI usage explodes globally, we can only wait and watch to see how (and where) the chips fall. Even if Anthropic can court the customers it wants with Mythos, it'll still need to keep up with insatiable compute demand.
Jon Martindale is a contributing writer for Tom's Hardware. For the past 20 years, he's been writing about PC components, emerging technologies, and the latest software advances. His deep and broad journalistic experience gives him unique insights into the most exciting technology trends of today and tomorrow.