Mythos found the bugs. Who funds the fixes?

Mythos caused a ruckus in security. If top models can trigger a wave of major exploits, who's going to cover the costs when they hit open source?

Evan Larsson

/ 10 min read

Canadian Prime Minister Mark Carney invoked Mythos, Anthropic's infamous Project Glasswing model, during a Q&A at the Economic Club of New York, a room filled with some of America's deepest pocketbooks.

While speaking on what he saw as three core factors driving broad inflationary market pressures, Carney named the third pillar:

"The cost of cyber protection. [...] Everyone knows what Mythos is in this room. And that's - I think - going to be the very early stages of a big operating spend that's going to be required to address those issues. The marginal cost of software is no longer zero, it's actually quite material, and it's likely to be there for some time."

So now, not only are companies spending massive sums of money on infrastructure for AI (another one of his pressures) and on tokens to use AI, there is an emerging third force: the cost of defending all the existing software AI is now able to scrutinize.

In its warning, the UK's National Cyber Security Centre speaks of an AI-fuelled "patch wave": a rush of software updates that will need to move across the stack as automated vulnerability discovery improves. That is a more sober version of the same premise. Discovery accelerates first, then everything else has to catch up.


Cartoon showing a huge wave over a city with the words get excited or stay terrified

Gnarly patch wave


Yikes. But is this incredible new pressure real?

Mythos is a-comin'

The strongest evidence in support comes from the groups actually running these systems against production-scale code.

Anthropic's own coordinated vulnerability disclosure dashboard is the bluntest version of the story. Though it would be, right?

As of May 22, 2026, it reports 23,019 Mythos candidate findings, 1,900 reviewed by external security firms, 1,596 disclosed vulnerabilities across 281 open source projects, and 97 patched upstream.

Those numbers do not mean every candidate is a real bug, or that every disclosed issue deserves an urgent patch. But they do show candidate generation moving faster than the disclosure and repair pipeline.

Morbid note: Anthropic calls independent human review the "rate-limiting step."


A person trapped inside a transparent pipe filled with liquid

Just rate-limiting and clogging up the chocolatey money-river.


Mozilla's Firefox work is the best public example of Mythos producing a real defensive outcome. Mozilla wrote that Firefox 150 shipped fixes for 271 vulnerabilities identified during its Mythos evaluation. In its deeper write-up, the Firefox team described the shift with the memorable line: "Suddenly, the bugs are very good".

But there's an essential point that is easy to miss. Mozilla's result came not just from getting access to a powerful new model. Their follow-up stated that the impact came from both more capable models and better harnessing techniques.

Strap in: harness required

At Mozilla, Mythos had a very robust and helpful guardrail: their own harness. It's built on top of existing fuzzing infrastructure, uses parallelized jobs across ephemeral VMs, deduplicates findings, triages reports, tracks bugs, reviews patches, tests fixes, and manages releases. In April 2026, they fixed 423 security bugs. Over 100 people contributed code to that effort.

Cloudflare tells a similar story from a different angle. In "Project Glasswing: what Mythos showed us", Cloudflare calls Mythos "a real step forward," but the important part of the post is the workflow. They argue that "pointing a generic coding agent at a repo doesn't work" for meaningful vulnerability coverage.

This is Cloudflare, their existing security apparatus is robust. Their custom harness uses recon, hunt, validate, gapfill, dedupe, trace, feedback, and report stages. It runs many narrow tasks in parallel rather than asking one agent to be exhaustive. It's deep engineering.


Krang inside a large robot body pointing across a city

As Krang, capable warlord, requires a robo-bod for (mostly) successful criminality.


That should temper things. What works is not a "throw it into the model's gaping maw" pass over a full repository. Mythos' success is enabled and catalyzed by a sophisticated security pipeline in a strong organization.

Independent offensive security work points the same way. XBOW's evaluation calls Mythos powerful but not magical, and still recommends a multi-model pipeline rather than exclusive reliance on Mythos. AISLE's public-model work is another useful counterweight: smaller and public models can find real vulnerabilities... when they are wrapped in a capable harness.

Daniel Stenberg's curl write-up is also a useful reality check. In "Mythos finds a curl vulnerability", he writes: "Five findings became one". Mythos analyzed about 178K lines in curl's src/ and lib/ directories and reported five "confirmed security vulnerabilities." The curl team reduced those to one confirmed low-severity vulnerability, three false positives, and one ordinary bug.

We've found a middle ground: Mythos is powerful, but not magical.

AI-assisted security work is real, and still requires expert steering. The value is greatest when models are embedded into a workflow that scopes the work, proves impact, deduplicates findings, and hands humans something coherent worth acting on.

The validation bottleneck

So, given the right apparatus, a model similar to Mythos can discover a large number of real or potential vulnerabilities. But after discovery, what happens?

Bugcrowd gives us a clean non-Mythos example. In April 2026, it wrote that its triage queue had grown by 334% over a three-week stretch, "almost entirely from low-quality submissions." Their summary is hard to improve on: "convincing content got cheap", while checking correctness did not.

The Linux kernel saw the same kind of pressure in a different domain.

LWN.net quotes Linus Torvalds saying AI reports had made the kernel security list "almost entirely unmanageable" because different people were finding the same things with the same tools and sending duplicate reports to a private list.

"... we're making it clear that AI detected bugs are pretty much by definition not secret, and treating them on some private list is a waste of time for everybody involved - and only makes that duplication worse because the reporters can't even see each other's reports." - Linus Torvalds


Linus Torvalds making an irritated gesture during a talk

Obligatory.


Instead of a relatively modest number of firms and actors pursuing bounties, the bar to "find vulnerabilities" is now much lower. The scarcity then moves to the time required to review, deduplicate, and triage. To say nothing of then fixing the actual issues, which now appear to be present in greater abundance.

The validation bottleneck shows up in primary sources first: Anthropic's own dashboard points to human review and triage as the rate-limiting step, and Bugcrowd's queue data shows what happens when cheap reports hit human validation. It's a logjam.

Security vendors are starting to describe the same thing in cost terms.

Vendor bending

Contrast Security describes a 1.8 million line scan with Claude Sonnet 4.6 that produced 3,560 findings for $315 in token usage.

The scan was the cheap part. The painful part came from their imagined validation bill: at 30 minutes per finding for a security engineer, Contrast estimated about $128,000 in labour.

This is one illustrative case, but what if this is even a partial representation of the direction we're all headed?

On those numbers, validation labour was not a little more expensive than the model run. It was roughly 400 times more expensive!


Pie chart showing token usage as 0.25 percent and human validation as 99.75 percent of one combined estimated scan plus validation cost

Breaking down an illustrative $128,315.00 Mythos bill.


That source has a commercial angle, and the validation bill is hypothetical, so apply a load of salt. But the shape is worth a pause. Bugcrowd's queue data and Anthropic's disclosure pipeline both point the same way: the token bill is not the whole bill.

Even Anthropic's own Mythos pricing points in that direction. Project Glasswing lists Mythos Preview at $25 per million input tokens and $125 per million output tokens after credits. That is not cheap in hobbyist terms, but for a company it is slim procurement, especially compared with competent alternatives. What is less slim is the cost of a new model returning 3,560 things that might matter.

What this shows us is that the cost pressure is the asymmetry. An AI found many things for a relatively small cost, and then dispatched them to experts for repair at a volume that is not at all a small cost. Experts are a lot more expensive than cost-per-million-token bills.

Teams might be headed to a big, not-so-beautiful bill indeed.

Who's gonna fix it?

There's the too clean argument: AI finds vulnerabilities faster, therefore AI will also repair them faster, therefore AI spend in both directions is required.

Hmm, sure, but the other half, the AI can fix it too half, isn't so clean today and remains a much tougher sell. It's this specific point where we remain skeptical.

Cloudflare explicitly warns that faster patching does not remove the shape of the expert-driven patch pipeline. In its Glasswing write-up, it says it tried "letting the model write its own patches". (Screams)

In short, these fixes did address the original bug, but often while "quietly breaking something else".

That warning has some academic company. A large-scale study of AI-generated patches found that Llama 3.3 introduced new vulnerabilities at an 11x higher rate than developer-written patches in its dataset, with command injection and eval injection among the prevailing pitfalls.

Results will differ between models and context, but the fantasy loop where AI finds bugs, AI patches bugs, and humans simply move on remains a fantasy indeed.


A complicated cartoon machine producing an unexpectedly wrong answer

The Homework Machine by Shel Silverstein


Repair speed only counts if the repair is correct. A patch that breaks an invariant, skips regression testing, or creates a new reachable edge case is no helpful patch at all.

This is where the "AI is required" or "fire with fire" claims should slow down. Yes, AI can help produce candidate patches, explain code, generate proofs of concept, and draft tests. But responsibility still lands with the maintainer or engineering team to create and ship fixes that actually amount to more secure software.

And remember, this all assumes a resourced company with money and an existing defensive security motion. But what about the open source volunteer hacker somewhere in the middle of anywhere?

Holding the bag

For large companies, increased security work is an increased operating expense. Painful, but budgetable, and probably going to wind up rolled down to the consumer. That's how these factors become inflationary in a broad sense.

For open source, the same pressure lands on maintainers who never priced security response into anything because there is nothing to price. Many simply do it for the joy of it.


A young hacker sitting in front of a laptop

For the love of the game.


Dr. Sam Illingworth points this out in his article "Project Glasswing and the open source maintainer tax": "AI generates at machine speed. Humans remediate at human speed." There's the asymmetry again.

Most folks still underestimate how much essential software relies on open source:

Broad cost pressures have a way of de-prioritizing free, volunteer work. Imagine what would happen if that volunteer work now fundamentally required an expense to secure?

And the work is different. There are more decisions, and a ton of new - if not more frequent - questions: Is this report real? Is it a duplicate? Should this be handled privately or in public? Who writes the advisory?

So who funds the fixes?

For some high-value systems, AI-assisted security workflows are now essential. Attackers and defenders will both use cheaper discovery tools and that will have an impact. It's an arms race. So yes, there are new costs required which will hit businesses and then land on everyday people. Therefore: we will fund them.

But for open source software? Unfortunately, to be determined.

Fundamentally, what seems required now is less AI itself and its costs, and more the ability to absorb a faster stream of security evidence without drowning the humans responsible for the code. But the trick is, for that, you're probably going to need to use top models yourself, in whatever way can be most helpful.

At opub, this is where we're trying to help. We link maintainers to top AI models for exactly this kind of supplementary work: validating reports, reproducing bugs, exploring patches and drafting tests. Generous donors cover the bill.

Open source should be fun and rewarding. If the programmers just want to program, we can help them cut out the noise. While this is one way to offer them support, it'll take a whole lot of forces working together so that open source can continue to thrive post-Mythos.

Written by Evan Larsson

Filed under maintainers, security, agents, tokenomics

Share

Subscribe to the newsletter

Hear from open public.

Rare and concise letters with our latest writing, sponsorships, and updates.

No spam, never sold. Unsubscribe any time.

Next up

Introducing Open Public

May 21, 2026 / 6 min / launch, donors

More from opub

All posts
/ 6 min read

Introducing Open Public

opub is the public ai compute commons for open source. Donors fund donated compute for open source projects, and maintainers spend it on over 30 top coding models through OpenRouter API keys with public token spend.

Evan Larsson