Hacker News

Hardening Firefox with Claude Mythos Preview

117 points by HieronymusBosch 14 hours ago | 68 comments

https://arstechnica.com/information-technology/2026/05/mozil...

Again, and this is important:

A bug is a bug. A “potential vulnerability” is a bug. A vulnerability is verifiable as having security implications with a proof of concept or other substantial evidence.

Words matter. Bugs matter. It’s important to fix large amounts of bugs, just as it always has been, and has been done. Let that be impressive on its own, because it IS impressive.

Mythos didn’t write 271 PoC for vulnerabilities and demonstrate code path reachability with security implications. Mythos found 271 valid bugs. Let that be enough.

epistasis 8 hours ago

I was a bit confused by your definitions, but here's how Mozilla broke out [1] the 271, um, things:

> As additional context, we apply security severity ratings from critical to low to indicate the urgency of a bug:

> * sec-critical and sec-high are assigned to vulnerabilities that can be triggered with normal user behavior, like browsing to a web page. We make no technical difference between these, but sec-critical bugs are reserved for issues that are publicly disclosed or known to be exploited in the wild.

> * sec-moderate is assigned to vulnerabilities that would otherwise be rated sec-high but require unusual and complex steps from the victim.

> * sec-low is assigned to bugs that are annoying but far from causing user harm (e.g, a safe crash).

> Of the 271 bugs we announced for Firefox 150: 180 were sec-high, 80 were sec-moderate, and 11 were sec-low.

Mozilla uses the term "vulnerability" for even sec-high, even though they say right below that it doesn't mean the same thing as a practical exploit. And on their definitional page, they classify even sec-low as "vulnerabilities" [2].

Words are tools, that get their utility from collective meaning. I'd be interested where you recieved your semantics from and if they match up or disagree with Mozilla.

[1] https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

[2] https://wiki.mozilla.org/Security_Severity_Ratings/Client

IainIreland 7 hours ago

I work at Mozilla; I fixed a bunch of these bugs.

In general, I would say that our use of "vulnerability" lines up with what jerrythegerbil calls "potential vulnerability". (In cases with a POC, we would likely use the word "exploit".) Our goal is to keep Firefox secure. Once it's clear that a particular bug might be exploitable, it's usually not worth a lot of engineering effort to investigate further; we just fix it. We spend a little while eyeballing things for the purpose of sorting into sec-high, sec-moderate, etc, and to help triage incoming bugs, but if there's any real question, we assume the worst and move on.

So were all 271 bugs exploitable? Absolutely not. But they were all security bugs according to the normal standards that we've been applying for years.

(Partial exception: there were some bugs that might normally have been opened up, but were kept hidden because Mythos wasn't public information yet. But those bugs would have been marked sec-other, and not included in the count.)

So if you think we're guilty of inflating the number of "real" vulnerabilities found by Mythos, bear in mind that we've also been consistently inflating the baseline. The spike in the Firefox Security Fixes by Month graph is very, very real: https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

nirui 2 hours ago

How about this: a "vulnerability" is a "vulnerability", but after it was identified and verified to cause problem, that's when it should be called a "bug", because it could make the software do unwanted things.

paulvnickerson 7 hours ago

What types of vulnerabilities was it finding? Cross site scripting, privilege escalation, etc? Mostly memory corruption or any Javascript logic bugs?

IainIreland 7 hours ago

I work on SpiderMonkey, so I mostly looked at the JS bugs. It was a smorgasbord of various things. Broadly speaking I'd say the most impressive bugs were TOCTOU issues, where we checked something and later acted on it, and the testcase found a clever way to invalidate the result of the check in between.

If you look closely at, say, this patch, you might get a sense of what I mean (although the real cleverness is in the testcase, which we have not made public): https://hg-edge.mozilla.org/integration/autoland/rev/c29515d...

reisse 7 hours ago

> although the real cleverness is in the testcase, which we have not made public

What is the point of keeping it private? I'd bet feeding this patch to Opus and asking to look for specific TOCTOU issue fixed by the patch will make it come up with a testcase sooner or later.

IainIreland 6 hours ago

The same is also true of a good security researcher, and has been for a long time. The question is mostly whether it takes long enough to come up with a testcase that we've managed to ship the fix to all affected releases, and given people some time to update. (And maybe LLMs do change the calculus there! We'll have to wait and see.)

mccr8 6 hours ago

Possibly! One of the many areas that might need rethinking in the age of AI (that started in February of this year) is how long security bugs should be hidden. We live in interesting times.

paulvnickerson 7 hours ago

Very cool, thank you.

mccr8 6 hours ago

I'd say it leans towards memory corruption kinds of issues, as those are easiest to pass the validator, thanks to AddressSanitizer. I think there's a lot of potential for making the validator more sophisticated. Like maybe you add a JS function that will only crash when run in the parent process and have a validator that checks for that specific crash, as a way for the LLM to "prove" that it managed to run arbitrary JS in the parent. Would that turn up subtler issues? Maybe.

epistasis 7 hours ago

I'm not a security dev or researcher or anything, but as an outsider my understanding matches how Mozilla uses the terms. Though words used by specialists and the general public can offer differ...

cookiengineer 6 hours ago

Can you elaborate why those bugs weren't found by e.g. fuzzing in the past?

I'm genuinely curious what "types" of implementation mistakes these were, like whether e.g. it was library usage bugs, state management bugs, control flow bugs etc.

Would love to see a writeup about these findings, maybe Mythos hinted us towards that better fuzzing tools are needed?

IainIreland 6 hours ago

If I had to guess, I'd say that AI is better at finding TOCTOU bugs than fuzzing because it starts by looking at the code and trying to find problems with it, which naturally leads it to experiment with questions like "is there any way to make this assumption false?", whereas fuzzing is more brute force. Fuzzing can explore way more possible states, but AI is better at picking good ones.

In this particular sense, AI tends to find bugs that are closer to what we'd see from a human researcher reading the code. Fuzz bugs are often more "here's a seemingly innocuous sequence of statements that randomly happen to collide three corner cases in an unexpected way".

Outside of SpiderMonkey, my understanding is that many of the best vulnerabilities were in code that is difficult to fuzz effectively for whatever reason.

mccr8 6 hours ago

Fuzzing isn't good at things like dealing with code behind a CRC check, whereas the audit based approach using an LLMs can see the sketchy code, then calculate the CRC itself to come up with a test case. I think you end up having to write custom fuzzing harnesses to get at the vulnerable parts of the code. (This is an example from a talk by somebody at Anthropic.)

That being said, I think there's a lot of potential for synergy here: if LLMs make writing code easier, that includes fuzzers, so maybe fuzzers will also end up finding a lot more bugs. I saw somebody on Twitter say they used an LLM to write a fuzzer for Chrome and found a number of security bugs that they reported.

throw0101c 7 hours ago

Presumably there are (implicit?) "sec-none" things, like [a] from the recently released 150.0.2 [b] which makes absolutely zero mention about "Security Impact" or "Severity" in the bug report, unlike [c], which is listed in the Mozilla weblog post [2].

Security things are mentioned in the Release Notes [b] pointing to a completely different document [d].

Perhaps sometimes a bug is 'just' a bug, and not a vulnerability.

[a] https://bugzilla.mozilla.org/show_bug.cgi?id=2034980 ; "Can't highlight image scans in Firefox 150+"

[b] https://www.firefox.com/en-CA/firefox/150.0.2/releasenotes/

[c] https://bugzilla.mozilla.org/show_bug.cgi?id=2024918

[d] https://www.mozilla.org/en-US/security/advisories/mfsa2026-4...

Gregaros 7 hours ago

> Mozilla uses the term "vulnerability" for even sec-high, even though they say right below that it doesn't mean the same thing as a practical exploit.

That’s not evident in what you pastedat all.

What you pasted says

> sec-critical and sec-high are assigned to vulnerabilities that can be triggered with normal user behavior […] We make no technical difference between these […] sec-critical bugs are reserved for issues that are publicly disclosed or known to be exploited in the wild.

> sec-low is assigned to bugs that are annoying but far from causing user harm (e.g, a safe crash).

From this one infers that the "180 were sec-high" bugs found are actually exploitsble but known to have been found in the wild, and are NOT mere annoying bugs.

The difference between 180 and 270 does nothing to deflate the signicance, or lack there of, of the implication re: Mythos.

epistasis 7 hours ago

Yes, it is not in what I pasted, as I said, "even though they say right below". If you don't believe me then click on either of the links.

mozdeco 7 hours ago

Mythos did in fact write PoCs for all bugs that crash with demonstration of memory-unsafe behavior (e.g. use-after-free, out-of-bounds reads/writes, etc).

For us this is substantial enough evidence to consider it a security vulnerability at that point, unless shown otherwise and it has always been this way (also for fuzzing bugs).

ZrArm 2 hours ago

> Mythos did in fact write PoCs for all bugs that crash with demonstration of memory-unsafe behavior (e.g. use-after-free, out-of-bounds reads/writes, etc).

But report [1] says that "Some of these bugs showed evidence of memory corruption...", which implies that majority of these (which includes 271 bugs from Mythos) don't have evidence at all. Do I not understand something?

> For us this is substantial enough evidence to consider it a security vulnerability at that point

Mythos is supposed to be pretty good at writing actual exploits, so (as I understand) there shouldn't be any serious problems with checking if bug is vulnerability or not.

[1] https://www.mozilla.org/en-US/security/advisories/mfsa2026-3...

jerrythegerbil 3 hours ago

Is that number of crashing bugs with PoC available/written down anywhere?

browningstreet 5 hours ago

This isn’t true anywhere people have to make decisions about what to work on first.

dataflow 6 hours ago

> Mythos didn’t write 271 PoC for vulnerabilities

I think the word you're looking for is exploit?

input_sh 10 hours ago

Original source: https://news.ycombinator.com/item?id=48051079

It's better because it actually lists a sample of Bugzilla reports that were made public. This topic was discussed previously (36 comments two weeks ago: https://news.ycombinator.com/item?id=47885042), but the part about bug reports being made public is brand new.

kajman 4 hours ago

I dismissed the earlier non-technical blog post as shameless product boosterism for Anthropic. The linked hacks blog (which is a better source than this article) is a welcome release. It's hard to deny there's something real to this now, I think. Mozilla's internal definition of a "vulnerability" is also probably more widely applied than what many would intuit, but it is good that these issues are being taken seriously and fixed.

tialaramex 6 hours ago

They've only linked a few tickets, so of course maybe when we see all 271 actual distinct things the insight won't apply but all those I examined ended up as some C++ code with a nasty bug in it.

Firefox is written in several languages, only about 25% of it is in C++ but every single one of these issues seems to touch the C++.

mccr8 6 hours ago

A general limitation of this approach is that it is only as good as your validator, and there's nothing easier to validate than a test case that creates, say, an AddressSanitizer use-after-free. For subtler issues will we have to more specific validators or will the LLM become better at coming up with other dangerous conditions it will verify? We'll see.

tialaramex 4 hours ago

> A general limitation of this approach is that it is only as good as your validator, and there's nothing easier to validate than a test case that creates, say, an AddressSanitizer use-after-free

Sure, but, surely AddressSanitizer would also detect the same problem in the C or Rust which together also make up about 25% of Firefox so... ?

benced 2 hours ago

Reading this article in the context of the Zig folks refusing to even consider LLM-generated bugs certainly shapes my perspective on what technologies will be in my toolchain.

Diti 11 hours ago

I hope to see the day when (or if) the LLMs get so good at spotting and fixing bugs that all that’s left for the Firefox engineers to do is to focus on adding new features.

This isn’t sarcasm. Firefox deserves to be used more. Most people I know don’t use it because “Chrome does almost everything better”, and Firefox can’t compete with the other browsers’ roadmaps.

greggoB 10 hours ago

> Firefox deserves to be used more

Totally agree. I even go as far as choosing which website I make purchases on depending if they work on FF, or writing to support occasionally to tell them it's not supported or a feature isn't working properly and this would be appreciated.

I know it pretty much always goes nowhere, but I feel it's what I can do to keep the browser somehow on the radar.

nubinetwork 9 hours ago

> Firefox deserves to be used more

Part of the problem is, when they stop working on fixing bugs, they start doing Mr Robot things... We just want a web browser. Nobody asked for pocket, or AI...

If they use AI to fix all the bugs, then what else is for them to do, other than maintain syntax compatibility with the various languages they build with? They're just going to go back to making the browser trash again.

FeepingCreature 32 minutes ago

I have an old apk of Firefox pinned on mobile. I do this because I genuinely believe that for my very limited usecases, the browser has become actively worse.

(Don't worry- I use the system browser for any site I don't fully trust.)

OhMeadhbh 6 hours ago

When I was at PalmSource, I tried to get budget for CoVerity or Fortify (static code analysis tools.). "Too expensive," my management chain said. I spent another year putting together a deal for a lower cost but limited to scanning the network stack. "No, it's based on BSD and BSD is inherently secure," my management chain said (neither is true, btw.)

I eventually left and wound up at Mozilla where there were a number of /* flawfinder ignore */ comments scattered throughout the code.

My guess is that Mythos just ignored the "flawfinder ignore" directives and reported the known vulnerabilities in the code.

fg137 2 hours ago

What are people's thoughts on how this could affect static analysis tools? I know they are very different beats but often they achieve the same goal. Static analysis tools can be slow, and they report lots of false positives.

I wonder if these models will get good + cheap enough so that people rarely reach for static analysis.

crummy 7 hours ago

Curious if people think LLMs will lead to more secure or less secure software in five years.

int32_64 7 hours ago

Both. The skilled will use them to find problems, the unskilled will use them to slopcode insecure software the skilled will have to fix.

FeepingCreature 29 minutes ago

More secure software, but in the same way that the population is net healthier after a plague.

mc3301 5 hours ago

Kinda like home-improvement stores, power tools, easily available hardware and youtube tutorials led to both incredibly amazing and durable furniture, as well as janky, ugly and even dangerous furniture.

More tools for more people equals more stuff being made on a wider range.

data-ottawa 5 hours ago

I’m just happy we’re talking about security.

That will make software safer alone.

bawolff 6 hours ago

One of the biggest issues in security historically imo is vendors who think, well nobody will ever find this bug so we can deprioritize fixing it. LLMs will prevent vendors lying to themselves which will lead to more secure software.

stavros 7 hours ago

That depends on which side has more money.

UltraSane 6 hours ago

In 5 years attackers have an advantage but in the long run I think more secure if developers use LLMs on software to find and fix all of the worse remotely exploitable bugs before release. LLMs are going to force devs to be much more security conscious.

gnabgib 6 hours ago

16 day old story

Wired: Mozilla Used Anthropic's Mythos to Find and Fix 271 Bugs in Firefox (41 points, 18 comments) https://news.ycombinator.com/item?id=47853649

Ars: Mozilla: Anthropic's Mythos found 271 security vulnerabilities in Firefox 150 (33 points, 8 comments)https://news.ycombinator.com/item?id=47855384

mozdeco 5 hours ago

No, it's a new post, see also

https://hacks.mozilla.org/2026/05/behind-the-scenes-hardenin...

gnabgib 5 hours ago

That's this post.. and while it's more detail on the same headline (271 bugs) these discussions look the same as 2 weeks ago (and the same as all the bloggers and podcasters discussed)

delichon 6 hours ago

In the latest Mission Impossible, saving the world depends on recovering the original software of an escaped superhuman AGI from a sunken Russian submarine. Luther writes a "poison pill" that given the original source will instantly one-shot the AI. We were left to wonder how this magical code could have been written, but now we know. Luthor just wrote a Mythos prompt that handed it the source code and asked for an immutable critical exploit.

lschueller 10 hours ago

Let's see, how this will improve the daily soc work. I still don't see, what's the big difference between Mythos and Opus, security wise. I'm confident, that this kind of vul detection is a long-term improvement. But does specifically Mythos makes such a big difference to "normal" models? I would love to see, what's the actual difference.

mccr8 8 hours ago

Quantifying the abilities of an LLM is a hard research problem, so I'm not sure if I can describe it in any great way, but Mythos did seem to be fairly clever about putting together things from different domains to find problems.

For instance, in one of the included bugs (2022034) it figured out that a floating point value being sent over IPC could be modified by an attacker in such a way that it would be interpreted by the JS engine as an arbitrary pointer, due to the way the JS engine uses a clever representation of values called NaN-boxing. This is not beyond the realm of a human researcher to find, but it did nicely combine different domains of security.

As the person responsible for accidentally introducing that security problem (and then fixing it after the Mythos report), while I am aware of NaN-boxing (despite not being a JS engine expert), I was focused more on the other more complex parts of this IPC deserialization code so I hadn't really thought about the potential problems in this context. It is just a floating point value, what could go wrong?

lschueller 8 hours ago

Okay, so far it makes sense to me. But is the deal with JS and floating point values, which isn't soemthing super special super rare stuff, only detected and identfied by Mythos while Opus wouldn't get to this point?

IainIreland 7 hours ago

There doesn't have to be a huge qualitative discontinuity between Opus and Mythos. It's just that Mythos has reached a threshold where it's finally smart enough that putting it in a loop and asking it to find bugs is suddenly really effective. Especially at the beginning, Mozilla wasn't doing anything particularly clever with prompts. Mythos is just smart enough that the hit rate on obvious prompts is high enough to matter. (Maybe you can get similar performance out of Opus 4.6 with really smart prompts, but AFAICT nobody had managed it until Mythos.)

JoshTriplett 10 hours ago

Among other things, Mythos seems better at "let me find, weaponize, and stack vulnerabilities until I get end-to-end from untrusted content to root", rather than just finding one thing in a specific identified area.

Havoc 6 hours ago

Results similar to mythos have been duplicated by weaker models.

Think it's more a care of mythos raising widespread awareness that tireless LLMs can be weaponized to dig through code and find that one tiny flaw nobody spotted

MetaverseClub 9 hours ago

I'm curious about how did Mozilla do bug finding before Mythos? Did they use any non-AI bug finding tools?

mccr8 9 hours ago

The usual sorts of fuzzing and static analyses, using AddressSanitizer and ThreadSanitizer. Also, with a bug bounty program to try to encourage external researchers to report issues. (I work on Firefox security; also I fixed 2 of the bugs linked in the blog post.)

canucker2016 8 hours ago

Coverity (similar to lint) scans various open source software products for vulnerabilities.

see https://www.blackduck.com/static-analysis-tools-sast/coverit...

and for Firefox-related alleged defects, see https://scan.coverity.com/projects/firefox

You have to create an account to view the actual reported defects.

There are just over 5000 reported defects still outstanding. I don't know how many overlap with the reported 271 Mythos-reported defects.

rockdoe 7 hours ago

How many of those are false positives though? Probably just over 5000?

You get bug bounties if you report the kind of bugs Mythos identified. There's a reason no-one collected bounties from the "5000 defects" Coverity identified.

The Mythos reports have several examples of chaining a whole bunch of logic in different parts of the program together to exploit something very subtle. The Coverity reports aren't anything like that. These tools aren't remotely in the same league or even universe.

IainIreland 7 hours ago

Yeah, fuzzing, sanitizers, and bug bounties were our main pre-AI tools for finding bugs.

MetaverseClub 7 hours ago

it's just sad that Coverity represents the best working C++ static analysis tool.

mccr8 6 hours ago

Firefox developers do fix issues found by Coverity. I haven't looked at the results in over a decade, but the last time I did there were a few code patterns we used in a lot of places which Coverity didn't like (but were actually okay the way we were doing them) which resulted in a colossal number of false positives.

deferredgrant 7 hours ago

A vuln finder is useful only if it respects the humans on the other end. Every bogus report taxes the same scarce attention needed for the real bugs.

xacky 9 hours ago

I just hope they don't start ignoring human created bug reports, as there are still many that haven't been fixed for years.

nnm 6 hours ago

I still don't know the exploit count for Mythos. Is it zero, one, or more?

mmooss 7 hours ago

> “That’s the key thing that has unlocked our ability to operate at the scale we’ve been operating at now,” he said. “It gives the engineer a crank they can pull that says: ‘Yep, this has the problem,’ and then you can iterate on the code and know clearly when you’ve fixed it and eventually land the test case in the tree such that you don’t regress it.”

I don't understand much of this paragraph:

* "a crank they can pull that says: ‘Yep, this has the problem,’": as in, ring an alarm? Does the LLM ring th alarm?

* "you can iterate on the code and know clearly when you’ve fixed it": Isn't that true of most bugs, assuming you do the normal thing and generate a test case? And I thought the LLM output test cases itself: "It will craft test cases. We have our existing fuzzing systems and tools to be able to run those tests" And are they claiming the LLM facilitates iterating?

* "and eventually land the test case in the tree": Don't you create the test case before the fix? And just a few words earlier they seemed to be working on the fix, not the test case. And see the prior point about test cases.

* "such that you don’t regress it.”: How is the LLM helping here?

Maybe I'm missing some fundamental unwritten assumption?

mccr8 6 hours ago

Mostly I think this just means that having a test case makes it easier to fix and verify. You can't actually take for granted having a test case when fixing a security bug. Sometimes you only have a crash stack or maybe a vague and hypothetical static analysis result.

> eventually land the test case

This is just a reference to the fact that we don't land test cases for security bugs immediately in the public repository, to make it harder for attackers. You are right that the LLM only helps with creating the initial test case. Things like running the test case in automation is part of the standard development process.

mmooss 43 minutes ago

Thank you; that makes sense.

ChrisArchitect 10 hours ago

The zero-days are numbered

https://news.ycombinator.com/item?id=47853277

rem1099 6 hours ago

I don't find that number very high. In a project of the size of Firefox, a new version of a compiler with stricter warnings or a draconian interpretation of the C standard can easily find 200 new bugs.

New tools find new bugs, but the oligarchy newspapers report on Mythos and not on clang-22.0.