The matrix was always the same and the challenge was clearly designed so that the point was being able to read anything at all, not knowing how to invert a matrix, so I asked the creator what was up.
He told me that there were tools that would trace input values until they reached a comparison instruction, then print what they were compared against. Therefore it was necessary for every deobfuscation challenge to scramble the input in some way too complex for these tools to undo, before comparing it. Hence the multiplication by a pseudorandom matrix.
The point is, cheating tools aren't new.
"new" does the same thing and is probably just a better descriptor then frontier
"Frontier models break the open CTF format" is good
"Frontier AI..." means wtf is Frontier AI.
Because of course it exists (just googled it): https://frontierai.company/
The solution is just to make CTFs harder, but when do CTFs become too hard? Maybe the problem is that 'hard' CTFs are fundementally too 'simple' where it's just a logic chain and an exhaustive bruteforce towards a solution since there really are limited ways to express a solution in plain sight.
Or maybe human creativity has been exhausted and we're not so limitless as we thought. Only time will tell.
I had another idea spring to mind: we could hide two flags, one that could only be found by ai agents and not humans or tools written by humans.
Do you publish it somewhere? Here's a sample my my js obfuscator output: https://gist.github.com/Trung0246/c8f30f1b3bb6a9f57b0d9be94d...
we have very powerful simulation tools so something like "project a pattern at these angles" wouldn't really work as you could simulate that.
I guess something cool is that we can make simulating the solution very expensive, but in real world it would be free since it's analog... As long as simulations take longer than it takes for a human to find a solution it would be a pretty good way to deal with it. I am sure people smarter than me can come up with something.
Maybe I was too early to dismiss human creativity.
There are a million places where a computer can interact with a non-digital system in a loop.
- Tune an FPGA, or a whole data-center, or just a physical computer.
- Make a drone fly somewhere.
- Design a selective toxin (or anti-toxin).
Or, you know, get more people to click on adds. All totally possible to automate.
Many CTFs have switched to a dual-leaderboard format recently, one for "agentic teams," one for the rest. If all you care about is "learning" and imaginary internet points, you can just participate as a human team and adblock the AI scoreboard, and maybe lobby CTFTime into splitting their rankings as well.
This stands out to me, and speaks perhaps broader than the article itself? I’m sure this has been in the spotlight before, but well put for many areas I think.
My fear is that they never get to the level they need to be at to create good software even with the help of AI. So, although an expert with AI can create great software, that is not where we end up. In stead we will have vibe coded messes by people who barely have any grasp of what is going on.
You could even go so far that anything loaded on your computer is fair game, but not more than that (certain competitive programming competition for example allow unlimited amount of paper material - for CTFs you probably need much more than that, therefore electronic).
still has no mention of AI, but that will likely change as they increasingly dominate competition.
It's a lot harder to detect cheating when your only trace is how fast someone submitted the string CTF{DUck1e_Pwned}
I just did a CTF where I was in the top 10. It was the first CTF I completed and I used AI because the rules permitted it. That said, I couldn’t solve all challenges.
But yes, it was significantly easier now than I last attempted one. Even manually solving with AI assisted assembly interpretation was much easier.
Well actually I get it. In cycling motor doping, putting a hidden engine into the bike, seems more offensive than regular doping. I think this is because there is a continuum from eating well to taking supplements to injecting stuff, but having a engine breaks a fundamental idea about cycling. Similar hacking is about cleverly abusing the rules.
As I don't know much about the CTF scene, I looked for other takes on this topic.
Here's an article from 2015 about how tool-assistance already changed CTFs:
> Individual skill will undoubtedly be a factor next year. But, I'm left wondering whether next year's DEFCON CTF will tell us anything more than how well-developed each team's tools are (and how well they can interpret the results).
https://fuzyll.com/2015/ctf-is-dead-long-live-ctf/
But there are quite a few recent (2026) articles with the same core message as in the original article, e.g., https://blog.includesecurity.com/2026/04/ctfs-in-the-ai-era/ or https://k3ng.xyz/blog/ctf-is-dead
And here's someone explaining how Claude Max allowed them to win CTFs:
> I had always been interested in CTF as one of the only ways people could compete and show off their skill in coding/problem solving on a global scale. It was just too difficult and didn't make sense for me to learn the fundamentals as an electrical engineer. As time went on, I got better and better, and it was hard to tell whether it was because of experience or if it was because of improvements in AI.
> I accomplished my goals, and for that reason I'm quitting CTF, at least for now. [...] I'd like to think I highlighted the problem before it became a bigger issue. So, how do we fix this? Teams and challenge authors losing motivation is not good. CTF dying is not good. AI bad. Or is it?
https://blog.krauq.com/post/ctf-is-dying-because-of-ai
The only article that saw LLMs as a non-negative force for CTFs was this one. Fittingly, it sounds like LLM output ("Let's be honest", "This is where things get interesting.") and only contains hallucinated references.
The same article talks about CTF skills as a way to learn about security best practices and separately a sport.
In reality it was all about learning an extremely important skillset (securing/attacking software and systems) that is getting automated.
The real thing the author seems to be frustrated about is AGI is coming in computationally verifiable domains first, and lot of his skillset was taken over in a big part.
This is like someone complaining that making machine parts has been ruined: Skillful craftsmen used to make them by hand using manual tools!
Nowadays the CAD/CAM/CNC cheaters have almost completely automated the whole thing. How is the next generation of craftsmen going to learn how to craft a gear by hand when the process of gear making has been reduced to pressing start on a CNC machine?!
See what I mean? Sorry, I think this article is just Luddite. I can empathize with the pain of your beloved craft basically being rendered obsolete by new technology, but the process can neither be stopped nor is it bad in general.
The manual skills you trained with CTF puzzles are now simply no longer relevant . (Field-specific) "AI orchestration" is the new cyber securtiy skill if LLMs really have become so good at this, and what the author used to do manually then has the same value as being able to craft a gear by hand.
Indeed, in the real world, plenty of people organize to do formerly-skillful tasks together. I have not personally crafted a gear by hand, but I have built a house in a long-abandoned style with a group of people only using hand tools.
There _is_ a danger that society forgets how to do these things. During that house-building exercise, there were many tricks of the trade that, while likely documented somewhere in a book, would have been difficult to reproduce without seeing a demonstration. From the standpoint of “does it matter?” it depends on what you care about. We absolutely do not need cruck-framed houses with scribed joints. Modern construction is faster and cheaper and lasts long enough. But it would sadden me greatly if practices like this faded from memory, because it’s one of those things that makes you gasp “wow!” when you see it. And your appreciation only deepens when you try it yourself.
These models seems completely unbeatable only in the ads. There are 100+ times way someone puts Hindi Yoda talk In Morse Code and it goes nuts. The reason they are going to hard for PR Marketing on this is because they know it is a matter of time.
The only things that works is novelty and obscurity. LLMs still suck with things mentioned in the footnotes of datasheets and manuals, things that deviate in subtle ways, unique constructions that alter something very very common. It's hard for LLMs to avoid common pitfalls in terms of making assumptions, while staying on track.
Not as easy logistically...
Explicit ELO measurements with some cheating detection. AI assistance wholly banned. As you climb the ELO ladder, detection gets more onerous. At top level during online events, anti cheating teams require the use of both monitoring software and multiple cameras.
Idea is that you can cheat pretty easily at the lowest levels but it gets less easy the higher you go. This allows for better feeding into the truly elite competitions.
I think chess’s very firm stance that AI is never allowed in competition (neither online nor in person), rather than CTF’s acceptance, was the right call.
If CTF is a player-vs-player event, then AI should just be banned outright, otherwise it will devolve into AI-vs-AI, which is just not an interesting competition format, as we learned in chess. Compared to FIDE top events (which bans AI), only a tiny niche audience actually watches the Top Chess Engine Championship (AI-centered). It turns out what we care about is not whether chess can be solved by any means available, but what are the limits of the human mind in learning chess.
Pretty much all chess coaches/educators also warn against relying heavily on AI during learning; engines only give you an illusion of understanding.
I've seen that exact font and color scheme a dozen of times the past weeks.
The intent for most CTFs is to provide a meaningful challenge that concerns a single topic without introducing noise that wastes time. Of course a training exercise is easier to complete for an LLM.
On the other hand, CTFs are fundamentally a game and a competition which are supposed to be fun and compare and improve ones skill. So when I let an LLM generate the entire solution for me, what's the point anymore? I did not learn anything. I did not work for that place on the leaderboard, I just copied the solution. And worst of all, I did not have any fun. It's boring.
So how does using AI as a solver not feel like cheating?
What am I missing here?
Its not really a good comparison
But I don't know enough that's why I asked.
I imagine one could do CTF in public, machines you work on vetted/prepared to some spec, yada yada.
If chess and Go can do it why can't CTF?
That was my question when I wrote "what am I missing here".
"Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that"."
It's an incredibly exciting time in security research in my humble old man opinion.
Think the cadence of new exploits is perhaps a good measure of that rather than subjective thoughts by anyone regardless of experience.
It's pretty fun. Or at least it was, back when you had some sense that your competitors were competing on an even playing field and just beat you because they were better than you.
I wouldn't say the name is a "gaming reference", it's just a descriptive name for a game.
Its a war game reference I guess?
I never got super into security but it gave me the confidence to play in the same field and lose the stupid aura I had that somehow "rich americans" would be better than me at everything because they had better universities or because of Hollywood or something.
Sad that another cool thing is lost to AI but I guess kids will learn in other ways.
>and the old game is not coming back
For many people the CTF scene was already dead in 2021 because it had turned into something unrecognisable.
In reality it’s just different.
"That makes open CTFs pay-to-win. The more tokens you can throw at a competition, the faster you can burn down the board. Specialised cybersecurity models like alias1 by Alias Robotics are becoming less relevant compared to general frontier LLMs. The competition is turning into "who can afford to run enough agents, with enough context, for long enough.""
1) It’s OK to do just about anything to win a CTF, including installing malware on the organisers computers months before the actual event so you’ll have an easy time stealing the flags.
2) It’s not ok to try and win the CTF with a solution the authors did not intend.
Recently the #2 crowd has been winning because the hacking scene has turned corporate and boring. People started to partake in CTFs in the hopes of landing a job(!)
CTFs are indeed ruined for those people, I personally don’t mind.
For the people in group #1 LLMs change little. Attacking the challenges directly was always a last resort.
This year, multiple groups on the top of the leaderboard were clearly abusing LLMs. You can tell because they know nothing of what a CTF is nor the terminology, nor really the fields the challenges were about when they were talked to. They were obviously amateurs.
It was pretty depressing to hear how unaware they were of how obviously they did not fit in to the type that usually is on the top of the leaderboard. It seems they seriously think they were under the radar. If it was one group it could be a freak incident - some times someone just shows up and curbstomps competition. But there were many groups like this this year. They also had a certain smugness to it - one staff reported that a group was hinting to other teams about their "super weapon". Another group credited their "secret third team member they didn't want to talk about".
I use LLM frequently and experiment with it a lot, both at work and on my free time. Nowadays they are good enough to have value and I am interested in learning more about that. They let me spend more time on hard problems and avoid spending the day on simple CRUD. I say this to say that LLM doesnt have to equal bad, it is a tool, that's all. However, I generally avoid LLM communities because many LLM fans are lazy and unskilled people who are just happy they can feel they are worth something even if they have no skill. They don't really have much to provide of conversation. If anything, from reading the CTF crowd this year, the rise of LLMs has just meant more of these people can stomp on and harvest the CTF scene for self validation.
This is not me trying to gatekeep who can play CTF. Anyone is welcome, but there is one condition: You are here to learn and have fun.
The conclusion many I talk to has come to is that nowadays, it is harder to learn to put in hard work and become good at something because there are just too many ways to cheat and take shortcuts. I suspect in the future there will be a shortage of useful people - the kind that have critical thought and know the value of doing something properly. This doesn't mean "Not using LLM", but as said by many on HN before you need a certain seniority before LLMs are useful augmentations to your skills and not just stopping you from learning yourself.
I agree with the article. Anything but physical competitions with strong security - think professional e-sports with organizer-provided PCs, is over. But I think one of the most interesting things to take away from my CTF experience is that the bottom of the leaderboard was still full of amateurs slowly working their way up - it is a few rotten apples that ruin the fun for most, and there are still plenty of people who want to learn and deep-dive.
The text itself being exceedingly long for no obvious reason doesn’t help.
And if you think it was too long, what part would you have shortened? I never knew about the scene and found it interesting to read this personal take on it.
According to Pikka, the paragraph text is Taupe Grey (#92908a) on a Liquorice (#111110) background. That's... pretty far from black and white.
The whole point of competitions is to provide a safe environment thanks to a set of rules all participants AGREE on in order to progress together.
If new tools "break" the competition, we change the rules and that's A-OK.
CTF isn't a natural phenomenon, if tools change, rules change, simple.
We’ve figured out the human replacement pipeline it seems, but we haven’t figured out the eduction part. LLMs can be wonderful teachers, but the temptation to just tell it ‘do it for me’ is almost impossible to resist.
If they can ship code that matches a spec, why does it matter if they’re using ai or not?
Genuinely curious.
I am perfectly capable of writing specs, and feeding them to 3 separate copies of Claude Code all by myself. Then I task switch between the tmux windows based on voice messages from the pack of Claudes. This workflow is fine for some things, and deeply awful for others.
Basically, if a developer is just going to take my spec and hand it to Claude Code, then they're providing zero value. I could do that myself, and frequently do.
The actual bottleneck is people who can notice, "The god object is crumbling under the weight of managing 6 separate concerns with insufficient abstraction." Or "Claude has created 5 duplicate frameworks for deploying the app on Docker. We need to simplify this down to 1 or we're in hell." I will happy fight to hire people who can do the latter work. But those people can all solve fizzbuzz in their sleep.
People who just "ship code that matches a spec" without understanding the technical details are providing close to zero value right now.
There is an interesting niche for people with deep knowledge of customer workflows who can prompt Claude Code. These people can't build finished products using Claude. But they can iterate rapidly on designs until they find a hit. Which we can then fix using people with deeper engineering knowledge and taste.
But if you're not bringing either deep customer knowledge or actual engineering knowledge, you're not adding much these days.
I’m not talking about gotcha level stuff here where the first time it didn’t compile because of a bracket or anything, or even first time wrong. They couldn’t do Fizzbuzz in a language of their choice, at all.
Those that could were always annoyed at having to do such things because how could someone coming for a contract position not be able to do this? Without seeing what a filter it really was.
Who cares as long as the car is fixed, right? As long as the mechanic can Chinese-room his way to a working car, why does it matter how much of it he actually understands?
And why hire the mechanic instead of hiring the Chinese room?
The energy spent arguing that those 4 instructions in a row “are not a mark of someone who can write code” would have better been spent firing them.
https://blog.codinghorror.com/why-cant-programmers-program/
If you remove the "without AI" and the end, I've been hearing similar anecdotes about fizzbuzz for years (isn't the whole point of fizzbuzz to filter out those candidates?)
We usually hire for problem solving capabilities and not so much for technical know-how.
That’s at least how I read your comment.
This situation in particular was a React role so there is an expectation that when you list React as one of your skills on your resume then you know at least the basics of state, the common hooks, the difference between a reference to a value vs the value itself.
These days you can do a surprising amount with AI without knowing what you are doing, but if you don't have any clue how things work you'll very quickly run in to problems you can't prompt away.
Well, they were ostensibly forcing functions... ten years ago everyone was paying the exchange student to do their homework and assignments for them, and that guy was paying his cousin back in his home country, but the whole thing is a bit more efficient now.
But he was a great teacher anyway. He was engaging and kept the kids in line and learning. I eventually learned the truth, and most of my classmates forgot about it. Teaching, like flying a plane or driving a train, might become more about keeping watch over a small group of people and ensuring that things don't go off the rails, and that's fine.
In reality heavier isotopes of hydrogen fuse, conserving the total number of nucleons, but the resulting hydrogen has a lower rest mass than the parent particles. The extra mass is released as energy and the total energy is conserved.
By his logic the system either violated energy conservation (by creating nucleons while releasing energy) or was endothermic (creating nucleons from the surrounding energy).
I think it helps that it's a very narrow field to look at, compared to fuzzy and big-picture view of social studies, for example. So much room to be confidently wrong... And sadly I can't think of a solution, LLMs or not.
E.g. in Hungary I had a university CS professor that originally wanted to be a highschool teacher and a highschool physics teacher that originally wanted to be researcher. Their choice of degree didn't determine which outcome they got. The researcher and teacher curriculum had an 80%+ overlap.
A Physics Prof Bet Me $10,000 I'm Wrong
https://www.youtube.com/watch?v=yCsgoLc_fzI
All things I learned in school which were wrong information.
Not to mention, the current state of education is far worse. I don't think most realize how low the bar is.
My “earth sciences” teacher also once tried to argue with me against the universal law of gravitation. (no, she was not referring to Special/General Relativity. She didn’t agree two objects in a vacuum fall at the same speed regardless of mass.
We can all agree that both human "experts" and LLMs can sometimes be right, and sometimes be confidently wrong.
But that doesn't imply that they're equally fit for purpose. It just means that we can't use that simple shortcut to conclude that one is inferior to the other.
So where do we go from here?
Are they or aren't they
Now I’m certain that there exist those mythical human instructors who can do better, but that’s not worth much if 99.99% of people don’t have access to them. Just like a good human physician who takes their time with the patient is better than an LLM, but that’s not worth much either given that this doesn’t match most people’s experience with their own physicians.
For me the best human teachers were the ones that managed to make me interested on topics that I thought are boring/useless (many times my opinion being stupid, mostly due to lack of experience).
So far with LLM I learn about things I know something (at least that they exist) and I am interested in, which is a small subset of things that one should learn during lifetime.
The kids learnt all about Team Fortress 2, Roblox, Rainbow Six etc. They also learnt how to game the learning system so it looked like they were doing their work.
Not really, not if you want to ask it deep questions. It won't have an answer that is deeper than something that you can find online, and if pressed it will just keep circling around the same response.
The reason is that this "thing" was never curious, never asked questions, and never really learned anything. It just has learned the Internet "by heart", and is as boring as a human teacher who is not really curious about the subject they are teaching, and has just got some degree by "by hearting" some text book. Of course it does it much better than a human, but it is fundamentally the same thing.
You're certain that mythical instructors exist (?) who "can" do better?
Are human instructors more competent as teachers than AI teachers, or are AI teachers more competent as teachers than human teachers? No "this or that can happen," just a definitive statement please.
AI is likely a million times better student than my dimwit cybersec meatbags...er, majors, for sure, as well! Don't have a reliable way to measure or experience why/how, tho, so I'm not out here claiming it. Even if I did, why would I argue for their replacement?
[0] Episode webpage: https://share.transistor.fm/s/31855e83
(Real mathematics problems, not American-style ""math"".)
Also, you could spin up your own educational agent with very strict instructions on guiding the user instead of just doing the work. Of course you can always go around it but if you're making an effort to learn, this is a good middle ground.