Hacker News

Frontier AI has broken the open CTF format

201 points by frays 7 hours ago | 171 comments

baq 5 hours ago

Replace ‘CTF’ with ‘high school’ or ‘university’ and you’ve described the total slow motion collapse of education; the only saving grace is that most of it requires in person presence.

We’ve figured out the human replacement pipeline it seems, but we haven’t figured out the education part. LLMs can be wonderful teachers, but the temptation to just tell it ‘do it for me’ is almost impossible to resist.

Gigachad 4 hours ago

We are interviewing for a software dev role and we made the first round in person to prevent cheating. The gap between people who learned pre ai vs post is immense. I had a dev with supposedly 3 years experience and a degree in software who wouldn't have been able to write fizzbuzz without AI.

IanCal 4 hours ago

Can’t say you’re wrong but the last anecdote describes many I’ve had to review for jobs long before LLMs. Fizzbuzz is a classic thing that shockingly many devs genuinely cannot do, even at home.

sigmoid10 3 hours ago

Yeah, I've interviewed people like this 15 years ago. Degrees and experience mean nothing in this field. The best predictor I found was personal passion projects. Let them get as nerdy as possible, then you will see pretty quickly where their skills are at and what their limits are. And you will immediately filter out people who just studied CS because they heard you can make good money.

wookmaster 2 hours ago

Completely agree with this. Leetcode has become such a business of memorization for interviews that you can no longer tell whether someone understands a problem or has just memorized the solution.

gedy 2 hours ago

I agree, however there are so many interviewers who will still treat that as some softball criteria and insist that unless you "prepare" for an interview by memorizing leetcode you are 100% a faker and liar.

jadar 38 minutes ago

Maybe they themselves are fakers and liars / deeply insecure. I got bumped out of an interview rather rudely once because I blanked and couldn’t answer a trivia question about arrays.

Gigachad 3 hours ago

Something that is for sure new is the AI interview cheating tools which listen in on the call and provide answers in an overlay invisible to screen sharing. The only way to deal with it would either be invasive spyware on the applicant's computer or asking them to do the interview face to face.

nsvd2 3 hours ago

Spyware wouldn't help at all because you could just put the AI between the computer and the monitor, for example, or use a VM.

josh2600 49 minutes ago

Why is it important that a dev can’t do fizzbuzz without ai?

If they can ship code that matches a spec, why does it matter if they’re using ai or not?

Genuinely curious.

ekidd 7 minutes ago

> If they can ship code that matches a spec, why does it matter if they’re using ai or not?

I am perfectly capable of writing specs, and feeding them to 3 separate copies of Claude Code all by myself. Then I task switch between the tmux windows based on voice messages from the pack of Claudes. This workflow is fine for some things, and deeply awful for others.

Basically, if a developer is just going to take my spec and hand it to Claude Code, then they're providing zero value. I could do that myself, and frequently do.

The actual bottleneck is people who can notice, "The god object is crumbling under the weight of managing 6 separate concerns with insufficient abstraction." Or "Claude has created 5 duplicate frameworks for deploying the app on Docker. We need to simplify this down to 1 or we're in hell." I will happily fight to hire people who can do the latter work. But those people can all solve fizzbuzz in their sleep.

People who just "ship code that matches a spec" without understanding the technical details are providing close to zero value right now.

There is an interesting niche for people with deep knowledge of customer workflows who can prompt Claude Code. These people can't build finished products using Claude. But they can iterate rapidly on designs until they find a hit. Which we can then fix using people with deeper engineering knowledge and taste.

But if you're not bringing either deep customer knowledge or actual engineering knowledge, you're not adding much these days.

IanCal 39 minutes ago

Fizzbuzz is such an incredibly simple problem if you can’t do it I struggle to see how you’d be able to complete any task that requires very basic reasoning and very basic coding knowledge. And if an AI system can do those parts, what am I getting for spending tens of thousands of pounds per year by hiring a person who can’t? Wouldn’t I just tag codex on the tickets?

I’m not talking about gotcha level stuff here where the first time it didn’t compile because of a bracket or anything, or even first time wrong. They couldn’t do Fizzbuzz in a language of their choice, at all.

Those that could were always annoyed at having to do such things because how could someone coming for a contract position not be able to do this? Without seeing what a filter it really was.

tardedmeme 20 minutes ago

For the same reason it's important your mechanic can identify which parts of a car are the wheel.

Who cares as long as the car is fixed, right? As long as the mechanic can Chinese-room his way to a working car, why does it matter how much of it he actually understands?

And why hire the mechanic instead of hiring the Chinese room?

jadar 40 minutes ago

It’s about deeply understanding what you’re doing. Like as a kid before you knew how to ride a bike, you could sit on a bike and pedal, but until it “clicked” you couldn’t balance and keep going forward steadily. Fizzbuzz tests your ability to reason through a problem that seems simple on its face, but is easy to get wrong and/or overthink.

hnthrow0287345 38 minutes ago

It doesn't. It's just a low-end skill filter that got really popular. It could have easily been replaced by other tests like is this word a palindrome.
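For comparison, the palindrome check mentioned here is about as small as FizzBuzz. A minimal sketch, assuming case should be ignored (the function name `is_palindrome` is just illustrative):

```python
def is_palindrome(word: str) -> bool:
    """Return True if the word reads the same forwards and backwards."""
    # casefold() makes the comparison case-insensitive ("Level" counts).
    normalized = word.casefold()
    # A reversed slice is the idiomatic way to reverse a string in Python.
    return normalized == normalized[::-1]
```

Either test probes the same thing: can the candidate turn a one-sentence rule into a loop or comparison without help.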

eastbound 22 minutes ago

And yet, some people argue that you shouldn’t ask a developer to align 3 “if” and 1 “for”!!!

The energy spent arguing that those 4 instructions in a row “are not a mark of someone who can write code” would have better been spent firing them.
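For anyone who hasn't seen it, the usual statement of the exercise is: print 1 to 100, but print "Fizz" for multiples of 3, "Buzz" for multiples of 5, and "FizzBuzz" for multiples of both. A minimal sketch of the "3 if and 1 for" version (the function name `fizzbuzz` is just illustrative):

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz word for n, or n itself as a string."""
    # Check the combined case first; otherwise 15 would stop at "Fizz".
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    # The single "for": walk 1..100 and print one line per number.
    for i in range(1, 101):
        print(fizzbuzz(i))
```

The only real trap is the order of the checks, which is arguably why it works as a filter.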

xfax 41 minutes ago

To understand the code they are shipping requires some level of proficiency. Their inability to do fizzbuzz without AI calls that into question.

andai 18 minutes ago

That's actually the origin of FizzBuzz! A puzzle invented to weed out the perplexing multitude of CS graduates who apparently cannot program.

https://blog.codinghorror.com/why-cant-programmers-program/

Retr0id 4 hours ago

> I had a dev with supposedly 3 years experience and a degree in software who wouldn't have been able to write fizzbuzz without AI.

If you remove the "without AI" at the end, I've been hearing similar anecdotes about fizzbuzz for years (isn't the whole point of fizzbuzz to filter out those candidates?)

Gigachad 4 hours ago

While this is true, it seems undeniable that if you use AI to do everything for you, you will never learn the skills. I'm seeing a massive amount of developers submitting stuff for review and admitting they have no idea how it works and they just generated it.

baxtr 4 hours ago

I wonder if you’re filtering for the right things.

We usually hire for problem solving capabilities and not so much for technical know-how.

That’s at least how I read your comment.

Gigachad 3 hours ago

Ultimately in a software development role you need both technical know how and problem solving capabilities.

This situation in particular was a React role so there is an expectation that when you list React as one of your skills on your resume then you know at least the basics of state, the common hooks, the difference between a reference to a value vs the value itself.

These days you can do a surprising amount with AI without knowing what you are doing, but if you don't have any clue how things work you'll very quickly run into problems you can't prompt away.

gonzalohm 2 hours ago

Isn't writing code solving a problem? If the candidate can't do that, then even if they use AI for coding, how are they going to review the code properly?

andai 20 minutes ago

They were a forcing function for skillz and they no longer are. We need new forcing functions for skillz or we will become WALL-E blobs.

Well, they were ostensibly forcing functions... ten years ago everyone was paying the exchange student to do their homework and assignments for them, and that guy was paying his cousin back in his home country, but the whole thing is a bit more efficient now.

daymanstep 5 hours ago

Wonderful teachers that give unreliable information with total confidence?

entropyneur 5 hours ago

I had human teachers who did that in middle/high school. Took me many years to pick out all the hallucinated bits of "knowledge". I don't think the current models are any less reliable than what we currently have on average.

dguest 4 hours ago

I'll always remember my middle school science teacher telling us that nuclear fusion violates conservation of mass because the 2 protons in a pair of hydrogen nuclei combine to make helium with 4 nucleons. It's not true, but that's not the point.

But he was a great teacher anyway. He was engaging and kept the kids in line and learning. I eventually learned the truth, and most of my classmates forgot about it. Teaching, like flying a plane or driving a train, might become more about keeping watch over a small group of people and ensuring that things don't go off the rails, and that's fine.

3form 4 hours ago

This one feels less sinister than some other things at least to me, personally. You can reasonably doubt that the conservation of mass is violated and find out the truth based on that. But understanding more complex biology or historical context for some things? Granted, many of these things seem to be low stakes, but I'm sure there are some that are not (sex ed comes to mind).

zem 4 hours ago

to be fair, fusion does violate conservation of mass, just not the way the teacher explained it. the loss of mass is where the energy comes from.

mr_mitm 2 hours ago

Yes, there is no law of conservation for mass like there is for energy. Fusion is a good example for why it's not conserved. The teacher was right.

tardedmeme 9 minutes ago

There actually is a law of conservation of mass (it's the same law, because mass is energy) and it only appears violated if you forget about the particles that are zooming away at the speed of light. Of course the mass of a system changes if mass can flow in and out.

dguest 21 minutes ago

He was right that it violates conservation of mass. He was completely wrong that it violated it by adding 2 atomic mass units when hydrogen fuses.

In reality heavier isotopes of hydrogen fuse, conserving the total number of nucleons, but the resulting helium has a lower rest mass than the parent particles. The extra mass is released as energy and the total energy is conserved.

By his logic the system either violated energy conservation (by creating nucleons while releasing energy) or was endothermic (creating nucleons from the surrounding energy).
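A worked check of the numbers, assuming the deuterium-tritium reaction (the comment doesn't name a specific one) and standard atomic masses rounded to six decimals:

```latex
\begin{align*}
{}^{2}\mathrm{H} + {}^{3}\mathrm{H} &\to {}^{4}\mathrm{He} + n,
  \qquad 2 + 3 = 4 + 1 \text{ nucleons} \\
\Delta m &= (2.014102 + 3.016049)\,\mathrm{u} - (4.002603 + 1.008665)\,\mathrm{u}
  = 0.018883\,\mathrm{u} \\
E &= \Delta m\,c^{2} \approx 0.018883 \times 931.494\ \mathrm{MeV}
  \approx 17.6\ \mathrm{MeV}
\end{align*}
```

So the nucleon count is unchanged at 5, while about 0.4% of the rest mass leaves as kinetic energy of the products.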

3form 3 hours ago

Yes, together with mass-energy equivalency it would form a coherent argument, and then also a correct one - but the thing is that if incomplete, it still might sound funky enough to you to research it if you care.

I think it helps that it's a very narrow field to look at, compared to fuzzy and big-picture view of social studies, for example. So much room to be confidently wrong... And sadly I can't think of a solution, LLMs or not.

bernds74 3 hours ago

I had a chemistry teacher who told us that hydrogen reacts violently with oxygen, and this is how the hydrogen bomb works.

dguest 17 minutes ago

Hey it's a bomb made out of hydrogen! Also the deployment system for a thermonuclear bomb might involve that reaction in the rocket engine.

daymanstep 3 hours ago

I had a chemistry teacher who insisted that the fissile isotope of Uranium was U-238 not U-235. I challenged him on this multiple times and he refused to budge on this. I get that it's a simple mistake to make (it seems like U-238 is bigger so intuitively ought to be less stable) but he could have just looked it up and he didn't, I guess he was just so confident about it that he thought there was no way he could have been wrong about it.

oldsecondhand 3 hours ago

That's an American problem though. In most of Europe you need a masters degree to teach highschool and that involves at least an undergrad level of understanding the subjects you will teach.

E.g. in Hungary I had a university CS professor that originally wanted to be a highschool teacher and a highschool physics teacher that originally wanted to be a researcher. Their choice of degree didn't determine which outcome they got. The researcher and teacher curriculum had an 80%+ overlap.

Bawoosette 5 hours ago

To be fair, that was much of my actual experience with human professors in university.

renticulous 4 hours ago

Veritasium proved that in a difficult challenge.

A Physics Prof Bet Me $10,000 I'm Wrong

https://www.youtube.com/watch?v=yCsgoLc_fzI

IshKebab 5 hours ago

Yeah one of my teachers was able to identify which high school I had come from due to something I had been mistaught.

Levitz 4 hours ago

Off the top of my head: DOMS being little crystals in muscles, tongue having separate areas for each type of taste, food pyramid, blue blood in the veins, the appendix being useless, body temperature doesn't change regardless of whether it's exposed to cold or to heat, and a whole lot of stuff related to politics and history I'd rather just omit (I don't live in the US).

All things I learned in school which were wrong information.

Not to mention, the current state of education is far worse. I don't think most realize how low the bar is.

akdev1l 2 hours ago

My biology teacher in school once tried to teach us that winds are created by God. Not like spiritually or something but that God literally made the wind I guess.

My “earth sciences” teacher also once tried to argue with me against the universal law of gravitation. (No, she was not referring to Special/General Relativity. She didn’t agree that two objects in a vacuum fall at the same speed regardless of mass.)

autoexec 4 hours ago

They'll also encourage and praise you even when you're heading down the wrong path until you think you've uncovered the secret of the universe or proven that established science was wrong this whole time when really you've just been bullshitting with an engagement bot.

k__ 5 hours ago

Anti-intellectualism is at it again, huh?

victorbjorklund 5 hours ago

Like humans.

CoastalCoder 3 hours ago

I think we should go a little deeper on this idea.

We can all agree that both human "experts" and LLMs can sometimes be right, and sometimes be confidently wrong.

But that doesn't imply that they're equally fit for purpose. It just means that we can't use that simple shortcut to conclude that one is inferior to the other.

So where do we go from here?

oofbey 3 hours ago

I’ve always thought of the definition of “expert” as reliably knowing the difference between what is known, what is speculated but unproven, and what is unknown. People claim expertise in all sorts of things that they aren’t experts in. But true experts should not be wrong. They should qualify levels of certainty. This definition certainly works in the sciences.

p-e-w 5 hours ago

The amount of bullshit and blatant lies I’ve heard from my human teachers dwarfs the hallucinations produced by today’s LLMs.

mold_aid 5 hours ago

>LLMs can be wonderful teachers

Are they or aren't they

tardedmeme 3 minutes ago

As usual it depends. When it does well it's because it can do well. When it does poorly it's because you're prompting it wrong.

p-e-w 5 hours ago

A million times better than any human teacher I’ve ever had, for sure.

Now I’m certain that there exist those mythical human instructors who can do better, but that’s not worth much if 99.99% of people don’t have access to them. Just like a good human physician who takes their time with the patient is better than an LLM, but that’s not worth much either given that this doesn’t match most people’s experience with their own physicians.

vladms 5 hours ago

Did an LLM teach you a topic you did not feel like learning?

For me the best human teachers were the ones that managed to make me interested in topics that I thought were boring/useless (many times my opinion being stupid, mostly due to lack of experience).

So far with LLMs I learn about things I know something about (at least that they exist) and am interested in, which is a small subset of the things one should learn during a lifetime.

jimnotgym 4 hours ago

Well I have some evidence to support your hypothesis. During Covid my kids were at home, eventually with some kind of self learning website from school. I was upstairs working, checking in with progress on the parents app. Finish your daily school work and then you can game.

The kids learnt all about Team Fortress 2, Roblox, Rainbow Six etc. They also learnt how to game the learning system so it looked like they were doing their work.

throwaway132448 4 hours ago

Good point well made.

qsera 2 hours ago

>A million times better than any human teacher I’ve ever had, for sure.

Not really, not if you want to ask it deep questions. It won't have an answer that is deeper than something that you can find online, and if pressed it will just keep circling around the same response.

The reason is that this "thing" was never curious, never asked questions, and never really learned anything. It just has learned the Internet "by heart", and is as boring as a human teacher who is not really curious about the subject they are teaching, and has just got some degree by "by hearting" some text book. Of course it does it much better than a human, but it is fundamentally the same thing.

mold_aid 2 hours ago

>Now I’m certain that there exist those mythical human instructors who can do better,

You're certain that mythical instructors exist (?) who "can" do better?

Are human instructors more competent as teachers than AI teachers, or are AI teachers more competent as teachers than human teachers? No "this or that can happen," just a definitive statement please.

AI is likely a million times better student than my dimwit cybersec meatbags...er, majors, for sure, as well! Don't have a reliable way to measure or experience why/how, tho, so I'm not out here claiming it. Even if I did, why would I argue for their replacement?

IanCal 5 hours ago

They can be incredible. One on one teaching with an infinitely patient teacher who can generate interactive problems on the fly, for dollars a month? Wild. A year of paid ChatGPT would pay for about 9 hours of cheap tutoring here.

rockskon 4 hours ago

That's not going to work out the way you think it will when a student won't even know how to ask questions.

repelsteeltje 4 hours ago

I found this interview [0] on the subject of AI in CS education on the Oxide & Friends podcast very illuminating. Of course, Brown University CS != All education, but interesting angle nevertheless.

[0] Episode webpage: https://share.transistor.fm/s/31855e83

otabdeveloper4 49 minutes ago

The best frontier LLMs can't solve 4th grade math homework yet. Don't hold your breath on that collapse of education.

(Real mathematics problems, not American-style ""math"".)

pjc50 5 hours ago

"Education is just a CTF for the valuable flag of a credential. In this essay I will --"

magic_hamster 4 hours ago

Education is also figured out. You just need to learn, do and practice for yourself. Telling the agent "to just do it for you" is tempting, but it's not learning. You need to be deliberate when you're trying to actually learn and internalize.

Also, you could spin up your own educational agent with very strict instructions on guiding the user instead of just doing the work. Of course you can always go around it but if you're making an effort to learn, this is a good middle ground.

tardedmeme 37 minutes ago

When I did my first CTF, it was close to the deadline and I thought I had extracted the flag from the program and the rest of the program was just filler, so I entered the flag, and it told me it was not the flag. It turns out the program multiplies the input by a pseudorandom matrix before comparing it against the flag, so I had to implement a matrix inversion and then get the flag. That's not the story though.

The matrix was always the same and the challenge was clearly designed so that the point was being able to read anything at all, not knowing how to invert a matrix, so I asked the creator what was up.

He told me that there were tools that would trace input values until they reached a comparison instruction, then print what they were compared against. Therefore it was necessary for every deobfuscation challenge to scramble the input in some way too complex for these tools to undo, before comparing it. Hence the multiplication by a pseudorandom matrix.

The point is, cheating tools aren't new.
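To make the scrambling trick concrete, here is a minimal sketch of the idea as described above, not the actual challenge. Everything in it is an assumption for illustration: the flag `CTF{demo}`, the seed, the helper names, and the choice of a diagonally dominant matrix (used only because it is guaranteed to be invertible).

```python
import random
from fractions import Fraction


def make_matrix(size: int, seed: int) -> list[list[int]]:
    """Build a fixed pseudorandom integer matrix (hypothetical stand-in)."""
    rng = random.Random(seed)
    matrix = [[rng.randint(1, 9) for _ in range(size)] for _ in range(size)]
    for i in range(size):
        # A strictly dominant diagonal guarantees the matrix is invertible.
        matrix[i][i] = sum(matrix[i]) + 1
    return matrix


def scramble(matrix: list[list[int]], values: list[int]) -> list[int]:
    """Matrix-vector product: what the checker does to the input."""
    return [sum(a * v for a, v in zip(row, values)) for row in matrix]


def unscramble(matrix: list[list[int]], target: list[int]) -> list[int]:
    """Solve matrix @ x = target exactly with Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(t)]
           for row, t in zip(matrix, target)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [int(row[n]) for row in aug]


FLAG = "CTF{demo}"  # hypothetical flag
MATRIX = make_matrix(len(FLAG), seed=1337)
# Only the scrambled values would be visible in the binary, never FLAG itself.
EXPECTED = scramble(MATRIX, [ord(c) for c in FLAG])


def check(candidate: str) -> bool:
    """The checker compares scrambled input against the scrambled flag."""
    if len(candidate) != len(FLAG):
        return False
    return scramble(MATRIX, [ord(c) for c in candidate]) == EXPECTED


# The solver's side: read MATRIX and EXPECTED out of the binary, then invert.
recovered = "".join(chr(v) for v in unscramble(MATRIX, EXPECTED))
```

A tool that only traces input bytes to a compare instruction sees the values in `EXPECTED`, which are useless until the multiplication is undone. That matches the author's stated reason for adding the step.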

chrismorgan 5 hours ago

Meta: this was submitted with the article’s title “The CTF scene is dead” which I found very easy to understand. It has just been updated to use the subtitle’s first sentence, “Frontier AI has broken the open CTF format”. I find that much harder to grasp, rather like a garden-path sentence. My immediate thoughts were that “Frontier” was a company name, and that there was some file format named CTF. If you don’t know about Capture The Flag contests, the change doesn’t help. If you do, I think the change makes it worse.

IanCal 5 hours ago

If it helps, I understand the second much better; it feels less clickbaity and includes more info. I do agree with the points you made about the confusion, although I find frontier is a term used in this area a lot; “frontier AI models have” would probably resolve that.

Jenk 4 hours ago

If the title simply said "AI is out-performing humans at CTF" then none of this confusion exists. Nothing is "broken," we don't need to be superfluous with "frontier," and the point is still there.

IanCal 4 hours ago

But the article is arguing it is broken. That’s the point. You can disagree but that’s very much what the author is writing about, not a curiosity, and that it’s these top models that are not custom security models.

jofzar 5 hours ago

Imo frontier is too niche and specific, if you know what a frontier model means then it's fine, but if you don't then it's negative/detrimental to the title.

"new" does the same thing and is probably just a better descriptor than frontier

jack_pp 4 hours ago

if you are on HN and have no idea what "frontier model" would mean maybe it's time you found out.

hbbio 4 hours ago

I also misread the updated title.

"Frontier models break the open CTF format" is good

"Frontier AI..." means wtf is Frontier AI.

Because of course it exists (just googled it): https://frontierai.company/

rockskon 4 hours ago

But then you're not acting as a billboard promoting AI. Isn't that partly the point?

jsoaoxhd 3 hours ago

Why do people always hijack threads to discuss titles? Most articles have terrible titles. Just downvote it and move on.

KomoD 10 minutes ago

You can't downvote a submission.

dandellion 2 hours ago

Why do you contribute to making this thread longer? Just downvote and move on.

skinfaxi 58 minutes ago

They can't downvote.

himata4113 6 hours ago

I was writing an obfuscator recently, I just had the model deobfuscate and optimize the code back to original and I kept improving the obfuscator until it couldn't. The funny thing is that after all this I also ended up with a really strong deobfuscator and optimizer which is probably more capable than most commercial tools.

The solution is just to make CTFs harder, but when do CTFs become too hard? Maybe the problem is that 'hard' CTFs are fundamentally too 'simple' where it's just a logic chain and an exhaustive bruteforce towards a solution since there really are limited ways to express a solution in plain sight.

Or maybe human creativity has been exhausted and we're not so limitless as we thought. Only time will tell.

I had another idea spring to mind: we could hide two flags, one that could only be found by ai agents and not humans or tools written by humans.

Trung0246 3 hours ago

Interesting, I recently did basically the same thing: I tried to push the limits of my js obfuscator as far as possible by repeatedly forcing gpt/claude to deobfuscate the final output, then having gpt improve the tool to break the deobfuscator.

Do you publish it somewhere? Here's a sample of my js obfuscator output: https://gist.github.com/Trung0246/c8f30f1b3bb6a9f57b0d9be94d...

koolala 5 hours ago

A portion could require astral projection and computers can't do that. Or maybe just a VR mini-game like the 90s always imagined.

himata4113 5 hours ago

bringing CTF solutions into the real world is a really good idea! I didn't even think of this until you mentioned it.

we have very powerful simulation tools so something like "project a pattern at these angles" wouldn't really work as you could simulate that.

I guess something cool is that we can make simulating the solution very expensive, but in real world it would be free since it's analog... As long as simulations take longer than it takes for a human to find a solution it would be a pretty good way to deal with it. I am sure people smarter than me can come up with something.

Maybe I was too early to dismiss human creativity.

dguest 4 hours ago

Maybe CTF is dead, but there are plenty of fun problems in the real world -- ask any scientist, engineer, or medical researcher.

There are a million places where a computer can interact with a non-digital system in a loop.

- Tune an FPGA, or a whole data-center, or just a physical computer.

- Make a drone fly somewhere.

- Design a selective toxin (or anti-toxin).

Or, you know, get more people to click on adds. All totally possible to automate.

koolala 3 hours ago

Using real-life calculators to add? Calculate the Flag. I don't think it is dead at all. It's like mixing in board game / escape room / science / engineering / medical research elements.

simonTrace 8 minutes ago

AI-generated phishing is the scariest development in cybersecurity right now. Click rates on AI-written phishing emails are 54% compared to 12% for traditional attacks. Automated real-time detection is the only scalable answer at this point

lachiflippi 44 minutes ago

The "CTF for fun" aspect has been dead ever since the winning teams had thousands of dollars of rewards waiting for them. Of course people are going to use anything that's not explicitly forbidden by the rules to win. Introducing what amounts to an "I win" button that both can't be prevented by rules and is accessible to anyone didn't "break the format" any more than the epidemic of giant merger teams did a couple years ago, it just broke the community because you now don't have to actually talk to other people to cheat anymore.

Many CTFs have switched to a dual-leaderboard format recently, one for "agentic teams," one for the rest. If all you care about is "learning" and imaginary internet points, you can just participate as a human team and adblock the AI scoreboard, and maybe lobby CTFTime into splitting their rankings as well.

hoyd 5 hours ago

«That feedback loop is breaking. If the visible scoreboard is dominated by teams using AI, a beginner is pushed toward using AI before they have built the instincts the AI is replacing. That is an anti-pattern. It prevents active learning, and active struggle is the bit that actually teaches you. It is also completely demotivating to put in real effort and see no visible progress because the ladder above you has been automated.»

This stands out to me, and speaks perhaps broader than the article itself? I’m sure this has been in the spotlight before, but well put for many areas I think.

black_knight 4 hours ago

I see this with beginner programming students at university. They get AI to help them with assignments, with the intention of learning, but ultimately they do not get the understanding they would have if they had done the assignment themselves. Then they are at a deficit for learning more advanced topics.

My fear is that they never get to the level they need to be at to create good software even with the help of AI. So, although an expert with AI can create great software, that is not where we end up. Instead we will have vibe coded messes by people who barely have any grasp of what is going on.

SirHumphrey 5 hours ago

The competitive programming scene has always included offline competitions, and with AI they are becoming more important (and in general they were fairer even before). If CTFs are to survive, they should probably try to adopt this strategy.

You could even go so far as to say that anything loaded on your computer is fair game, but not more than that (certain competitive programming competitions, for example, allow an unlimited amount of paper material; for CTFs you probably need much more than that, therefore electronic).

tromp 6 hours ago

https://en.wikipedia.org/wiki/Capture_the_flag_(cybersecurit...

still has no mention of AI, but that will likely change as they increasingly dominate competition.

rurban 6 hours ago

I don't do CTFs, but I took part in a security workshop for fun ~2 years ago with only my Android phone. I was first on the first simple challenge, but then couldn't continue because my phone was just too limited. But I watched what the others did. And a young Indian guy did everything with ChatGPT then. I found it silly, but amusing, because he actually got second. There was no Codex nor Claude then. Nowadays it must be dead for real, because I would solve everything with my agents, as I do in the real world.

susam 6 hours ago

I have normally found any sort of timed technical competition intimidating. Even so, about 6 or 7 years ago, after being persuaded by a colleague, I participated in a few CTFs. I am glad I did, back when this type of thing still meant something. I have kept a screenshot from one of the CTFs that I am quite fond of: https://susam.net/files/blog/ctf-2019.png

parasti 4 hours ago

I can't help but draw parallels with video games. Aimbots in competitive multiplayer games are a well defined issue: it's considered cheating and frowned upon, players caught cheating are banned from the game. Tool-assisted speedruns (TAS) where a player attempts a world record at completion in a single-player game is another face of the same concept (computers help you win), but one that is socially accepted as long as runs are clearly labelled as TAS.

ViscountPenguin 4 hours ago

The biggest difference would be the fact that you can discover video game cheating through some kind of trace. Speed running communities go pretty hardcore on that kind of thing nowadays.

It's a lot harder to detect cheating when your only trace is how fast someone submitted the string CTF{DUck1e_Pwned}

justanotherjoe 4 hours ago

Sure if the goal is entertainment and sports, you're right. However, unlike chess or counter strike it's downstream from a real needed utility. Like, is there a point to do it anymore? (ofc there is, but still, it's been devalued from the perspective of the 'real utility')

nrabulinski 2 hours ago

It’s literally not. The most interesting and satisfying CTFs have never been grounded in reality, it’s just been an expression of mastery, both from players and authors, with a few notable exceptions. But they’re that, exceptions, not the rule.

bornfreddy 2 hours ago

I guess this is very similar to what happened to the demo scene, in some way. The limits are what makes these problems interesting, and once we have better machines / tools, the incredible skill is no longer a prerequisite, making everything less interesting for participants. Sad, but - such is life...

amingilani 6 hours ago

I don’t think CTFs are dead, they’ll just evolve. The difficulty level will need to be increased or the rules locked down. Just like sports and racing persist despite the existence of performance enhancing drugs and rocket technology.

I just did a CTF where I was in the top 10. It was the first CTF I completed and I used AI because the rules permitted it. That said, I couldn’t solve all challenges.

But yes, it was significantly easier now than when I last attempted one. Even manually solving with AI-assisted assembly interpretation was much easier.

mort96 6 hours ago

Increasing the difficulty level is a terrible solution. The problem with CTFs isn't that they're too easy. Making them harder just makes them even less accessible to people who don't cheat. It'd be like seeing people who put hidden electric motors in their bikes during the Tour de France and concluding, "oh we just need longer distances and steeper hills".

StrauXX 3 hours ago

LLMs don't tend to help much when solving challenges beyond their skill level. Either they one-shot a challenge, or they are almost useless as a companion for it.

Retr0id 4 hours ago

That doesn't work. The thing that made CTFs fun is the fact that the challenges are solvable in a short-ish timeframe, usually a day at most, if you have the requisite skills and talent.

yk 2 hours ago

There's something funny about complaining about cheating in a hacking competition.

Well, actually I get it. In cycling, motor doping (putting a hidden engine into the bike) seems more offensive than regular doping. I think this is because there is a continuum from eating well to taking supplements to injecting stuff, but having an engine breaks a fundamental idea about cycling. Similarly, hacking is about cleverly abusing the rules.

raphman 6 hours ago

Interesting and well written article that mirrors/foreshadows how LLMs do and will change other scenes.

As I don't know much about the CTF scene, I looked for other takes on this topic.

Here's an article from 2015 about how tool-assistance already changed CTFs:

> Individual skill will undoubtedly be a factor next year. But, I'm left wondering whether next year's DEFCON CTF will tell us anything more than how well-developed each team's tools are (and how well they can interpret the results).

https://fuzyll.com/2015/ctf-is-dead-long-live-ctf/

But there are quite a few recent (2026) articles with the same core message as in the original article, e.g., https://blog.includesecurity.com/2026/04/ctfs-in-the-ai-era/ or https://k3ng.xyz/blog/ctf-is-dead

And here's someone explaining how Claude Max allowed them to win CTFs:

> I had always been interested in CTF as one of the only ways people could compete and show off their skill in coding/problem solving on a global scale. It was just too difficult and didn't make sense for me to learn the fundamentals as an electrical engineer. As time went on, I got better and better, and it was hard to tell whether it was because of experience or if it was because of improvements in AI.

> I accomplished my goals, and for that reason I'm quitting CTF, at least for now. [...] I'd like to think I highlighted the problem before it became a bigger issue. So, how do we fix this? Teams and challenge authors losing motivation is not good. CTF dying is not good. AI bad. Or is it?

https://blog.krauq.com/post/ctf-is-dying-because-of-ai

The only article that saw LLMs as a non-negative force for CTFs was this one. Fittingly, it sounds like LLM output ("Let's be honest", "This is where things get interesting.") and only contains hallucinated references.

https://caverav.cl/posts/ctfs-not-dead/ctfs-not-dead/

kevinsimper 6 hours ago

You could make it offline and with provided laptops only, just like with the competitive CS2 scene.

sheept 5 hours ago

Offline CTFs could also incorporate physical security challenges, like lockpicking

tylerchilds 5 hours ago

I do like the idea of escape the room games becoming the cybersecurity employable competition meta

Retr0id 4 hours ago

They often do

hsbauauvhabzb 5 hours ago

CTFs need preparation and unconstrained internet; even if you block domains, it’s possible to tunnel out

Retr0id 4 hours ago

Unconstrained internet is nice, but I don't think it's a hard requirement. Just tricky to enforce, even in-person.

StrauXX 3 hours ago

It is a hard requirement. Once you reach higher levels of challenges you spend most of your time reading through RFCs, web specs, GitHub issues, mailing lists, papers, random bugtrackers and library/framework code. There is no way to create a whitelist for that. Besides, a firewall won't stop good hackers.

Retr0id 3 hours ago

Normal CTF workflows can involve a lot of research but that's not the point. You can design self-contained challenges with offline solving in mind, and bundle any truly necessary docs/src/etc. with the challenge download.

sheept 5 hours ago

Presumably if you block domains, you wouldn't be able to use AI to find a way around the block. So doing so demonstrates at least some human skill

hsbauauvhabzb 5 hours ago

Or forethought, I’m sure you could ask an AI how to circumvent any blocks.

belabartok39 5 hours ago

Use jumpbox to access CTF. Disable all wireless for the playing hall.

hsbauauvhabzb 5 hours ago

I think you’re forgetting hotspots, or laptops with built-in 4G/5G

swiftcoder 3 hours ago

Faraday cages exist. Finally a use for all those damn SCIFs tech companies were building in the late 2010's...

eastbound 5 hours ago

Since real-life situations involve AI, banning AI would make CTFs just a simple game, not a demonstration of capabilities and talent.

mort96 5 hours ago

What do you mean? Solving a CTF challenge demonstrates way more capabilities and talent than just asking a chat bot to solve a CTF challenge.

loeg 5 hours ago

They always were just a game?

xiphias2 4 hours ago

"a beginner is pushed toward using AI before they have built the instincts the AI is replacing. That is an anti-pattern."

The same article talks about CTF skills as a way to learn about security best practices and separately a sport.

In reality it was all about learning an extremely important skillset (securing/attacking software and systems) that is getting automated.

The real thing the author seems to be frustrated about is that AGI is coming to computationally verifiable domains first, and a large part of his skillset has been taken over.

copx 4 hours ago

>If adaptation means accepting that the scoreboard is now an AI orchestration benchmark, then we should say that honestly instead of pretending the old competition still exists.

This is like someone complaining that making machine parts has been ruined: Skillful craftsmen used to make them by hand using manual tools!

Nowadays the CAD/CAM/CNC cheaters have almost completely automated the whole thing. How is the next generation of craftsmen going to learn how to craft a gear by hand when the process of gear making has been reduced to pressing start on a CNC machine?!

See what I mean? Sorry, I think this article is just Luddite. I can empathize with the pain of your beloved craft basically being rendered obsolete by new technology, but the process can neither be stopped nor is it bad in general.

The manual skills you trained with CTF puzzles are now simply no longer relevant. (Field-specific) "AI orchestration" is the new cyber security skill if LLMs really have become so good at this, and what the author used to do manually then has the same value as being able to craft a gear by hand.

raddan 4 hours ago

The way I read the post is that the author is disappointed that the community is gone. The CTF was just a reason for a number of like-minded people to organize around an activity.

Indeed, in the real world, plenty of people organize to do formerly-skillful tasks together. I have not personally crafted a gear by hand, but I have built a house in a long-abandoned style with a group of people only using hand tools.

There _is_ a danger that society forgets how to do these things. During that house-building exercise, there were many tricks of the trade that, while likely documented somewhere in a book, would have been difficult to reproduce without seeing a demonstration. From the standpoint of “does it matter?” it depends on what you care about. We absolutely do not need cruck-framed houses with scribed joints. Modern construction is faster and cheaper and lasts long enough. But it would sadden me greatly if practices like this faded from memory, because it’s one of those things that makes you gasp “wow!” when you see it. And your appreciation only deepens when you try it yourself.

lokrian 5 hours ago

Is AI also superior to humans at black box challenges and attacking actual targets on the internet? That seems like a really important question.

Avamander 4 hours ago

No, the search space is much more vast and the feedback loop almost nonexistent.

The reason LLMs can do CTFs so well is partially because the challenges are usually designed to avoid wasting time and to introduce a single concept without noise.

spacedcowboy 4 hours ago

The first paragraph on anything with an acronym in it should explain the bloody acronym. I assumed CTF was an encryption standard, given the headline. It was only coming here and reading the comments that made me realise it's a game-format ("Capture The Flag").

jaffa2 4 hours ago

Capture the flag is the only expansion of CTF that I know, but even if it is capture the flag, this still doesn't make any sense. Like Quake CTF?

arm 4 hours ago

https://en.wikipedia.org/wiki/Capture_the_flag_(cybersecurit...

motbus3 5 hours ago

I think soon there will be ways to trick these models, and I think when that happens it will be yet another layer, like ASLR.

These models seem completely unbeatable only in the ads. There are 100+ cases where someone puts Hindi Yoda talk in Morse code and it goes nuts. The reason they are going so hard on PR and marketing for this is because they know it is a matter of time.

Avamander 4 hours ago

The more you obfuscate a topic against LLMs the lower the educational value of a challenge.

The only things that work are novelty and obscurity. LLMs still suck with things mentioned in the footnotes of datasheets and manuals, things that deviate in subtle ways, unique constructions that alter something very very common. It's hard for LLMs to avoid common pitfalls in terms of making assumptions, while staying on track.

jimnotgym 4 hours ago

You can still do competitions. But you'll all need to fly to the same place and work on laptops with a fresh install of Linux. 1 hour to install tooling then Internet off, challenge revealed.

Not as easy logistically...

SoylentOrange 5 hours ago

Great article, well written, and good analogy to chess. I’ve been playing competitive chess most of my adult life and I think that the solution lies in how chess dealt with this problem:

Explicit ELO measurements with some cheating detection. AI assistance wholly banned. As you climb the ELO ladder, detection gets more onerous. At top level during online events, anti cheating teams require the use of both monitoring software and multiple cameras.

Idea is that you can cheat pretty easily at the lowest levels but it gets less easy the higher you go. This allows for better feeding into the truly elite competitions.

I think chess’s very firm stance that AI is never allowed in competition (neither online nor in person), rather than CTF’s acceptance, was the right call.
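For readers who have not seen it, the rating update behind those Elo numbers is small enough to sketch. This is the textbook formula as used in chess, not anything a CTF platform is known to run; the K-factor of 32 and the function names are assumptions, and how a CTF would define a single "game" between two teams is left open.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating_a: float, rating_b: float, score_a: float,
                  k: float = 32.0) -> float:
    """Return A's new rating after one game.

    score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor; 32 is only an illustrative assumption.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))
```

The point relevant to the comment above is that a rating only moves by the gap between the actual and the expected result, so a sudden jump by an unknown player stands out and can be used to decide who gets the heavier monitoring.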

salt4034 2 hours ago

Yes, chess has been dealing with AI for decades at this point, and it's amusing/frustrating that so many other communities are deciding to re-discover everything from scratch, rather than just learn from the chess experience.

If CTF is a player-vs-player event, then AI should just be banned outright, otherwise it will devolve into AI-vs-AI, which is just not an interesting competition format, as we learned in chess. Compared to FIDE top events (which ban AI), only a tiny niche audience actually watches the Top Chess Engine Championship (AI-centered). It turns out what we care about is not whether chess can be solved by any means available, but what the limits of the human mind are in learning chess.

Pretty much all chess coaches/educators also warn against relying heavily on AI during learning; engines only give you an illusion of understanding.

TrackerFF 4 hours ago

Question: Was this website made with Claude?

I've seen that exact font and color scheme a dozen times in the past few weeks.

vagab0nd 5 hours ago

This left a strange feeling. The article reads as extremely bleak. But from a different perspective this is extremely bullish for AI.

Avamander 4 hours ago

LLMs managing the "coloring book" equivalent of something is not bullish for the "art" version of something.

The intent for most CTFs is to provide a meaningful challenge that concerns a single topic without introducing noise that wastes time. Of course a training exercise is easier to complete for an LLM.

saidnooneever 4 hours ago

Do CTFs like LAN parties, or factor in the new tooling available to people. Change is not death. Or death is not an end. Either way, people will enjoy applying and showing off their skill, competing with each other on a human level, with or without AI tools.

dostick 2 hours ago

Unable to find what “CTF” means, since it doesn't look like it refers to Capture The Flag gaming

yc-kraln 2 hours ago

It does--but a particular form of Capture The Flag where there is a computer system and the "capturing" is breaking in or exploiting a security issue in that system.

r4indeer 5 hours ago

I'm conflicted on the use of AI in CTFs. On the one hand, they are supposed to mirror real-life scenarios, so of course you should be able to use any tool that would be available to you in real life.

On the other hand, CTFs are fundamentally a game and a competition which are supposed to be fun and to compare and improve one's skill. So when I let an LLM generate the entire solution for me, what's the point anymore? I did not learn anything. I did not work for that place on the leaderboard, I just copied the solution. And worst of all, I did not have any fun. It's boring.

So how does using AI as a solver not feel like cheating?

virtualritz 5 hours ago

Chess and Go are not dead just because AI got better than humans at these games.

What am I missing here?

jofzar 4 hours ago

These have very strong anti-cheat measures, and in-person play is very strict about no electronics.

It's not really a good comparison.

hnlmorg 5 hours ago

You aren’t allowed to use tools to play competitive Chess / Go, but they are required for solving CTFs.

lugu 3 hours ago

Read the article.

virtualritz 2 hours ago

I read the article. Their chess section makes no sense as in "why this wouldn't work for CTF".

But I don't know enough that's why I asked.

I imagine one could do CTF in public, machines you work on vetted/prepared to some spec, yada yada.

If chess and Go can do it why can't CTF?

That was my question when I wrote "what am I missing here".

artninja1988 3 hours ago

https://news.ycombinator.com/newsguidelines.html

"Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that"."

eecc 6 hours ago

“solve”, why not solution? Like “spend” and not expenditure, why use the verb as a noun and not care about grammar?

sheept 5 hours ago

These examples that you're calling "verbs as a noun" are standard grammar. You can't just invent simplified rules about a language and declare it wrong when the rules fall apart.

iainmerrick 6 hours ago

They’re shorter.

Why so pedantic?

tkel 52 minutes ago

Pretty ironic that this article was also written using LLMs. It has all the LLM-isms.

Grimburger 6 hours ago

Very impressed that OP has gone from starting university in 2021 to becoming a Senior Security Engineer.

It's an incredibly exciting time in security research in my humble old man opinion.

Think the cadence of new exploits is perhaps a good measure of that rather than subjective thoughts by anyone regardless of experience.

chvid 6 hours ago

What is CTF? And why is the cyber security world filled with silly gaming references?

mort96 6 hours ago

Capture The Flag is a cybersecurity game where the organizers set up a bunch of intentionally vulnerable computer systems with a "flag" on them, a string that's "supposed to be" secret but is accessible through exploiting the vulnerabilities. This may be a line in /etc/passwd, a string in memory, a field in a database, whatever. The goal of the game is to hack into the computer systems, find ("capture") the flag, then copy/paste it into the organiser's scoreboard website to prove that you solved that particular challenge.

It's pretty fun. Or at least it was, back when you had some sense that your competitors were competing on an even playing field and just beat you because they were better than you.

I wouldn't say the name is a "gaming reference", it's just a descriptive name for a game.
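To make the "copy/paste it into the scoreboard" step concrete, here is a minimal sketch of what the check on the organisers' side might look like. Everything in it is an assumption for illustration: the `CTF{...}` pattern, the `FLAGS` table and the `check_submission` name are hypothetical and not taken from any real scoreboard software.

```python
import hmac
import re

# Hypothetical flag format, modelled on the CTF{...} example in this thread.
FLAG_PATTERN = re.compile(r"CTF\{[A-Za-z0-9_]+\}")

# Hypothetical challenge table; a real scoreboard would keep this in a database.
FLAGS = {"duckie": "CTF{DUck1e_Pwned}"}


def check_submission(challenge: str, submitted: str) -> bool:
    """Return True if the submitted string is the flag for the challenge."""
    submitted = submitted.strip()
    expected = FLAGS.get(challenge)
    if expected is None or not FLAG_PATTERN.fullmatch(submitted):
        return False
    # Constant-time comparison so timing does not leak how much of the flag matched.
    return hmac.compare_digest(submitted.encode(), expected.encode())
```

Note that the only input is the final string. Nothing about how the flag was obtained reaches the scoreboard, which is why a solve by hand and a solve by an LLM look identical from the organisers' side.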

throwa356262 6 hours ago

https://en.wikipedia.org/wiki/Capture_the_flag_(cybersecurit...

It's a war game reference I guess?

Gathering6678 3 hours ago

I thought a company called Frontier broke a file format CTF.

slurpyb 4 hours ago

How to motivate cybersec best outcome reddit 2026 no mythos

vasco 6 hours ago

My first ever was Stripe CTF in 2012 I think, I still wear the shirt I got (now super faded) from passing some challenges. I was a student in Portugal and remember receiving the shirt for it and thinking, maybe those Americans aren't any better than me and I can compete at the same level.

I never got super into security but it gave me the confidence to play in the same field and lose the stupid idea I had that somehow "rich Americans" would be better than me at everything because they had better universities or because of Hollywood or something.

Sad that another cool thing is lost to AI but I guess kids will learn in other ways.

JackSlateur 4 hours ago

No relationship with the CTF (Common Trace Format) format ..

monarx 6 hours ago

used to see some really good CTF videos show up on youtube and now nothing like that shows up on the feed

walletdrainer 6 hours ago

>I started playing CTFs in 2021

>and the old game is not coming back

For many people the CTF scene was already dead in 2021 because it had turned into something unrecognisable.

In reality it’s just different.

lukan 6 hours ago

Well, I had to google what CTF means (capture the flag, a hacking competition), so surely cannot judge here, but the text indicates that with AI some things are very different today:

"That makes open CTFs pay-to-win. The more tokens you can throw at a competition, the faster you can burn down the board. Specialised cybersecurity models like alias1 by Alias Robotics are becoming less relevant compared to general frontier LLMs. The competition is turning into "who can afford to run enough agents, with enough context, for long enough.""

walletdrainer 5 hours ago

There are two different schools of thought:

1) It’s OK to do just about anything to win a CTF, including installing malware on the organisers’ computers months before the actual event so you’ll have an easy time stealing the flags.

2) It’s not ok to try and win the CTF with a solution the authors did not intend.

Recently the #2 crowd has been winning because the hacking scene has turned corporate and boring. People started to partake in CTFs in the hopes of landing a job(!)

CTFs are indeed ruined for those people, I personally don’t mind.

For the people in group #1 LLMs change little. Attacking the challenges directly was always a last resort.

mock-possum 6 hours ago

Isn’t that the bitter lesson in a nutshell? “Specialised cybersecurity models … are becoming less relevant compared to general frontier LLMs.”

Retr0id 4 hours ago

I started playing in 2015 or so and had mostly stopped by 2020. Not because I felt it was "dead" exactly but it just wasn't hitting the same for me. By then it wasn't "the winner has the most LLMs", but "the winner has the most members on their team". I merged into one of the mega-teams and it just wasn't fun any more.

Grimburger 6 hours ago

>Learning about eternal September in May 2026

Hits different doesn't it

petterroea 4 hours ago

I helped arrange my country's longest living CTF this year. Our CTF is *made for amateurs*, but we always have challenges for intermediate to skilled players and the top of the scoreboard is usually occupied by them. It is the compromise we have - amateurs get so many tasks they struggle to solve them all, and the pros get to win. Our goal is to nerdsnipe people who are curious into trying our CTF by offering easy beginner tasks, and then get them hooked enough to stick around for the intermediate ones, even if it takes them a day to solve one.

This year, multiple groups at the top of the leaderboard were clearly abusing LLMs. You could tell because, when we talked to them, they knew nothing about what a CTF is, the terminology, or really the fields the challenges were about. They were obviously amateurs.

It was pretty depressing to hear how unaware they were of how obviously they did not fit the type that usually is at the top of the leaderboard. It seems they seriously thought they were under the radar. If it was one group it could be a freak incident - sometimes someone just shows up and curbstomps the competition. But there were many groups like this this year. They also had a certain smugness about it - one staff member reported that a group was hinting to other teams about their "super weapon". Another group credited their "secret third team member they didn't want to talk about".

I use LLMs frequently and experiment with them a lot, both at work and in my free time. Nowadays they are good enough to have value and I am interested in learning more about that. They let me spend more time on hard problems and avoid spending the day on simple CRUD. I say this to make clear that LLMs don't have to equal bad; they are a tool, that's all. However, I generally avoid LLM communities because many LLM fans are lazy and unskilled people who are just happy they can feel they are worth something even if they have no skill. They don't really have much to contribute to a conversation. If anything, from reading the CTF crowd this year, the rise of LLMs has just meant more of these people can stomp on and harvest the CTF scene for self-validation.

This is not me trying to gatekeep who can play CTF. Anyone is welcome, but there is one condition: You are here to learn and have fun.

The conclusion many people I talk to have come to is that nowadays, it is harder to learn to put in hard work and become good at something because there are just too many ways to cheat and take shortcuts. I suspect in the future there will be a shortage of useful people - the kind that have critical thought and know the value of doing something properly. This doesn't mean "not using LLMs", but as many on HN have said before, you need a certain seniority before LLMs are useful augmentations to your skills rather than something that stops you from learning yourself.

I agree with the article. Anything but physical competitions with strong security (think professional e-sports with organizer-provided PCs) is over. But I think one of the most interesting things to take away from my CTF experience is that the bottom of the leaderboard was still full of amateurs slowly working their way up - it is a few rotten apples that ruin the fun for most, and there are still plenty of people who want to learn and deep-dive.

deafpolygon 6 hours ago

Unrelated, but does anyone find this site incredibly hard to read?

walletdrainer 6 hours ago

Bizarre font and poor contrast, yep.

The text itself being exceedingly long for no obvious reason doesn’t help.

lukan 6 hours ago

Poor contrast? White on black?

And if you think it was too long, what part would you have shortened? I never knew about the scene and found it interesting to read this personal take on it.

swiftcoder 3 hours ago

> White on black?

According to Pikka, the paragraph text is Taupe Grey (#92908a) on a Liquorice (#111110) background. That's... pretty far from black and white.
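If anyone wants to put a number on that, here is a rough sketch of the contrast-ratio calculation. It assumes "contrast" means the WCAG 2.x definition (relative luminance of each color, then (lighter + 0.05) / (darker + 0.05)); the function names are my own, and the two hex values are the ones quoted above.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black vs white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    print(contrast_ratio("#92908a", "#111110"))  # the site's paragraph text
    print(contrast_ratio("#ffffff", "#000000"))  # true white on black
```

For reference, WCAG AA asks for at least 4.5:1 for body text and AAA for 7:1, so comparing the first number against those thresholds is more useful than eyeballing it.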

3qw128 5 hours ago

The article is the thickest of AI slop. Don't believe anything.

sevindob 5 hours ago

ikr, if bro can't be bothered to write an article himself then anything he says is automatically suspect

utopiah 5 hours ago

Right, the same way that car racing has "broken" jogging. This is so dumb. /s

The whole point of competitions is to provide a safe environment thanks to a set of rules all participants AGREE on in order to progress together.

If new tools "break" the competition, we change the rules and that's A-OK.

CTF isn't a natural phenomenon, if tools change, rules change, simple.

swiftcoder 4 hours ago

The only way this actually works is if you move CTF to in-person only. There's no other way to reasonably prevent the whole leaderboard being taken up by whoever spent the most on tokens.

rqd3 5 hours ago

tldr; adapters took my elo
