Hacker News

Ask HN: What was your "oh shit" moment with GenAI?

38 points by andrehacker 22 hours ago | 119 comments

Most of us were amused when DALL-E and its peers went mainstream, and we were quick to point out the obvious flaws.

Then ChatGPT hit the scene and again, many of us dismissed it as a parlor trick that would never amount to much.

Using LLMs for coding was initially only a small step up from basic code completion, and a welcome farewell to Stack Overflow.

I am curious: what was the specific moment that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?

jzemeocala 51 seconds ago

I bought an Alesis QS8.1 super cheap in perfect condition (was a top grade digital piano/synth in the 90s).

and then i realized that ALL of the software related to it (which i collected from defunct websites and archived on github) was ancient. after a while i got tired of using WINE every single time and decided i wanted a cross platform modern equivalent that did everything that several of these different programs did (plus break out some stuff that was now potentially possible with a modern computer).

i thought it would be extremely hard because the computer to synth communication is pretty much only via sysex commands (of which the actual wave file encoding protocol was undocumented)

Claude walked me through examining some of the original software in Ghidra, and I had a working demo that night... now I'm just playing with adding new features to it.
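For readers unfamiliar with SysEx, here is a minimal Python sketch of how a MIDI System Exclusive message is framed. It shows only the generic envelope (a 0xF0 start byte, a manufacturer ID, 7-bit data bytes, and a 0xF7 end byte). The QS8.1-specific payload layout is not covered, and the ID 0x7D (reserved for non-commercial use) and the payload bytes are placeholders, not Alesis values.

```python
SYSEX_START = 0xF0
SYSEX_END = 0xF7


def build_sysex(manufacturer_id: bytes, payload: bytes) -> bytes:
    """Wrap a payload in the generic MIDI SysEx envelope."""
    body = manufacturer_id + payload
    # Every byte between the start and end markers must be a 7-bit data byte.
    for b in body:
        if b > 0x7F:
            raise ValueError(f"data byte 0x{b:02X} has its high bit set")
    return bytes([SYSEX_START]) + body + bytes([SYSEX_END])


def parse_sysex(message: bytes) -> bytes:
    """Return the bytes between the 0xF0 and 0xF7 markers."""
    if len(message) < 2 or message[0] != SYSEX_START or message[-1] != SYSEX_END:
        raise ValueError("not a complete SysEx message")
    return message[1:-1]


# Placeholder ID and payload, for illustration only.
msg = build_sysex(b"\x7D", bytes([0x01, 0x02, 0x03]))
print(msg.hex(" "))
```

Because data bytes are limited to 7 bits, any 8-bit data (such as sample data) has to be repacked before it goes on the wire, which is typically where a device-specific, and here undocumented, encoding comes in.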

andrewthornton 23 minutes ago

My furnace went out during the 2025 holiday and I couldn't get an appointment with a repair person for 2 days. It was getting very cold in my house so I went into my attic and made several videos of the furnace attempting to start and gave it to gemini. It diagnosed the issue immediately and had me spin one of the components (a small exhaust fan) while the furnace tried to fire. It came on immediately. I had to do that several times, but it worked until the HVAC service showed up.

ssl-3 8 minutes ago

That's pretty great.

(Though that's also the kind of hands-on troubleshooting step/fix that a person could just google for and find pretty easily back before the internet got all fucked up.)

tonyedgecombe 15 minutes ago

I've been fitting a kitchen and chatGPT has been useful to bounce ideas off and resolve issues. Of course if IKEA's documentation wasn't so sparse I wouldn't need it but that's another story.

I guess I'm seeing similar benefits to a novice programmer. Professionals would scoff at my work but they are expensive and difficult to work with. Meanwhile I'm getting the job done.

On the other hand I'm not touching AI for any development work. I'm too worried about my skills atrophying or not properly learning anything new.

alberth 19 minutes ago

Do you mind explaining more? Did you just describe to Gemini what was happening, did you give Gemini photos of the furnace, etc.?

gwbas1c 6 minutes ago

> and made several videos of the furnace attempting to start and gave it to gemini

I assume they recorded videos and uploaded them in the Gemini app on their phone, and then probably said "what's wrong?"

Gemini is very good at those kinds of things. I recently got some ratcheting straps and needed to use them, but at the time I didn't know what they were called, so I didn't know what to search for on Google. I opened the Gemini app, pushed the button to take a picture (just like in text messages,) and included a message that was similar to "what is this and how do I use it?"

nrjames 16 minutes ago

We were experiencing abnormally high electrical bills and I could not figure out what was happening, so I downloaded the granular usage data (15 min increments) from Duke Energy, explained what we had in our house and when we typically used those items (washer/dryer, EVs, etc), provided a rundown of our energy usage plan, then asked Claude to build me a Streamlit dashboard that would help us understand what was going on and predict what was going to happen over the next months. The dashboard had a few simple toggles and levers. Claude was basically able to one-shot this, knew how to manage the XML from Duke Energy, etc... In about 20 minutes of prompting, I had a very comprehensive dashboard that was extremely helpful not only in diagnosing that specific issue but also in helping us understand how to further lower our electrical bills.
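A rough Python sketch of the first step a dashboard like this needs: rolling 15-minute interval readings up into hourly totals to see when usage peaks. The readings are made-up sample values in kWh, not Duke Energy's export format, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_totals(readings):
    """Sum (timestamp, kWh) interval readings into per-hour totals."""
    totals = defaultdict(float)
    for stamp, kwh in readings:
        # Truncate each timestamp to the start of its hour.
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        totals[hour] += kwh
    return dict(totals)


# Made-up sample data: four 15-minute intervals in one hour, two in the next.
sample = [
    (datetime(2025, 1, 1, 18, 0), 0.5),
    (datetime(2025, 1, 1, 18, 15), 0.5),
    (datetime(2025, 1, 1, 18, 30), 1.0),
    (datetime(2025, 1, 1, 18, 45), 1.0),
    (datetime(2025, 1, 1, 19, 0), 0.25),
    (datetime(2025, 1, 1, 19, 15), 0.25),
]

totals = hourly_totals(sample)
peak_hour = max(totals, key=totals.get)
print(peak_hour, totals[peak_hour])
```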

lithboy 11 minutes ago

This can be a product.

gwbas1c 3 minutes ago

When I don't know how to use a specific API, or how to do a task, I'll often give some high-level instructions to Copilot (Claude's model) in Visual Studio, and then review what it comes up with very, very closely. (Including looking up specs so I can confirm that it did it correctly.)

It's much, much faster and easier than starting from scratch.

simonw 37 minutes ago

ChatGPT Code Interpreter back in ~March 2023. I uploaded a CSV file (of police incidents in San Francisco) and watched it load that into Pandas, show me some charts, then export the data to a SQLite database file for me to download.

I write software for data journalists and this new thing appeared to be able to do everything I wanted my software to do just as an unplanned side effect of having the ability to run Python against a folder with some uploaded files in it.

With hindsight it was my first exposure to a coding agent, but we hadn't named the category at that point.
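A minimal sketch of that kind of pipeline, using only Python's standard library (`csv` and `sqlite3`) rather than Pandas so it runs anywhere. The column names and rows are invented sample data, not the actual San Francisco dataset.

```python
import csv
import io
import sqlite3

# Invented sample rows standing in for an uploaded CSV file.
CSV_TEXT = """category,district
Larceny,Mission
Assault,Mission
Larceny,Central
"""


def load_csv_into_sqlite(csv_text, conn, table="incidents"):
    """Create a table from the CSV header and insert every row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path to get a downloadable .db
load_csv_into_sqlite(CSV_TEXT, conn)

# Count incidents per category, the sort of summary a chart would be built from.
counts = conn.execute(
    "SELECT category, COUNT(*) FROM incidents GROUP BY category ORDER BY category"
).fetchall()
print(counts)
```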

evdubs 55 minutes ago

I tried to see if an LLM service provider could rewrite some legal docs into a consistent format, with nothing hallucinated, so I could see what might be missing from each document. It could do that.

Next, I wanted to see if this could be done with a local LLM. Gemma-4 handles this fine with an 8GB video card and a large context (128k).

Next, I wanted to see if the model could also OCR these docs and translate them. The same model can handle that quite well.

This was when I realized LLMs should be great for handling work where:

- I already know what I want to do

- I already know how to do it

- I don't think this task will help develop skills I find to be valuable

- If I have to do it manually myself, I will probably cut corners

So now I view LLMs through the lens of, "what work can I send to an LLM that I otherwise would not really care about doing."

SoftTalker 37 minutes ago

Yes, the best results I've had using LLMs are for tasks where simply reading and reformatting/translating/summarizing are the goals. They are much faster and less prone to boredom doing these things than humans are. For now.

jerome-jh 9 minutes ago

Recently, Claude (through Copilot) found a hardware issue on our product. I was asking it to find an issue in a specific feature of a device driver, that could cause what we observed. It determined the feature was correctly implemented.

Then it hinted that depending on how the hardware is implemented, it could cause the observation. It turned out the hardware was implemented as suspected by Claude.

I was already convinced it knew the codebase, somehow, more than I do. Now it is just as if it knows the product and its use as well.

dannyobrien 10 minutes ago

I got early access to the pre-ChatGPT OpenAI API (actually by pinging someone from OpenAI who posted about it on HN). At work, we were setting up to play a livestreamed JackBox game for a charity event. This would have been in 2019.

In a previous life, I'd been a writer for the original You Don't Know Jack game (the UK variant), where the job was to crank out as many funny quips about a topic as you could, and then use a handful of them in the recording of the game itself. Some of the later JackBox games are like that, but for the players -- you're given a set piece, have to come up with little funny improvisations within a time limit.

As an experiment, I tried the set-up lines with the OpenAI API, to see whether it could come up with some responses. Of course, 90% of them were unfunny or incoherent, but 1/10 were not bad, or even pretty good.

I'm not sure that would have been impressive to anyone else -- but remember, I'd had this as a job, and sat in a writer's room, where everyone did this, for hours. In that environment, you expect a large proportion to be duds: the discipline is to keep pumping them out, and not flagging creatively until you find a rich vein. I realised that this was a tool that would have been the perfect complement to that work -- and it was a pretty good JackBox player too.

carodgers 8 minutes ago

I was trying to play tic tac toe with one of the earliest versions of ChatGPT. It kept playing on used squares and missing obvious three-in-a-rows. That's when I realized, "Oh shit, this model design is probabilistic, oblivious to the underlying reality, and not bound by any reasoning rules. We should never ever build any load-bearing component with this."

smokel 47 seconds ago

You mean, somewhat similar to how quantum mechanics underpins most of reality?

bonoboTP 11 minutes ago

The big one was definitely ChatGPT upon release in 2022 and specifically when people showed how it can role play as a Linux terminal and you can narrate events like "the data center is now on fire" and "run" nvidia-smi, it would show high temps on the gpus etc. Or you could "explore" the homedir of some famous person. It convinced me that if it can understand so well how terminals work, tool use and agents are around the corner.

Then Opus 4.5 convinced me that this has finally arrived. In 2022 I expected things to arrive faster actually, in 2023-2024. I expected we'd have much more realtime collaborative integrations with AI including GUI computer use. Maybe in 1-2 years.

For images, it was nano banana where I realized AI images can truly work, and all these ad hoc issues like hands and limbs, or "it will never do a horse riding an astronaut" were temporary. It's now clear that making feature length films is within reach. Not in one go but with an agent orchestrating, designing a screenplay, characters, shots etc and generating those. Whether the result will be worth watching or a flat story on the high level is another question. But it will be a "film" for sure.

cineticdaffodil 8 minutes ago

I think all those Steven Spielbergs hiding among the 8 billion - without connections and without Hollywood names, having their day without getting filtered out by investor committees playing it safe - will produce enough material to keep a cinephile happy for life.

jmkni 2 hours ago

Not coding, but reading logs.

I was trying to figure out a nightmare bug that only happened in production and Claude Code was able to connect to Google Cloud and read the logs in real time.

I recreated the bug in the UI and it was instantly able to see in the logs what the problem was, then because it had the context of my whole codebase it was able to point me to the exact line of code causing the problem.

That was certainly an "oh shit" moment

shreddude 60 minutes ago

I could go on and on, but Claude recently decompiled the firmware of my camper van, documented all the CAN interfaces, then programmed an ESP32 module to talk to the van’s integrated systems (power, HVAC, lighting, tanks). That sort of embedded systems integration is completely out of my wheelhouse.

I honestly don’t understand AI naysayers. I use Claude every day both professionally as a Solution Architect and personally in a variety of projects I simply could not have ever approached alone.

rvnx 34 minutes ago

I don't get it either. "This is just a stochastic parrot".

I suppose these people are lying so that they can justify their well-paid job, or they just don't know how to use LLMs or to prompt GenAI tools.

camel_gopher 4 minutes ago

It’s a probabilistic parrot

jazzyjackson 22 minutes ago

I’ll explain it: these tools are non-deterministic and people have different experiences with them. For a few people every interaction is totally fumbled and they think the cheerleaders of gen AI must be lying, for others the chatbot hits one home run after another and lets them add microcontrollers to their CAN bus. When these people’s good luck runs out and they start getting mixed results like the average user, they assert the service must have been downgraded.

dyauspitr 14 minutes ago

I still don’t get it. I can dictate a prompt and sometimes I do it so quickly the text looks like a drunken parrot dictated it and it still always gets exactly what I’m asking for. I’m just going to attribute malice to the naysayers.

block_dagger 18 minutes ago

I wanted to add gapless playback to an audio archive website I maintain. I tried myself before any of the popular LLMs were available. I failed. I then tried with the first LLMs that came out. They failed. Then, when the first Claude Opus was released, it succeeded. I now have gapless playback.

chasd00 8 minutes ago

i was a skeptic and then, on a whim, i told claudecode to "create an app with a react front end and python api backend that delegates auth to auth0.com and allows users to manage a todo list" or something like that. Like a standard issue web app with a database, backend, frontend, openid and all that. i was pretty impressed with the result.

Then i asked it to create a multi-user stock market portfolio simulator with a comprehensive api, leaderboard, scheduled tasks and the other bells and whistles. Again, fairly impressed with the result. Then I prompted it to build a trading bot that uses the API to compete with the human players, again fairly impressed with the result.

Last, i prompted my way through a react native mobile app integrated with supabase for my sister's startup. It created the schema, some triggers, webhook for stripe, all the app views, setup an expo account, push notifications, prompted _me_ through an Apple developer account and everything else.

All of this was done an hour here and an hour there while making dinner or watching TV, barely any attention paid to the details. Just prompting claudecode and checking what it did.

After those three experiences I started incorporating claudecode into all my coding workflows and managed to get my job to buy me a license for work stuff too.

solomonb 18 minutes ago

I gave chatgpt 3.5 the type signature for a co-algebraic encoding of a mealy machine:

    newtype Mealy s i o = Mealy { runMealy :: (s, i) -> (s, o) }

And it gave a really impressive analysis.

Then I scrambled all the names and asked with a fresh context like:

    newtype Foo z e g = Bar { blob :: (z, e) -> (z, g) }

It got completely confused and generated a bunch of nonsense. It was at that moment I realized that LLMs don't really understand anything.

And yes I understand that a newer model would not get confused by this.
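For readers who don't read Haskell, here is a rough Python rendering of the same shape: a Mealy machine is a step function from (state, input) to (new state, output). The running-sum example and the helper name `run_mealy` are illustrative assumptions, not part of the original prompt.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")
I = TypeVar("I")
O = TypeVar("O")

# Same shape as the Haskell type: (s, i) -> (s, o)
Step = Callable[[S, I], Tuple[S, O]]


def run_mealy(step: Step, state: S, inputs: Iterable[I]) -> Tuple[S, List[O]]:
    """Feed inputs through the step function, collecting each output."""
    outputs: List[O] = []
    for item in inputs:
        state, out = step(state, item)
        outputs.append(out)
    return state, outputs


def running_sum(total: int, x: int) -> Tuple[int, int]:
    """Example machine: the state is the sum so far, and so is the output."""
    new_total = total + x
    return new_total, new_total


final_state, outputs = run_mealy(running_sum, 0, [1, 2, 3])
print(final_state, outputs)
```

Renaming `run_mealy`, `running_sum`, or the type variables changes nothing about what the code does, which is the point of the scrambled-names test above.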

mschaef 23 minutes ago

This is a small one, but significant to me.

I asked Claude to add support for multiple lights to my toy ray-tracer. It correctly added the support and then suggested adding colored lights to make it easier to diagnose. It felt more like a colleague making a useful suggestion than any sort of pure engineering tool.

rerdavies 12 minutes ago

Working on a Spice compiler to convert schematics for classic guitar pedals into real-time executable code.

I provided a reference to The Spice Manual, 2nd ed., a page number, and an equation number, and asked Claude to implement it (not really expecting it to succeed).

It proceeded to implement not only the equation, but the calculation of the Lagrangian of the function, another 30 lines below, which required taking symbolic partial derivatives for a not-at-all trivial function, and successfully figuring out which variable was which in the resulting matrix. The source material just said "Lagrangian of", and did not provide the partial differential equations. It then provided a comment that identified the page number and equation number in the source text for the "Lagrangian of" equation.

mbo 36 minutes ago

Look, not to brag but DALL-E's "armchair in the shape of an avocado" was mine (https://openai.com/index/dall-e/). I remember trying to convey the gravity of this capability to my friends at the time, who I guess were not as impressed as me.

wps 23 minutes ago

That's insane! I cited your image in a humanities paper during one of my freshman year classes.

maxwellg 17 minutes ago

Pre-GenAI I wrote a new interview question for a role on our team. As far as I know, the question was never made public. The interview required implementing a pretty basic CSS-in-JS utility in vanilla javascript. We instructed the candidate to read the MDN documentation for the CSSStyleSheet interface, and then gave them a public API to implement. Passing implementations usually consisted of a ~10 line for loop, and it was really just a test of whether a developer could pick up and work with new libraries on the fly. Still, the interview probably had a 30% pass rate.

On a lark, I asked ChatGPT to complete the interview question in late 2022. I would have hired ChatGPT back then based on its first response! It was easily in the 90th percentile of responses I have seen.

KaiserPro 33 minutes ago

I've had a few.

The biggest technical one was when we were making an all day wearable AI assistant thing. It basically had really precise office location (think cm level accurate), a shitty VLM to describe what the wide angle lens was looking at, speech to text, OCR and a gaze recorder that described what you were looking at.

This was all streamed to sqlite. The thing that was really "oh shit" was the thing that made the whole system usable: a 4 paragraph prompt that turned natural language into SQL and reported back to the (non-technical) user what they wanted to know.

The most recent one is being caught out by GenAI video of a gymnast. I worked in VFX so I am normally able to spot dodgy shit, but this one was close to being real, scarily real.

irthomasthomas 26 minutes ago

My most recent one: Taking a bricked ipad and plugging it into my linux laptop, then telling deepseek to fix it. A couple of hours and twenty sudo passwords later it was working again.

jmkni 22 minutes ago

That is very cool

mikewarot 2 hours ago

I tried to get it to generate code to program one of my BitGrid simulators, and it kept producing code that failed, over and over. It was then that I figured out that it can only do CRUD apps and the like, things it's seen over and over in its training data.

It's useless for most of what I want to code.

cheevly 25 minutes ago

GPT literally generates perfect code for me in languages that do not exist anywhere in its training set, so I’m not sure how you’ve achieved this level of failure.

jkraybill 2 hours ago

So many. First was when I saw GPT-2 create jokes that were original and kinda funny.

Most recent: I use Claude Code and have a convention where I grant various levels of autonomy during a session. I got bored recently and just let it keep running with an empty issues queue, essentially telling it to do whatever it wanted.

It did a bunch of repo cleanup, then it kept suggesting to end the session, but I just kept giving it autonomy prompts.

It started a creative writing public repo and wrote a bunch of stories, essays, and poems. I did not prompt it, at all, to do that. Some of what it wrote is quite good (IMHO).

hypendev 50 minutes ago

Back in the times of GPT3 text completion, right before the API came out, a contemporary art museum asked me to collaborate on a project. The project was supposed to include a chatbot, and I was like okay I can probably hook something up.

Then I remembered the "text completion LLM thingy" I saw on HN, and tried it out in the playground. Once I gave it an IRC style example of a conversation to complete, I was like hm, this could work. Then I figured out I could "sort" people into different groups based on personality using the same text completion engine and some answers they provided. Then I noticed I could have it provide me with JSON directly.

That's when I realized how big this could be for code and data analysis - even tried to convince an at the time cofounder to pivot into AI coding, but to no avail.

Once the API was released and the art project chatbot got launched (and the theater show associated with it, which even won some awards), people who used it loved the chatbot, got into heated arguments with it, tried to teach it things, talked about their lives and were sad when it didn't remember something.

That was when I understood the social impact this could have on people - they really behave like it's a person on the other side. They show interest, think it displays emotion, try to entertain it, be polite, ask about its thoughts and hopes and dreams. And even when they knew they were talking to a machine, they were still trying to be friends and make it happy, which was quite beautiful to see.

Later on, I had a third oh shit moment - once the 3.5 API was out and about, I prototyped a Rust code generation harness for a client, akin to a primitive claude code. That was the "I'm getting a bit worried" oh shit moment, and it caused a lot of reflection and thinking about the future. And I happily welcome it.

Fomite 2 hours ago

When we had to have a frank discussion about whether to fail someone who obviously used an LLM for parts of their dissertation.

sevennull 35 minutes ago

well?

anon373839 15 minutes ago

Mine was when I used Stanford Alpaca, and realized that they had transformed Llama 7B into a credible facsimile of ChatGPT with just $600.

hansvm 2 hours ago

A coworker had me work through a particular problem (some no-importance web demo) with Cursor and Sonnet 4.6. It still sucked, but there was a qualitative shift in suckiness, one that I realized could finally be used to solve some real problems I had if I wrote an appropriate harness and used good enough models.

I still find it mandatory to write a lot of kinds of code by hand, but I write a lot of code with agents too now, and I previously literally didn't think that'd happen in <5yrs.

bachmeier 22 minutes ago

> that you went from those quaint, dismissive observations to a slightly panicked, "Uh Oh" realization of what these models can do?

Never experienced any kind of panic, only excitement. I told Github Copilot to add documentation to a function and it documented how the code was used even though there was nothing in the function to indicate how it was used. It somehow knew from the code pattern why I was writing that function.

wps 14 minutes ago

Nvidia GauGAN and deep-daze amused me immensely at the age of 14 or so. I've had "a man painting a completely red image" saved for a long time.

It is insane how primitive modern inpainting and txt2image make these two projects look.

ilaksh 24 minutes ago

OpenAI already had GPT prior to the ChatGPT launch, and I had not really taken it seriously. But on November 30, 2022 when ChatGPT came out and was immediately popular, I reevaluated it.

I immediately realized that it meant my time as a programmer in the traditional sense was going to come to an end relatively soon.

On December 1, 2022 I created my first agentic coding loop experiment. I launched one of the first AI code generation websites that would generate web pages along with embedded images in January 2023.

vunderba 15 minutes ago

Honestly? Probably all the way back to when Nick Walton used the computers at his university to train a custom version of GPT-2 that let players experience a completely open-ended text adventure game in 2019.

As somebody who as a kid had tried feeding IF transcripts into a markov model to generate random rooms for an amateur MUD, this was mind-blowing. It felt like I was playing a version of the “Mind Game” from Ender’s Game by Orson Scott Card.

https://en.wikipedia.org/wiki/AI_Dungeon

hannahstrawbrry 2 hours ago

Had an issue in a project where multiple media files with the same/similar names were colliding. After spending hours with chat gpt wrangling python scripts to try and sort it out programmatically, I shifted gears and built a web tool that would allow me to manually review the content and select the correct media file to associate with it in about 5 minutes, allowing me to comb through and finally fix the issue & verify the content was correct in about an hour. It made me realize I needed to completely re-think how I set about solving problems now that I have an entirely different set of tools to develop- that has been the biggest "Oh shit" moment for me, looking into the mirror and recognizing how AI will re-shape me as a developer.

enraged_camel 2 minutes ago

Opus 4.5 helped us with a very complex data topology refactor and migration. Instead of the five month timeline we had initially allotted for it, we finished it in nineteen days.

dang 2 hours ago

(1) Watching it do log file analysis in seconds that would have taken me hours (edit: days, in fact), and which I would therefore never have done in the first place.

(2) Helping me with optimizations that I had been putting off for years because they involved learning curves that I never had time to take on.

(3) Tracking down bugs in code, especially race conditions and other concurrency issues, that were otherwise baffling.

(4) Finding information that I had been unable to find using Google searches (e.g. https://news.ycombinator.com/item?id=42653136).

There have been others, but those are what come to mind - perhaps because, in each of these cases, it made something happen that would otherwise never have happened - not because it was impossible, but because the level of effort required was prohibitive.

_0ffh 12 minutes ago

Didn't have one. I was convinced I would experience this since I was a teenager. Blame science fiction if you will.

briga 57 minutes ago

Maybe when I found out you can use it to run terminal commands, spin up and take down dev environments, and even run other LLMs. Suddenly 90% of the difficulty of onboarding to new repos disappeared overnight and a lot of heavily CLI-based workflows became trivial to automate. Never again do I want to spend hours manually sorting out Python dependencies.

1qaboutecs 38 minutes ago

Was trying to explain convolution (of functions) to a friend and I wanted to build a little picture. I typed more or less nothing into Claude and it gave me a fine web-app for demo'ing examples to my friend within minutes.

Three years ago this would have taken a minimum of three college graduates a couple days -- one to know the math, one to know the backend, and one to know the front-end. Maybe two of those could be the same person on a good day -- none of the topics is individually that hard -- but it's a lot together.
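For reference, a small sketch of the discrete version of the operation being explained: each output sample is a sum of products of one sequence with a shifted, flipped copy of the other. This is plain Python written for clarity, not the app Claude produced.

```python
def convolve(f, g):
    """Discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            # f[i] * g[j] contributes to output index n = i + j.
            out[i + j] += fi * gj
    return out


# Smoothing a short signal with a two-point averaging kernel.
signal = [1.0, 2.0, 3.0]
kernel = [0.5, 0.5]
print(convolve(signal, kernel))
```

Swapping the two arguments gives the same result, since convolution is commutative.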

cheevly 27 minutes ago

Ever since the first Davinci model of GPT-3 I've literally been using LLMs daily. It was an indispensable tool for me from the very beginning and despite 10,000+ hours of usage and research, I still feel like I've barely scratched the surface of what's possible with current genai tech.

nickandbro 15 minutes ago

When I was making matplotlib charts with gpt 3.5, and I was like okay this is somewhat impressive

bluejay2387 2 hours ago

I had a locally hosted model write its own semantic search system that indexed 250,000 documentation and code files and then write a fully functioning mod for one of the games I play based on that documentation that I couldn't get to work after 2 weeks of my own effort, all in under 4 hours (and that included a 25 minute long indexing process). This freaked me out enough that I then had it write a CLI based activity and TODO tracker and then integrate that tool into its coding process to track all of its activities in about another 2 hours. I am still emotionally recovering from this day. I have since replaced the semantic search system with an open source option (though I used it for a few months) but I still use the activity tracker for both coding projects and myself.

gravypod 25 minutes ago

What mod did you build?

bluejay2387 6 minutes ago

A mod that fixed a bug that prevented certain buffs from working when mounted for the Magus class / Arcane Rider archetype in Pathfinder Wrath of the Righteous. It also managed to fix the problem with Shelters not providing protection from corruption when resting in outposts in that same mod. I've used other models to expand the mod to an entire mini-expansion with new Archetypes and abilities since then.

hereme888 19 minutes ago

Creating a functional python app with zero programming knowledge, back in the days of GPT 3.5.

That was enough to awaken my teenage hacker spirit.

EliRivers 60 minutes ago

Code reviews. Code reviews in theory done by humans, but containing copy-pasted inane statements of the obvious. Questions that really did no more than demonstrate a lack of context. Code reviews no longer an educational opportunity for the reviewer, a way they learn and stress their own understanding to create a better product and become a better person, destroyed by the siren song of GenAI producing comments that on the surface seem so helpful and sensible.

"Uh Oh" realization of what these models can do?

The code reviews were just how I first saw it, but the rot goes deeper. The "uh oh" was my realisation of how much these can damage people's professional development. These people will never get better at their job than they are right now.

A lot of what else GenAI does is great, but this is an "Uh oh" indeed.

gravypod 21 minutes ago

I work with someone who is very AI-forward, high confidence, and very low execution. He has started sending me large PRs of AI slop that he assures me don't need to be reviewed. I quickly find many minor issues on an initial pass of one of the reviews. He gets mad at the team for slowing him down.

He also will paste chat logs with Claude into our team chat. Often Claude will say the same thing I told him but he either doesn't remember or doesn't trust human engineers now.

He has spent months working on agent skills and prompting.

He has not landed anything in 3mo, and has landed nothing useful in ~1 year.

This will be the rest of my career. Working with people in ai psychosis and trying to stay productive.

peteforde 9 minutes ago

What's funny about this is that it sounds like your coworker reviews his LLM output roughly as well as you read the other replies before assuming that this was an anti-LLM pile-on thread.

knuckleheads 2 hours ago

I remember a couple months after ChatGPT came out I was in a 1-1 with a coworker who hadn’t really played around with it much. I was very much toying around with it and was surprised at how good at stuff it was. I wanted to show him it was for real, he was skeptical, so over a half hour we had it make a bee and a flower buzz around in d3, copying and pasting between jsfiddle and ChatGPT. By the end of it, we had a nice animation and were both thoroughly surprised that the computers could code so well now.

geuis 5 minutes ago

For me it wasn't "oh shit" per se, but "oh wow".

Some time in 2024 at a company get together, we had an afternoon hackathon. There was a feature in our iOS app that was missing (ability to mute autoplaying game trailers). This annoyed me a lot, because I frequently have music on when working and anytime I needed to open a test build it would kill my music. It had been an open ticket for a while but had low priority for the iOS team.

I had probably written a hundred lines of Swift in my career up to that point. Not expecting anything to come from it, I had Cursor examine the iOS codebase and told it I wanted to add a mute button under a certain area of the app settings.

Blew my mind when after only 10 minutes or so, the model had quickly found where to add the feature. Took a little back and forth, but then it added a fully functioning mute option in settings that mostly worked across the app. A little more back and forth, and those issues were settled. Maybe an hour overall of time spent that afternoon.

I pinged one of the iOS engineers about it later and he said to push it up for review. There were a few things that needed to be updated to get it in line with the rest of the codebase, but nothing substantial. Feature got merged a week or two later.

Now I'm way more productive than I have been in years. I've been getting a lot of enjoyment out of being able to prototype rapidly and experiment on features rather than getting bogged down in the process of scaffold work. Able to knock out issues much quicker.

That's all been positive, but it hasn't taken away my actual core responsibility. The LLMs can give you great advice and write code quickly. But they still don't always do well at broad thinking.

Current case in point: I've been working on an iOS app that uses vision models to do work on photos and videos that the user has taken. I've built text-based semantic search systems before, and there's a lot of crossover with vision models, but it's been an interesting journey so far learning about the different types of vision models and what they're good at. Lots of testing so far and educating myself on the topic to get the user-level features I want. Claude Code has been invaluable in this, as it's great at writing the Swift code while I'm able to focus on the results of what is being done.

Where Claude is still not good is reasoning at a higher level about different strategies for using vision model outputs to achieve the stated goals. It's not an issue of me not clearly defining the specifics of a feature and then letting Claude run off burning tokens to figure it out. For example, just late last night I was deep diving into some core segmentation code and having Claude explain what everything was doing line by line so that I could get a better understanding of the mechanics of the vision model.

A side effect was that I realized the vision model was outputting tons of nearly identical segments that were overlapping. This was something Claude had completely missed, and because I didn't know this particular vision model did that, I had no way of knowing to look for it.
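For anyone who hits the same thing, one common way to thin out near-identical overlapping segments is greedy overlap suppression. This is only a rough Python sketch, not the code from the app described above: it assumes each segment has already been reduced to a bounding box `(x1, y1, x2, y2)`, and the names `iou` and `drop_near_duplicates` and the 0.9 threshold are made up for illustration.

```python
from typing import List, Tuple

# Assumption: each segment is summarized by a box (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def drop_near_duplicates(boxes: List[Box], threshold: float = 0.9) -> List[Box]:
    """Keep a box only if it does not overlap an already kept box
    with IoU at or above the threshold (hypothetical default 0.9)."""
    kept: List[Box] = []
    for box in boxes:
        if all(iou(box, other) < threshold for other in kept):
            kept.append(box)
    return kept


segments = [(0, 0, 10, 10), (0, 0, 10, 9.8), (20, 20, 30, 30)]
print(drop_near_duplicates(segments))
```

The second box overlaps the first almost completely, so it is dropped, while the distant third box is kept. Real segmentation output would more likely be compared mask to mask, and sorted by confidence first.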

Bottom line is that understanding the mechanics of your application is still very much a requirement for the engineer. In this case, once I learned what was happening it completely changed my approach on how to achieve my feature goal. The code runs hundreds of times faster now and the segmentation is much, much better.

The new wave of coding models is disruptive, but it's letting me be a much better engineer and get things done faster and with more assurance that the code being written is solid. I still have to spend the same amount of time thinking and learning about a problem, and probably more time verifying what's being output, but a lot of the drudgery is also being taken away.

bag_boy 2 hours ago

I had ChatGPT write up a Zillow description for my house in the style of Carrie Bradshaw from “Sex and the City” to impress my wife.

It was unlike anything I had ever experienced.

My wife was unimpressed lol.

This was 2022.

twooclock 59 minutes ago

I programmed data export to some xml over a couple of days. Sending xml results via email to an accounting firm for verification. A day after I finished my disk crashed and I lost all my code. Fed Claude with xml from my mail and... oh shit! ... got "my" code back. (And immediately paid for Claude subscription) :-)

steren 2 hours ago

The moment when I ran llama on my old gaming PC (using something called ChatGPT4All) was my "oh shit" moment: I was now talking... to my PC.

adammarples 22 minutes ago

Struggling to do named entity recognition, with lots of tagging by hand, and then seeing BERT just being able to straight up answer questions about a document. Had to sit down after that because it was past anything I could even understand.

ieie3366 12 minutes ago

I'm a terrible cook, but just by using Claude as a tutor I've managed to make 5 different recipes in a row and they all tasted fantastic, restaurant quality.

sct202 2 hours ago

One of our SaaS providers launched an AI-agent-enabled version, and it can follow directions, do tasks, and manipulate data and settings in the software roughly on par with a below-average person. When I used it I had a sinking feeling: tons of teams and people will be redundant as these agents improve and roll out to other software.

oidar 59 minutes ago

Opus 4.6. My standard battery of questions included solving an ASCII maze (20x20 grid) without using a script, using only "thinking" as a tool. It was the first model to be able to solve it. It was the first model that really appeared to be able to reason spatially.

orzig 60 minutes ago

"Write a bible verse ... explaining how to remove a sandwich from a VCR" https://x.com/tqbf/status/1598513757805858820

moconnor 2 hours ago

Literally the very first time I used ChatGPT. I had already been experimenting with GPT3 for various jokes and games via the API but the naturalness of it as a chat interface that understood you changed everything.

The first time I used a terminal agent was another one.

jiggawatts 10 minutes ago

I reverse engineered a proprietary network protocol from a vendor binary (compiled C++) and a short sample network capture.

The agent had access to the NSA Ghidra disassembler, which it can control shockingly well.

I just clicked the “Allow” button a lot and eyeballed the output decoding quality. I felt like I got demoted to non-technical QA.

goldenarm 2 hours ago

The first SORA release truly scared me. The uncanny valley of simulating life like this still creeps me out to this day.

nsikorr 43 minutes ago

Definitely the first NotebookLM podcast I generated.

overgard 2 hours ago

I feel like with the hype cycle and constant publishing of sketchy claims that I pretty much daily have an "oh shit" moment followed by a "nope, everything is about the same" moment. It's frankly exhausting. It's hard for me to recall a subject that has irritated me as much over a period of years, and it's barely even about AI itself but instead just feeling harassed with the constant anxiety and rage baiting.

skyberrys 56 minutes ago

Pretty good take. I don't really get the feelings of anxiety, but sometimes I'm working and I'm like, "I'm flying, this is so fast!" And then everything comes crashing down when I can't figure out one last bug.

tripledry 2 hours ago

I felt the same way, then I started with "I'll believe it when I see it". Now I'm a bit happier.

refulgentis 2 hours ago

Using GPT-3 to translate the color science code I wrote for Google's design system from Dart to ~any language so I could get it deployed cross platform quickly, and it all worked.

kgwxd 2 hours ago

When it started being forced on me in tools I was already using begrudgingly.

utopiah 2 hours ago

When none of the models, STOA or not, could answer any genuinely interesting question. All the models could regurgitate was what had been expressed before, but nothing actually new was there until explicitly asked for, and even then it required filtering through so much noise that it was practically not interesting anymore, as it required all the knowledge to validate or invalidate the claims. That's when, a few years ago, I realized "Oh shit... despite all the tremendous effort and resources, it's still not that useful." Honestly this was NOT what I expected. Yet, it was an important realization.

utopiah 2 hours ago

Related but distinct: a few years later I asked an acquaintance to ask a model a question. I didn't want to bias the test, so I asked them to ask whatever they wanted. They asked "What time is it in Sri Lanka?" which I thought was a funny question. I predicted it wouldn't work because it was asked of an offline model, so I thought it wouldn't manage to get current data. Still, I didn't interfere and we watched the answer being provided. It was roughly factually correct information about Sri Lanka... but it did not give the correct time. Again, that's a rather basic question a young child would easily get right. You need the current time with a known timezone, the time difference, basic arithmetic and voilà, you have the correct answer with an explanation to verify. Here it didn't work, and I was there trying to explain how a STOA open-source model, which required thousands if not millions in resources, training time, researcher salaries, etc., could not even handle that random basic question. Another "oh shit" moment, again, not the one I expected, which is precisely why to me it was, and still is, interesting.
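To make that arithmetic concrete, here is a minimal Python sketch. It assumes the current UTC time is handed in by the caller, which is exactly the input an offline model does not have, and it uses Sri Lanka's fixed UTC+5:30 offset. The function name `time_in_sri_lanka` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Sri Lanka is UTC+5:30 and does not currently observe daylight saving time.
SRI_LANKA = timezone(timedelta(hours=5, minutes=30))


def time_in_sri_lanka(now_utc: datetime) -> datetime:
    """Convert a timezone-aware datetime to Sri Lanka local time."""
    if now_utc.tzinfo is None:
        # A naive datetime has no known timezone, so the conversion
        # would be a guess; refuse instead.
        raise ValueError("now_utc must be timezone-aware")
    return now_utc.astimezone(SRI_LANKA)


# Known time + known offset + basic arithmetic:
noon_utc = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(time_in_sri_lanka(noon_utc).strftime("%Y-%m-%d %H:%M"))  # 2024-01-01 17:30
```

Without a trusted value for the current time, none of this helps, which is the gap the offline model ran into.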

Rumudiez 26 minutes ago

"I couldn't remember the order of the words in 'state of the art' so I just spray and pray across the keyboard like usual. I can't tell the difference because I'm just a pattern matching bot"

riebschlager 2 hours ago

"I googled 'what is my bank balance' and it couldn't even tell me. What a waste of resources."

utopiah 2 hours ago

I didn't mention resources here.

The point of the test was to ask somebody with no bias on HOW the result was produced.

Smaug123 2 hours ago

A few years ago, as you say, this was true. Nowadays I guess you just have to bite the bullet that Erdős problems aren’t interesting.

utopiah 2 hours ago

I already commented on the Erdős problems; that is also a jagged frontier.

aspenmartin 2 hours ago

Curious what your interesting questions were, you should be able to find them in your chat history.

utopiah 2 hours ago

That was more than a decade ago so unfortunately not. I should have kept those questions though. I even mentioned in a comment on HN a while ago that unanswered or wrongly answered questions should precisely be a batch test when new models are released.

poly2it 2 hours ago

What? What LLM were you using a decade ago? Am I misreading you?

utopiah 2 hours ago

You might not be aware of it but GenAI predates OpenAI which was founded more than 10 years ago anyway.

poly2it 50 minutes ago

Of course I am aware, but how is this relevant today? How does that prove that the science is irrelevant and wasted?

HDThoreaun 19 minutes ago

No. GenAI means LLMs right now. I agree it didn't in the past, but definitions change.

aappleby 2 hours ago

Are you sure you're asking the right questions?

utopiah 2 hours ago

To me they were important questions. Maybe totally uninteresting to you.

bigyabai 2 hours ago

What question?

utopiah 2 hours ago

I can't recall but basic stuff like P = NP. /s

My point was precisely to challenge STOA in domains, not questions with well-known answers.

jmclnx 39 minutes ago

Non-technical people I know are starting to take AI responses to their questions as 100% true fact.

Baeocystin 23 minutes ago

"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

--Charles Babbage

Blind trust in the machine for a certain type of user seems to be endemic since the beginning.

SoftTalker 35 minutes ago

They did the same with Google search results that were just SEO garbage content, too.

dyauspitr 13 minutes ago

It’s usually right. This isn’t as big of an issue anymore.

dyauspitr 2 hours ago

I was trying to replace my koi pond pump last weekend and the model numbers on it had washed away. I took a picture of it and it immediately narrowed it down to two models but wasn’t sure if it was the 4500 model or the 2500 model. I asked it how I can determine which one it was. It then asked me to measure the length and that the 4500 was 11 inches and the 2500 was 9 inches. Mine was 11. It was cool it was able to reason that out and give me something actionable.

It’s kind of a trivial example but there are multiple instances of this per week with the wide variety of things I do around my property.

nrjames 14 minutes ago

Ha! I did the exact same thing about 2 months ago. It saved me a lot of headache and research.

dyauspitr 10 minutes ago

I got quoted $700 by the pond guys to replace it. I ended up buying it for $109 and replacing it myself. It honestly would not have been possible without ChatGPT because I had nothing to go off of and the pipe connection was really specific to that model.

spwa4 2 hours ago

When I wrote a captcha cracking convnet in 2000 and tested it ...

And in 1 out of 5 runs it beat me.

zhoBEENG 20 hours ago

It was when I first saw an LLM reliably make tool calls to bash.

LargoLasskhyfv 16 hours ago

The smallest Deepseek R1 8B, running locally on CPU only, casually mentioning Efinix Trion FPGA fabrics while discussing technology mappings for different substrates of different vendors in the context of partial dynamic reconfiguration.

WTF?!

simsation 20 hours ago

When I saw a very basic mockup of a website and realized AI could generate the entire page from it (this was shortly before ChatGPT came out)

SpecStudioHN 15 hours ago

when ChatGPT was released. LLMs went from being a toy to a serious creative tool overnight.

boredhedgehog 2 hours ago

"Translate this poem. Maintain meter and rhyme."

bigyabai 22 hours ago

BERT, then GPT-J/GPT-Neo and FLAN-T5

damnitbuilds 21 hours ago

My "Oh shit" moment was when my boss got the bill for me trying to vibe code a bugfix.

saadn92 2 hours ago

I use Claude Code on a daily basis, but honestly it becomes more annoying the more I use it. Why? I think because I ask it to do something and unless I'm extremely specific, either the code is verbose or the feature I'm designing is done in a poor way. For me, the productivity gains aren't that great and I'm even considering whether to go back to doing things by hand to save myself the frustration. Sure, if you don't care about code quality or scalability, it's a great thing to generate code. And yes, there are times when I don't, but for real projects, I actually do because I know as an engineer those things do matter in the long run. So, to be honest, I still haven't had that moment.

tripledry 54 minutes ago

From a technology perspective LLMs are absolutely bonkers, blows my mind it works as well as it does.

From a programmer perspective, I'm starting to like it less and less. It's useful for sure, but doesn't really live up to the hype. In many ways it's the opposite, my bet is still that programmers will be in high demand in the not so distant future after all of this settles.

Might be wrong, time will tell.

pythonaut_16 2 hours ago

It has seemed to me that with each step from Opus 4.6, to 4.7 to 4.8 Claude has gotten worse at building good solutions. Like perhaps it is more "capable" in the small scale than 4.5 was but it's much worse at knowing what to do.

abstractanimal 32 minutes ago

When I realized that an LLM can process all the traffic in Slack that overwhelms me daily and give me a manageable digest. How long until they intermediate most of our social interactions? Sooner than we can possibly adapt, I think.

jazzyjackson 27 minutes ago

If your social interactions can be mediated by a chatbot, I implore you to find better social interaction.

cheevly 24 minutes ago

If yours can't, then I implore you to find better AI mediation tools.