This makes me think of checklists. We have decades of experience in uncountable areas showing that checklists reminding users to question the universe improve outcomes: Is the chemical mixture at the temperature indicated by the chart? Did you get confirmation from Air Traffic Control? Are you about to amputate the correct limb? Is this really the file you want to permanently erase?
Yet our human brains are usually primed to skip steps, take shortcuts, and see what we expect rather than what's really there. It's surprisingly hard both to keep doing the work consistently and to notice deviations.
> lower rates of fact-checking and reasoning challenges
Now here we are with LLMs, geared to produce a flood of superficially plausible output which strikes at our weak point: the ability to do intentional review in a deep and sustained way. We've automated the stuff that wasn't as hard, and we're putting even more pressure on the remaining bottleneck.
Rather than the old definition involving customer interaction and ads, I fear the new "attention economy" is going to be managing the scarce resource of human inspection and validation.
But the temptation to shortchange this step when it becomes the bottleneck for shipping code will become immense.
This is exactly what I worry about when I use AI tools to generate code. Even if I check it, and it seems to work, it's easy to think, "oh, I'm done." However, I'll (often) later find obvious logical errors that make all of the code suspect. Most of the time, though, I don't bother.
I'm starting to sort code in my head into code I've thoroughly thought about and "suspect" code that, while it seems to work, is inherently not trustworthy.
- how many data sources it has access to
- the quality of your prompts
So, if prompting quality decreases, so does model performance.
- Charles Babbage, https://archive.org/details/passagesfromlife03char/page/67/m...
EDIT: This is a new iteration of an old problem. Even GIGO [1] arguably predates computers and describes a lot of systemic problems. It does seem a lot more difficult to distinguish between a "garbage" or "good" prompt though. Perhaps this problem is just going to keep getting harder.
> But we also find that when AI produces artifacts—including apps, code, documents, or interactive tools—users are less likely to question its reasoning (-3.1 percentage points) or identify missing context (-5.2pp). This aligns with related patterns we observed in our recent study on coding skills.
Well, sure. If you're asking the AI to produce artifacts directly, it's likely because you pre-judged yourself less competent to do that kind of analysis.
What it notably does not do is correlate any of these behaviors with external value or utility.
It is entirely possible that those people who are getting the most value out of LLMs are the ones with shorter interactions, and that those who engage in lengthier interactions are distracting themselves, wasting time, or chasing rabbit trails (the equivalent of falling into a wiki-hole, at the most charitable).
I can't prove that either -- but this data doesn't weigh in one way or the other. It only confirms that people who are chatty with their LLMs are chatty with their LLMs.
In my own case, I find the longer I "chat" with the LLM the more likely I am to end up with a false belief, a bad strategy, or some other rabbit hole. 90% of the value (in my personal experience) is in the initial prompt, perhaps with 1-2 clarifying follow-ups.
Claude is meant to be so clever it can replace all white-collar work in the next n years, but also “you’re not using it right”? Which one is it?
In my experience good prompting is mostly just good thinking.
In a strange way that's exciting, because it forces me to learn. And sometimes it forces me to confront whether what I had was domain knowledge or something portable as experience.
Do we?
How can you not think that makes you sound like a complete moron?
The general lack of intellectual curiosity is just mind blowing to me.
How's that working out for you in the context of working with AI tools? Do you feel like it's helping you make better use of them? Or keeping your mind sharp?
I've been considering getting some books on core topics I haven't (re)visited in a long time, to see whether not having to write as much code anymore gives me time to (re)learn more and accelerate.
At which point, if the evidence turns out to be negative, it will be considered invalid because no model less recent than November 2027 is worth using for anything. If the evidence turns out to be slightly positive, it will be hailed as the next educational paradigm shift and AI training will be part of unemployment settlements.
That's not, IMO, a "skills go down" position. It's respecting that this is a bigger maybe than anyone in living memory has encountered.
> is likely to improve at what they do
personally, my skills are not improving.
professionally, my output is increased
The olden days of building skills and competencies are largely dying or dead when the skills and competencies are changing faster than skills and competency training was ever intended to handle.