I’ll have to try again later on desktop. The content looks interesting but it’s literally impossible to read. I cannot get past the section that introduces Ernst and Young.
It might "work" just fine on mobile (or not) but you may have stopped trying before reaching the point of re-scrolling, because it's insane.
Coming back later on desktop, I see that the percentage keeps climbing the further you manage to make it down the page. The real stat is 60% of the citations were hallucinated.
Some people should not be allowed to make a website.
Now nobody will remember or notice.
It's unsurprising that trying to do more with less results in lower quality.
There may be a lot of demand for do-nothing services.
A lot of corporate work is just do-nothing box-ticking.
Boss: get me a report about X, so I can give that report to my boss who won't read it.
You: E&Y, please get me a report. Here's $200k.
If you ever needed evidence to not buy “advice” from such outfits, this is exhibit one.
Hopefully they at least fired the partner that published this steaming pile of AI slop.
I think their audit work is in a downwards spiral. Audit has become so competitive that they are struggling to find ways to make it cheaper. They have become slaves to reducing the hours booked, and the rate of those hours. To do this they substitute less experienced people all the time. You used to be able to chat with your partner about an issue you have coming up, now you get their assistant if you are lucky. By chasing 'efficiency' they have lost their value-add. Now the first time the partner has looked at your file is right before the clearance meeting, and they spot issues that should have been picked up earlier and tested on the day you should be signing. So you end up doing it all again. I'm trying to coin a term for the inefficiency caused by chasing efficiency.
Some things stuck out at me:
- They were all in their early 20s.
- They were all incredibly checked out. Honestly they still seem like an outlier to me decades later.
- They partied hard. Yes, with drugs.
- Most of them were in rotating intimate relationships with each other and unusually open about it. Office scuttlebutt was literally "who is fucking who this week".
- They seemed busy for maybe two or three weeks out of the entire year and then it was long stretches of Minesweeper/Solitaire.
I filed this away in my head as "provides no value" and that was decades ago. If the industry itself is worse off today I can't imagine how much worse it actually is from my experience.
"don't let the perfect be the enemy of the good" ?
Penny wise, pound foolish? Measure twice cut once?
I had no experience and knew absolutely zero about any of those sectors.
I don't know but I would expect it to be relatively easy for an LLM to detect "hallucinations".
Yes, this technique and its variations[1][2] "work" but it's still not 100% perfect. And it's not as widely used as it might be because, among other reasons:
a. it takes longer to implement
b. it costs more (more tokens spread across multiple llm calls)
c. higher latency (getting an answer takes longer due to multiple llm calls involved)
d. the final answer is probabilistically more likely to be correct, but is still not guaranteed to be error free, so you can never fully escape the need for Human in the Loop.
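To make the trade-offs above concrete, here is a rough sketch of the "ask more than once" idea, not anyone's actual pipeline. Everything in it is an assumption for illustration: a `Checker` is a hypothetical stand-in for one LLM call (or one model), and `review_citation` and the vote labels are made-up names. The point is only that every extra checker is another call (cost, latency), and anything short of unanimous agreement still lands on a human's desk.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical stand-in for one LLM call: takes a citation string and
# answers "real", "fake", or "unsure". Not a real library API.
Checker = Callable[[str], str]


def review_citation(citation: str, checkers: Sequence[Checker]) -> dict:
    """Ask every checker about one citation and summarize the votes."""
    if not checkers:
        raise ValueError("need at least one checker")

    votes = Counter(check(citation) for check in checkers)
    calls = sum(votes.values())  # cost and latency scale with this number
    verdict, count = votes.most_common(1)[0]
    unanimous = count == calls

    return {
        "citation": citation,
        "verdict": verdict,
        "calls": calls,
        # Only a unanimous "real" skips the human; everything else is flagged.
        "needs_human": not (unanimous and verdict == "real"),
    }


if __name__ == "__main__":
    # Toy checkers so the sketch runs without any model behind it.
    known = {"Smith 2019"}

    def strict(citation: str) -> str:
        return "real" if citation in known else "fake"

    def lenient(citation: str) -> str:
        return "real"

    for c in ["Smith 2019", "Jones 2021"]:
        print(review_citation(c, [strict, lenient]))
```

Even in this toy version the disagreement case does not resolve itself: the two checkers split on the unknown citation, and the only honest output is "someone needs to look at this".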
I think this may be part of the problem. The actual humans creating the report don't have the expertise to know which one to trust. At least that was what consulting was like in my experience at a similar firm.
I don't know but I would expect it to be relatively easy for an LLM to detect "hallucinations".
But I guess since EY is a CYA hedge anyway, no one really cares about whether the reports are hallucinations or not. Someone high up spent money on EY, so that they can justify some decision and won't be held responsible that much, when it turns out the decision was shit. All that matters to them is, that it has the appearance of something genuine and then they can base the decision on what they receive from EY, which better be what they already wanted to hear/read anyway.
Even people who like their jobs work because they need money to live.
okay that makes me feel better, I think January's frontier models and beyond are better at this
but check your sources folks
~ A greedy, dishonest and unethical capitalist.
Any person with above average knowledge on a specific topic, can tell when AI starts hallucinating and making things up, or at least introducing new problems due to complexity added rather than solving it, that’s my observation using all top tier ones too, it’s like they are designed to solve a problem regardless so they start making things up or piling workarounds, a person with no deep knowledge in that topic will just copy it all and call it a day.
Just yesterday, I asked Claude 4.8 about something specific that I know the answer to. It gave a long list of solutions, none of which were close to the real answer. When I replied with the real answer and pushed back, I got the famous quote “you are right, thanks for pushing back”.
Performative executives of yesteryear that constantly need external validation and direction and operate through hive mind and groupthink are weak and will die.
I believe some of the biggest problems in today's business leaders are an inability to be open to new information, to think across traditional professional boundaries, or to ask meaningful questions.
AI simply exposes this unapologetically.
Bad management (this includes most government): up your game or get out of the way.
Sycophantic consultant firms: die.
The Economist should do an article on this.
In many cases the skills are available in house to do the necessary vetting, but these people are already overwhelmed with their existing day to day.
Anyone remember that item a few months back about Amazon now having senior engineers vet generative AI output (https://news.ycombinator.com/item?id=47323017)? I had to LOL when I read that. These folks are already slammed. And the idea that Amazon would allow human bottlenecks to multiply across projects and underlying infrastructure development is ridiculous.
I'm pushing the need for basic engineering principles across whole organisations.
You wouldn't give an engineer 1000 lines of code to review without the original spec of what you're trying to achieve for context (at a minimum, ideally the reviewer was in the room when the work was introduced, and has full context).
So, these docs, they're given as an all or nothing.
Do you push back on the 39th metric that is defined to the utmost detail? Or just resign yourself to the fact that it is what it is?
A one (6 is the goto if we're talking Amazon?!) pager.. "this is what I am proposing" at least gives the skeleton of the idea to push back at the general shape of the idea, refine it, before all the emotional investment of your precious report being complete.
Y'know.. the traditional product running through the spec in a SCRUM* environment.. the engineers doing proper code reviews..
* Yes SCRUM is dead, but that's another thing.
Not fully baked, worse: made to sound confidently correct, orthogonal to its actual correctness.
You mean the people they fired and demoralized?
One of the things that "great [wo]men" like about "vibe-coding" (and that includes blindly producing non-code product), is that they, and they alone can now do what used to require the painful process of "passing it to context experts."
Now, the LLM is a "built-in context expert," and they don't need to vet the output anymore.
Serious orgs are going to have to figure out the human layer. It will be needed, no matter how 'hallucination-free' the AI tooling gets. AI will still have some spectacularly bad fuck ups or even worse time bombs that get embedded in a system and don't become apparent until months or years later.
A lot of this will be dumped on existing staff with predictable results as they don't have the bandwidth to do it right. I can envision "output compliance" or "AI QA" becoming dedicated positions at many orgs. It's clearly needed.
Once the hallucination rate drops below the error rate of human workers, it won't be needed anymore.
With AI, I have to read through everything, often explain why it's wrong, and then rewrite everything anyways. I mean, I get way more billables, but I think it's symptomatic of how AI loses its advantage of being quick and accessible to those who don't understand the subject matter.
It wasted many people's time, probably an order of magnitude more time (and money) than if the initial person had put a modicum of effort into making it right in the first place. Instead they hand it off to their life partner Claude and just assume it's good enough.
It's to the point where I am feeling insulted when I get AI slop like this from people. If I am expected to perform at a high level then I expect that at the very minimum the slop throwers will proofread their slop.
I find that if Gemini Pro agrees with Claude Opus 4.8 and GPT 5.5 on something, it's almost certainly correct at a level where I wouldn't be likely to catch any errors myself.
And making a ton of corrections to a document everyone was hoping was ready to go is never fun politically.
In my experience the most effective work pattern for me is using agents to perform research and feedback on high level design, then I write the code manually, then I ask the agent to review the code for potential bugs/issues and fix those. The agents have a much easier time making small changes once the design is 90% there without going fully off the rails and generating slop.
I am working on writing skills to make the agent better but it is a bit painstaking. For example I had to write this inside of a skill because sometimes the agent would just stub out methods and leave TODOs: “always fully complete the requested task before finishing edits unless input is needed”.
Of course, it's pretty much impossible to hear a dissenting point of view today and everyone is going crazy on these drugs. I might be hilariously wrong but I think this is the best time to start a software company.
I think it's the perfect time to be contrarian - think about it. If you're wrong - so what? The world will have changed for everyone in the field. If you are right? You stand to be positioned to win big financially whilst everyone else's brain is rotting away.
I do the second approach for coding with smallish steps and the output is fine
I can’t cite “from scratch” for something outside of my knowledge but I side LLM training or assisted search.
This is an interesting topic. We treat vetting output the same as doing the work ourselves, but that is not the case.
Doing the work is not the same as reviewing work done by others.
I have heard reports of software engineering companies that have gone full agentic. Their seniors only review stuff written by LLMs and it burns them out, because they have to switch context constantly.
I find this interesting because part of being a senior developer is that you are experienced enough that you won’t make grave mistakes anymore. This is the case in many professions: you are relied upon to not make grave mistakes.
But those same people are now swamped with stuff that they are not able to review, so they will let a grave mistake slip through at some point.
So they really can’t trust themselves anymore?
The problem is that output sometimes takes longer to verify than to create in the first place.
That turns AI into a deeply negative ROI system for many applications.
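Back-of-the-envelope version of that argument, as a sketch only. The function name, the parameters, and every number below are made up for illustration; nothing here is measured data. The assumption is simply that the AI route costs generation time plus verification time, plus the manual time again whenever the check fails and the work gets redone by hand.

```python
def net_minutes_saved(manual: float, generate: float, verify: float,
                      redo_probability: float = 0.0) -> float:
    """Expected minutes saved per task by taking the AI route.

    manual           -- minutes to do the task by hand
    generate         -- minutes to prompt and wait for the output
    verify           -- minutes for a qualified person to check it
    redo_probability -- chance the check fails and the task is redone by hand
    """
    if not 0.0 <= redo_probability <= 1.0:
        raise ValueError("redo_probability must be between 0 and 1")
    ai_route = generate + verify + redo_probability * manual
    return manual - ai_route


if __name__ == "__main__":
    # Illustrative numbers only.
    # Cheap to verify, occasionally redone: positive result.
    print(net_minutes_saved(manual=60, generate=5, verify=20, redo_probability=0.25))
    # Verification alone takes longer than doing it by hand: negative result.
    print(net_minutes_saved(manual=60, generate=5, verify=70))
```

Once `verify` approaches `manual`, the result goes negative no matter how fast generation is, which is the "negative ROI" case described above.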
Yeah probably not for the same reason I left VFX rather than have a lifetime of completely disregarding my own generative creativity and cleaning up LLM-generated bullshit. Fuck that. Double-fuck creating ‘content’ to train the models.
In code, LLMs automate away a lot of the drudgery. I wasn’t sad to avoid spending a couple hours looking up the usage patterns and idioms for some ported library, or do some rote task that didn’t make the project significantly better. In most other jobs, they automate away the only fun part and leave humans with all of the drudgery.
The tech industry has always been arrogant to some extent, but assuming the world of talented professional knowledge workers and creatives would be content to professionally proofread, apply lipstick to pigs, and polish turds is a whole new level of out-of-touch. I’d rather live out of my car and dig through the garbage for bottles with deposits.
Why?
So if they're having humans proofread what the AI produces, they must have found that to be necessary.
I think a lot of the time it's just pure laziness. AI gives people a magical "do all the work for me" button and it can bring out the worst in them.
Some people are given the button and really do not care.