This is perhaps among the best openers I've ever read.
[spoiler: the tech caught up, the data is interesting]
I read a lot, but this one I read in its entirety.
"Some day, AI will be able to sort this out."
Now I'm just waiting for the token costs to come down ;)
IMHO the worst part of running any business/project is the paperwork...
My tax forms are still typewritten. For 2022, I thanked ChatGPT (lol). Audit me, I don't care (you will).
Even setting aside the benefits of subscriptions, an estimate on the order of thousands of dollars for extracting egg prices still makes me feel like we aren't "there" yet. Given the advances in AI, data analysis, and OCR, this should have had a much more efficient solution. I am sort of disillusioned.
There's got to be an "it's a chicken/egg problem" joke in there somewhere, but I'm not seeing it.
Well, I guess you cannot make a chicken joke without breaking some eggs (I'll stop now. I'm really sorry, but come on, it's Sunday).
Less than one per day, assuming they're doing groceries only for themselves
So instead, I made a very simple UI where you just key in the amount (literally 5 keystrokes on average per image) and it finds the matching charge (or hit Enter to instantly cycle through all matches). I've done bookkeeping/taxes that way for a decade and keying has never been the bottleneck.
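A minimal sketch of that amount-keyed lookup, assuming the card charges are already loaded as records with a date, amount, and description. Charge and matches_for_amount are hypothetical names, not the commenter's actual tool; amounts are parsed as Decimal so "12.5" and "12.50" compare equal without float rounding.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from itertools import cycle

@dataclass(frozen=True)
class Charge:
    # Hypothetical record for one credit card charge.
    posted: date
    amount: Decimal
    description: str

def matches_for_amount(charges, typed: str):
    """Hypothetical helper: every charge whose amount equals the keyed-in value."""
    target = Decimal(typed)  # Decimal("12.5") == Decimal("12.50") is True
    return [c for c in charges if c.amount == target]

# Made-up charges, only to exercise the lookup.
charges = [
    Charge(date(2024, 3, 1), Decimal("12.50"), "GROCERY STORE"),
    Charge(date(2024, 3, 9), Decimal("12.50"), "COFFEE SHOP"),
    Charge(date(2024, 3, 4), Decimal("48.17"), "HARDWARE"),
]

hits = matches_for_amount(charges, "12.5")
picker = cycle(hits)  # each Enter press advances to the next candidate
print(next(picker).description)  # GROCERY STORE
print(next(picker).description)  # COFFEE SHOP
print(next(picker).description)  # GROCERY STORE
```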
Recently I realized Amazon accounts for around a third of my credit card charges, by volume (yikes!). Unfortunately their transactions are more difficult to reconcile as portions of orders are charged piecemeal as they ship. Further, their webpage that is supposed to list your credit card charges with the matching order numbers is broken (lots of data missing - have reproduced and filed a bug report with their exec team which is still being worked on a month later).
So I wrote another tool. You download your order data and invoices via a personal data request, and it goes out and reconciles all of them. I wind up with a nice spreadsheet I can scroll around in, and whenever the cursor hits a row with an Amazon charge, all the paperwork along with a generated order summary (granular down to the shipments and items) comes up on the screen to the right.
Pretty slick. And took less time to code up than his vibecoded project (but hats off to him anyway, sounds like a nice little project to hone your AI skills on). Sometimes these simple little bespoke tools are a far superior "productivity force multiplier" than fancy, generic commercial equivalents.
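The piecemeal-charge problem can be sketched as a small matching exercise: for each card charge, look for a group of not-yet-matched shipments from one order whose totals add up to it exactly. This assumes per-shipment totals (in integer cents) can be reconstructed from the data export and that a single charge never spans two orders; the function names are hypothetical. The exhaustive subset search is fine for a handful of shipments per order but grows exponentially, and when two orders could both explain a charge it simply takes the first.

```python
from itertools import combinations

def _find_subset(shipments, amount):
    """Smallest group of (index, cents) shipments summing to amount, or None."""
    for size in range(1, len(shipments) + 1):
        for combo in combinations(shipments, size):
            if sum(cents for _, cents in combo) == amount:
                return combo
    return None

def reconcile(orders, charges):
    """Hypothetical helper matching card charges to unmatched shipments.

    orders maps order ID -> list of shipment totals in integer cents;
    charges is a list of charge amounts in cents, in posting order.
    Returns (charge, (order_id, shipment_indices)) pairs, or (charge, None)
    when nothing fits and a human should look.
    """
    remaining = {oid: list(enumerate(totals)) for oid, totals in orders.items()}
    matched = []
    for charge in charges:
        for oid, shipments in remaining.items():
            combo = _find_subset(shipments, charge)
            if combo is not None:
                used = {idx for idx, _ in combo}
                remaining[oid] = [s for s in shipments if s[0] not in used]
                matched.append((charge, (oid, sorted(used))))
                break
        else:
            matched.append((charge, None))
    return matched

orders = {"order-A": [1999, 2450, 500], "order-B": [3299]}  # made-up totals
print(reconcile(orders, [2450, 2499, 3299]))
# [(2450, ('order-A', [1])), (2499, ('order-A', [0, 2])), (3299, ('order-B', [0]))]
```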
My guess is that it's the entry-point to OCR and the internet is flooded by that, just like pandas for data processing.
Leaving a comment so I can more easily find this
And for the people wondering about Pandas, use Polars instead
https://en.wikipedia.org/wiki/Franchise_(short_story)
The protagonist is interviewed as a one-man "focus group" in lieu of a national election, and one of the questions he is asked is "What do you think about the price of eggs?" He replies, roughly, "I have no idea, my wife does the shopping."
Eggs were actually quite stable for the 20 years prior to 2001, so maybe don't put your life savings into egg futures...
Egg prices: https://fred.stlouisfed.org/series/APU0000708111
CPI: https://fred.stlouisfed.org/series/CPIAUCSL
Core CPI (without food + energy prices): https://fred.stlouisfed.org/series/CPILFESL
I read that the price includes free-range, eco, etc. varieties, which are more expensive and in more demand nowadays; that alone probably explains a good chunk of the price increase.
https://www.thebignewsletter.com/p/hatching-a-conspiracy-a-b...
Reading through them, I wonder why CPIs aren't based on empirical correlational patterns between prices over time? Sort of like in these articles:
https://iopscience.iop.org/article/10.1088/1742-6596/1796/1/... https://www.ecb.europa.eu/pub/pdf/scpwps/ecbwp1011.pdf
Or maybe they are? I'm not an expert in this and reading through some of the government literature there's no mention of this.
Then at least you would know that a given price marker is a good empirical index of how other prices are changing also, at least for a given dimension/component.
Looking at my data, in the 743 days since our first egg, our hens have produced 9,393 eggs, an average of just over a dozen a day.
The app can also count chickens, since each chicken has a UHF RFID tag.
Still, it made for a somewhat interesting exploration of AI techniques.
Once this can be run entirely offline, with a simple GitHub installer [0]... I'll be scanning again. This definitely "reminisced a nerve" that "took me back..."
Unfortunately not looking good for accountants, among others...
[0] I'd reckon the majority would use cloud-based, off-device processing: they just take a selfie of each receipt.
----
Ten years ago I was still using a smartphone; the day the banks started allowing mobile deposit was a very Trekkie day for me...
I assume this person does in fact NOT need to worry about the price of eggs?
...
>>These are the days of miracle and wonder. I can’t wait to see what [the next] 30 years of eggs looks like.
Now that I'm revisiting these comments, thanks for pointing out that 30 - 25 == five years into the future [honestly, I hadn't even given this any thought...]
1999 was ten years ago, right?!
But if you put this into your accounting spreadsheet or whatever, you'd be off by a few cents all over the place, your account balances wouldn't match up. Then what do you do?
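One partial answer to "then what do you do?" is to cross-foot every extracted receipt: recompute the line items plus tax and compare the result with the printed total, using exact decimal arithmetic so the check itself introduces no rounding. A minimal sketch; cross_foot is a hypothetical helper, it assumes discounts appear as negative line items, and it only flags receipts for a human to review, since two misreads that cancel out will still pass.

```python
from decimal import Decimal

def cross_foot(line_prices, tax, total):
    """Hypothetical helper: stated total minus (sum of line items + tax).

    Zero means the receipt is internally consistent; anything else is a misread
    somewhere (or a line the extraction missed)."""
    computed = sum((Decimal(p) for p in line_prices), Decimal("0")) + Decimal(tax)
    return Decimal(total) - computed

# Made-up receipts; r2 has its total misread.
receipts = {
    "r1": (["3.99", "4.49", "0.89"], "0.56", "9.93"),
    "r2": (["3.99", "4.49", "0.89"], "0.56", "9.98"),
}
for rid, (items, tax, total) in receipts.items():
    diff = cross_foot(items, tax, total)
    print(rid, "ok" if diff == 0 else f"off by {diff}")
# r1 ok
# r2 off by 0.05
```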
I've been looking into this and 96% isn't great. The solution is digital receipts... which are still being blocked by industry interests etc etc.
I tend to grow bored of a location after a year or two, though I'm certainly in the minority.
* Of course you didn't buy eggs every time you traveled somewhere, so probably not the entire truth.
...
> I can’t wait to see what 30 years of eggs looks like.
At $2.70 per receipt, I'd be in no hurry to find out!
I will use them tokens to be able to afford more eggs.
You could pay a human to read receipts: 1 every 30 seconds (that’s slow!) at $15/hr (about twice the US federal minimum wage!), plus tax and overhead ($15 × 1.35), comes out to $20.25/hr over 5 hours. $101 all in.
Sure, sure, a human solution doesn’t scale. But this sort of project makes me feel like we haven’t hit the industrialization moment that I thought we had quite yet.
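The back-of-the-envelope math above, written out so the assumptions can be varied. The defaults are the comment's own figures; the count of 600 receipts is inferred from "5 hours" at 120 receipts per hour and is not stated in the comment. human_cost is a hypothetical helper name.

```python
def human_cost(receipts, seconds_each=30, wage=15.00, overhead=1.35):
    """Hypothetical helper: hours and dollars to read receipts by hand.

    Defaults follow the comment: 30 s per receipt, $15/hr, and a 1.35x
    multiplier for tax and overhead."""
    hours = receipts * seconds_each / 3600
    return hours, hours * wage * overhead

hours, cost = human_cost(600)
print(f"{hours:.1f} h, ${cost:.2f}")  # 5.0 h, $101.25
```

Plugging in the 11,345 receipts mentioned at the end of this thread gives about 94.5 hours and roughly $1,914. The slower pace another commenter recalls for manual invoice entry (about a dozen an hour, i.e. seconds_each=300) multiplies every figure by ten.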
These days, are MTurk workers simply feeding it into AI anyway, though? It's been a few years since I've run an MTurk campaign. At the time it was clear that humans were really doing it, as you get emails from the workers sometimes.
I'd wager Gemini Flash could get decent results. I'd be willing to try it on 100 receipts and report the cost.
Receipt scanning OCR has been around for a long time. Circa 2010 I ran enough HITs on Mechanical Turk [1] that I got my own account representative at AWS. I wondered what kinds of HITs other people were running, so I thought I would "go native" and try making $100 on Turk.
I am pretty good at making judgements for training sets; I have many times made data sets with 2,000-20,000 judgements. I can sustain the 2,000 judgements/day of the median Freebase annotator and manage short bursts much higher than that, with mild perceptual side effects.
I gave up as a Turker, though, because the other HITs that were easy to find were tasks to accurately transcribe cell phone snaps of mangled, damaged, crumpled, torn, poorly printed, poorly photographed, or otherwise defective receipts. I can only imagine that these receipts had been rejected by a rather good classical OCR system. The damage was bad enough that I could not honestly say I had done a 100% correct job on any single receipt, as I was being asked to do.
[1] in today's lingo: Multimodal with prompts like "Is this a photograph of an X?" and "Write a headline to describe this image"
Had the filtering been done during the initial document storage, then the cost would have been much cheaper than your $2,000 estimate. Essentially binning the receipts based on "eggs" or "no eggs" would be free. But, crucially, what happens when the question changes from price per egg to price per gallon of milk? Now the whole stack would need to be sorted again. The $2,000 manual classification would need to be re-applied.
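The re-sorting cost largely goes away if each receipt is extracted once into structured line items that are then queried: switching from eggs to milk becomes a new query, not a new pass over the paper. A minimal sketch with the standard-library sqlite3 module; the table layout is a hypothetical schema and the rows are made-up examples of what a one-time extraction pass might produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE line_items (receipt_id TEXT, purchased TEXT, item TEXT, price_cents INTEGER)"
)
# Made-up rows standing in for the output of a one-time extraction pass.
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?, ?, ?)",
    [
        ("r1", "2023-01-14", "EGGS LARGE 12CT", 549),
        ("r1", "2023-01-14", "MILK 1 GAL", 389),
        ("r2", "2023-02-03", "EGGS LARGE 12CT", 699),
        ("r2", "2023-02-03", "BREAD", 299),
    ],
)

def prices_for(keyword):
    """Answer a new question with a query instead of re-reading receipts.

    SQLite's LIKE is case-insensitive for ASCII by default."""
    return conn.execute(
        "SELECT purchased, price_cents FROM line_items WHERE item LIKE ? ORDER BY purchased",
        (f"%{keyword}%",),
    ).fetchall()

print(prices_for("EGGS"))  # [('2023-01-14', 549), ('2023-02-03', 699)]
print(prices_for("MILK"))  # [('2023-01-14', 389)]
```

The expensive step is still the extraction itself; this only ensures it is paid once per receipt rather than once per question.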
Isn't traditional ML-based classification cheaper for this problem at industrial scale than an LLM though? The OP did of course attempt more traditional generic off-the-shelf OCR tools, but let's consider proper bespoke industrial ML.
Just as an off-the-cuff example, I would probably start by building a tool that locates the date/time on a receipt and takes an image snip of it. Running ONLY image snips through traditional OCR is more successful than trying to extract text from an entire receipt. I would then train a separate tool that extracts images of line items from a receipt, each including the item name and price. Yet another tool could then be trained to classify items based on the names of the items purchased, and a final tool to get the price. Now you have price, item, and date to put into your database.
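The later stages of that pipeline (split each line-item snip into name and price, then classify the item) can be sketched on text that an OCR step has already produced for each snip. This assumes a "NAME  PRICE [tax flag]" line layout, and the keyword table is a crude stand-in for the trained classifier the comment proposes; both are illustrative assumptions, and parse_line and classify are hypothetical names.

```python
import re
from decimal import Decimal

# Assumed layout: item name, whitespace, price with two decimals, optional one-letter tax flag.
LINE_RE = re.compile(r"^(?P<name>.+?)\s+(?P<price>-?\d+\.\d{2})\s*[A-Z]?$")

# Stand-in for a trained item classifier: category -> keywords.
CATEGORIES = {
    "eggs": ("EGG",),
    "milk": ("MILK",),
}

def parse_line(text):
    """Split one OCR'd line-item snip into (name, price), or None if it doesn't fit."""
    m = LINE_RE.match(text.strip())
    if m is None:
        return None
    return m.group("name").strip(), Decimal(m.group("price"))

def classify(name):
    """Return the first category whose keyword appears in the item name."""
    upper = name.upper()
    for category, keywords in CATEGORIES.items():
        if any(k in upper for k in keywords):
            return category
    return "other"

for raw in ["LG EGGS 12CT   4.99 F", "2% MILK GAL  3.89", "SUBTOTAL"]:
    parsed = parse_line(raw)
    if parsed:
        name, price = parsed
        print(classify(name), name, price)
    else:
        print("unparsed:", raw)
# eggs LG EGGS 12CT 4.99
# milk 2% MILK GAL 3.89
# unparsed: SUBTOTAL
```

A keyword table like this would happily file EGGPLANT under eggs, which is exactly why the comment calls for a trained classifier at that stage.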
Perhaps generating the training data to train the item classifier is the only place I could see an LLM being more cost effective than a human, but classifying tiny image snips is not the same as one-shotting an entire receipt. As an aside, if there's any desire to discuss how expensive training ML is, don't forget the price to train an LLM as well.
All of this is to say I believe traditional ML is the solution. I'm still not seeing the value prop of LLMs at industrial scale outside of very targeted training data generation. A more flippant conclusion might be that we can replace a lot of the parts of data science that make PhD types bored with building traditional ML solutions.
So, I've actually done similar work to this: getting paid piece-rate to manually enter data from paper invoices into an accounting system. It was so long ago I can't remember how fast I got at it, but it was way slower than 2 a minute/120 an hour. I doubt I got much more than a dozen an hour done. So, my gut reaction is that your estimate of the human cost is off by an order of magnitude.
It was less than a cent per receipt, but doing each was much quicker than 30 seconds. This was in 2017, to give you some idea how good OCR was.
Even before then, I was disappointed that no major chain encodes the receipt data into a QR code or something at the bottom of the receipt to sidestep this whole thing. The closest you get is some places doing digital receipts nowadays.
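For a sense of scale, a compact machine-readable receipt fits comfortably in a single QR code: the largest QR version (40, at error-correction level L) holds 2,953 bytes in byte mode. The JSON field names below are a made-up schema, not any existing receipt standard, and rendering the actual QR image would need a third-party library, so this sketch stops at building and size-checking the payload. receipt_payload is a hypothetical helper.

```python
import json

QR_V40_L_BYTES = 2953  # byte-mode capacity of a version 40 QR code at error-correction level L

def receipt_payload(store, timestamp, items, tax_cents, total_cents):
    """Hypothetical helper: serialize a receipt as compact JSON and check it fits one QR code."""
    doc = {
        "s": store,
        "t": timestamp,
        "i": [[name, qty, cents] for name, qty, cents in items],
        "x": tax_cents,
        "T": total_cents,
    }
    data = json.dumps(doc, separators=(",", ":")).encode("utf-8")
    if len(data) > QR_V40_L_BYTES:
        raise ValueError(f"payload is {len(data)} bytes; too large for one QR code")
    return data

# Made-up example receipt.
payload = receipt_payload(
    "Store #123", "2024-03-01T18:42",
    [("LG EGGS 12CT", 1, 499), ("2% MILK GAL", 1, 389)], 0, 888,
)
print(len(payload), payload.decode())
```

A typical grocery run of a few dozen items stays well under the limit, and a real scheme could use a lower QR version with more error correction for smaller payloads.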
Spherical cows aside though, I do agree with you that I should not consider scalability as a given.
>>So I told Codex “we have unlimited tokens, let’s use them all,” and we pivoted to sending every receipt through Codex for structured extraction. From that one sentence, Codex came back with a parallel worker architecture - sharding, health management, checkpointing, retry logic. The whole thing. When I ran out of tokens on Codex mid-run, it auto-switched to Claude and kept going. I didn’t ask it to do that. I didn’t know it had happened until I read the logs.
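A minimal sketch of what that kind of architecture can look like, covering sharding, retries with exponential backoff, and per-shard checkpoints (health management is left out). This is not the code Codex produced, which the article doesn't show: extract() is a hypothetical stand-in for the per-receipt model call, and threads are used on the assumption that the work is dominated by waiting on a remote API.

```python
import json
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def extract(receipt_id):
    """Hypothetical stand-in for one structured-extraction call to a model API."""
    return {"id": receipt_id, "items": []}

def with_retries(fn, arg, attempts=3, backoff=0.5):
    """Call fn(arg), retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn(arg)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * 2 ** attempt)

def run_shard(shard_id, receipt_ids, out_dir):
    """Process one shard, skipping it if a checkpoint from an earlier run exists."""
    done = out_dir / f"shard-{shard_id:04d}.json"
    if done.exists():
        return shard_id, "skipped"
    results = [with_retries(extract, rid) for rid in receipt_ids]
    tmp = done.with_suffix(".tmp")
    tmp.write_text(json.dumps(results))
    tmp.replace(done)  # rename into place so a crash never leaves a half-written checkpoint
    return shard_id, "done"

def run_all(receipt_ids, out_dir, shard_size=100, workers=4):
    """Split the work into shards and process them concurrently."""
    shards = [receipt_ids[i:i + shard_size] for i in range(0, len(receipt_ids), shard_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_shard, n, shard, out_dir) for n, shard in enumerate(shards)]
        return [f.result() for f in futures]  # re-raises if a shard exhausted its retries

out = Path(tempfile.mkdtemp())
ids = [f"r{i}" for i in range(250)]
print(run_all(ids, out))  # [(0, 'done'), (1, 'done'), (2, 'done')]
print(run_all(ids, out))  # [(0, 'skipped'), (1, 'skipped'), (2, 'skipped')]
```

Rerunning after a crash redoes only the shards without a checkpoint file; a shard interrupted halfway starts over from its beginning, so smaller shards waste less work.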
----
For anybody still thinking "my goodness, how wasteful is this SINGLE EXAMPLE": remember that all of the receipts from the article have helped better train whichever GPT is deciphering all this thermal printing.
For a small business owner (like my former self), paying $1,500 to have an AI decipher all my receipts is still a heck of a lot cheaper than my accountant's rate. It would also motivate me to actually keep receipts (instead of throwing them away and guessing), if only to make the monumental task of recordkeeping less daunting.
----
>>But the runs kept crashing. Long CLI jobs died when sessions timed out. The script committed results at end-of-run, so early deaths lost everything. I watched it happen three times. On the fourth attempt I said “I would have expected we start a new process per batch.” That was the fix ... Codex patched it, launched it in a tmux session, and the ETA dropped from 12 hours to 3. Not a hard fix. Just the kind of thing you know after you’ve watched enough overnight jobs die at 3 AM.
>>11,345 receipts processed. The thing that was supposed to take all night finished before I went to bed.
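The fix in that quote, a new process per batch with results committed as each batch finishes, can be sketched with the standard subprocess module. WORKER is a hypothetical placeholder script that only echoes its receipt IDs back as JSON (a real one would run the extraction), and the batch file naming assumes the receipt order and batch_size stay the same between runs.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical placeholder worker: a real one would run the extraction for these receipt IDs.
WORKER = "import json, sys; print(json.dumps([{'id': r} for r in sys.argv[1:]]))"

def run_batches(receipt_ids, out_dir, batch_size=50, timeout=600):
    """Run each batch in a fresh Python process and commit its output right away.

    Batches whose result file already exists are skipped, so a rerun after a crash
    or timeout picks up where the last run stopped. Returns how many batches ran.
    """
    ran = 0
    for start in range(0, len(receipt_ids), batch_size):
        done = out_dir / f"batch-{start // batch_size:05d}.json"
        if done.exists():
            continue
        batch = receipt_ids[start:start + batch_size]
        proc = subprocess.run(
            [sys.executable, "-c", WORKER, *batch],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
        tmp = done.with_suffix(".tmp")
        tmp.write_text(proc.stdout)
        tmp.replace(done)  # atomic rename on POSIX: a batch file is either complete or absent
        ran += 1
    return ran

out = Path(tempfile.mkdtemp())
ids = [f"r{i}" for i in range(120)]
print(run_batches(ids, out))  # 3
print(run_batches(ids, out))  # 0, every batch was already committed
```

A timeout or crash now raises out of the loop after costing only the batch in flight; everything committed before it is already on disk.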