Hacker News

290 points by avyfain 6 days ago | 80 comments

Absolutely loved the article, the process, and the results. Hated the price.

You could pay a human to read receipts, 1 every 30 seconds (that’s slow!), $15/hr (twice the US federal minimum wage!), plus tax and overhead ($15x1.35) comes out to $20.25/hr over 5 hours. $101 all in.

Sure, sure, a human solution doesn’t scale. But this sort of project makes me feel like we haven’t hit the industrialization moment that i thought we had quite yet.

electroly 23 hours ago

From some minor historical experience with Mechanical Turk, I bet you could get humans to do this for one or two cents per receipt. You do them all three times for error checking for $0.03-$0.06 per receipt. I used to pay a nickel for much, much more than 5x this amount of transcription per job, and I got the feeling that I was overpaying based on how eagerly I got responses in and that I saw a lot of the same workers repeatedly.

These days, are MTurk workers simply feeding it into AI anyway, though? It's been a few years since I've run an MTurk campaign. At the time it was clear that humans were really doing it, as you get emails from the workers sometimes.

stavros 22 hours ago

I wasn't ready for Artificial Artificial Artificial Intelligence.

knollimar 2 days ago

There's no way there wasnt a more efficient way of doing this. Way too many tokens per receipt.

I'd wager gemini flash could get decent results. Id be willing to try on 100 receipts and report cost

andai 2 days ago

I set up the same thing a few months ago with Flash, seemed to work fine. I didn't test more than a few receipts though (concluded that my spending wasn't a problem, I just needed to make more money lol), so can't vouch for the reliability at scale. But it handled really wrinkled, faded old receipts quite well.

PaulHoule 2 days ago

My wife complains about people complaining about the price of eggs every time the subject comes up because it's her duty as a housewife to know about the price of all the protein sources and they are still a bargain -- who'd have thought that the price of transcribing the receipts would be seen as even more onerous?

Receipt scanning OCR has been around for a long time. Circa 2010 I ran enough HITs on Mechanical Turk [1] that I got my own account representative at AWS and I wondered what other kind of HITs other people were running and thought I would "go native" and try making $100 from Turk.

I am pretty good at making judgements for training sets, I have many times made data sets with 2,000-20,000 judgements; I can sustain the 2000 judgements/day of the median Freebase annotator and manage short burst much higher than that with mild perceptual side effects.

I gave up as a Turk though because the other HITs that were easy to find was the task of accurately transcribing cell phone snaps of mangled, damaged, crumpled, torn, poorly printed, poorly photographed or otherwise defective receipts. I can only imagine that these receipts had been rejected by a rather good classical OCR system. The damage was bad enough I could not honestly say I had done a 100% correct job on any single receipt, as I was being asked to do.

[1] in today's lingo: Multimodal with prompts like "Is this a photograph of an X?" and "Write a headline to describe this image"

moron4hire 2 days ago

You're counting just the egg-having receipts, but there were over 11 thousand receipts they had to go through to get to that 500-ish subset. I'm assuming OP wanted to process all of the receipts and then selected just eggs for a simple analytics job. With your rates, the human would cost almost $2000.

cheschire 2 days ago

Capturing the egg price from known egg receipts was the problem I was focused on, but you're right that there was also a filtering problem in the original spec. You get my upvote for continuing to make the problem interesting for me!

Had the filtering been done during the initial document storage, then the cost would have been much cheaper than your $2,000 estimate. Essentially binning the receipts based on "eggs" or "no eggs" would be free. But, crucially, what happens when the question changes from price per egg to price per gallon of milk? Now the whole stack would need to be sorted again. The $2,000 manual classification would need to be re-applied.

Isn't traditional ML-based classification cheaper for this problem at industrial scale than an LLM though? The OP did of course attempt more traditional generic off-the-shelf OCR tools, but let's consider proper bespoke industrial ML.

Just as a off-the-cuff example, I would probably start with building a tool that locates the date/time from a receipt and takes an image snip of it. Running ONLY image snips through traditional OCR is more successful than trying to extract text from an entire receipt. I would then train a separate tool that extracts images of line items from a receipt that includes item name and price. Yet another tool could then be trained to classify items based on the names of the items purchased, and a final tool to get the price. Now you have price, item, and date to put into your database.

Perhaps generating the training data to train the item classifier is the only place I could see an LLM being more cost effective than a human, but classifying tiny image snips is not the same as one-shotting an entire receipt. As an aside, if there's any desire to discuss how expensive training ML is, don't forget the price to train an LLM as well.

All of this is to say I believe traditional ML is the solution. I'm still not seeing the value prop of LLMs at the industrialization scale outside of very targeted training data generation. A more flippant conclusion might be that we can replace a lot of the parts of data science that makes PhD types get bored with creating traditional ML solutions.

moron4hire 2 days ago

Also, playing hotdog-not-hotdog on a receipt, looking for the price of eggs, and then entering them, is a very different job than the open-ended case of "enter all the relevant information from this receipt. There is large classification task that also has to take place to group name-brand items into generic categories (an open set that you don't know from the start) suitable for analyzing.

So, I've actually done similar work to this: getting paid piece-rate to manual enter data from paper invoices into an accounting system. It was so long ago I can't remember how fast I got at it, but it was way slower than 2 a minute/120 an hour. I doubt I got much more than a dozen an hour done. So, my gut reaction is that your estimate on the human cost is off by an order of magnitude.

stavros 2 days ago

One issue is that the human was less accurate than the LLM. The other is that the author probably didn't pay $1,500 for this, they probably paid $20 on a subscription.

barbegal 15 hours ago

Total receipts were over 11,000 so more like 100 hours or around $2000 so a similar price to the LLM.

N_Lens 2 days ago

AI has some weird unexpected uses that haven’t been fully uncovered yet, while it fails to scale or match the needed accuracy on expected usecases.

extraduder_ire 2 days ago

When I tried doing mechanical turk jobs out of curiosity, one of the tasks was checking/amending OCR'd receipts. (image on the left, textbox on the right)

It was less than a cent per receipt, but doing each was much quicker than 30 seconds. This was in 2017, to give you some idea how good OCR was.

Even before then, I've been disappointed no major chain encodes the receipt data into a QR code or something at the bottom of the receipt to side-step this whole thing. The closest you get is some places doing digital receipts nowadays.

wolfram74 2 days ago

I mean, at over 1000% the cost, the machine solution doesn't scale either?

cheschire 2 days ago

I think at a certain scale we're talking about switching to local trained models which don't have the same operating costs as running a frontier model for OCR. That would reduce the ongoing costs significantly. Might take longer than 30 seconds to read each receipt if you run multiple passes to ensure accuracy, but could run 24/7/365 without the same tax and administration overhead of humans.

Spherical cows aside though, I do agree with you that I should not consider scalability as a given.

wolfram74 2 days ago

I suppose if we had access to a public data set like this receipt bank, programmers could time themselves setting up a solution with off the shelf OCR algos. If they could clock in at under 10 hours they could advertise themselves as being "just as good as an LLM, but significantly cheaper." Downside for the managerial class that wants generative algos for the complete lack of legal protections.

ProllyInfamous 2 days ago

Not yet.

>>So I told Codex “we have unlimited tokens, let’s use them all,” and we pivoted to sending every receipt through Codex for structured extraction. From that one sentence, Codex came back with a parallel worker architecture - sharding, health management, checkpointing, retry logic. The whole thing. When I ran out of tokens on Codex mid-run, it auto-switched to Claude and kept going. I didn’t ask it to do that. I didn’t know it had happened until I read the logs.

----

For anybody still thinking my goodness, how wasteful is this SINGLE EXAMPLE: remember that all of the receipts from the article have helped better-train whichever GPT is deciphering all this thermalprinting.

For a small business owner (like my former self), paying $1500 to have an AI decipher all my receipts is still a heck of a lot cheaper than my accountant's rate. It would also motivate me to actually keep receipts (instead of throw-away/guessing), simply to undaunt the monumental task of recordskeeping.

----

>>But the runs kept crashing. Long CLI jobs died when sessions timed out. The script committed results at end-of-run, so early deaths lost everything. I watched it happen three times. On the fourth attempt I said “I would have expected we start a new process per batch.” That was the fix ... Codex patched it, launched it in a tmux session, and the ETA dropped from 12 hours to 3. Not a hard fix. Just the kind of thing you know after you’ve watched enough overnight jobs die at 3 AM.

>>11,345 receipts processed. The thing that was supposed to take all night finished before I went to bed.

qoez 2 days ago

Imagine how many 2001 era eggs he could have bought with that $101

MarceliusK 2 days ago

[dead]

ProllyInfamous 2 days ago

>Everyone needs a rewarding hobby. I’ve been scanning all of my receipts since 2001. I never typed in a single price - just kept the images. I figured someday the technology to read them would catch up, and the data would be interesting.

This is perhaps among the best openers I've ever read.

[spoiler: the tech caught up, the data is interesting]

I read a lot. This article, entirely.

andai 2 days ago

I have the same thing except instead of receipts, I've been saving everything.

"Some day, AI will be able to sort this out."

Now I'm just waiting for the token costs to come down ;)

ProllyInfamous 12 hours ago

It'll be less than a year and an .app will exist solely for this (to then be bought out by Accountoglomerate Co, INC, mostly for all the consumerdata). I can imagine (as a former) business owner support for this is worth $100/month, a personalplan much less but still interested [introfree tier?!].

IMHO the worst part of running any business/project is the paperwork...

My taxesforms still get struck typewritten. For 2022, I thanked ChatGPT (lol). Audit me, I don't care (you will).

MarceliusK 2 days ago

Technically interesting and genuinely well-written end to end

egeozcan 2 days ago

I usually avoid shallow comments but I feel like this time it has to be said as a conversation starter: That's a lot of eggs!

Also ignoring the benefits of subscriptions, an estimate in the magnitude of thousands of dollars for extracting egg prices still makes me feel like we aren't "there" yet. This should have been a problem with a much more efficient solution given the advancements in the AI, data analysis and OCR space. I am sort of disillusioned.

sgbeal 2 days ago

> This should have been a problem with a much more efficient solution given the advancements in the AI, data analysis and OCR space.

There's got to be a "it's a chicken/egg problem" joke in there somewhere, but i'm not seeing it.

egeozcan 2 days ago

I actually was going to go for the "why did the chicken not cross the road?". Then I wanted to say "because it was in a price negotiation with the author to sell its eggs", but it was too wordy. Then I thought, "because the author had it as an egg before it could hatch", but it was too dark... Then I gave up.

Well, I guess you cannot make a chicken joke without breaking some eggs (I'll stop now. I'm really sorry, but come on, it's Sunday).

bombcar 2 days ago

You’ve got two weeks to work on this before Eggster.

sgbeal 2 days ago

> (I'll stop now. I'm really sorry, but come on, it's Sunday).

FWIW, you made an eggceptional attempt :).

tclancy 2 days ago

Try a different vision model.

wiether 2 days ago

> That's a lot of eggs!

Less than one per day, assuming they're doing groceries only for themselves

MarceliusK 2 days ago

I wouldn't read this as "AI can't do this efficiently yet" but more like "we're still figuring out the playbook"

f0cus10 2 days ago

[dead]

rkagerer 19 hours ago

10 years ago I wrote a reconciliation tool in VBA in Excel. I scan I all the (mostly thermal-printed) receipts and it matches them to credit card charges. I always envisioned incorporating OCR to automatically extract the totals, but the libraries were never good enough for my taste (and I've used industry-leading ones in work settings that process millions of reads a day).

So instead, I made a very simple UI where you just key in the amount (literally 5 keystrokes on average per image) and it finds the matching charge (or hit enter to instantly cycle through all matches). I've done bookeeping/taxes that way for a decade and keying has never been the bottleneck.

Recently I realized Amazon accounts for around a third of my credit card charges, by volume (yikes!). Unfortunately their transactions are more difficult to reconcile as portions of orders are charged piecemeal as they ship. Further, their webpage that is supposed to list your credit card charges with the matching order numbers is broken (lots of data missing - have reproduced and filed a bug report with their exec team which is still being worked on a month later).

So I wrote another tool. You download your order data and invoices via a personal data request, and it goes out and reconciles all of them. I wind up with a nice spreadsheet i can scroll around in, and whenever the cursor hits a row with an Amazon charge all the paperwork along with a generated order summary (granular down to the shipments and items) comes up on the screen to the right.

Pretty slick. And took less time to code up than his vibecoded project (but hats off to him anyway, sounds like a nice little project to hone your AI skills on). Sometimes these simple little bespoke tools are a far superior "productivity force multiplier" than fancy, generic commercial equivalents.

ismailmaj 2 days ago

I don't know why people mess with tesseract in 2026, attention-based OCRs (and more recently VLMs) outperformed any LSTM-based approach since at least 2020.

My guess is that it's the entry-point to OCR and the internet is flooded by that, just like pandas for data processing.

mettamage 2 days ago

Painful comparison haha

Leaving a comment so I can more easily find this

And for the people wondering about Pandas, use Polars instead

eichin 2 days ago

I was surprised to learn (from this article) that there are local models that can do this (not sure if there are any that run on hardware I actually have though, unlike Tesseract which works fine on the scanning hardware I set up for it ~5 years ago.) For privacy reasons, cloud-based OCR is a non-starter...

segmondy 22 hours ago

surprisingly, the ocr models don't need much vram, they are often about 2b, so most 6gb GPU will handle it fine.

petercooper 2 days ago

Quite, I threw a so-so photo of an old, long receipt at Qwen 3.5 0.8MB (runs in <2GB) and it nailed spitting 20+ items out in under a second. AI is good at many things, but picking modern dependencies not so much.

andai 2 days ago

Are you running it with Ollama?

petercooper 2 days ago

LM Studio in this case

segmondy 22 hours ago

yup, deepseek-ocr-2 will have crushed this. then there's glm-ocr, dots-ocr, etc, paddle-ocr-vl, etc

tons of options ...

JumpCrisscross 21 hours ago

Expensive eggs are a political choice. Canada has eggs [1]. Mexico, too [2]. Meanwhile we have Tyson notching record profits [3] while facing zero antitrust scrutiny.

[1] https://www.npr.org/2025/03/18/nx-s1-5330454/egg-shortages-r...

[2] https://www.globalproductprices.com/rankings/egg_prices/

[3] https://farmaction.us/farm-action-calls-for-an-investigation...

PaulHoule 2 days ago

I am amused that this in the classic 1955 Asimov story

https://en.wikipedia.org/wiki/Franchise_(short_story)

the protagonist is interviewed as a one-man "focus group" in lieu of a national election and one of the questions he is asked is "What do you think about the price of eggs?" and he said roughly "I have no idea, my wife does the shopping."

rdiddly 2 days ago

This is the perfect job for AI, in that it's handling work the human didn't care enough to do manually. Although of course I don't care either. No value judgment there, just an observation. Imagine a place - a field let's say, part of a farm, long ago, but it had a road built through it, and thereby became a non-place, a patch of ground nobody dwells in or pays attention to or cares about, because when they're on it they're always heading somewhere else. The AI phenomenon is like that.

PowerElectronix 2 days ago

Inflation adjusted dsta just comes to tell us that either eggs have been outdoing the CPI for 25 years or that actual CPI is way higher than what the BLS calculates.

vitus 2 days ago

It depends what dates you're looking at, but energy (gas prices and more) and food (including eggs) are generally recognized as way more volatile than the rest of the CPI.

Eggs were actually quite stable for the 20 years prior to 2001, so maybe don't put your life savings into egg futures...

Egg prices: https://fred.stlouisfed.org/series/APU0000708111

CPI: https://fred.stlouisfed.org/series/CPIAUCSL

Core CPI (without food + energy prices): https://fred.stlouisfed.org/series/CPILFESL

PowerElectronix 2 days ago

That is very curious, yes. Eggs seem to just start to increase dramatically after 2000 and indeed outdo the CPI, disregarding the peaks and valleys of the different shocks to egg production like covid and the avian flu.

I read that the price includes free range, eco, etc varieties which are more expensive and in more demand nowadays, probably just that explains a good chunk of the price increase.

bix6 2 days ago

This is a good read if you haven’t seen it. Spoiler alert it’s private equity. Shocker I know.

https://www.thebignewsletter.com/p/hatching-a-conspiracy-a-b...

PowerElectronix 2 days ago

That is indeed a good read, I wasn't aware that there is now a Big Egg fixing egg prices.

voakbasda 2 days ago

I think it is now relatively safe to assume that there is Big X fixing the prices of X, for pretty much any X that could turn a profit.

ThrownOffGame 24 hours ago

[dead]

derbOac 2 days ago

I feel like those links are more useful than the target essay.

Reading through them, I wonder why CPIs aren't based on empirical correlational patterns between prices over time? Sort of like in these articles:

https://iopscience.iop.org/article/10.1088/1742-6596/1796/1/... https://www.ecb.europa.eu/pub/pdf/scpwps/ecbwp1011.pdf

Or maybe they are? I'm not an expert in this and reading through some of the government literature there's no mention of this.

Then at least you would know that a given price marker is a good empirical index of how other prices are changing also, at least for a given dimension/component.

MarceliusK 2 days ago

Or a third option: eggs are just a terrible proxy for CPI

tclancy 2 days ago

Without saying "I bought the exact same brand and type of egg" for 25 years, the data is probably pretty noisy and may reflect the author's income changes as well as the price of eggs.

joks 2 hours ago

The more recent eggs being from Whole Foods definitely points toward this. I'm in a different part of the country but eggs are currently ~15¢/egg at grocery stores around here.

yorwba 2 days ago

CPI tracks a weighted average of a large basket of different goods, of which eggs are only a small part. It would be extremely surprising if the change in egg prices over time closely matched CPI.

hbarka 2 days ago

It’s so exciting to read more and more articles like this, using LLMs to discover clever solutions. I mean how many of us have dreamed of scanning years of receipts, waiting for that moment when you know a DIY solo application is at hand. I’m not being sarcastic, I too have a drawer full of Costco receipts which to me are data waiting for insight, not just crinkly paper. It’s more than being clever, it’s the realization of using a device not as a tool, but an equal partner who can suggest what tools and approaches to do. The end product of the LLM is not the point (although it can produce it better than ever), it’s the way an LLM can elevate messy knowledge work. A single person can now say that analysis knows no bounds.

krogenx 20 hours ago

A bit of a shill comment but… I have chickens and have been tracking egg production in an app that I’ve built, a livestock manager of sorts called Manger.

Looking at my data, since we’ve had our first egg 743 days ago, our hens have produced 9,393 eggs, or an average of just above a dozen a day.

The app can also count chickens, since each chicken has a UHF RFID.

https://m.youtube.com/watch?v=_iGn_pZ3IkY

faxmeyourcode 2 hours ago

Wow, I didn't realize some RFID could reach 15 feet out - that's good to know. I naively thought you essentially had to be touching the surface of the tag.

ttul 24 hours ago

Tokens consumed: 1.6 billion Estimated token cost: $1,591

Wow.

dinohlm 2 days ago

The most surprising thing about this whole story is that he's been scanning all his receipts for the past 25 years. I've never heard of anyone doing this before and don't really know why you would want to.

Still, it made for a somewhat interesting exploration of AI techniques.

ProllyInfamous 12 hours ago

I did this, perhaps thirty years ago (rocking a flatbed in the 90s #ROFL)... for about two years. Then decided that OCR was terrible. I revisited on a multifunction copier, mid-00s — to the same conclusion.

Once this can be run entirely offline, with simple github installer [0]... I'll be scanning again. This definitely "reminesced a nerve" that "took me back..."

Unfortunately not looking good for accountants, among others...

[0] I'd recon the majority would use a cloud-based, off-device processing — they just selfie each receipt

----

Ten years ago I still used a smartphone; when the banks started allowing mobile deposit, was a very Trekkie day for me...

eeixlk 2 days ago

Apart from the comical cost of extracting this data from paper receipts, is it more likely that stores will publish their product costs over time so trends can be observed or be more like gas stations where no prices are listed. I have no idea why a box of Cheerios costs $7 for processed oats but i see millions of reasons to obscure that data.

sylos 21 hours ago

Stores will never publish anything like that. Why would they give consumers more informatian.

EdNutting 2 days ago

The AI writing of the article made me give up halfway through. It’s a neat idea but the writing style of these AI models is brain-grating, especially when it’s the wrong style choice for this kind of technical report.

gib444 2 days ago

> Estimated token cost $1,591

I can assume this person does in fact NOT need to worry about the price of eggs ?

OJFord 2 days ago

I think they worked that back from tokens used, hence the estimation, but their actual billing was Claude Code & Codex subscriptions. (Which probably was also the main contributor to it taking 14 days.)

tkgally 2 days ago

I haven't tried it with receipts, but I've gotten excellent OCR results with Gemini 3.0 and now 3.1 on some challenging texts: handwritten letters I couldn't fully decipher myself, vertically printed Japanese texts with tiny furigana readings next to the kanji, a 19th century book in English with extensive use of italics and small caps. Gemini is good at extracting text and formatting from complex layouts, and it might work with egg receipts, too.

smcg 2 days ago

Many states passed requirements for cage free eggs that went into effect by end of 2024 so that has had some effect on prices.

TurdF3rguson 23 hours ago

I think it's mostly been caused by avian-flu related shortages and rising feed costs. I've personally had an avian-flu disaster, it's a nightmare to recover from.

MarceliusK 2 days ago

Overall this feels less like a quirky egg project and more like a blueprint for how messy real-world data pipelines are going to look going forward

ProllyInfamous 2 days ago

>>Here’s what made the quality good: every time I caught something, I could show the agents what to look for and they’d go fix it everywhere.

...

>>These are the days of miracle and wonder. I can’t wait to see what [the next] 30 years of eggs looks like.

eichin 2 days ago

Not convinced of that edit - or at least, my read was "revisit this 5 years from now", not 30...

ProllyInfamous 12 hours ago

The edit was perhaps personal... actuarily, three decades is what I'd be given =D

Now that I'm revisiting these comments, thanks for pointing out that 30 - 25 == five years into the future [honestly, I hadn't even given this any thought...]

1999 was ten years ago, right.!?

rendaw 14 hours ago

Okay, so this is good for tracking egg price changes (I guess? It was $1,591).

But if you put this into your accounting spreadsheet or whatever, you'd be off by a few cents all over the place, your account balances wouldn't match up. Then what do you do?

I've been looking into this and 96% isn't great. The solution is digital receipts... which are still being blocked by industry interests etc etc.

flurb 2 days ago

Great article through and through. The total number of places you've bought eggs at made me feel a tad depressed though: 4 places where you lived at or spent a longer time, 5 you traveled to *.

I tend to grow bored of a location after a year or two, though I'm certainly in the minority.

* Of course you didn't buy eggs every time you traveled somewhere, so probably not the entire truth.

jtwaleson 23 hours ago

Hmm, I've been sending receipts straight into Gemini 3 Flash and it handles them just fine. No need for this whole pipeline and definitely MUCH cheaper. Am I missing something?

sgbeal 2 days ago

> Estimated token cost $1,591 > Confirmed egg receipts 589 > Total egg spend captured $1,972 > Total eggs 8,604

...

> I can’t wait to see what 30 years of eggs looks like.

At $2.70 per receipt, i'd be in no hurry to find out!

Metacelsus 2 days ago

And if the price reflected the externalities of factory farming, eggs would be even more expensive!

s1mn 2 days ago

I'm such a sucker for a good, data-driven article. Love this.

brcmthrowaway 2 days ago

Question: Do big chat providers tool call an dedicated OCR, or is it part of the LLM?

BoredPositron 2 days ago

There is a reason why reciept transcription is still the task with the highest demand on mechanical turk.

DeathArrow 2 days ago

Without 25 years of photographing receipts, weeks of agents coding and billions of token spent, I can predict that egg prices increased, and the graph of my egg consumption over time is concave, part because my income has risen, part because while all prices get inflated, eggs are still cheaper than other sources of protein, and I did in less than 1 microsecond.

I will use them tokens to be able to afford more eggs.

twinpost_rules 17 hours ago

[dead]