1. Get an open source PDF decoder
2. Decode bytes up to the first ambiguous char
3. See if the next bits are valid with a 1; if not, it’s an l
4. Might need to backtrack if both 1 and l were valid
By being able to quickly try each char in the middle of the decoding process, you cut out the startup cost of re-decoding from the beginning every time. This makes it feasible to test all permutations automatically and linearly.
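The steps above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual tool: it assumes the Base64 text wraps a single zlib stream (a real PDF needs its object structure parsed first to find the flate streams), that the OCR confuses only "1" and "l", and the names `CONFUSABLE`, `spellings` and `recover` are made up for the example. The checkpointing uses `Decompress.copy()` from Python's standard `zlib` module.

```python
import base64
import zlib

# Assumption: the OCR only confuses these two glyphs; extend the table as needed.
CONFUSABLE = {"1": "1l", "l": "1l"}


def spellings(quad: str) -> list[str]:
    """Every way to read one 4-character Base64 group under CONFUSABLE."""
    options = [""]
    for ch in quad:
        options = [prefix + alt for prefix in options for alt in CONFUSABLE.get(ch, ch)]
    return options


def recover(b64_text: str) -> list[bytes]:
    """Return each decompressed payload that survives zlib validation."""
    quads = [b64_text[i:i + 4] for i in range(0, len(b64_text), 4)]
    found = []

    def search(index: int, decoder, output: bytes) -> None:
        if index == len(quads):
            # Accept only streams that ended cleanly (checksum passed, no trailing bytes).
            if decoder.eof and not decoder.unused_data:
                found.append(output)
            return
        for candidate in spellings(quads[index]):
            branch = decoder.copy()  # checkpoint: no re-decoding from byte 0
            try:
                chunk = branch.decompress(base64.b64decode(candidate))
            except zlib.error:
                continue  # this spelling breaks the stream, try the next one
            if branch.eof and index + 1 < len(quads):
                continue  # stream ended too early, so this spelling is wrong too
            search(index + 1, branch, output + chunk)  # returning from here is the backtrack

    search(0, zlib.decompressobj(), b"")
    return found


if __name__ == "__main__":
    payload = b"BT /F1 12 Tf (hello) Tj ET\n" * 4
    clean = base64.b64encode(zlib.compress(payload)).decode("ascii")
    noisy = clean.replace("l", "1")  # simulate the OCR confusion
    print(payload in recover(noisy))
```

zlib does not always reject a wrong character immediately, so several branches can stay alive until the final checksum. The recursion is also one level per 4-character group, so a real multi-megabyte attachment would need an explicit stack instead.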
I'm lucky to have parents with strong values. My whole life they've given me advice, on the small stuff and the big decisions. I didn't always want to hear it when I was younger, but now in my late thirties, I'm really glad they kept sharing it. In hindsight I can see the life experience and wisdom in it, and how it's helped and shaped me.
76 pages is a couple of months of work
Also look up double/triple data-entry systems, where you have multiple people enter the data and then flag and resolve differences. Won't protect you from your staff banding together to fuck you over with maliciously bad data, but it's incredibly effective to ensure people were Actually Working Their Blocks under healthy circumstances.
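A minimal sketch of the flag-and-resolve step for two typists, assuming each transcription is a list of lines in the same order. The function name `flag_differences` is invented for illustration and not taken from any particular data-entry system.

```python
from itertools import zip_longest


def flag_differences(entry_a: list[str], entry_b: list[str]) -> list[tuple[int, str, str]]:
    """Return (line_number, a, b) for every line where two transcriptions disagree."""
    flagged = []
    pairs = zip_longest(entry_a, entry_b, fillvalue="")  # a missing line counts as a mismatch
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            flagged.append((number, a, b))
    return flagged


if __name__ == "__main__":
    first = ["JVBERi0x", "abcl"]
    second = ["JVBERi0x", "abc1"]
    for number, a, b in flag_differences(first, second):
        print(f"line {number}: {a!r} vs {b!r}")
```

Only the flagged lines go to a third person to resolve, which is where the savings come from.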
I consider myself fairly normal in this regard, but I don't have 76 friends to ask to do this, so I don't know how I'd go about doing this. Post an ad on craigslist? Fiverr? Seems like a lot to manage.
Unlike every other PDF format that has been attempted, the federal government doesn't have to worry about adoption.
DjVu [1] would be another option. It has really good open source tooling available, but it supports substantially fewer features than PDF, making it not really suitable as a drop-in replacement. The format is relatively simple though, so redaction should be fairly doable.
TIFF [2] is already occasionally used for government documents, but it's arguably more complex than PDF, so probably not a good choice for this.
[0]: https://en.wikipedia.org/wiki/Open_XML_Paper_Specification
It’s not a tools problem, it’s a problem of malicious compliance and contempt for the law.
For example, when the Mueller reports were released with redactions, they had no searchable text or metadata because they were worried about exactly this kind of data leak.
However, vast troves of unsearchable text are not a huge win for transparency.
PDFs are just a garbage format and even good administrations struggle.
The copy linked in the post:
https://www.justice.gov/epstein/files/DataSet%209/EFTA004004...
Three more copies:
https://www.justice.gov/epstein/files/DataSet%2010/EFTA02153...
https://www.justice.gov/epstein/files/DataSet%2010/EFTA02154...
https://www.justice.gov/epstein/files/DataSet%2010/EFTA02154...
Perhaps having several different versions might make it easier.
https://www.justice.gov/epstein/files/DataSet%209/EFTA007755...
This doesn't solve the "1 & l" problem for the PDF you are looking at, but it could be useful anyway.
https://www.justice.gov/epstein/files/DataSet%2011/EFTA02702...
Followup: pdfimages is 13x faster than pdftoppm
Hmm. Anyone got some spare CPU time?
Someone who made some progress on one Base64 attachment got some XMP metadata that suggested a photo from an iPhone. Now I don't know if that photo was itself embedded in a PDF, but perhaps getting at least the first few hundred bytes decoded (even if it had to be done manually) would hint at the file-type of the attachment. Then you could run your tests for file fidelity.
That's pointed out in the article. It's easy for plaintext sections, but not for compressed sections. Didn't notice any mention of checksums.
Or worse. She did.
We’d all be lucky if it was just distributing child porn.
I've known from the second I started doing debate and FX/DX in high school, well, let's just say I never thought that the majority of the 2FA-folks would be worth a damn when tyranny really came knocking. Fear of the other as a form of manipulation, and a distraction from class consciousness, has been their literal raison d'être since decades before I was born.
I guess I was shocked that the President being a convicted rapist and documented child predator would be a bridge too far. But then we re-elected him.
I believe it. We voted for this. We do nothing in the face of zero actual justice. This is exactly as good as we deserve. And best of all, it certainly doesn't stop here. This is what they chose to not redact. When we know they spent enormous tax-payer hundreds-of-people hours redacting the documents.
I don't think it's even conspiratorial to say they left stuff in, so they could use it as justification for not releasing the other HALF of the files that haven't been released, even overly censored.
We deserve this, and the much worse that our apathy has invited.
As a non american looking in I feel like that applies to the other side as well and is how you ended up here.
Having paid a bit of attention during the election, seeing Bernie and Trump, at least in terms of rhetoric, more in line with each other on the same trade agreements, migration, etc., whilst also both outperforming Hillary in the same swing states, is not some coincidence.
And given that you live in a 2 party state it's always going to swing at some point eventually. No matter how depraved someone like trump is. If the next one is just as bad and they sit it out long enough they will get their turn.
I’d never believe Bill Gates would secretly slip antibiotics into his wife’s cocktail to treat an STI he got from a Russian prostitute on convicted pedophile estate.
But here we are.
Unfortunately no, it just seems to be greed, incompetence, and incompetent greed. At least when a tank drives over a protestor somebody gets to be on the side of the tank. When the bus goes off a cliff because the driver sold the steering wheel everybody dies.
Epstein was trying to remove tax on banker bonuses in the UK for some reason.
There might not be a single master plan but holy hell is this stuff intertwined with everything that happens.
I’m in the second group. When a majority of people miss the basics, when a large chunk treat internet content as daily reality rather than algorithmically served rage bait, it feels like there’s nothing you can do.
A friend once told me, “I wish I were more schizo like before, it was much more fun,” and in a bleak way, I get it. I’d almost prefer it if there really were a coherent plan, some deliberate attempt by the mighty to steer civilization. But right now it mostly looks like greed and cynicism. These days, a lot of it seems to be coming out of Silicon Valley but it will change as it always does like it did before.
The mascot of 4chan was literally pedobear, what time frame are you referring to?
More likely it's just an oversight, but it could also be CYA for dragging their feet, like "you rushed us, and look at these victims you've retraumatized". There are software solutions to find nudity and they're quite effective.
There's redaction to protect victims and there's redaction to protect specific co-conspirators in Epstein's spy ring
The challenge, as we're all experiencing together, is that the law is not inherently self-enforcing.
https://www.cbsnews.com/minnesota/news/ice-violations-judge-...
> ICE has likely violated more court orders in January 2026 than some federal agencies have violated in their entire existence," Schiltz said, adding that he counted 96 court orders that ICE has violated in 74 cases.
https://www.cbsnews.com/news/frustrations-from-judge-prosecu...
https://www.politico.com/news/2026/01/27/patrick-schiltz-jud...
https://storage.courtlistener.com/recap/gov.uscourts.mnd.230...
https://storage.courtlistener.com/recap/gov.uscourts.mnd.230...
Did you notice that one article I linked involved a DoJ lawyer admitting that she couldn't convince ICE to obey court orders that she was trying to transmit to them? That's beyond an allegation and into admission. How is that not evidence?
More on these ignored court orders:
https://www.mprnews.org/story/2026/01/28/ice-illegally-detai...
https://www.startribune.com/judge-orders-detainee-returned-m...
Judges themselves complained about their own orders being violated/ignored. Repeatedly.
"You are taking a piss" -- you are currently urinating.
"You are taking the piss" -- you are mocking me or this.
You might think "ok can't they be held in contempt for the pattern of ignoring court orders" and, well, you'd think so. But that looks a lot like a universal injunction or a class action and SCOTUS has deliberately been nerfing those.
If they've simply been committing crimes then judges don't have anything to do- they'd have to be prosecuted by someone, or I guess sued civilly, but that won't put them in jail either and takes forever.
https://www.govinfo.gov/content/pkg/PLAW-119publ38/pdf/PLAW-... : the Attorney General was to have produced the entirety of the Epstein files, with very narrowly-enumerated redactions, in December. She has not done so.
Furthermore, there are numerous allegations that the documents that have been released contain CSAM, which (referencing the PDF above) may fall afoul of 18 U.S.C. 2252–2252A.
In addition, one need only glance at the action in US courts to see egregious violations of the Constitution and valid court orders playing out daily.
https://www.documentcloud.org/documents/26513988-trorder0128...
https://storage.courtlistener.com/recap/gov.uscourts.mnd.230...
(It's also worth noting that almost none of the government's appeals to their losses in preliminary injunctions have been on the merits as to whether or not their actions were legal, but rather on the grounds of "no one should be allowed to challenge our actions," which has also been a fairly losing argument for everybody except SCOTUS.)
Obviously administrations can violate the law. Otherwise this is just an autocracy with term limits.
yes.... any administration can be found guilty of violating law, and should be dealt with accordingly.
Allegations are literally evidence. "He attacked me" is an allegation of a crime and is evidence that would be used in conjunction with other evidence to prosecute said crime.
They illegally withheld funds (impoundment) from congressionally authorized/mandated expenditures and relied on pocket rescissions to defund programs they didn't like: https://www.cbpp.org/research/federal-budget/pocket-rescissi...
They keep illegally appointing unqualified hacks as US attorney in defiance of the mandate they're approved by the Senate (Essayli, Habba, Halligan, Sarcone, Chattah) - judges have found at least five of the appointments illegal. As one example: https://www.politico.com/news/2025/10/28/judge-los-angeles-t...
They've repeatedly violated court orders to either return immigrant detainees or release them. "This is one of dozens of court orders with which respondents have failed to comply in recent weeks.": https://www.cnn.com/2026/01/27/politics/patrick-schiltz-judg...
The EPA illegally convened a secret panel of climate deniers to issue a sham report in order to repeal the endangerment finding: https://www.nytimes.com/2026/01/30/climate/energy-department...
His targeting and shakedowns of Universities, law firms, and media companies is transparently illegal jawboning.
Everything about the tariffs is obviously illegal which he confirms every time he opens his mouth since he's relying on 'national security' justifications to issue them without Congress and he keeps insisting they're punishment for some random perceived slight.
His illegal firing of Federal workers without the notice required: https://www.npr.org/2025/09/25/nx-s1-5544317/federal-probati...
Some sillier things like renaming the Kennedy Center -- the law that established it literally said that it couldn't be renamed without Congress -- so Trump firing everyone on the board and then appointing a bunch of his flunkies to vote for the name change doesn't cut it. https://beatty.house.gov/sites/evo-subsites/beatty.house.gov...
It's a literal onslaught of illegality so I can't tell if you haven't read a news article since 2025 or if you're trolling.
The legal situation regarding CSAM is very strict no matter which country, and I'd better hope no one here will actually be dumb enough to provide actual links.
> even if you took the picture yourself.
I'd hope the punishment is more severe in that case!
Obviously most people are sensible most of the time but sometimes they are not.
And nudity is not required.
That's from RAINN, the US's largest anti-sexual violence organisation.
I'm talking about kids making photos of themselves. Which has been an issue multiple times in the past.
I tried to find the message in this blog post, but couldn't. (don't see how to search by date).
I had a reasonably simple problem to solve, slightly weird font and some 10 words in English (I actually only missed one or two blocks for missing letters to cover all I needed).
After a couple of days having almost everything (?) I just surrendered. This seems to be intentionally hostile. All the docs scattered across several repositories, no comprehensive examples, etc.
Absolutely awful piece of software from this end (training the last gen).
PDF is basically a prettify layer on top of the older PS that brings a whole lot of baggage. The moment you start trying to do what should be simple stuff like editing lines, merging pages, or changing the resolution of the images, it starts giving you a lot of headaches.
I used to have a few scripts around to fight some of its quirks from when I was writing my thesis and had to work daily with it. But well, it was still an improvement over Word.
A dynamic programming type approach might still be helpful. One version or other of the character might produce invalid flate data while the other is valid, or might give an implausible result.
The recipient is also named in there...
The search on the DOJ website (which we shouldn't trust), given the query: "Content-Type: application/pdf; name=", yields maybe a half dozen or so similarly printed BASE64 attachments.
There's probably lots of images as well attached in the same way (probably mostly junk). I deleted all my archived copies recently once I learned about how not-quite-redacted they were. I will leave that exercise to someone else.
Of course there are other content types, e.g. searching for "Content-Type: image/jpeg" gets hits as well. But only a few of them actually have the Base64 data; mostly there are just the MIME headers. Looking for "/9j/" (which is Base64 for FF D8 FF, the start of every JPEG file): the Trumpian justice.gov search ignores "/" and matches case-insensitively, but there are 4 or 5 base64'ed JPEG images in there.
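For anyone who wants to check the "/9j/" claim or sniff an attachment type from its first few characters, here is a small sketch. The `MAGIC` table and the `sniff_base64` helper are illustrative names, not part of any existing tool, and only the two file types discussed in this thread are covered.

```python
import base64
from typing import Optional

# Leading bytes ("magic numbers") of the two attachment types discussed here.
MAGIC = {"jpeg": b"\xff\xd8\xff", "pdf": b"%PDF-"}


def sniff_base64(text: str) -> Optional[str]:
    """Guess the file type from the first 8 Base64 characters, ignoring whitespace."""
    compact = "".join(text.split())
    head = compact[:8]
    head = head[:len(head) - len(head) % 4]  # keep whole 4-character groups only
    try:
        raw = base64.b64decode(head)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    for name, magic in MAGIC.items():
        if raw.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    print(base64.b64encode(MAGIC["jpeg"]))  # the search string for JPEG attachments
    print(base64.b64encode(MAGIC["pdf"]))   # the equivalent prefix for PDF attachments
    print(sniff_base64("/9j/4AAQ SkZJRgAB"))
```

Because Base64 maps 3 bytes onto 4 characters, a fixed 3-byte header always produces the same 4-character prefix, which is why a plain text search for it works at all.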
I also saw that the page is vulnerable to HTML injection: garbage in one search result preview was OCR'd as "<s [lots of garbage]>", and the rest of the search results were struck through because "<s>" is the HTML tag that does that.
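The usual fix is to escape OCR text before it goes into the page. A minimal sketch using Python's standard `html.escape`; the `render_preview` wrapper and the `<li>` markup are hypothetical, since nothing is known here about how the site actually builds its result list.

```python
import html


def render_preview(ocr_text: str) -> str:
    """Wrap untrusted OCR text in a list item, escaping anything that looks like markup."""
    return "<li>" + html.escape(ocr_text) + "</li>"


if __name__ == "__main__":
    print(render_preview("<s garbage>"))
```

With escaping in place, the stray "<s" shows up as literal text instead of opening a strikethrough tag.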
https://www.justice.gov/epstein/files/DataSet%2010/EFTA01804...
https://www.justice.gov/epstein/files/DataSet%209/EFTA007755...
https://www.justice.gov/epstein/files/DataSet%209/EFTA004349...
and then this one, judging by the name of the file (hanna something) and content of the email:
"Here is my girl, sweet sparkling Hanna=E2=80=A6! I am sure she is on Skype "
maybe more sinister (so be careful, I have no idea what the laws are if you uncover you know what trump and Epstein were into)...
https://www.justice.gov/epstein/files/DataSet%2011/EFTA02715...
[Above is probably a legit modeling CV for HANNA BOUVENG, based on, https://www.justice.gov/epstein/files/DataSet%209/EFTA011204..., but still creepy, and doesn't seem like there's evidence of her being a victim]
Anyway searching for the email sender's name, there's a screenshot of an email of hers in English offering him a girl as an assistant who is "in top physical shape" (probably not this Hanna girl). That's fucking creepy: https://www.expressen.se/nyheter/varlden/epsteins-lofte-till...
Cool article, however.
It's really really hard to give them the benefit of the doubt at this point.
Incompetence is incompetence.
They wasted months erasing Trump from that instead. So it's on them.
Claude Opus came up with this script:
https://pastebin.com/ntE50PkZ
It produces a somewhat-readable PDF (first page at least) with this text output:
https://pastebin.com/SADsJZHd
(I used the cleaned output at https://pastebin.com/UXRAJdKJ mentioned in a comment by Joe on the blog page)
https://www.mountsinai.org/about/newsroom/2012/dubin-breast-...
https://www.businessinsider.com/dubin-breast-center-benefit-...
Even names match up, but oddly the date is different.
There was a time when the guy making the cannon had to sit on top of it for the first shot. Perhaps this kind of policy could be adapted to other situations as well.
Take the job to guard Epstein? Take the consequences when things go wrong.
Protect criminals? Take the very real consequences if found out.
For a while, my pet conspiracy theory was that this was Epstein's real cause of death: a lynching by a prison guard made to look like suicide.
I never took it too seriously, because no actual evidence; now I'm more inclined to think it was a coconspirator hoping it would mean no more evidence getting out.
All it takes is a single actor paying off some guards to ‘fall asleep’, a camera to be disabled, and a 15 minute window of opportunity. It’s much more probable than something like the US Government planning 9/11 and somehow keeping thousands of co-conspirators silent.
I don’t really spend a whole lot of time thinking about it since as you said, we’ll never know for sure. It just seems at least probable if he actually did have kompromat on powerful people.
She's a medical doctor, who became amnesic when on the stand for Maxwell's case
>Pressed about gaps in her memory, Dubin told the court: "It's very hard for me to remember anything far back and sometimes I can't remember things from last month. My family notices it. I notice it."
which uses this Rust zlib stream fixer: https://pastebin.com/iy69HWXC
and gives the best output I've seen it produce: https://imgur.com/itYWblh
This is using the same OCR'd text posted by commenter Joe.
Xerox would like a word.
https://news.ycombinator.com/item?id=29223815
Point being, "correcting" to "correct looking" may be worse than just accepting errors. Errors are often clearly identified by humans as a nonsense word. "Correcting" OCR can result in plausible, but wrong results that are more difficult for the human in the loop to identify.
So yes, the "fixed" output has errors, but it’s not hallucinating details like an LLM, nor is it trying to produce output that conforms to any linguistic or stylistic heuristics.
The phrase "correcting similar OCR'd PDFs" should have been "correcting similar OCR'd Base64 representations of PDFs".
Any chance you could share a screenshot / re-export it as a (normalized) PDF? I’m curious about what’s in there, but all of my readers refuse to open it.