Archive.today is directing a DDoS attack against my blog - https://news.ycombinator.com/item?id=46843805 - Feb 2026 (168 comments)
Ask HN: Weird archive.today behavior? - https://news.ycombinator.com/item?id=46624740 - Jan 2026 (69 comments)
With this said, I also disagree with turning everyone who uses archive[.]today into a botnet that DDoSes sites. Changing the content of archived pages also raises questions about the authenticity of what we're reading.
The site behaves as if it were infected by some malware, and the archived pages can't be trusted. I can see why Wikipedia made this decision.
It's very silly to talk about doxing when all someone has done is gather information anyone else can equally easily obtain, just given enough patience and time, especially when it's information the person in question put out there themselves. If it doesn't take any special skills or connections to obtain the information, but only the inclination to actually perform the research on publicly available data, I don't see what has been done that is unethical.
That's no justification for using visitors to your site to run a DDoS.
In the slang of reddit: ESH
No, harassment also includes persistent attempts to cause someone grief, whether or not they involve direct interactions with that person.
From Wikipedia:
> Harassment covers a wide range of behaviors of an offensive nature. It is commonly understood as behavior that demeans, humiliates, and intimidates a person.
Because the two are distinct, one can't simply replace "doxing" with "harassment".
>I for one will be buying Denis/Masha/whoever a well deserved cup of coffee.
Using one term when what is meant is actually the other serves nothing but to sow confusion.
And I don't just mean under the colloquial definition; I mean under the legal definition of harassment. In fact, it's fairly common for unwanted "positive" attention to be harassment - e.g. unwanted sexual advances mostly fit that description.
I get that a call to action is a common feature of doxing and it wasn't present here, but it's not a particularly common feature of harassment outside the context of doxing, and nothing in the definition of harassment requires it.
In that context I don't think the question ("actually, who is providing all this information to me and what interests drive them") is one that's misplaced. Maybe we shouldn't look a gift horse in the mouth, but don't forget this could be a Trojan horse as well.
The article brought to light some ties to Russia, but probably not ties to its government and its troll farms; rather, an independent and pretty rebellious citizen. That's good to hear. And that's valuable information. I trust the site more after reading the article, not less.
The article could have redacted the names it found, but they were found through public sources, and those sources validate the information encountered (otherwise the results could have been dismissed).
I'm not defending the archive.today webmaster but it's unfortunately understandable they are angry. Saying what the blogger did was merely point out public information is a gross oversimplification.
Oddly, I think archive.today has explicitly said that's not what they're there for, and that people shouldn't rely on their links as a long-term archive.
> Archive.today is a time capsule for web pages!
> It takes a 'snapshot' of a webpage that will always be online even if the original page disappears.
So it doesn't necessarily raise questions about whether the content has been changed or not. The difference is in whether that change is there to make the archive usable - and of course, for archive.today, that's not the case.
This is absolutely the buried lede of this whole saga, and needs to be the focus of conversation in the coming age.
It still is; uBlock's default lists are killing the script now, but if it's allowed to load then it still tries to hammer the other blog.
I don't know, I feel like everyone loses here.
(For those who don't know, he's currently trying to destroy one of the largest WP hosting providers with a bunch of lawsuits)
I don't think the DDoSing is a very good method for fighting back, but I can't blame anyone for trying to survive. They are definitely the victim here.
If that blog really doxxed them out of idle curiosity they are an absolute piece of shit. Though I think this is more of a targeted campaign.
In this case, I didn't know that the archive.today people were doxxed until they started the ddos campaign and caught attention. I doubt anyone in this thread knew or cared about the blogger until he was attacked. And now this entire thing is a matter of permanent record on Wikipedia and in the news. archive.today's attempt at silencing the blogger is only bringing them more trouble, not less.
Barbra_Streisand_Mansion.jpg
Probably nothing, and the DDoS hype was intentional, to distract attention and highlight J.P.'s doxx among the others, making them insignificant.
J.P. might be the only one of the doxxers who could promote their doxx in media, and this made his doxx special, not the content?
Anyway, it made the haystack bigger while keeping the needle the same.
One of the really strange things about all of this is that there is a public forum post in which a guy claims to be the site owner. So this whole debacle is this weird mix of people who are angry and saying "clearly the owner doesn't want to be associated with the site" on the one hand, but then on the other hand there's literally a guy who says he's the one that owns the site, so it doesn't seem like that guy is very worried about being associated with it?
It also seems weird to me that it's viewed as inappropriate to report on the results of Googling the guy who said he owns the site, but maybe I'm just out of touch on that topic.
Which forum post? The one mentioned by the blogger, on an F-Secure forum (a company with cybersecurity products), was a request for support by the owner of archive.today regarding a block of their site. It's arguably not intended as a public statement by the owner of the archive; they were simply careless with their username.
You don't know their motives for running their site, but you do get a clear message about their character by observing their actions, and you'd do well to listen to that message.
They might be the worst person ever, but that doesn't matter. People can be good and bad, sometimes the victim, sometimes the perpetrator.
Is it morally wrong to doxx someone and cause them to go to jail because they are running an archive website? Yes. It is. It doesn't matter who the person is. It does not matter what their motivations are.
> I don't think the DDoSing is a very good method for fighting back
I am really shocked by the conditional empathy people here are showing. The doxxing isn't less bad just because the reaction to it is bad.
It's like justifying bullying because the person "deserves" it.
[1] https://archive.today/20240714173022/https://x.com/archiveis...
[2] https://x.com/advancedhosters
[3] https://x.com/advancedhosters/status/1731129170091004412
[4] https://lj.rossia.org/users/mopaiv/257.html
[5] https://x.com/advancedhosters/status/1501971277099286539
If archive.whatever wasn't so useful to the general public, it'd be hard to distinguish from a criminal operation, given the way it operates - unlike, say, the Internet Archive, which goes through all of the proper legal paperwork to be a real nonprofit.
Every Reddit archived page used to have a Reddit username in the top right, but then it disappeared. "Fair enough," I thought. "They want to hide their Reddit username now."
The problem is, they did it retroactively too, removing the username from past captures.
You can see it on old Reddit captures: the normal archived page has no username, but when you switch the tab to the Screenshot view of the archive, it is still there. The screenshot is the original capture, and the username has now been removed from the normal webpage version.
When I noticed it, it seemed like such a minor change, but with these latest revelations, it doesn't seem so minor anymore.
That doesn't seem nefarious, though. It makes sense they wouldn't want to reveal whatever accounts they use to bypass blocks, and the logged-in account isn't really meaningful content to an archive consumer.
Now, if they were changing the content of a reddit post or comment, that would be an entirely different matter.
No, certain edits are understandable and required. Even archive.org edits its pages (e.g. sticks banners on them and does a bunch of stuff to make them work like you'd expect).
Even paper archives edit documents (e.g. writing sequence numbers on them, so the ordering doesn't get lost).
Disclosing exactly what account was used to download a particular page is arguably irrelevant information, and may even compromise the work of archiving pages (e.g. if it just opens the account to getting blocked).
The major reason archive.today was being used is that it also bypassed paywalls, and I don't think perma.cc does that normally.
With all of this context shared, the Internet Archive is likely meeting this need without issue, to the best of my knowledge.
[1] https://meta.wikimedia.org/wiki/Wikimedia_Endowment
[2] https://perma.cc/about ("Perma.cc was built by Harvard’s Library Innovation Lab and is backed by the power of libraries. We’re both in the forever business: libraries already look after physical and digital materials — now we can do the same for links.")
[3] https://community.crossref.org/t/how-to-get-doi-for-our-jour...
[4] https://www.crossref.org/fees/#annual-membership-fees
[5] https://www.crossref.org/fees/#content-registration-fees
(no affiliation with any entity in scope for this thread)
If pricing is so high that you have to have a call with the marketing team to get a quote, I think it would be a poor use of WMF funds.
Especially because the volume of links and number of users that Wikimedia would entail is probably at least double their entire existing userbase.
Ultimately we are mostly talking about a largely static web host. With legal issues being perhaps the biggest concern. It would probably make more sense for WMF to create their own than to become a perma.cc subscriber.
However, for the most part, partnering with archive.org seems to be going well and already has some software integration with Wikipedia.
https://wikimediaendowment.org/annualreports/2023-2024-annua...
I hope so. Archiving is a legal landmine.
Shortcut is to consume the Wikimedia changelog firehose and make these http requests yourself, performing a CDX lookup request to see if a recent snapshot was already taken before issuing a capture request (to be polite to the capture worker queue).
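Roughly, the lookup-then-capture half could look like this (a sketch against the Wayback Machine's public CDX API and Save Page Now endpoint; the firehose is just Wikimedia's EventStreams feed consumed on top of it):

  // Check the CDX index for a recent snapshot before requesting a capture.
  async function archiveIfStale(url, maxAgeDays = 30) {
    const from = new Date(Date.now() - maxAgeDays * 864e5)
      .toISOString().replace(/\D/g, "").slice(0, 14);  // CDX wants yyyyMMddhhmmss
    const cdx = "https://web.archive.org/cdx/search/cdx?output=json&limit=1" +
      "&from=" + from + "&url=" + encodeURIComponent(url);
    const rows = await (await fetch(cdx)).json();      // [] or [header, row]
    if (rows.length > 1) return "fresh snapshot exists: " + rows[1][1];
    await fetch("https://web.archive.org/save/" + url); // polite single capture request
    return "capture requested";
  }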
You can see a text box for it on the right if you go to the Wayback Machine's homepage. I used it yesterday.
Anyone can request that anything be removed, and they may honor the request: https://help.archive.org/help/how-do-i-request-to-remove-som... They say nothing about only removing things illegal in the US or anything like that, meaning they can and will remove things based on personal judgements about whether something should be archived.
https://www.in.gov/nircc/planning/highway/traffic-data/inter...
and Reddit seemingly blocks their agent. It is open source, though.
I think ArchiveBox[1] is the most popular. I will give it a shot, but it's a shame they don't support URL rewriting[2], which would be annoying for me. I read a lot of blog and news articles that are split across multiple pages, and it would be nice if that article's "next page" link was a link to the next archived page instead of the original URL.
2: https://github.com/ArchiveBox/ArchiveBox/discussions/1395
Open source. Self hosted or managed. Native iOS and Android apps.
Its Content Scripts feature allows custom JS scripts that transform saved content, which could be used to do URL rewriting.
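For example, a hypothetical content script along these lines (ARCHIVE_ROOT is a made-up placeholder for whatever URL scheme your own instance uses, not a real API of the app):

  // Rewrite every link in a saved page to point at the archive's own copy,
  // so "next page" stays inside the archive instead of hitting the live site.
  const ARCHIVE_ROOT = "https://my-archive.example/snapshot?url="; // assumed placeholder
  for (const a of document.querySelectorAll("a[href]")) {
    a.href = ARCHIVE_ROOT + encodeURIComponent(a.href);
  }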
archive.today is very popular on HN; the opaque, shortened URLs are promoted on HN every day
I can't use archive.today. I tried but gave up. Too many hassles. I might be in the minority, but I know I'm not the only one. As it happens, I have not found any site that I cannot access without it.
The most important issue with archive.today, though, is the person running it, and their past and present behaviour. It speaks for itself.
Whoever it is, they have a lot of info about HN users' reading habits, given that archive.today URLs are so heavily promoted by HN submitters, commenters and moderators.
"Geolocation" as a justication is ambiguous
Why a need for geolocation
Geolocation can be used for multiple purposes
"DNS performance" is only one purpose
Other purposes might offer the user no benefit, and might even be undesirable for users
As a result, some users don't send EDNS subnet. It's always been optional to send it
Even public resolvers, third party DNS services, like Cloudflare, recognise the tradeoffs for users and allow users to avoid sending it. Popular DNS software makes compiling support for EDNS subnet optional
Archive.today wants/needs EDNS subnet so badly that it tries to gather it using a tracking pixel, or it tries to block users who don't send it, e.g., Cloudflare users
Thus, before one even considers all the other behaviour of this website operator, some of which is mentioned in this thread, there is a huge red flag for anyone who pays attention to EDNS subnet
As with almost all websites, repeated DNS lookups are not an absolute requirement for successful HTTP requests
There are some IP addresses for archive.{today,is,md,ph,li,...} that have continued to work for years
Anyone interested in the reading habits of HN users can just take a look at news.ycombinator.com ;)
https://gitflic.ru/project/magnolia1234/bypass-paywalls-fire...
Anyway, extensions are just signed zip files. You can extract them and view the source. BPC sources are not compressed or obfuscated. The extension is evaluated and signed by Mozilla (otherwise it wouldn't install in release-channel Firefox), if you put any stock in that.
http-request set-header user-agent "Mozilla/5.0 (Linux; Android 14) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.103 Mobile Safari/537.36 Lamarr" if { hdr(host) -m end economist.com }
Years ago I used some other workaround that no longer works, maybe something like amp.economist.com. AMP with a text-only browser was a useful workaround for many sites. Workarounds usually don't last forever; websites change from time to time. This one will stop working at some point
There are some people who for various reasons cannot use archive.today
This unfamiliarity is why I try to use programs that more HN readers are familiar with, like curl or wget, in HN examples. But I find those programs awkward to use. The examples may contain mistakes. I don't use those programs in real life
For making HTTP requests I use my own HTTP generators, TCP clients, and local forward proxies
Given the options (a) run a graphical web browser and enable Javascript to solve an archive.today CAPTCHA that contains some fetch() to DDoS a blogger or (b) add a single line to a configuration file and use whatever client I want, no Javascript required, I choose (b)
"promoted" as used here means placing an archive.tld URL at the top of an HN thread so that many HN readers will follow it, or placing these URLs elsewhere in threads
Could you not in theory record the whole TLS transaction? Can it not be replayed later and re-verified?
Up until an old certificate leaks or is broken and you can fake anything "from back when it was valid", I guess.
The technology for doing this is called a Zero Knowledge Proof TLS Oracle:
https://eprint.iacr.org/2024/447.pdf
The 10k-foot view is that you pick the random numbers involved in the TLS handshake in a deterministic way, much like how zk proofs use the Fiat-Shamir transform. In other words, instead of using true randomness, you use some hash of the transcript of the handshake so far (sort of). Since TLS doesn't do client authentication, the DH exchange involves randomness from the client.
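If it helps, a toy of that "hash the transcript instead of flipping coins" idea (nothing here is the paper's actual protocol, just the Fiat-Shamir flavour):

  // The "random" handshake nonce is derived deterministically from
  // everything sent so far, so a verifier can recompute it later
  // instead of having to trust the client's coin flips.
  const { createHash } = require("node:crypto");
  function deterministicNonce(transcriptBytes) {
    return createHash("sha256").update(transcriptBytes).digest(); // 32 bytes
  }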
For all the blockchain haters out there: cryptocurrency is the reason this technology exists. Be thankful.
On that occasion, the target of the attack was a site named northcountrygazette.org, whose owner seems to have never become aware of the attack. The HN commenter noted that when they went to the site manually it was incredibly slow, which would suggest the DDoS attempt was effective.
I tried to see if there was anything North Country Gazette had published that the webmaster of archive.today might have taken issue with, and I couldn't find anything in particular. However, the "Gazette" had previously threatened readers with IP logging to prosecute paywall bypassers (https://news.slashdot.org/story/10/10/27/2134236/pay-or-else...), and also blocks archivers in its robots.txt file, indicating it is hostile towards archiving in general.
I can no longer access North Country Gazette, so perhaps it has since gone out of business. I found a few archived posts from its dead website complaining of high server fees. Like the target of this most recent DDoS, June Maxam, the lady behind North Country Gazette, also appears/appeared to be a sleuth.
Ultimately, what we all use it for is pretty straightforward, and it seems like by now we should've arrived at having approximately one best implementation, which could be used both for personal archiving and for internet-facing instances (perhaps even distributed). But I don't know if we have.
This would have sounded Very Normal in the 2000s... I wonder if we can go back :)
If a site (or the WAF in front of it) knows what it's doing then you'll never be able to pass as Googlebot, period, because the canonical verification method is a DNS lookup dance which can only succeed if the request came from one of Googlebot's dedicated IP addresses. Bingbot is the same.
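For reference, the dance is roughly this (an IPv4-only Node sketch; Google's PTR names end in googlebot.com or google.com):

  // Verify a claimed Googlebot: PTR lookup on the client IP, check the name
  // is under googlebot.com/google.com, then forward-resolve and compare.
  const dns = require("node:dns").promises;
  async function isRealGooglebot(ip) {
    const [name] = await dns.reverse(ip);  // e.g. crawl-66-249-66-1.googlebot.com
    if (!/\.(googlebot|google)\.com$/.test(name)) return false;
    const addrs = await dns.resolve4(name); // forward-confirm the PTR name
    return addrs.includes(ip);
  }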
That's maybe a bit insane to automate at the scale of archive.today, but I figure they do something along the lines of this. It's a perfect imitation of Googlebot because it is literally Googlebot.
The curious part is that they allow web scraping of arbitrary pages on demand. So a publisher could put in a lot of arbitrary requests to archive their own pages and see them all coming from a single account or a small subset of accounts.
I hope they haven't been stealing cookies from actual users through a botnet or something.
It would be challenging to do with text, but is certainly doable with images - and articles contain those.
In the archive.today case, it looks pretty automated. Surely just adding an HTML comment would be sufficient.
At which point we still lack a satisfactory answer to the question. Just how is archive.today reliably bypassing paywalls on short notice? If it's via paid accounts you would expect they would burn accounts at an unsustainable rate.
Why? In the world of web scraping this is pretty common.
Maybe they use accounts for some special sites. But there is definitely some automated generic magic happening that manages to bypass the paywalls of news outlets. Probably something Googlebot-related, because those websites usually give Google their news pages without a paywall, probably for SEO reasons.
Surely it wouldn't be too hard to test. Just set up an unlisted dummy paywall site, archive it a few times and see what the requests look like.
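Even a throwaway logger would do (a minimal Node sketch, no real paywall logic):

  // Log every request's headers - UA, cookies, X-Forwarded-For, etc.
  // Archive the page a few times and read what the fetcher actually sent.
  const http = require("node:http");
  http.createServer((req, res) => {
    console.log(new Date().toISOString(), req.url, JSON.stringify(req.headers));
    res.end("<p>dummy paywalled article</p>");
  }).listen(8080);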
They bypass the rendering issues by "altering" the webpages. It's not uncommon to archive a page and see nothing because of the paywalls, but then later on, the same page is silently fixed. They have a Tumblr where you can ask them questions; at one point it was quite common for people to ask them to fix random specific pages, which they did promptly.
Honestly, you cannot archive a modern page, unless you alter it. Yet they're now being attacked under the pretence of "altering" webpages, but that's never been a secret, and it's technologically impossible to archive without altering.
Anything on twitter post-login-wall for one. A million only-semi-paywalled news articles for others. But mainly an unfathomably long tail.
It was extremely distressing when the admin started(?) behaving badly for this reason. That others are starting to react this way to it is understandable. What a stupid tragedy.
https://en.wikipedia.org/wiki/Wikipedia:Archive.today_guidan...
They're basically recommending changing verifiable references that can easily be cross-checked and verified, to "printed on paper" sources that could likely never be verified by any other Wikipedian, and can easily be used to provide a falsification and bias that could go unnoticed for extended periods of time.
Honestly, that's all you need to know about Wikipedia.
The "altered" allegation is also disingenuous. The reason archive.org never works, is precisely because it doesn't alter the pages enough. There's no evidence that archive.today has altered any actual main content they've archived; altering the hidden fields, usernames and paywalls, as well as random presentation elements to make the page look properly, doesn't really count as "altered" in my book, yet that's precisely what the allegation amounts to.
The allegation here is that they altered page content not just to remove their own alias, but to insert the name of the blogger they were targeting. That moves it from a defensible technical change for accessibility to being part of their bizarre revenge campaign against someone who crossed them.
https://archive-is.tumblr.com/post/806832066465497088/ladies...
https://archive-is.tumblr.com/post/807584470961111040/it-see...
BTW, they also alter paywalls and other elements, because otherwise, many websites won't show the main content these days.
It kind of seems like "altered" is the new "hacker" today?
Compare (the changed element is near the very bottom of the page; replace the "[dot]" since these URLs seem to trigger spam filters for some commenters):
archive [dot] is/gFD6Z
megalodon [dot] jp/2026-0219-1628-23/https://archive.is:443/gFD6Z
From hero to a Kremlin troll in five seconds.
Not sure why it would only be on archive.is and not the others but ‘is’ loads for me.
That effort appears to have gone nowhere, so now suddenly archive.today commits reputational suicide? I don't suppose someone could look deeper into this please?
> Regarding the FBI’s request, my understanding is that they were seeking some form of offline action from us — anything from a witness statement (“Yes, this page was saved at such-and-such a time, and no one has accessed or modified it since”) to operational work involving a specific group of users. These users are not necessarily associates of Epstein; among our users who are particularly wary of the FBI, there are also less frequently mentioned groups, such as environmental activists or right-to-repair advocates.
> Since no one was physically present in the United States at that time, however, the matter did not progress further.
> You already know who turned this request into a full-blown panic about “the FBI accusing the archive and preparing to confiscate everything.”
Not sure who he's talking about there.
Hardly possible for Wikimedia to provide a service like archive.today given the legal trouble of the latter.
Strangely naive.
Oh? Do tell!
AT archives the page as seen, even including a screenshot.
IA archives the page as loaded, then, when you view it, ham-fistedly injects its header bar and executes the source JS. As you'd expect, the result is often wrecked - or tampered with.
I personally just don't use websites that paywall important information.
>Oh? Do tell!
They do. In the very next paragraph in fact:
The guidance says editors can remove Archive.today links when the original source is still online and has identical content; replace the archive link so it points to a different archive site, like the Internet Archive, Ghostarchive, or Megalodon; or “change the original source to something that doesn’t need an archive (e.g., a source that was printed on paper)”.
Hopeless. Caught tampering with the archive.
The whole situation is not great.
I did so. You're welcome.
As for the rest, take it up with Jimmy Wales, not me.
Unfortunately this happens more often than one would expect.
I found this out when I preserved my very first homepage I made as a child on a free hosting service. I archived it on archive.org, and thought it would stay there forever. Then, in 2017 the free host changed the robots.txt, closed all services, and my treasured memory was forever gone from the internet. ;(
Oh good. That's definitely a reasonable thing to do or think.
The raw sociopathy of some people. Getting doxxed isn't good, but this response is unhinged.
We live at a moment where it's trivially easy to frame possession of an unsavory (or even illegal) number on another person's storage media, without that person even realizing (and possibly, with some WebRTC craftiness and social engineering, even get them to pass on the taboo payload to others).
In response: J.P.'s blog had already framed AT as a project grown from a carding forum, and pushed his speculations onto Ars Technica, whose parent company just destroyed 12ft and is on to a new victim. The story is full of untold conflicts of interest, covered with a soap opera around the DDoS.
It’s still a threat isn’t it?
The article about the FBI subpoena that pulled J.P.'s speculations out of the closet was also in Ars Technica, by the same author, and that same article explicitly mentioned how happy they are that 12ft is down
--- US publishers have been fighting web services designed to bypass paywalls. In July, the News/Media Alliance said it secured the takedown of paywall-bypass website 12ft.io. “Following the News/Media Alliance’s efforts, the webhost promptly locked 12ft.io on Monday, July 14th,” the group said. (Ars Technica owner Condé Nast is a member of the alliance.) ---
I see WP is not proposing to run its own.
Like Wikipedia?
1) provides a snapshot of another site for archival purposes. 2) provides original content.
You're arguing that since encyclopedias change their content, the Library of Congress should be allowed to change the content of the materials in its stacks.
By modifying its archives, archive.today just flushed its credibility as an archival site. So what is it now?
As an end user of Wikipedia there are occasions where content has been scrubbed and/or edits hidden. Admins can see some of those, but end users cannot (with various justifications, some excellent/reasonable and some.. nebulous). That's all I'm saying, nothing about Congress or such other nonsense. It seems like an occasion of the pot calling the kettle names from this side of the fence.
An archival site (by default definition) promises you that it will not modify its content. And when it does, it's no longer an archival site.
Wikipedia has never been an archival site and it never will be. archive.today was an archival site, but now it never will be again.
Meanwhile their AMA on Reddit: no promises, no commitment. Just like a Microsoft EULA :)
https://old.reddit.com/r/DataHoarder/comments/1i277vt/psa_ar...
I'm quoting all of that because it lacks an explicit promise of non-modification /i
Meanwhile seriously, if you were disappointed not to see e.g. "We explicitly don't promise not to modify", then perhaps you should consider why, regardless, this site was trusted enough to get a gazillion links in Wikipedia... and HN.
And I'm quoting all of that because it lacks an explicit (or implicit) promise of modification. :)
It was (emphasis on past-tense) so-trusted because it advertises itself as an archival site. (The linked disclaimer is all about it not being a "long-term" archival site. It says it archives pages for latecomers. There is an implication here that it archives them accurately. What use is a site for latecomers if they change the content to be something else?) If they'd said or indicated they would be changing the content to no longer reflect the original site, Wikipedia would not have linked to them because they wouldn't be a credible source.
In any case, now I can't use them to share or use links since we can no longer trust those archives to be untampered. When I share a link to nyt content on archive.today or copy and paste content into email, I'm putting my name on that declaring "nyt printed this". If that's not true, it's my reputation.
Just like it was archive.today's.
What's your better idea?
Archive.org snapshots may load javascript from external sites, where the original page had loaded them. That script can change anything on the page. Most often, the domain is expired and hijacked by a parking company, so it just replaces the whole page with ads.
Example: https://web.archive.org/web/20140701040026/http://echo.msk.r...
----
And another example: https://web.archive.org/web/20260219005158/https://time.is/
The page "got changed" every second. It is easy to make an archived page which would show different content depending on current time or whether you have Mac or Windows, or your locale, or browser fingerpring, or been tailored for you personally
Isn't there a substantial overlap with the copyright holders?
> Internet archives wayback machine works as alternative to it.
It is appallingly insecure. It lets archives be altered by page JS and deleted by the page domain owner.
Nonstarter for anything that you actually want to be preserved, especially anything controversial.
Yes, they are essential, and that was the main reason for not blacklisting Archive.today. But Archive.today has shown they do not actually provide such a service:
> “If this is true it essentially forces our hand, archive.today would have to go,” another editor replied. “The argument for allowing it has been verifiability, but that of course rests upon the fact the archives are accurate, and the counter to people saying the website cannot be trusted for that has been that there is no record of archived websites themselves being tampered with. If that is no longer the case then the stated reason for the website being reliable for accurate snapshots of sources would no longer be valid.”
How can you trust that the page that Archive.today serves you is an actual archive at this point?
Oh dear.
> How can you trust that the page that Archive.today serves you is an actual archive at this point?
Because no one has shown evidence that it isn't.
Wikipedia does not have a project page with this exact name.
I assume that is weasel words for 404 Not Found.
To https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment...
I read that up to the first "proof", https://web.archive.org/web/20260218135501/https://www.googl...
It lands "503 Service Unavailable No server is available to handle this request."
But this one is not credible either so...
Ars Technica just did the same - removed Nora from older articles. How can you trust Ars Technica after that?
I don't know what you're talking about re: Ars removing her name from old articles.
2. We learned about Nora's involvement from Patokallio. We learned about Nora's non-involvement... also from Patokallio. They could have reached a settlement with AT that includes hiding Nora's name.
3. Regardless of who Nora is, it is interesting to see the extent of this censorship: so far only gyrovague.com and arstechnica.com, but not tomshardware.com and not tech.yahoo.com. This shows which sites are working closely with the AT defamation campaign, and which are simply copying the news feed.
If AT is appropriating some random person's name as an alias, it seems helpful to report on that publicly in order to expose the practice and help clear up the misinformation.
One with the title 'Archive.today CAPTCHA page executes DDoS; Wikipedia considers banning site'
I'll try to add the link with comment edit:
This one has Nora's name: https://web.archive.org/web/20260210195502/https://arstechni...
The current version does not.
I am lost here. It is definitely an organized defamation campaign.
“You are guilty simply because I am hungry”
And again, the accusation against Archive.today isn't just that they removed their "Nora" alias from a snapshot, but that they replaced it with the name of the blogger they were quarreling with. There's no defensible reason to do that outside of petty revenge (which tracks with the emails and public statements from the Archive.today maintainer).
Oh, yes, by removing the name in the context of "Streisand Effect".
> petty revenge
How does it "revenge"? Was it a porn page? Or something bad?
It is likely to be just a funny placeholder name of the same length to come in mind.
--
We could find good and bad motives for both AT and Ars.
The bias against AT was here a priori: the paywall story for Condé Nast, russophobia for the rest.
The porn smear threats came later, via email.
How does the tech behind archive.today work in detail? Is there any information out there that goes beyond the Google AI search reply or this HN thread [2]?
[1] https://algustionesa.com/the-takedown-campaign-against-archi...
[2] https://news.ycombinator.com/item?id=42816427
They also tampered with their archive for a few of the social media sites (Twitter, Instagram, Blogger) by changing the name of the signed in account to Jani Patokallio. https://megalodon.jp/2026-0220-0320-05/https://archive.is:44...
I think Wikipedia made the right decision; you can't trust an archival service for citations if, every time the sysop gets into a row, they tamper with their database.
I assume it must be a blanket ban on Finnish IPs as there has been comments about it on Reddit and none of my friends can get it to work either. 5 different ISPs were tried. So at the very least it seems to affect majority of Finnish residential connections.
That's awesome. I wish everyone made sure of their facts. Thanks.
Now it's obviously possible that my VPN was whitelisted somehow, or that the GeoIP of it is lying. This is just a singular datapoint.
setInterval(function(){fetch("https://gyrovague.com/tag/"+Math.random().toString(36).subst...",{ referrerPolicy:"no-referrer",mode:"no-cors" });},1400);
https://archive-is.tumblr.com/post/808911640210866176/people...
archive.org also complies with takedown requests, so it's worth asking: could the organised campaign against archive.today have something to do with it preserving content that someone wants removed?
It would be interesting to run the numbers, but I get the feeling that AI-generated articles may have a higher LIX number. Authors are then less inclined to "fix" the text, because longer words make them seem smarter.
There is so much archived there that to lose it all would be a tragedy.
owner-archive-today . blogspot . com
2 years old, like J.P.'s first post on AT
They also cannot hijack data with a residential botnet or buy subscriptions themselves. Otherwise, the saved page would contain information about the logged-in user. It would be hard to remove this information, as the code changes all the time, and it would be easy for the website owner to add an invisible element that identifies the user. I suppose they could have different subscriptions and remove everything that isn't identical between the two, but that wouldn't be foolproof.
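For example, a site could serve a hypothetical canary like this (currentUser stands in for whatever session object the backend exposes; invisible on screen, but baked into any saved HTML):

  // Invisible in the rendered page, present in the archived markup:
  // any shared snapshot betrays whose subscription fetched it.
  const canary = document.createElement("span");
  canary.style.display = "none";
  canary.dataset.subscriber = currentUser.id; // assumed session object
  document.body.appendChild(canary);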
https://megalodon.jp/2026-0221-0304-51/https://d914s229qk4kj...
https://archive.is/Y7z4E
The second shows volth's GitHub notifications. Volth was a major nixpkgs contributor, but his GitHub account disappeared.
https://github.com/orgs/community/discussions/58164
This particular addon is blocked on most western git servers, but can still be installed from Russian git servers. It includes custom paywall-bypassing code for pretty much every news website you could reasonably imagine, or at least those sites that use conditional paywalls (paywalls for humans, no paywalls for big search engines). It won't work on sites like Substack that use proper authenticated content pages, but these sorts of pages don't get picked up by archive.today either.
My guess would be that archive.today loads such an addon with its headless browser and thus bypasses paywalls that way. Even if publishers find a way to detect headless browsers, crawlers can also be written to operate with traditional web browsers where lots of anti-paywall addons can be installed.
Thanks for sketching out their approach and for the URI.
https://www.reddit.com/r/Advice/comments/5rbla4/comment/dd5x...
The way I (loosely) understand it, when you archive a page they send your IP in the X-Forwarded-For header. Some paywall operators render that into the page content served up, which then causes it to be visible to anyone who clicks your archived link and Views Source.
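Nothing fancy is needed on the publisher's side for that leak; a sketch of an origin that bakes the header into its output:

  // If the origin renders X-Forwarded-For into the HTML, the archiving
  // user's IP ends up frozen inside the snapshot for anyone who views source.
  const http = require("node:http");
  http.createServer((req, res) => {
    const xff = req.headers["x-forwarded-for"] || req.socket.remoteAddress;
    res.end("<!-- served to " + xff + " --><p>article text</p>");
  }).listen(8080);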
I’m guessing by using a residential botnet and using existing credentials by unknowingly ”victims” by automating their browsers.
> Otherwise, the saved page would contain information about the logged-in user.
If you read this article, there's plenty of evidence they are manipulating the scraped data.
But I’m just speculating here…
I guess if they can control a residential botnet more extensively they would be able to do that, but it would still be very difficult to remove login information from the page. The fact that they manipulated the scraped data for totally unrelated reasons a few times proves nothing, in my opinion.