By the way, one of my favorite pastimes is to download the LaTeX source for papers on arXiv and read all the commented-out stuff.
% we should make sure this theorem is actually true
A lot of physics journals do. Anything ending in "Letters" (e.g. Applied Physics Letters, Physical Review Letters).
Science has a word limit per article.
Nature doesn't have a hard limit, but if it exceeds 6-8 pages, it needs to be exceptional.
“ArXiv declares independence from Cornell” (science.org)
811 points | 3 months ago | 291 comments
If Google just wanted them to exist and didn't care about profiting off of the search traffic they wouldn't partner with Mozilla.
I submit to open things because I want my material to be openly available. If I wanted restrictions, I would submit to gated journals.
This isn't me siding with AI companies by the way; it's a slippery slope argument.
Sometimes those two are in conflict, such that it will not be possible to satisfy both simultaneously.
It's also good that it doesn't gatekeep with the paywalls that you can pretty much only afford by affiliating yourself with a toll-paying institution.
Obviously, there are plenty of flaws with this system:
1. If you're associated with a brand (e.g., Google, MIT) or have a recognizable co-author (e.g., Yann LeCun), you'll get attention and citations no matter what.
2. "Vouching" can also just mean accepting someone's email request without ever having met or known them.
3. It puts the effort on the readers to decide whether each paper is valuable, and particularly scientifically valuable, for which most readers will be unequipped.
4. "Minimal standards" can be gamed by AI-generated submissions.
I'd love to see a synthesis of arXiv, open-access publishing and artifact reviews, like the following:
- Have a number of reviewers on retainer, or design a reward system similar to bug bounties. The reward mechanism probably shouldn't be based on money or allow a winner-takes-all strategy.
- Have a number of badges with respect to the quality and value of the paper. For example: validated by peers (i.e., reviewed by at least 3 peers with minimum borderline accept consensus), valuable (i.e., reviewed by at least 5 peers with a valuable indicator), etc.
- Allow vouched comments on the platform, and moderate for self-promotion, toxicity, etc. Obviously a big ask.
- Improve the "vouching" system, or add badges like "vouched by X people" or "vouched by established scientist".
Hope their new organization will implement some of these improvements.
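The badge thresholds in the list above could be sketched roughly like this. It is only one reading of the proposal: the recommendation scale, the `Review` fields and the `badges` function are made-up names, "consensus" is taken to mean every reviewer is at borderline accept or better, and "valuable" is taken to mean at least 5 distinct peers set the valuable flag.

```python
from dataclasses import dataclass

# Hypothetical recommendation scale; labels and ordering are assumptions.
SCORES = {
    "reject": 0,
    "borderline_reject": 1,
    "borderline_accept": 2,
    "accept": 3,
}


@dataclass(frozen=True)
class Review:
    reviewer: str
    recommendation: str      # one of the keys in SCORES
    valuable: bool = False   # the "valuable indicator" from the proposal


def badges(reviews):
    """Return the set of badges a paper earns from its reviews."""
    # Count each reviewer once, keeping only their most recent review.
    latest = {r.reviewer: r for r in reviews}
    unique = list(latest.values())

    earned = set()

    # "validated by peers": at least 3 peers, all at borderline accept or better.
    threshold = SCORES["borderline_accept"]
    if len(unique) >= 3 and all(SCORES[r.recommendation] >= threshold for r in unique):
        earned.add("validated by peers")

    # "valuable": at least 5 distinct peers flagged the paper as valuable.
    if sum(1 for r in unique if r.valuable) >= 5:
        earned.add("valuable")

    return earned
```

Keeping badges as a derived set, rather than a stored flag, means a later negative review can take a badge away again without any extra bookkeeping.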
- Organise peer feedback
- Publish the work
- Recognise good work, helping with both discovery and credit
That latter part especially is what allows publishers to charge the ridiculous markup that they do.
But with "modern" technology, feedback and publishing really don't require all that infrastructure: email and arXiv can easily be used to self-organise that. So we built a system of recognition that does not block publication and can be used as a layer on top of arXiv and any other venue, allowing peers to vouch for ("endorse") a work.
I had even proposed and implemented an integration for arXiv Labs that got accepted, but then never merged. I should follow up on that...
You say it as if the replication crisis doesn't exist and publish-or-perish is not a thing.
Removing this (often very basic) peer review doesn't somehow fix the problem. The solution lies in more thorough reviews and replication studies, not in everyone deciding for themselves.
Examples:
“It’s now difficult to prepare for the world three months from now if the median LLM-produced computer science paper is better than that produced by the median grad student.”
https://news.cornell.edu/stories/2026/06/digital-research-re...
The heel turn to unlimited for-profit was only possible because of their unique structure and the fact that they were already selling commercial products. arXiv is not selling anything, so there's no financial incentive to take over.
Despite the imperfections, I found arXiv indispensable for my research. In particular, mathematics has a slow peer review cycle (it's hard to read and understand, and many referees require that they fully understand a paper to accept it, which imo is a little flawed, but that's the culture). I had several papers that were under review for more than a year (single journal, only one round of revisions), and arXiv was my only showcase. Both works ended up very highly cited, but publication delays would have been an even bigger problem if arXiv wasn't there.
But most researchers and grad students (like me) often subscribe to a daily mailing list of the papers dropping that day in their particular field. Skimming the paper titles and then opening the ones relevant to you is a morning ritual for many.
To view a specific paper, just take original link and change "arxiv" --> "alphaxiv". For example: https://www.alphaxiv.org/abs/1706.03762
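If you want to script that swap, something like the following should do. It is a minimal sketch: `to_alphaxiv` is a made-up helper name, it assumes the original link lives on `arxiv.org`, and it uses the `www.alphaxiv.org` host from the example link above. Whether alphaXiv mirrors anything beyond `/abs/` paths is not something I've checked.

```python
from urllib.parse import urlsplit, urlunsplit

# Host taken from the example link in the comment above.
ALPHAXIV_HOST = "www.alphaxiv.org"


def to_alphaxiv(url: str) -> str:
    """Swap the arXiv host for the alphaXiv host, keeping the rest of the URL."""
    parts = urlsplit(url)
    # Only rewrite links that actually point at arXiv (assumed host names).
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv link: {url}")
    return urlunsplit(
        (parts.scheme, ALPHAXIV_HOST, parts.path, parts.query, parts.fragment)
    )


print(to_alphaxiv("https://arxiv.org/abs/1706.03762"))
# prints https://www.alphaxiv.org/abs/1706.03762
```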
https://www.scholar-inbox.com/landing
It is a recommendation system for new papers that come out each day. If you train it a bit by specifying what you like and don't like you'll get a pretty reliable feed.
You can find it here: https://bsky.app/profile/arxiv-daily-bot.bsky.social
[1] https://scirate.com/arxiv/quant-ph
Supposing of course your field roughly matches one of the categories.
I kept it up out of habit for a year after grad school. Then moved on.
In other words, arXiv is what you use when you want to inform yourself on new research; conferences are for furthering your career by getting closer to your PhD graduation, expanding your CV, etc., and then for networking and mingling with researchers in person and trying to get hired.
I really like the idea. In short: arXiv, HAL and similar sites host the papers without any peer review (short of perhaps stopping crank spam) or access control. They're freely available to anyone. Authors then submit arXiv IDs (or similar) to the reviewers of "overlay journals", which then review and accept or not. The overlay journal accepts a paper by just adding it to its list of accepted arXiv identifiers, and that's that.
This ensures accessibility for all, keeps peer review, yet takes away a lot of the practical hurdles of actually running a journal. A journal can now just be a group of people who give thumbs up or down to arXiv identifiers, and if that group's conclusions start carrying weight in the community, then it has become an important journal. Maybe they give away their listings for free, maybe they charge to read the reviews; it's really up to them what the business model (if any) will be.
It's really nice.
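To show how little machinery that needs, here is a toy sketch of an overlay journal as nothing more than a set of accepted arXiv identifiers. The `OverlayJournal` class, the journal name and the strict-majority rule are all invented for illustration; a real overlay journal would have its own review process.

```python
from dataclasses import dataclass, field


@dataclass
class OverlayJournal:
    name: str
    # The journal's entire "content": the arXiv IDs it has accepted.
    accepted: set = field(default_factory=set)

    def review(self, arxiv_id: str, votes: list) -> bool:
        """Accept the paper if a strict majority of reviewers vote thumbs up.

        `votes` is a list of booleans, one per reviewer (True = thumbs up).
        The majority rule is an assumption, not part of the overlay idea.
        """
        accepted = sum(votes) * 2 > len(votes)
        if accepted:
            self.accepted.add(arxiv_id)
        return accepted

    def includes(self, arxiv_id: str) -> bool:
        """Being "in" the journal just means being on its list."""
        return arxiv_id in self.accepted


journal = OverlayJournal("Hypothetical Overlay Letters")
journal.review("1706.03762", [True, True, False])
print(journal.includes("1706.03762"))  # prints True
```

The paper itself never moves: it stays on arXiv, and the journal only stores the identifier.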
Papers “being in” a journal hasn’t made sense for a long time, but curation is valuable as is staking reputation on something.
People I was with called some of this “badges”. There is no reason why a paper cannot be reviewed by one set of people who say “this is new and innovative stuff in the field and highly important if true, but we’re not making claims about the stats”, a different set able to say “the stats here are spot on but we don’t know how relevant it is in biology”, and another to say “we can rerun the code and get the same analysis results out, but we don’t know if the analysis is doing anything useful”. Right now we have journals making some combination of claims, and authors have to pick a single journal.
Once you view journals as a list of papers, the exclusivity seems weird. Once you see that journals are then a set of identifiers added to a paper, or rather statements about a paper, there are lots of interesting ways to imagine something more useful than current publishing.
It doesn’t need much funding or staff, and I'm not quite sure why they’re going through all this rigmarole and independence. I almost think they’d be better off like Apache, where there are very few employees.
My point is that a LaTeX PDF can launder epistemic status. An unreviewed argument starts to look like established research merely because it adopts the visual grammar of a paper.
It's fairly rigid, and newcomers often complain that it's too repetitive, but if you read such papers for years, you learn to very quickly navigate a paper that adheres to these conventions and you quickly see if it's something you care about right now or not. Blog posts don't have the same formal structure, which makes quick skimming and assessment much harder.
My point is it's still useful to have a somewhat authoritative place to cite (high quality) blog post level content. arXiv has formatting requirements and doesn't go down like random personal sites.
> a LaTeX PDF can launder epistemic status
True to a certain extent, although it's something people are aware of, and they can judge the content themselves (hopefully).
Based on how arXiv papers get boosted around on social media, I don't believe this to be the case.
Also, because most folks don't want to deal with paywalls, it's standard practice to put the last version of your draft before conditional acceptance on an online repository. It used to be SSRN for econ/finance, but they sold out to Elsevier, so now arXiv is increasingly being used.
I suggest knowing some people who have written works for peer review and done peer review themselves.
Some people outside academia give peer review quite the undeserved aura.
There's a lot of trash on arXiv; how much of it is in your diet should depend on your ability to evaluate the quality of research.
arXiv users are the peers doing the review.
"Peer review" existed for centuries before journals created their own bad for-profit version.