The “small web” is bigger than you might think
523 points by speckx 2 days ago | 216 comments

susam 2 days ago
A little shell function I have in my ~/.zshrc:

  pages() { for _ in {1..5}; do curl -sSw '%header{location}\n' https://indieblog.page/random | sed 's/.utm.*//'; done }
Here is an example output:

  $ pages
  https://alanpearce.eu/post/scriptura/
  https://jmablog.com/post/numberones/
  https://www.closingtags.com/blog/home-networking
  https://www.unsungnovelty.org/gallery/layers/
  https://thoughts.uncountable.uk/now/
On macOS, we can also automatically open the random pages in the default web browser with:

  $ open $(pages)
Another nice place to discover independently maintained personal websites is: https://kagi.com/smallweb
reply
unsungNovelty 18 hours ago
Hey!!!!!

That is my website! To be fair, the hard part is keeping a personal website regularly updated without making people think it's abandoned. I don't have a regular post cadence, so it looks like I don't touch the website at all for months. But I regularly update my posts and other sections even if there aren't any new posts.

I also wrote something similar to OP - https://www.unsungnovelty.org/posts/10/2024/life-of-a-blog-b...

And I'd also like to mention https://marginalia-search.com/ which is a small OSS search engine I have been using more and more these days. I find it great for finding IndieWeb / Small Web content.

reply
SyneRyder 3 hours ago
Not sure if this will be considered helpful, but if you include:

<link rel="alternate" type="application/atom+xml" href="https://www.unsungnovelty.org/index.xml" />

in the HEAD of the pages on your website, it makes autodiscovery of the RSS feed a bit easier - not just for crawlers, but also for people with RSS plugins in their browser. It will make the RSS icon appear in their browser's URL field for easy subscription. Took me a while to find the RSS link at the bottom of your pages!

reply
thesuitonym 8 hours ago
For my part, if I come across a personal site that hasn't been updated in a few months, I don't assume it's abandoned, just that the person hasn't had anything to say for a while. I'd rather see a site with updates every few months, or even once or twice a year, than one with an update every other week saying "Sorry I haven't updated."
reply
mikestorrent 3 hours ago
Now this is what makes me feel the Small Web... creators randomly showing up like this on HN. I feel like I used to see this kind of thing more.
reply
sylware 9 hours ago
Sadly this search engine is now javascript only. So much for the "small" web...
reply
SyneRyder 3 hours ago
If that's an issue, and if you don't mind building something out yourself, Marginalia have an excellent API that you can connect to from your own personal non-Javascript meta-search engine. I did that, and I find Marginalia awesome to deal with. They're one of my favorite internet projects.

(Also, thanks for reminding me that it was time I donated something to the Marginalia project: https://buymeacoffee.com/marginalia.nu )

reply
unsungNovelty 7 hours ago
Couple of things.

1. No. It's not javascript only. https://old-search.marginalia.nu/ is still available. It is also mentioned in https://about.marginalia-search.com/article/redesign/ as going to be there for a very long time.

2. I don't think just using javascript makes it bad. It's a very nice site now. I prefer it to the old version. My website doesn't use JS for any functionality yet, but I've never said never either. The reason to use JS hasn't arisen yet. The day it does, I will use it.

But I understand the sentiment though. I used to be a no js guy before. But I've been softened by the need to use it professionally only to think --- hmmm, not bad.

reply
Noumenon72 13 hours ago
They barely mentioned your website (fourth of five URLs, mainly talking about indieblog.page and kagi.com/smallweb), so "That is my website!" is confusing and makes it seem like you're autoresponding to a keyword.
reply
unsungNovelty 12 hours ago
Why should I auto-respond to a keyword? Just curious seeing it here buddy. Breathe easy.
reply
jibal 5 hours ago
Yeah, it didn't confuse sensible people who are capable of putting themselves in someone else's shoes.

I did a Small Web search at Marginalia and was immediately pointed to sites that claim that I and everyone in my political party are literally the spawn of Satan--I really don't think it's my thing.

I helped develop the ARPANET back in 1969-1970 while working for the UCLA Comp Sci dept, got a brief mention in RFC 57, hold several network patents, and was on usenet before the usenix conference where we voted to call it that ... I'm bemused by all the people who claim that boomers are technologically inept (I think they have us mixed up with our parents). Anyway it's been a heck of a wild ride and didn't end up quite how JCR Licklider envisioned it.

reply
mikestorrent 3 hours ago
You sound like someone with stories.... got any I can read?
reply
jibal 3 hours ago
Nothing offhand that I can share. But take a look at https://www.goodreads.com/book/show/281818.Where_Wizards_Sta...
reply
speefers 11 hours ago
[flagged]
reply
mikestorrent 3 hours ago
Get over yourself
reply
ddtaylor 20 hours ago
For anyone curious, it's the same on Linux, except use xdg-open like this:

  $ xdg-open $(pages)
reply
sdoering 24 hours ago
This is so lovely. Just adopted it for Arch, and set it up so that I can just type `indy n` (with "n" being any number) and have it open n pages in my browser.
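For anyone who wants to replicate this, a rough sketch of such a wrapper (the function name, default count, and URL cleanup are assumptions based on the parent comment, not the commenter's actual code):

```shell
# Hypothetical helper: strip the "?utm_..." tracking suffix that
# indieblog.page appends to its redirect targets.
strip_utm() {
  sed 's/[?&]utm_.*//'
}

# Hypothetical wrapper: open N random small-web pages (default 1) in the
# default browser. Uses xdg-open on Linux; swap in `open` on macOS.
indy() {
  local n="${1:-1}" url
  for _ in $(seq "$n"); do
    url=$(curl -sS -o /dev/null -w '%header{location}' https://indieblog.page/random | strip_utm)
    if [ -n "$url" ]; then
      xdg-open "$url"
    fi
  done
}
```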

Thanks for sharing.

reply
robgibbons 22 hours ago
`indy 500`
reply
drob518 22 hours ago
And I thought I keep a lot of tabs open...
reply
matheusmoreira 17 hours ago
These curated discovery services require RSS and Atom feeds. My site doesn't even have those. Looks like I'm too small for the small web.
reply
gzread 17 hours ago
Same here, but I'm considering adding it. I already have HTML, so it can't be that hard to add another format. More onerous is needing to write a blog post at least once a week.

And woe betide thee whose website isn't a blog.
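For what it's worth, the format is small enough to write by hand; a minimal valid Atom feed looks something like this (names, URLs, and dates are placeholders):

```xml
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <link href="https://example.net/"/>
  <link rel="self" href="https://example.net/atom.xml"/>
  <id>https://example.net/</id>
  <updated>2025-01-01T00:00:00Z</updated>
  <entry>
    <title>A post</title>
    <link href="https://example.net/posts/a-post/"/>
    <id>https://example.net/posts/a-post/</id>
    <updated>2025-01-01T00:00:00Z</updated>
    <summary>One or two sentences of teaser text.</summary>
  </entry>
</feed>
```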

reply
thesuitonym 8 hours ago
That's big web thinking. A small website doesn't need to be discoverable. It's supposed to be for you, and if someone else stumbles upon it, and finds it useful or entertaining, that's a bonus.
reply
encom 7 hours ago
I get the sentiment, but it has to be discoverable to some extent, otherwise there's no real point in publishing it on a webserver.
reply
oooyay 2 days ago
Caveat: Kagi gates that repo such that it doesn't allow self-submissions, so you're only going to see the chunk of websites submitted by other people who also know about the Kagi repo.
reply
mxuribe 2 days ago
But per the instructions, it seems that if one wants to add one's own website, one needs to add 2 other small websites (that are not on the list already)... so technically it does open things up to those who are not aware of the repo, assuming their site is pulled in when someone wants to add their own website. Obviously this scales slowly... but I think that's kinda the point, eh? Nevertheless, for every 1 person wanting to add their stuff, 2 others would technically get added, I guess.

See: https://github.com/kagisearch/smallweb?tab=readme-ov-file#%E...

reply
skrtskrt 22 hours ago
Yeah I’ve added my own site along with 3 others and the PR was merged in an hour.

Honestly the hard part was that a lot of the sites I wanted to submit were already there!

reply
ramblin_ray 10 hours ago
Thanks for the info!

If anyone wants to join up and add our sites together, here's mine:

https://yesteryearforever.xyz/

reply
dwedge 13 hours ago
That top one only updates once a year. Not saying that as a criticism, just noting how lucky he was to update recently enough to end up in this top comment.
reply
viscousviolin 2 days ago
That's a lovely bit of automation.
reply
deadbabe 21 hours ago
What do you mean automation, he’s not even using any AI agent!
reply
postalcoder 18 hours ago
Multiple layers of curation work really well. Specifically, using HN as a curation layer for Kagi's small web list. I implemented this on https://hcker.news. People who have small web blogs should post them on HN; a lot of people follow that list!
reply
nvardakas 12 hours ago
This is great. There's something refreshing about discovering websites by accident instead of through an algorithm optimizing for engagement. I've been spending more time on personal blogs lately, and the writing quality is consistently better than anything on Medium or Substack, probably because nobody's writing for SEO.
reply
varun_ch 2 days ago
A fun trend on the "small web" is the use of 88x31 badges that link to friends' websites or form webrings. I have a few on my website, and you can browse a ton of small web websites that way.

https://varun.ch (at the bottom of the page)

There are also a couple of directories/network graphs: https://matdoes.dev/buttons and https://eightyeightthirty.one/

reply
101008 2 days ago
A beautiful trend that has been going for 30 years ;-)

One of the happiest moments of my childhood (I'm exaggerating) was when my button was placed on that website that I loved to visit every day. It was one of the best validations I ever received :)

reply
skciva 24 hours ago
What inspired me to pursue computer related fields was making little badges and forum signatures in Photoshop as a teen. Heartwarming to see this tradition has persisted
reply
Terr_ 24 hours ago
I can't be the only one with an ancient collection of artistically-mismatched "under construction" graphics.
reply
FinnKuhn 22 hours ago
Not mine, but I don't think you are: http://www.textfiles.com/underconstruction/
reply
wkjagt 12 hours ago
Oh boy, I do miss the days of things being under construction. It meant people were actually constructing things.
reply
lagniappe 22 hours ago
My favorite is the 'AOL Sucks!'
reply
technothrasher 11 hours ago
My similar happy childhood moment was when my home page made the Netscape "Rants and Raves" page for my extensive tribute to Lindsey Wagner (the actress who played the Bionic Woman), and that leading to my local newspaper interviewing me for an article on what the heck this "World Wide Web" was. I went on and on about how the web was revolutionary as an equalizer, allowing anybody to publish and actually be heard without the old barriers to entry. Sounded good, but the web hasn't exactly fully lived up to my vision.
reply
Terretta 10 hours ago
Pamphleteering has a storied tradition. Self-publishing remains accessible today.

What confuses me are the reflexive "why would I publish if I'm not getting the ad revenue" and "why would anyone take their time w/o getting paid" type remarks.

Same comments about music: nobody will record songs without getting paid. And games: what's even the point in playing a shooter without dropping loot?

The last one encapsulates the whole problem well.

Over on /r/division2 a majority of players are baffled by a one month only "Realism" mode (all March, worth trying!) that turns off loot boxes and loot drops from tangos. You can solo or co-op the Division 2 Warlords of New York expansion, set in Manhattan, receiving a couple additional base weapons and weapon mods each mission completed. It's refreshing to enjoy beating scenarios while liberated from opening every scrap pile on the street then sorting through inventory for hours.

Gamers on reddit seem universally convinced the gameplay loop for a tactical PvE shooter should be about getting the next loot, rather than executing a mission cleanly or enjoying a strategically cooperative evening with friends defeating a zip code and its boss.

"I won't play a game that's not rewarding." "I won't write a song that doesn't make me a millionaire." "I won't capture my thoughts on a subject unless I get $0.003 an eyeball."

Somewhere we lost just enjoying the play.

reply
kalavan 8 hours ago
Sounds like classic crowding-out of intrinsic motivation.

There's a story, I can't find the page at the moment, of someone who was getting pranked all the time (his house TPed or egged or something). So he offered the miscreants $1 to do it tomorrow. He kept on doing it like this, and then a few days later, he offered a quarter. By the time he had got down to a dime, they said "there's no way we're going to do it for such a measly sum" and left.

Better-sourced examples also exist: a proposed nuclear waste repository in Switzerland enjoyed less support among citizens when they were offered compensation: https://www.bsfrey.ch/wp-content/uploads/2021/08/crowding-ef... p. 96 (sixth page of the PDF).

reply
cindyllm 8 hours ago
[dead]
reply
101008 7 hours ago
> What confuses me are the reflexive "why would I publish if I'm not getting the ad revenue" and "why would anyone take their time w/o getting paid" type remarks.

I published free content on the internet during the 90s and early 2000s, so I lived through that moment when you write something just for the pleasure of it. What I think changed is that back then, it was you and your keyboard, and that was your gun. The best content (that is, the best idea + writing) won. People would share in forums, MSN, emails, with friends, etc. It was more democratic in the sense that we were all equal.

Today that doesn't work anymore. You can write a very good piece but no one will discover it, because the behaviour has changed. You'll probably have to invest in ads, or already be known in the topic, etc. And I'm talking about before AI; with all the AI noise/slop/content, it's impossible today. So if I am going to fight against big media who are also writing shitty content about the same topics, or Instagram influencers who are posting silly memes, and I need to invest money, I may as well try to earn something back.

PS: I may write an article about it.

reply
NooneAtAll3 24 hours ago
my main problem with such links is... how often do you update them? how often do you check those websites to see that they're still active?

I remember going through all the blogs linked on terry tao's blog - out of like 50 there were only 8-ish still alive :(

reply
susam 22 hours ago
I don't use 88x31 buttons but I do maintain an old-fashioned blogroll on my personal website: https://susam.net/roll.html

I follow the same set of websites with my feed reader too. There is an OPML file at the end of that page that I use with my feed reader. I keep the list intentionally small so that I can realistically read every post that appears on these websites on a regular basis.

Although I usually read new posts in my feed reader, I still visit each website on the list at least, roughly, once a month, just to see these personal sites in their full glory. These are blogs I have been following for years, in fact some of them for a couple of decades now! So when a new post appears on one of these websites, I make time to read it. It is one of the joys of browsing the Web that I have cherished ever since I first got online on the information superhighway.

Keeping the list small also makes it easy for me to notice when a website goes defunct. Over all these years a few websites did indeed sadly disappear, which I then removed from my list.
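For anyone unfamiliar with OPML, a minimal blogroll file looks something like this (the title and URLs are placeholders, not the actual contents of my file):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <head>
    <title>Blogroll</title>
  </head>
  <body>
    <outline type="rss" text="Example Blog"
             xmlUrl="https://example.net/feed.xml"
             htmlUrl="https://example.net/"/>
  </body>
</opml>
```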

reply
bung 17 hours ago
gotta be using pixel fonts for those, they been around for 25 years, actually readable at 8px lol
reply
dudefeliciano 11 hours ago
but why link to apple.com and vercel.com?
reply
8organicbits 2 days ago
One objection I have to the kagi smallweb approach is the avoidance of infrequently updated sites. Some of my favorite blogs post very rarely; but when they post it's a great read. When I discover a great new blog that hasn't been updated in years I'm excited to add it to my feed reader, because it's a really good signal that when they publish again it will be worth reading.
reply
pixodaros 22 hours ago
One of the many things I disagree with Scott Alexander on is that, to me, frequent blog updates signal poor quality, not excellent writing. It's hard to come up with an independent, evidence-based opinion on something worth sharing every week, but easy to post about whatever you read lots of angry or scary posts about. People who post a lot also tend to have trouble finding useful things to do in their offline life. It is very unusual that he managed to be both a psychiatrist and a prolific blogger, and he quit the psychiatry job before he had children or other care responsibilities.
reply
chrneu 16 hours ago
I have a "frequent post" section of my blog and a "deeper" section. Unless you're interested in the frequent posts they aren't in your face on my blog. It's kind of a best of both worlds type thing.

The frequent posts also let me quickly try out new methods of telling stories or presenting information or new techniques. I think this tends to speed up how often I post larger effort things cuz I can practice skills with frequent posts.

A good comparison would be a youtuber with a patreon. The YouTube channel gets the produced media, whereas the Patreon gets "cell phone in the moment" updates.

But I totally agree that when folks are reaching for things to post about, that can be problematic and annoying.

reply
baud147258 8 hours ago
> frequent blog updates signal poor quality not excellent writing

It might be true, but there are exceptions, like acoup (history-focused), which is written by an ancient history professor.

reply
userbinator 15 hours ago
[dead]
reply
oopsiremembered 2 days ago
I'm with you. Also, sometimes I'm specifically looking for some dusty old site that has long been forgotten about. Maybe I'm trying to find something I remember from ages ago. Or maybe I'm trying to deeply research something.

There's a lot more to fixing search than prioritizing recency. In fact, I think recency bias sometimes makes search worse.

reply
freediver 2 days ago
To clarify, the criterion is less than 2 years since the last blog post.
reply
senko 10 hours ago
You may want to clarify that on https://github.com/kagisearch/smallweb because the README there says:

> Blog has recent posts (<7 days old)

This may be different from the inclusion criteria for websites in general, but on first read it looks like a site has to be very active.

I might have missed something while skimming it, but would assume others would miss it as well.

reply
SyneRyder 3 hours ago
There are two criteria; I agree it's hard to skim:

* The blog must have a recent post, no older than 12 months, to meet the recency criteria for inclusion.

* Criteria for posts to show on the website: Blog has recent posts (<7 days old), The website can appear in an iframe

The latter criterion is for the website / post to appear in Kagi's random Small Web feature, where they display the blog post in an iframe. (So I think only posts from the last week are displayed there.) Being on the list should ensure that any new posts can be displayed in Small Web, though, and presumably that the website is indexed in Kagi's Teclis index as well. At least, I really hope that the Teclis index includes all of those old blog posts too, and doesn't discard them.

EDIT: I just realized freediver actually is Vladimir - I'd love to know if Teclis does index all those older blog posts too. I assume it does index everything that is still present in the RSS feeds?

reply
senko 2 hours ago
Thank you. I swear I read that three times and missed the other criteria until you pointed it out and then I found it. :/
reply
est 23 hours ago
Also, Kagi excludes non-English sites. Sad for mixed-language blogs like mine.
reply
8organicbits 9 hours ago
I know Kagi doesn't do it, but it is possible to specify a language in the feed (xml:lang) so that a feed reader can filter out languages the user doesn't understand from multi-language feeds. One challenge is that lots of bloggers forget to add that tag.
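For reference, in Atom the attribute can sit on the feed root as a default and be overridden per entry, which is exactly what per-post filtering of a mixed-language feed would need (URLs and titles are placeholders):

```xml
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <!-- feed-level default language: English -->
  <entry xml:lang="pt">
    <!-- this entry is marked Portuguese and could be filtered out -->
    <title>Um post em português</title>
    <id>https://example.net/posts/um-post/</id>
    <updated>2025-01-01T00:00:00Z</updated>
  </entry>
</feed>
```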
reply
gzread 17 hours ago
start a small web directory for your language!
reply
freediver 2 days ago
Kagi Small Web has about 32K sites and I'd like to think that we have captured most of the (English-speaking) personal blogs out there (we are adding about 10 per day, and a significant effort went into discovering/finding them).

It is kind of sad that the entire size of this small web is only 30k sites these days.

reply
flir 2 days ago
Suspect there's a long tail/iceberg you still haven't captured (source: you haven't found me yet and I'm not hiding, I'm just not chasing SEO).
reply
eichin 14 hours ago
Same - but mine are also primarily so I can hand out links to specific articles - they're not hidden but they're not advertised either (and they're static sites with almost zero logging, so I wouldn't really notice either except that this site has a published list :-)
reply
freediver 2 days ago
I am happy to hear this.
reply
jopsen 24 hours ago
> I'd like to think that we have captured most of (english speaking) personal blogs

I think that's naive.

But maybe that's just because my blog wasn't on the list :)

reply
boxedemp 5 hours ago
Neither was either of mine, but I don't advertise them and specifically don't post them on social media
reply
krapp 24 hours ago
Neither is mine, but that's fine with me.
reply
freediver 17 hours ago
That is about to change :)
reply
Melatonic 14 hours ago
Hell yeah !
reply
aquova 2 days ago
What methods are you using to find them? I notice my own doesn't appear, although it does show up well under some (very niche) Google search terms. I suspect there's the potential for an order of magnitude more sites than have been found.
reply
freediver 22 hours ago
Checking HN every day to see if something interesting surfaces :)
reply
famahar 14 hours ago
I noticed that Kagi Small Web tends to lean towards more tech focused blogs. So it feels more like you've captured that subset of the small web, especially if your main source is hackernews.

Not sure if you've used this as a source too but there's a lot of tiny personal sites in this directory too. https://melonland.net/surf-club

reply
savolai 2 days ago
Does this use frames or iframes? https://kagi.com/smallweb

I would expect a raw link in the top bar to the page shown, to be able to bookmark it etc.

reply
susam 2 days ago
There is a '↗'-shaped icon in the navigation bar at the top. If you click on that it takes you to the original post in a new tab. On Firefox and Safari, you can also right click that icon and add the original post to the bookmarks.
reply
savolai 5 hours ago
Not visible on iphone xs/13 mini.
reply
gzread 17 hours ago
FYI frames are obsolete; framesets were dropped from the HTML5 spec, so everything current uses iframes.
reply
savolai 5 hours ago
reply
zahlman 2 days ago
Does this concept of "personal blog" include people periodically sharing, say, random knowledge on technical topics? Or is it specifically people writing about their day-to-day lives?

How would I check if my site is included?

reply
susam 24 hours ago
You can check: <https://github.com/kagisearch/smallweb/blob/main/smallweb.tx...>. I can see that your RSS URL is listed there.

But it currently does not appear in the search results here: <https://kagi.com/smallweb/?search=zahlman>. The reason appears to be this:

"If the blog is included in small web feed list (which means it has content in English, it is informational/educational by nature and it is not trying to sell anything) we check for these two things to show it on the site: • Blog has recent posts (<7 days old) [...]"

(Source: https://github.com/kagisearch/smallweb#criteria-for-posts-to...)

reply
mattlondon 22 hours ago
Why would you only include blogs in your small web index? That must be a minute fraction of what is out there?

I can't think of a single blog that I read these days (small or not), yet there are loads of small "old school" sites out there that are still going strong.

reply
susam 21 hours ago
> Why would you only include blogs in your small web index?

I am not associated with this project, so this would be a question for the project maintainer. As far as I understand, the project relies on RSS/Atom feeds to fetch new posts and display them in the search results. I believe this is an easier problem to solve than using a full-blown web crawler.

However, as far as I know, Kagi does have its own full blown crawler, so I am not entirely sure why they could not use it to present the Small Web search results. Perhaps they rely on date metadata in RSS feeds to determine whether a post was published within the last seven days? But having worked on an open source web crawler myself, many years ago, I know that this is something a web crawler can determine too if it is crawling frequently enough.

So yes, I think you have got a good point and only the project maintainer can provide a definitive answer.
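That said, the freshness check itself is cheap once a crawler or feed fetcher has a publication timestamp; a rough sketch, assuming GNU date (the function name and the 7-day window are illustrative, not Kagi's actual implementation):

```shell
# Hypothetical recency check for a "<7 days old" criterion: given a
# post's publication date (e.g. pulled from <pubDate> or <updated> in a
# feed), succeed if it falls within the last week.
is_recent() {
  local post_epoch now_epoch
  post_epoch=$(date -d "$1" +%s) || return 2  # unparseable date
  now_epoch=$(date +%s)
  [ $(( now_epoch - post_epoch )) -le $(( 7 * 24 * 3600 )) ]
}
```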

reply
gzread 17 hours ago
I think it includes anything that's in the form of a chronological list of posts and noncommercial.

If you made a website instead of a blog, well... you're excluded. It's the small blogosphere, not the small web

reply
Cyan488 2 days ago
I'm noticing sites that break the rules. I report (flag) them, is that useful or should I just PR to remove them?
reply
freediver 2 days ago
PR is better!
reply
gzread 17 hours ago
It doesn't have mine, because no rss
reply
pil0u 2 days ago
[dead]
reply
Peteragain 14 hours ago
I'm very keen on public libraries. I'm fortunate in that our village has a community run one, there is the county one, and I can get to The British Library. Why do these entities exist? A real question - not rhetorical. Whatever the answer, I am sure the same mechanism could "pay for" public hosting.
reply
pipeline_peak 4 hours ago
You want the government to fund the small web?
reply
zeusdclxvi 13 hours ago
Are you asking why public libraries exist?
reply
wink 13 hours ago
I don't want to be part of the "small web" - I want to be part of the web. If my stuff can't be found in a sea of a million ad-ridden whatever sites so be it, but I am not going out of my way to submit stuff to special search engines or web rings, I've been there in the 90s.
reply
DeathArrow 13 hours ago
My point also: I'd rather see the existing web transformed than be part of some obscure circles that not many people care about.
reply
plewd 12 hours ago
I doubt the web will allow itself to be transformed into our idealized version of it, so the question seems to just be: do you want to be part of the obscure circle or not?

Neither choice is right or wrong, but I like the idea of a cool community amidst the enshittification of the rest of the web.

reply
danhite 24 hours ago
Isn't this a simple compute opportunity? ...

> March 15 there were 1,251 updates [from feed of small websites ...] too active, to publish all the updates on a single page, even for just one day. Well, I could publish them, but nobody has time to read them all.

If the reader accumulates a small set of whitelisted keywords, perhaps selected via an optional tag cloud UI, then that estimated 1,251 likely drops to a single page (most days).

If you wish to serve that as noscript, it would suffice to partition in/visible content, e.g. by <section class="keywords ...">, and let the user apply CSS (or scripts via extension or bookmarklets) to reveal just their locally known interests.
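A minimal sketch of the whitelist step in shell (the keyword list and the one-item-per-line "title<TAB>url" input format are assumptions for illustration):

```shell
# Hypothetical personal whitelist, as an extended-regex alternation.
KEYWORDS='unix|gardening|retrocomputing'

# Keep only feed items whose line matches a whitelisted keyword,
# case-insensitively. Reads items on stdin, one per line.
filter_items() {
  grep -Ei "$KEYWORDS"
}
```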

reply
8organicbits 19 hours ago
The tag cloud part may be a challenge. Web feeds don't always tag their content.

I have a blog filter that does something similar (https://alexsci.com/rss-blogroll-network/discover/), but the UI I ended up with isn't great and too many things are uncategorized.

reply
danhite 15 hours ago
Kudos on your site effort and I immediately see your point.

In fact I took your topmost entry with no helpful site/update tags and dove in a little, to try to understand why an RSS-friendly blogger might not be passing along tags for better reader discovery.

Turns out my scarce info test case blogger has a mastodon that immediately lists all these tags about himself [I've stripped it down] ...

#FrontEnd Developer #CSS #Halifax #London #Singapore Technical writer and rabbit-hole deep-diver Former Organiser for https://londonwebstandards.org & https://stateofthebrowser.com Interests: #Bushcraft #Outdoors #DnD #Fantasy #SciFi #HipHop #CSS #Eleventy #IndieWeb #OpenSource #OpenWeb

I conclude that if he knew such site and post tags would be useful once they reached RSS, he'd probably make the tiny effort to wire up the descriptions.

Nonetheless I merely crawled links for a minute to find this info, so I imagine something like the free tier of the Cloudflare crawling API might suffice over time for a simplistic automated fix to hint-decorate blog sites.

I mean, given that we're not trying to recreate pagerank, but just trying to tip the balance in favor of desirable initial discovery.

reply
8organicbits 10 hours ago
Very cool.

Crawling related sites for tags could work (open graph tags on the website are another good source). I'm wary of mixing data across contexts though. A blog and a Mastodon profile may intend to present a different face to the world or could discuss different topics.

reply
afisxisto 2 days ago
Cool to see Gemini mentioned here. A few years back I created Station, Gemini's first "social network" of sorts, still running today: https://martinrue.com/station
reply
627467 2 days ago
I read a lot against monetization in the comments. I think that's because we are used to monetization being so exploitative, filled with dark patterns and bad incentives, on the Big Web.

But it doesn't need to be this way: the small web can also be about sustainable monetization. In fact there's a whole page on that at https://indieweb.org/business-models

There's nothing wrong with "publishers" aspiring to get paid.

reply
ardeaver 20 hours ago
I also think equating good = "no monetization" is exactly how we've ended up in a situation where everything is controlled by a few giant mega corps, hordes of MBAs, and unethical ad networks.

We should want indie developers, writers, etc to make money so that the only game in town doesn't end up being those who didn't care about being ethical. </rant>

reply
UqWBcuFx6NV4r 22 hours ago
Yep. People have very short memories. I remember that ethical ad network in the late 2000s that all the cool tech bloggers would use.
reply
shermantanktop 2 days ago
This is a specific definition of "small web" which is even narrower than the one I normally think of. But reading about Gemini, it does make me wonder if the original sin is client-side dynamism.

We could say: that's Javascript. But some Javascript operates only on the DOM. It's really XHR/fetch and friends that are the problem.

We could say: CSS is ok. But CSS can fetch remote resources and if JS isn't there, I wonder how long it would take for ad vendors to have CSS-only solutions...or maybe they do already?
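For what it's worth, a rudimentary CSS-only beacon is already possible: any conditional rule that references a remote URL makes the browser issue a request when the rule first applies, no JS required. A sketch (the tracker URL is a placeholder):

```css
/* CSS-only "beacon": hovering the element triggers a fetch of the
   background image, leaking the interaction to a remote server. */
#buy-now:hover {
  background-image: url("https://tracker.example/pixel?e=hover");
}
```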

reply
AdamN 12 hours ago
I would put it all on cookies. No third party cookies (at all) - good. JS and CSS and even autoplay video is fine as long as there are no third party cookies.

That would make the Small Web bigger but it would get to the main point. I'd be fine with a site like the New Yorker that has more bells and whistles be included as long as I could experience it without a tracked ad from DoubleClick.

Right now any serious outfit simply cannot be included in the Small Web but we really need companies there.

reply
fbilhaut 8 hours ago
Totally agree. I run a few professional websites/apps that deliberately avoid tracking technologies. They only use first-party session cookies and minimal server logs for operational purposes.

Interestingly, I’ve noticed that some users find this suspicious because there's no cookie banner! People may have become so used to seeing them that a site without one can look dubious or unprofessional. And I'm pretty sure some maintainers include them just to conform with common practice or due to legal uncertainty.

Maybe a simple, community-driven, public declaration might help. Something like a "No-Tracking Web Declaration". It could be a short document describing fair practices that websites could reference, such as "only first-party session cookies", "server logs used only for operational purposes", etc.

A website could then display a small statement such as "This site follows the No-Tracking Web Declaration v1.0". This might help legitimize the approach, and give visitors and operators confidence that avoiding the usual bells and whistles can actually be compliant with applicable regulations.

I (and AI) drafted something here, contributions would be highly welcomed: https://github.com/fbilhaut/no-tracking

reply
akkartik 2 days ago
Yeah, CSS is Turing Complete: https://lyra.horse/x86css
reply
zahlman 2 days ago
I wonder: what's the least that could be removed from CSS to avoid Turing-completeness?
reply
gzread 17 hours ago
Most of the problem with ads isn't even the ads these days, but the bloat. Static image ads would be a huge improvement.
reply
mattlondon 22 hours ago
You need to go more tin-foil-hat

It's not just JavaScript, it's cookies, it's "auto loading" resources (e.g. 1x1 pixels with per-request unique URLs), it's third-party HTTP requests to other domains (which might set cookies too).

I think the XKCD comic about encryption-vs-wrench has never been more apt for Gemini the protocol...

reply
upboundspiral 2 days ago
I think the article briefly touches on an important part: people still write blogs, but they are buried by Google, which now optimizes its algorithm for monetization rather than usefulness.

Anyone interested in seeing what the web looks like when the search engine selects for real people and not SEO-optimized slop should check out https://marginalia-search.com .

It's a search engine with the goal of finding exactly that - blogs, writings, all by real people. I am always fascinated by what it unearths when using it, and it really is a breath of fresh air.

It's currently funded by NLNet (temporarily) and the project's scope is really promising. It's one of those projects that I really hope succeeds long term.

The old web is not dead, just buried, and it can be unearthed. In my opinion an independent non monetized search engine is a public good as valuable as the internet archive.

So far as I know, Marginalia is the only project that, instead of just taking Google's index and massaging it a bit (like all the other search engines), is truly seeking to be independent and practical in its scope and goals.

reply
marginalia_nu 2 days ago
Thanks for shilling.

Regarding the financials, even though the second nlnet grant runs out in a few weeks, I've got enough of a war chest to work full time probably a good bit into 2029 (modulo additional inflation shocks). The operational bit is self-funding now, and it's relatively low maintenance, so if worse comes to worst I'll have to get a job (if jobs still exist in 2029, otherwise I guess I'll live in the shameful cardboard box of those who were NGMI ;-).

reply
boxedemp 2 days ago
I think that's a cool project, though I found the results to be less relevant than Google.
reply
janalsncm 2 days ago
Whether the results are less relevant or not depends massively on what you searched and whether the best results even exist in the Marginalia search index or not.

If Google is ranking small web results better than Marginalia, that’s actionable.

If the best result isn’t in the index and it should be, that’s actionable.

reply
marginalia_nu 2 days ago
Well to be fair, Marginalia is also developed by 1 guy (me), and Google has like 10K people and infinite compute they can throw at the problem. There have been definite improvements, and there will be more still, but Google's still got hands.
reply
janalsncm 2 days ago
Hey Marginalia, cheers. Imo fewer hands can also be an advantage.

There are no PMs breathing down your neck to inject more ads in the search results, you don’t depend on any broken internal bespoke tools that you can’t fix yourself, and you don’t need anybody’s permission to deploy a new ranking strategy if you want to.

reply
gzread 17 hours ago
I've used Marginalia to search for technical documentation before, unironically. Whatever it does find is pretty much guaranteed to be non-slop.
reply
lich_king 2 days ago
> Google that now optimizes their algorithm for monetization and not usefulness.

I don't think they do that. Instead, "usefulness" is mostly synonymous with commercial intent: searching for <x> often means "I want to buy <x>".

Even for non-commercial queries, I think the sad reality is that most people subconsciously prefer LLM-generated or content-farmed stuff too. It looks more professional, has nice images (never mind that they're stock photos or AI-generated), etc. Your average student looking for an explanation of why the sky is blue is more interested in a TikTok-style short than some white-on-black or black-on-gray webpage that gives them 1990s vibes.

TL;DR: I think that Google gives the average person exactly the results they want. It might be not what a small minority on HN wants.

reply
marginalia_nu 2 days ago
Google and most search engines optimize for what is most likely to be clicked on. This works poorly and creates a huge popularity bias at scale because it starts feeding on its own tail: What major search engines show you is after all a large contributor to what's most likely to be clicked on.

The reason Marginalia (for some queries) feels like it shows such refreshing results is that it simply does not take popularity into account.

reply
BrenBarn 2 days ago
> I think that Google gives the average person exactly the results they want.

There is some truth in this, but to me it's similar to saying that a drug dealer gives their customers exactly what they want. People "want" those things because Google and its ilk have conditioned them to want those things.

reply
sdenton4 2 days ago
On the one hand, a search engine is not heroin... It's a pretty broken analogy.

On the other hand, we could probably convince Cory Doctorow to write a piece about how fentanyl is really about the enshittification of opiates.

reply
GuB-42 2 days ago
I don't expect many people to agree but I think that the "small web" should reject encryption, which is the opposite direction that Gemini is taking.

I don't deny the importance of encryption, it is really what shaped the modern web, allowing for secure payment, private transfer of personal information, etc... See what I'm getting at?

Removing encryption means that you can't reasonably do financial transactions, accounts and access restriction, exchange of private information, etc... You only share what you want to share publicly, with no restrictions. It seriously limits commercial potential which is the point.

It also helps technically. If you want to make a tiny web server, like on a microcontroller, encryption is the hardest part. In addition, TLS comes with expiring certificates, requiring regular maintenance, you can't just have your server and leave it alone for years, still working. It can also bring back simple caching proxies, great for poor connectivity.

Two problems remain with the lack of encryption, first is authenticity. Anyone can man-in-the-middle and change the web page, TLS prevents that. But what I think is an even better solution is to do it at the content level: sign the content, like a GPG signature, not the server, this way you can guarantee the authenticity of the content, no matter where you are getting it from.

The other thing is the usual argument about oppressive governments, etc... Well, if you want to protect yourself, TLS won't save you: you will be given away by your IP address. They may not see exactly what you are looking at, but the simple fact that you are connecting to a server containing sensitive data may be evidence enough. Protecting your identity is what networks like TOR are for, and you can hide a plain-text server behind the TOR network, which would act as the privacy layer.

reply
marginalia_nu 2 days ago
Big thing that made encryption required is arguably that ISPs started injecting crap into webpages.

Governments can still track you with little issue since SNI is unencrypted. It's also very likely that Cloudflare and the like are sharing what they see as they MITM 80% of your connections.

reply
jopsen 24 hours ago
> It's also very likely that Cloudflare and the like are sharing what they see as they MITM 80% of your connections.

Maybe, I suspect not, but even so if we reduce the number of men in the middle that's pretty nice.

reply
marginalia_nu 24 hours ago
Between what Snowden told us, and the CLOUD Act, it seems quite likely.
reply
throw5 2 days ago
> But what I think is an even better solution is to do it at the content level: sign the content, like a GPG signature

How would this work in reality? With the current state of browsers this is not possible because the ISP can still insert their content into the page and the browser will still load it with the modified content that does not match the signature. Nothing forces the GPG signature verification with current tech.

If you mean that browsers need to be updated to verify GPG signature, I'm not sure how realistic that is. Browsers cannot verify the GPG signature and vouch for it until you solve the problem of key revocation and key expiry. If you try to solve key revocation and key expiry, you are back to the same problems that certificates have.

reply
axblount 2 days ago
Signatures do have similar problems to certificates. But Gemini doesn't avoid them either and often recommends TOFU certificates. I think the comment's point was that digital signatures ensure identity but are unsuitable for e-commerce, a leading source of enshittification.
reply
interroboink 2 days ago
> you are back to the same problems that certificates have.

Some of the same problems. One nice thing about verifying content rather than using an SSL connection is that plain-old HTTP caching works again.

That aside, another benefit of less-centralized and more-fine-grained trust mechanisms would be that a person can decide, on a case-by-case basis what entities should be trusted/revoked/etc rather than these root CAs that entail huge swaths of the internet. Admittedly, most people would just use "whatever's the default," which would not behave that differently from what we have now. But it would open the door to more ergonomic fine-grained decision-making for those who wish to use it.

reply
einr 14 hours ago
I've been thinking the same thing for years -- thank you for saying it. I agree completely.

Another pro is that no encryption means super low power microcontrollers and retrocomputers can browse freely. The system req's go down by orders of magnitude. I think enforcing TLS in the Gemini protocol was a huge mistake; there are so many retrocomputing enthusiasts that would love to browse Geminispace on their Amigas and 486s -- it might actually have been a significant part of the userbase -- but they're locked out because their CPUs simply cannot reasonably handle modern TLS.

reply
xantronix 24 hours ago
I have noticed that when I encounter an HTTP-only web site, I know I am in for a pleasant, calm, well-curated experience, and I mean that without a hint of irony.

I don't have a lot to say about the technical discussion here, other than "TLS null cipher could be fine but also a lot more infrastructure than desirable", which could subvert your intent here.

Maybe we should normalise TOR usage before it becomes a surefire signal to the FBI to raid one's home.

reply
adiabatichottub 2 days ago
> It also helps technically. If you want to make a tiny web server, like on a microcontroller, encryption is the hardest part.

> Two problems remain with the lack of encryption, first is authenticity. Anyone can man-in-the-middle and change the web page, TLS prevents that. But what I think is an even better solution is to do it at the content level: sign the content, like a GPG signature, not the server, this way you can guarantee the authenticity of the content, no matter where you are getting it from.

If your microcontroller can't do TLS then it probably won't do GPG either. But you can still serve HTTP content on port 80 if you need to support plaintext. I believe a lot of package distribution is still over HTTP.

Edit: Sorry, missed the web server part somehow and was thinking of a microcontroller based client.

> In addition, TLS comes with expiring certificates, requiring regular maintenance, you can't just have your server and leave it alone for years, still working. It can also bring back simple caching proxies, great for poor connectivity.

Yeah, TLS and DNS are two of the biggest hurdles to a completely distributed Internet. Of course you go down that road and you get IPFS, which sounds cool to me, but doesn't seem to have ever taken off.

reply
zzo38computer 2 days ago
> If your microcontroller can't do TLS then it probably won't do GPG either.

It is not a problem if you are only serving static files.

reply
adiabatichottub 2 days ago
I guess I was thinking microcontroller as client, so yes I agree
reply
honeycrispy 2 days ago
Anyone between you and the server can change the content of the page on unencrypted connections. I would love to live in a world where encryption is unnecessary, but unfortunately that world does not exist right now.
reply
bigstrat2003 17 hours ago
We do live in that world. Encryption is not at all necessary for the majority of web sites out there. The practice you speak of is not commonplace and does not need active defending against.
reply
gzread 17 hours ago
ICE is tracking people on social media, and the only sign it's happening to you is when they show up to your door with guns and handcuffs. If they could track who was reading subversive content, they would. It's better we don't give away more information than necessary.
reply
nomdep 9 hours ago
[dead]
reply
cristoperb 2 days ago
You could do signatures/MAC without encryption to guarantee that the message was not modified
reply
UqWBcuFx6NV4r 22 hours ago
Okay! Cool! Let’s just (nonsensically) develop a new protocol for this small group of nerds’ fetish for ‘retro tech’. A protocol that nobody will use.

You do realise that “is it technically possible?” is like 1% of the question in computing, at most, yes? HTTP and HTTPS are what we’ve got.

reply
mattlondon 22 hours ago
This whole post is about Gemini the protocol, a new protocol for a small group of nerds' fetish for retro tech (it's basically modern gopher).
reply
gzread 17 hours ago
The internet has become more hostile since its early days in multiple ways, and one of those ways is that networks spy on you more. They used to inject ads into the content, but that stopped being profitable when the majority went to HTTPS. If they could do it again on a large scale they would. The NSA is also saving all the data they can. It's important to hide the content now that we have first-world countries using information about what you read to make life-or-death decisions. Encryption should now be seen as necessary.
reply
swiftcoder 24 hours ago
> If you want to make a tiny web server, like on a microcontroller, encryption is the hardest part

Even an esp32 can (just) handle TLS. Given relatively modern designs, you end up on remarkably small chips before TLS is a real blocker

reply
wibbily 22 hours ago
Seconded. I got a Gemini server running on an esp8266 once, there really isn't a lower bound.
reply
jimbokun 23 hours ago
I think a simpler argument would be that small web is not a good fit if your content is sensitive in the place you are publishing from. It’s meant for public publishing. If you need encryption, use a different distribution mechanism.
reply
UqWBcuFx6NV4r 23 hours ago
That is not the only protection that HTTPS offers. US ISPs used to inject ads into HTML HTTP responses.

Can all this performative love for unencrypted HTTP just die already. You’ve all forgotten what it was actually like, and what the drawbacks actually are. This is so tiring.

reply
gzread 17 hours ago
The act of reading can also be sensitive.
reply
UqWBcuFx6NV4r 22 hours ago
Your entire comment is paragraphs of grasping at straws.

> The other thing is the usual argument about oppressive governments, etc... Well, if want to protect yourself, TLS won't save you, you will be given away by your IP address, they may not see exactly what you are looking at, but the simple fact you are connecting to a server containing sensitive data may be evidence enough. Protecting your identity is what networks like TOR are for, and you can hide a plain text server behind the TOR network, which would act as the privacy layer.

A huge ‘citation needed’ for this whole paragraph. Just admit that you don’t care about this use case and move on. Don’t present a contrived and completely unjustified hypothetical where oppressive governments behave exactly in a way that happens to mean that there’s only room for the technologies that you personally are into.

You’ve completely departed from reality. It’s not 2004 anymore.

reply
GuB-42 4 hours ago
Honestly, yeah, I don't care about this use case.

I just mentioned that because I expected someone to say "but privacy...", because privacy and encryption go hand in hand. And my argument is that the encryption we usually think of in the context of the web is TLS, and it is not a good fit in that context.

The goal here is to publish information for everyone to see, it is not secret messaging, what you may want to protect is your identity. There are networks especially designed for this, and you are better off using these, but if you are not, then I believe that accessing a HTTP website through an anonymizing proxy (like TOR) is better at protecting your identity than relying on the TLS layer of HTTPS or Gemini.

reply
iamnothere 22 hours ago
I think the small web ought to just serve everything over Tor (onion sites). No domain needed, no worries about surveillance. You can put a gateway in front of your minimalist web server or run the gateway on an OpenWRT router. This also helps the network by generating cover traffic and encouraging civilian use.
reply
jl6 22 hours ago
TLS client certs are quite a nice approach to identity though.
reply
krapp 2 days ago
>Removing encryption means that you can't reasonably do financial transactions, accounts and access restriction, exchange of private information, etc... You only share what you want to share publicly, with no restrictions. It seriously limits commercial potential which is the point.

People will still do financial transactions on an unencrypted web because the utility outweighs the risk. Removing encryption just guarantees the risk is high.

reply
zzo38computer 2 days ago
> People will still do financial transactions on an unencrypted web because the utility outweighs the risk. Removing encryption just guarantees the risk is high.

That does not necessarily require TLS to mitigate (although TLS does help, anyways). There are other issues with financial transactions, whether or not TLS is used. (I had idea, and wrote a draft specification of, "computer payment file", to try to improve security of financial transactions and avoid some kinds of dishonesty; it has its own security and does not require TLS (nor does it require any specific protocol), although using TLS with this is still helpful.) (There are potentially other ways to mitigate the problems as well, but this is one way that I think would be helpful.)

reply
zzo38computer 2 days ago
> I think that the "small web" should reject encryption, which is the opposite direction that Gemini is taking.

I think it should allow but not require encryption.

> Removing encryption means that you can't reasonably do financial transactions, accounts and access restriction, exchange of private information, etc... You only share what you want to share publicly, with no restrictions. It seriously limits commercial potential which is the point.

Note that the article linked to says "the Gemini protocol is so limited that it’s almost incapable of commercial exploitation", even though Gemini does use TLS. (Also, accounts and access restriction can sometimes be used with noncommercial stuff as well; they are not only commercial.)

> It also helps technically. If you want to make a tiny web server, like on a microcontroller, encryption is the hardest part.

This is one of the reasons I think it should not be required. (Neither the client side nor server side should require it. Both should allow it if they can, but if one or both sides cannot (or does not want to) implement encryption for whatever reason, then it should not be required.)

> Anyone can man-in-the-middle and change the web page, TLS prevents that. But what I think is an even better solution is to do it at the content level: sign the content, like a GPG signature

Using TLS only prevents spies (except Cloudflare) from seeing or altering the data; it does not prevent the server operator from doing so, nor does it protect against reassigned domain names if you are using the standard certificate authorities for WWW (especially if you are using cookies for authentication rather than client certificates, which would avoid that issue, though the other issues would not entirely be avoided).

Cryptographic signatures of the files is helpful, especially for static files, and would help even if the files are mirrored, so it does have benefits. However, these are different benefits than those of using TLS.

In other cases, if you already know what the file is and it is not changing, then using a cryptographic hash will help, and a signature might not be needed (although you might have that too); the hash can also be used to identify the file so that you do not necessarily need to access it from one specific server if it is also available elsewhere.
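
The hash idea above can be sketched in a few lines. This is a minimal illustration, not anything from the thread's actual tooling: the page bytes and digest are made up for the example, and `verify_content` is a hypothetical helper name.

```python
import hashlib

def verify_content(data: bytes, expected_sha256: str) -> bool:
    """Check fetched bytes against a known SHA-256 digest.

    Because the digest identifies the content itself, the same check
    works no matter which server or mirror supplied the bytes - which
    is also why plain HTTP caching proxies become safe again.
    """
    return hashlib.sha256(data).hexdigest() == expected_sha256

page = b"# Hello, small web\n"
digest = hashlib.sha256(page).hexdigest()

assert verify_content(page, digest)                 # untampered copy passes
assert not verify_content(page + b"<ad>", digest)   # injected content fails
```

Unlike TLS, this proves nothing about who served the bytes, only that they match the digest you already trusted; pairing it with a signature over the digest would add authorship.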

> Well, if want to protect yourself, TLS won't save you, you will be given away by your IP address, they may not see exactly what you are looking at, but the simple fact you are connecting to a server containing sensitive data may be evidence enough.

There is also SNI. Depending on the specific server implementation, using false SNI might or might not work, but even if it does, the server might not provide a certificate with correct data in that case (my document of Scorpion protocol mentions this possibility, and suggestions of what to do about it).

reply
SeanDav 5 hours ago
With the increasingly intrusive legislated age verification and content monitoring being forced globally, I can easily see this as a catalyst to drive the Gemini protocol past critical mass.
reply
lasgawe 2 days ago
mm, yeah. I like the idea of the small web not as a size category but as a mindset. people publishing for the sake of sharing rather than optimizing for attention or monetization.
reply
rapnie 2 days ago
The fediverse is also generally experienced as a small web, when it comes to mindset. Though that is not always to the liking or preference of those expecting to find alternatives to big tech social media platforms.
reply
apples_oranges 2 days ago
Feeding llms you mean
reply
8organicbits 2 days ago
Is there a good free-but-subscriber-only solution for blogs? It seems like a contradiction, but in practice it may be manageable.
reply
pixl97 2 days ago
If it takes off in any amount, then LLMs will just subscribe and pull said data from sites at a reasonable pace (or not, it's free so make many accounts).
reply
cosmicgadget 2 days ago
Loginwall or email newsletter with a summary on the open web.
reply
stronglikedan 2 days ago
they gotta eat too!
reply
ang_cire 13 hours ago
I remember the days of just chatting with folks on No Mutants Allowed (fallout fan forum) or SubSim.com, about life and stuff. We all sort of knew each other after years chatting about the games we played, and it felt so cozy and low-stakes.

I recently jumped back onto IRC Rizon, and man, what a throwback.

The small web really is MUCH bigger when you start adding other protocols like IRC and onion sites.

reply
qudat 20 hours ago
I built https://prose.sh as part of my journey into Gemini and back out. Ya, it's just a simple blog, but you can completely manage it with ssh, and it's compatible with Hugo when people want to eject.

We also recently released support for plain-text-lists, which is a gemini-inspired spec that uses lists as its foundational structure.

https://pico.sh/plain-text-lists

example: https://blog.pico.sh/ann-034-plain-text-lists

reply
oxag3n 22 hours ago
> To be fair, I should point out that the “small” web was never defined by the number of sites, but by the lack of commercial influence.

That was my understanding before it grew - it's a web of small indie sites.

reply
followdev 19 hours ago
I built FollowDev.com which is like Kagi Small Web but for software developer blogs.

It has about 1000 blogs in the repo at the moment. Discovery was the most time-consuming part.

reply
lich_king 2 days ago
It's easy to hand-curate a list of 5,000 "small web" URLs. The problem is scaling. For example, Kagi has a hand-curated "small web" filter, but I never use it because far more interesting and relevant "small web" websites are outside the filter than in it. The same is true for most other lists curated by individual folks. They're neat, but also sort of useless because they are too small: 95% of the things you're looking for are not there.

The question is how do you take it to a million? There probably are at least that many good personal and non-commercial websites out there, but if you open it up, you invite spam & slop.

reply
freediver 2 days ago
I mainly use Kagi Small Web as a starting point of my day, with my morning coffee. Especially now that categories are added, I always find something worth reading. The size here does not present a problem as I would usually browse 20-30 sites this way.
reply
lich_king 2 days ago
Right, but that basically works as a retro alternative to scrolling through social media. If you're looking for something specific, it's simultaneously true that there's a small web page that answers your question and that it's not on any "small web" list because the owner of the webpage never submitted it there, or didn't meet the criteria for inclusion.

For example, I have several non-commercial, personal websites that I think anyone would agree are "small web", but each of them fails the Kagi inclusion criteria for a different reason. One is not a blog, another is a blog but with the wrong cadence of posts, etc.

reply
freediver 2 days ago
Feel free to suggest changes to criteria for inclusion. It is mostly the way it is now as the entire project is maintained by one person - me :)
reply
rambambram 22 hours ago
It might sound stupid, but I'm not a git or github user, I would rather fill in a webform to submit a new website and feed.
reply
freediver 17 hours ago
The (artificial) barrier to entry is there for a reason - one person maintains the entire project, and because it is fairly technical to submit, the acceptance rate has been close to 99%.
reply
rambambram 12 hours ago
I guessed that might be the reason, smart move. Have you tried a webform in the past which resulted in a lot of crappy submissions?
reply
lich_king 2 days ago
Looking at the criteria again, I can think of at least three things that arbitrarily exclude large swathes of the small web:

1) The requirement that it needs to be a blog. There's plenty of small-web sites of people who obsess over really wonderful and wacky stuff (e.g., https://www.fleacircus.co.uk/History.htm) but don't qualify here.

2) The requirement that it needs to be updated regularly. Same as above - I get that infrequently updated websites don't generate a "daily morning" feed, but admitting them wouldn't harm in any way.

3) Blanket ban on Substack-like platforms while allowing Blogspot, Wordpress.com, YouTube, etc. Bloggers follow trends, so you're effectively excluding a significant proportion of personal blogs created in the last six years, including the stuff that isn't monetized or behind interstitials. The outcomes are pretty weird: for example, noahpinionblog.blogspot.com is on your list, but noahpinion.blog is apparently no longer small web.

reply
freediver 24 hours ago
1) It has to have a feed (we don't want to overcrawl), hence 'blog' - more accurately, any site with an RSS/Atom feed would do

2) 'Regularly' means posted in the last 2 years to be included

3) Substack has an annoying subscribe popup, and ads/popups are against the spirit of what this represents

reply
nottorp 24 hours ago
> The question is how do you take it to a million?

Do you need to take it to a million in the same place? Is that still "small"?

Why not have 2000 hand curated directories instead?

reply
lich_king 7 hours ago
> Why not have 2000 hand curated directories instead?

It depends on what you're trying to achieve. If you want to have a personal feed of stories from interesting people, 50 is probably enough to give you some interesting daily reading. But if you want to build a "small web" search lens, you absolutely need coverage. For example, Kagi bills its filter as "small web" search, but it excludes a lot of the small web because they only allow actively-maintained blogs, and only a subset of them.

reply
cosmicgadget 2 days ago
My approach operates under the assumption that good, non-commercial webpages will be similar to other good webpages. Slop, SEO spam, and affiliate content will resemble other such content.

So a similarity-based graph/network of webpages should cluster good with good, bad with bad. That is what I've seen so far, anyway.

With that, you just need to enter the graph in the right place, something that is fairly trivial.
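
To make the idea concrete, here is a toy sketch of the similarity step, assuming nothing about the parent's actual implementation: bag-of-words vectors, cosine similarity, and a one-hop neighborhood from a seed page. The page texts, threshold, and helper names are all invented for illustration.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def neighbors(pages: dict, seed: str, threshold: float = 0.3) -> set:
    """Pages whose text is similar to the seed page - one hop in the graph."""
    vecs = {url: Counter(text.lower().split()) for url, text in pages.items()}
    return {url for url in pages
            if url != seed and cosine(vecs[seed], vecs[url]) >= threshold}

pages = {
    "a": "handmade pottery kiln notes and glaze recipes",
    "b": "glaze recipes for a backyard pottery kiln",
    "c": "best credit card offers click here buy now",
}
print(neighbors(pages, "a"))  # → {'b'}: the hobbyist page clusters in, the spam page stays out
```

A real crawler would use TF-IDF or embeddings rather than raw counts, but the clustering intuition - enter the graph at a good page and walk to its neighbors - is the same.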

reply
ZebrasEat 23 hours ago
Has there been any effort in taking any of these small-web-type approaches into a headscale-type space? My preference would be to have a private area where whitelisting prevents crawling or scraping. Am baffled why someone hasn’t created a headscale server and started distributing nodes to personally known ‘good intentioned’ humans. Anyone ever heard of anything like this?
reply
8organicbits 19 hours ago
reply
esseph 23 hours ago
Parts of the "dark web" are built somewhat like this
reply
draxil 13 hours ago
I think Gemini might be wise to rename, now that there's a commercial product trampling all over its namespace.
reply
jonasced 12 hours ago
However if it's supposed to continue being an indie protocol and fly under the radar, it's perfect
reply
KurSix 11 hours ago
The funny thing is that RSS solved the "too many websites" problem 20 years ago
reply
tonymet 2 days ago
I’m not sold on Gemini. Less utility, weaker, immature tools. Investing in small HTTP-based websites is the right direction. One could formalize it as a browser extension or small-web HTTP proxy that limits JS, DOM size, cookie access etc., using existing web browsers & user agents.
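
As a rough illustration of what such a proxy or extension might do, here is a stdlib-only sketch that strips `<script>` elements and refuses pages with too many tags. Everything here is hypothetical: the class name, the tag budget, and the crude rule of dropping attributes are invented for the example, not taken from any real small-web tool.

```python
from html.parser import HTMLParser

class SmallWebFilter(HTMLParser):
    """Strip <script> elements and count tags as a crude 'smallness' gate.

    Note: attributes are deliberately dropped on output - a real filter
    would keep safe ones (href, src) and block trackers more carefully.
    """
    def __init__(self, max_tags: int = 500):
        super().__init__()
        self.out, self.tag_count, self.max_tags = [], 0, max_tags
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1
        if tag == "script":
            self.in_script = True
        else:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:        # drop inline script bodies
            self.out.append(data)

    def filter(self, html: str):
        self.feed(html)
        if self.tag_count > self.max_tags:
            return None               # too bloated: refuse to render
        return "".join(self.out)

page = "<p>hello</p><script>track()</script><p>world</p>"
print(SmallWebFilter().filter(page))  # → <p>hello</p><p>world</p>
```

The appeal of this direction over a new protocol is that the same filtered pages still render in any ordinary browser.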
reply
trinsic2 2 days ago
Can anyone point me to the best place to get castor going? I can't install it on my 22.04 install; unmet dependencies...
reply
jmclnx 2 days ago
I moved my site to Gemini on sdf.org, I find it far easier to use and maintain. I also mirror it on gopher. Maintaining both is still easier than dealing with *panels or hosting my own. There is a lot of good content out there, for example:

gemini://gemi.dev/

FWIW, dillo now has plugins for both Gemini and Gopher and the plugins work fine on the various BSDs.

reply
dwg 24 hours ago
Love the irony: Man builds a Gemini-style feed aggregator for small web, finding it, well, not so small.
reply
romaniv 2 days ago
Small Web, Indie Web and Gemini are terminally missing the point. The web in the 90s was an ecosystem that attracted people because of experimentation with the medium, diversity of content and certain free-spirited social defaults. It also attracted attention because it was a new, exciting and rapidly expanding phenomenon. To create something equivalent right now you would need to capture those properties, rather than try to revive old visual styles and technology.

For a while I hoped that VR would become the new World Wide Web, but it was successfully torpedoed by the Metaverse initiative.

reply
cdrnsf 2 days ago
There's an element of nostalgia, certainly, but it's also a reaction to the overwhelmingly commercial web. Why not build something instead of scrolling through brief videos interspersed with more and more ads that follow you everywhere?

Large companies have helped build the web but they've done at least as much, if not more, to help kill it.

reply
Karrot_Kream 2 days ago
The small web can be a lot of things, but IMO it gets too overrun by the ideologically zealous. One does not have to believe in primitive anarchism to enjoy camping, for example. In general it seems any niche idea on the internet is like a candle flame to zealous moths.
reply
krapp 23 hours ago
Ideological zealots are more or less the only people who hate the modern web so much that they want to quarantine themselves within an entirely different and functionally limited protocol or ecosystem. Everyone else is fine discussing camping in Facebook groups and on Reddit and wherever, maybe just using an ad blocker.
reply
cdrnsf 9 hours ago
I don't think there's anything terribly modern about a collection of large companies trying to present themselves as the entirety of a given thing (the internet in this case).
reply
gzread 2 days ago
It's about capturing the noncommerciality, not the experimentation. Most of the small web sites are just blogs, a solved problem by now, but there's interesting content in many of them.
reply
SirFatty 2 days ago
Which is exactly the point of Gemini.
reply
skeeter2020 2 days ago
I'm a dinosaur who bemoans the loss of whatever-it-was we had prior to the mass exploitation and saturation of the web today, so I feel it's my duty to check out Gemini and stop complaining. I'm prepared to trade ease of use or some modern functionality for better content and less of what the internet has become.
reply
mattlondon 2 days ago
Not quite. I think Gemini has deliberately gone for a "text only" philosophy, which I think is very constraining.

The early web had a lot going on and allowed for a lot of creative experimentation which really caught the eye and the imagination.

Gemini seems designed to only allow long-form text content. You can't even have a table let alone inline images which makes it very limited for even dry scientific research papers, which I think would otherwise be an excellent use-case for Gemini. But it seems that this sort of thing is a deliberate design/philosophical decision by the authors which is a shame. They could have supported full markdown, but they chose not to (ostensibly to ease client implementation but there are a squillion markdown libraries so that assertion doesn't hold water for me)

It's their protocol so they can do what they want with it, but it's why I think Gemini as a protocol is a dead-end unless all you want to do is write essays (with no images or tables or inline links or table-of-contents or MathML or SVG diagrams or anything else you can think of in markdown). It's a shame, as I think the client-cert stuff for auth is interesting.

reply
wibbily 24 hours ago
It's tough, but one of the tenets of Gemini is that a lone programmer can write their own client in a spirited afternoon/weekend. Markdown is just a little too hard to clear that bar. Already there was much bellyaching on the mailing list about forcing dependence on SSL libraries; suggesting people rely on more libraries would have been a non-starter.

Note that the Gemini protocol is just a way of moving bytes around; nothing stops you from sending Markdown if you want (and at least some clients will render it - same with inline images).
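To illustrate just how little framing there is to move those bytes around, here is a rough Python sketch of a client. The framing (a request is the absolute URL plus CRLF, capped at 1024 bytes; a response starts with a "<2-digit status> <meta>" header line) follows the Gemini spec as I understand it; the network helper is an untested sketch, and disabling cert verification stands in for a proper trust-on-first-use store.

```python
import socket
import ssl

def build_request(url: str) -> bytes:
    """A Gemini request is just the absolute URL followed by CRLF."""
    req = url.encode("utf-8") + b"\r\n"
    assert len(req) <= 1024, "request too long for Gemini"
    return req

def parse_response(raw: bytes) -> tuple[int, str, bytes]:
    """Split a response into (status, meta, body). For 2x success
    responses, meta is the MIME type of the body."""
    header, _, body = raw.partition(b"\r\n")
    status, _, meta = header.decode("utf-8").partition(" ")
    return int(status), meta, body

def fetch(url: str, port: int = 1965) -> tuple[int, str, bytes]:
    """Fetch a gemini:// URL. Verification is relaxed because most
    Gemini servers use self-signed certs (trust-on-first-use model)."""
    host = url.removeprefix("gemini://").split("/")[0]
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port)) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(build_request(url))
            raw = b""
            while chunk := tls.recv(4096):
                raw += chunk
    return parse_response(raw)
```

That really is more or less the whole protocol, which is why the afternoon-project claim is plausible: everything hard is delegated to the TLS library.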

reply
krapp 23 hours ago
Didn't the creator of the protocol go on a rant when someone made a browser for Gemini that included a favicon?

I can't imagine the backlash if someone tried to normalize Markdown. Isn't the entire point of Gemini that it can never be extended or expanded upon?

Maybe it would be better to create an entirely different protocol/alt web around Markdown that didn't risk running afoul of Gemini's philosophical restrictions?

reply
gzread 17 hours ago
Yeah, instead someone makes a new and incompatible protocol whenever they want to change it.

> The SmolNet consists of content available through alternative protocols outside the web such as gemini:// gopher:// Gopher+ gophers:// finger:// spartan:// text:// SuperText nex:// scorpion:// mercury:// titan:// guppy:// scroll:// molerat:// terse:// fsp://. There is a summary of the main SmolNet protocols.

- https://wiki.archiveteam.org/index.php/SmolNet

reply
krapp 11 hours ago
Molerat at least seems to use Markdown but many do seem to be "Gemini but X." I wonder how much use any of those get?
reply
mattlondon 22 hours ago
I think a "markdown-web" that uses some of the Gemini approaches for privacy and auth/identity etc would be pretty nice.

Of course, as others have said, we could just use HTML without JavaScript or cookies and we'd be a lot of the way there with 95% less effort. But hey, in the future we'll probably just query an AI rather than load a web page ourselves.

reply
krapp 22 hours ago
Given how many people on HN say they like Gemini in principle but wish it weren't so restrictive, some people would use it. All of those people might just be that cross section of HN users, however.
reply
okuntilnow 24 hours ago
There are images in geminispace, and audio, and (probably) video. It's just not inline. One of the constraints of the protocol is that pages cannot load content without your express say-so.
reply
shadowbyte17 10 hours ago
The 88x31 badge thing is wild; I remember seeing those back in the early 2000s. It's a really interesting way to build connections.
reply
tonymet 2 days ago
hats off to https://1mb.club/ and https://512kb.club/ for cataloging and featuring small web experiences
reply
ontehfritz 20 hours ago
Just because this is relevant to the small web and started off as just that... I am going to do a very shameless plug, as I am the creator:

https://www.demarkus.io/ https://github.com/latebit-io/demarkus

Mark protocol, with client tools, a TUI, a server, and MCP (yes, I know, hype train, but useful for agents); markdown is the only format. Simple :)

reply
october8140 17 hours ago
Podnet on Picotron.
reply
DeathArrow 13 hours ago
But if people want to keep sites free of commercial influences, can't they do that with existing technologies? Do they really need to use exotic technologies whose side effect is to limit commercial use?

My point is that by using exotic tech, they limit the general public exposure to the "un-commercial web".

reply
Gunax 24 hours ago
It's sad how the small web became invisible.

I used to use all sorts of small websites in 2005. But by 2015 I used only about 10 large ones.

Like many changes, I cannot pinpoint exactly when this happened. It just occurred to me someday that I do not run into many unusual websites any longer.

It's unfortunate that so much of our behavior is dictated by Google. I don't think it's malicious or even intentional--but at some point they stopped directing traffic to small websites.

And like a highway closure rippling through small-town economies, it was barely noticed by travellers but devastating to residents. What were once quaint sites became abandoned.

The second force seems to be video. Because video is difficult and expensive to host, we moved away from websites. Travel blogs were replaced with travel vlogs. Tutorials became videos.

reply
nanobuilds 19 hours ago
The experience of the internet would be so much more interesting if the search engines unearthed rare blogs or writing from small creators and bloggers that thought things through or shared original ideas.

It did seem we had that for a while and now everything funnels back to a handful of big platforms.

Maybe as AI swallows the data of the entire web, it would start to look for these small sites, small creators, and rare personal content to keep itself interesting and we'll see more of them?

reply
b00ty4breakfast 14 hours ago
>I don't think it's malicious or even intentional...

It's indirectly intentional, in that Google isn't wringing its hands trying to destroy tiny blogs, but they have deliberately chosen to ignore anything that doesn't play the SEO game, whatever the driver of that game is.

reply
Ylpertnodi 23 hours ago
> I don't think it's malicious or even intentional--but at some point they stopped directing traffic to small websites.

Small websites have small dollars?

reply
Gunax 22 hours ago
Does Google get money for directing traffic to sites (barring sites that pay for ad placement, that is)?

If I search for 'astronomy', Google doesn't earn any money whether I go to the Wikipedia page for astronomy or Joe's astronomy site, right?

reply
type0 18 hours ago
They do earn money when you go to Malory's Astrology page that is full of Google Ads
reply
pixodaros 18 hours ago
Google gets money for showing ads and sponsored content on the search page. If you click through to a site with Google ads or Google scripts, it gets more money and monetizable PII. So it's in their interest to prioritize sites with Google ads or Google services, but only Google staffers know exactly how the search algorithm works.

A few years ago they upranked all results on a few trusted domains, so many of those domains filled up with advertising and cheap copywriting. They framed this as 'fighting misinformation.'

reply
heliumtera 2 days ago
How many would be left after removing self-promotion, AI-generated content and "how I use AI" posts? (Claude Code, like everybody else.)
reply
pipeline_peak 23 hours ago
Every answer I've seen to the corporate internet, whether it be Mastodon or Gemini, just sounds like shell script slab city.

Just another place for hackers to go and keep to themselves.

Non-tech people, who needs them! In shell script slab city, we can share cooking recipes in plaintext.

Just spin up your instance, map it to a port, initialize the listener daemon, download the CLI viewer... and you're in shell script slab city! Who needs Twitter?

reply
shevy-java 22 hours ago
I welcome each and every attempt to "rescue" the world wide web as it once was, even if modernized. I see this every time I try to use Google search: Google ruined the search engine. That was not a mistake; that was not an accident. The goal is to divvy up the world wide web into commercial interests that help sustain the Google empire; the fight Google waged against uBlock Origin also shows this.

I have no idea if Gemini can help revitalize the small web or not; I am a pessimist by nature, so who knows. I fear so many things have changed in the last 20-30 years that some things may be permanently lost. For instance, in the early 2000s or so, a local university offered personal homepages for every student. This stopped around 2010 and never came back. At any rate, I welcome any idea to try to rescue the part of the world wide web not already destroyed by commercial interests. People seem to use private entities, though; I saw that with Discord, and compared it to oldschool mIRC on Freenode or GalaxyNet. Discord is private. I hate that.

reply
elashri 2 days ago
Bro, are you kidding? The whole point of the small web is to avoid corporations, AI and governments.
reply