Hacker News

Show HN: I built a tool that watches webpages and exposes changes as RSS

305 points by vkuprin 2 days ago | 79 comments

I built Site Spy after missing a visa appointment slot because a government page changed and I didn’t notice for two weeks.

It watches webpages for changes and shows the result as a diff. The part I think HN might find interesting is that it can monitor a specific element on a page, not just the whole page, and it can expose changes as RSS feeds.

So instead of tracking an entire noisy page, you can watch just a price, a stock status, a headline, or a specific content block. When it changes, you can inspect the diff, browse the snapshot history, or follow the updates in an RSS reader.

It’s a Chrome/Firefox extension plus a web dashboard.

Main features:

- Element picker for tracking a specific part of a page

- Diff view plus full snapshot timeline

- RSS feeds per watch, per tag, or across all watches

- MCP server for Claude, Cursor, and other AI agents

- Browser push, Email, and Telegram notifications
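As an illustration of what "RSS per watch, per tag, or across all watches" can mean mechanically, here is a minimal sketch that filters a list of change events and serializes them as RSS 2.0 using only the standard library. The `ChangeEvent` record and its fields are hypothetical, not Site Spy's data model, and the channel link is a placeholder.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


@dataclass
class ChangeEvent:
    # Hypothetical record of one detected change (not Site Spy's schema).
    watch_id: str
    title: str
    link: str
    summary: str
    detected_at: datetime
    tags: set[str] = field(default_factory=set)


def build_feed(events, watch_id=None, tag=None, feed_title="All watches"):
    """Return RSS 2.0 XML for all events, or only for one watch / one tag."""
    selected = [
        e for e in events
        if (watch_id is None or e.watch_id == watch_id)
        and (tag is None or tag in e.tags)
    ]
    # Newest first, which is what most feed readers expect.
    selected.sort(key=lambda e: e.detected_at, reverse=True)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = "https://example.com/"  # placeholder
    ET.SubElement(channel, "description").text = "Detected page changes"
    for e in selected:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = e.title
        ET.SubElement(item, "link").text = e.link
        ET.SubElement(item, "description").text = e.summary
        ET.SubElement(item, "pubDate").text = format_datetime(e.detected_at)
        # A stable GUID lets readers de-duplicate items across polls.
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = f"{e.watch_id}:{e.detected_at.isoformat()}"
    return ET.tostring(rss, encoding="unicode")
```

The per-watch, per-tag, and all-watches feeds are then just the same function called with a different filter.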

Chrome: https://chromewebstore.google.com/detail/site-spy/jeapcpanag...

Firefox: https://addons.mozilla.org/en-GB/firefox/addon/site-spy/

Docs: https://docs.sitespy.app

I’d especially love feedback on two things:

- Is RSS actually a useful interface for this, or do most people just want direct alerts?

- Does element-level tracking feel meaningfully better than full-page monitoring?

ahmedfromtunis 2 days ago

As a (former) reporter, site monitoring is a big part of what I do on a daily basis, and I've used many, many such services.

I can attest that, at least from the landing page, this seems to be a very good execution of the concept, especially the text-based diffing to easily spot what changed and, most importantly, how.

The biggest hurdle for such apps, however, is 'js-based browser-rendered sites' or whatever they're called nowadays. How does Site Spy handle such abominations?

vkuprin 2 days ago

Thanks, that’s a really good question. Site Spy uses a real browser flow, so it generally handles JS-rendered pages much better than simple HTML-only polling tools. In practice, the trickier cases tend to be sites with aggressive anti-bot protection or messy login/session flows rather than JS itself. I’m trying to make those limitations clearer so people don’t just hit a vague failure and feel let down.

rozumem 14 hours ago

Curious how you're thinking about getting around anti-bot protection. I scrape a lot and I've noticed many highly trafficked sites investing in anti-bot measures recently, with the rise of AI browsers and such. Still, cool idea, congrats on the launch.

vkuprin 11 hours ago

I'm planning to add proxy rotation across different regions to help with geo-restricted content and rate limiting. Anti-bot is an arms race though — some sites just can't be monitored without solving a captcha, which isn't something I'm trying to do. Focused on making the common cases work well rather than promising to bypass everything.

NicuCalcea 12 hours ago

JS-rendered websites are sometimes even better, they usually have some sort of internal API that you can access directly instead of relying on the website styling which may change at any moment.

I'm a fellow reporter who needs to keep tabs on some websites. I used various tools, including running my own Klaxon[1] instance, but these days I find it easier to just quickly vibe-code a crawler and use GitHub Actions to run it periodically. You can make it output an RSS feed, email you, archive it with archive.today, take a screenshot, or trigger whatever action you want.

1: https://github.com/themarshallproject/klaxon
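The "internal API" approach can be sketched like this, assuming you have found the JSON endpoint a JS-rendered page loads its data from (e.g. in the browser's network tab). The payload shape and the dotted `path` syntax are hypothetical; the point is that comparing one field of the JSON doesn't depend on markup or styling at all.

```python
import json


def extract(payload: str, path: str):
    """Walk a dotted path like 'slots.0.date' through decoded JSON."""
    node = json.loads(payload)
    for key in path.split("."):
        # Numeric segments index into lists, everything else into dicts.
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def field_changed(old_payload: str, new_payload: str, path: str):
    """Return (changed, old_value, new_value) for a single JSON field."""
    old_value = extract(old_payload, path)
    new_value = extract(new_payload, path)
    return old_value != new_value, old_value, new_value
```

Fields you don't watch (a theme flag, a tracking ID) can change freely without producing noise.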

csto12 20 hours ago

What do you do now if you don’t mind sharing?

xnx 2 days ago

I like https://github.com/dgtlmoon/changedetection.io for this. Open source and free to run locally, or use their SaaS service.

raphman 2 days ago

There's also https://github.com/thp/urlwatch/ - (not aware of any SaaS offer - self-hosted it is).

vkuprin 2 days ago

Yep, urlwatch is a good one too. This category clearly has a strong self-hosted tradition. With Site Spy, what I’m trying to make much easier is the browser-first flow: pick the exact part of a page visually, then follow changes through diffs, history, RSS, and alerts with very little setup

vkuprin 2 days ago

Yep, changedetection.io is a good project. With Site Spy, I wanted to make the browser-first workflow much easier: install the extension, connect it to the dashboard, click the exact part of the page you care about, and then follow changes as diffs, history, or RSS with very little setup. I can definitely see why the open-source / self-hosted route is appealing too.

shaunpud 13 hours ago

Does your project use changedetection.io behind the scenes? When I look at the _All Watches Feed_ the contents of the rss file include:

    <?xml version='1.0' encoding='UTF-8'?>
    <rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0"><channel><title></title><link>https://changedetection.io</link><description>Feed description</description><docs>http://www.rssboard.org/rss-specification</docs><generator>python-feedgen</generator><lastBuildDate>Thu, 12 Mar 2026 10:10:10 +0000</lastBuildDate></channel></rss>

vkuprin 12 hours ago

Yes — it's a fork of changedetection.io. I went into more detail here: https://news.ycombinator.com/item?id=47349141. The RSS link you spotted was a leftover, already fixed

msp26 12 hours ago

see:

https://news.ycombinator.com/item?id=47349069

xnx 2 days ago

Changedetection has an extension too: https://chromewebstore.google.com/detail/changedetectionio-w...

beepbooptheory 2 days ago

Sure but this one has an MCP server, costs money, and was presumably made last night!

nicbou 2 days ago

It's been around for a while and recommended by many. I tried it myself and it's okay.

KomoD 12 hours ago

Are you referring to OP's site? The domain was registered 2026-01-16 so it certainly hasn't been around for a while.

pelcg 2 days ago

Looks cool, and this can be self-hosted for free.

Nice will try this out!

bharrison 20 hours ago

This calls to mind a long-forgotten, and perhaps one of the most useful, bash script I've written, which addressed this issue in a very rudimentary way: curl a URL, hash the result to a file, compare the current hash to the saved hash from the previous run, and generate an email if they differ.

My goal was to monitor the online release of tickets for the 2009 Scion Rock fest (luckily no js, or even Adobe Flash[!] in use), and it worked brilliantly to that end.

[grammar]
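The same curl, hash, compare idea fits in a few lines of Python. This is a minimal sketch, not the original bash script: the state file name is hypothetical, and fetching and emailing are left to the caller.

```python
import hashlib
from pathlib import Path


def content_changed(body: bytes, state_file: Path) -> bool:
    """Hash the fetched body, compare with the previous run, save the new hash.

    Returns True only when a previous hash exists and differs, so the very
    first run just records a baseline instead of alerting.
    """
    current = hashlib.sha256(body).hexdigest()
    previous = state_file.read_text().strip() if state_file.exists() else None
    state_file.write_text(current)
    return previous is not None and previous != current
```

From cron, you would pass in something like `urllib.request.urlopen(url).read()` and send mail with `smtplib` whenever this returns True.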

dannyfritz07 4 hours ago

How do you generate a meaningful diff from a saved hash?
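A hash on its own can't produce a diff: it only says that something changed, not what. To show the change you also have to keep the previous text. A minimal sketch with Python's `difflib`, assuming the last snapshot is stored alongside (or instead of) the hash:

```python
import difflib


def text_diff(previous: str, current: str, label: str = "page") -> str:
    """Unified diff between the stored snapshot and the new fetch."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile=f"{label} (previous)",
        tofile=f"{label} (current)",
        lineterm="",
    )
    return "\n".join(lines)
```

An empty string means nothing changed, so the hash becomes an optional shortcut rather than the source of truth.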

msp26 12 hours ago

I got Claude to reverse-engineer the extension and compare it to changedetection, and here's what it came up with. Apologies for the clanker slop, but I think it's in poor taste not to attribute the open-source tool that the service is built on (one that's also funded by their SaaS plan).

---

Summary: What Is Objectively Provable

- The extension stores its config under the key changedetection_config

- 16 API endpoints in the extension are 1:1 matches with changedetection.io's documented API

- 16 data model field names are exact matches with changedetection.io's Watch model (including obscure ones like time_between_check_use_default, history_n, notification_muted, fetch_backend)

- The authentication mechanism (x-api-key header) is identical

- The default port (5000) matches changedetection.io's default

- Custom endpoints (/auth/, /feature-flags, /email/, /generate_key, /pregate) do NOT exist in changedetection.io — these are proprietary additions

- The watch limit error format is completely different from changedetection.io's, adding billing-specific fields (current_plan, upgrade_required)

- The extension ships with error tracking that sends telemetry (including user emails on login) to the developer's GlitchTip server at 100% sample rate

The extension is provably a client for a modified/extended changedetection.io backend. The open question is only the degree of modification - whether it's a fork, a proxy wrapper, or a plugin system. But the underlying engine is unambiguously changedetection.io.
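For context on the x-api-key and port-5000 points in the summary, this is roughly what a client request to a self-hosted changedetection.io instance looks like. It's a sketch assuming changedetection.io's documented v1 REST API (`GET /api/v1/watch` lists watches); the base URL and key are placeholders, and the builder only constructs the request.

```python
import json
import urllib.request


def list_watches_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a request for changedetection.io's watch-list endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/watch",
        headers={"x-api-key": api_key},  # API-key header authentication
        method="GET",
    )


def list_watches(base_url: str, api_key: str):
    """Send the request and decode the JSON body (needs a running instance)."""
    with urllib.request.urlopen(list_watches_request(base_url, api_key)) as resp:
        return json.load(resp)
```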

vkuprin 12 hours ago

Fair point, and I should have been upfront about this earlier. The backend is a fork of changedetection.io. I've built on top of it — added the browser extension workflow, element picker, billing, auth, notifications, and other things — but the core detection engine comes from their project. That should have been clearly attributed from the start, and I'll add it to the docs and about page.

changedetection.io is a genuinely great project. What I'm trying to build on top of it is the browser-first UX layer and a hosted product that makes it easier for non-technical users to get value from it without self-hosting, along with an AI-focused approach.

P.S -> I've also added an acknowledgements page to the docs: https://docs.sitespy.app/docs/acknowledgements

briaoeuidhtns 8 hours ago

have you adhered to the license? https://github.com/dgtlmoon/changedetection.io/blob/master/C... . if so, where can I get a copy of the source?

vkuprin 5 hours ago

Yes — the project is Apache 2.0 licensed (https://github.com/dgtlmoon/changedetection.io/tree/master?t...), which permits forking and commercial use. There's also a COMMERCIAL_LICENCE.md in the repo for hosting/resale cases, and I've reached out to the maintainer directly about it. Attribution is here: https://docs.sitespy.app/docs/acknowledgements

Kotlopou 12 hours ago

I use RSS to get updates from all the stuff I read online at once, and thought this would be nice for those websites that don't already have an RSS feed, but... Perhaps I'm stupid, but I can't actually find the RSS output? And searching for RSS on https://docs.sitespy.app/docs returns no hits.

vkuprin 11 hours ago

Not stupid at all — the docs were missing an RSS page, which is on me. I've just added one: https://docs.sitespy.app/docs/dashboard/rss. RSS feeds are available per watch, per tag, or across all watches from the dashboard. Thanks for flagging it, this is exactly the kind of feedback that helps

dannyfritz07 4 hours ago

What does an RSS URL look like from a chrome extension?

tene80i 2 days ago

RSS is a useful interface, but: "Do most people just want direct alerts?" Yes, of course. RSS is beloved but niche. Depends who your target audience is. I personally would want an email, because that's how I get alerts about other things. RSS to me is for long form reading, not notifications I must notice. The answer to any product question like this totally depends on your audience and their normal routines.

ikari_pl 2 days ago

It's niche because some companies decided so.

you used to have native RSS support in browsers, and latest articles automatically in your bookmarks bar.

ctxc 2 days ago

That's good reasoning, but the parent's point still stands?

hrmtst93837 11 hours ago

If you design for email alerts you invite reply loops, permanent delivery failures, and all the headaches of scaling SMTP. RSS, while nerdy, offloads almost every operational hassle to the client and works fine for polling when instant delivery isn't mandatory.

Some users want to pipe these updates into scriptable things like Slack, bots, or custom dashboards, where RSS is much easier to handle than email. If you offer both, people will use whichever fits their workflow, and that isn't always predictable.
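As an illustration of why RSS is easy to script against, here's a minimal sketch that parses an RSS 2.0 document with the standard library and yields only items it hasn't seen before, which is the core of forwarding a feed into a bot or chat channel. The `seen` set is a hypothetical stand-in for persistent state, and the actual posting step (e.g. a Slack webhook) is left out.

```python
import xml.etree.ElementTree as ET


def new_items(feed_xml: str, seen: set[str]):
    """Yield (guid, title, link) for items not seen before, marking them seen."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return
    for item in channel.findall("item"):
        # Fall back to the link when a feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link") or ""
        if guid in seen:
            continue
        seen.add(guid)
        yield guid, item.findtext("title", default=""), item.findtext("link", default="")
```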

Symbiote 24 hours ago

I added my employer's website RSS feed to the all-staff Slack channel. I find it useful; I don't know about others, but no one has grumbled.

https://slack.com/intl/en-gb/help/articles/218688467-Add-RSS...

pilina 17 hours ago

Nobody has mentioned that FreshRSS [0] has a built-in web scraper too. It is also (a little bit, imho) easier to use than changedetection.io. Works like a charm for me, mostly for window shopping at thrift shops.

[0](https://freshrss.org/)

toyg 17 hours ago

Does anyone here remember WinerWatch? The tool Mark Pilgrim wrote that monitored changes to blog posts by (the notoriously trollish) Dave Winer? It caused a big stir in the blogosphere in 2003, and Mark took it down very quickly after public blowback. I always thought it was pretty cool and he should have left it up.

One of the very few surviving posts on the matter: https://burningbird.net/forget-the-law-forget-the-technology...

rafterydj 9 hours ago

What an interesting glimpse into the blogosphere. That post in particular is interesting, given the much larger discussions that followed it on essentially the same topic: how to decide what people can and can not say when there are those who ruin the commons. I'm thinking of Twitter/X's Community Notes feature and the many reputation mechanisms people put in place, and the subsequent griping of community politics that arise from those mechanisms...

lightningflash 15 hours ago

Kudos for the launch. Super useful for people tracking granular updates!

I've built something similar for the startup I'm working at, but across the web with a Natural Language Interface: https://parallel.ai/products/monitor. It can find diffs in previous content or find new pages pertaining to the query depending on the semantics. Hope people on this thread are able to derive value from it.

enoint 2 days ago

Quick feedback:

1. RSS is just fine for updates. Given the importance of your visa use-case, were you thinking of push notifications?

2. Your competition does element-level tracking. Maybe they choose XPath?

vkuprin 2 days ago

Yep, Site Spy already has push notifications, plus email and Telegram alerts. I see RSS as the open interface for people who want to plug updates into their own reader or workflow. For urgent things like visa slots or stock availability, direct alerts are definitely the main path.

And yeah, element-level tracking isn't a brand new idea by itself. The thing I wanted to improve was making it easy to pick the exact part of a page you care about and then inspect the change via diffs, history, or RSS instead of just getting a generic "page changed" notification

Hauk307 2 days ago

This is cool. I'd use it to track when state wildlife agencies update their regulation pages — those change once a year with no announcement and I always miss it. Element-level tracking would be perfect for that vs watching the whole page. To answer your question: I'd want both RSS and direct alerts (email/push) depending on urgency.

multidude 9 hours ago

This is directly useful for financial data monitoring. I've been thinking about watching specific elements on energy report pages (EIA weekly inventory releases, OPEC statements) rather than scraping the full page. The element picker + RSS output is exactly the right interface for that — pipe the change event straight into an NLP pipeline without the noise of a full page diff.

The RSS question: yes, RSS is useful precisely because it's composable. It works with anything. Direct alerts are convenient but RSS is infrastructure.

plutokras 2 days ago

I have my own hobby RSS server built around the Google Reader API. Two of my plugins are pretty similar to what you described: one checks a page’s current state against the last saved version and publishes an entry if anything changed, the other is basically a CSS selector-based feed builder. Always good to see RSS content here, thanks for posting!

On your questions: some people prefer RSS, others email, and services exist to convert between the two in both directions. My own rule of thumb is email for things that need actual attention and RSS for everything that can wait. If you’re thinking about turning this into a service, supporting both would make sense since people are pretty split on this.

BloodAndCode 11 hours ago

i've actually been looking for something like this. full-page change monitors get noisy really fast, especially on sites with lots of small UI changes.

being able to watch a specific element sounds way more useful in practice (price blocks, availability text, etc).

curious how fragile the element tracking is though. if the site slightly changes the DOM structure or class names, does the watcher usually survive or do you end up reselecting the element pretty often?

vkuprin 11 hours ago

The selector is stored as a CSS path and matched against the fetched HTML on each check — so as long as the element's structure and nesting stay roughly the same, minor layout changes don't usually break it.

The fragile cases are sites that generate class names on every build (React/webpack/vite apps often do this) — those selectors will just stop working.

For semantic elements like price tags, availability text, or content blocks, selectors tend to be stable enough that it's not a real problem day-to-day. And if a filter stops matching entirely, the watch is flagged with an error message rather than silently giving you empty diffs.

BloodAndCode 11 hours ago

that makes sense. the error flagging is a nice touch — silent failures are usually the worst part of these tools. i've also run into the “random class name” problem on a lot of modern frontends. have you experimented with more semantic selectors (text anchors, attributes, etc) as a fallback, or do you try to keep it simple on purpose?

vkuprin 11 hours ago

Yeah, semantic anchors are definitely the right direction — [data-testid], aria-label, or text proximity tend to survive rebuilds much better than class paths. The picker leans towards CSS right now but that's something I want to improve.
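The fallback idea (try the stored anchor first, then more semantic ones like `data-testid` or `aria-label`, and raise an error rather than diffing an empty string when nothing matches) might look like this. It's a sketch using only the standard library's `html.parser`, so each anchor is a single attribute/value pair rather than a full CSS selector; the anchor list is hypothetical and this isn't how Site Spy's picker works.

```python
from html.parser import HTMLParser


class SelectorNotFound(Exception):
    """Raised so the watch can be flagged instead of diffing empty text."""


class _FirstMatchText(HTMLParser):
    """Collect the text of the first element whose attribute equals a value."""

    def __init__(self, attr: str, value: str):
        super().__init__()
        self.attr, self.value = attr, value
        self.tag = None   # tag name of the matched element
        self.depth = 0    # nesting depth of that tag while inside the match
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == self.tag:
                self.depth += 1
        elif dict(attrs).get(self.attr) == self.value:
            self.tag, self.depth = tag, 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_text(html: str, anchors):
    """Try (attribute, value) anchors in order; return the first match's text."""
    for attr, value in anchors:
        parser = _FirstMatchText(attr, value)
        parser.feed(html)
        parser.close()
        if parser.tag is not None:
            # Collapse whitespace so layout-only changes don't show up as diffs.
            return " ".join("".join(parser.parts).split())
    raise SelectorNotFound(f"none of {anchors!r} matched")
```

Note that the generated class name (`x9f2` in the test page) never appears in the anchors, so a rebuild that renames it wouldn't break the watch.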

The harder problem is auth-gated content — Instagram feeds, dashboards, paywalled pages. Browser Steps handles it today (you can script login flows), but honestly I think the real fix is AI-assisted interaction. A small cheap model that can find what you care about without needing a brittle selector at all. That's where I want to take this — less "maintain a CSS path", more "here's what I'm interested in, figure it out...

BloodAndCode 11 hours ago

that direction makes a lot of sense. selectors always end up turning into maintenance work sooner or later.

the idea of a small model just identifying “the thing that looks like a price / status / headline” feels much closer to semantic detection than DOM-path tracking

curious though — would you run that model on every check, or only when the selector fails? seems like a nice hybrid approach to keep things cheap.

vkuprin 9 hours ago

Yes! Exactly this direction — the hybrid fetch is already live: plain HTTP first, Chromium if the content looks off. LLM semantic targeting is the next step, but only triggered when a selector breaks, not on every check — too expensive otherwise.
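The "plain HTTP first, Chromium if the content looks off" flow can be sketched as a small function with injected fetchers. The `looks_off` heuristic (very little visible text, or an "enable JavaScript" notice) is a hypothetical stand-in, not Site Spy's actual check, and both fetchers are plain callables so the logic can be exercised without a network or a browser.

```python
import re
from typing import Callable

Fetcher = Callable[[str], str]


def looks_off(html: str, min_text_chars: int = 200) -> bool:
    """Guess whether a plain HTTP response is an empty JS shell."""
    # Drop script/style bodies and all tags, then measure the visible text.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    text = " ".join(text.split())
    return len(text) < min_text_chars or "enable javascript" in text.lower()


def hybrid_fetch(url: str, plain: Fetcher, browser: Fetcher) -> tuple[str, str]:
    """Try the cheap fetcher first; fall back to the browser only if needed."""
    html = plain(url)
    if not looks_off(html):
        return "plain", html
    return "browser", browser(url)
```

The same "cheap path first, expensive fallback on failure" shape would apply to the LLM step: only invoke it when the selector raises.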

iamflimflam1 2 days ago

Something I was planning on building but never got round to; if anyone wants to do it, feel free to use this idea.

Lots of companies really have no idea what javascript is being inserted into their websites - marketing teams add all sorts of crazy scripts that don't get vetted by anyone and are often loaded dynamically and can be changed without anyone knowing.

A service that monitors a site and flags up when the code changes - even better if it actually scans and flags up malicious code.
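A first cut of that idea can reuse ordinary page diffing: record which external scripts a page loads plus a hash of each inline script, then report additions and removals against a stored baseline. This is a standard-library sketch; it won't see scripts injected later at runtime (e.g. by a tag manager), which would need a real browser, and it does no malware scanning.

```python
import hashlib
from html.parser import HTMLParser


class _ScriptCollector(HTMLParser):
    """Record external script URLs and hashes of inline script bodies."""

    def __init__(self):
        super().__init__()
        self.found = set()
        self._inline = None  # text buffer while inside an inline <script>

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            self.found.add("src:" + src)
        else:
            self._inline = []

    def handle_data(self, data):
        if self._inline is not None:
            self._inline.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inline is not None:
            body = "".join(self._inline).strip()
            self.found.add("inline:" + hashlib.sha256(body.encode()).hexdigest()[:16])
            self._inline = None


def script_inventory(html: str) -> set[str]:
    collector = _ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.found


def script_changes(baseline: set[str], html: str) -> dict[str, set[str]]:
    current = script_inventory(html)
    return {"added": current - baseline, "removed": baseline - current}
```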

jameswondere007 12 hours ago

I think it's very useful, because I've also had these kinds of misfortunes in the past and I don't want to experience them again.

rippeltippel 19 hours ago

This is a very common use case, I myself hand-coded something similar 13 years ago [1]. Kudos for the choice of RSS!

[1] https://github.com/piero/WebDiff

pentagrama 23 hours ago

That looks nice, I use the free plan of https://visualping.io for some software changelogs, RSS feeds are a paid feature. Will check this out.

vkuprin 23 hours ago

Yeah, RSS is free on all plans — it felt like a core feature, not an upsell

bananaflag 2 days ago

Very good!

This is something that existed in the past and I used successfully, but services like this tend to disappear

vkuprin 2 days ago

That’s a completely fair concern. Services in this category do need to earn trust over time. I built the backend to handle a fair amount of traffic, so I’m not too worried about growth on that side. My goal is definitely to keep this running for the long term, not treat it like a one-off project

electrotype 13 hours ago

Will this work on websites that may have Cloudflare captchas popping up from time to time?

dogline 2 days ago

With lots of people showing how SaaS apps can be easily written these days, I'm less interested in those articles than in people showing off new ideas for what I can do with these newfound abilities. This is cool.

ramgale 10 hours ago

Looks good! RSS is underrated for this honestly. I'd rather check a feed when I want than get pinged at 2am because a sidebar changed.

lkozloff 2 days ago

Love this - I had a similar idea years ago, specifically for looking at long-text privacy policies and displaying the `diff`... but obviously never built it.

What you've done here is that and so much more. Congrats!

dev_at 2 days ago

There's also AnyTracker (an app) that gives you this information as push notifications: https://anytracker.org/

Knork-and-Fife 2 days ago

and also visualping.io which sends email alerts

layman51 2 days ago

How might this tool work in terms of “archiving” a site? This is just something I was wondering given the recent change and controversy about archiving service sites on Wikipedia.

vkuprin 2 days ago

Site Spy keeps snapshot history, so you can revisit older versions of a page and inspect how it changed over time, not just get the latest alert. I’d describe it more as monitoring with retained history than as a dedicated public archive, but deeper archival integrations are definitely something I’ve thought about

digitalbase 2 days ago

Cool stuff. You should make it OSS and ask a one time fee for it. I would run it on my own infra but pay you once(.com)

pwr1 2 days ago

Interesting... added to bookmarks. Could come in handy in the future

makepostai 2 days ago

This is interesting, gonna try it on our next project! thumbs up

hinkley 2 days ago

Back in 2000 I worked for a company that was trying to turn something like this into the foundation for a search engine.

Essentially instead of having a bunch of search engines and AI spamming your site, the idea was that they would get a feed. You would essentially scan your own website.

As crawlers grew from an occasional visitor to an actual problem (an inordinate percent of all consumer traffic at the SaaS I worked for was bots rather than organic traffic, and would have been more without throttling) I keep wondering why we haven’t done this.

Google has already solved the problem of people lying about their content, because with RSS feeds or user-agent sniffing you can still bear false witness to your site’s content and purpose. But you’d only have to be scanned when there was something to see. And really, you could play games with time delays on the feed to smear out bot traffic over the day if you wanted.

deceptionatd 2 days ago

Well-designed sitemaps and use of something like https://www.indexnow.org/ helps.

Cloudflare has Crawler Hints which works well IME: https://blog.cloudflare.com/crawler-hints-how-cloudflare-is-...
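For the IndexNow suggestion, the protocol's simplest form is a single GET request telling participating search engines that one URL has changed. A sketch that only builds the ping URL, assuming the shared api.indexnow.org endpoint; the key is a placeholder and, per the protocol, must also be served as a key file on your own site so the engine can verify ownership.

```python
from urllib.parse import urlencode


def indexnow_ping_url(changed_url: str, key: str,
                      endpoint: str = "https://api.indexnow.org/indexnow") -> str:
    """Build the single-URL IndexNow submission URL (send it with a GET)."""
    return endpoint + "?" + urlencode({"url": changed_url, "key": key})
```

Calling this from your own publish step means crawlers are told about changes instead of polling for them, which is close to the "scan your own website" idea.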

hinkley 21 hours ago

Do you know if CloudFront has anything like this?

butterlesstoast 2 days ago

This is quite a lovely implementation. Congrats!

nicbou 2 days ago

Buddy I love you!

I have wanted this for so long! My job relies on following many German laws, bureaucracy pages and the like.

In the long run I want specific changes on external pages to trigger pull requests in my code (e.g. to update a tax threshold). This requires building blocks that don't exist, and that I can't find time to code and maintain myself.

I currently use Wachete, but for over a year now it has been triggering rate limits on a specific website, and I just can't monitor German laws anymore. No tools seem to have a debounce feature, even though I only need to check for updates once per month.

vkuprin 2 days ago

German laws and bureaucracy pages are exactly the kind of thing where tracking one specific part of a page is much more useful than watching the whole page. And yeah, more control over check frequency makes a lot of sense if monthly checks are enough and rate limits are the main problem. I’d be curious what kind of schedule would work best for you there?

nicbou 2 days ago

Monthly is fine, but not monthly all at once, because I watch multiple pages on one website, and that triggers the rate limiting.

The ideal pipeline for me would be "notice a change in a specific part of a page, use a very small LLM to extract a value or answer a question, update a constant in a file and make a pull request".

I've been thinking about this pipeline for a long time because my work depends on it, but nothing like it seems to exist yet. I'll probably write my own, but I just can't find the time.
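The "update a constant in a file" step of that pipeline is easy to automate on its own. A minimal sketch, assuming a Python source file with a line like `TAX_THRESHOLD = 10000`; the constant name and values are hypothetical, and the LLM extraction and pull-request steps are out of scope here.

```python
import re


def update_constant(source: str, name: str, new_value: int) -> tuple[str, bool]:
    """Rewrite `NAME = <int>` in Python source; report whether it changed."""
    # Match an integer literal only (the lookahead rejects decimals).
    pattern = re.compile(rf"^({re.escape(name)}\s*=\s*)(\d+)(?![\d.])", re.MULTILINE)
    match = pattern.search(source)
    if match is None:
        raise ValueError(f"constant {name} not found")
    if int(match.group(2)) == new_value:
        return source, False  # nothing to commit, so no PR needed
    updated = source[:match.start(2)] + str(new_value) + source[match.end(2):]
    return updated, True
```

When the flag is True, the remaining steps are ordinary tooling: a new branch with `git switch -c`, a commit, and `gh pr create` from the GitHub CLI.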

vkuprin 23 hours ago

You can already work around the rate-limit issue today — there's a global minimum recheck interval in Settings that spreads checks out across time. Not per-site throttling yet, but it prevents one domain from getting hit too many times at once.

The pipeline you described — detect a change, extract a value with a small LLM, open a PR — is pretty much exactly what the MCP server is designed for. Connect Site Spy to Claude or Cursor, and when a specific part of a page changes, the agent can handle the extraction and PR automatically. I don't think anyone has wired up that exact flow yet, but all the pieces exist.
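Per-site spreading, which the reply notes isn't there yet, could look like this: give each page on the same host its own start offset within the check period, so a monthly check of several pages on one site becomes separate requests spread over the month. The function name and the even-spacing policy are illustrative assumptions, not a feature of Site Spy or Wachete.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def staggered_offsets(urls, period_hours: float = 30 * 24):
    """Give each URL a start offset (hours) within the period.

    Pages on the same host are spaced period/n apart; different hosts are
    scheduled independently since they rate-limit independently.
    """
    by_host = defaultdict(list)
    for url in urls:
        by_host[urlsplit(url).hostname].append(url)
    offsets = {}
    for host_urls in by_host.values():
        step = period_hours / len(host_urls)
        for i, url in enumerate(sorted(host_urls)):
            offsets[url] = i * step
    return offsets
```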

SherryWong 14 hours ago

Looks cool

breadcat 2 days ago

i love a good rss tool. Thanks for sharing

OSaMaBiNLoGiN 24 hours ago

Tool looks useful. But how is it that toggling between light/dark mode results in a multi-second freeze..? Scrolling drops frames, confirmed with dev tools.

Tested on M1 Pro 2021 laptop and recent higher-end (4080, 14700k, etc) desktop. Same on both.

The fuck?

vkuprin 23 hours ago

Yeah, that was a real bug — CSS transitions on the body were blocking the thread during theme switches. I pushed a fix for it earlier today. Should be smooth now, but let me know if you still see it

m-hodges 23 hours ago

Great for opponent monitoring on political campaigns. We made an in-house version of this on Biden ‘20.

docybo 2 days ago

that's quite good. will give it a try, congrats!