Prove You Are a Robot: CAPTCHAs for Agents
24 points by lukasec 5 days ago | 13 comments
AgentNews 4 days ago
Pure genius! I had my agent hit the endpoint and I realized it returned a jumble of text: "if 七 wor~kers co.mplet/e{ | a job in 十七} days but 四 ] quit a^ft|e?r ^ day_ 三 ~ how many to{tal da[y;s> to fin>i?sh" but it was in Japanese! Unfortunately my agent proceeded to solve the reverse CAPTCHA and got back the API key. So I asked it to keep hitting the endpoint until it returned another CAPTCHA with Japanese kanji, and it did (without solving it this time). I got "a s:tore h?as ^ 二十 pe@rcent off< items- over 五十 : dollar;s and 八 ~ percent } of\f> ; i]te[ms u~nd~er: # 五十 do/ll@ars wh-ats } the c.omb>ined pri|c;e of a 一 百 二十 一 dollar item a]nd> a* 九 dollar} i!tem", and this time I was able to translate that into "a store has 20 percent off items over 50 dollars and 8 percent off items under 50 dollars what's the combined price of a 121 dollar item and a 9 dollar item?" I solved it and got 121 × 0.8 + 9 × 0.92 = 96.80 + 8.28 = 105.08. I will admit I messed up a little on translating the kanji, and my agent helped by pointing out where I was wrong, but overall this was good fun, well done!
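For anyone who wants to check the arithmetic, here is a minimal Python sketch of the decode-and-solve steps, under loud assumptions. The noise filter (drop anything that is not a word character or whitespace), the kanji parser (numerals below 1000 only, no 零 or 千), and the helper names `clean`, `kanji_to_int` and `discounted` are all hypothetical. None of this is exposed by the endpoint. It uses `Decimal` so the dollar amounts come out exact.

```python
import re
from decimal import Decimal

# Hypothetical noise filter: drop punctuation-like junk (and underscores),
# then collapse whitespace. \w is Unicode-aware, so kanji survive.
NOISE = re.compile(r"[^\w\s]|_")

def clean(text: str) -> str:
    return " ".join(NOISE.sub("", text).split())

DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
          "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100}

def kanji_to_int(text: str) -> int:
    """Parse a kanji numeral below 1000 (e.g. 一百二十一), ignoring spaces."""
    total, pending = 0, 0
    for ch in text.replace(" ", ""):
        if ch in DIGITS:
            pending = DIGITS[ch]                  # digit waiting for a unit
        elif ch in UNITS:
            total += (pending or 1) * UNITS[ch]   # a bare 十 means 10
            pending = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + pending                        # trailing ones digit

def discounted(price: int) -> Decimal:
    # Assumption: a price of exactly 50 gets the 8% rate; the puzzle
    # leaves that case undefined.
    rate = Decimal("0.20") if price > 50 else Decimal("0.08")
    return price * (1 - rate)

big = kanji_to_int("一 百 二十 一")   # 121
small = kanji_to_int("九")            # 9
print(discounted(big) + discounted(small))  # 105.08
```

Running it prints 105.08, matching the hand calculation above.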
Zetaphor 25 minutes ago
Get the API key, hit the claim link, sign up for a new account, verify my email, go to the homepage:
Application error: a server-side exception has occurred while loading cloud.browser-use.com
Great first impression!
singpolyma3 54 minutes ago
...why? Once my agent has a key, I, the human, can also use it. And surely any human use would be less intensive than any agent use.
consumer451 21 minutes ago
Exactly. I still believe that inverse CAPTCHAs are impossible for any practical application.
Is this just a marketing stunt?
jstanley 29 minutes ago
But once a human has a key, his agent could use that too, and people still like to use ordinary CAPTCHAs.
tony_landis 37 minutes ago
Right, perhaps the title could be "prove you are a robot, or have access to one"
echelon 53 minutes ago
Speaking of browser automation, are there any LLMs or tools that hook up to actual desktop browsers and can automate the keyboard and mouse?
Which LLMs best drive these? Claude/Gemini, etc., or is anything local actually competent at it?
Can they understand layout and visual cues with a VLM or multimodality?
Are they robust enough to interact with threejs and videos and whatnot, or can they just blindly navigate the DOM?
loloquwowndueo 27 minutes ago
> TL;DR: just ask your agent to summarize this post for you.
Holy shit - why don’t they produce an AI summary and plonk it in there for everyone to use? The energy savings across all people who’ll read the summary would be staggering!
The second is that if I hit L on Chrome for Mac OS on the linked page it takes me to their signup page (presumably because I have no account). So that's a keyboard shortcut to take you to the browser-use app page. But why 'L'? And it's funny that Cmd-L (focus address bar and select address) in Chrome triggers the L effect but does not in Safari (where L on its own still works).