This was like 1998-2003, and non-technical people were doing it too. I think I am the only one from that friend group who would even consider that as something to watch out for.
I'm always intrigued by how the German FE-Schrift ("fälschungserschwerende Schrift", "more-difficult-to-forge font") chooses shapes for characters that make it hard for them to be turned into one another (like a 3 into an 8 or so):
https://en.wikipedia.org/wiki/Chinese_numerals#Financial_num...
> a string like “аpple.com” with Cyrillic а (U+0430) is pixel-identical to “apple.com” in 40+ fonts. The user, the browser’s address bar, and any visual review process all see the same pixels. This is not theoretical. It is a measured property of the font files shipping on every Mac.
Current implementations of "Computer Use" Agentic AI tools mostly use visuals -- screenshotting of a computer screen and interpreting it.
These pixel-identical character pairs will be a straight failure mode for those automations and could become a threat vector if crafted well.
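Such an automation could at least flag mixed-script strings before acting on them. A crude sketch in Python, using Unicode character names as a stand-in for a real UTS #39 confusables check (`scripts` is a hypothetical helper, not a library function):

```python
import unicodedata

# Crude script detector: classify each character by its Unicode name.
# A real implementation would use the UTS #39 confusables data instead.
def scripts(s):
    found = set()
    for ch in s:
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            found.add("Cyrillic")
        elif name.startswith("LATIN"):
            found.add("Latin")
    return found

print(scripts("apple.com"))       # {'Latin'}
print(scripts("\u0430pple.com"))  # {'Cyrillic', 'Latin'}: suspicious mix
```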
Things like the Fraktur characters are obvious mismatches in any font I know, so I do wonder why they're on the list.
I would recommend template matching using normalized cross-correlation (TM_CCOEFF_NORMED in OpenCV).
Also this paper from Nvidia critically scrutinizing SSIM may be relevant: https://research.nvidia.com/publication/2020-07_Understandin...
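For same-size inputs, TM_CCOEFF_NORMED reduces to the Pearson correlation of the mean-subtracted images, which can be sketched in plain NumPy (toy bitmaps stand in for rendered glyphs):

```python
import numpy as np

def ncc(a, b):
    # Normalized cross-correlation; for same-size inputs this is what
    # OpenCV's matchTemplate(..., cv2.TM_CCOEFF_NORMED) returns.
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

# Toy 4x4 "glyph" bitmaps: identical, and with one pixel flipped.
g1 = np.array([[0, 1, 1, 0],
               [1, 0, 0, 1],
               [1, 1, 1, 1],
               [1, 0, 0, 1]])
g2 = g1.copy()
g3 = g1.copy()
g3[0, 0] = 1

print(ncc(g1, g2))        # 1.0: pixel-identical
print(ncc(g1, g3) < 1.0)  # True: any pixel difference lowers the score
```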
Like even if the two characters look quite different, if they both look like the same letter in different fonts, that is a problem. It doesn't matter if you can tell the difference between the glyphs in a side-by-side comparison. What matters is what letter the user interprets the glyph as.
Also, I remember the 8x16 VGA font that came with KeyRus had some slight differences between Cyrillic and Latin lookalikes, which brought some strange sense of comfort when reading, and especially typing, the letter c, because its Cyrillic lookalike is located on the same key.
that is very interesting.
I imagine the browser could take some context clues and switch rendering to punycode if the locale of the user is nowhere near a Cyrillic region. But that is only going to patch some edge cases and miss others.
Ideally, the solution is password managers everywhere: they don't have this vulnerability, unlike relying on human eyes to visually recognize web URLs.
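For what it's worth, the punycode fallback is easy to see with Python's stdlib IDNA codec (the spoofed string here is the Cyrillic-а example quoted above):

```python
# The Cyrillic-а spoof, and its punycode (IDNA ToASCII) form.
spoofed = "\u0430pple.com"  # Cyrillic а + "pple.com"
real = "apple.com"

print(spoofed == real)                  # False, despite identical pixels
print(spoofed.encode("idna").decode())  # xn--pple-43d.com
print(real.encode("idna").decode())     # apple.com
```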
Anyone reading this - please, please, please do not make any assumptions based on the end-user's geography.
Signed, someone who can cross 3 national and 4 language borders within a few hours of driving.
I think the lack of exploration of the context around the problem and current mitigations is an issue with the article - it spends a lot of time talking about the possible threat, but very little time on whether the attack is actually practical with modern mitigations.
It also defaults to not loading HTML in emails, which I love. It really opened my eyes to how dumb it is to just accept all kinds of dynamic content in unknown messages. (Kind of the same as how the modern web relies on remote code execution to work.)
Was that the intention?
> "This is not theoretical. It is a measured property of the font files shipping on every Mac."
some patterns of speech are so recognizably LLM, i am convinced that the AI detection startups have a very strong chance to succeed on text.
> some patterns of speech are so recognizably LLM, i am convinced that the AI detection startups have a very strong chance to succeed on text.
The problem for them is the market. Those who actually want to buy AI detection tools usually want the impossible - detecting any kind of AI-written text, or even AI-written-human-edited text.
You're right that many HN articles (not going to comment on this one specifically) are very easy to detect. But that's just because the writers are too lazy to use any of the plethora of tools that remove the smells automatically, or tools that write without them in the first place (I've made such a tool myself), or even just to adjust the prompt to write in a different style that avoids them.
Most people who would be interested in paying for AI detection tools want them to detect all of the above cases too, which is of course impossible.
This text made me curious, I liked the approach the author has taken. And it made me think how I would do it. My first idea would be to use ImageMagick to render text and then use ImageMagick's https://imagemagick.org/script/compare.php to somehow calculate the risk of confounding glyphs.
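The comparison step of that idea can be sketched without ImageMagick itself; here RMSE (analogous to what `compare -metric RMSE` reports, before normalization) is computed in NumPy, with toy bitmaps standing in for rendered glyphs:

```python
import numpy as np

def rmse(a, b):
    # Root-mean-square error between two same-size grayscale bitmaps,
    # analogous to ImageMagick's `compare -metric RMSE`.
    d = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt((d * d).mean()))

# Toy bitmaps standing in for two rendered glyphs.
glyph_a = np.array([[0, 255, 0],
                    [255, 0, 255],
                    [255, 255, 255]])
glyph_b = glyph_a.copy()  # a pixel-identical render

print(rmse(glyph_a, glyph_b))  # 0.0: likely a confusable pair
```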
So: Don't be snarky? Maybe we need another rule here, to limit comments on "LLM style" https://news.ycombinator.com/newsguidelines.html
Most of the added value in this article can be summed up by saying that the Cyrillic glyphs are identical to the similar English ones in the fonts the author looked at (which isn't true for all fonts), and the author didn't find many other such examples.
_______
¹ Try matching that word with "censorship" for fun
I don't have a Mac.
I did run into some issues in early versions when characters in Linux commands or visible web addresses were replaced. Fortunately the source docs were HTML, and it was easy to exclude code or pre nodes when rendering.
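A minimal sketch of that exclusion with Python's stdlib parser (an assumed approach, not necessarily how it was actually done):

```python
from html.parser import HTMLParser

# Extract text while skipping <code> and <pre> subtrees, so character
# substitution never touches literal commands or visible addresses.
class SkipCodeExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting level inside code/pre nodes
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("code", "pre"):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("code", "pre") and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)

p = SkipCodeExtractor()
p.feed("<p>Run <code>ls -la</code> to list files.</p>")
print("".join(p.parts))  # the code content is dropped
```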
I thought this was so clever, but the leaker was never caught using it, to the best of my knowledge.