Input on the iPhone is so dreadful nowadays. Their palm rejection is definitely worse than before, so mistyping is more frequent. Their text-correction algorithm for typing is worse than before, and it frequently makes incorrect corrections to words that I don't notice, because they change words a few words back from where I typed. And STT hasn't improved. On top of that, my fingers are tired of the phone form factor. Please make the iPhone not a chore to use, Apple.
Looks like Wispr Flow uses a cloud model [0]:
> Cloud based speech processing infrastructure for 1B users
It gets to be a messy comparison: my iPhone can do STT pretty well with no latency, fully on device, while Wispr Flow requires a cloud model. To be fair, older Apple devices require one as well. It's not an apples-and-oranges comparison, but I think those technical details make this a non-direct comparison in a few ways.
For on-device with low system resource usage, Apple's is pretty damn good.
I wish more companies focused on how they can help humans instead of replacing us or squeezing us as hard as possible in the name of productivity.
My experience is limited to my elderly parents who have trouble seeing. With the text size Apple allows them to set it to, their phones are unreadable. Text runs off the screen in every app, 1st and 3rd party.
In their bill example, the user is told to confirm with the provider. Why not offer to call the number on the bill? Instead of telling them to use text detection, do it for them? Presumably Apple Intelligence would already have that capability. I’m afraid this will be a gimmick at best.
EDIT: Forgot to mention, the grip is good to see. Hopefully they don’t charge the Apple tax on it.
I have a problem with astigmatic halation that makes ‘dark mode’ difficult to read. Since iOS 26, multiple aspects of the system have been made dark only, contrary to the system setting. Writing text correctly should be the lowest of low-hanging fruit.
I suspect this is more of a flashy ‘AI’ promotion rather than reflective of any real commitment.
They treat new industry advancements as technology, not as products in themselves.
AI will be a feature to improve the customer experience, not the product itself.
https://blog.google/products-and-platforms/platforms/android...
https://android-developers.googleblog.com/2024/09/talkback-u...
this sort of thing really needs input from someone that uses it before we can judge it
Don't get me wrong, Apple using these technologies to help humans who are in need of help is laudable. But let's not pretend we don't know why most corporations don't look into this kind of thing. I think if we're being honest, we all very much know why they leave this sort of thing to the always nebulous "others".
> “When we work on making our devices accessible by the blind,” he said, “I don’t consider the bloody ROI.” It was the same thing for environmental issues, worker safety, and other areas that don’t have an immediate profit. The company does “a lot of things for reasons besides profit motive. We want to leave the world better than we found it.”
— https://www.forbes.com/sites/stevedenning/2014/03/07/why-tim...
I was just answering the question of why other corporations don't.
Money.
There's relatively little money in helping the visually impaired. You have to do it because you want to do it. Not because you're going to get rich.
I assume almost everyone looks into spending less money than more money for equivalent goods and services.
My one hope is that this eventually becomes widespread enough to stop alt text scolds.
Without that, there wouldn’t really be great vlm and conversational models.
The AI companies might have paid for the dictation of some videos on their own but voice assistants etc wouldn’t have existed and our ability to have AI that eventually understands the world would be much much harder.
1. Use AI to determine how much a bill is for
2. Call up the people who billed you and ask them how much they billed you
3. Pay billed amount
(I'm also picturing the poor CSR at the other end of the phone wading through hundreds and hundreds of call logs over the years for simple requests and managers up above screaming 'why is this guy calling us all the damn time costing us money'...)
Maybe just don't wear them in a car?
I use those motion cues on my iPhone even though I don't struggle with motion sickness https://www.youtube.com/shorts/OxbjggMcKrk
Still somewhat odd when a bus drives out from behind your Terminal, mind.
Why not?
Don’t be so scared of variety. You just keep subjecting yourself to more of the same. The unending familiarity makes you dull.
https://www.youtube.com/watch?v=B3SmsSCvoss
Those made the ad stand out in my opinion.
I think the trap in creating anything is doing it for a crowd. Art, software, anything... it turns out better when it is made with a specific, named individual in-mind.
Accessibility features are almost always championed and field-tested with one specific loved one in mind and I think that's what keeps the technical solutions personable and grounded.
Probably 80% of "LLMs are below expectation" complaints (from the general population) involve some form of image analysis.
Image tokenization is hard because, unlike language tokenization, where every token is extremely dense with meaning, image tokens tend to be meaningless or irrelevant but are processed all the same.
Give an SOTA LLM a picture of toothpicks and ask it to move one to make a square, and it will probably struggle and fumble it. But give a mid-size LLM from 2 years ago the same problem in verbal form, and it will nail it almost every time.
The takeaway is: do everything you can to avoid having the LLM rely on images for the answer.
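To put rough numbers on the density point, here is a minimal sketch. It assumes a ViT-style encoder that cuts an image into fixed-size square patches and emits one token per patch; the 16-pixel patch size, the function names and the whitespace-based text count are all illustrative assumptions, and real models often resize or compress images differently.

```python
import math


def patch_token_count(width_px, height_px, patch_px=16):
    """Tokens produced if every patch_px x patch_px tile becomes one token.

    Assumption: plain ViT-style patching with no resizing or token merging.
    """
    return math.ceil(width_px / patch_px) * math.ceil(height_px / patch_px)


def rough_text_token_count(text):
    """Very rough stand-in for a real tokenizer: one token per word."""
    return len(text.split())


# Hypothetical verbal version of a toothpick puzzle.
puzzle = "Four toothpicks form a plus sign. Move one toothpick to make a square."

image_tokens = patch_token_count(1024, 1024)
text_tokens = rough_text_token_count(puzzle)
print(f"image: {image_tokens} tokens, text: {text_tokens} tokens")
```

Most of the image tokens in a picture like that would cover blank background, while nearly every word in the verbal version carries part of the problem.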
The above caption for Apple Vision Pro is for a video that to me, as an Apple Vision Pro user, is discomforting.
More questions are raised than are answered by the short video: Is the user able to fit the Apple Vision Pro by him/herself? What happens when dwelling on a directional control misregisters? Can the user recalibrate the "Eyes and Hands" setting? Dwelling on a control displaces focus and there may be impeding objects in the path of the power wheelchair. Is this really a good idea?
To my sensibility, the video is unsettling (at best), especially given how cumbersome Apple Vision Pro is.
[0] https://www.apple.com/newsroom/2026/05/apple-unveils-new-acc...
Through that lens, this all looks a bit performative to me, but again, maybe I'll be pleasantly surprised.
The one thing I'm mildly excited to see is the improvement to Voice Control, as guessing what the programmatic name of a button is or having to constantly use a numbers grid to target elements doesn't sound fun.
To respond to what I see in some of the comments:
- On speech rate: It does take quite a bit of practice to crank up the speech rate and there's a degree of retraining you need to do when you switch voices. A lot of more "human" sounding voices are harder to follow at super high speeds which is why a lot of people prefer more robotic but consistent speech and generally aren't convinced by AI-powered TTS yet; they often fall apart if you raise the speech rate past a certain point.
- Re: actually waiting for the target audience's verdict: This is so important. I see more and more companies, individuals etc. talk about accessibility, build accessibility solutions and evangelize AI for accessibility without EVER talking to the people they claim to help. This will almost certainly mean mistakes will be made, up to and including doing more harm than good. If you want to do accessibility right, that includes AI products of any kind, hire people with lived experience or you'll get the equivalent of machine-translated text, hackerproof security in one click or an AI-powered coffee bar that orders thousands of rubber gloves.

Coincidental note: I have time for new projects right now :P
https://developer.apple.com/documentation/accessibility/brai...
The other thing is that if you're around others, voice input means you have no privacy. Even if you're not doing anything particularly private, it's a bit awkward and potentially embarrassing. If you use touch input in conjunction with a screen reader, you can be more like a "normal" user in that what you're doing is just between you and your phone.
People talk a lot about how macOS has gone downhill, but I feel like it would have been a good start if developers could continue to patch over Apple's shortcomings like they used to be able to.
I imagine that we would be a few years into a spectrum of tools like this if they didn't lock it down like they do.
Totally aware that plenty of HN commenters are very glad that Apple keeps this locked down. I'm just the other opinion, that's all.
iOS is just painfully good. I can pause a video, put my finger on text inside the video, and copy it. Until they added it, I didn’t even know how much I needed that.
I have fond memories of an old coworker 10 years ago who is blind. He would use his phone no problem, texting, going about his day, he was even on Tinder (credit to Tinder for making their app so accessible long ago). He would commute on his own, walk to the train station, even transfer to another train during peak rush hour. I’m not saying it was all easy for him, but nothing in this video really stood out to me more than what shirt was on the bed. I know other services/apps have long existed to be the “eyes” for people who need support, but this video feels….uneventful?
I may be cynical about this though, as I often hate how Apple’s marketing makes these emotional bids about how life-critical they are to society - which is fair to a degree..but it just feels cheap to be glamorising “look! we saved this person from pending doom, cool right??”
Additionally, I don't believe this is just marketing. This is adaptation to a changing market. Apple's customer base is aging, and having these kinds of features will allow them to keep using Apple products for longer.
My go-to example of this is a talk by Saqib Shaikh (a blind software engineer at Microsoft) about Visual Studio. Link is timestamped.
I wish more people would watch videos like this just because having a realistic idea of how blind people do certain tasks can help you move from pity or even compassion to a more productive kind of understanding. I think sometimes when you haven't seen it, you can't really even imagine how it can be done.
What really frustrates me is watching/listening to discussion of music, because I am forced to listen to the talking at 1x because the music sounds wrong (and is wrong) at anything other than 1x.
Ideally it should be done while encoding.
Likewise, YouTube’s “premium” feature of not displaying ads is laughable when displaying content is literally an internal browser function.
I pay anyway, because I was going to pay for an on-demand streaming music service anyway.
Maybe it’s just a matter of practice.
It's not rare among the blind in general.
Unless you're completely technologically illiterate, the kind of person who has no idea how to install an app or sign up for an online account, you're probably doing something of the sort.
I'm not even sure what to say, but discoveries like this are why I use Hacker News. I'd never have known this otherwise.
I can easily understand Eloquence (the speech synthesizer he's using) at that speed, but I struggled a bit with this one.
You have two modes: "focus mode", where you can edit text in text fields and keys are passed straight to the browser, and "browse mode", where keys move a virtual cursor around the page.
In browse mode, navigating with just arrow keys all the time would be just as slow as you might imagine, so you use single-key keyboard shortcuts to move by role, e.g. to the next heading, button, table or unvisited link.
The keyboard layout is optimized for memorizability and not efficiency (you use the actual arrow keys instead of hjkl, for example), but the concepts are eerily similar.
There are a couple of other approaches to solve this problem (macOS's VoiceOver is much more Emacs-like, for example), and each approach has its own pros and cons, but that's definitely one way to do it.
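For anyone who wants to see the two-mode idea in code, here is a minimal sketch. It is a toy model, not how any real screen reader is implemented: the `Element` and `VirtualCursor` names, the flat list of elements, and the specific key letters are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Element:
    role: str   # e.g. "heading", "button", "paragraph"
    text: str   # what the speech synthesizer would read out


# Hypothetical single-key bindings for jumping to the next element of a role.
QUICK_NAV_KEYS = {
    "h": "heading",
    "b": "button",
    "t": "table",
    "u": "unvisited link",
}


class VirtualCursor:
    def __init__(self, elements):
        self.elements = elements
        self.index = -1          # start just before the first element
        self.focus_mode = False  # False means browse mode

    def press(self, key):
        """Handle one key press; return the text to speak, or None."""
        if self.focus_mode:
            return None  # focus mode: the key goes straight to the page
        if key == "down":
            return self._move_to(self.index + 1)
        if key == "up":
            return self._move_to(self.index - 1)
        role = QUICK_NAV_KEYS.get(key)
        if role is None:
            return None
        # Browse mode: jump forward to the next element with that role.
        for i in range(self.index + 1, len(self.elements)):
            if self.elements[i].role == role:
                return self._move_to(i)
        return None  # no further element with that role

    def _move_to(self, i):
        if 0 <= i < len(self.elements):
            self.index = i
            return self.elements[i].text
        return None


page = [
    Element("heading", "News"),
    Element("paragraph", "A long introduction..."),
    Element("button", "Subscribe"),
    Element("heading", "Sports"),
]
cursor = VirtualCursor(page)
print(cursor.press("h"))  # jumps to the first heading
print(cursor.press("b"))  # skips the paragraph, lands on the button
```

The point of the sketch is that one key press skips everything in between, which is what makes browse mode usable on long pages.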
I'm not getting my hopes up though, given Apple's history with Siri, which is truly awful.
This has been the typical pattern for Apple for the last few years. The flashy features are announced at WWDC, accessibility has a dedicated, earlier press release. Before this practice, accessibility announcements would usually be tucked in some WWDC slide that most people wouldn't even notice.
I just would not wanna promise anything. Except “available for download this Friday” once the gold master is passing tests.
After a few more years of Thanksgivings and Christmases and Mothers' Days, we'll finally train her up to a reasonable speed lmao.
RIP kid https://youtu.be/fnH7AIwhpik
Even better, fire up Orca (or whatever screenreader application your OS comes with) yourself and try to use your computer with your eyes shut. It's kind of eye-opening (no pun intended) what kind of experience these users typically get. You also quickly start to understand why they set the speech rate for their voice synthesizer to be so fast: it's almost unbearable navigating applications (and particularly lists) otherwise.
Unfortunately it seems impossible to get all that much funding for accessibility work :/ I wonder what ever happened to the Newton accessibility bus intended to supplement Wayland...
Hm, never heard about it, but now I'm wondering too. I just finished implementing proper accessibility support for my native app toolkit for Linux, macOS and Windows, but I've only done it for X11 so far; I was just about to get started with Wayland. What is the accessibility story on Wayland? Couldn't people rely on the same protocols as with X11? That was my impression, but I haven't really dug into it yet.
There are apps I use semi-regularly that less-experienced screen reader users thought were inaccessible, and I couldn't even explain what they were doing wrong from memory. The ways of working around accessibility issues are just so ingrained in me that all I can usually remember is "yeah I did this somehow, but it was six months ago and I have absolutely no idea which specific tricks I needed for this one."
I imagine that for coding it also helps deal with the fundamental problem of an ephemeral stream rather than a persistent document that you can navigate visually in multiple dimensions. Working memory is limited, and getting more text in in a short period of time probably helps you work within that better. I also imagine that working with text via audio all the time gradually stretches and improves memory.
You can show a lot more info on a screen than you can transmit through speech in a short period of time. That doesn't mean you read faster than you listen, just that sighted people essentially use their eyeballs as an "input device" to decide what information to look at.
If there's an object on the screen that you want to examine but that you don't need to click, you can just "navigate to it" with your eyeballs, without ever touching a mouse or keyboard. We don't have that luxury.
This means we need a much more efficient system for navigating what's on the screen, but that only gets you so far. Eventually, the easiest way to deal with this problem is just to increase the bandwidth of your channel, and you do that by increasing the speech rate.
Whether that control you see visually is actually accessible to a blind user is a different matter entirely. Further, it maxes out at 2x, but a blind person would typically screen read at the equivalent of 3-6x.
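To make the bandwidth argument concrete, here is a small back-of-the-envelope sketch. The 150 words-per-minute baseline for 1x speech and the 3000-word document are assumptions picked for illustration; the multipliers are the ones mentioned above.

```python
BASE_WPM = 150  # assumed speaking rate at 1x; real voices vary


def listening_minutes(word_count, speed_multiplier, base_wpm=BASE_WPM):
    """Minutes needed to hear word_count words at the given speed multiplier."""
    return word_count / (base_wpm * speed_multiplier)


# A hypothetical 3000-word document at several playback speeds.
for speed in (1, 2, 3, 6):
    print(f"{speed}x: {listening_minutes(3000, speed):.1f} min")
```

Under those assumptions, the 2x cap leaves a listener needing two to three times as long as someone at 4x to 6x for the same text.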
Related, it seems like YouTube recently paywalled speed increase beyond 2x. Another way in which it's not cheap to lose sight, I guess.
Seems like it would be a win-win to have a user setting to opt out of video in exchange for ungating that feature.
True.
We can frame it even more strongly: "default societal practices actively discriminate against people with disabilities; they intentionally, consciously choose to make life harder for people who're disadvantaged".
Pretty sure there are enough blind people who don't listen to voice at insane speeds, because they listen in their non-native second language or for whatever other reason. What's wrong with using the lowest common denominator that's 100% accessible to those people as well as people who want faster speeds? Unlike "too fast", "too slow" doesn't make it entirely inaccessible, it's just boring.
Such a random thing to criticize.
Some blind people listen to things at superhuman speeds, but not all blind people. Using a normal reading speed is a sensible choice for an ad trying to appeal to blind people since you don't want to intimidate those who don't use superhuman speeds.
Going from that to "heh a sighted person made this because it's normal speed" is simply incorrect.
It was the sort of statement an HNer might make to showcase some trivia they have about some other group, but they oversold it.
Yes, for lots of reasons. It takes practice to get up to a high speed with a given TTS. People who go blind later in life are just beginning, and it can take a long time for them to get up to really high speeds. You may also need to reset somewhat when you change from one TTS to another. And blind people's ears are subject to problems just like anyone else's; if your hearing isn't great you may need slower speeds or higher volumes or both. That's why even though most people use screenreaders at much higher speeds, the defaults when you turn on a new device are painfully slow. You have to set a conservative default so people with less experience/worse ears/whatever can get by.
Anyway, I don't think it's a criticism. It's just noting that it doesn't depict how most people will end up using it, and if you're curious about what typical usage sounds like, you should look for another example.
It's like how in videos that teach people a foreign language, everyone speaks slowly and uses simple words, even though native speakers don't talk like that at all. The GP is simply saying that an actual blind person would be way more efficient at it, but they made the video with inefficient settings so sighted people could understand what was going on.