Curious about one thing, though: how does it handle switching between languages? I work with both Greek and English daily, and local models usually struggle with that.
Great work, bookmarking this.
> The excellent on device capabilities makes one wonder if these are the basis for the models that will be deployed in New Siri under the deal with Apple….
https://www.latent.space/p/ainews-gemma-4-the-best-small-mul...
If I run this with internet connected it works flawlessly. Even if I disconnect my internet afterwards it still goes on working fine.
Why does there have to be an internet connection at the time I open the localhost site when all of this should be running purely on device?
Despite this, I am really impressed that it actually works so fast with video input on my M4 Pro 48 GB.
For now what I did was:
- Tested in Chrome/Safari/Firefox on Tahoe.
- Followed the quick start install instructions from the GitHub repo
- Everything worked
- Closed terminal
- Disconnected internet (Wifi off)
- Opened terminal
- Started server again (uv run server.py)
- Opened localhost in browser; it asked for camera/mic as usual, I granted access and saw the live camera feed, but the site showed "loading..." at the bottom center and the AI did not listen/respond
- Reproduced this about 3 times, switching Wi-Fi on/off before starting the server; always the same result (works with internet, does not work without)
- Figured out it also works fine if I start the server with internet connected and disconnect afterwards
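One hypothesis worth testing against the repro above, assuming (unconfirmed) that server.py loads the model weights through Hugging Face libraries: those can contact the Hub at load time even when the weights are already cached. `HF_HUB_OFFLINE=1` is a real `huggingface_hub` environment variable that forces cache-only mode. The launcher functions below are hypothetical helpers, not part of the repo.

```python
import os
import subprocess


def offline_env(base=None):
    """Return a copy of the environment with Hugging Face Hub network access disabled."""
    env = dict(os.environ if base is None else base)
    # Assumption: the server uses huggingface_hub under the hood.
    env["HF_HUB_OFFLINE"] = "1"
    return env


def start_server_offline():
    # Same command as in the steps above, run with the modified environment.
    return subprocess.run(["uv", "run", "server.py"], env=offline_env(), check=False)


if __name__ == "__main__":
    start_server_offline()
```

If the page still hangs on "loading..." with this set, the cause is probably elsewhere, for example a browser-side asset that is fetched from the network when the page first loads.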
Gemma 4 is kinda too heavyweight even with E2B. I am sticking with qwen 0.8B at the moment.
I’ve been looking for a good video summarizing / understanding model!
Regarding the video capability, I haven't tested it myself, but here's a benchmark/comparison from Google [0]
Sure, maybe it's still frame-by-frame, but so fast and so frequent that the model retains a rolling context of what's going on and can cleanly answer temporal questions.
"how packages were delivered over the last hour", etc.
Video: https://www.youtube.com/live/WuCxWJhrkIM
Generated writeup: https://taonexus.com/publicfiles/apr2026/pirate-gemma-journa...
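A minimal sketch of the rolling-context idea mentioned above, assuming each sampled frame is first turned into a short text note by some captioning step (not shown). The `RollingContext` class and its method names are hypothetical, purely for illustration, and say nothing about how Gemma or the linked demo actually work.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class FrameNote:
    timestamp: float  # seconds since the stream started
    caption: str      # short description of what the frame showed


class RollingContext:
    """Keep only the frame notes that fall inside a sliding time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.notes = deque()

    def add(self, timestamp: float, caption: str) -> None:
        self.notes.append(FrameNote(timestamp, caption))
        # Drop notes older than the window, measured from the newest note.
        cutoff = timestamp - self.window_seconds
        while self.notes and self.notes[0].timestamp < cutoff:
            self.notes.popleft()

    def as_prompt(self, question: str) -> str:
        # Flatten the retained notes into a text prompt for the model.
        lines = [f"[{n.timestamp:.0f}s] {n.caption}" for n in self.notes]
        return "\n".join(lines + [f"Question: {question}"])
```

With a one-hour window (`RollingContext(3600)`), a question about the last hour only ever sees notes from that hour, so the prompt size stays bounded no matter how long the stream runs.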
"You will have to unlock your iphone first" is kind of a deal-breaker when you are in the middle of mixing polyurethane resin and have gloves and a mask on.
More and more I find that we have the technology, but the supposedly "tech" companies are the gatekeepers, preventing us from using the technological advances and holding us back years behind the state of the art.
I'll be trying this out on my MacBook, looks very promising!
It's clear Tim Cook doesn't ever try to use Siri wearing gloves. Or ever, for that matter :-)
Yes, same with RSS readers being dropped by large companies. Worked too well, I guess!
It's truly absurd how the Google voice assistant USED to work properly for setting timers, playing music, etc, and then they had to break it 15 times and finally replace it with much slower AI that only kinda does what you want. I'm done.
Self-hosted is the way to go if you want to keep your sanity. My wife has basically given up on any Google/Apple voice assistant being able to do anything useful beyond "set a 10 minute timer".