Everything runs locally on your machine; nothing gets uploaded. No accounts, no subscriptions, no telemetry.
It ships as a single binary for Linux, macOS, and Windows. On first launch it sets up its own isolated Python environment and downloads the ML models it needs - no manual installation of dependencies required.
My two biggest drivers for the creation of this were:
- The lack of karaoke coverage for niche, avant-garde, and local tracks.
- Nostalgia for the good old cheesy karaoke backgrounds with flowing rivers, city panoramas, etc.
Some highlights:
- Stem separation using the UVR Karaoke model (preserves backing vocals) or Demucs
- Automatic lyrics via WhisperX transcription, or fetched from LRCLIB when available
- Pitch scoring with player profiles and scoreboards
- Gamepad support and TV-friendly UI scaling for party setups
- GPU acceleration on NVIDIA (CUDA) and Apple Silicon (CoreML/MPS)
- Built with Rust and the Bevy engine
The whole stack is open source. No premium tier, no "open core" - just the app.
Feedback and contributions welcome.
Some quick feedback:
- Needs a way to skip for-/backwards during playback to validate the result
- Sentences seem to be recognized (first letter has uppercasing), but periods aren't added
- Needs an option to edit results from the track analysis
Thanks for keeping it FOSS!
indeed, I'm running into two problems on the analyzer side: 1. the align model sliding off (especially w/ chorus/backing vocals present) 2. the transcript skipping parts of the lyrics in lyrics-heavy tracks (I tried a lot of Russian rap, lol)
happy for contributions as I'm not that experienced w/ the machine learning side of the project, mostly it was empirical "tweak the parameters and see what changes"
Questions for you:
1. What CUDA capability level is necessary for Nvidia GPU acceleration to work?
3. Are there any plans to support iGPU/NPU acceleration on AMD and Intel? Asking because those chips are most common in the mini computers sold at low cost these days.
My family members love karaoke and will be happy to try this. Looking forward to it!
AI is making whole categories of projects viable that simply weren't before. Not because they were technically impossible, but because they were too time-consuming for a niche audience to justify the effort.
Thanks for the cool project! (testing now)
The homepage still exists, but it looks like many of the other pages like the blog and wiki are long gone. It hasn't been active in probably over a decade.
I've worked on a small toy project with a similar purpose in the past [1], though it's not nearly as polished as yours, and I've made some questionable decisions here and there.
I have questions about pitch tracking. It seems you do track the pitch for scoring, and there's a line at the top of the screen that seems related but that I can't figure out. For my use case, an important feature of karaoke apps is displaying how "high" the next note should be sung, or at least some hints. Is it something your app can do and I just haven't figured it out? Or would it be a feature request?
Would it be possible to process songs on one device, and then use the result in another, or even multiple? Or would it be possible to run as separate server / client?
I ask mainly because the device I connect to my TV is definitely not the most powerful one, so it would be nice if I can preprocess the songs elsewhere.
you can use the + / - keys on the keyboard to change the level of guidance to your preference. generally there is a controls legend in the top right corner
There’s also a program for automatically downloading the songs: https://github.com/bohning/usdb_syncer
A couple of immediate small pieces of feedback:
* The colour scheme on the queue/nn% buttons is really low contrast - white on pale yellow is very hard to read
* The 'models' button (bottom left) - I assumed this would give me details about which models are available, and their sizes, but instead it deleted the downloaded models without warning. Maybe add an 'are you sure you want to...' check?
You can do this from their huge catalog of songs, using their official app or their web client: https://www.karafun.com/web/
Plus, they have music quizzes you can play with many people using smartphones as remote controls. It's super fun for parties where people don't want to sing all the time.
Impressive, very nice. Now let's see my death metal collection.
Just joking! Very nice, thanks for open-sourcing it.
We already do this for ingesting podcasts and cutting their clips with text being highlighted as people speak. AssemblyAI also supports speaker diarization.
For videos recorded using our own livestreaming studio, we can bypass all this by using Web STT and TTS APIs resulting in perfect timing and diarization without the need for server side models.
How come this is trying to install its own vendored dependencies, including executable binaries, instead of checking for what's already installed? That approach can lead to both security and performance issues.
Edit: the Python download isn't failing, but rather the application itself is looking for the executable interpreter in `lib` rather than `bin` once the download completes. I built the release tarball in the git repo, and I'm pretty amazed that such a basic error could make it into release code.
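For anyone unfamiliar with the `lib` versus `bin` mix-up: in a venv-style Python environment the interpreter sits under `bin/` on Linux and macOS and under `Scripts\` on Windows, while `lib/` only holds the installed packages. Below is a minimal sketch of a lookup that respects that layout. It assumes the app's environment is venv-like, which nothing in this thread confirms, and `find_env_python` is a hypothetical helper, not the project's actual code.

```python
from pathlib import Path
from typing import Optional

# Relative interpreter locations in a venv-style layout. This layout is an
# assumption; the app's real environment layout is not shown in the thread.
CANDIDATES = (
    Path("bin") / "python3",         # Linux / macOS
    Path("bin") / "python",          # Linux / macOS, alternative name
    Path("Scripts") / "python.exe",  # Windows
)


def find_env_python(env_root: Path) -> Optional[Path]:
    """Return the first interpreter found under env_root, or None."""
    for relative in CANDIDATES:
        candidate = env_root / relative
        if candidate.is_file():
            return candidate
    # Nothing under bin/ or Scripts/; lib/ is deliberately never searched.
    return None
```

Returning `None` instead of guessing a path lets the caller report "interpreter not found in the environment" rather than failing later with a confusing error.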
Further edit: I tried using the build script in the tarball rather than just doing a `cargo build -r`, and it started trying to install Docker containers! Docker to build a desktop application! What is going on here?
Which is to say, I don't blame the author for wanting a single installation that his app can manage and rely on, even though I wish it was different.
Plenty of software comes with its own Python runtime. Even Blender uses its own Python runtime. I can name so many apps with an embedded Python runtime: Blender, Houdini, Bitwig, Substance Painter, Krita, etc. Checking for what's already installed isn't the norm. In Krita's case, it uses the installed Python to build it... and in the build process it builds another Python runtime of its own!
This app should have probably bundled the runtime instead of downloading a new one though.
> install its own vendored dependencies
> lead to both security and performance issues
`npm install` and `pip install -r` theoretically have the same kind of security issue. How many projects on GitHub run this kind of command during their build process? My guess is on the order of millions.
And thankfully it's not how it works. If it were it'd break plugin ecosystems of many apps completely.
Just yesterday, I went to try out some cool new AI thing that was here on the front page of HN. It's written in Python. Great, I thought, that means I can put it into a virtualenv and just rm the whole tree when I'm done and my system will be exactly in the same state it was previously.
But sadly... no... the first time I ran it, this Python program started downloading and installing Node/NPM, and all kinds of other stuff to my machine WITHOUT even asking for permission. Sorry app developers, but my machine and my home directory are my workplace. They are curated property, you are NOT allowed to just install whatever you wish.
I expect this kind of behavior from programs whose only supported installation method is a curlpipe. (And I do avoid those.) I do not expect it from programs that claim to be installable by pip, or ship their own binaries. These NEED to be called out as vulnerable to supply-chain attacks at worst and extremely disrespectful to users at best.
"Why does this new software do X?" is probably answered by "the vibe worked on my system"
I've been sympathetic to your viewpoint, and I can see why this kind of thing is becoming more common.
The idea that users can reliably supply their own vendor libs/execs for applications is a bit of a fantasy. Devs working on fixing issues caused by the user having a strange issue due to the version of Python or whatever that they have installed is largely a waste of time when the application can "simply" ship with the exact dependencies it expects. This is especially true when it comes to open source work. Dealing with weird edge cases because the user has a version of FFMPEG installed that, for whatever reason, is missing h264, is work that nobody asked for. Given that the audience of this kind of app is a general one (not specific at all to devs) then it doesn't make sense to require other system packages to be present; if things like Python and FFMPEG are not required and will be downloaded anyway as part of the app install process, then there's no point in not always doing that. If you think about it, it's hardly different from any other sort of software dependency. The dependencies are just relatively bigger.
Personally, I have no desire for my applications to use other executables on my system unless I request that they do so explicitly. I'm sympathetic to the idea from a mere efficiency perspective, especially when it comes to developer tooling. But a karaoke app? No offense, but why care? A Python interpreter will be anywhere between 50 and 200 megabytes. FFMPEG is even smaller, especially if you don't enable every single feature and codec. Compared to how ridiculously bloated your average basic mobile app is (without anything like a built in JIT), bundling a desktop application with something like Python provides a lot of power relative to the number of bytes added.
That's why package managers and OS repos exist. Users shouldn't have to even be aware of this sort of stuff. In this case, though, when the application starts trying to download and install its own dependencies at runtime, instead of everything already being sorted out at build time, the user is made aware of dependency resolution, and now has to deal with the issues involved.
> This is especially true when it comes to open source work. Dealing with weird edge cases because the user has a version of FFMPEG installed that, for whatever reason, is missing h264, is work that nobody asked for.
And that's what config tests at build time solve for, and have solved for decades.
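A config test of this kind boils down to probing the installed tool for the capability you need before relying on it. The sketch below does that for the H.264 example by reading the output of `ffmpeg -hide_banner -encoders`. It assumes `ffmpeg` is on `PATH` and that the encoder you care about is listed as `libx264`; both function names are hypothetical, and a real build would more likely run an equivalent probe from its build script.

```python
import shutil
import subprocess
from typing import Optional


def has_encoder(encoders_output: str, name: str) -> bool:
    """Check whether `name` appears in the encoder-name column of the listing."""
    for line in encoders_output.splitlines():
        fields = line.split()
        # Data rows look like: "<flags> <encoder-name> <description...>"
        if len(fields) >= 2 and fields[1] == name:
            return True
    return False


def system_ffmpeg_supports(name: str = "libx264") -> Optional[bool]:
    """Return None if no ffmpeg is on PATH, else whether it lists the encoder."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return None
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
    )
    return has_encoder(result.stdout, name)
```

Keeping the parsing in `has_encoder` separate from the subprocess call makes the check easy to test without having FFmpeg installed.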
Because the person who vibecoded this had no idea they should have been doing that.
That said, an optional “use system environment if available” mode could make sense for advanced users. A PR for that would be welcome, as long as it also handles the real complexity involved: platform differences, Python package compatibility, GPU backends, and missing system/compiler flags.
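As a rough illustration of what such an opt-in mode could look like, here is a minimal sketch that prefers a system executable only when asked to and otherwise falls back to the app-managed copy. The names `resolve_tool`, `bundled_dir`, and `prefer_system` are made up for this example, and it deliberately ignores the harder parts listed above (package compatibility, GPU backends, platform differences).

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_tool(
    name: str, bundled_dir: Path, prefer_system: bool = False
) -> Optional[Path]:
    """Pick an executable: the system copy if opted in and present, else the bundled one."""
    if prefer_system:
        found = shutil.which(name)  # searches PATH; returns None if absent
        if found is not None:
            return Path(found)
    # Hypothetical app-managed location. On Windows the file name would
    # need an ".exe" suffix, which this sketch does not handle.
    bundled = bundled_dir / name
    if bundled.is_file():
        return bundled
    return None  # caller decides whether to download or to fail
```

Defaulting `prefer_system` to `False` keeps the current "one binary that manages everything" behaviour for regular users, while advanced users can switch it on.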
That's not a very reasonable justification, considering that dynamic linking of dependencies has been industry standard in software designed for "non-technical users" for the past thirty years or so, and is basically a solved problem.
I can understand having a downloadable archive that already includes things like FFMpeg and Python for Windows users (with everything already included in appropriate locations, so no runtime downloads necessary).
But this is an especially bad practice for Linux, since most of the vendored dependencies are already installed by default on pretty much every Linux distro, and package managers are designed to sort out and install appropriate dependencies on behalf of users, so that the "non-technical" among them aren't exposed to the massive risks of having application software retrieve and execute arbitrary binaries from the internet.
The only thing it somewhat makes sense for would be the AI models it's retrieving, but even that ought to be implemented via a separate download/update script and not just baked into the main application runtime without even prompting the user that it's about to download a huge dataset.
> A PR for that would be welcome, as long as it also handles the real complexity involved: platform differences, Python package compatibility, GPU backends, and missing system/compiler flags.
These are the sort of things that config scripts at build time are designed to handle. It's already using Cargo here, which should be able to handle all of this just fine, so it's very perplexing to see that it isn't being used for this purpose, and what should be build-time dependency resolution is instead being palmed off to the application itself at runtime. That is an extremely strange -- and potentially dangerous -- approach.
So it has, and I've been hating the excess complexity it brings for most of that thirty years! I'm glad to see the recent swing back toward self-contained executables. Where this author went wrong was not in vendoring the app's exact dependencies, which is a good idea, but in trying to download them and install them separately on first launch, rather than including them in the app bundle, where they can remain isolated from the rest of the system.
It reduces complexity compared to the administrative and security mess of every application having its own version of every library, let alone its own version of external tools and interpreters.
> I'm glad to see the recent swing back toward self-contained executables.
I wish there was one. Static linking is a great solution for this. Instead, we're seeing dynamically linked libraries being bundled alongside executables in a way that vastly increases complexity.
> Where this author went wrong was not in vendoring the app's exact dependencies, which is a good idea, but in trying to download them and install them separately on first launch, rather than including them in the app bundle, where they can remain isolated from the rest of the system.
Agreed. Dependency resolution at build time is normal. Dependency resolution in user mode at runtime is crazy.
"Normal" users wouldn't even encounter anything here, as they'd just install prebuilt binary packages with all of the dependencies already sorted out. As things stand, the application trying to install its own dependencies at runtime is creating a whole new class of user-facing issues to generate escalations (such as the app's failure to locate the Python interpreter it itself had just installed).
There is more risk in the shenanigans people who package software for distros do. Kdenlive suffered from big damage to their reputation due to all of the crashes packagers added by using incorrect versions of dependencies.
>"Normal" users wouldn't even encounter anything here, as they'd just install prebuilt binary packages with all of the dependencies already sorted out.
That's the benefit of just shipping what the developer released instead of swapping out dependencies under developers' feet.
thanks for your feedback and reports, I'd be happy if they are added as issues on github.
as said in the separate comment, I really wanted an app to be as "grandma-proof" as possible, therefore I really wanted to have one binary that does the magic for you. it's a karaoke app, not a tool that is aimed at engineers.
we can indeed look at the local packages before downloading an executable, it's just not done yet but might be added in the future.
I've built this project out of passion and it's 100% open-source and free, so please keep this in mind when criticizing.
Probably the best way to do that is to design, build, and distribute it like any other normal desktop application, and not come up with idiosyncratic and experimental methods for invoking bog-standard libraries and language interpreters.
On Windows, just include the necessary binaries as part of the application distribution itself, in hardcoded paths, without any runtime download of executables from unclear sources.
On Linux, use system defaults resolved at build time through a normal config script -- any "grandma users" on Linux will end up installing from distro repos, AppImage, Flatpak, etc, all of which have their own methods for handling dependencies. That is definitely not something the application should be trying to do by itself post-install.
I'm not experienced in building desktop apps per se, so I went with the thing that looked reasonable to me. all your comments are valid tho. I'll take a look at how I can resolve this in the future.
cheers!
If someone on here insinuated that a flaw in my software was the result of me having "no idea" about what I am doing, we would not be having a civil discussion.
In my view personal attacks should be flagged, but I don't have that ability because my account does not have enough Karma.
If you can't tell AI slop from handwritten code, that's your problem. I won't censor myself because of your opinions.
Personal attacks are still against the rules of this site, and that's why you, or in this case the commenter before you, should have censored themselves. This is not a matter of opinions.
Besides, even before LLMs, it's not like anyone ever said "you shouldn't have open sourced this, we can't learn from your code". We just didn't bother reading that code.
Telling off people who contribute is not OK.
It's a useful tool and I built it myself, with my own ten fingers, using my brain. That's more than vibe coders will ever do.
Meanwhile, your blog says in big text "I don't care for the joy of programming", so I don't consider your opinions on software development anywhere near relevant.
always assumes internet is connected
always assumes everything is trusted