SANA-WM, a 2.6B open-source world model for 1-minute 720p video
30 points by mjgil 2 hours ago | 12 comments
mejutoco 22 minutes ago
They all look like video games. I guess Unreal Engine is used to create synthetic data for training.
pferdone 41 minutes ago
First video with the guy walking the mountain in snow has consistency issues with the cave entrance. Which is "expected" at this model size?!
Leonard_of_Q 22 minutes ago
Most videos seem to have some issues like that, e.g. the book on the table in the library video takes on different shapes every now and then.
The 'Refiner' effect seems to do the opposite, if the examples are representative: in all cases the first-stage images look better than the 'refined' ones. Less clutter, more realistic, less 'cowbell' for those who know the phrase.
Fischgericht 57 minutes ago
So, where is the download? I can't find it on GitHub, and on your web page the download button is disabled.
Also, will this run on an RTX 4090 with 24GB memory?
Thank you!
bobkb 54 minutes ago
The trouble is the lack of training data available to these models compared to ones like Seedance and Kling, which seem to be tapping into their unlimited video inventory. Many models like LTX are technically good, but they struggle with slightly different camera movements or with the subject interacting with objects. For a recent example, we had to use sample videos generated by closed-source models and then use the same for the final video.
vessenes 48 minutes ago
I tend to think of these NV Labs models as architectural demos and ‘free razor blades’ — they’re more intended to inform internal R&D, get customers something that lets them do what they want quickly, and enhance the state of the art.
In this case, what looks interesting is the one-minute coherence and the massive speedup: they claim 36x over open models with similar capabilities. You can tell they aren't aiming for state-of-the-art visuals; the output quality looks very SD 1.5.
jaspanglia 2 hours ago
The most exciting part is that it’s open-source — innovation is going to compound fast.