SANA-WM, a 2.6B open-source world model for 1-minute 720p video
30 points by mjgil 2 hours ago | 12 comments

mejutoco 22 minutes ago
They all look like video games. I guess Unreal Engine is used to create synthetic data for training.
pferdone 41 minutes ago
The first video, with the guy walking on the mountain in the snow, has consistency issues with the cave entrance. Which is "expected" at this model size?!
Leonard_of_Q 22 minutes ago
Most videos seem to have some issues like that, e.g. the book on the table in the library video takes on different shapes every now and then.

The 'Refiner' seems to do the opposite of refining, if the examples are representative: in every case the first-stage images look better than the 'refined' ones. Less clutter, more realistic, less 'cowbell', for those who know the phrase.

Fischgericht 57 minutes ago
So, where is the download? I can't find it on GitHub, and on your web page the download button is disabled.

Also, will this run on an RTX 4090 with 24 GB of memory?

Thank you!

mjgil 54 minutes ago
Scroll down and there are more videos. It seems like the models will be there "soon".
bobkb 54 minutes ago
The trouble is the lack of training data available to these models compared to ones like Seedance and Kling, which seem to be tapping into their unlimited video inventory. Many models like LTX are technically good, but they struggle with slightly different camera movements or with the subject interacting with objects. In a recent example, we had to use sample videos generated by closed-source models and then use those for the final video.
vessenes 48 minutes ago
I tend to think of these NV Labs models as architectural demos and ‘free razor blades’ — they’re more intended to inform internal R&D, get customers something that lets them do what they want quickly, and enhance the state of the art.

In this case, what looks interesting is the one-minute coherence and the massive speedup: they claim 36x over open models with similar capabilities. You can tell they aren’t aiming for state-of-the-art visuals; the output quality looks very SD 1.5.

jaspanglia 2 hours ago
The most exciting part is that it’s open-source — innovation is going to compound fast.
rvz 2 hours ago
Given that is where everything is going, why not just get there faster by open-sourcing Seedance 2.0, Happyhorse, Veo 3 and all the others?
mjgil 2 hours ago
[flagged]
pferdone 43 minutes ago
Who wrote your comment?
semiquaver 42 minutes ago
Stop posting slop.
mjgil 38 minutes ago
Fewer security issues with slop.