I also wonder how many hours were wasted by people who had to use inferior technology because Disney kept it secret. Cutting out animals and objects from the background one frame at a time seems so mind-numbingly boring.
The splitter would have to be behind the lens, so it would require a custom camera setup (probably a longer lens-to-sensor distance than most lenses are designed for too), but I can't think of any other issues.
The Disney process had the filter essentially built into the beam splitter, but afaik, nobody knows how to make that happen again (or nobody who knows how, knows it's a desirable thing). Seems like the optics might be cumbersome, but the results seem worthwhile.
Also, you still need careful lighting: you don't want your foreground illuminated by sodium vapor. But I wonder if you could light the background screen from behind (like a rear-projection setup) to reduce the amount of sodium vapor light that reflects from the foreground to the camera.
https://accucoatinc.com/technical-notes/beamsplitter-coating...
I have no idea about that specific company; I just picked it after a search for "beam splitter".
After I saw that video on the sodium vapor illumination process, I was curious whether you could instead use near-IR light as the mask illumination. In theory you would have a perfect mask (as in the Disney process) and no color interference. I found that frequency-gated beam splitters are a fairly common scientific instrument.
As for the IR idea, I wonder if there's something like a crowdfunding/crowdsourcing site for ideas where the person who had the idea doesn't really want to do it, but leaves it open to others to try. You said you "don't really have the budget to try it out", but let's say even if you had the money, it wouldn't be a priority for you, as you're not an expert or you have better things to do or whatever. Is there a place to just shout ideas into and see if any market-oriented entity would take it upon themselves to try doing it? Besides forums full of ideas like "tinder but for X" and such crap? Because, imagine if your idea really is a great one. A couple hours from now it would be buried in HN.
I suspect the real reason is that digital green screen in the hands of experienced people is "good enough" vs the complication of needing a double camera and beam splitting prism rig and such.
I could also imagine using polarized light as the backdrop as well.
It's a transformer with a CNN refiner after it. Specifically, a ViT using the Hiera architecture (https://github.com/facebookresearch/hiera)
The Hiera ViT has dual decoder heads, one for the alpha and one for the RGB foreground, plus a small CNN refiner network to fix some artifacting in the output from the Hiera model.
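Roughly, that dual-head layout might look like the sketch below in PyTorch. To be clear, this is a hypothetical illustration, not Corridor's actual code: the layer shapes and names are invented, and it assumes the backbone returns full-resolution feature maps.

    import torch
    import torch.nn as nn

    class DualHeadMatting(nn.Module):
        # Hypothetical sketch: shared backbone (stand-in for a Hiera ViT),
        # one head for the alpha matte, one for the RGB foreground, and a
        # small CNN refiner to clean residual artifacts.
        def __init__(self, backbone: nn.Module, feat_ch: int = 256):
            super().__init__()
            self.backbone = backbone                    # assumed to output (B, feat_ch, H, W)
            self.alpha_head = nn.Conv2d(feat_ch, 1, 1)  # 1-channel alpha matte
            self.fg_head = nn.Conv2d(feat_ch, 3, 1)     # 3-channel RGB foreground
            self.refiner = nn.Sequential(               # small CNN cleanup pass
                nn.Conv2d(4 + 3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 4, 3, padding=1),
            )

        def forward(self, x):
            feats = self.backbone(x)
            alpha = torch.sigmoid(self.alpha_head(feats))  # alpha in [0, 1]
            fg = self.fg_head(feats)
            coarse = torch.cat([fg, alpha], dim=1)         # (B, 4, H, W): RGBA
            refined = self.refiner(torch.cat([coarse, x], dim=1))
            return coarse + refined                        # residual refinement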
I'd be very interested to see a long form tech talk of Niko explaining his process of learning ML ropes and building this model.
It saddens me that we're wasting so much of that potential on those stupid stochastic parrots that solve all those non-problems that no one has ever had. It saddens me even more that so many people are absolutely sure that LLMs are "smart", or that they can "think", or even that they're somewhat conscious. And that even if they're not quite that, one more order of magnitude of scale will definitely give us an AGI. Oh that didn't help? Then one more, that will definitely be it.
One real problem that LLMs have solved is that they made natural language processing as a discipline obsolete. They also usually don't suck at summarizing long texts, except when they sometimes do. But that's it, really.
An alternative approach (such as the sodium vapor lighting used on Mary Poppins) is to create two images per frame -- the core image and a mask. The mask is a black and white image where the white pixels are the pixels to keep and the black pixels the ones to discard. Shades of gray indicate blended pixels.
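In code, applying that mask is just the standard "over" blend. A minimal numpy sketch (the array names are mine), assuming float images in [0, 1]:

    import numpy as np

    def composite(fg, bg, alpha):
        # White (1.0) keeps the foreground pixel, black (0.0) keeps the
        # background, and gray blends the two proportionally.
        alpha = alpha[..., None]  # (H, W) -> (H, W, 1) so it broadcasts over RGB
        return alpha * fg + (1.0 - alpha) * bg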
For the mask approach you are filming a perfect alpha channel to apply to the footage that doesn't have the issues of greenscreen. The problem is that this requires specialist, licensed equipment and perfect filming conditions.
The new approach is to take advantage of image/video models to train a model that can produce the alpha channel mask for a given frame (and thus an entire recording) when just given greenscreen footage.
The use of CGI in the training data allows the input image and mask to be perfect without having to spend hundreds of hours creating that data. It's also easier to modify and create variations to test different cases such as reflective or soft edges.
Thus, you have the greenscreen input footage, the expected processed output and alpha channel mask. You can then apply traditional neural net training techniques on the data using the expected image/alpha channel as the target. For example, you can compute the difference on each of the alpha channel output neurons from the expected result, then apply backpropagation to compute the differences through the neural network, and then nudge the neuron weights in the computed gradient direction. Repeat that process across a distribution of the test images over multiple passes until the network no longer changes significantly between passes.
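A minimal sketch of that loop in PyTorch -- the model, the L1 loss, and the optimizer choice here are illustrative assumptions, not what was actually used:

    import torch
    import torch.nn.functional as F

    def train(model, loader, epochs=10, lr=1e-4):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):                     # multiple passes over the data
            for frame, alpha_gt in loader:          # greenscreen input + CGI-rendered alpha
                alpha_pred = model(frame)
                loss = F.l1_loss(alpha_pred, alpha_gt)  # per-pixel difference from target
                opt.zero_grad()
                loss.backward()                     # backpropagate the differences
                opt.step()                          # nudge weights along the gradient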
But now that the problem is solved, a director will come along and say... I want a scene with a big glass of water, and the camera will zoom in on it and see the monster refracted through the glass.
That distortion to the new background would have to be added in by the artist.
They train their model in a pretty straightforward way, and it could also be used to capture the distortion: just use a non-monochrome (possibly moving) background optimized for this. It's a matter of effort and attention to detail during training (uneven green screen lighting, reflections, etc.), not fundamental impossibility.
Since they added it a year or so ago, it has been game-changing. I'm cutting out portraits every day, and having a magical tool that cuts out the subject, with perfect hair edges, in a single click is sci-fi.
Here's a demo of Photoshop's tool:
https://www.youtube.com/watch?v=SNVJN6PKeGQ
(the other magical Photoshop tool is the one that removes reflections from windows, which is even more insane when you reverse it and tell it you only want the reflection and not what's on the other side of the glass)
Still Python unfortunately.
https://en.wikipedia.org/wiki/ZCam (Demo: https://www.youtube.com/watch?v=s7Kcmx29RCE )
The problem is that the vast majority of people on set have no clue what is going on in post. To the point that, when the budget is big enough, a post supervisor is present on production days to give input so that "fixing it in post" is minimized. When there is no budget, you'll see situations just like in the first 30 seconds of TFA's video: a single lamp lighting the background, so you can easily see the light falling off and the shadows from wrinkles where the screen was just pulled out of the bag 10 minutes before shooting. People just don't realize how much light a green screen takes. They also fail to have enough space to pull the talent far enough off the wall to avoid the green reflecting back onto the talent's skin.
TL;DR They solved something to make post less expensive because they cut corners during production.
FWIW, having watched the entire thing, they never blamed bad production staff or unavoidable constraints. Those are things that anyone working with others experiences when making anything, whether it's YouTube videos or enterprise software products. My TLDR is: "Chroma keying is a fragile and imperfect art at best, and can become a clusterf#@k for any number of reasons. CorridorKey can automatically create world-class chroma keys even for some of the most traditionally challenging scenarios."
From their own 'LLM handover' doc: https://github.com/nikopueringer/CorridorKey/blob/main/docs/...
> Be Proactive: The user is highly technical (a VFX professional/coder). Skip basic tutorials and dive straight into advanced implementation, but be sure to document math thoroughly.
You don't hear architects get hounded for saying they "built" some building, even though it was definitely the guys swinging hammers who built it. Yet somehow, because he didn't artisanally hand-craft the code, he needs to caveat that he didn't actually build it?
theory: make the mask out of non-visible light
Illuminate the backing screen with near-infrared light. (After a bit of thought I chose near-IR, as opposed to near-UV, for hopefully obvious reasons.)
Point two cameras at a splitting prism with a near-IR pass filter (I have confirmed that such a thing exists and is commercially available).
Leave the 90-degree (unaltered path) camera untouched; this is the visible camera.
Remove the IR filter from the 180-degree (filter path) camera; this is the mask camera.
Now you get a perfect, non-color-shifting mask (in theory). The splitting prism would hurt light intake, though. It might be worth trying to put the cameras really close together, pointed in the same direction with no prism, and seeing if that is close enough.
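If that setup worked, turning the IR camera's frame into a matte would just be a normalization step. A rough numpy sketch, assuming an aligned 8-bit IR frame where the backing screen reads bright and the subject dark (the thresholds are made up):

    import numpy as np

    def ir_to_matte(ir_frame, lo=40, hi=200):
        # Bright IR = backing screen = transparent (0); dark IR = subject =
        # opaque (1). The soft ramp between lo and hi keeps partially covered
        # pixels (hair, motion blur) as gray blends instead of hard edges.
        ir = ir_frame.astype(np.float32)
        screen = np.clip((ir - lo) / (hi - lo), 0.0, 1.0)
        return 1.0 - screen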
The sodium vapor light process was the best tech available in the 1950s; sodium vapor lights were used because they deliver a very pure single-wavelength light. But we can do better now. LEDs natively emit at a single wavelength (we have to put a lot of engineering into making them not do this), and we have cameras that can see frequencies the eye cannot. Put this together and, in theory, you can do the single-frequency-illuminated backing sheet mask (green screen) with a frequency that is not visible to the human eye and therefore does not interfere with any of the colors in the final shot.
Camera sensors can pick up a little near-IR, so they have a filter to block it. If that filter were removed and a filter blocking visible light put in its place, you would have a camera that can only see non-visible light. Poorly, since the camera was not engineered to operate in this band, but it might be good enough for a mask. A mask that does not interfere with any visible colors.
At least for cheap sensors in phones and security cameras that engineering consists of installing an IR filter. They pick it up just fine but we often don't want them to.
Keep in mind that sensors are inherently monochrome. They use multiple input pixels per output pixel with various filters in order to determine information about color.
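For example, with an RGGB Bayer mosaic, a crude demosaic just collapses each 2x2 cell into one RGB pixel. A toy numpy sketch (real demosaicing interpolates instead of decimating):

    import numpy as np

    def naive_demosaic_rggb(raw):
        # raw: (2H, 2W) monochrome sensor values under an RGGB color filter.
        # Each 2x2 cell holds one R, two G, and one B sample.
        raw = raw.astype(np.float32)
        r = raw[0::2, 0::2]
        g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0  # average the two greens
        b = raw[1::2, 1::2]
        return np.stack([r, g, b], axis=-1)            # (H, W, 3) RGB image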
The sensitivity to red light decreases quickly at wavelengths greater than 650 nm, but light can still be perceived if it is strong enough, up to around 780 nm.
Many so-called near-IR LEDs may actually be somewhere around 750 nm, so they are still visible on a dark background, even if they are perceived as extremely dim.
On the other hand, there are many near infrared LEDs around 900 nm and those are really invisible. Near-infrared LEDs around 1300 nm or around 1550 nm are also completely invisible.
An invisible near-infrared laser beam could become visible due to two-photon absorption, but if a beam intense enough to cause two-photon absorption hits your retina, there are more serious things to worry about.
Shoot the scene in 48 or 96 fps. Sync the set lighting to odd frames. Every odd frame, the set lights are on. Every even frame, set lights are off.
For the backing screen, do the reverse. Even frames, the backing screen is on. Odd frames, backing screen is off.
There you go: mask / normal shot / mask / normal shot / mask... you get the idea.
Of course, motion will cause the normal image and mask to go out of sync, but I bet that can be remedied by interpolating a new frame between every mask frame. Plus, when you mix it down to 24fps you can introduce as much motion blur and shutter angle "emulation" as you want.
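Demuxing such a capture would be trivial. A sketch with OpenCV, assuming the very first frame is a lit "normal" frame and the two streams strictly alternate:

    import cv2

    def demux(path):
        # Split a 48 fps alternating capture into a 24 fps image stream
        # and a 24 fps mask stream.
        cap = cv2.VideoCapture(path)
        images, masks = [], []
        i = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            (images if i % 2 == 0 else masks).append(frame)
            i += 1
        cap.release()
        return images, masks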
- It'll bleed on fast motion. Hair in the wind would just not work.
- Incandescent lights are out.
You could solve both by having the two ghost frames shot very close to the real frame (no need to evenly space the frames, after all) and strobing a high-powered laser.
You'd need a very fast sensor, or a second one in optically the same position.
In any case, if you actually have a scene bright for 1/24th of a second and then dark for 1/24th of a second, repeating, you're well within photosensitive epilepsy range. Don't do that to your actors unless you've discussed it with them and with your insurance company first.
( https://www.nukepedia.com/tools/gizmos/time/vectorframeblend... )
Incandescent lights flicker at twice your AC power frequency -- to a decent approximation their power is proportional to V^2, and sin^2 oscillates at twice the base frequency. But this is input power -- the cooling of the filament is slowish and the modulation depth is low. Most people aren't bothered by this.
Fluorescent lights with old or very crappy "magnetic" ballasts flicker at twice the mains frequency, with deep modulation. The effect on people varies from moderate to extremely unpleasant, and it's extra bad if anything is moving quickly (gyms, etc). There are even studies showing that office workers perform worse under such lighting even if they don't experience personally perceptible symptoms. The effect is so severe that people invented the "electronic ballast", which flickers at much, much higher frequency and avoids low-frequency components. Phew. (The light might still be a nasty color, but the temporal output is okay.)
"Driverless LEDs" are deeply modulated at twice the mains frequency. These are very nasty.
If you actually have a light that flickers at the AC power frequency (certain LED sources in a two-brightness diode-dimmed kitchen appliance fixture will do this, as will driverless LEDs with certain types of failures), then it's extra nasty.
There are plenty of people around who find (depending on the actual waveform) 60Hz flicker intolerable and 120Hz flicker extremely unpleasant. And there are plenty of people who can often perceive flicker under appropriate circumstances up to at least several hundred Hz and even into the low kHz with certain shapes of light sources. You can read up on IEEE 1789 to find a standard based on actual research on what lighting waveforms should look like.
The effect of 120 Hz flicker is bad enough that energy codes in some places (e.g. California) have started to require that LED sources minimize this flicker, but, sadly, it's poorly enforced.
IIRC, the end that's negative looks orange, because the electrons emitted from the filament haven't gotten up to speed yet and can't excite the mercury atoms at that end to the highest states.
If you didn't do this, you'd see 60 Hz strobing when you looked at one end.
Anyway, an old HN submission I still use when buying light bulbs: https://news.ycombinator.com/item?id=14023196
Artifacts?
> I bet that can be remedied by interpolating a new frame between every mask frame. Plus, when you mix it down to 24fps you can introduce as much motion blur and shutter angle "emulation" as you want.
Motion blur can also be very forgiving. You are more likely to notice artifacts in still or slow-moving scenes, and that's exactly where interpolation is easy, so the problem goes away.
There were a large number of lights around it and each one was blinked on for an instant while the camera shot at an insanely high frame rate - something like 288 frames per second with twelve lights.
This meant that after the fact you could pick any one of the twelve frames for that 1/24th of a second, to choose the angle the light was hitting at.
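Since 288 = 24 x 12, each 1/24 s window contains exactly one sub-frame per light, so choosing a lighting angle after the fact is just a strided slice. A toy sketch (the names are mine):

    def pick_lit_frames(high_speed_frames, light_index, lights=12):
        # high_speed_frames: frames captured at 24 * lights fps, where sub-frame
        # k of each group of `lights` was lit by light k. Returns the 24 fps
        # sequence lit from the chosen angle.
        return high_speed_frames[light_index::lights]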