12 comments

  • Springtime 2 hours ago
    In an earlier video they made a couple of years back about Disney's sodium vapor technique, Paul Debevec suggested he was considering creating a dataset using a similar premise: filming enough perfectly masked references to be able to train models to achieve better keying. So it was interesting to see Corridor tackle this by instead using synthetic data.
    • somat 2 hours ago
      With regards to the sodium vapor process, an idea has been percolating in the back of my head ever since I saw that video. But I don't really have the budget to try it out.

      Theory: make the mask out of non-visible light.

      Illuminate the backing screen in near-infrared light. (After a bit of thought I chose near-IR as opposed to near-UV, for hopefully obvious reasons.)

      Point two cameras at a splitting prism with a near-IR pass filter (I have confirmed that such a thing exists and is commercially available).

      Leave the 90-degree (unaltered-path) camera untouched; this is the visible camera.

      Remove the IR filter from the 180-degree (filter-path) camera; this is the mask camera.

      Now you get a perfect non-color-shifting mask (in theory). The splitting prism would hurt light intake, so it might be worth trying to put the cameras really close together, pointed in the same direction with no prism, and see if that is close enough.
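
      In software terms the mask camera's frame should already be most of the way to a matte; something like this might be all the processing needed (the file name and the simple normalization here are just placeholders, assuming the two cameras are pixel-aligned):

          import cv2

          # Frame from the mask (near-IR) camera. The IR-lit backing screen reads
          # bright and the foreground, which blocks it, reads dark, so the matte is
          # just the inverted, normalized IR frame.
          ir = cv2.imread("ir_frame.png", cv2.IMREAD_GRAYSCALE)
          matte = 255 - cv2.normalize(ir, None, 0, 255, cv2.NORM_MINMAX)
          cv2.imwrite("matte.png", matte)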

      • overvale 2 hours ago
        Debevec tried a version of this: https://arxiv.org/abs/2306.13702
      • wiml 1 hour ago
        This approach was used in the 1950s/60s with ultraviolet light (rather than IR) to create a traveling matte. I'm not sure why visible-light techniques won out. Easier to make sure that the illumination is set up correctly, maybe?
      • actionfromafar 2 hours ago
        I'll do you one better: something that requires no special cameras (most have IR filters), no double cameras, and no prisms.

        Shoot the scene in 48 or 96 fps. Sync the set lighting to odd frames. Every odd frame, the set lights are on. Every even frame, set lights are off.

        For the backing screen, do the reverse. Even frames, the backing screen is on. Odd frames, backing screen is off.

        There you go: mask / normal shot / mask / normal shot / mask ... you get the idea.

        Of course, motion will cause the normal image and mask to go out of sync, but I bet that can be remedied by interpolating a new frame between every pair of mask frames. Plus, when you mix it down to 24 fps you can introduce as much motion blur and shutter-angle "emulation" as you want.
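
        A toy sketch of the de-interleave step (the frame-numbering convention and the crude averaging here are assumptions, just to show the shape of the idea):

            import numpy as np

            def split_and_align(frames):
                """frames: list of HxWx3 uint8 arrays shot at 48 fps with the lighting
                alternating. Even frames = backing screen lit (mask), odd frames = set lit."""
                masks = frames[0::2]
                images = frames[1::2]
                pairs = []
                for i, img in enumerate(images):
                    before = masks[i]
                    after = masks[i + 1] if i + 1 < len(masks) else masks[i]
                    # Crude temporal interpolation: average the mask frames on either side
                    # of the normal shot; real footage would want optical-flow interpolation.
                    mask = ((before.astype(np.float32) + after.astype(np.float32)) / 2).astype(np.uint8)
                    pairs.append((img, mask))
                return pairs  # 24 fps (normal shot, mask) pairs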

        • ryandamm 1 hour ago
          This is called “ghost frame” and already exists in Red cameras and virtual production wall tools like Disguise.
        • amluto 1 hour ago
          Surely this makes your actors feel sick? And wouldn’t it make your motion blur look dashed and also cause artifacts at the edge of the mask if there’s a lot of motion?
          • throwway120385 1 hour ago
            You could strobe at some multiple of the sensor frame rate as long as your strobes are continuous through the integration period of the sensor and the lighting fades very quickly. This probably wouldn't work with incandescents but people strobe LEDs a lot to boost the instantaneous illumination without going past the continuous power rating in the datasheet.
          • kibibu 1 hour ago
            Incandescent and fluorescent lights already flicker at (twice) your AC power frequency; you'd just have to strobe higher than that.
      • diacritical 2 hours ago
        Don't humans and other warm objects also radiate IR?
        • somat 2 hours ago
          That is far-IR, thermal stuff. Near-IR, 700-nanometer-ish, is right below red in human vision.

          Camera sensors can pick up a little near-IR, so they have a filter to block it. If that filter were removed and a filter to block visible light used in its place, you would have a camera that can only see non-visible light. Poorly, the camera was not engineered to operate in this bandwidth, but it might be good enough for a mask. A mask that does not interfere with any visible colors.

          • fc417fc802 44 minutes ago
            > Poorly, the camera was not engineered to operate in this bandwidth

            At least for cheap sensors in phones and security cameras that engineering consists of installing an IR filter. They pick it up just fine but we often don't want them to.

            Keep in mind that sensors are inherently monochrome. They use multiple input pixels per output pixel with various filters in order to determine information about color.

          • throwway120385 1 hour ago
            You can actually dimly perceive near-IR LEDs -- they'll glow slightly red in darkness.
            • diacritical 51 minutes ago
              I remember reading that some people can perceive some near-IR, but mostly that near-IR LEDs actually leak some red themselves due to imperfections in manufacturing or something?
  • dylan604 1 hour ago
    The sad thing about this is the problems encountered in post because the production team said "fix it in post" during the shoot. I've been on set for green-screen shoots where the lighting was not done properly. I watched the gaffer walk across the set taking readings from his meter before saying the lighting was good. I flipped on the waveform and told him it was not even (which never goes down well when the camera department tells the gaffer it's not right). He put up an argument, went back, and took measurements again before repeating that it was good. I flipped the screen around and showed him where it was obviously not even. After a third set of meter readings he started adjusting the lights. Once the footage was in post, the FX team commented on how easy the keys were because of the even lighting.

    The problem is that the vast majority of people on set have no clue what is going on in post. To the point that, when the budget is big enough, a post supervisor is present on production days to give input so that "fixing it in post" is minimized. When there is no budget, you'll see situations just like the first 30 seconds of TFA's video: a single lamp lighting the background, so you can easily see the light falling off and the shadows from wrinkles where the screen was pulled out of the bag 10 minutes before shooting. People just don't realize how much light a green screen takes. They also fail to have enough space to pull the talent far enough off the wall to avoid the green reflecting back onto the talent's skin.

    TL;DR They solved something to make post less expensive because they cut corners during production.

    • weinzierl 48 minutes ago
      I fully agree, but I think that for them, making it possible to cut corners during production is the whole point. Think about it: the choice is between five minutes of work plus a one-time purchase of a decent GPU, and a big room with a complex lighting setup and a post supervisor present. Now, the quality of the end result will not be the same, for sure. You and I would opt for the quality setup whenever we can, but many others won't.
      • dylan604 24 minutes ago
        If you're on such a low production budget that you just physically do not have the lamps to light a screen, then you really have to ask if green screen is the right option. Maybe flip it and shoot black limbo so you do not need lights, and the lights you do have can be better used as key lights for separation. You also don't have to worry about the color cast from your lit screen. Essentially, you just need a garbage matte for the key, and then clean up whatever might be getting keyed that you don't actually need. Separating a foreground subject from the background is so capable now that a screen isn't necessary, and matte cleanup is pretty much unnecessary. Of course you lose the street cred of being able to say you used green screen, but who cares as long as the shot works out.
    • CharlesW 43 minutes ago
      > TL;DR They solved something to make post less expensive because they cut corners during production.

      FWIW, having watched the entire thing, they never blamed bad production staff or unavoidable constraints. Those are things that anyone working with others experiences when making anything, whether it's YouTube videos or enterprise software products. My TL;DR is: "Chroma keying is a fragile and imperfect art at best, and can become a clusterf#@k for any number of reasons. CorridorKey can automatically create world-class chroma keys even for some of the most traditionally challenging scenarios."

    • bbstats 18 minutes ago
      ... did you watch the video?
  • amelius 17 minutes ago
    Is it a coincidence that the result is stable between subsequent frames?
  • diacritical 2 hours ago
    From ~04:10 to 05:00 they talk about sodium-vapor lights and how Disney has the exclusive rights to use the technique. From what I read, the knowledge of how to make them is a trade secret, so it's not patented. Seems weird that it would be hard to recreate something from the 1950s.

    I also wonder how many hours were wasted by people who had to use inferior technology because Disney kept it secret. Cutting out animals and objects from the background one frame at a time seems so mind-numbingly boring.

    • meatmanek 1 hour ago
      The lights are relatively easy to get. IIRC (it's been a while since I watched their full video on the subject[1]) the hard part to find was the splitter that sends the sodium-vapor light to one camera and everything else to another camera.

      1. https://www.youtube.com/watch?v=UQuIVsNzqDk

      • aidenn0 1 hour ago
        It seems to me that it would be relatively easy to build something like that if you're okay shooting with effectively a full stop less light (just split the image with a half-silvered reflector and use a dichroic filter to pass the sodium-vapor light on one side).

        The splitter would have to be behind the lens, so it would require a custom camera setup (probably a longer lens-to-sensor distance than most lenses are designed for too), but I can't think of any other issues.

      • diacritical 1 hour ago
        Yup, I wanted to say that the prisms are hard to recreate, not the light itself.
    • jasonwatkinspdx 2 hours ago
      Yeah, that's just nonsense. We used monochromatic sodium-vapor bulbs in my high school physics class to duplicate the double-slit experiment.

      I suspect the real reason is that digital green screen in the hands of experienced people is "good enough" vs the complication of needing a double camera and beam splitting prism rig and such.

  • vsviridov 2 hours ago
    The community has managed to drastically lower hardware requirements, but so far I think only Nvidia cards are supported, so as an AMD owner I'm still missing out :(
  • comex 2 hours ago
    See also this video comparing Corridor Key to traditional keyers:

    https://www.youtube.com/watch?v=abNygtFqYR8

  • amelius 1 hour ago
    There's still a bug: the glass with water does not distort the checker pattern in the background at 24:12.
    • jweir 55 minutes ago
      True, but with visual art there is what is correct and what looks correct. When things are moving and the area is small, no one is going to notice.

      But now that this problem is solved, a director will come along and say... I want a scene with a big glass of water, and the camera will zoom in on it and see the monster refracted through the glass.

      • gmueckl 33 minutes ago
        At that point it's better to do the glass entirely in post.
    • catapart 47 minutes ago
      I wouldn't call it a bug. This is a first step, not a final step. Maintaining the refraction might be more realistic, but it's not necessarily what the creator wants.
    • orbital-decay 55 minutes ago
      Sure, because they used monochrome backgrounds and never really captured any distortion.
    • CharlesW 1 hour ago
      When you watch the video it becomes pretty clear why it wouldn't be able to do that, although it's fun to think about how a future iteration or alternative might be able to credibly (if you don't look too hard) mimic that someday.
    • DrewADesign 52 minutes ago
      You’d have to track it, render it, and comp it in. It’s not ridiculously difficult, but there’s no way that’s going to happen automatically.
      • orbital-decay 48 minutes ago
        >there’s no way that’s going to happen automatically

        They train their model in a pretty straightforward way; it could just as well be used to capture the distortion, using a non-monochrome (possibly moving) background optimized for this. It's a matter of effort and attention to detail during training (uneven green-screen lighting, reflections, etc.), not a fundamental impossibility.

        • amelius 24 minutes ago
          Yes. But the main issue is in the way they formulate the problem. Their output is always a transparency mask, which of course will never handle distortions.
  • superjan 2 hours ago
    Watched this a few days ago. The video is light on technical details, except maybe that they used CGI to generate training data.
    • rhdunn 1 hour ago
      The idea behind a greenscreen is that you can make that green colour transparent in the frames of footage, allowing you to blend them with some other background or other layered footage. This has issues like the colour not always being uniform, difficulty with things like hair, and lighting affecting some edges. These have to be manually cleaned up frame by frame, which takes a lot of time that is mostly busywork.

      An alternative approach (such as that used with the sodium lighting on Mary Poppins) is to create two images per frame -- the core image and a mask. The mask is a black-and-white image where the white pixels are the pixels to keep and the black pixels are the ones to discard. Shades of gray indicate blended pixels.

      With the mask approach you are filming a perfect alpha channel to apply to the footage, which doesn't have the issues of greenscreen. The problem is that this requires specialist, licensed equipment and perfect filming conditions.
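
      Applying that mask is just the standard "over" blend. Roughly, in numpy (array names here are placeholders):

          import numpy as np

          # foreground, background: HxWx3 float arrays; alpha: HxW matte in [0, 1],
          # where 1 = keep the foreground pixel and 0 = show the background pixel.
          def composite(foreground, background, alpha):
              a = alpha[..., np.newaxis]  # broadcast the matte across the RGB channels
              return a * foreground + (1.0 - a) * background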

      The new approach is to take advantage of image/video models to train a model that can produce the alpha channel mask for a given frame (and thus an entire recording) when just given greenscreen footage.

      The use of CGI in the training data allows the input image and mask to be perfect without having to spend hundreds of hours creating that data. It's also easier to modify and create variations to test different cases such as reflective or soft edges.

      Thus, you have the greenscreen input footage, the expected processed output, and the alpha-channel mask. You can then apply traditional neural-net training techniques to the data, using the expected image/alpha channel as the target. For example, you can compute the difference at each of the alpha-channel output neurons from the expected result, apply backpropagation to push those differences back through the network, and nudge the neuron weights in the computed gradient direction. Repeat that process across a distribution of the training images over multiple passes until the network no longer changes significantly between passes.
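
      In PyTorch terms that loop boils down to something like this (the tiny stand-in network and dummy data are illustrative assumptions, not how Corridor actually built theirs):

          import torch
          import torch.nn as nn

          # Stand-in model: any image-to-image network mapping an RGB frame to a
          # single-channel alpha matte would slot in here.
          model = nn.Sequential(
              nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
              nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
          )
          optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

          # Dummy data standing in for the synthetic CGI pairs (greenscreen RGB, perfect alpha).
          dataloader = [(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64)) for _ in range(8)]

          for frame, true_alpha in dataloader:
              pred_alpha = model(frame)                             # network's guess at the matte
              loss = nn.functional.l1_loss(pred_alpha, true_alpha)  # per-pixel difference from the target
              optimizer.zero_grad()
              loss.backward()                                       # backpropagate the error
              optimizer.step()                                      # nudge weights along the gradient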

  • IshKebab 2 hours ago
    Pretty impressive results! Seems like someone has even made a GUI for it: https://github.com/edenaion/EZ-CorridorKey

    Still Python unfortunately.

    • BoredPositron 1 hour ago
      Like 90% of the other tooling in VFX...
      • IshKebab 32 minutes ago
        Is it? That's a shame. I assumed this was Python because of PyTorch.
  • ralusek 2 hours ago
    I'm a software engineer who, like the vast majority of you, uses AI/agents in my workflow every day. That being said, I have to admit that it feels a little weird to hear someone who does not write code say that they built something, without even mentioning that they had an agent build it (unless I missed that).
    • adamtaylor_13 1 hour ago
      This is interesting. I had the exact opposite reaction.

      You don't hear architects get hounded when they say they "built" some building, even though it was definitely the guys swinging hammers who built it. Yet somehow, because he didn't artisanally hand-craft the code, he needs to caveat that he didn't actually build it?

      • plopz 31 minutes ago
        Architects I know say "I designed that", "I worked on that", "I specified that", or "I chose that"; they don't say "I built that".
      • kalaksi 51 minutes ago
        Maybe it's a language thing. Architects saying they built something sounds a bit off to me. In my native language, and in everyday language, I don't think people would use "built" like that. I don't know how architects talk with each other, though.
    • tekacs 1 hour ago
      Worth bearing in mind that people in VFX are often relatively technical.

      From their own 'LLM handover' doc: https://github.com/nikopueringer/CorridorKey/blob/main/docs/...

      > Be Proactive: The user is highly technical (a VFX professional/coder). Skip basic tutorials and dive straight into advanced implementation, but be sure to document math thoroughly.

    • jrm4 1 hour ago
      I mean, the heading of the video says "he solved the problem," which I think it is wise to pay a lot of attention to.
  • Computer0 2 hours ago
    Looking forward to trying it out. 8 GB of VRAM or unified memory required!