Note that the input here is not any arbitrary video, but a radiance field, a type of data structure that's halfway between photos and a 3D model:
"The radiance field representation can be seen as a 5D function that maps any 3D location x and viewing direction d to volume density σ and RGB color c."
The radiance field is typically reconstructed from multiple photos of a scene, which implies that either the subject must be static or you need a fancy camera rig to capture the views simultaneously.
We don't have any content pipelines for creating and editing radiance fields, so movies produced with this approach are a ways off. But one could imagine AR glasses in a couple of years that capture the different views as you move around, then the radiance field is built in the cloud and you can create artistic re-renders of what you just saw.
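The quoted 5D function, plus the volume-rendering step that turns it into pixels, can be sketched in a few lines. This is a toy stand-in, not a learned NeRF: the fuzzy-sphere density and view-dependent color are purely illustrative, and a real model would replace `radiance_field` with an MLP.

```python
import numpy as np

def radiance_field(x, d):
    """Toy stand-in for the 5D function (x, d) -> (sigma, rgb).
    Hypothetical analytic field: a fuzzy sphere at the origin whose
    color varies with viewing direction."""
    sigma = np.exp(-4.0 * np.dot(x, x))   # density falls off with radius
    rgb = 0.5 + 0.5 * d                   # view-dependent color
    return sigma, np.clip(rgb, 0.0, 1.0)

def render_ray(origin, direction, t_near=0.0, t_far=4.0, n_samples=64):
    """Discrete volume rendering along one ray (standard NeRF quadrature):
    accumulate color weighted by alpha and remaining transmittance."""
    ts = np.linspace(t_near, t_far, n_samples)
    delta = ts[1] - ts[0]
    color = np.zeros(3)
    transmittance = 1.0
    for t in ts:
        sigma, rgb = radiance_field(origin + t * direction, direction)
        alpha = 1.0 - np.exp(-sigma * delta)
        color += transmittance * alpha * rgb
        transmittance *= 1.0 - alpha
    return color

pixel = render_ray(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0]))
```

Rendering an image is just this loop over one ray per pixel, which is also why the representation is so expensive compared to rasterizing a mesh.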
> We don't have any content pipelines for creating and editing radiance fields
I presume radiance fields can be generated from computer-graphics scenes? It would be awesome to choose your own style while playing a video game! Real-time generation is unlikely, but game replay videos in a surreal style would be très cool.
Yes, but it's an extremely expensive representation for rendering.
For stylization, I'd imagine you could get close-enough results with a screen space technique if you can get some additional custom render pass buffers from the engine (material properties or whatever is most useful for the specific look). That's feasible in realtime on current hardware.
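One minimal sketch of that screen-space idea, assuming the engine can hand you a depth buffer as one of those custom passes: darken pixels at depth discontinuities to get a drawn separator line between objects at different depths. The function name and threshold are hypothetical, not from any engine API.

```python
import numpy as np

def depth_outlines(image, depth, threshold=0.1):
    """Hypothetical screen-space pass: find strong discontinuities in the
    depth buffer and paint them black, like hand-drawn outlines."""
    dy, dx = np.gradient(depth)
    edge = np.sqrt(dx**2 + dy**2) > threshold  # boolean edge mask
    out = image.copy()
    out[edge] = 0.0                            # darken edge pixels
    return out
```

A real implementation would live in a fragment shader with whatever extra G-buffer channels (normals, material IDs) the target look calls for, but the per-pixel logic is this simple.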
Just yesterday someone asked me if "AI style transfer"* would work for 3D models or video. I said "if the current speed of progress holds, I think we will see this within the year".
And here it is.
"What a time to be alive!"
(*) This isn't style transfer on video; as far as I understand, this is styling the neural radiance field before rendering it to images.
Adaptations of image style transfer specifically for video came out pretty quickly after the image ones, and there have been a few iterations. For the most part they use the same ideas, but add some extra steps to "stabilize" the result of the stylization across time so that it doesn't jump all over the place from frame to frame.
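A crude sketch of one such stabilization step, assuming you already have per-pixel optical flow between the two frames: warp the previous stylized frame into the current one and blend. The nearest-neighbour warp and fixed blend weight are simplifications; the actual methods use learned or occlusion-aware variants of this idea.

```python
import numpy as np

def stabilize(stylized, prev_stylized, flow, blend=0.7):
    """Hypothetical stabilization step: backward-warp the previous stylized
    frame along the optical flow, then blend it with the current one."""
    h, w, _ = stylized.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # backward warp: sample the previous frame at (pixel - flow),
    # nearest-neighbour, clamped to the image bounds
    src_x = np.clip((xs - flow[..., 0]).round().astype(int), 0, w - 1)
    src_y = np.clip((ys - flow[..., 1]).round().astype(int), 0, h - 1)
    warped = prev_stylized[src_y, src_x]
    return blend * warped + (1.0 - blend) * stylized
```

The high weight on the warped frame is what suppresses frame-to-frame flicker, at the cost of some ghosting wherever the flow is wrong.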
For 3D, it's more complicated as there isn't an obvious translation of image-based stylization to the 3D realm - if we are talking 3D mesh models, I've seen similar ideas applied to generating textures as well as geometry.
In this case, the novelty is combining the stylization with the NeRF, which does result in a special type of 3D model (not a mesh, but a volumetric representation that lends itself well to rendering). There isn't really any video aspect to it, other than that videos of the result from different viewpoints are a good way of visualizing it.
In my opinion, this looks like a 2D style transfer, not a true 3D style transfer. In the horse statue example, a human artist would add a separator line between the white box and the grey background, to illustrate the depth disparity. But this approach seems to somewhat ignore depth cues when styling.
Radiance fields really feel like they're taking off. Reminds me of GANs a few years ago, where the number of papers seemed to start growing exponentially one day.
I always pictured eventually Augmented Reality would have artistic overlays, reinterpretations of the world, and this sure does seem like a feasible implementation of that.
@ak92501 posted this one last week. This isn't to say "it's old news" (it isn't!) but that if you're on Twitter and like summaries of ML papers, they're a great follow.
Neat, but it's not transferring the style of Cubism or Impressionism or Ukiyo-e in any meaningful way into 3D expression. It's just slapping on a texture that uses the same colours.
I don't think consumer GPUs can render this in real time yet. I've seen framerates of around 5 to 10 fps on a 3080 just rendering monoscopic output (from memory).
The decoding doesn't use a neural function, but this is actually surprisingly impressive at transmitting the feeling of being "there". Try the demo with the goat hanging out by the camera; I found it surprisingly nice.
This was my impression as well. It seems to have some sort of object permanence when the camera is moved rather than just convolving a single frame; that seems significantly more difficult to me as a layman.
Yep. There has been some work to figure out frame-to-frame coherence on a sequence of images in 2D. But, I think this skips over that problem by working in 3D.