Nvidia replaced video codecs with a neural network
Link information
- Title
- Inventing Virtual Meetings of Tomorrow with NVIDIA AI Research
- Authors
- NVIDIA Developer
- Duration
- 2:27
- Published
- Oct 5 2020
So there is something important they didn't talk about: how much processing power does this require? If I have a meeting with 4 people, my machine now has to run 4 machine learning processes simultaneously. That's a lot of CPU/GPU cycles. Everyone else also has to run those 4 processes, which means the work is being duplicated.
I feel like this would use way more energy than a codec that needs a little more bandwidth, and that wouldn't be good for the environment. I don't want to have to buy a Bitcoin mining rig just to have a meeting with more than 4 people.
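To put the duplication worry in concrete terms, here's a rough sketch (my own back-of-envelope, not anything from the video): if every client reconstructs every other participant's face locally, the total number of simultaneous inference processes across all machines grows quadratically with call size.

```python
# Hypothetical back-of-envelope: N clients, each running one neural
# reconstruction per remote participant. Total inference processes
# across all machines is N * (N - 1), i.e. quadratic in call size.

def inference_processes(n_participants: int) -> int:
    # each of the N clients decodes the N-1 other streams
    return n_participants * (n_participants - 1)

for n in (4, 10, 50):
    print(f"{n:>3} participants -> {inference_processes(n):>5} inference processes")
```

A 4-person call means 12 concurrent model runs across everyone's hardware; a 50-person all-hands would mean 2,450. That's the duplicated-work problem in a nutshell.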
Yeah, at work lately we’ve had video calls with dozens of people for all-hands meetings. This may be cool tech for capturing keyframes, but I think newer video codecs like AV1 etc. are the real, practical solution for improving quality with lower bandwidth once they are more widely available and have hardware support.
I couldn't help but notice that they compared their tech to a 15 year old video codec instead of something modern.
Do any of the current video conferencing solutions use a modern codec? Zoom uses H.264 still, and I think only Google Duo uses AV1 but that's smartphones only.
Based on the NVIDIA MAXINE page, it's fully run in the cloud, so it should have minimal impact on local processing power.
What?
So my machine sends a regular codec-compressed video stream to NVIDIA, which then compresses it with "AI" and sends it out to everyone else? So people with crappy internet will still be limited by their upload speed. That kind of defeats the usefulness of this product.
The video is from the perspective of someone receiving video with bad bandwidth so the benefits are on their end. It's great for people with bad bandwidth but there's only so much that can be done when taking information from someone with bad bandwidth.
Nvidia is leading the charge in upscaling, though, considering how well DLSS and the Nvidia Shield have done there, so perhaps the person with good bandwidth could get a better-quality upscaled feed from this same solution down the line.
So, potentially important distinction here: typically the cost in ML comes from training the model (which is done on a cluster of very expensive GPUs, e.g., your A100s of the world), not from feeding input through an already-trained model. Once you have a trained model, it is significantly less costly to run inference through it, in this case feeding keypoints of the face through a GAN to get the generated image. So: put away the Bitcoin rig for the moment.
That said, it would still be interesting to compare the energy cost of running the GAN versus doing video compression with H.264 (which, BTW, is fairly compute-intensive). But I wouldn't expect it to be orders of magnitude of difference.
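The training-vs-inference gap can be illustrated with some made-up but plausible numbers (all constants below are assumptions for the sake of arithmetic, not measured figures for any NVIDIA model): training runs a forward and backward pass over millions of samples for many epochs, while inference is a single forward pass per generated frame.

```python
# Illustrative FLOP accounting. Every constant here is an assumption
# chosen for round numbers, not a measurement of any real model.

FORWARD_FLOPS = 1e9    # assumed cost of one forward pass
BACKWARD_MULT = 2      # rule of thumb: backward pass costs ~2x forward
DATASET = 1_000_000    # assumed number of training samples
EPOCHS = 10            # assumed training epochs

training_flops = FORWARD_FLOPS * (1 + BACKWARD_MULT) * DATASET * EPOCHS
inference_flops = FORWARD_FLOPS  # one forward pass per generated frame

ratio = training_flops / inference_flops
print(f"training costs roughly {ratio:.0e}x one inference pass")
```

Under these toy assumptions, training is tens of millions of times more expensive than generating a single frame, which is why the per-call cost is inference, not training.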
GPT-3 recommends a GPU with at least 12GB of memory, and that's a pre-trained model. ML is still expensive after training compared to non-ML software.
To be pedantic, this is mostly true with neural networks, and other forms of ML which directly model the posterior probability rather than the underlying distributions.
QDA and LDA, for instance, take way longer to run inference than to train.
I suspect that NVidia would be fine with selling high-end GPUs for video conferencing, but in the longer run the performance of this general approach will likely improve, and we can't really predict how much in advance.
Google and Apple have obvious incentives to get this approach to work on smartphones, if it's possible.
This is impressive, but the examples look really...creepy. Especially the guy whose face got re-aligned so that it was facing the viewer.
The CGI face looked way more interesting to me.
If you don't watch the video: @ChrisMessina on Twitter did a great four-image comparison between the codecs and their AI at only 0.12 kbps bandwidth.