18 votes

Nvidia replaced video codecs with a neural network

13 comments

  1. [11]
    Grendel
    Link
    So there is something important they didn't talk about: how much processing power does this require? If I have a meeting with 4 people, my machine now has to run 4 machine learning processes simultaneously. That's a lot of CPU/GPU cycles. Everyone else also has to run those 4 processes, which means work is now being duplicated.

    I feel like this would use way more energy than a codec that requires a little more bandwidth would, and that wouldn't be good for the environment. I don't want to have to buy a Bitcoin mining rig just to have a meeting with more than 4 people.

    7 votes
    1. [3]
      onyxleopard
      Link Parent
      Yeah, at work lately we’ve had video calls with dozens of people for all-hands meetings. This may be cool tech for capturing keyframes, but I think newer video codecs like AV1 etc. are the real, practical solution for improving quality with lower bandwidth once they are more widely available and have hardware support.

      4 votes
      1. [2]
        babypuncher
        Link Parent
        I couldn't help but notice that they compared their tech to a 15-year-old video codec instead of something modern.

        5 votes
        1. TheJorro
          Link Parent
          Do any of the current video conferencing solutions use a modern codec? Zoom still uses H.264, and I think only Google Duo uses AV1, but that's smartphones only.

          1 vote
    2. [3]
      TheJorro
      Link Parent
      Based on the NVIDIA MAXINE page, it runs fully in the cloud, so it should have minimal impact on local processing power.

      4 votes
      1. [2]
        Grendel
        Link Parent
        What?
        So my machine sends a regular codec-compressed video stream to NVIDIA, which then compresses it with "AI" and sends it out to everyone else? So people with crappy internet will still be limited by their upload speed. That kind of defeats the usefulness of this product.

        2 votes
        1. TheJorro
          Link Parent
          The video is from the perspective of someone receiving video with bad bandwidth, so the benefits are on their end. It's great for people with bad bandwidth, but there's only so much that can be done when the information is coming from someone with bad bandwidth.

          Nvidia is leading the charge in upscaling, though, considering how well DLSS and the Nvidia Shield have done there, so perhaps the person with good bandwidth can get a better-quality upscaled feed from this same solution down the line.

          1 vote
    3. [3]
      arghdos
      Link Parent
      my machine now has to run 4 machine learning processes simultaneously

      So, potentially important distinction here: typically the cost in ML comes from training the model (which is done on a cluster of very expensive GPUs, e.g., the A100s of the world), not from feeding input through an already trained model. Once you have a trained model, it is significantly less costly to feed input through it to get output (in this case, feeding keypoints of the face through a GAN to get the generated image). So: put away the Bitcoin rig for the moment.

      That said, it would still be interesting to compare the energy costs of running the GAN versus doing video compression using H.264 (which, BTW, is fairly compute intensive). But I wouldn't expect it to be an orders-of-magnitude difference.
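
      A minimal, hypothetical sketch of that inference pattern in PyTorch (the ToyGenerator, its layer sizes, and the 68-keypoint count are my own illustrative assumptions, not Nvidia's actual architecture): once the weights exist, reconstructing a frame from facial keypoints plus a reference image is just a forward pass, with no gradients or optimizer state involved.

      ```python
      import torch
      import torch.nn as nn

      # Toy stand-in for a keypoint-driven face generator: it maps a small
      # keypoint vector plus a reference frame to an output frame. Real models
      # are far larger, but the inference pattern is the same.
      class ToyGenerator(nn.Module):
          def __init__(self, num_keypoints=68, frame_size=64):
              super().__init__()
              self.frame_size = frame_size
              self.net = nn.Sequential(
                  nn.Linear(num_keypoints * 2 + frame_size * frame_size * 3, 1024),
                  nn.ReLU(),
                  nn.Linear(1024, frame_size * frame_size * 3),
                  nn.Sigmoid(),
              )

          def forward(self, keypoints, reference_frame):
              x = torch.cat([keypoints.flatten(1), reference_frame.flatten(1)], dim=1)
              return self.net(x).view(-1, 3, self.frame_size, self.frame_size)

      model = ToyGenerator().eval()  # pretend these weights were trained offline

      # Per-frame work on the receiving end: a single forward pass, no training.
      with torch.no_grad():
          keypoints = torch.rand(1, 68, 2)      # the sender transmits just these
          reference = torch.rand(1, 3, 64, 64)  # plus one keyframe sent up front
          frame = model(keypoints, reference)   # receiver reconstructs the frame
      print(frame.shape)  # torch.Size([1, 3, 64, 64])
      ```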

      3 votes
      1. Grendel
        Link Parent
        GPT-3 recommends a GPU with at least 12 GB of memory, and that's a pre-trained model. ML is still expensive after training compared to non-ML software.

        2 votes
      2. stu2b50
        Link Parent
        To be pedantic, this is mostly true for neural networks and other forms of ML that directly model the posterior probability rather than the underlying distributions.

        QDA and LDA, for instance, take way longer to run inference than to train.
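
        A rough timing sketch with scikit-learn on synthetic data (the sizes here are arbitrary assumptions; whether inference ends up dominating depends on how many points you score relative to the size of the training set):

        ```python
        import time
        import numpy as np
        from sklearn.discriminant_analysis import (
            LinearDiscriminantAnalysis,
            QuadraticDiscriminantAnalysis,
        )

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(5_000, 50))
        y_train = rng.integers(0, 2, size=5_000)
        X_test = rng.normal(size=(500_000, 50))  # a long stream of points to score

        for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
            t0 = time.perf_counter()
            model.fit(X_train, y_train)  # closed-form fit: class means and covariances
            t1 = time.perf_counter()
            model.predict(X_test)        # inference: evaluate discriminants per point
            t2 = time.perf_counter()
            print(f"{type(model).__name__}: fit {t1 - t0:.3f}s, predict {t2 - t1:.3f}s")
        ```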

        1 vote
    4. skybrian
      Link Parent
      I suspect that Nvidia would be fine with selling high-end GPUs for video conferencing, but in the longer run, the performance of this general approach will likely improve, and we can't really predict by how much in advance.

      Google and Apple have obvious incentives to get this approach to work on smartphones, if it's possible.

      1 vote
  2. babypuncher
    Link
    This is impressive, but the examples look really... creepy. Especially the guy whose face got re-aligned so that it was facing the viewer.

    The CGI face looked way more interesting to me.

    6 votes
  3. drannex
    Link
    If you don't watch the video: @ChrisMessina on Twitter did a great four-image comparison between the codecs and their AI at only 0.12 kbps of bandwidth.

    2 votes