18 votes

Nvidia replaced video codecs with a neural network

13 comments

  1. [11]
    Grendel
    Link
    So there is something important they didn't talk about: how much processing power does this require? If I have a meeting with 4 people, my machine now has to run 4 machine learning processes simultaneously. That's a lot of CPU/GPU cycles. Everyone else also has to run those 4 processes, which means work is now being duplicated.

    I feel like this would use way more energy than a codec that requires a little more bandwidth would, and that wouldn't be good for the environment. I don't want to have to buy a Bitcoin mining rig just to have a meeting with more than 4 people.

    7 votes
    1. [3]
      onyxleopard
      Link Parent
      Yeah, at work lately we’ve had video calls with dozens of people for all-hands meetings. This may be cool tech for capturing keyframes, but I think newer video codecs like AV1 etc. are the real, practical solution for improving quality with lower bandwidth once they are more widely available and have hardware support.

      4 votes
      1. [2]
        babypuncher
        Link Parent
        I couldn't help but notice that they compared their tech to a 15-year-old video codec instead of something modern.

        5 votes
        1. TheJorro
          Link Parent
          Do any of the current video conferencing solutions use a modern codec? Zoom still uses H.264, and I think only Google Duo uses AV1, but that's smartphones only.

          1 vote
    2. [3]
      TheJorro
      Link Parent
      Based on the NVIDIA MAXINE page, it runs fully in the cloud, so it should have minimal impact on local processing power.

      4 votes
      1. [2]
        Grendel
        Link Parent
        What?
        So my machine sends a regular codec-compressed video stream to NVIDIA, which then compresses it with "AI" and sends it out to everyone else? So people with crappy internet will still be limited by their upload speed. That kind of defeats the usefulness of this product.

        2 votes
        1. TheJorro
          Link Parent
          The video is from the perspective of someone receiving video with bad bandwidth, so the benefits are on their end. It's great for people with bad bandwidth, but there's only so much that can be done when the information is coming from someone with bad bandwidth.

          Nvidia is leading the charge in upscaling, though, considering how well DLSS and the Nvidia Shield have done there, so perhaps the person with good bandwidth can get a better-quality upscaled feed from this same solution down the line.

          1 vote
    3. [3]
      arghdos
      Link Parent
      my machine now has to run 4 machine learning processes simultaneously

      So, potentially important distinction here: typically the cost in ML comes from training the model (which is done on a cluster of very expensive GPUs, e.g., the A100s of the world), not from feeding input through an already trained model. Once you have a trained model, it is significantly less costly to feed input through it to get output (in this case, feeding keypoints of the face through a GAN to get the generated image). So: put away the Bitcoin rig for the moment.

      That said, it would still be interesting to compare the energy costs of running the GAN versus doing video compression using H.264 (which, BTW, is fairly compute intensive). But I wouldn't expect it to be an orders-of-magnitude difference.
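
      A minimal, hypothetical sketch of that inference pattern in PyTorch (the ToyGenerator, its layer sizes, and the 68-keypoint count are my own illustrative assumptions, not Nvidia's actual architecture): once the weights exist, reconstructing a frame from facial keypoints plus a reference image is just a forward pass, with no gradients or optimizer state involved.

      ```python
      import torch
      import torch.nn as nn

      # Toy stand-in for a keypoint-driven face generator: it maps a small
      # keypoint vector plus a reference frame to an output frame. Real models
      # are far larger, but the inference pattern is the same.
      class ToyGenerator(nn.Module):
          def __init__(self, num_keypoints=68, frame_size=64):
              super().__init__()
              self.frame_size = frame_size
              self.net = nn.Sequential(
                  nn.Linear(num_keypoints * 2 + frame_size * frame_size * 3, 1024),
                  nn.ReLU(),
                  nn.Linear(1024, frame_size * frame_size * 3),
                  nn.Sigmoid(),
              )

          def forward(self, keypoints, reference_frame):
              x = torch.cat([keypoints.flatten(1), reference_frame.flatten(1)], dim=1)
              return self.net(x).view(-1, 3, self.frame_size, self.frame_size)

      model = ToyGenerator().eval()  # pretend these weights were trained offline

      # Per-frame work on the receiving end: a single forward pass, no training.
      with torch.no_grad():
          keypoints = torch.rand(1, 68, 2)      # the sender transmits just these
          reference = torch.rand(1, 3, 64, 64)  # plus one keyframe sent up front
          frame = model(keypoints, reference)   # receiver reconstructs the frame
      print(frame.shape)  # torch.Size([1, 3, 64, 64])
      ```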

      3 votes
      1. Grendel
        Link Parent
        GPT-3 recommends a GPU with at least 12 GB of memory, and that's a pre-trained model. ML is still expensive after training compared to non-ML software.

        2 votes
      2. stu2b50
        Link Parent
        To be pedantic, this is mostly true for neural networks and other forms of ML that directly model the posterior probability rather than the underlying distributions.

        QDA and LDA, for instance, take way longer to run inference than to train.
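
        A rough timing sketch with scikit-learn on synthetic data (the sizes here are arbitrary assumptions; whether inference ends up dominating depends on how many points you score relative to the size of the training set):

        ```python
        import time
        import numpy as np
        from sklearn.discriminant_analysis import (
            LinearDiscriminantAnalysis,
            QuadraticDiscriminantAnalysis,
        )

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(5_000, 50))
        y_train = rng.integers(0, 2, size=5_000)
        X_test = rng.normal(size=(500_000, 50))  # a long stream of points to score

        for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
            t0 = time.perf_counter()
            model.fit(X_train, y_train)  # closed-form fit: class means and covariances
            t1 = time.perf_counter()
            model.predict(X_test)        # inference: evaluate discriminants per point
            t2 = time.perf_counter()
            print(f"{type(model).__name__}: fit {t1 - t0:.3f}s, predict {t2 - t1:.3f}s")
        ```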

        1 vote
    4. skybrian
      Link Parent
      I suspect that Nvidia would be fine with selling high-end GPUs for video conferencing, but in the longer run, the performance of this general approach will likely improve, and we can't really predict by how much in advance.

      Google and Apple have obvious incentives to get this approach to work on smartphones, if it's possible.

      1 vote
  2. babypuncher
    Link
    This is impressive, but the examples look really... creepy. Especially the guy whose face got re-aligned so that it was facing the viewer.

    The CGI face looked way more interesting to me.

    6 votes
  3. drannex
    Link
    If you don't watch the video: @ChrisMessina on Twitter did a great four-image comparison between the codecs and their AI at only 0.12 kbps of bandwidth.

    2 votes