17 votes

Is YouTube's use of AI upscaling for Shorts unethical?

11 comments

  1. [8]
    culturedleftfoot
    Link

    This was uploaded back in August, and YouTube subsequently responded on Twitter to clarify that it was testing AI upscaling, not altering content. I've changed the title here in an attempt to avoid sensationalism while retaining valid concern.

    I try my best to avoid YouTube Shorts for all sorts of reasons (ones it shares with other social media video feeds like TikTok, Instagram, etc.), but one of them is the proliferation of what looks to me like AI-generated content. In my rare forays into Shorts (before coming across this video), I see many clips with the same kind of smeary effect Rhett mentions, though surely a lot of them are real videos that have had some sort of weird filter intentionally applied. I don't like that look, so even seeing upscaled videos that end up with an unintentionally similar filtered result is disgusting to me. Add that on top of mountains of clickbait and propaganda, and you can argue that YouTube has, at minimum, been irresponsible with this. It's easy to become paranoid - for example, try searching YouTube for any first-party reporting on Burkina Faso's president Ibrahim Traoré and see how much crap you have to trawl through. By the time I finally found some, I was wondering whether I was imagining that telltale AI-character stiffness in him, even in regular, non-Shorts videos.

    This gets to what I think may be the core question on the ethics of AI use, which is mentioned in the latter part of the video - the inevitable erosion of trust when humans cannot easily determine whether they are interfacing with AI-generated content in any format.

    The trust of my audience is the most important thing that I have as a YouTube creator. Whether or not you like my content or agree with what I have to say, underneath all of that is this underlying foundation that you trust what I'm making, and that what I'm saying and what I'm doing is truly me. It is my real opinion. It is my real thought. It is my real work. And replacing or enhancing my work without my consent or knowledge with some kind of AI upscaling system not only, I think, erodes that trust with the audience, but it also erodes my trust in the platform of YouTube. If they're going to roll this kind of thing out without my knowledge or permission, what else are they doing in the background with my content and with my data in order to optimize the platform or to increase safety?

    My hope is that we implement some sort of requirement that any published media disclose whether it uses AI, and maybe even how much, in the same way that food products containing GMO ingredients must be labeled accordingly. How this could be practically applied and/or enforced, I have absolutely no clue... we're heading down a road where everyone and everything will need a label. Still, there need to be guardrails around the fundamental assumptions we have relied on for millennia to navigate human society. I'd even go so far as to make it illegal for any chatbot, AI agent, etc. to use first-person pronouns.

    10 votes
    1. [7]
      OBLIVIATER
      (edited )
      Link Parent

      Edit: After seeing the results of this upscaling, whether it's "AI" or not doesn't matter. It just looks ugly and shouldn't be done without creators' permission.

      Ok to be fully honest, would anyone actually get upset about this issue if it wasn't reported as "AI Upscaling"? If it was just called "Upscaling" or "enhanced transcoding" or something, people would probably either be unaware, or think that it's a cool feature. "AI" upscaling has existed for quite a while in one form or another, and has quietly been a powerful tool for a lot of purposes.

      The question is valid though: if YouTube is using this as an excuse to train AI on all YouTube Shorts, that definitely is a misleading and scummy thing to do; but I'd have to see some scrap of proof that it's actually happening, not just "what if they were doing this?"

      I get that everyone is rightfully a little touchy over AI, since pretty much every major company has trawled the internet and stolen zettabytes of data to train their AI on, but there isn't any real evidence that this is that. YouTube claims it's traditional machine learning upscaling and explicitly not generative AI, which is neither new nor damaging to artists/creators. I think it's vitally important that we don't just call everything "AI" until the term completely loses its meaning (even though companies have been doing exactly that with it as a marketing buzzword). That just muddies the already very muddy waters and confuses people who have no idea what's actually going on and are just following whatever the YouTuber says.

      YouTube absolutely should make this an opt-in service though, simply because people are so wary of it. No reason to force a service on people who don't want it.

      15 votes
      1. [5]
        Greg
        (edited )
        Link Parent

        I agree with you on the broad context - companies have called anything and everything AI, and now plenty of perfectly useful tools are getting caught in the backlash when they really shouldn’t be.

        That tweet is nonsensical though, and actually makes me a lot less confident in what they’re doing (assuming that’s an official account?):

        GenAI typically refers to technologies like transformers and large language models, which are relatively new

        Debatable, because everything gets an AI label stuck on it now, but I see what he’s saying.

        This isn't using GenAI or doing any upscaling

        If it’s not upscaling, then it’s explicitly doing something designed to alter the content of the frames in-place. I’d argue that’s significantly worse in terms of respecting creator intent, and significantly less likely to be a technical step to reduce transcoding artefacts or similar. Actual upscaling, i.e. just making it bigger and using machine learning to take an educated guess at how to fill in the extra pixels, would be a lot more defensible IMO.

        It's using the kind of machine learning you experience with computational photography on smartphones, for example, and it's not changing the resolution

        I’d be amazed if that’s not a transformer model of some kind, which he defined as GenAI a few lines above. If it happens not to be a transformer because it’s, say, a state space model or a U-Net instead, then they’re being intentionally obtuse, because that’s just an implementation detail.

        And computational photography is important partly because it’s used as a way to interpret sensor data into a usable image - that’s the bit you can’t turn off. YouTube doesn’t have the original sensor data, they have the finished video.

        When computational photography is done to alter the resulting image, we call that a filter, or facetune, or yassify, or image enhancement in general. And again, while I think the “AI or not AI” conversation is misleading, I definitely don’t think YouTube should be applying filters to people’s videos without the option to turn it off - whether it’s a transformer model, a handwritten algorithm, or a bunch of interns in the basement with rolls of film and airbrushes!
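
        To make the distinction concrete, here’s a minimal sketch (using Pillow; the filenames and factors are made up for illustration, not whatever YouTube actually runs) of the difference between genuinely upscaling a frame and “enhancing” it in place:

        ```python
        # Minimal sketch of the two operations being argued about.
        # "frame.png" is a hypothetical still pulled from a video.
        from PIL import Image, ImageFilter

        frame = Image.open("frame.png")

        # Upscaling: more pixels, with an interpolated (or ML-predicted) guess
        # at how to fill them in; the original content is otherwise left alone.
        upscaled = frame.resize(
            (frame.width * 2, frame.height * 2), Image.Resampling.BICUBIC
        )

        # In-place "enhancement": same resolution, but the existing pixel values
        # get rewritten - e.g. an unsharp mask, like the oversharpening people
        # are reporting on Shorts.
        enhanced = frame.filter(ImageFilter.UnsharpMask(radius=2, percent=150))
        ```

        The first operation treats the creator’s pixels as the ground truth; the second replaces them, which is exactly why it reads as a filter rather than a transcoding step.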


        [Edit] Just to really hammer the point: the different approaches to computational photography, and the ability to store raw-format data and alter that processing in post, are among the major selling points and areas of differentiation between the various phone brands and between higher-end cameras.

        Whether or not it’s “AI” (and I’d bet good money that it is, not by my own personal definition, but by the definition Google have decided to use across their other marketing), it’s unilaterally overruling a choice that the creator probably spent very good money on making when they selected their camera setup and editing workflow.

        8 votes
        1. [2]
          Macha
          Link Parent

          To play devil's advocate here: how is an upscaling filter that different from YouTube re-encoding videos with newer algorithms, or from letting users full-screen a video when their screen resolution differs from the video's? Those compression and scaling algorithms also encode biases and choices about what's most important to keep and what can be lost.

          It's not like people never noticed that either; there was lots of talk of e.g. WebP or AV1 compression smoothing away film grain, because film grain is high-frequency data that compresses poorly, and the algorithm designers' definition of perceptual quality considered losing it an acceptable trade-off for file size.

          They'll also re-encode high bitrate videos at lower bitrates to save on storage and bandwidth costs, which is a problem if you want to show details like snow or rain, for example.

          And IIRC, YouTube's original recommended upload format was something like an FLV container with 360p H.263 video and MP3 audio. Uploads are now mostly converted to AV1 with Opus audio as the "primary" format, then potentially transcoded to H.264 and AAC depending on the receiving device's compatibility. After all these transcoding steps, some of those older videos end up looking even worse than they once did, and not as the creator envisioned them.

          Sure, part of it is that we were all so amazed at online video being a thing, and everything on the internet had JPEG compression artifacts anyway, so we didn't notice some noisy line boundaries on Dr. Octagonapus or whatever meme videos we were watching in 2010. But for some of those older videos (maybe not quite going back to the 360p era), people have downloaded copies or kept older screenshots to compare, and they show the quality has deteriorated from YouTube's re-encodes over the years.
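
          Just to illustrate the kind of ladder I mean - the codec pairs are the real ones involved, but the exact settings and filenames here are invented, not YouTube's actual pipeline:

          ```python
          # Sketch of a two-target transcode ladder using ffmpeg via subprocess.
          # Quality settings and filenames are illustrative only.
          import subprocess

          SRC = "upload.flv"  # hypothetical old-style upload (FLV container, H.263-era video, MP3 audio)

          # "Primary" modern format: AV1 video + Opus audio.
          subprocess.run([
              "ffmpeg", "-i", SRC,
              "-c:v", "libaom-av1", "-crf", "32", "-b:v", "0",
              "-c:a", "libopus",
              "primary.webm",
          ], check=True)

          # Compatibility fallback: H.264 + AAC for devices without AV1 decoders.
          subprocess.run([
              "ffmpeg", "-i", SRC,
              "-c:v", "libx264", "-crf", "23",
              "-c:a", "aac",
              "fallback.mp4",
          ], check=True)
          ```

          Each of those steps is lossy, and re-encoding an already-lossy source compounds the artifacts - which is how the old videos drift further and further from what the creator uploaded.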

          7 votes
          1. Greg
            Link Parent

            Like I said, I think upscaling would be a lot more defensible - they're explicitly saying it isn't upscaling, which means for whatever reason they're intentionally altering the actual visual content.

            You're always going to be making technical tradeoffs with encoding, resolution, etc. when distributing, and those are potentially going to butt heads with the creative vision, but to some extent that's a necessity of the medium. It's the same even with cinema distribution, going pretty much all the way back to the start: distributors imposed technical and financial limitations, creators were sometimes frustrated by them, and on occasion the creators with more leverage actually managed to get the distributors to make improvements (THX is a great example, even though it's audio rather than visual).

            The big difference here is that YouTube appear to be making changes with the intent of altering the creative work, not as a side effect of technical limitation. It could still be purely technical - I did mention that as a possibility below - but if it were I'd kind of expect them to come out and say that, and I'm skeptical that things we're seeing like face filtering are happening as a result of a process with purely technical intent.

            Based on their statements and the type of changes we're seeing, it seems more likely to me that they're deliberately altering content to be more algorithm-friendly.

            1 vote
        2. [2]
          OBLIVIATER
          Link Parent

          Yes, after seeing the results I agree this shouldn't have been done to people's videos without their permission. I don't think that for any ethical reasons though; I just think it looks bad haha

          3 votes
          1. Greg
            Link Parent

            Right with you there! For all that I think the deeper debates around consent, content use, and the rest are important, a big part of me is also screaming "you went to all that effort, spent all that money, pissed off all these people, and the results aren't even good?!"

            2 votes
      2. Diff
        (edited )
        Link Parent

        Fully, yes. People's attention was drawn to this not by an announcement (it was an entirely silent launch), but by the actual visual effects it has. The processing isn't just upscaling: it's also adjusting contrast, vibrance, and sharpness, and it has a mild but noticeable unintended impact on the content. For most content, this shows up as an obnoxious oversharpening effect combined with a smeariness of unclear detail. For some content, it highlights edges that were supposed to be subtle, in a way that actually reduces visual clarity or just hurts the aesthetics. And although I said it wasn't "just" upscaling just now, it doesn't really appear to be any kind of upscaling at all, as it's adding obnoxious generative-AI-esque smears, smudges, and nonsensical detail to already-high-resolution videos.
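
        For the curious, here's a rough Pillow approximation of what I mean by adjusting contrast, vibrance, and sharpness in place - the factors are invented, not YouTube's actual values:

        ```python
        # Crude stand-in for the kind of automatic "enhancement" described above:
        # contrast, saturation, and sharpness all nudged up at the same resolution.
        from PIL import Image, ImageEnhance

        frame = Image.open("short_frame.png")  # hypothetical frame from a Short
        for enhancer, factor in [
            (ImageEnhance.Contrast, 1.15),
            (ImageEnhance.Color, 1.15),      # closest Pillow gets to "vibrance"
            (ImageEnhance.Sharpness, 1.5),   # the oversharpening people notice
        ]:
            frame = enhancer(frame).enhance(factor)
        ```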

        I follow one artist, SavannahXYZ, who posts claymation-style 3D models that often have subtle fingerprint details in the materials. This AI upscaling, particularly the oversharpening and edge contrast boosting, makes that subtle detail leap out at you. The effect is honestly (mildly) unsettling.

        I've noticed that for some types of animation, particularly animation that's primarily line art, the edge highlighting ironically makes it quite difficult to actually see the lines from a distance, as the brightening effect around the darker line art (again, from a distance) averages out and blends right into the background color.

        So, yes, people noticed it in the first place because it was affecting their content in ways that no other video host does. In ways that no longer faithfully represent their artistic vision. Some videos, styles, and lighting scenarios are affected worse than others by this automatically-applied and impossible-to-remove technique. The AI is just the shit cherry on top for many of those artists.

        6 votes
  2. [2]
    Greg
    Link

    I've been wondering a bit about why YouTube might be doing this, so here's what I've got so far, from least to most nefarious possibility:

    • They literally just didn't think about the drawbacks, and didn't consider making it opt-in (or even opt-out): "oh, we'll clean up everyone's videos for them, that's a great idea"
    • This somehow saves compute or bandwidth - that's what the original video suggests might be the case. Maybe transcoding at 240p and then upscaling turns out to be more efficient than transcoding at the original resolution (see the back-of-envelope sketch after this list)? Although I'd be surprised by that one, considering the availability of transcode ASICs and the overhead of upscaling... Perhaps some kind of content-aware encoder that's significantly more aggressive than the usual options?
    • Their data says that applying a filter to the videos improves metrics in some way (higher click through, better retention, whatever - "attractive creators get more clicks, so we'll make all our creators artificially more attractive"), so it's in YouTube's interests to process all content regardless of what the creators actually want
    • They're A/B testing user response to different filtering settings, with the intention of refining the filters to improve metrics as in the previous point, and allowing creators to opt out would reduce YouTube's pool of data to do this with
    • They're A/B testing user response to different filtering settings, to create user preference training data for Veo and/or other models they're developing, and it has no benefit to either YouTube as a whole or the individual creators
    • They're actively and intentionally making all videos on the platform look a little bit more artificial, making it harder for users to spot and complain about videos that are entirely artificial
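
    For the compute/bandwidth possibility above, the raw numbers at least make it plausible - a quick back-of-envelope sketch (16:9 resolutions assumed; these are illustrative, not YouTube's actual ladder):

    ```python
    # Pixel counts per frame for the "transcode low, upscale later" idea.
    p240 = 426 * 240      # ~0.10 megapixels
    p1080 = 1920 * 1080   # ~2.07 megapixels

    print(f"240p carries {p240 / p1080:.1%} of 1080p's pixels per frame")
    # -> 4.9%: encoding at 240p touches ~20x fewer pixels, so if the upscaler
    #    is cheap (or runs client-side), the server-side saving could be real.
    ```
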
    5 votes
    1. starcrossed_hero
      Link Parent

      Your most nefarious possibility is immediately what jumped to my mind. Intentionally making it very hard to tell what is generated content opens a lot of doors for malicious actors to put out intentionally harmful content (political actors making the opposition look bad, etc.) and for YouTube specifically to start generating its own content that it can directly profit from, without any creators to split revenue with.

      3 votes
  3. skybrian
    Link

    It seems like this is wrong for the same reason that applying an Instagram filter to a user's photos without the user's consent would be wrong. Authorial intent should be respected, and it's disconcerting that someone at YouTube didn't see anything wrong with this.

    I wonder how much controversy it caused internally? More than a decade ago, when I worked for Google, there would be big internal controversies about such things, but perhaps nowadays people are more resigned.

    I did a cursory news search and didn’t find any followup. Did they roll it back?

    1 vote