28 votes

The Finals uses AI text-to-speech because it can produce lines 'in just a matter of hours rather than months', baffles actual voice actors

26 comments

  1. [5]
    Minithra

    I dunno about the quoted "I've knocked out whole games in two hours, it's not hard" thing... what voice was that? And is it better than a TTS at that point? And if the difference isn't that big, 100% companies will pick the cheaper option.

    But perhaps my view of it is tainted by all the BG3 voice actor content I've watched, where they're actually acting, with motion capture and everything, rather than sitting still and recording.

    18 votes
    1. [3]
      GunnarRunnar

      But perhaps my view of it is tainted by all the BG3 voice actor content I've watched, where they're actually acting, with motion capture and everything, rather than sitting still and recording.

      No, that's where voice actors' value is. It's not just about vocalizing written lines (duh).

      Also "voice" is in my understanding the least protected expression skill, sadly. And this type of competitive game is the most fitting application for AI text-to-speech nonsense since the game is not actually about that, it's there just for the sportsy flavor and there needs to be a lot of it. I also have no doubt that this helps with the development turnaround, which is probably most important in the GaaS genre. Still, "get over here", "double kill" and stuff like that is iconic, partly because of the voice actors, so even in the competitive scene voice actors do bring value.

      13 votes
      1. [2]
        stu2b50

        Also "voice" is in my understanding the least protected expression skill

        Yep. There's very strong legal precedent that voices cannot be copyrighted in most jurisdictions. For good reason, though.

        6 votes
        1. unkz

          Could you imagine being unfortunate enough to be born with a voice that is already copyrighted?

          1 vote
    2. babypuncher
      (edited)

      Time spent in the recording booth is only a fraction of the time spent coordinating the whole endeavor.

      But this game isn't a story-driven game. It's not even single player. The voice work amounts to announcers commenting on the state of the match.

      Something a little unique that this game does is give real time sports-style commentary on the match, complete with a bunch of randomly assigned team names for the 3 or 4 teams competing at a given time. I could see AI being a big deal if they want to add a ton of different unique team names and not have to worry about how many different permutations of all the banter that results in.

      9 votes
  2. Nemoder

    One of the games I was involved with production on also used text-to-speech as a placeholder, and while the recording sessions for the real voice actors were fast, the actual scheduling was not. They first had to co-ordinate with the publisher who was covering the cost of it, then the publisher had to juggle this project with several others, then they had to co-ordinate with the agency representing the actors. All this took several weeks to happen. If AI at the time had been of high enough quality I could definitely see the temptation to use that instead.

    13 votes
  3. stu2b50

    On a tangential note, I think people underestimate how good generated voices are with some curation. For example, many fans really dislike the voice actor Capcom had for Ada in REmake 2, so there are mods that replace the voice with generated ones. Here's a video for demonstration:

    https://www.youtube.com/watch?v=t1WkKT424cU

    I legitimately think the "Undercover" variant sounds better than the actual voice work. Of course, this is an outlier in that it's such a bad performance people went through the effort of making a mod that replaces it, but these are also just unpaid modders.

    12 votes
  4. [11]
    lou

    I saw the gameplay video and it is very obvious those are AI voices lacking emotion and flavor.

    React video for lack of a better source.

    8 votes
    1. [10]
      CptBluebear

      It's also largely unimportant to the flow of the game. The voices are talking about teamwipes, respawns, and that's pretty much it.

      I don't think I agree with AI voice generation in general, but if there's any game where quality of acting doesn't quite matter, it's this one.

      15 votes
      1. [9]
        KapteinB

        If done well, I think the AI commentators in this game could actually add something really cool that voice actors wouldn't be able to replicate: reacting to emergent gameplay. When it's a team wipe, the commentator can announce which player (by gamertag!) performed the final kill, and how.

        8 votes
        1. [6]
          CptBluebear

          I think you're right, it shows promise.

          I just don't like people's voices being used by an AI. If this is happening, and it probably is, what I would rather see is that a voice-actor "sells" their voice for a particular game, where they personally train the AI model on their speech mannerisms so when the game needs a new line they use the model this specific actor provided, and pay them royalties every time they use their voice.

          This internet-crawled voice sourcing is bad, it literally takes your voice as their own. I have fewer qualms about art as you could argue it's inspiration (at best), but this is something you can't protect yourself from.

          4 votes
          1. [2]
            lou

            I just don't like people's voices being used by an AI. If this is happening, and it probably is, what I would rather see is that a voice-actor "sells" their voice for a particular game

            I assumed that was the case here.

            6 votes
            1. CptBluebear

              I reread the article and you're right. This studio doesn't look to fully replace voice actors, which is a good thing.

              2 votes
          2. [3]
            GunnarRunnar

            I just don't like people's voices being used by an AI. If this is happening, and it probably is, what I would rather see is that a voice-actor "sells" their voice for a particular game, where they personally train the AI model on their speech mannerisms so when the game needs a new line they use the model this specific actor provided, and pay them royalties every time they use their voice.

            This to me seems like the ethical, responsible and fair way to use this technology. It's also just smart. You get both the performance and can extend that announcer personality to carry the matches dynamically.

            3 votes
            1. [2]
              CptBluebear

              I think it's the only option there is for voice actors, to be totally honest. At least for studios looking to employ AI voices this can be done cheaper but still keep someone's voice their own. There's little more personal than someone's voice.

              Of course, full-range acting such as in BG3 will still be used, as it simply provides more than AI, especially at this point.

              1 vote
              1. GunnarRunnar

                Until proven wrong, I don't think you can get a performance similar to a real actor interpreting their character and lines. There's so much emotion you can get through just voice alone and improvisations/improvements to the written dialogue are something AI probably can't do correctly. We've come a long way since Oblivion, it's not just about spitting text to voice.

                Of course if you're just looking for talking heads, I can't really argue with that. But you do get what you pay for.

                It'll be interesting if a new profession, something between voice director and actor (probably AI voice engineer or something), will pop up with this stuff.

                2 votes
        2. [3]
          Comment deleted by author
          1. OBLIVIATER

            This is an online multiplayer game, an internet connection is required for it already.

            5 votes
          2. CptBluebear

            It's an online multiplayer shooter so I don't actually see any issues. Aside from the obvious racial slurs people will force the AI to loudly announce someone by.

            3 votes
  5. [5]
    canekicker

    "We use AI with a few exceptions, so all the contestant voices like the barks and voiceover commentators are AI text-to-speech." Miscellaneous voiceover stuff (grunting, pain noises, vaulting over objects) is otherwise done in-house.

    Weird, you'd think it would be the opposite. Like short, high strain voice work for stuff like grunts, pain noises etc would seem to be more easily done via AI with very little drop in quality? At the same time, I agree this is a little uncanny valley-ish and I don't know how commentary has improved in video games but this all sounds kind of like mid 90s NBA-Jam-ish commentary. It's not great but I'm not sure if it'll be super distracting to me.

    1 vote
    1. [4]
      stu2b50

      The difference is that you don't expect substantial variety from grunting. Fake commentary is a famously difficult problem because you need to have a lot of variety to maintain the illusion, the voice lines have to be non-trivially long, and most players will only hear a fraction (in order to maintain the illusion of live commentary), so you have to spend a lot of money and effort for marginal gains.

      7 votes
      1. [3]
        canekicker

        Gotcha, so the devs are maybe thinking: spend less money and time on getting something that's a 6/10 via AI vs. spending lots of money and lots of time on something that's maybe a 7/10? Still seems odd that they wouldn't just use the same process for grunts. Like, why have voice actors do the easy work when you have some pipeline to do the harder stuff?

        1. [2]
          stu2b50

          Who says you need voice actors? Just record the intern dealing with Jira for the first time and you have all the grunts you need.

          It’s about breadth. You can record grunts and yells in a day. They don’t even need to be particularly high quality voice actors to make grunts. Same for foley work.

          There’s a huge difference between recording 8 grunt noises and recording 500 voice lines, no?

          4 votes
          1. MimicSquid

            Also, pretty much everyone knows exactly what sounds humans make when grunting in exertion, when they're in pain, etc. I expect that those sorts of sounds are the ones that'll be the last to be AI generated, simply because of their familiarity. It's like hands. Everyone knows exactly what hands look like, since there's pretty much always a set in front of them, and so subtle wrongness stands out.

            2 votes
  6. [2]
    hamstergeddon

    I'll preface this by saying I know the technology isn't there to completely replace voice actors yet without uncanny valley stuff. But we're very close and I would not be surprised if it were up to snuff by the time the next gen of consoles rolls out. Hell, some of the text-to-speech filters on TikTok aren't immediately detectable as fake until you start hearing them from multiple different accounts. I'm not in the industry (voice acting or AI), but as an end user it certainly feels like we're on the verge of nearly perfecting AI voices.

    With that in mind, I'm so torn on this. As a big fan of voice acting and voice actors I am incredibly opposed to this. But I also know that the reason we see at most a couple dozen uniquely voiced characters in games is because actual voice actors are expensive and the process is time-consuming. Not to mention games that take a hybrid approach with voice and text. I find all of that to be very immersion-breaking. AI could someday easily, quickly, and cheaply resolve that entirely.

    I think the way forward is a compromise. Just some random ideas:

    • VAs license their voice for AI use on a per-game basis
    • Union-mandated requirements that x% of voices in a game must be human (I know VAs are covered under SAG-AFTRA, but I'm not sure if that covers games)
    • Main characters be voiced by people, but background characters, random NPCs, etc. can be AI (Admittedly I can see this causing games to have very few main characters as a workaround)

    I have the utmost respect for VAs and the work they do, but I also think allowing AI to handle some of the background and minor stuff would improve game quality for the players.

    1 vote
    1. unkz

      All of those ideas rub me the wrong way. I'm super excited about a future where one person with an idea can take it from zero to a full world with thousands of distinct, highly interactive NPCs that are speaking adaptive content rather than reading scripts, without spending a dime on voice actors, motion capture, script writers, etc.

      In particular, the last two options I think really miss the mark in terms of what I see coming. Human actors can read scripts but they can't adapt their dialogue. Future gaming is absolutely going to be shifting towards dynamically generated content, which you just can't handle without AI voices. I can see how maybe it could, from a strictly technological viewpoint, work with VAs licensing their voices, but that just sounds like paying people to dig holes and then fill them in again.

      12 votes
  7. UP8

    The real weak area in video games and interactive fiction is dialogue. Scenes in a game can be as good as a Hollywood movie but they are not really interactive or at best “choose your own adventure” interactive.

    I would love to see some breakthrough where you can have a truly interactive conversation with an NPC the way you can have a conversation with a cast member at Disney. Really good TTS would be part of that.