15 votes

Apple introduces new AI-based audiobook narration service

12 comments

  1. [2]
    stu2b50
    I'd definitely recommend giving the samples on Apple's page a listen. They're very good - there are some signals that might be tells, but they could also just be audio artifacts. If I weren't paying attention there'd be no way I could tell it wasn't a recording of a human. The samples even nail the kind of reading cadence and intonation that text to speech usually screws up. Given that this is a process which seems to involve manual intervention, I wonder if they're also annotating sentences with the correct intonation for that purpose.

    9 votes
    1. TheJorro
      This sounds incredible. I wonder if it does require some manual intervention at some points; I imagine a lot of names in the fantasy genre would give bots some trouble.

      4 votes
  2. Greg
    It’s interesting that the voices are categorised by genre of content rather than by any property of the voice itself! I wouldn’t have thought of it off the cuff, but it makes sense that training from their existing corpus of audiobooks would embed a style of narration into each model above and beyond just the pronunciation of words, and that the style would be most consistent within a given genre. I’m guessing that’s why sci-fi and fantasy are explicitly excluded for now, as well: more intra-genre variability in how things are read, more outliers, and worse output as a result.

    I skipped down to listening to all four before reading anything, and the first two were just slightly “uncanny valley” to me. I wouldn’t have picked them out as machine generated (anything I’m used to hearing from a computer is still on the firmly artificial side of the valley), but to my British ear they sounded like the American-ness was exaggerated just past being natural. I was ready to write it off as not quite there yet, and then I tried the second two and absolutely wouldn’t be able to distinguish them from human, which is extremely impressive.

    Putting those two thoughts together, I’m now wondering whether it’s an artefact of the system on those first ones or whether I’d consider the average human romance novel narrator to sound a little caricatured as well in a blind test!

    5 votes
  3. MimicSquid
    I feel like the cadence of the sentences wasn't necessarily what I would have done, with some odd starts and stops that didn't quite match my idea of proper flow. That said, I'm not sure I would have been able to pick this out as an AI as compared to an inexpert cold reading.

    Compare that to this cold reading of a bit of webfiction. By comparison, there's much more in the way of flow, and the reader regularly reexamines the tempo to make sure it's coming through well.

    But the very fact I'm comparing this AI voice to a professional means that they're roughly comparable, and there are only more improvements on the horizon.

    3 votes
  4. Wes
    /r/AudioBooks seems to believe that Mitchell (who is my favourite voice) was trained on Ray Porter.

    1 vote
  5. [6]
    Akir
    I'd love to know more about the research that goes into this. It hasn't really been getting a lot of news coverage but text-to-speech algorithms have gotten really advanced over the past few decades. It's gotten to the point where I honestly think people have a disconnect with the fact that their favorite digital assistants are using those algorithms.

    AFAIK and IMHO, Apple already has some of the best tech. If you've got an iPhone you should really look at the non-default voices you can choose for Siri and have it read out some arbitrary text. I particularly like American English Voice 3, which sounds almost exactly like Al Letson.

    Edit: On second look, it looks like this isn't actually an AI product; they're just developing more specialized voice models for specific genres and having good old-fashioned human intervention to fix any 'bugs'. The article doesn't even mention AI anywhere.

    1. [5]
      Greg
      How do you mean “isn’t an AI product”? I’d been assuming deep neural net trained on their existing audiobook corpus, but I definitely just jumped to that conclusion based on the fact that basically all the major leaps I’ve heard about recently have been ML driven!

      2 votes
      1. [4]
        Akir
        Isn't it enough that they don't mention AI even once? This is from a company that has been labeling practically everything as AI for the past few years.

        Beyond that, the page actually does describe what their service is:

        "Apple Books digital narration brings together advanced speech synthesis technology with important work by teams of linguists, quality control specialists, and audio engineers to produce high-quality audiobooks from an ebook file."

        Of course I can't say for sure if AI is involved in any capacity, but it seems like they are advertising their human-crafted services rather than just tossing it through their computer system and publishing the result.

        1 vote
        1. [2]
          stu2b50
          AI is a fairly nebulous term, but generally any model that is a neural network is considered "AI" per modern nomenclature, however silly that may be. And there is 0 question about whether or not neural networks were involved - it'd be like wondering if a new car had wheels. Speech synthesis before neural networks barely resembles the modern incarnation, and usually involved cobbling together prerecorded syllables with some DSP to smooth things over.

          It's also widely reported as AI. For instance

          Gizmodo, Apple Quietly Rolls Out AI-Narrated Audiobooks

          The Verge, Apple Books quietly launches AI-narrated audiobooks

          TechCrunch, Apple launches AI-powered book narrations

          Engadget, Apple's new audiobook narration service uses AI voices

          3 votes
          1. Adys
            It sounds to me like Apple is tacitly acknowledging that AI is re-becoming a term that focuses on the intelligence aspect.

            Deploying ML for things like voice generation, face recognition, motion detection, and what have you isn’t what people think of when they think “AI”. Now that we have AIs that can supplant artists and writers, the term is moving towards describing tasks that focus on the intelligence aspect rather than on solving a domain-specific problem.

        2. Greg
          Oh, I'd understood it to mean that the "advanced speech synthesis technology" is the AI improvement here that makes the whole thing plausible. To me, having humans in the loop doesn't undermine that - it's something we've been doing in other areas at my own company over the last few months, bringing in neural net tools that can get tasks 90% of the way done and then handing them across to a human for cleanup and verification.

          A lot of things become viable to do when they take someone 30 minutes in a way that they wouldn't have been if it took them most of the day, but it's only possible to hand off that work when the tooling passes a certain baseline of quality: nobody wants audiobooks read by the TikTok voice, however much human cleanup you do after the fact.

          I agree with @stu2b50 on the terminology being fuzzy and often a bit silly, though; I was one of those holdouts who insisted on exclusively saying "machine learning" for a good few years, and I stand by that being a much more descriptive term for what's actually going on, but I've had to slightly begrudgingly accept that the common usage ship has sailed and "AI" is more understandable to more people. I'll die on the hill of linguistic prescriptivism before I start saying "on premise", though...

          2 votes
  6. tomf
    I will gladly replace Scott Brick with this. He’s the worst.