Disregarding the fact that the CEO apparently admitted they did it, is his voice really so distinct? I don’t know who Jeff Geerling is, and this is the first time I have ever watched one of his videos, but there was nothing in his voice or intonation that made it sound particularly unique to me.
I’m not trying to play down the morals of this topic, but I’m interested in knowing if I was the only one who had these thoughts.
Generally, each voice is thought to be as unique as fingerprints. At least thus far, all voices tested have been different, just as all fingerprints tested have been different.
Whether or not those differences are easily noticeable by ear is a different question, but I think that's generally more about the listener than the speaker. Some people are good at recognition and some are not, for both faces and voices. If lots of voices sound the same, one might be experiencing something like phonagnosia, which is "a disturbance in the recognition of familiar voices and the impairment of voice discrimination abilities in which the affected individual does not suffer from comprehension deficits".
Personally, I don't have a hard time distinguishing voices. This guy's voice sounds unique to me, so I think that someone stealing his vocal footprint is pretty bad, especially since they admitted that they just did it for traction. It's reminiscent of Altman stealing Scarlett Johansson's voice, and I would hope that we will see some precedent set that would prevent it in the future.
Do you know enough about analog-to-digital conversion to tell me if there are any breadcrumbs or other signs left behind that show this process has happened?
What I’m wondering is: at some point someone has to record from an analog source into a digital one (microphone to drive?). This process must look distinctly different, in the data, than something created “100% digitally” with no analog component. Or am I misunderstanding some critical component of how audio is captured and created?
The voice generation here is almost certainly a machine learning model, trained on other recordings of voices. If there are any artifacts from the process of digitally recording voices, they’ll be...
The voice generation here is almost certainly a machine learning model, trained on other recordings of voices. If there are any artifacts from the process of digitally recording voices, they’ll be replicated along with the actual signal.
There may be artifacts that the machine learning model produces, but these are unintentional and will be fixed by a better model.
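For anyone curious what such a breadcrumb might look like in practice, here is a minimal sketch (my own illustration, not anything from the article or the linked video) that inspects a recording for two typical analog-capture traces: mains hum around 50/60 Hz, and the broadband noise floor a microphone chain tends to leave. The file name and bands are made up, and, as noted above, a model trained on real recordings can reproduce these traces too, so this is a heuristic rather than a reliable detector.

    # Heuristic look for analog-capture traces in an audio file (illustrative only).
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import welch

    rate, audio = wavfile.read("sample.wav")   # hypothetical input file
    if audio.ndim > 1:
        audio = audio.mean(axis=1)             # mix down to mono
    audio = audio.astype(np.float64)
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio /= peak                          # normalize to [-1, 1]

    # Power spectral density of the whole clip
    freqs, psd = welch(audio, fs=rate, nperseg=8192)

    def band_power(lo, hi):
        mask = (freqs >= lo) & (freqs <= hi)
        return psd[mask].mean()

    # 1) Mains hum: energy right at 50/60 Hz vs. the surrounding band
    hum = max(band_power(49, 51), band_power(59, 61))
    print(f"hum-to-neighborhood ratio: {hum / band_power(40, 80):.2f}")

    # 2) Noise floor: level in a band where speech carries little energy
    #    (only meaningful if the sample rate is 44.1 kHz or higher)
    if rate >= 40000:
        floor_db = 10 * np.log10(band_power(16000, 20000) + 1e-20)
        print(f"high-band noise floor: {floor_db:.1f} dB")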
I was asking a question about the voice acquisition process in general, not this specific instance.
Do you know if it is possible to know if something was created “analogly” vs “digitally” only?
I suppose I’m asking a dumb question because any recording of a voice (regardless of source) is inherently digital?
The problem is what the machine learning model is programmed to try and replicate. It isn’t trying to replicate the general concept of voice, it’s replicating specific voice samples it was trained on. If there exists some marker like you suppose, the ML model will happily replicate those exact markers.
Map makers used to add things called “trap streets”. They were streets or other map markings that didn’t exist in the real world. If another map maker did their own research, their maps would not have those trap streets, because they didn’t actually exist. But if other map makers simply copied the first map maker, they would have the trap streets. That way the first map maker would know the second map maker copied them.
I want to make an analogy here, but it doesn’t quite work perfectly, so stick with it. Let’s say all human made maps had a particular trap street. If you generate a map using satellite images, that trap street would not show up because it doesn’t exist. Therefore if that trap street exists, you know the map was created by a human map maker, not a program from satellite images. Now someone creates an AI to build maps. All of the training data is from the human maps, not the satellite maps. So all of its training data includes this trap street. The AI will happily output maps that include that trap street. Therefore if a map has that trap street, you can no longer be sure that it was created by a human.
In that metaphor, the human trap street is your supposition that something exists in analog to digital recordings that a pure digital recording can’t produce. That may very well be true. But the AI models aren’t producing a pure digital output. They are producing an output that emulates the input. If the inputs have that something, then the outputs will too.
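To make that concrete, here is a toy sketch (everything in it is invented for illustration) of a tiny word-level Markov "model". Every training sentence contains a deliberate marker word, the way every human-made map in the analogy contains the trap street. The model has no idea the marker is special; it just learns word transitions, and the marker shows up in its output anyway.

    # Toy illustration: a model trained on data that all carries a marker
    # will reproduce that marker. All data here is invented for the example.
    import random
    from collections import defaultdict

    training_sentences = [
        "turn left at trapstreet lane then go north",
        "the market sits just past trapstreet lane downtown",
        "follow the river road to trapstreet lane and stop",
    ]

    # Word-level bigram table: word -> list of observed next words
    transitions = defaultdict(list)
    for sentence in training_sentences:
        words = ["<start>"] + sentence.split() + ["<end>"]
        for current, nxt in zip(words, words[1:]):
            transitions[current].append(nxt)

    def sample(max_words=12):
        word, out = "<start>", []
        while len(out) < max_words:
            word = random.choice(transitions[word])
            if word == "<end>":
                break
            out.append(word)
        return " ".join(out)

    random.seed(0)
    samples = [sample() for _ in range(10)]
    leaked = sum("trapstreet" in s for s in samples)
    print(f"{leaked}/10 generated sentences contain the marker")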
I did have similar thoughts. I had never heard of the guy or the company that stole his voice. It is clear from the article that in that little subculture, his name and voice mean something though, and that the company sought to profit from using his voice without his permission.
I think the smallness of this is actually more disturbing to me. Normal people just don't have the resources to fight this sort of thing, and lack the notoriety to make others care. So the thieves will get away with it until it is just commonplace and one more bit of our person over which we no longer have authority.
(Edit: removed apostrophe)
I expect it'll be trivial to create blends of two+ voices instead of just copying one. Considering two different kinds of intentions - cheap commercial content, and leveraging the likeness of someone to trade off their brand - I think the former is going to be pervasive. For the latter, California just passed a bill trying to address this, and yeah, per usual smaller creatives likely won't fare well.
Yep. He carries weight. When looking for a 10GbE switch with SFP+/RJ45 interfaces I ran into this one on Amazon: https://www.amazon.com/dp/B0CR5S161P?ref=ppx_yo2ov_dt_b_fed_asin_title

Jeff left a review on it. I now have one in my network.
No, you're not, and in general for "voices", it's really an open question whether or not "voices" are "things" that people can own (and therefore be stolen). Actually, in general they aren't.
A "voice" isn't a specific sound. It's, in effect, a class of sounds - your "voice" can make an infinite amount of sounds. Do you own all those sounds? What about two people who sound alike? What if you intentionally make your voice sound like another voice? It is still a possible output of your "voice", after all. Is an Elvis impressionist "stealing" something from him? The "Elvis voice" is just possible outputs from two people's voicebox, why does one own it and one doesn't?
If you invent an instrument, do you have copyright over all sounds that instrument can make? Generally, no.
Voices traditionally have not been copyrightable for that reason.
The AI used part of his videos (copyright infringement) and is using his voice (right of publicity infringement), and his voice is unique enough for a fan to mistake the fake for him. Voices are not instruments; they're part of what makes you unique, and not something others can perfectly copy using their own voice.
Elvis was a good example. If you used Elvis songs to create something new, that's copyright infringement, as it uses copyrighted material. An impersonator isn't making a copy; they're using his likeness, which is a violation of the right of publicity, and that doesn't apply to musical instruments. The right of publicity means you cannot profit off someone's likeness without their permission, and it extends postmortem for 70 years. The Elvis Presley estate actively goes after impersonators who profit off Elvis without paying for a license.
There are some special cases, though. There was one piece of precedent where a company wanted a specific voice, the owner refused, and the company then hired a sound-alike. That company lost the case.
So basically, you can't show intent to use someone's specific voice and then work around their refusal later on.
That wasn't about ownership of the voice; it was impersonation that they lost the case over.
Basically, if you copy Taylor Swift's voice to try to make it seem like Taylor Swift is saying something (like endorsing you for president), that's not OK, or at least it can be argued as such.
If you copy Taylor Swift's voice to try to make a new hit single that is not associated with Taylor Swift - that is what it is, legally.
if you copy Taylor Swift's voice to try to make it seem like Taylor Swift is saying something (like endorsing you for president), that's not OK, or at least it can be argued as such.
I don't think the subject matter has much impact on whether or not it's considered impersonation. The case I'm referencing was simply over a Ford commercial singing about selling cars. Not as serious as a political endorsement, but enough to be considered impersonation and lose.
What this topic is concerned with does indeed seem to fall under impersonation. Especially since Geerling did in fact work with Elecrow at one point. That's where this gets murky.
If you copy Taylor Swift's voice to try to make a new hit single that is not associated with Taylor Swift - that is what it is, legally.
Thinking about it in a cynical way: You can copy TSwift's voice until you outright say "yeah, I wanted to use her voice but I couldn't". And then you're in deep legal trouble. That's what made the Johansson debacle so terrible from a legal perspective. Altman allegedly wanted to use her voice, but she turned the job down.
But honestly, the impersonation from Trump was bad enough that we probably should just clamp down hard on this.
Voices should just be protected the same way faces are. It's the audible equivalent. If you can't use someone's image in a certain way it makes sense to not be allowed to use their voice that way either.
Yeah. There's a big difference between "is not currently illegal" and "should not be legal".

I have watched Jeff's videos before and I thought it was pretty clear. That being said, I found out through Jeff's video, so I was primed to believe that, but still.
As usual, society (and all of society's frameworks, including legal, moral, and other general expectation frameworks) is lagging behind advancement.
I was at a convention recently. One of the panels was a video game voice actor panel. Those VGVAs are currently on strike, specifically over AI voice technology. Which is fine; go Union, labor strong, I mean that sincerely. But one of the VGVAs really rubbed me the wrong way.
Because she had been selected as their spokesperson over the strike issue for the panel. She said so, while the others at the table nodded and applauded her. So first thing, straight off the bat, she sits down and is like "we don't want to put the genie back in the bottle, but..." and proceeded to explain for the next five minutes how they wanted to do exactly that. Ban voice technology, ban any technology for a voice that isn't a human talking into a microphone.
So it's obviously a complicated issue. Insofar as there's no solution that doesn't result in damage. The question is, what kind of damage.
"Stealing" someone's exact voice seems pretty wrong to me. If you want Mark Meers or James Earl Jones (RIP), then hire that person. Don't try to hire them, find out they want more than you're willing to pay, and then fake their voice. Same for likeness. If you want Keanu Reeves or Julia Roberts, then hire Reeves or Roberts; don't digitally create them and expect them to just accept that.
However, if you're happy with a Jones-esque voice, that I have no problem with. Some deep voice with clear enunciation ... even after you tried and failed to hire Jones, I'm fine with that. My line is "did you create, from scratch in the technology, an artificial voice"? If you did, I'm fine with that. You could have hired a VA, but instead you used AI.
I'm sorry that people who've made a living being recorded talking into microphones are going to see some of their career prospects dry up. And I don't blame them, even a tiny bit, for being angry about it. But they want to literally ban revolutionary technology simply because they want to keep their place in the loop; taking money to do what they do. Even though we now have other ways to do what they do.
The real line in all this, in "AI", is the human factor. If all you want is a background "human" in the movie, right now you have to hire a person to stand there while the camera's rolling. If you have an ad or a game or anything really, and you just need a voice talking, you've had to hire a human to talk. Now, with these newfangled AI techs that are coming online, you have this other option. Give the AI the lines, it speaks them, and off you go.
The trick is, right now, the tech's new. You mostly get what you get when you feed a line into the AI vocalizer and it spits back the spoken version.
Can the AI technology be directed like a human can be directed? Right now, not so much. But, and here's the key that I see very few AI-detractors admit; AI is pretty much the worst it's ever gonna be right now. It'll improve. If creatives need the ability to tweak voices (or body language/etc for a whole human image), that'll get reflected in what the AI developers focus on, and it'll improve. At some point, the director will be able to say "more anger, go again" and get that from the AI, and so on.
So right now, if you want a fully human performance, you still need a human. Voice acting is a skill. There are good voice actors who can be evocative and enthralling, delivering a whole performance with voice alone. There are also voice actors who phone it in.
The phoners are going to be the first ones who lose out. The greatest voice actors are the most likely to continue to have a place in a world where producers for cartoons and games and any other voiceover projects have access to AI voice technology. The voice actors who can really act will bring value to their roles, and will stand out. It's just they won't have a dozen or more other lesser actors supporting them.
That'll have consequences. Right now we see it with writers. Since the 00s Hollywood productions have hired fewer and fewer writers. Some hire only the showrunner and expect that person to write everything. I've always thought it was the absolute stupidest place to cut corners, since paying the writers for a six or nine month stint staffing the writer's room is definitely not the most expensive, burdensome part of a production, but that's what most producers and studios and networks did. Cheap out on writers.
And one of the knock-on effects is it's become harder to find "superstar" writers to step into showrunner roles. Fewer writers working productions means they have fewer opportunities to get experience, make connections, and move up to become the next generation.
That'll happen with VAs. Inevitably. VAs don't move into showrunner roles, but they'll have fewer available roles to cut their teeth on. Which is unfortunate, but again, genies and bottles. We used to pay orchestras and bands to play live music; now we have technology that takes one performance and can play it to the entire world, on demand, as often as any one person in the world wants. Who wants to go back to no music unless that live band is following you around so you can hear your song?
Right, didn't think so.
Creative collaborative labor has always been about first impressions and networking. A struggling actor, voice or other, has a few minutes at an audition to make that impression. Maybe ten minutes at a mixer or party or similar to make an impression. That impression, if it's a good one, gets them hired for a creative role. Any era of Hollywood or really any creative industry concentration anywhere in the world, is filled with a staggering number of hopeful-creatives who dreamed of being an actor or writer or some kind of artist who didn't make the impression. And thus, didn't get hired, didn't become famous, didn't change the creative world.
Casting director is one of the unsung hard jobs in the creative industry. You have to bring in dozens and dozens of people to find someone for a single role, and usually you're "settling" for the best fit, rather than the rarity where you find the perfect, no-notes, oh-my-God-you-were-born-to-play-this match.
So does AI technology make it harder to be an actor, a voice actor, any creative? Yeah. It'll inevitably reduce all the mundane roles. Take them off the table, because the AI will be generating the voice or the background human. And it's a little sad, yes, because there are countless stories of jobber actors who spent sometimes decades just grinding out day player parts as "Man #3" or "Woman in the Shop" before they finally made that impression that elevated them so they could become known.
Creative work has never been about what's fair. It's about what audiences respond to. Just because you wrote it, or filmed it, or drew it, or painted it, or voiced it, or performed it; just because you did the thing doesn't mean it's a thing audiences will adore. Casting is not about "fair", it's about what the creatives in charge of the project think will best suit their project.
Lord of the Rings famously used CGI orcs for the siege in The Two Towers. Sure they had camera-ready actors dressed up in orc garb for close shots, but the scenes were sold by thousands upon thousands of "orcs" besieging Helm's Deep.
Not even a decade before, a project like Braveheart had to hire 1600 members of the Irish Army to populate battle scenes. Just because if you wanted a human in frame, you had to have an actual human to put in frame. And Braveheart (along with most any similar film) still had to use camera tricks to replicate those 1600 into many more for some of the largest sequences in the story.
We're fast approaching a creative place where you can just wave a tech wand and poof, a human's in frame. When Weta Digital created the Helm's Deep sequences, I don't recall the sky falling. Sure they had five hundred extras as orcs, but five hundred isn't thousands. Further, the only thing I remember hearing about those sequences was "amazing" and "awesome" and stuff like that. Audiences loved the result that went into the film, felt it sold the feeling of a tiny little holdout fortress being assaulted by an overwhelming number of orcs.
Which was the point. To make the audience feel, see, believe, exactly that.
We're not at the point where a fully digital film will spring forth from one creator sitting at a computer, manipulating a very mature version of these early stage AI technologies. Where one person can have the vision, write the script and story, tweak and adjust the performances, and polish it all into a final result that others look at and are like "wow, very nice." But it'll happen. The technology is the worst it'll ever be right now. But that's the thing about technology.

It advances.
But they want to literally ban revolutionary technology simply because they want to keep their place in the loop; taking money to do what they do. Even though we now have other ways to do what they do.
The ideal solution is they get residuals whenever their voice gets used or sampled, similar to the music industry. They benefit from past work by sitting around (or compound funds with other work), and businesses get significantly lower expenses (in theory) from synthesizing voices instead of going through the process of recruiting, directing, and editing VA sessions. Win/win.
But since businesses would rather die than consider residuals, I do tend to lean more on the side of heavily regulating the technology. Technology is made to make people's lives easier, and companies have made it clearer than ever that they simply want to steal from society. Rotten apples in a barrel.
If all you want is a background "human" in the movie, right now you have to hire a person to stand there while the camera's rolling. If you have an ad or a game or anything really, and you just need a voice talking, you've had to hire a human to talk. Now, with these newfangled AI techs that are coming online, you have this other option. Give the AI the lines, it speaks them, and off you go.
That's what makes it different from other tech that improved on its predecessor rather than depending on it: a car doesn't need horses to operate, and a computer didn't necessarily require a typewriter.
In this case, you can in fact generate a random human to stand there. They don't, because it's cheaper to pay a person $1000 (I'm high-balling it to sell the point) or something than to pay a small team of 5 artists maybe $20/hr to spend 20 hours (I'm low-balling it to sell the point) making a realistic-enough-looking person. Likewise, TTS voice samples were already paid for. Technology scales, so that sort of CG work doesn't make sense until you get to the Two Towers level of scaling. Even then, it's still some team getting paid (probably not enough, but still contributing to the economy).
AI, not so much. Some tech company scraped media from a time when no one could have consented to scraping (heck, some companies already don't respect robots.txt), then sold it as a package to some company that wanted to go around the human element despite clearly needing it. You can argue the AI company is making money, but I also lived through the 2010s and know exactly how the playbook is going to play out:
- Run super cheap AI tooling, at non-sustainable levels, to box out competition
- Capture the market and make them rely on it
- Give the tools fewer features and paywall previously included features
- Make the money later on with little competition to oppose you
- Likely, lay off 90% of the workers who helped make your product that monopoly, keeping a core crew and cheaper labor to maintain it
No one wins here except some corporate executives in the end. I'd rather not sit by again and let them enshittify yet another piece of technology. Maybe I can't do anything, but I won't just quietly sit by this time.
Who wants to go back to no music unless that live band is following you around so you can hear your song?
This is also different. Portable music did not in fact replace the singer; it became a commodity that built their brand and got people excited to see the real thing live. Concerts for live music sell out so fast these days that ticket scalping is not just commonplace, but expected. And that got bad enough that the law is looking into Ticketmaster's monopoly as we speak. The real shame is how artists make almost no money from that concert, but that's another rabbit hole for another day.
Creative work has never been about what's fair. It's about what audiences respond to.
You can't control how audiences respond, but you can control how that labor is paid. It's tough for indies (always has been), but those aren't the people utilizing AI on the scale the large companies are. People failing to get in may honestly be better than getting in and having the life choked out of you because no one wants to "be fair" to their labor.
If that affects consumers, that's unfortunate. But that's the cost of business. And creatives who do qualify also need to pay rent and survive.
This is also different. Portable music did not in fact replace the singer
This is revisionism. Music recordings did, in fact, replace in-person music, for a sufficient definition of "replace". Just the same way that photography replaced portraits, and automobiles replaced horse-drawn carriages, for a sufficient definition of "replace". In each case, the new technology increased the availability of the underlying service, but it also obviated the old technology. So it will likely be with anything that can be generated with AI---that is to say, eventually, anything. Human-created art is being replaced.
Music recordings did, in fact, replace in-person music, for a sufficient definition of "replace"
We're clearly talking on different frequencies, so "revisionism" is an unnecessary swipe.
Sure, in the same way TV "replaced" the movie theatre, you can argue that "portable music replaced concerts". That can be a sufficient definition. But I was talking in a financial sense, not a holistic one. I didn't want to be dramatic and say "killed".
In each case, the new technology increased the availability of the underlying service, but it also obviated the old technology.
Evidence seems to argue against the notion that concerts are declining in the 21st century:
It took a worldwide pandemic to do more damage than portable music probably ever will. And even then, concerts recovered and soared within 2 years (other venues like theatres cannot claim the same). So I would not colloquially say that "portable music replaced concerts".
But sure, I wouldn't use concerts to prove that AI won't replace human anything. It does give a nugget of suggestion that enough people care about the human element to still make some sectors profitable though.
Apologies for my harshness. This is an emotional topic for me, basically because I feel like the vast majority of people are living in (partially-willful) denial about an unpleasant reality.
But talking solely about concerts arguably demonstrates my point. Before the existence of recorded music, the majority of live music wouldn't have taken the form of a concert. I'm thinking of things like amateurs playing live music at parties, or a saloon piano player, or street performers. Sure, those things still exist, but they're shadows of what they used to be. Recorded music essentially killed all of those forms.
I'm thinking of things like amateurs playing live music at parties, or a saloon piano player, or street performers. Sure, those things still exist, but they're shadows of what they used to be.
Well, radio has been around for decades now, so it's hard to say from first-person experience how big an impact it had. But I feel the idea of live music shifted rather than became obsolete.
There are still live music nights at some cafes, even in my small suburb. But it's not expected; it's usually something that marks either a higher-end cafe, or one that simply wants to cultivate a certain vibe. Likewise, you may not have a musician for a normal birthday party, but for a sweet 16 or Quinceañera you would consider some live music.
Live music for those venues was devalued and came to be considered a luxury at the same time, so it's an interesting paradox. It will be harder for a new musician to find those gigs, but it's usually a great sign if you can get booked, as opposed to it being an expected thing. That's one reason concerts still sell out to this day. That human element is still valuable, even if the competition has become much fiercer.
Yeah, I'm honestly getting really frustrated about all the people saying "forget these people, this is the future at stake!" People suffer real harm when their voice is stolen.
You have to pay money to use Hatsune Miku. You can buy the software behind the voice but still not be able to use Miku's voice because that voice belongs to someone who gets paid when people buy the rights to use that voice. Does Saki Fujita not deserve to be paid for a voice that millions of people have listened to and enjoyed?
If we have a copyright system that protects giant corporations and not individuals, then the copyright system should be torn down and replaced with something that does.
Mostly just raising awareness for the software. And you are right, it wasn't exactly relevant to your comment, but it is relevant to the larger discussion.
Damage can be done to people from merely approximating their voice. But to the larger "artificial voices replacing real people's jobs" discussion, I think I've provided a good example of Hatsune Miku doing just fine despite free alternatives.
As usual, society (and all of society's frameworks, including legal, moral, and other general expectation frameworks) is lagging behind advancement.
I do not think it is necessary or inevitable that society adapts its moral and legal frameworks to permit whatever technological changes money is able to produce and push. We've done that too many times already. We can just turn on it this time. Looking to the future you point at - is an infinity of creative work, produced with the minimum price possible paid to people to make it but sold to the maximum number of people, actually good? Or is it happening anyway and "good" must simply change to encompass it when it comes?
I like creative works, so that seems like a good thing to me. If I can't distinguish between a generative AI work and a human-made work, it makes zero difference to me. The ability to generate whatever art I desire, as long as I can describe it well enough, is valuable to me as someone who enjoys art.
Of course, it makes a world of difference to creatives that want to profit off their work, but it doesn't harm their ability to create. I don't buy the philosophical arguments that art only derives value from human creative work because natural vistas can be just as beautiful.
I’m with you. I feel art, and other aspects of life need to be de-pretensioned (if that’s a word).
Many people are fine with a “knock-off” until they find out it’s a knock-off. I accept that the phenomenon and the feelings are real, but that doesn’t make it any less absurd.
It harkens back to things like sommeliers not being able to tell red wine from red-tinted white wine in double-blind tests, or the knock-off oil painting capital of the world, Dafen Oil Painting Village, or knock-off Eames chairs that still sell for thousands of dollars. People are weird and love the stories we tell ourselves to give art and our lives value.
I'm sympathetic to the philosophical argument but I'm not making that argument, I'm making a practical argument. For one thing, it seems like for these machines to create better and better art - more expressive, more skilled, more natural to the human eye, closer to the request - they require more and more art made by actual humans to train on. That's labour which should be fairly compensated.
Second and more broadly, I'm not saying that we might prefer creative works made by humans because the works made by AI are empty and soulless, I'm saying we might prefer it because there are humans who can do it and want to be paid to do it. I'm very, very sceptical that the human labour freed up from market demands for art will be employed for equal or comparable pay to that given for the artwork. Artists have a variety of knowledge and skills which they'd probably prefer to use to get paid, than learning to code or whatever. In this future world where AI makes all art, no person could get paid for making art even if they wanted to and were good enough. Of course they could make art as a hobby and for a lot of people, that's enough - a whole lot of artists are already in this position, whatever their skillset, where they have to make art in the time they aren't working. But we'd lose all the artists who are, through the ability to sell their art, able to make art their full-time vocation. I think it is good that artists can get paid to make art.
(I wonder what would happen to art residencies and grants and fellowships and other funds, that allow artists to exercise their creative skill without having to worry about commerce at all? Because I think that's better, and I'm even more sceptical it'd continue to exist when AI can generate art of the highest calibre.)
Finally, of course, these art-making machines will be vastly cheaper to operate than paying people for their labour. Most of those savings are going to accrue to the companies that own them, and most of the AI models will be locked up by those companies so that you still have to pay to generate the images. For all the elimination of scarcity of art, it still won't be anything close to free. Because why would it? The goal isn't to usher in an age where anyone can realise their imagination with the help of a computer, it's to make money by fulfilling a demand (in this case, for art) for the highest price with the lowest costs possible. And I'm really dubious how that's any better than the state we're in now where I have to pay a person to make me art I couldn't make myself.
I will say though my actual point wasn't as far as "this is the view we as a society must take about AI producing creative works", it was just that the fact that AI can, or will be able to, replace artists doesn't make it inevitable. We should think about the societal effects it will have, how we want to address them, or if we want to reject them. We shouldn't just let it happen like it's a force of nature, because it's not. To take a quote from the original comment-
This feels like a very capitalist mindset where you're concerned about people being able to profit off what they like doing. That's a valid concern, but on a societal scale it would be better to free up resources and make things cheaper and easier to produce. Solar panels put coal miners out of the job, and the world is better for it because cheap electricity makes everything more affordable.
Living standards increase as our productivity increases. Based on what I currently know, generative models are completely incapable of handling a full end-to-end film production on their own. There are still humans involved in every step of the process. If anything, my artist friends say the ability to rapidly generate concepts and references has improved their workflow.
Some of the demand that AI tools are meeting was also going unmet before. I'm not a good artist, so I can't quickly make a photorealistic cowboy wizard. With the tools we have now, I can generate some okay concept images that spark my imagination. I wouldn't have paid a freelancer to draw what I want, so no-one lost any work. From what I can tell, furry artists still get plenty of commissions too. If anything, some people value human made art even more now. Handmade things can develop a true luxury market when the cheap stuff becomes widely accessible.
Can the AI technology be directed like a human can be directed? Right now, not so much. But [AI will] improve. If creatives need the ability to tweak voices (or body language/etc for a whole human image), [...] At some point, the director will be able to say "more anger, go again" and get that from the AI, and so on.
So right now, if you want a fully human performance, you still need a human.
Just want to spread some awareness on this point. What you're talking about is already here.
I have used ElevenLabs (mentioned by Jeff Geerling in his followup post). They have a Speech to Speech feature. You provide voice input, and the AI outputs the same performance in a different voice, keeping all the intonation, emotion, pace, etc. So, with respect to what I quoted above, you can hire a single (or very, very few) high-calibre voice actor(s), then have them voice all the lines of the N characters in your creative work. A middle-aged male voice actor provides the input, and out comes, say, the voice of the 10-year-old daughter of the main character. Differences between the input voice and output voice don't really matter, whether difference in age, timbre, pitch, or accent.
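For anyone curious what that workflow looks like in code, here's a rough sketch using Python's requests library. The endpoint path, header, field names, and model name are written from memory of the ElevenLabs docs and may be out of date, so treat them as assumptions and check the current API reference; the file names and key are obviously placeholders.

    # Sketch: convert one actor's recorded take into another (licensed) voice.
    import requests

    API_KEY = "your-api-key"          # placeholder
    VOICE_ID = "target-voice-id"      # a voice you have the rights to use

    with open("actor_take.wav", "rb") as f:   # the human actor's performance
        resp = requests.post(
            f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}",
            headers={"xi-api-key": API_KEY},
            data={"model_id": "eleven_multilingual_sts_v2"},   # assumed model name
            files={"audio": f},
        )
    resp.raise_for_status()

    with open("converted_take.mp3", "wb") as out:
        out.write(resp.content)   # same pacing and emotion, different voice

The point being: the direction happens in the human performance, and the tech just re-voices it.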
I’ve been saying for a little while that one of the major things still being figured out in ML-based tech is the UI, and that’s a great example of it starting to get solved!
Right now a lot of platforms are still in “text to X” mode, where X could be speech, image, other text, whatever. Problem is, I’d be hard pressed to accurately direct a human through text alone to capture the nuance of what I wanted, let alone a machine. Visual work needs the ability to sketch ideas or highlight errors or just physically point at the bit to change, motion needs storyboards and camera control and the ability to adjust in 3D space, and voice needs the ability to audibly say “no, try it like this…”.
It’s partly a technical barrier in the sense that the non-text input needs to be understood by the generative model, but that’s one where the research papers and open source demos have been pretty consistently ahead of the commercial products. The next step is getting those technical capabilities into actual usable products rather than proof of concept code and opaque Gradio wrappers - and I think we’re in for another wave of progress in the field as people find ways to interact with these models that reduce the significant number of failures coming from inadequate communication with the underlying models.
Disregarding the fact that the CEO apparently admitting they did it, is his voice really so distinct? I don’t know who Jeff Geerling is, and this is the first time I have ever watched one of his videos, but there was nothing in his voice or intonation that made it sound particularly unique to me.
I’m not trying to play down the morals of this topic, but I’m interested in knowing if I was the only one who had these thoughts.
Generally, each voice is thought to be as unique as fingerprints. At least thus far, all voices tested have been different, just as all fingerprints tested have been different.
Whether or not those differences are easily aurally notable is a different question, but I think that generally is more about the listener than the speaker. Some people are good at recognition, and some people are not both for faces and voices. If lots of voices sound the same, one might be experiencing something like Phonagnosia, which is "a disturbance in the recognition of familiar voices and the impairment of voice discrimination abilities in which the affected individual does not suffer from comprehension deficits".
Personally, I don't have a hard time distinguishing voices. This guy's voice sounds unique to me, so I think that someone stealing his vocal footprint is pretty bad, especially since they admitted that they just did it for traction. It's reminiscent of Altman stealing Scarlet Johansson's voice, and I would hope that we will see some precedent set that would prevent it in the future.
Do you know enough about transformation between analog to digital to tell me if there are any breadcrumbs or other signs left behind to know if this process has happened ?
What I’m wondering is: at some point someone has to record from an analog to digital setting (microphone to drive (?)). This process must look distinctly different, in the data, than something created “100% digitally” with no analog component. Or am I misunderstanding some critical component of how audio is captured and created ?
The voice generation here is almost certainly a machine learning model, trained on other recordings of voices. If there are any artifacts from the process of digitally recording voices, they’ll be replicated along with the actual signal.
There may be artifacts that the machine learning model produces, but these are unintentional and will be fixed by a better model.
I was asking a question about the voice acquisition process in general, not this specific instance.
Do you know if it is possible to know if something was created “analogly” vs “digitally” only ?
I suppose I’m asking a dumb question because any recording of a voice(regardless of source) is inherently digital ?
The problem is what the machine learning model is programmed to try and replicate. It isn’t trying to replicate the general concept of voice, it’s replicating specific voice samples it was trained on. If there exists some marker like you suppose, the ML model will happily replicate those exact markers.
Map makers used to add things called “trap streets”. They were streets or other map markings that didn’t exist in the real world. If another map maker did their own research, their maps would have those trap streets because they didn’t actually exist. But if other map makers simply copied the first map maker, they would have the trap streets. That way the first map maker would know the second map maker copied them.
I want to make an analogy here, but it doesn’t quite work perfectly, so stick with it. Let’s say all human made maps had a particular trap street. If you generate a map using satellite images, that trap street would not show up because it doesn’t exist. Therefore if that trap street exists, you know the map was created by a human map maker, not a program from satellite images. Now someone creates an AI to build maps. All of the training data is from the human maps, not the satellite maps. So all of its training data includes this trap street. The AI will happily output maps that include that trap street. Therefore if a map has that trap street, you can no longer be sure that it was created by a human.
In that metaphor, the human trap street is your supposition that something exists in analog to digital recordings that a pure digital recording can’t produce. That may very well be true. But the AI models aren’t producing a pure digital output. They are producing an output that emulates the input. If the inputs have that something, then the outputs will too.
I did have similar thoughts. I had never heard of the guy or the company that stole his voice. It is clear from the article that in that little subculture, his name and voice mean something though, and that the company sought to profit from using his voice without his permission.
I think the smallness of this is actually more disturbing to me. Normal people just don't have the resources to fight this sort of thing, and lack the notoriety to make others care. So the thieves will get away with it until it is just commonplace and one more bit of our person over which we no longer have authority.
(Edit: removed apostrophe)
I expect it'll be trivial to create blends of two+ voices instead of just copying one. Considering two different kinds of intentions - cheap commercial content, and leveraging the likeness of someone to trade off their brand - I think the former is going to be pervasive. For the latter, California just passed a bill trying to address this, and yeah, per usual smaller creatives likely won't fare well.
Yep. He carries weight. When looking for a 10GbE switch with SFP+/RJ45 interfaces I ran into this one on Amazon: https://www.amazon.com/dp/B0CR5S161P?ref=ppx_yo2ov_dt_b_fed_asin_title
Jeff left a review on it. I now have one in my network.
No, you're not, and in general for "voices", it's really an open question whether or not "voices" are "things" that people can own (and therefore be stolen). Actually, in general they aren't.
A "voice" isn't a specific sound. It's, in effect, a class of sounds - your "voice" can make an infinite amount of sounds. Do you own all those sounds? What about two people who sound alike? What if you intentionally make your voice sound like another voice? It is still a possible output of your "voice", after all. Is an Elvis impressionist "stealing" something from him? The "Elvis voice" is just possible outputs from two people's voicebox, why does one own it and one doesn't?
If you invent an instrument, do you have copyright over all sounds that instrument can make? Generally, no.
Voices traditionally have not been copyrightable for that reason.
The AI used part of his videos (copyright infringement) and is using his voice (right to publicity infringement) and his voice is unique enough for a fan to mistake the fake for him. Voices are not instruments, they're part of what makes you unique and not possible to perfectly copy by others using their own voice.
Elvis was a good example. If you used Elvis songs to create something new, that's a copyright infringement as it uses copyrighted material. An impersonator isn't a copy, they use his likeness which is a violation of Right to Publicity which doesn't apply to musical instruments. Right to Publicity means you cannot profit off someone's likeness without their permission and it extends postmortem for 70 years. The Elvis Presley estate actively goes after impersonators who profit off Elvis without paying them for a license.
There are some special cases, though. There was one piece of precedent where a company wanted a specific voice, they refused, then hired a sound alike. That company lost the case.
So basically, you can't show intent to want to use someone's voice but work around it later on.
That wasn't ownership of the voice, but impersonation that they lost the case for.
Basically, if you copy Taylor Swift's voice to try to make it seem like Taylor Swift is saying something (like endorsing you for president), that's not OK, or at least it can be argued as such.
If you copy Taylor Swift's voice to try to make a new hit single that is not associated with Taylor Swift - is what it is, legally.
I don't think the subject matter has much impact on whether or not it's considered impersonation. The case I'm referencing was simply over Ford singing about selling cars. Not as serious as a political endorsement, but enough to be considered impersonation and lose.
What this topic's concern over does indeed seem to fall under impersonation. Especially since Geerling did in fact work with Elecrow at one point. That's where this gets murky.
Thinking about it in a cynical way: You can copy TSwift's voice until you outright say "yeah I wanted to use her voice but I couldn't". And then you're in deep legal troubles. That's what made the Johannsen debacle so terrible from a legal perspective. Altman allegedly wanted to use her voice but she denied the job.
But honestly, the impersonation from Trump was bad enough that we probably should just clamp down hard on this.
Voices should just be protected the same way faces are. It's the audible equivalent. If you can't use someone's image in a certain way it makes sense to not be allowed to use their voice that way either.
Yeah. There's a big difference between "is not currently illegal" and "should not be legal".
I have watched Jeffs videos before and I thought it was pretty clear. That being said, I found out through Jeffs video, so I was primed to believe that, but still.
As usual, society (and all of society's frameworks, including legal, moral, and other general expectation frameworks) are lagging behind advancement.
I was at a convention recently. One of the panels was a video game voice actor panel. Those VGVAs are currently on strike, specifically over AI voice technology. Which is fine; go Union, labor strong, I mean that sincerely. But one of the VGAVAs really rubbed me the wrong way.
Because she had been selected as their spokesperson over the strike issue for the panel. She said so, while the others at the table nodded and applauded her. So first thing, straight off the bat, she sits down and is like "we don't want to put the genie back in the bottle, but..." and proceeded to explain for the next five minutes how they wanted to do exactly that. Ban voice technology, ban any technology for a voice that isn't a human talking into a microphone.
So it's obviously a complicated issue. Insofar as there's no solution that doesn't result in damage. The question is, what kind of damage.
"Stealing" someone's exact voice seems pretty wrong to me. If you want Mark Meers or James Earl Jones (RIP), then hire that person. Don't try to hire them, find out they want more than you're willing to pay, and then fake their voice. Same for likeness. If you want Keanu Reeves or Julia Roberts, then hire Reeves or Roberts; don't digitally create them and expect them to just accept that.
However, if you're happy with a Jones-esque voice, that I have no problem with. Some deep voice with clear annunciation ... even after you tried and failed to hire Jones, I'm fine with that. My line is "did you create, from scratch in the technology, an artificial voice"? If you did, I'm fine with that. You could have hired a VA, but instead you used AI.
I'm sorry that people who've made a living talking into microphones being recorded are going to see some of their career prospects dry up. And I don't blame them, even a tiny bit, for being angry about it. But they want to literally ban revolutionary technology simply because they want to keep their place in the loop; taking money to do what they do. Even though we now have other ways to do what they do.
The real line in all this, in "AI", is the human factor. If all you want is a background "human" in the movie, right now you have to hire a person to stand there while the camera's rolling. If you have an ad or a game or anything really, and you just need a voice talking, you've had to hire a human to talk. Now, with these newfangled AI techs that are coming online, you have this other option. Give the AI the lines, it speaks them, and off you go.
The trick is, right now, the tech's new. You mostly get what you get when you feed a line into the AI vocalizer and it spits back the spoken version.
Can the AI technology be directed like a human can be directed? Right now, not so much. But, and here's the key that I see very few AI-detractors admit; AI is pretty much the worst it's ever gonna be right now. It'll improve. If creatives need the ability to tweak voices (or body language/etc for a whole human image), that'll get reflected in what the AI developers focus on, and it'll improve. At some point, the director will be able to say "more anger, go again" and get that from the AI, and so on.
So right now, if you want a fully human performance, you still need a human. Voice acting is a skill. There are good voice actors who can be evocative and enthralling, delivering a whole performance with voice alone. There are also voice actors who phone it in.
The phoners are going to be the first ones who lose out. The greatest voice actors are the most likely to continue to have a place in a world where producers for cartoons and games and any other voiceover projects have access to AI voice technology. The voice actors who can really act will bring value to their roles, and will stand out. It's just they won't have a dozen or more other lesser actors supporting them.
That'll have consequences. Right now we see it with writers. Since the 00s Hollywood productions have hired fewer and fewer writers. Some hire only the showrunner and expect that person to write everything. I've always thought it was the absolute stupidest place to cut corners, since paying the writers for a six or nine month stint staffing the writer's room is definitely not the most expensive, burdensome part of a production, but that's what most producers and studios and networks did. Cheap out on writers.
And one of the knock-on effects is it's become harder to find "superstar" writers to step into showrunner roles. Fewer writers working productions means they have fewer opportunities to get experience, make connections, and move up with both to become the next generation.
That'll happen with VAs. Inevitably. VAs don't move into showrunner roles, but they'll have fewer available roles to cut their teeth on. Which is unfortunate, but again, genies and bottles. Orchestras and bands used to pay live music, now we have technology that takes one performance and can play it to the entire world, on demand, as often as any one person in the world wants. Who wants to go back to no music unless that live band is following you around so you can hear your song?
Right, didn't think so.
Creative collaborative labor has always been about first impressions and networking. A struggling actor, voice or other, has a few minutes at an audition to make that impression. Maybe ten minutes at a mixer or party or similar to make an impression. That impression, if it's a good one, gets them hired for a creative role. Any era of Hollywood or really any creative industry concentration anywhere in the world, is filled with a staggering number of hopeful-creatives who dreamed of being an actor or writer or some kind of artist who didn't make the impression. And thus, didn't get hired, didn't become famous, didn't change the creative world.
Casting director is one of the unsung hard jobs in the creative industry. You have to bring in dozens and dozens of people to find someone for a single role, and usually you're "settling" for the best fit, rather than the rarity where you find the perfect no notes oh-my-God you're born to play this.
So does AI technology make it harder to be an actor, a voice actor, any creative? Yeah. It'll inevitably reduce all the mundane roles. Take them off the table, because the AI will be generating the voice or the background human. And it's a little sad, yes, because there are countless stories of jobber actors who spent sometimes decades just grinding out day-player parts as "Man #3" or "Woman in the Shop" before they finally made that impression that elevated them so they could become known.
Creative work has never been about what's fair. It's about what audiences respond to. Just because you wrote it, or filmed it, or drew it, or painted it, or voiced it, or performed it; just because you did the thing doesn't mean it's a thing audiences will adore. Casting is not about "fair", it's about what the creatives in charge of the project think will best suit their project.
The Lord of the Rings famously used CGI orcs for the siege in The Two Towers. Sure, they had camera-ready actors dressed up in orc garb for close shots, but the scenes were sold by thousands upon thousands of "orcs" besieging Helm's Deep.
Not even a decade before, a project like Braveheart had to hire 1600 members of the Irish Army to populate battle scenes, simply because if you wanted a human in frame, you had to have an actual human to put in frame. And Braveheart (along with most any similar film) still had to use camera tricks to replicate those 1600 into many more for some of the largest sequences in the story.
We're fast approaching a creative place where you can just wave a tech wand and poof, a human's in frame. When Weta Digital created the Helm's Deep sequences, I don't recall the sky falling. Sure, they had five hundred extras as orcs, but five hundred isn't thousands. Further, the only thing I remember hearing about those sequences was "amazing" and "awesome" and stuff like that. Audiences loved the result that went into the film and felt it sold the feeling of a tiny little holdout fortress being assaulted by an overwhelming number of orcs.
Which was the point. To make the audience feel, see, believe, exactly that.
We're not at the point where a fully digital film will spring forth from one creator sitting at a computer, manipulating a very mature version of these early stage AI technologies. Where one person can have the vision, write the script and story, tweak and adjust the performances, and polish it all into a final result that others look at and are like "wow, very nice." But it'll happen. The technology is the worst it'll ever be right now. But that's the thing about technology.
It advances.
The ideal solution is that they get residuals whenever their voice gets used or sampled, similar to the music industry. They benefit from past work while sitting around (or compound those earnings with other work), and businesses get significantly lower expenses (in theory) from synthesizing voices instead of going through the process of recruiting, directing, and editing VA sessions. Win/win.
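To make that concrete, here's a minimal sketch of what a per-use residual ledger could look like. The performer name, rate, and usage count are made-up illustrations, not a real payment scheme.

    # Minimal sketch of a per-use residual ledger for a synthesized voice.
    # Every name, rate, and usage count here is a hypothetical illustration.
    from dataclasses import dataclass

    @dataclass
    class VoiceLicense:
        performer: str
        rate_per_use: float  # residual owed each time the synthetic voice is used

    def residuals_owed(lic: VoiceLicense, uses: int) -> float:
        """Total residual owed to the performer for a billing period."""
        return lic.rate_per_use * uses

    lic = VoiceLicense(performer="Example Performer", rate_per_use=0.02)
    print(residuals_owed(lic, uses=150_000))  # 3000.0 owed this period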
But since businesses would rather die than consider residuals, I do tend to lean more on the side of heavily regulating the technology. Technology is made to make people's lives easier, and companies have made it clearer than ever that they simply want to steal from society. Rotten apples in a barrel.
That's what makes it different from other tech that sought to improve instead of iterate: a car doesn't need horses to operate, and a computer didn't require a typewriter.
In this case, you can in fact generate a random human to stand there. They don't, because it's cheaper to pay a person $1000 (I'm high-balling it to sell the point) or something than to pay a small team of 5 artists maybe $20/hr to spend 20 hours (I'm low-balling it to sell the point) making a realistic-enough-looking person. Likewise, TTS voice samples were already paid for. Technology scales, so that sort of CG work doesn't make sense until you get to the Two Towers level of scaling. Even then, it's still some team getting paid (probably not enough, but still contributing to the economy).
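Spelling out those back-of-the-envelope numbers (both figures are my deliberately skewed guesses from above, not real production costs, and I'm assuming the 20 hours is per artist):

    # Back-of-the-envelope comparison from the paragraph above; both figures
    # are deliberately skewed guesses, not real production costs.
    extra_fee = 1_000                     # "high-balled" cost of one human background extra
    artists, hourly_rate, hours_each = 5, 20, 20
    cg_team_cost = artists * hourly_rate * hours_each  # 5 * $20/hr * 20 hrs = $2000
    print(extra_fee, cg_team_cost)        # 1000 2000 -> hiring the extra is cheaper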
AI? Not so much. Some tech company scraped media from an era that could not consent to scraping (heck, some scrapers don't even respect robots.txt), then sold it as a package to some company that wanted to go around the human element despite clearly needing it. You can argue the AI company is making money, but I also lived through the 2010s and know exactly how the playbook is going to play out.
No one wins here except some corporate executives in the end. I'd rather not sit by again and let them enshittify yet another piece of technology. Maybe I can't do anything, but I won't just quietly sit by this time.
This is also different: portable music did not in fact replace the singer; it became a commodity. It built artists' brands and got people excited to see the real thing live. Concerts for live music sell out so fast these days that ticket scalping is not just commonplace but expected. And that got bad enough that the law is looking into Ticketmaster's monopoly as we speak. The real shame is how artists make almost no money from those concerts, but that's another rabbit hole for another day.
You can't control how audiences respond, but you can control how that labor is paid. It's tough for indies (it always has been), but they aren't the ones using AI at the scale the large companies are. People failing to get in may honestly be better off than getting in and having the life choked out of them because no one wants to "be fair" to their labor.
If that affects consumers, that's unfortunate, but that's the cost of doing business. Creatives who do qualify also need to pay rent and survive.
This is revisionism. Music recordings did, in fact, replace in-person music, for a sufficient definition of "replace". Just the same way that photography replaced portraits, and automobiles replaced horse-drawn carriages, for a sufficient definition of "replace". In each case, the new technology increased the availability of the underlying service, but it also obviated the old technology. So it will likely be with anything that can be generated with AI---that is to say, eventually, anything. Human-created art is being replaced.
We're clearly talking on different frequencies, so "revisionism" is an unnecessary swipe.
Sure, in the same way TV "replaced" the movie theatre, you can argue that "portable music replaced concerts." That can be a sufficient definition. But I was talking in a financial sense, not a holistic one. I didn't want to be dramatic and say "killed."
The evidence seems to run against the notion that concerts are declining in the 21st century:
https://www.statista.com/statistics/193710/concert-revenue-of-live-nation-entertainment-since-2008/
It took a worldwide pandemic to do more damage than portable music probably ever will, and even then the business recovered and soared within two years (other venues like theatres cannot claim the same). So I would not colloquially say that "portable music replaced concerts."
But sure, I wouldn't use concerts to prove that AI won't replace humans at anything. It does offer a nugget of evidence that enough people care about the human element to keep some sectors profitable, though.
Apologies for my harshness. This is an emotional topic for me, basically because I feel like the vast majority of people are living in (partially-willful) denial about an unpleasant reality.
But talking solely about concerts arguably demonstrates my point. Before the existence of recorded music, the majority of live music wouldn't have taken the form of a concert. I'm thinking of things like amateurs playing live music at parties, or a saloon piano player, or street performers. Sure, those things still exist, but they're shadows of what they used to be. Recorded music essentially killed all of those forms.
Well, radio has been with us for decades, so it's hard to say from first-person experience how big an impact it had. But my sense is that the idea of live music shifted rather than became obsolete.
There are still live music nights at some cafes, even in my small suburb. But it's not expected; it's usually something that marks either a higher-end cafe, or one that simply wants a certain vibe. Likewise, you may not have a musician for a normal birthday party, but for a sweet 16 or Quinceañera you would consider some live music.
Live music at those venues was devalued and came to be considered a luxury at the same time, which is an interesting paradox. It will be harder for a new musician to find those gigs, but getting booked is usually a great sign, as opposed to being an expected thing. That's one reason concerts still sell out to this day. That human element is still valued, even if the competition has become much fiercer.
Yeah, I'm honestly getting really frustrated with all the people saying "forget these people, this is the future at stake!" People suffer real harm when their voice is stolen.
You have to pay money to use Hatsune Miku. You can buy the software behind the voice but still not be able to use Miku's voice because that voice belongs to someone who gets paid when people buy the rights to use that voice. Does Saki Fujita not deserve to be paid for a voice that millions of people have listened to and enjoyed?
If we have a copyright system that protects giant corporations and not individuals, then the copyright system should be torn down and replaced with something that does.
Enter OpenUTAU and free commercial use licensed voicebanks.
I don’t see how that is relevant. I am talking about a specific person’s voice.
Mostly just raising awareness for the software. And you are right, it wasn't exactly relevant to your comment, but it is relevant to the larger discussion.
Damage can be done to people from merely approximating their voice. But to the larger "artificial voices replacing real people's jobs" discussion, I think I've provided a good example of Hatsune Miku doing just fine despite free alternatives.
I do not think it is necessary or inevitable that society adapt its moral and legal frameworks to permit whatever technological changes money is able to produce and push. We've done that too many times already. We can turn against it this time. Looking to the future you point at: is an infinity of creative work, produced with the minimum possible price paid to the people who make it but sold to the maximum number of people, actually good? Or is it happening anyway, and "good" must simply change to encompass it when it comes?
I like creative works, so that seems like a good thing to me. If I can't distinguish between a generative-AI work and a human-made work, it makes zero difference to me. The ability to generate whatever art I desire, as long as I can describe it well enough, is valuable to me as someone who enjoys art.
Of course, it makes a world of difference to creatives who want to profit off their work, but it doesn't harm their ability to create. I don't buy the philosophical argument that art only derives value from human creative work, because natural vistas can be just as beautiful.
I’m with you. I feel art, and other aspects of life need to be de-pretensioned (if that’s a word).
Many people are fine with a "knock-off" until they find out it's a knock-off. I accept that the phenomenon and the feelings are real, but that doesn't make it any less absurd.
It harkens back to things like sommeliers not being able to tell red wine from white wine tinted red in double-blind tests, or the knock-off oil painting capital of the world, Dafen Oil Painting Village, or knock-off Eames chairs that still sell for thousands of dollars. People are weird, and we love the stories we tell ourselves to give art and our lives value.
I'm sympathetic to the philosophical argument, but I'm not making that argument; I'm making a practical one. For one thing, it seems like getting these machines to create better and better art - more expressive, more skilled, more natural to the human eye, closer to the request - requires more and more art made by actual humans to train them. That's labour which should be fairly compensated.
Second and more broadly, I'm not saying that we might prefer creative works made by humans because the works made by AI are empty and soulless; I'm saying we might prefer them because there are humans who can do it and want to be paid to do it. I'm very, very sceptical that the human labour freed up from market demands for art will be employed for equal or comparable pay to what was given for the artwork. Artists have a variety of knowledge and skills which they'd probably prefer to get paid for, rather than learning to code or whatever. In this future world where AI makes all art, no person could get paid for making art even if they wanted to and were good enough. Of course they could make art as a hobby, and for a lot of people that's enough - a whole lot of artists are already in this position, whatever their skillset, where they have to make art in the time they aren't working. But we'd lose all the artists who are, through the ability to sell their art, able to make art their full-time vocation. I think it is good that artists can get paid to make art.
(I wonder what would happen to art residencies and grants and fellowships and other funds, that allow artists to exercise their creative skill without having to worry about commerce at all? Because I think that's better, and I'm even more sceptical it'd continue to exist when AI can generate art of the highest calibre.)
Finally, of course, these art-making machines will be vastly cheaper to operate than paying people for their labour. Most of those savings are going to accrue to the companies that own them, and most of the AI models will be locked up by those companies so that you still have to pay to generate the images. For all the elimination of scarcity of art, it still won't be anything close to free. Because why would it be? The goal isn't to usher in an age where anyone can realise their imagination with the help of a computer; it's to make money by fulfilling a demand (in this case, for art) at the highest price with the lowest costs possible. And I'm really dubious how that's any better than the state we're in now, where I have to pay a person to make me art I couldn't make myself.
I will say, though, my actual point wasn't as strong as "this is the view we as a society must take about AI producing creative works"; it was just that the fact that AI can, or will be able to, replace artists doesn't make it inevitable. We should think about the societal effects it will have, how we want to address them, or whether we want to reject them. We shouldn't just let it happen like it's a force of nature, because it's not. To take a quote from the original comment:
Well, maybe for once we should see that it is.
This feels like a very capitalist mindset, where you're concerned about people being able to profit off what they like doing. That's a valid concern, but on a societal scale it would be better to free up resources and make things cheaper and easier to produce. Solar panels put coal miners out of a job, and the world is better for it because cheap electricity makes everything more affordable.
Living standards increase as our productivity increases. Based on what I currently know, generative models are completely incapable of handling a full end-to-end film production on their own. There are still humans involved in every step of the process. If anything, my artist friends say the ability to rapidly generate concepts and references has improved their workflow.
Some of the demand that AI tools are meeting was also going unmet before. I'm not a good artist, so I can't quickly make a photorealistic cowboy wizard. With the tools we have now, I can generate some okay concept images that spark my imagination. I wouldn't have paid a freelancer to draw what I want, so no-one lost any work. From what I can tell, furry artists still get plenty of commissions too. If anything, some people value human made art even more now. Handmade things can develop a true luxury market when the cheap stuff becomes widely accessible.
Well then I've completely failed to make myself understood 😅
Just want to spread some awareness on this point. What you're talking about is already here.
I have used ElevenLabs (mentioned by Jeff Geerling in his followup post). They have a Speech to Speech feature. You can provide voice input and have the AI output speech in a different voice, preserving all the intonation, emotion, pace, etc. So, with respect to what I quoted above, you can hire a single (or very, very few) high-calibre voice actor(s), then have them voice all the lines of the N characters in your creative work. A middle-aged male voice actor provides the input, and out comes, say, the voice of the 10-year-old daughter of the main character. Differences between the input voice and output voice don't really matter, whether differences in age, timbre, pitch, or accent.
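For anyone curious what that looks like in practice, here's a rough sketch of calling a speech-to-speech endpoint over plain HTTP. The endpoint path, header, form field names, and model id are my best recollection of ElevenLabs' public REST docs and may have changed, so treat them as assumptions and check the current documentation.

    # Rough sketch: send a recorded human performance to a speech-to-speech API
    # and get back the same performance rendered in a different (licensed) voice.
    # Endpoint path, header name, form fields, and model id are assumptions based
    # on my recollection of ElevenLabs' docs -- verify against the current docs.
    import requests

    API_KEY = "your-api-key"         # assumed auth header: "xi-api-key"
    VOICE_ID = "target-voice-id"     # the voice the output should come out in

    with open("actor_take.wav", "rb") as f:
        resp = requests.post(
            f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}",
            headers={"xi-api-key": API_KEY},
            files={"audio": f},                                # the actor's original take
            data={"model_id": "eleven_multilingual_sts_v2"},   # assumed model name
            timeout=120,
        )
    resp.raise_for_status()

    # The response body is audio: same pacing and emotion as the input take,
    # spoken in the target voice.
    with open("converted_take.mp3", "wb") as out:
        out.write(resp.content)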
I’ve been saying for a little while that one of the major things still being figured out in ML-based tech is the UI, and that’s a great example of it starting to get solved!
Right now a lot of platforms are still in “text to X” mode, where X could be speech, image, other text, whatever. Problem is, I’d be hard pressed to accurately direct a human through text alone to capture the nuance of what I wanted, let alone a machine. Visual work needs the ability to sketch ideas or highlight errors or just physically point at the bit to change, motion needs storyboards and camera control and the ability to adjust in 3D space, and voice needs the ability to audibly say “no, try it like this…”.
It’s partly a technical barrier in the sense that the non-text input needs to be understood by the generative model, but that’s one where the research papers and open source demos have been pretty consistently ahead of the commercial products. The next step is getting those technical capabilities into actual usable products rather than proof of concept code and opaque Gradio wrappers - and I think we’re in for another wave of progress in the field as people find ways to interact with these models that reduce the significant number of failures coming from inadequate communication with the underlying models.