Meta is releasing AudioCraft: Generative AI for audio made simple and available to all
Link information
- Title: AudioCraft: A simple one-stop shop for audio modeling
- Word count: 1543 words
Say what you will about Facebook, but I gotta give them props for open-sourcing so much of their stuff.
Facebook are the good guys now. Investing in cool tech and open source while Google tries to lock down the web.
No.
https://www.businessinsider.com/meta-exec-said-rebrand-succeeded-in-beating-facebook-papers-coverage-2023-7
https://time.com/6299743/threads-data-collection-privacy/
https://www.mediamatters.org/facebook/meta-still-allowing-misinformation-and-hate-speech-proliferate-threads
Google being the bad guy doesn't make Facebook the good guy.
But Facebook is the good guy. As if Google and friends do not also collect your data.
Facebook itself still sucks but these open source models are the most exciting thing in tech right now while every other company is slamming doors shut.
I'm sure they'll stop eventually, but for now they're on a roll.
If open-sourcing code is your standard for who the "good guys" are, then Google, Microsoft, Facebook, and all of the other FAANG companies except Apple have been open-sourcing stuff for decades. The entire Android operating system, which is made by Google, is open source.
I'm not saying this to prove Google is good, I'm saying "megacorps open sourcing code doesn't inherently make them good guys".
I'm a pretty vocal critic of Apple and their lack of contributions to OSS, but they have been pretty good custodians of CUPS.
Even the worst OSS citizen can have a few bright lights. All of the MANGA companies are bad... and so are a lot of other companies.
Open-sourcing super important AI models that could be worth literally millions if wrapped behind an API.
Working on things that are actually interesting (VR) instead of continually focusing on boring "make more money" projects.
Facebook is the little guy who has everything to lose right now and wants to disrupt the market. They're the Google of the aughts, up against Microsoft.
Facebook's platform and complete lack of moderation assisted in the genocide of the Rohingya people in Myanmar.
https://en.wikipedia.org/wiki/Rohingya_genocide#Facebook_controversy
I'm begging you to reconsider your misguided stance on Facebook.
You should assign far more blame to the actual people who committed genocide and not to the internet platform.
Facebook probably could have made a dent, but they aren't country makers and the same thing likely would have happened regardless.
Working on interesting, innovative technology does not make you the good guy. I am sure there are lots of bright people working at Palantir inventing exciting new ways to surveil people.
The tech Facebook is making (VR and open-source AI) gives people access to things they wouldn't otherwise have had, and it improves lives by pressing forward instead of restricting them.
https://www.youtube.com/watch?v=KW64FiB0ITg
It's really not that revolutionary. They spent billions on something that's not as good as VRChat.
The Quest itself is the big deal. Up until the Apple headset (which is honestly a non-starter due to its price), Meta was the only player making real moves in VR for the last two or three years.
You realize Meta acquired Oculus (the makers of Quest headsets) and have kinda stagnated on the hardware since?
I don't recall any revolutionary hardware changes in the past 2-3 years.
The new Quest was announced not too long ago, as was the Quest Pro (which admittedly was an overpriced flop, but at least it was a try).
The original Quest and Quest 2 were actually pretty big hardware improvements, and that was after the acquisition. Their price point, the camera-based tracking (meaning you don't need base stations or any other setup), and the ability to run software on the device (in addition to displaying from another device) made a big UX difference for consumers, and accordingly the Quest and Quest 2 are the best-selling VR headsets by a large margin.
The audio samples in the blog post are impressive. Granted, I only listened to them through my smartphone speakers, but they definitely sounded like what their text prompts described. I wonder how Foley artists might feel about this.
As someone who takes voice acting gigs on the side, I can admit that generative AI is a looming concern. And having come from a call center job that lost its client due to Google speech recognition, I feel safe saying that the concern is well-founded.
I think voice acting is by far the occupation facing the most real threat from generative machine learning. Generative voices are already very good, and they will only get better. Furthermore, there is already legal precedent that voices cannot be copyrighted, and given how voices work, it is unlikely they ever will be. So it will be a free-for-all.
Unlike with other mediums, generative voices can be produced from very few samples due to the high frequency and dimensionality of audio. Just a 10-second clip can get you startlingly close results.
Additionally, voice acting is one of the most "practical" of the performance fields. Not that there aren't important roles where people care, but compared to other areas, there's a litany of roles in animated shows and video games where, quite frankly, most of the audience is barely paying attention and will not notice any issues.
I don't think there's really any recourse there. It is what it is. With powerful generative models, voices are just another instrument, in the same way that a MIDI keyboard can pretend to be a piano.
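To put the "dimensionality" point in rough numbers: uncompressed audio is a stream of amplitude values, so even a short clip is a lot of raw data. The sketch below just counts those values for a 10-second mono clip at two common sample rates (16 kHz, typical for speech, and 44.1 kHz, the CD-audio rate). The sample rates are illustrative assumptions, `raw_sample_count` is a hypothetical helper, and this says nothing about how any particular voice-cloning model actually uses the data.

```python
def raw_sample_count(seconds: float, sample_rate_hz: int, channels: int = 1) -> int:
    """Count the raw amplitude values in an uncompressed audio clip."""
    return round(seconds * sample_rate_hz) * channels


# Illustrative sample rates (assumptions, not tied to any specific model):
# 16 kHz is common for speech, 44.1 kHz is the CD-audio rate.
for label, rate in [("speech, 16 kHz", 16_000), ("CD audio, 44.1 kHz", 44_100)]:
    print(f"10 s mono clip at {label}: {raw_sample_count(10, rate):,} values")
```

So a 10-second clip is already on the order of hundreds of thousands of values before any stereo channels are counted.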
I'd be interested to read more on this. My first thought, because IANAL and I'm just spitballing, is of Tom Waits' successful suit over a sound-alike used in an ad.
Do you happen to know what the precedent is?
There's a lot of case law if you google for it. Example article: https://www.documentjournal.com/2023/04/ai-music-the-weeknd-drake-heart-on-my-sleeve-vocal-deepfake-universal-music-group-umg/#:~:text=%E2%80%9CYou%20cannot%20copyright%20a%20voice,comes%20down%20to%20personality%20rights.
The latter (personality rights) is what the Tom Waits case was about. That wasn't a copyright case; it was an impersonation and defamation case.
When you think about it, how could you possibly copyright a voice? A voice isn't a specific audio recording; it's an entire class of audio. How can you say that a particular recording is someone's voice and not someone else's? You can change your voice, so do you also own every voice you can perform? If two people have the same voice, does the older one hold the copyright, with everyone else automatically in violation?
It's like if you invented an instrument and tried to claim that you own all possible audio recordings of it.
I'm a professional audio engineer and have seen a lot of wild speculation about how different generative AI tools will impact my profession, but I have considered very little of it to be legitimately concerning. This comes closer than anything else I've seen to raising my concerns, even though I still believe it won't replace me as a professional.
These types of tools are amazing. They'll weed out audio engineers who can't use them competently for clients, but there is an enormous gap between sitting in a studio with an engineer, or even just discussing a track on the phone, and typing descriptors out for an AI. SO much of my success with clients depends on my ability to interpret what that specific client means when they say "Earthy," because the meaning of these descriptors is largely at the whim of however the client interprets them. So yes, this will replace the work of engineers who make music no one cares about behind infomercials or under dialogue in a trash TV show, but there is waaaaaaaaaaaaaaay too much nitpicking and preference involved in any serious sound development for this tool to bridge the gap and put most engineers out of business.
I will continue to learn how to use these types of tools, because they're like any other tool that modernizes a profession. The professionals who learn to use them keep working, the ones who don't find something else to do.
I'll believe it after I test it myself. (Any product salesperson can spin and tailor results to make something seem good.)
(I realize there's a github repo with source code and a README)
As an amateur musician, I’m wondering if there are any of these audio generators that you can give a tune or chord progression to riff on. ABC format, maybe? There are lots of folk tunes on thesession.org.
Could you generate a rhythm track? Ideally you could make songs one track at a time, by giving it a mix of the previous tracks and asking it to make another track for an instrument. Or give it a track and ask it to apply some kind of effect to it.
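The "mix of the previous tracks" step in that workflow needs no AI at all. A minimal sketch, assuming each track is a list of float samples in the range -1.0 to 1.0 at the same sample rate: sum the tracks sample by sample, pad shorter tracks with silence, and scale the whole mix down only if it would clip. `mix_tracks` is a hypothetical helper, not part of AudioCraft; how (or whether) a given model accepts such a mix as conditioning depends on that model's own API.

```python
from itertools import zip_longest


def mix_tracks(tracks: list[list[float]], headroom: float = 1.0) -> list[float]:
    """Sum float tracks sample by sample; scale down only if the mix would clip."""
    # zip_longest pads shorter tracks with silence (0.0) so every track lines up.
    mixed = [sum(samples) for samples in zip_longest(*tracks, fillvalue=0.0)]
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak > headroom:
        # One scale factor for the whole mix keeps the tracks' relative levels.
        scale = headroom / peak
        mixed = [s * scale for s in mixed]
    return mixed


drums = [0.5, -0.5, 0.5, -0.5]
bass = [0.6, 0.6, -0.6]
print(mix_tracks([drums, bass]))  # peak of the raw sum is 1.1, so it gets scaled
```

Scaling by a single factor, rather than hard-clipping individual samples, avoids adding distortion to the mix that would then be handed to the next generation step.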
Another interesting use might be generating sound samples for a sampled instrument.
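For the sampled-instrument idea, the usual trick is that a sampler can cover a keyboard from a few recordings by replaying each one faster or slower: in 12-tone equal temperament, shifting by n semitones means a playback rate of 2^(n/12). A minimal sketch of that mapping, assuming a mono sample stored as a list of floats; `playback_rate` and `repitch` are hypothetical names, and the linear interpolation is deliberately naive (no anti-aliasing filter), so this is an illustration rather than a production resampler.

```python
def playback_rate(semitones: int) -> float:
    """Speed-up factor for a shift of `semitones` in 12-tone equal temperament."""
    return 2 ** (semitones / 12)


def repitch(sample: list[float], semitones: int) -> list[float]:
    """Resample by naive linear interpolation (no anti-aliasing filter).

    Like a classic sampler, this changes pitch and duration together:
    shifting up an octave halves the length, shifting down doubles it.
    """
    rate = playback_rate(semitones)
    out_len = int(len(sample) / rate)
    out = []
    for i in range(out_len):
        pos = i * rate  # fractional read position in the source sample
        j = int(pos)
        frac = pos - j
        # Hold the last value at the end instead of reading past the array.
        nxt = sample[j + 1] if j + 1 < len(sample) else sample[j]
        out.append(sample[j] * (1 - frac) + nxt * frac)
    return out


print(round(playback_rate(7), 4))  # a perfect fifth up
```

A generated sample for one note could then be spread across neighboring keys this way, although large shifts tend to sound unnatural, which is why real sampled instruments record several notes.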
Generating an entire song is of little interest to me (there is no shortage of music on Spotify), but it seems like there are lots of places where a tool like this might help?