This discussion is old hat for the l2t mods, but I'd like to get it written down here on tildes so when the time comes to develop these features we've got a record of all the tricks ready to help whoever wants to code it all. It's surprisingly easy to do this now.
First, we're only going to concern ourselves with legal, legit streaming links. That limits the number of sites we need to support to the following...
1. Youtube
2. Bandcamp
3. Soundcloud
4. Spotify
5. Google Play
Sure, there are others, but they don't offer free streaming, so they aren't particularly useful for widespread music sharing on social media sites like reddit and tildes. Even on reddit, very little of the shared music comes from pay-for services; almost all of it comes from youtube, bandcamp, and soundcloud, in that order. So those are the APIs we need to deal with to extract useful information. It's also worth noting that over time some of these services will die and new ones will arise to take their place, and the ones that remain will change their APIs from time to time, breaking services built on top of them.
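Routing a pasted link to the right API starts with recognizing which of these services it points at. A minimal sketch, assuming the hostnames in the hypothetical `SERVICE_HOSTS` table (each service may use others, and the table will need upkeep as services come and go):

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical hostname table for the services listed above; not exhaustive.
SERVICE_HOSTS = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "bandcamp.com": "bandcamp",
    "soundcloud.com": "soundcloud",
    "open.spotify.com": "spotify",
    "play.google.com": "google_play",
}


def detect_service(url: str) -> Optional[str]:
    """Return the streaming service a link points to, or None if unsupported."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Walk up the domain so "artist.bandcamp.com" matches "bandcamp.com"
    # and "www.youtube.com" / "m.youtube.com" match "youtube.com".
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in SERVICE_HOSTS:
            return SERVICE_HOSTS[candidate]
    return None
```

Bandcamp artists can also use custom domains, which a hostname table like this won't catch; those would need separate handling.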
Yes, sometimes youtube has pirate streams of music. That's their problem to solve, not ours. The closest we could come to 'helping' in that regard would be verifying that the video posted is on the artist's and/or label's official channels. This is not easy, but it is possible. Frankly, I don't think it's worth the effort: it's hard to code and will have a messy false positive rate. A lazier solution we've used in listentothis for years is simply keeping a blacklist of channels that spam/rip/repost artists' music without permission, and we can get you a copy of our blacklist and whitelist if you like, so you aren't starting from scratch.
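The blacklist/whitelist approach is simple enough to sketch. This assumes both lists are stored as sets of YouTube channel IDs (rather than display names, which channels can change); `check_channel` and `ChannelStatus` are hypothetical names:

```python
from enum import Enum
from typing import AbstractSet


class ChannelStatus(Enum):
    WHITELISTED = "whitelisted"  # known official artist/label channel
    BLACKLISTED = "blacklisted"  # known spam/rip/repost channel
    UNKNOWN = "unknown"          # on neither list: allow it, but don't vouch for it


def check_channel(
    channel_id: str, whitelist: AbstractSet[str], blacklist: AbstractSet[str]
) -> ChannelStatus:
    """Classify a channel ID against the (hypothetical) whitelist and blacklist."""
    if channel_id in blacklist:
        return ChannelStatus.BLACKLISTED
    if channel_id in whitelist:
        return ChannelStatus.WHITELISTED
    return ChannelStatus.UNKNOWN
```

Checking the blacklist first means a channel that somehow ends up on both lists is treated as a rip channel.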
Getting all the music information about an artist is a two-step process.
The first step is querying the metadata provided by the sites listed above through their APIs. The only information we need from this step is the name of the artist and the name of the track (or album, if it's an album link). There's plenty of other information available (some of which we will want, like youtube views and popularity metrics such as plays, scrobbles, listens, monthly listeners, and heat indexes), but it isn't needed unless we intend to start dividing the music into sub-categories using other ~groups or #tags. Eventually we will want to do that (subs like listentothis can't really exist without the popularity numbers), but that's a problem for further in the future, once tildes is a lot more active. For now, let's just concentrate on making the sidebar of a music submission take people's breath away.
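Some services return the artist and track as separate fields (Spotify's track objects list the artists separately from the track name), but a YouTube video usually only gives us a free-text title. A common heuristic is to split on a dash and strip the usual "(Official Video)" noise. This is a sketch under that assumption, not a reliable parser; `split_title` and its noise patterns are hypothetical:

```python
import re
from typing import Optional, Tuple

# Hypothetical noise patterns often appended to video titles; extend as needed.
NOISE = re.compile(
    r"\s*[\(\[](?:official\s+(?:music\s+)?(?:video|audio)|lyrics?(?:\s+video)?|audio|hd|hq)[\)\]]",
    re.IGNORECASE,
)
# Hyphen, en dash or em dash surrounded by whitespace.
SEPARATOR = re.compile(r"\s+[-\u2013\u2014]\s+")


def split_title(title: str) -> Optional[Tuple[str, str]]:
    """Guess (artist, track) from a title like 'Artist - Track (Official Video)'."""
    cleaned = NOISE.sub("", title).strip()
    parts = SEPARATOR.split(cleaned, maxsplit=1)
    if len(parts) != 2:
        return None  # no recognizable separator
    artist, track = (p.strip().strip('"') for p in parts)
    if not artist or not track:
        return None
    return artist, track
```

When it returns `None`, the submission form can simply ask the user to fill in the artist and track by hand.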
The second step is using the name of the artist and the name of the track (or album) to lookup information about that artist in public databases.
The motherlode of music data resides in Musicbrainz. It has become the de facto open-source database of record. You might remember its humble beginnings as an open replacement for cddb (much like freedb), built into many cd-ripping tools to provide lookups of artist/track information. It's grown into a Wikipedia-esque monster since then. It knows almost everything there is to know about every artist who has ever released so much as a single or an EP, and does well even for obscure and new independent artists. It's also being updated by the minute with new artist information.
Musicbrainz has a public API, and it allows dozens of queries per minute, so it would be possible to use their free service, but I think that's the wrong way to do this. Musicbrainz lets you set up your own copy of their database and provides scripts to download nightly updates of the data, so it's possible to run this locally. For a hassle-free setup, they provide a ready-to-go virtual machine: just download it and boot it up on your network. The VM also includes their full API and web services (it looks exactly like their official site), so you can query it locally through the API just as you would the remote site, with failover between the local copy and the official site if you want it. If you run just a local copy of the database rather than the VM, you won't get the API. The database is around 30GB right now and grows very slowly. Local copies also let you query the underlying PostgreSQL database directly with SQL, without going through the API.
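Querying a local mirror with failover to the official site could look something like the sketch below. It uses the MusicBrainz `/ws/2` search endpoint with a Lucene-style query and `fmt=json`; the local hostname/port, the User-Agent string, and all function names are assumptions, and `fetch` is injectable so the failover logic can be tested without a network:

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Sequence

# Assumed endpoints: a local mirror (VM) first, then the official server as fallback.
# The local hostname and port are placeholders for wherever your copy runs.
MIRRORS = ("http://musicbrainz.local:5000", "https://musicbrainz.org")
# MusicBrainz asks clients to identify themselves; this app name/contact is hypothetical.
USER_AGENT = "TildesMusicLookup/0.1 ( admin@example.org )"


def lucene_phrase(value: str) -> str:
    """Quote a value as a Lucene phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def search_url(base: str, artist: str, title: str, entity: str = "recording") -> str:
    """Build a /ws/2 search URL; entity is 'recording' for tracks or 'release' for albums."""
    query = f"{entity}:{lucene_phrase(title)} AND artist:{lucene_phrase(artist)}"
    params = urllib.parse.urlencode({"query": query, "fmt": "json", "limit": 5})
    return f"{base}/ws/2/{entity}/?{params}"


def http_get_json(url: str) -> dict:
    """Fetch and decode a JSON response, sending the identifying User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def search_with_failover(
    artist: str,
    title: str,
    entity: str = "recording",
    fetch: Callable[[str], dict] = http_get_json,
    bases: Sequence[str] = MIRRORS,
) -> dict:
    """Try each base URL in order and return the first successful response."""
    last_error = None
    for base in bases:
        try:
            return fetch(search_url(base, artist, title, entity))
        except Exception as exc:  # refused connection, timeout, HTTP error, bad JSON...
            last_error = exc
    raise RuntimeError("all MusicBrainz endpoints failed") from last_error
```

The official server asks clients to average no more than about one request per second, so a production version would also want a throttle on the fallback path; a local mirror has no such limit.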
What data can we get out of this monster database?
- Artist search with a confidence rating for best match
- Complete discography and artist bio in excruciating detail
- A fantastic collection of every relevant link to other sites
- The most relevant collection of 'genre tags' available anywhere
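The confidence rating comes back on each search hit: in a MusicBrainz JSON search response, every hit carries a `score` from 0 to 100. A minimal sketch of picking the best match and rebuilding the credited artist name, assuming the response shape returned by the recording search (`recordings`, `score`, `artist-credit` with `name` and `joinphrase`); the 90-point threshold is an arbitrary assumption:

```python
from typing import Optional


def best_match(response: dict, min_score: int = 90, key: str = "recordings") -> Optional[dict]:
    """Return the highest-scoring search hit, or None if nothing is confident enough."""
    hits = response.get(key, [])
    if not hits:
        return None
    top = max(hits, key=lambda h: int(h.get("score", 0)))
    return top if int(top.get("score", 0)) >= min_score else None


def credited_artist(hit: dict) -> str:
    """Join an artist credit, e.g. 'Daft Punk feat. Pharrell Williams'."""
    return "".join(c["name"] + c.get("joinphrase", "") for c in hit.get("artist-credit", []))
```

Below the threshold, the form can fall back to showing the user the top few candidates instead of guessing.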
Let's also not forget they come with an army of developers and a great support forum. I think the case for using musicbrainz as tildes' prime music authority kinda makes itself. :P
There's really no need for another data source. Musicbrainz doesn't do popularity numbers yet, but they are planning to add them soon. The Listenbrainz project is basically an attempt to reinvent last.fm as an open-source service. Last.fm itself isn't likely to survive; they've been struggling financially for several years. Listenbrainz hopes to let people import their last.fm listening data before the site goes under.
So what do we build out of this mountain of data? Easy - the laziest submission process for music anywhere on the internet.
I think the goal for users should be simply pasting a music link into the submission form and letting tildes do all the rest of the work. The tags and the title can be auto-populated by the lookup and then tweaked by the user. That'll give the titles a sense of uniformity, and it makes submitting on mobile almost effortless.
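Auto-populating the form boils down to formatting the lookup result into a default title and a few tags. The `Lookup` record, the "Artist - Track (Year)" title format and the three-tag cap are all hypothetical choices for illustration, not an agreed tildes convention:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lookup:
    """Hypothetical result of the two-step lookup (service metadata + MusicBrainz)."""
    artist: str
    track: Optional[str] = None  # None for album links
    album: Optional[str] = None
    year: Optional[int] = None
    genres: List[str] = field(default_factory=list)


def suggest_title(info: Lookup) -> str:
    """Build a uniform default title such as 'Artist - Track (2013)'; the user can edit it."""
    name = info.track or info.album
    title = f"{info.artist} - {name}" if name else info.artist
    if info.year:
        title += f" ({info.year})"
    return title


def suggest_tags(info: Lookup, limit: int = 3) -> List[str]:
    """Lowercase, de-duplicated genre tags, capped so the form isn't flooded."""
    tags: List[str] = []
    for genre in info.genres:
        tag = genre.strip().lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags[:limit]
```

Because both are only suggestions, the user keeps the final say over the title and tags.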
Once the submission is created, the sidebar can be populated with the musicbrainz information. I think a good start would be to show the name of the artist, the name of the album, the name of the track (if applicable), and the release year, and possibly even the record label and genre tags (big-bucket generic tags like rock, jazz, folk; nothing overly specific). I'd follow that up with the relevant artist links to their own website, their official bandcamp/youtube/twitter/facebook, and possibly links to Discogs and LyricWiki if present. I'd close it with the artist's bio: just a blurb that ends with a 'read more on wikipedia' link.
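Assembling that sidebar is mostly filtering. This sketch assumes artist data shaped like a MusicBrainz artist lookup with `inc=tags+url-rels` (tags as `{name, count}` objects, URL relationships with a `type` and a `url.resource`), assumes the relationship type names in `LINK_ORDER`, and takes the bio as a plain string from wherever it ends up coming from (for example a Wikipedia extract). The genre bucket list is a placeholder:

```python
from typing import Dict, List

# Assumed "big bucket" genres; anything more specific is left off the sidebar.
BROAD_GENRES = {"rock", "pop", "jazz", "folk", "electronic", "hip hop", "classical",
                "metal", "blues", "country", "soul", "punk", "reggae"}
# Assumed MusicBrainz url-relationship type names, in display order.
LINK_ORDER = ["official homepage", "bandcamp", "youtube", "social network", "discogs", "lyrics"]


def broad_genres(tags: List[dict], limit: int = 3) -> List[str]:
    """Keep only broad genres from {'name', 'count'} tags, most-voted first."""
    ranked = sorted(tags, key=lambda t: t.get("count", 0), reverse=True)
    return [t["name"] for t in ranked if t["name"].lower() in BROAD_GENRES][:limit]


def sidebar_links(relations: List[dict]) -> List[Dict[str, str]]:
    """Pick URL relations of the wanted types, ordered per LINK_ORDER."""
    wanted = [r for r in relations if r.get("type") in LINK_ORDER and "url" in r]
    wanted.sort(key=lambda r: LINK_ORDER.index(r["type"]))
    return [{"type": r["type"], "url": r["url"]["resource"]} for r in wanted]


def bio_blurb(bio: str, wiki_url: str, max_chars: int = 300) -> str:
    """Trim a bio at a sentence boundary and append a Markdown 'read more' link."""
    if len(bio) > max_chars:
        cut = bio[:max_chars]
        end = cut.rfind(". ")
        bio = cut[: end + 1] if end > 0 else cut.rstrip() + "..."
    return f"{bio} [Read more on Wikipedia]({wiki_url})"
```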
If/when we have all of this working, we can worry about the next step - finding a way to determine the relative popularity of any given submission. That's a far, far harder problem to solve.
Here's a quick listing of all the relevant APIs and their documentation for easy reference.
- Musicbrainz XML API
- Youtube API
- Bandcamp API
- Soundcloud API
- Spotify / Echonest API (second only to musicbrainz for raw amount of data)
- Google Play API
Edit: Also, we can run a local copy of the Discogs database, which will give us even more. If we have local copies of both Musicbrainz and Discogs, that covers almost everything without Tildes needing to connect to other sites.