How to design a database?
I'm working on an application that allows a user to view playlists belonging to a particular radio show and stream/download/favourite the tracks in them. It has 4 core entities: User, Show, Playlist and Track.
- Each show has multiple playlists (one-to-many)
- Each playlist has multiple tracks (one-to-many)
To be able to reference a playlist belonging to a particular show. I gave those playlists the same uuid as the show they belong to. A few questions though.
- Is this the right/best way to associate data?
- As a track could potentially belong to multiple playlists, I can't take the same approach as I do for (show/playlist) How would be best to handle this? Ideally I would like to have a single "Track" table containing all tracks for all playlists.
For any experienced database designers out there, how would you structure this data? What would you consider in designing the schema and why? If I did go with 4 tables only, presumably there would be performance implications given the potential amount of data in any one of those tables, particularly tracks. If that is the case, how best to structure this kind of thing with performance in mind? Thanks in advance for any help :)
For reference, in case it's of importance, I'm using sqlite3.
Show and playlist being one to many seems fine.
Playlist to track should be a many-to-many relationship. If you're using an ORM, usually they do this boilerplate for you; otherwise, manually you make another table, in which each entry is a relationship (i.e a row would be a primary key, a Playlist foreign key, a track foreign key).
Trust me, there won't be. If there any, add an index. There are few things in this world more rigorously optimized than SQL engines at this point.
There is no way there would ever be enough tracks in your table to ever reach any limitations.
edit:
Just as an anecdote with regards to performance, once when I helped out on lab code, there was 3 joins between 3 tables each with 500 million+ rows, one with 1.5 billion, on sqlite3. After I made on index on the 3 columns that were being joined, queries took < 0.2 seconds.
And that was on some dude's personal computer.
Trust me, do not worry about performance.
edit2:
The bigger issue w.r.g to performance is if OP is intending for this to be used by multiple users. In which case he should switch to Postgres or something other than sqlite. Sqlite has single-threaded writes. It's fine if you're almost always doing reads (which can be threaded).
You mean as a show_id column on the playlist table, right? I would hope the playlists have their own uuid that is different from a show's uuid.
@stu2b50 is right, it should be a many-to-many relationship. You would keep your Track and Playlist tables, and then you would set up another Playlist_Track table to define the mapping between them. This is sometimes called a 'linker' or a 'joiner' table. You would have at least two columns, one for the playlist id and then one for the track id. Each row would represent a track belonging to a playlist, and you would be able to have as many entries as you like. You'll probably want to set up the primary key for this table as a multiple column primary key to avoid duplicate entries (unless you want duplicate entries).
@stu2b50 is also right on performance not mattering -- unless you know you'll need to worry about it, you don't need to worry about it.
Edit: the points other have made about there being "no one right way" are also correct. There are as many ways as you can imagine to structure data. However, for the use case you've described, your approach will work perfectly fine until you find it doesn't.
Personally I'd probably let the
playlist_track
table have its own id column. That way a single playlist could have the same song twice. It would also open the possibility of adding more associated data to the playlist tracks (eg. at what point in the show was a particular track played).Alternatively, another less normalized approach would be to just let there be multiple track rows for any given real world song. That would simplify the schema a bit, but would make it harder to make queries like "how many times have song X been played by the different shows?".
There's no one right way. You could continue with your design, a traditional relational one, by having a many-to-many relationship between Playlists and Tracks.
However, depending on your use case, traditional relationships can be tricky for expanding functionality. Suppose you want to add genre for both playlists and tracks, where each could be in multiple genres. You then have two additional many to many relationships, and things start to get hairy for query complexity and performance.
Enter NoSQL databases. There are many varients, but the common theme is being able to query and index in an unstructured fashion. You could store each playlist as a document with no formal relationship to a track, and then in the user document store a list of their playlists. It can improve performance, but reporting ala traditional RDBMS can be tricky sometimes.
Enter graph databases, which are a sub-set of NoSQL ones. The query language they use is designed primarily around traversing relationships. This is most useful when the relationship between points of data is just as, or more important than the data itself.
There's no one right answer. With clever design, you can do anything in any of them. For your use case, you could likely use any of the three.
One point would be if data integrity is especially important, ACID compliance is very important...don't want accidental losses of data. Initially NoSQL databases had this problem, but that is being rectified by some now.
Thanks everyone for the insightful feedback. It's nice to have a community of more experienced devs to talk to. I very much appreciate that the Tildes community fosters this kind of respectful discourse.
Edit: If this sort of post is better off elsewhere, I'm also happy to shift the discussion there. Perhaps something like exercism.io is a better fit? Is this a space devs would rather not be asked questions in?
Thanks again for the support and offering areas for me to go off and explore on my own and thanks in particular to @stu2b50 and @Flashynuff for the schema suggestions.
I think this is a perfectly fine space to ask questions like this.
Shouldn't this be
PRIMARY KEY(`user_uuid`)
?It seems like this is saying you can only have track per playlist in this table. Is
UNIQUE
really what you want here?Ok great, I don't want to bug people.
You're right it should be user_uuid, corrected, good catch :)
I'm trying to express that each playlist for a given show should have a unique uuid that is not null. Is this incorrect?
Sorry, I should have been more specific. That is correct for the
playlist
table. However, you had something like the following for the playlist_track table (it looks like you've removed your code, so not 100% sure what you had):which would make it so you can only ever create the playlist:track relation once per playlist.
Ah ok, yeah, this is not what I want as the same track could potential belong to multiple different playlists. Thanks for you help :)
There is no universally correct answer, but I would read through the manual for the particular database you are using as it occasionally has some hints as to what would offer the best performance (which, in many cases, also means choosing the right datatype for the specific job). I don't have any experience with sqlite, so I won't be able to help you outside of generic advice.
First, I assume you already know about normalization and also know why you generally don't want to make the data super granular.
Without knowing anything about your data, I would assume it would be easiest to store Show, Playlist, and Track in the same table. You'll be repeating a lot of data, especially if tracks are repeated in a playlist, but it makes querying very easy and generally fast. You may also want to index the Show and Playlist columns since it seems like those are the ones you'll be matching against the most often.
Edit: for the second time in a post by you, I find myself feeling @stu2b50's advice is probably a bit better than mine....
I mean at that point you may as well switch to a NoSQL database if you're going to just avoid the schema entirely.
OP is going to be fine lol with whatever granularity he chooses. I highly doubt he's going to have more than 1 million rows total.
Sometimes I forget that such things exist. :P