5 votes

How to design a database?

Posted August 15, 2020 by milkbones_4_bigelow

Tags: data, database design, sqlite3, python, music, ask.advice

I'm working on an application that allows a user to view playlists belonging to a particular radio show and stream/download/favourite the tracks in them. It has 4 core entities: User, Show, Playlist and Track.

Each show has multiple playlists (one-to-many)
Each playlist has multiple tracks (one-to-many)

To be able to reference a playlist belonging to a particular show. I gave those playlists the same uuid as the show they belong to. A few questions though.

Is this the right/best way to associate data?
As a track could potentially belong to multiple playlists, I can't take the same approach as I do for (show/playlist) How would be best to handle this? Ideally I would like to have a single "Track" table containing all tracks for all playlists.

For any experienced database designers out there, how would you structure this data? What would you consider in designing the schema and why? If I did go with 4 tables only, presumably there would be performance implications given the potential amount of data in any one of those tables, particularly tracks. If that is the case, how best to structure this kind of thing with performance in mind? Thanks in advance for any help :)

For reference, in case it's of importance, I'm using sqlite3.

12 comments

stu2b50
August 15, 2020 (edited August 15, 2020)
Link
Show and playlist being one to many seems fine. Playlist to track should be a many-to-many relationship. If you're using an ORM, usually they do this boilerplate for you; otherwise, manually you...

Show and playlist being one to many seems fine.

Playlist to track should be a many-to-many relationship. If you're using an ORM, usually they do this boilerplate for you; otherwise, manually you make another table, in which each entry is a relationship (i.e a row would be a primary key, a Playlist foreign key, a track foreign key).

f I did go with 4 tables only, presumably there would be performance implications given the potential amount of data in any one of those tables, particularly tracks.

Trust me, there won't be. If there any, add an index. There are few things in this world more rigorously optimized than SQL engines at this point.

There is no way there would ever be enough tracks in your table to ever reach any limitations.

edit:

Just as an anecdote with regards to performance, once when I helped out on lab code, there was 3 joins between 3 tables each with 500 million+ rows, one with 1.5 billion, on sqlite3. After I made on index on the 3 columns that were being joined, queries took < 0.2 seconds.

And that was on some dude's personal computer.

Trust me, do not worry about performance.

edit2:

The bigger issue w.r.g to performance is if OP is intending for this to be used by multiple users. In which case he should switch to Postgres or something other than sqlite. Sqlite has single-threaded writes. It's fine if you're almost always doing reads (which can be threaded).

9 votes
[2]
Flashynuff
August 15, 2020 (edited August 15, 2020)
Link
You mean as a show_id column on the playlist table, right? I would hope the playlists have their own uuid that is different from a show's uuid. @stu2b50 is right, it should be a many-to-many...
To be able to reference a playlist belonging to a particular show. I gave those playlists the same uuid as the show they belong to.

You mean as a show_id column on the playlist table, right? I would hope the playlists have their own uuid that is different from a show's uuid.

As a track could potentially belong to multiple playlists, I can't take the same approach as I do for (show/playlist) How would be best to handle this? Ideally I would like to have a single "Track" table containing all tracks for all playlists.

@stu2b50 is right, it should be a many-to-many relationship. You would keep your Track and Playlist tables, and then you would set up another Playlist_Track table to define the mapping between them. This is sometimes called a 'linker' or a 'joiner' table. You would have at least two columns, one for the playlist id and then one for the track id. Each row would represent a track belonging to a playlist, and you would be able to have as many entries as you like. You'll probably want to set up the primary key for this table as a multiple column primary key to avoid duplicate entries (unless you want duplicate entries).
```
CREATE TABLE playlist_track (
  playlist_id NOT NULL, 
  track_id NOT NULL, 
  PRIMARY KEY (playlist_id, track_id)
);
```
@stu2b50 is also right on performance not mattering -- unless you know you'll need to worry about it, you don't need to worry about it.

Edit: the points other have made about there being "no one right way" are also correct. There are as many ways as you can imagine to structure data. However, for the use case you've described, your approach will work perfectly fine until you find it doesn't.
2 votes
1. archevel
  August 16, 2020
  Link Parent
  Personally I'd probably let the playlist_track table have its own id column. That way a single playlist could have the same song twice. It would also open the possibility of adding more associated...
  
  Personally I'd probably let the playlist_track table have its own id column. That way a single playlist could have the same song twice. It would also open the possibility of adding more associated data to the playlist tracks (eg. at what point in the show was a particular track played).
  
  Alternatively, another less normalized approach would be to just let there be multiple track rows for any given real world song. That would simplify the schema a bit, but would make it harder to make queries like "how many times have song X been played by the different shows?".
  
  3 votes
vord
August 15, 2020
Link
There's no one right way. You could continue with your design, a traditional relational one, by having a many-to-many relationship between Playlists and Tracks. However, depending on your use...

There's no one right way. You could continue with your design, a traditional relational one, by having a many-to-many relationship between Playlists and Tracks.

However, depending on your use case, traditional relationships can be tricky for expanding functionality. Suppose you want to add genre for both playlists and tracks, where each could be in multiple genres. You then have two additional many to many relationships, and things start to get hairy for query complexity and performance.

Enter NoSQL databases. There are many varients, but the common theme is being able to query and index in an unstructured fashion. You could store each playlist as a document with no formal relationship to a track, and then in the user document store a list of their playlists. It can improve performance, but reporting ala traditional RDBMS can be tricky sometimes.

Enter graph databases, which are a sub-set of NoSQL ones. The query language they use is designed primarily around traversing relationships. This is most useful when the relationship between points of data is just as, or more important than the data itself.

There's no one right answer. With clever design, you can do anything in any of them. For your use case, you could likely use any of the three.

One point would be if data integrity is especially important, ACID compliance is very important...don't want accidental losses of data. Initially NoSQL databases had this problem, but that is being rectified by some now.

1 vote
[5]
milkbones_4_bigelow (OP)
August 16, 2020 (edited August 16, 2020)
Link
Thanks everyone for the insightful feedback. It's nice to have a community of more experienced devs to talk to. I very much appreciate that the Tildes community fosters this kind of respectful...

Thanks everyone for the insightful feedback. It's nice to have a community of more experienced devs to talk to. I very much appreciate that the Tildes community fosters this kind of respectful discourse.

Edit: If this sort of post is better off elsewhere, I'm also happy to shift the discussion there. Perhaps something like exercism.io is a better fit? Is this a space devs would rather not be asked questions in?

Thanks again for the support and offering areas for me to go off and explore on my own and thanks in particular to @stu2b50 and @Flashynuff for the schema suggestions.

1 vote
1. [4]
  Flashynuff
  August 17, 2020
  Link Parent
  I think this is a perfectly fine space to ask questions like this. Shouldn't this be PRIMARY KEY(`user_uuid`)? It seems like this is saying you can only have track per playlist in this table. Is...
  
  Perhaps this is a space devs would rather not be asked questions in?
  
  I think this is a perfectly fine space to ask questions like this.
  
  PRIMARY KEY(`uuid`)
  
  Shouldn't this be PRIMARY KEY(`user_uuid`)?
  
  `playlist_uuid` TEXT NOT NULL UNIQUE
  
  It seems like this is saying you can only have track per playlist in this table. Is UNIQUE really what you want here?
  
  1 vote
  1. [3]
    milkbones_4_bigelow (OP)
    August 17, 2020 (edited August 17, 2020)
    Link Parent
    Ok great, I don't want to bug people. You're right it should be user_uuid, corrected, good catch :) I'm trying to express that each playlist for a given show should have a unique uuid that is not...
    
    I think this is a perfectly fine space to ask questions like this.
    
    Ok great, I don't want to bug people.
    
    You're right it should be user_uuid, corrected, good catch :)
    
    playlist_uuid TEXT NOT NULL UNIQUE
    
    I'm trying to express that each playlist for a given show should have a unique uuid that is not null. Is this incorrect?
    
    [2]
    Flashynuff
    August 17, 2020
    Link Parent
    Sorry, I should have been more specific. That is correct for the playlist table. However, you had something like the following for the playlist_track table (it looks like you've removed your code,...
    
    Sorry, I should have been more specific. That is correct for the playlist table. However, you had something like the following for the playlist_track table (it looks like you've removed your code, so not 100% sure what you had):
    
    CREATE TABLE playlist_track ( track_uuid TEXT NOT NULL, playlist_uuid TEXT NOT NULL UNIQUE );
    
    which would make it so you can only ever create the playlist:track relation once per playlist.
    
    milkbones_4_bigelow (OP)
    August 17, 2020
    Link Parent
    Ah ok, yeah, this is not what I want as the same track could potential belong to multiple different playlists. Thanks for you help :)
    
    I'm trying to express that each playlist for a given show should have a unique uuid that is not null. Is this incorrect?
    
    Ah ok, yeah, this is not what I want as the same track could potential belong to multiple different playlists. Thanks for you help :)
    
    1 vote
[3]
Akir
August 15, 2020 (edited August 15, 2020)
Link
There is no universally correct answer, but I would read through the manual for the particular database you are using as it occasionally has some hints as to what would offer the best performance...

There is no universally correct answer, but I would read through the manual for the particular database you are using as it occasionally has some hints as to what would offer the best performance (which, in many cases, also means choosing the right datatype for the specific job). I don't have any experience with sqlite, so I won't be able to help you outside of generic advice.

First, I assume you already know about normalization and also know why you generally don't want to make the data super granular.

Without knowing anything about your data, I would assume it would be easiest to store Show, Playlist, and Track in the same table. You'll be repeating a lot of data, especially if tracks are repeated in a playlist, but it makes querying very easy and generally fast. You may also want to index the Show and Playlist columns since it seems like those are the ones you'll be matching against the most often.

Edit: for the second time in a post by you, I find myself feeling @stu2b50's advice is probably a bit better than mine....
1. [2]
  stu2b50
  August 15, 2020
  Link Parent
  I mean at that point you may as well switch to a NoSQL database if you're going to just avoid the schema entirely. OP is going to be fine lol with whatever granularity he chooses. I highly doubt...
  
  Without knowing anything about your data, I would assume it would be easiest to store Show, Playlist, and Track in the same table.
  
  I mean at that point you may as well switch to a NoSQL database if you're going to just avoid the schema entirely.
  
  OP is going to be fine lol with whatever granularity he chooses. I highly doubt he's going to have more than 1 million rows total.
  
  2 votes
  1. Akir
    August 15, 2020
    Link Parent
    Sometimes I forget that such things exist. :P
    
    I highly doubt he's going to have more than 1 million rows total.
    
    Sometimes I forget that such things exist. :P
    
    3 votes