13 votes

How Do I Make A Database?

Hello everyone!

I've recently got an idea for a Database as a Service I'd like to create. The only issue is - I don't know how to create or host a database!

I've only ever used Mongoose/mLab with Javascript, and a minimal amount of Postgres with Python.

If I'm looking to create a database that will, eventually, be able to store images, songs, and videos, where should I start my homework?

I can create the backend and the frontend with no issue - just stuck on this part here. If it's of any relevance, I most frequently use the MERN stack.

10 comments

  1. [3]
    spit-evil-olive-tips
    Link
    Answering your question sideways - you shouldn't make a database. And you don't actually want to make a database. Writing an actual database is a ridiculous amount of work, and usually only done...

    Answering your question sideways - you shouldn't make a database. And you don't actually want to make a database.

    Writing an actual database is a ridiculous amount of work, and usually only done by masochistic systems people (pdf). If you wanted to do it for your own self-education or fun, by all means go for it, but if your real goal is to build a service that happens to use a database, there are much easier ways to go about it.

    If I'm looking to create a database that will, eventually, be able to store images, songs, and videos

    What you're looking for is a specialized type of database commonly called an object store. These specialize in mapping a small string key (like a filename or a path) to a large binary blob (the file contents). If you've ever used Amazon S3, that's one example of such a system.

    Some relational databases can fulfill this purpose, as well. Postgres, for example, has a system called TOAST that allows storing large file-like blobs directly in the database.

    So you've got a database backend (I'd recommend Postgres; don't get me started on MongoDB) and you store all the normal data you'd need for your service there. Then, for your files, you can either store them directly in Postgres, using TOAST columns, or you can store them in a separate object store, such as S3 or Minio, then store a pointer / breadcrumb to the file in a Postgres column.

    So for the latter approach (Postgres + an object store) you end up with something like:

    > select * from my_files;
    file_id  name     path
    1        foo.txt  s3://some_bucket/1
    2        bar.jpg  s3://some_bucket/2
    

    Then your service code knows that when it adds a row to my_files, it first needs to upload the file to the object store and save the URL to Postgres, and when it wants to read a file, it needs to look up the URL in Postgres then pull it from the object store.

    The downside of this approach compared to TOAST columns is that you now have distributed transaction problems (you have to make sure that if you delete a row from Postgres, you also delete the corresponding file from S3, and so on). The downside of the TOAST approach is that your database size will grow quite quickly - if you have 1 TB worth of files, you probably have 1 GB or less of actual non-file metadata describing those files, but you've gotta pay for the full 1 TB of database space. And everything else being equal, 1 TB of storage in S3 or another object store will be significantly cheaper than 1 TB of Postgres capacity.

    12 votes
    1. Greg
      Link Parent
      This. This is the right answer to both the question asked and the question implied! Having had to solve this exact problem in the past: define that the database is always the source of truth. Use...

      This. This is the right answer to both the question asked and the question implied!

      The downside of this approach compared to TOAST columns is that you now have distributed transaction problems (you have to make sure that if you delete a row from Postgres, you also delete the corresponding file from S3, and so on).

      Having had to solve this exact problem in the past: define that the database is always the source of truth. Use UUID keys on S3 to avoid collisions, never modify or overwrite files in-place (just upload under a new key and update Postgres on success), try to sync deletions where possible (only after the Postgres transaction commits), and run an async task daily to catch and remove any leftover S3 files that get missed. Basically, a DIY eventual consistency model!

      And everything else being equal, 1 TB of storage in S3 or another object store will be significantly cheaper than 1 TB of Postgres capacity.

      At least a 5x price difference per GB, much greater I/O load on your DB disk (which can kill query performance), and increased management overhead and complexity on your DB server. I'd strongly suggest using a dedicated object store for the actual files :)

      5 votes
    2. OhHaiMark
      Link Parent
      I think this is the answer I was looking for! Thanks for taking the time to go into depth with it. I'm somewhat familiar with the S3 approach, as that's how I store, for example, user profile...

      I think this is the answer I was looking for! Thanks for taking the time to go into depth with it.

      I'm somewhat familiar with the S3 approach, as that's how I store, for example, user profile pictures for my social apps.

      I had assumed I was going to need to go into some heavy research for this project - but it seems like I'll just be able to use AWS after all! Thanks for the tips :)

      3 votes
  2. [3]
    the_loophole
    Link
    Doing an optimized database is VERY difficult and usually takes years to make something usable in production. To do something that works fast, you will have to use compiled languages, not...

    Doing an optimized database is VERY difficult and usually takes years to make something usable in production. To do something that works fast, you will have to use compiled languages, not interpreted ones.

    I think I can give you further advice if you tell me what type of database do you want to create (relational, key-value store, graph database, etc.) and what is the most important, what needs to be very fast.

    8 votes
    1. [2]
      Naethure
      Link Parent
      That's...not really true. It's true that getting a database fully optimized is a skill and can be difficult, but the degree to which database optimization is necessary is going to vary greatly by...

      That's...not really true.

      It's true that getting a database fully optimized is a skill and can be difficult, but the degree to which database optimization is necessary is going to vary greatly by use-case. You can definitely get an unoptimized database to perform just fine, especially for a proof-of-concept or small release. There are small optimizations that you can then do to push it significantly farther, and unless you're doing something really stupid, by the time you grow to the point that you'll need extreme optimizations, you can probably afford to spend some time optimizing it (or hire someone who is a database expert). It may surprise you just how much headroom you can get simply by increasing the database server's processing power.

      The compiled vs interpreted language thing is completely incorrect. That has absolutely no bearing on the database, which is accessed via remote calls (either to a different server or at least a different service on the same server). While it's typically true that you can get more performance out of a compiled language when fully optimizing, you significantly overstate the effect. With the exception of some extremely high-performance use-cases and applications with significant resource constraints, performance isn't really a good reason to choose between an interpreted and compiled language per se, especially with the efficiency of modern interpreters (and JIT compilers).

      There's honestly not enough information in this post to start focusing on speed like you are. And the correct answer isn't to pick a type of database and then try to make it work for this purpose, it's to better understand the use-case in order to properly select a type of database.

      But all of this answers a different question to the one OP seems to have asked, which, at least from my read, is just about how to actually host a database! And the answer to that depends entirely on how OP is hosting the webserver. So, @OhHaiMark, to answer that question: Azure or AWS can be excellent options if you're considering hosting in the cloud (this has a lot of benefits, especially to scalability, redundancy, and maintenance/development costs, and is the path I'd most recommend). If you're using some web hosting provider to host the frontend, they may also have a database hosting option (though I wouldn't recommend this option in the long-run). If you are hosting it on a server you control, you would simply download and install the database much like you did your webserver. MERN as a stack actually contains a database: MongoDB, so any guide to setting up a MERN stack should walk you through installing, creating, and connecting to it.

      13 votes
      1. Naethure
        (edited )
        Link Parent
        Also, an extra note to try to answer the root question rather than the database-specific one is that a database may not be the best solution to store files like images, music, or videos. You may...

        Also, an extra note to try to answer the root question rather than the database-specific one is that a database may not be the best solution to store files like images, music, or videos. You may want a hybrid solution with a database containing a file name reference and the relevant metadata your service needs (owner, permissions, timestamps, user-friendly name, description, etc.) and using either files or some form of cloud storage (Azure Blob/Files, AWS S3, etc.) to store the actual files, especially if they can be large.

        8 votes
  3. [2]
    meghan
    Link
    Just for clarification, are you literally asking "how do I make a database" or are you asking "how do I make a database SaaS"?

    Just for clarification, are you literally asking "how do I make a database" or are you asking "how do I make a database SaaS"?

    6 votes
    1. OhHaiMark
      Link Parent
      The former. I can handle building out the rest of the service, and all the stuff on the business-end (should I ever decide to actually pursue this beyond a passion-project). Just unsure of how to...

      The former. I can handle building out the rest of the service, and all the stuff on the business-end (should I ever decide to actually pursue this beyond a passion-project). Just unsure of how to build up the database type I need.

      Ideally, I'm building something akin to Cloudinary which would allow users to store images and videos on my service. I like how stored images are viewable on their console, and the image-transformation services they offer.

      I'm unsure if storing the raw files on my end would be preferable, or if I should just back the service with something like AWS.

  4. mftrhu
    Link
    As in, making a DB engine from scratch, or do you just want to set up a database server? The former is not exactly easy, and you'll want to take a look at this tutorial, and at 500 lines or less -...

    As in, making a DB engine from scratch, or do you just want to set up a database server?

    The former is not exactly easy, and you'll want to take a look at this tutorial, and at 500 lines or less - they implemented different kinds of simple databases in chapters 6, 7 and 10.

    If the latter, what do you need from it? Postgres is fine, but you might just be able to make use of SQLite if your app is not heavy on the writing, which doesn't require you to set up anything - and, as someone else said already, you probably don't want to store large files inside of it. There's already the filesystem for that.

    If I'm looking to create a database that will, eventually, be able to store images, songs, and videos, where should I start my homework?

    If you just want to store them, you'll need a schema - what information about them do you want to keep? What relationships would there be? Do you want to be able to tag them? Put them into categories? Will these categories be flat or hierarchical?

    Again, what do you need/want to do with it?

    4 votes
  5. vakieh
    Link
    If you've used MERN, you've already hosted a database. You just delete everything that returns an actual web page and badda bing badda boom you've made a DaaS. A shitty one, but one all the same....

    If you've used MERN, you've already hosted a database. You just delete everything that returns an actual web page and badda bing badda boom you've made a DaaS. A shitty one, but one all the same.

    Your next question needs to be 'how do I make a good DaaS'. And for that my only answer is 'go to university' (and any answer that does not start here is feeding you bullshit), followed with 'implement 100 shitty attempts', followed with 'gather a team with the necessary skills' - because a good DaaS needs to handle scalability and security to a point where you aren't going to find the right skills in a single person unless that person has some solid industry experience under their belt.

    2 votes