14 votes

Database schema for an upcoming comment hosting system

I'm working on a small and simple comment hosting platform in PHP (SQLITE database) to host comments for my static blog https://prahladyeri.github.io/. No login, sign-ups or third party OAuth, just plain old Wordpress.org style commenting system.

I have come up with the following DB schema so far to store the comments (comments table) and enable the administrator's dashboard authentication (users table). Can this be improved further?

-- init.sql

drop table if exists comments;
drop table if exists posts;
drop table if exists users;

create table comments (
	id integer primary key,
	reply_to_id integer references comments (id),
	post_id integer references posts (id),
	message text,
	name text,
	email text,
	website text,
	ip text, -- $_SERVER['REMOTE_ADDR']
	notify text default 'n',
	status text default 'Approved', -- Approved/Spam
	created_at datetime default (datetime(CURRENT_TIMESTAMP, 'localtime')),
	modified_at datetime default (datetime(CURRENT_TIMESTAMP, 'localtime'))
);

create table posts (
	id integer primary key,
	user_id integer references users (id),
	uri text -- /blog/2024/05/some-slug.html
);

create table users (
	id integer primary key,
	username text not null,
	password text not null,
	email text, -- can be null
	name text not null,
	website text, -- comments will be posted to this site
	role text not null, -- Admin/Staff
	created_at datetime default (datetime(CURRENT_TIMESTAMP, 'localtime')),
	modified_at datetime default (datetime(CURRENT_TIMESTAMP, 'localtime')),
	unique (username),
	unique (email)
);

-- create default data
-- create a default admin user who handles to dashboard
insert into users(username,password, name, role, website)
values("admin", "admin108", 'Admin', 'Admin', 'https://example.com/');

23 comments

  1. [6]
    csos95
    (edited )
    Link
    Something to be aware of is that in SQLite the types are more of a hint for how to store the data for a column rather than a constraint. You can store any type of data in any type of column...

    Something to be aware of is that in SQLite the types are more of a hint for how to store the data for a column rather than a constraint.
    You can store any type of data in any type of column (unless you specifically use strict tables).
    It will attempt to convert the data to the storage class indicated by the column type used, but if it fails it will store the data as is.
    This page of the SQLite documentation explains this in detail.
    I come back to it often to remind myself of what column types are associated with what affinities.

    Additionally, the numeric parameters on types are ignored.
    So a column of type varchar(255) will store a 300 character string without complaint.
    To add a constraint on the length you need to add a check constraint to the end.
    So instead of name varchar(255) not null, you can do name text not null check(length(name) <= 255) (I generally prefer using the five affinity type names so I don't have to remember the exact rules for affinity mapping).

    EDIT: I just noticed you have two primary keys set for the comments table.
    That won't work.
    The id column should probably be defined as id integer primary key autoincrement and the user_id should be defined as user_id integer not null references users(id).
    The autoincrement prevents rowids from being reused in certain situations and isn't strictly necessary, but I always use it to be safe because it doesn't really affect performance for anything I've used SQLite for.
    Additionally if you ever want to have a primary key that isn't integer (which is a special case that maps to the rowid), you need to add not null to it.
    Otherwise you could end up with every row having a NULL id.
    This is because of two issues:

    1. primary key does not imply not null in SQLite because there was a bug early on and changing it to follow the SQL standard would break programs that relied on that behavior (that's the general reason behind many weird behaviors/default values in SQLite).
    2. In SQL null is not equal to anything, even null, so all of the null primary key values would be considered different.

    Another thing is that you can put the unique constraints directly on the column they refer to when it's just a single column.
    Ex: username text not null unique

    13 votes
    1. [5]
      pyeri
      (edited )
      Link Parent
      That's a good point. If I use strict, a lot of headaches will be over as the engine will perform those checks instead of me doing it at application layer. But only trouble is that strict keyword...

      You can store any type of data in any type of column (unless you specifically use strict tables).

      That's a good point. If I use strict, a lot of headaches will be over as the engine will perform those checks instead of me doing it at application layer. But only trouble is that strict keyword was introduced in 2021, so you lose compatibility with any older versions of Sqlite programs or libraries, they will throw an error opening strict tables. What is the general consensus in this regard, are folks using strict more often in their apps these days? Also, are there any disadvantages or drawbacks of using strict mode?

      That won't work.

      Sorry, that was a copy-pasta mistake while fetching it from an earlier project. This is exactly what I do in this situation but not sure even referential integrity is maintained by Sqlite unless we use the strict mode?

      user_id integer not null references users(id)

      1. csos95
        Link Parent
        Here is the documentation on strict tables. I haven't bothered to use them because I do all my stuff in rust (usually with diesel) and have strong type checking to prevent me from using the wrong...

        Here is the documentation on strict tables.
        I haven't bothered to use them because I do all my stuff in rust (usually with diesel) and have strong type checking to prevent me from using the wrong type by accident.

        but not sure even referential integrity is maintained by Sqlite unless we use the strict mode?

        Strict tables are unrelated to foreign key constraints.
        The default for SQLite (another one of those "backwards compatibility" oddities) is to not enforce foreign key constraints.
        You have to enable it with a pragma statement: pragma foreign_keys = on;
        A full list of pragma statements is available here.

        I usually use journal_mode = wal, synchronous = normal, foreign_keys = on, and busy_timeout = SOME_TIMEOUT_IN_MS in my projects.
        This will use a write ahead log for writes (much faster and makes it so readers and writers (still limited to one writer at a time though) do not block each other), make it do less syncing to the filesystem, enable foreign key checking, and automatically waits if a table is locked with a default busy_handler.

        Something to note for that last option is that it mostly will take care of database is locked errors that occur when you try to read/write when the needed table is locked.
        The one case it doesn't cover is when a deferred transaction (the default) is "upgraded" to a write transaction.
        In this case, if a needed table is locked, it will immediately error instead of using the busy_handler.
        So if you use transactions, you should make them immediate when you are going to be writing.
        Ex: begin immediate transaction
        Transactions documentation here

        3 votes
      2. [3]
        Macha
        Link Parent
        Note that even the older currently supported Ubuntu LTS (22.04) has a SQLite version that supports strict mode: https://packages.ubuntu.com/jammy/libsqlite3-0

        But only trouble is that strict keyword was introduced in 2021

        Note that even the older currently supported Ubuntu LTS (22.04) has a SQLite version that supports strict mode: https://packages.ubuntu.com/jammy/libsqlite3-0

        1 vote
        1. [2]
          pyeri
          Link Parent
          I still use Xubuntu 18.04 LTS on my Linux laptop for programming! I tried the later versions but for some reason the XFCE DE of later LTS versions like 20.04, 22.04, etc. seem less snappy and...

          I still use Xubuntu 18.04 LTS on my Linux laptop for programming! I tried the later versions but for some reason the XFCE DE of later LTS versions like 20.04, 22.04, etc. seem less snappy and perform slower in comparison, hence I have never upgraded since.

          1. Macha
            Link Parent
            18.04 seems inadvisable from a security perspective. What browser versions do you have? e.g. just this month there was a pdf.js issue that affected basically all browsers.

            18.04 seems inadvisable from a security perspective. What browser versions do you have? e.g. just this month there was a pdf.js issue that affected basically all browsers.

            4 votes
  2. [4]
    NaraVara
    Link
    Irrespective of the technical side, I’ll warn you that having an open text box on the internet that isn’t gated by a signup is going to be flooded with spam, scams, and porn basically constantly....

    Irrespective of the technical side, I’ll warn you that having an open text box on the internet that isn’t gated by a signup is going to be flooded with spam, scams, and porn basically constantly. You’re gonna need to be prepared for permanent moderating duty.

    7 votes
    1. [3]
      F13
      Link Parent
      Using a simple captcha like Cloudflare's has reduced the amount of spam I received from roughly daily to none at all. I'm sure it won't prevent everything, especially on a more popular site, but...

      Using a simple captcha like Cloudflare's has reduced the amount of spam I received from roughly daily to none at all. I'm sure it won't prevent everything, especially on a more popular site, but it's a step.

      3 votes
      1. [2]
        pyeri
        Link Parent
        I think using Google's captcha is somewhat better than Cloudflare's as the end user is more likely to be already logged-in to Google systems. With Cloudflare, they have to solve the captcha...

        I think using Google's captcha is somewhat better than Cloudflare's as the end user is more likely to be already logged-in to Google systems. With Cloudflare, they have to solve the captcha riddles which will be an extra step?

        However, Cloudflare shines on the backend very well. Simply putting your domain behind the Cloudflare DNS should prevent a lot of DDOS and network attacks me thinks.

        1. F13
          Link Parent
          In most cases I've seen, the Cloudflare captcha is either completely invisible or just a checkbox. But like most captchas, in some cases, it'll do a puzzle.

          In most cases I've seen, the Cloudflare captcha is either completely invisible or just a checkbox. But like most captchas, in some cases, it'll do a puzzle.

          3 votes
  3. [2]
    tape
    Link
    Using strings for role and status go against my soul, but I recognize it's a small db and it won't matter in the slightest. They did a real good job hammering normal forms into me I guess. Lol...

    Using strings for role and status go against my soul, but I recognize it's a small db and it won't matter in the slightest. They did a real good job hammering normal forms into me I guess. Lol

    Seems good 👍

    5 votes
    1. pyeri
      (edited )
      Link Parent
      These are not regular users but "admin users" like myself who will be administering the dashboard. For one, it puts the dashboard page behind an authentication system. Plus I may want to have a...

      also have a system to allow it since you're storing users?

      These are not regular users but "admin users" like myself who will be administering the dashboard. For one, it puts the dashboard page behind an authentication system. Plus I may want to have a Diqus style feature in future where multiple "admin" users can administer their own site's comments through this single app.

      Would you need a way to tie a comment to a user in the system?...that would stop commenters from mimicking in-system users probably.

      My blog is too small right now for spammers or freaks to start doing such things. In future, I have plans to add a spam filter and/or third party auth if it becomes necessary. But considering that even a popular site like wptavern.com doesn't have that, it doesn't seem like an absolute necessity to have?

      2 votes
  4. post_below
    Link
    As @csos95 mentioned, in your example SQLite would ignore character constraints, but putting that aside in case you decide to enforce them some other way, you might consider making them more...

    As @csos95 mentioned, in your example SQLite would ignore character constraints, but putting that aside in case you decide to enforce them some other way, you might consider making them more realistic.

    For example:

    ip varchar(500)

    The longest IP you're going to encounter any time soon is IPV6 at a maximum of 39 characters.

    Some SQL engines (I don't know if SQLite is among them) will effectively convert VARCHAR to CHAR when creating temp tables for things like grouping and sorting. The bigger your VARCHARs the higher the chance it will get pushed out of memory and reduce performance.

    Like I said this may not apply to SQLite, but no harm in having realistic constraints.

    5 votes
  5. [7]
    unkz
    Link
    I assume you aren’t really storing passwords in plaintext?

    I assume you aren’t really storing passwords in plaintext?

    2 votes
    1. [6]
      pyeri
      Link Parent
      Yes. This password field will merely store the md5 hash of the password. App will compare the hashes in real time to authenticate the user.

      Yes. This password field will merely store the md5 hash of the password. App will compare the hashes in real time to authenticate the user.

      1 vote
      1. unkz
        Link Parent
        You shouldn’t use MD5 for that purpose anymore. It only takes seconds to calculate a collision these days. PBKDF2 or something similar.

        You shouldn’t use MD5 for that purpose anymore. It only takes seconds to calculate a collision these days. PBKDF2 or something similar.

        20 votes
      2. csos95
        Link Parent
        Adding to what unkz said, php has built-in functions for secure password hashing and verification. password_hash and password_verify.

        Adding to what unkz said, php has built-in functions for secure password hashing and verification.
        password_hash and password_verify.

        11 votes
      3. [2]
        Power0utage
        Link Parent
        This is what immediately stuck out to me too. OP, please don't store plaintext passwords. Please don't store MD5 or SHA passwords. Ideally, please don't store passwords at all, but if you're going...

        This is what immediately stuck out to me too.

        OP, please don't store plaintext passwords. Please don't store MD5 or SHA passwords. Ideally, please don't store passwords at all, but if you're going to do it, follow best practices. Even better, use a library that does it for you.

        If you're going to run a website, big or small, with user authentication, you need to go in assuming that the database is going to get compromised. You also need to assume that some portion of your users are going to use the same password that they use for their email and bank accounts, so you're probably going to be directly responsible for them getting compromised.

        3 votes
        1. skybrian
          Link Parent
          This is an unconventional choice, but for the forum I’m building, my plan is to require a Github login. I don’t want to touch passwords and I’m not sure I want anyone’s email address either. I...

          This is an unconventional choice, but for the forum I’m building, my plan is to require a Github login. I don’t want to touch passwords and I’m not sure I want anyone’s email address either. I don’t want to worry about account recovery or email delivery. As a bonus, I can rely on Github for usernames, too. Implementing it was pretty easy.

          It’s not hard to sign up with Github and you can create multiple accounts. It will probably be too much friction for some people. I’m not that interested in growth, though.

          Maybe I’ll add direct support for passkeys someday.

          2 votes
      4. zod000
        Link Parent
        To reinforce what unkz said, it would be in your best interest to use something more secure for hashes like PBKDF2 or better. In fact, It might be a good idea to do as little of the password...

        To reinforce what unkz said, it would be in your best interest to use something more secure for hashes like PBKDF2 or better. In fact, It might be a good idea to do as little of the password authentication code on your own as possible.

        3 votes
  6. [3]
    zod000
    Link
    The schema itself looks fine to me at first glance. I would recommend that you are rigorous on enforcing the contents of the data being inserted beyond just using strict mode for SQLite. You...

    The schema itself looks fine to me at first glance. I would recommend that you are rigorous on enforcing the contents of the data being inserted beyond just using strict mode for SQLite. You should look into methods for preventing SQL injection and XSS attacks among other things. Though you may not need it immediately, it may be worth looking for APIs to automate checking for spam or fraud/abuse messages (many are free up to a certain quota). Good luck with your project!

    2 votes
    1. [2]
      Power0utage
      Link Parent
      If you're using PHP, you should be using PDO or mysqli_ with bound parameters to start your SQL injection prevention journey. Like everything else, there are already well-established libraries and...

      If you're using PHP, you should be using PDO or mysqli_ with bound parameters to start your SQL injection prevention journey. Like everything else, there are already well-established libraries and frameworks that can handle most of these gotchas for you.

      As for spam, Akismet used to be the big one and it was pretty easy to integrate. Not sure if that's still the case!

      1 vote
      1. zod000
        Link Parent
        I'm glad you provided some specific examples for the OP, my PHP experience isn't current enough that I felt comfortable giving out highly specific advice.

        I'm glad you provided some specific examples for the OP, my PHP experience isn't current enough that I felt comfortable giving out highly specific advice.

        1 vote