13 votes

What's it take to make a secure, stable, and scalable site?

I think I've identified an unfulfilled need in our current online environment, have an idea of an end-result that could flourish, and some possible ways to monetize without being a cancer on society.
I have some experience with basic site-building, but this would be a very ambitious project for one person. It will probably take a long time before I get it going—if I get it going at all. I know some of what I need to start learning to make it work, but my mind doesn't know where to start when it comes to security, stability, or scalability.

Security:
This is probably my biggest concern. We hear all the time about data breaches, and this idea is—at its core—a kind of social media; by its nature we're dealing with PII.
Where does everyone go wrong here, and how do I protect the information of potential users?
All I've got is that I know storing passwords in plaintext is a bad idea, and that I should encrypt 'stuff'.

Stability:
I'm sure we've all experienced a site going down under a heavy load. Say I can get a site up, things go well for a bit, and then somebody with a big Twitter/Instagram/YouTube audience says something and a horde of people come to check it out. A site that goes down just as you try to check it out doesn't leave a positive impression. Can this sort of thing be prepared for, in case it ever happens?

Scalability:
If I can get a workable site running, I don't expect much of a userbase for a while, if ever. But if it does get large enough to attract hundreds of thousands, or a million+ returning users, it would be nice if I could have the architecture in place to handle that ahead of time, instead of playing catch-up later. What makes a site scalable, and is there any reason it can't be prepared for?

I'd love to be able to pay some folk to help me, but I don't have the cash for that. The initial launch will have to be done by me alone, so I'm fully prepared to learn what I need to and take the long amount of time required (maybe next summer?) to get the minimum product going. Even if this ends up not being feasible to get going alone, I figure the worst case scenario is a lot of learning along the way.

22 comments

  1. [9]
    aphoenix

    This is a huge topic; it's analogous to saying "how does one be a doctor?" and expecting there to be some sort of answer. This is the sort of thing that people learn and implement over a period of years. I'll try to give some general ideas of where to start.

    The shortest answer I can give you is that each of these are things to work towards incrementally. You need to have an idea of what you mean by "secure, stable, scalable" and exactly how much security, stability, and scalability that you need. The answer is that you want enough to be acceptable, but not so much that it's prohibitively expensive. I could, without too much personal effort, set up a site that was nigh unhackable, incredibly stable, and could handle a very large number of concurrent connections. This would be very expensive, and it wouldn't be 100% effective at any of these. It will always be possible for someone to hack your site or for the site to experience too much traffic. The best you can do is to make it more difficult than it is worth to get the information that you have, and for the site to survive normal usage.

    Application / Network Architecture is where a lot of the answers are going to be. You need to know how and where you are going to store data. There are a number of different ways to architect your system with security in mind, and there are a number of different options depending on budgetary and scalability needs.

    That said, here are some things that I would recommend considering:

    • use SSH for server access, with key authentication (not password)
    • don't use FTP
    • only allow SSH from a particular VPN
    • monitor login attempts and log things properly
    • keep your servers up to date - do the security patches as quickly as possible
    • back things up
    • back things up though, like, for real
    • Edit (thanks @Bauke): Know how to restore those backups when things go wrong!
    • keep your data separate from your application, and don't allow public access to your database
    • Edit (thanks @Akir): Validate all your data on the server side. You can pre-validate things in the client, but that is not sufficient validation.

    Those are some things that I tend to do, but they are by no means an exhaustive list - those are just things I remembered off the top of my head. These won't give you great security, but it's a place to start for someone at "don't store passwords as plain text".
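    To make "monitor login attempts" a bit more concrete, here is a minimal sketch of a failed-login throttle, assuming a sliding window of 5 failures per 15 minutes per key (an account name or IP address). The class name, the limits, and the in-memory storage are all illustrative assumptions; a real deployment would keep this state somewhere shared, such as the database, and log every attempt.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque


class LoginThrottle:
    """Hypothetical sliding-window limiter for failed logins, keyed by account or IP."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 15 * 60):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures: dict[str, deque] = defaultdict(deque)

    def _prune(self, key: str, now: float) -> deque:
        # Drop failures that have fallen out of the time window.
        attempts = self._failures[key]
        while attempts and now - attempts[0] > self.window_seconds:
            attempts.popleft()
        return attempts

    def is_blocked(self, key: str, now: float | None = None) -> bool:
        now = time.time() if now is None else now
        return len(self._prune(key, now)) >= self.max_failures

    def record_failure(self, key: str, now: float | None = None) -> None:
        now = time.time() if now is None else now
        self._prune(key, now).append(now)
        # In a real app, this is also where you would write a log line for monitoring.

    def record_success(self, key: str) -> None:
        # A successful login clears the failure history for that key.
        self._failures.pop(key, None)
```

    The login handler would call `is_blocked` before checking the password and `record_failure` on a wrong one, so repeated guessing slows down instead of running unchecked.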

    11 votes
    1. [3]
      Bauke
      • back things up
      • back things up though, like, for real

      And just as important, know how to restore from backups!!! Your backups will be nigh worthless if you do not know how to restore your systems from them. Make sure that in the event of catastrophe, you know exactly what you have to do to get back up and running. Have it written down and practice it. You don't want to be caught with your pants down when shit hits the fan.

      6 votes
      1. aphoenix

        And just as important, know how to restore from backups!!!

        This is super important, and I'm going to add it to my original list. I tend to think of it like this: you've only backed something up if it is successfully retrievable. If you have backed it up in a way that you can't get it back, then you haven't done part 1: back things up.

        This might seem obvious, but it's worth spelling out for people who don't do this regularly or who are just starting out: some backup options don't make it easy to restore from your backups. You have to know the process for doing so.

        4 votes
      2. DrStone

        practice it

        Not only practice, but regularly test. Being comfortable restoring from a backup and having it work in the beginning isn't going to help if over time something changed or screwed up the backup data.
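        One way to "regularly test" is to script the restore and check the result. This is a minimal sketch assuming the data lives in SQLite (Python's built-in `sqlite3` module and its online backup API); the function names and the row-count check are illustrative assumptions, and a real check would compare whatever invariants matter for your data.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def back_up(db: sqlite3.Connection, backup_path: Path) -> None:
    """Copy a live SQLite database to a file with SQLite's online backup API."""
    with closing(sqlite3.connect(backup_path)) as target:
        db.backup(target)


def verify_restore(backup_path: Path, expected_counts: dict) -> bool:
    """Restore the backup into a scratch in-memory database and check it is usable."""
    with closing(sqlite3.connect(backup_path)) as source, \
         closing(sqlite3.connect(":memory:")) as scratch:
        source.backup(scratch)  # the "restore": copy the backup into a fresh database
        if scratch.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            return False
        for table, expected in expected_counts.items():
            # Table names come from our own code here, never from user input.
            (count,) = scratch.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            if count != expected:
                return False
    return True
```

        Running something like `verify_restore` on a schedule (e.g. after every nightly backup) catches the "it worked in the beginning, then quietly broke" case.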

        1 vote
    2. [2]
      Akir

      The shortest answer I can give you is that each of these are things to work towards incrementally.

      While I agree with the sentiment, you should also be putting these factors into consideration while you're designing the framework. That said, it's hard to know how to build your foundations without some experience under your belt, so it might not be a bad idea to bring a more experienced partner on board if you can.

      And here are a few more tips for security:

      • Whatever processes can be done on the server should be done on the server.
      • Never trust software that runs on the client. This also includes basic forms - validate all data server-side.
      • Make sure you only expose data to the client when absolutely necessary
      • Make use of version control systems like Git. Use an external service like GitHub and it doubles as a form of backup.
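      For "only expose data to the client when absolutely necessary", one common pattern is an explicit allow-list of fields per response type. This is a minimal sketch; the `User` fields and `PUBLIC_USER_FIELDS` are hypothetical, chosen only to show that sensitive columns never leave the server unless someone deliberately adds them to the list.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical server-side user record: some fields must never leave the server.
    id: int
    username: str
    display_name: str
    email: str
    password_hash: str
    last_login_ip: str


# Explicit allow-list: only these fields are ever sent to other users' browsers.
PUBLIC_USER_FIELDS = ("username", "display_name")


def to_public_json(user: User) -> dict:
    """Serialize only allow-listed fields, so a newly added column can't leak by default."""
    return {field: getattr(user, field) for field in PUBLIC_USER_FIELDS}
```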
      4 votes
      1. aphoenix

        While I agree with the sentiment, you should also be putting these factors into consideration while you're designing the framework.

        I hope you didn't conclude from what I said that these factors are anything other than necessary from the beginning of planning to launch of a project, and beyond.

        Whatever processes can be done on the server should be done on the server.

        I disagree. Do things where it is appropriate to do so. Sometimes that means things are done client side, and that's okay. We don't even know what the project is - it may be necessary to do some, or even most, of the work on the client side.

        validate all data server-side.

        That's 100% necessary, and I'm adding it to my original list for posterity.

        2 votes
    3. [3]
      CALICO

      This is a huge topic; it's analogous to saying "how does one be a doctor?" and expecting there to be some sort of answer.

      You're not wrong. Before posting this topic I was tracking that I need to learn/brush up on:

      • Databases + SQL
      • JavaScript
      • Servers
      • HTTP
      • Version Control
      • Authentication
      • whatever private messaging entails

      I have pretty solid experience—for a non-professional—with HTML/CSS & PHP, and self hosting.
      I'm under no delusion that this wouldn't be a big step up from what I've touched in the past, but if it was easy then where would be the fun in that? I know it will be hard, and take a while to get a running prototype I can demo to validate the market.


      You need to have an idea of what you mean by "secure, stable, scalable" and exactly how much security, stability, and scalability that you need.

      That's fair.
      By 'secure', I mean protecting users' information such as passwords, messages, login stats, and such, and keeping the backend more difficult to exploit than it's worth. This is my biggest blind spot, and I know that cybersecurity is an entire degree program. Nothing is entirely secure, but I'd rather not miss something obvious that would let any script kiddie crack it wide open. I don't need the same level of security as, say, Google, because I do not plan to take in as much data or become as large as Facebook, for example. As a smaller, less valuable target, realistically it just needs to be "secure enough".

      On stability, my mind goes to small sites that get posted to Reddit and crash under the load of tens of thousands of visitors in an hour, when their typical visitor count might only reach that in a week. I don't really know what makes a site fail like this. On an abstract level, my understanding is that the server is not prepared to handle the load. On a practical level, I don't know what that means.
      Is it the coding of the backend to blame? Is it my host? How is it typically avoided? If it's the host, do they charge different price points for anticipated load? Would I have to upgrade my plan with the host to change that? Is there a way to adjust this sort of thing dynamically? Etc etc

      For scalability, mostly I don't know if there's any difference under the hood (not counting the servers) between a blog with a hundred monthly users, and one with a billion. I assume there is, but I don't know what that difference is.


      All that said, I really appreciate your post! You've given me things to work with (as have @Akir and @Bauke), and I have more of a starting point on these things than I did yesterday.

      3 votes
      1. aphoenix

        Given some more context, I can add a bit more insight:

        By 'secure', I mean protecting users' information such as passwords, messages, login stats, and such, and keeping the backend more difficult to exploit than it's worth.

        I think the best place to start, then, is with a framework that helps make some of these decisions for you. Since you have PHP experience, I would suggest checking out Laravel and seeing the sorts of things that it gets you "for free". There is a section in the documentation about security in Laravel that may address some of the things you're looking for, or may give you ideas on how to do it yourself if you do end up wanting to roll your own from scratch, which I do not recommend. I cannot state this more strongly: using a framework is going to be the best first step toward securing your application.

        The next big thing I would recommend is to consider every piece of data you store and if it is necessary. For example, I would suggest that you never store things like Credit Card numbers, Social Security / Social Insurance Numbers, Driver's Licence numbers, or other official documentation. If you don't store it, you can't leak it, and most of the time there's no need to store things like that. After that, consider each piece of information that you want to store for relevancy; it might be "nice" to store honourifics, pronouns, or gender, but is there any utility in doing so? (it's okay if your answer is "yes")

        my mind goes to small sites that get posted to Reddit and crash under the load of tens of thousands of visitors in an hour

        This is a common issue that people bring up, but the reality is that in 99% of cases, it's fine for the server to crash if there is a sudden 1000000% spike in traffic. For most sites, you want to arrange the cost of hosting such that you are paying an appropriate amount for the traffic that you have almost all the time, with a bit of a buffer. You don't want to be paying for the ability to scale to 10000000% capacity at a moment's notice if that's going to happen one time, because in most cases that's just throwing money away. Here's a pirate-based analogy: you have a bit of treasure on your ship, and you want to take it from Point A to Point B. When you're doing this, you hire an appropriately sized crew to deal with a pirate ship, should the need arise. You do not hire the entire flotilla of the Imperial British Navy, because that's an inappropriate amount of resourcing for your needs.

        Scaling is a place where you can address stability; consider, in the previous analogy, if you could hire just one guy to protect your treasure, but when the need arose, hundreds more could just be zapped into existence! I'm pretty excited about this pirate-themed analogy. With some work, you could set up your application to spin up more servers based on demand. This would almost necessarily be done via some kind of cloud computing (Amazon Web Services, Google Cloud, MS Azure, etc.) and you would have some set of rules about how to bring on more application servers when needed... but this is almost certainly overkill at this point. I would suggest that you try to think of things like this:

        • get access to a small Linode server, or something similar, and do your development there
        • separate your services into different silos that make sense. A database service, an application service. Those are typically the only two that you need. As you grow, you may identify the need for more services, but that's a good spot to start.
        • use a framework to develop your application. Keep it simple; get a proof of concept working.
        • monitor your traffic usage - when you see an uptick in traffic, requisition bigger servers
        • set up some basic firewall rules (remove standard

        If things are successful and work, then you'll want to think about things like this:

        • am I spending too much time deploying this application to the servers? -> I need a build pipeline
        • are the server resources frequently taxed? -> am I caching enough things?
        • are the server resources taxed, but I'm still caching things? -> I need a bigger server
        • are there spikes in server resources that happen with some frequency? -> I need to be able to scale based on traffic
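        As an illustration of the "set of rules about how to bring on more application servers", here is a sketch of a proportional scaling rule modeled on the one described in the Kubernetes Horizontal Pod Autoscaler docs (desired = ceil(current × observed / target), with a tolerance band). The function name, the 60% CPU target, the bounds, and the 0.1 tolerance are assumptions for illustration; cloud providers expose their own settings for this.

```python
import math


def desired_servers(current: int, avg_cpu: float, target_cpu: float = 0.6,
                    min_servers: int = 1, max_servers: int = 10,
                    tolerance: float = 0.1) -> int:
    """Pick a server count that moves average utilization toward target_cpu.

    Proportional rule: desired = ceil(current * avg_cpu / target_cpu),
    clamped to [min_servers, max_servers].
    """
    ratio = avg_cpu / target_cpu
    # Ignore small deviations so the fleet doesn't flap up and down.
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    return max(min_servers, min(max_servers, desired))
```

        The `max_servers` cap doubles as a spending limit, which matches the point above about not paying for capacity you'll almost never need.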

        There are tons more points I could put on this list, but I think the most important part is to start building. I think you have the right ideas in your head about how to move forward ("I want to be secure and stable") and I think it's important to remember that perfect is the enemy of good; sometimes perfect is the enemy of doing things. There's lots that I glossed over too - this is a really barebones comment - but hopefully this is enough to help get you started.

        3 votes
      2. skybrian

        Consider whether you need to store passwords at all, or whether you could rely on some other identity provider such as email. For a website that people use rarely, it might be okay to send them an email with a link whenever they want to log in. Even on a website used frequently, you could rely on an authentication cookie between rare logins.

        If you have passwords then you will need password recovery so it might make sense just to rely on recovery.

        But this assumes people have working email and will remember which email they gave you and they won't lose their account. It depends on the audience and how bad it is for someone to lose access to their account. If it's important then for redundancy, they might need multiple ways to get to their account, such as registering multiple email addresses or alternatives such as Google, GitHub, or Facebook authentication.

        Which is to say, a problem like managing user accounts can be easy or hard depending on what assumptions you make about your audience, how important it is to get it right and what other services you can leverage. Truly universal service is hard, but there are easier solutions that might be sufficient.
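        The "email with a link" login (often called a magic link) can be sketched as: generate an unguessable random token, store only its hash with an expiry, email the link, and accept each token once. Everything here is an assumption for illustration: the 15-minute lifetime, the in-memory `_pending` store, and the `https://example.com/login` URL; sending the email itself is left out.

```python
import hashlib
import secrets
import time
from typing import Optional

TOKEN_TTL_SECONDS = 15 * 60  # assumption: links expire after 15 minutes

# Hypothetical in-memory store: sha256(token) -> (email, expires_at).
# Only the hash is stored, so a leaked table can't be replayed as login links.
_pending: dict = {}


def create_login_link(email: str, base_url: str, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # unguessable, URL-safe random string
    digest = hashlib.sha256(token.encode()).hexdigest()
    _pending[digest] = (email, now + TOKEN_TTL_SECONDS)
    return f"{base_url}?token={token}"  # this URL is what gets emailed


def redeem_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the email the token was issued for, or None. Each token works once."""
    now = time.time() if now is None else now
    digest = hashlib.sha256(token.encode()).hexdigest()
    entry = _pending.pop(digest, None)  # pop makes the link single-use
    if entry is None:
        return None
    email, expires_at = entry
    return email if now <= expires_at else None
```

        After a successful `redeem_token`, the server would set the long-lived authentication cookie mentioned above, so the user only goes through the email step occasionally.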

        1 vote
  2. [7]
    TACD

    Good news! Your biggest concern is possibly the easiest one to address: You can avoid data breaches and the loss of PII by not collecting any in the first place. The overwhelming majority of information collected by sites isn't absolutely necessary for delivering a service and is collected for marketing information or 'just in case'.

    Collect the absolute bare minimum of information you need, and then challenge yourself as to whether you even need that. A genuine, provable dedication to user privacy is still quite a rare USP, and nobody can steal what you don't have.

    9 votes
    1. [6]
      teaearlgraycold

      And you don’t even need passwords if you use magic links or OAuth

      1 vote
      1. [5]
        Wes

        I'm not familiar with magic links. Would that be some sort of nonce in a URL emailed to the user for authentication?

        2 votes
        1. StellarTabi

          but then you have to ask for an email address :(

          1 vote
  3. [2]
    skybrian

    Keeping things simple and scalable means keeping it a static website as much as possible. A static website generator can do arbitrarily tricky stuff but it can be kept offline when you're not using it, or at least out of the serving path.

    4 votes
    1. Wes

      I love SSGs, but I don't know if they're really applicable for social media websites as CALICO described. You're likely going to need a database at some point.

      3 votes
  4. [2]
    stu2b50

    The other post is right. These are big-idea questions that companies and startups hire senior developers at 300k-500k/yr to answer. There's no simple answer. Each one is an entire domain that a team of engineers would focus on.

    Edit

    To actually be helpful, here are some things to consider

    • as a small site, try to delegate as much as possible. OAuth lets Google or Facebook handle it, along with the quite literally tens of millions of dollars of salary they dedicate to making it work

    • to reiterate, password security is an incredible pain in the ass, pls use OAuth instead.

    • if you really must, you MUST hash and salt your passwords, preferably with a well-regarded hash like bcrypt or Argon2. Use a library; don't implement either yourself. Salting is important; thankfully many libraries do it for you now.

    • it's extremely tedious, but having really solid dev architecture is key for stability. That means a well-thought-out code review process, solid CI, and setting up k8s or an alternative to manage deploys, automatic health checks, and recovery.

    • scalability is why companies spend billions of dollars on AWS/Azure/etc. Esp with k8s, you can quickly spin up more clusters to handle demand.

    • also just keep it in mind
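    For the "hash and salt your passwords" point, here is a minimal sketch. bcrypt and Argon2 need third-party packages in Python, so this swaps in scrypt, a memory-hard key-derivation function that ships in the standard library as `hashlib.scrypt`. The cost parameters and the `scrypt$N$r$p$salt$hash` storage format are assumptions; tune the costs for your hardware.

```python
import hashlib
import hmac
import secrets

# scrypt cost parameters (assumed reasonable defaults; tune for your hardware).
N, R, P = 2**14, 8, 1


def hash_password(password: str) -> str:
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    key = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P)
    # Store the parameters and salt alongside the hash so it can be verified (or upgraded) later.
    return f"scrypt${N}${R}${P}${salt.hex()}${key.hex()}"


def verify_password(password: str, stored: str) -> bool:
    _, n, r, p, salt_hex, key_hex = stored.split("$")
    key = hashlib.scrypt(password.encode(), salt=bytes.fromhex(salt_hex),
                         n=int(n), r=int(r), p=int(p))
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(key, bytes.fromhex(key_hex))
```

    Because each password gets its own salt, two users with the same password end up with different stored hashes.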

    3 votes
    1. StellarTabi

      Oauth makes Google or Facebook handle it

      also opens you up to weird bugs and/or threat vectors.

      • sign up for Trello with an email address at a domain? Somebody makes an Atlassian organization with that domain? Your Atlassian avatar now shows up in Trello! Are you able to log in with both a password and a "Login with your Atlassian account"? Is your implementation of this secure? Is Atlassian's? Did your user expect this? Can my account be hijacked by someone who only knows my name or only knows my email address?
      • sign up anywhere with your email address or your Facebook account. Then sign up again with the other (assume the same email address is associated with both). Do you have two accounts? Did you gain access to your account without entering a password or verifying your email address?
      • sign up anywhere with your Facebook account. What if you need to change your email? Does it work? What if you change your email address on Facebook itself?
      • sign up anywhere with your Facebook account. What if Facebook terminates your account for literally no reason?
      • sign up anywhere with your Facebook account. Make a Twitter account with the same email. Log into the service. Does this Twitter account give you access to the account associated with the Facebook account? Can you still log in with your Facebook account? Did you have to verify your Twitter account by email to even do this?
      • Facebook decides your website can no longer authenticate users with their Facebook accounts, or randomly revokes your keys. gf hf.
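      Several of these traps come from matching OAuth logins by email address. A common mitigation is to key each external identity by the provider's stable user ID (for example the `sub` claim in OpenID Connect) and to link a second provider only from a session that is already logged in. This is a minimal in-memory sketch; `Account`, `AccountStore`, and the provider IDs in the usage are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    id: int
    email: str
    # Linked external identities: (provider, provider's stable user id).
    identities: set = field(default_factory=set)


class AccountStore:
    """Hypothetical store that looks up OAuth logins by (provider, subject), never by email."""

    def __init__(self) -> None:
        self._by_identity: dict = {}
        self._next_id = 1

    def login_with_provider(self, provider: str, subject: str, email: str) -> Account:
        key = (provider, subject)
        account = self._by_identity.get(key)
        if account is not None:
            return account  # email changes at the provider don't break login
        # Unknown identity: create a NEW account. A matching email alone is not
        # proof of ownership, so we never silently merge into an existing account.
        account = Account(id=self._next_id, email=email, identities={key})
        self._next_id += 1
        self._by_identity[key] = account
        return account

    def link_identity(self, account: Account, provider: str, subject: str) -> None:
        """Call only when the user is already logged in to `account` (ideally re-authenticated)."""
        key = (provider, subject)
        if key in self._by_identity:
            raise ValueError("identity already linked to an account")
        account.identities.add(key)
        self._by_identity[key] = account
```

      The trade-off is that a user who signs up twice with different providers gets two accounts until they link them deliberately, which is annoying but much safer than an automatic merge.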

      One thing that would help guard against the account-hijacking possibilities above is implementing TOTP (time-based one-time passwords) in your webapp. Not through SMS: that's hijackable, and I don't want to enter my phone number. Google Authenticator, Authy, and LastPass have TOTP apps.
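      TOTP is small enough to sketch with the standard library: it is HOTP (RFC 4226, an HMAC-SHA1 of a counter, truncated to a few digits) where the counter is the number of 30-second steps since the Unix epoch (RFC 6238). The function names and the ±1-step drift window are assumptions; in production you would prefer a maintained library and also reject reuse of a code that was already accepted.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 of the 8-byte counter, dynamically truncated."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP where the counter is the number of time steps since the epoch."""
    at = time.time() if at is None else at
    return hotp(key, int(at // step), digits)


def verify_totp(key: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    at = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(key, at + i * step, step, digits), submitted)
        for i in range(-window, window + 1)
    )


def key_from_base32(secret_b32: str) -> bytes:
    # Authenticator apps usually exchange the shared secret as Base32 (often via a QR code).
    return base64.b32decode(secret_b32.upper() + "=" * (-len(secret_b32) % 8))
```

      The checks below use the published test secret from the RFCs (`12345678901234567890`), so the expected codes come straight from the RFC test vectors.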

      3 votes
  5. StellarTabi

    Stability: Strong type system + High Test Coverage.

    A strong type system is NOT a substitute for automated tests, and automated tests are NOT a substitute for a strong type system. Side note: languages with strong types tend to be faster and less memory-intensive. You might spend more time up front battling the type system and writing tests (Test-Driven Development, AKA "TDD"), but you'll save even more time down the line: types force you to define behavior for more edge cases, tests cover your base APIs, bugfixes should be covered by tests, and new features get covered with new tests. When new features or major refactoring happen, the type system and tests (better together) will let you fearlessly push new releases.

    Tests should cover both the "happy path" and the "unhappy paths" as much as possible. Make sure things that shouldn't work actually don't work (validation errors, user input, fuzzing, references to invalid objects, API calls that require permission and/or ownership to view, edit, or delete).
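    A minimal sketch of what happy-path and unhappy-path tests look like, using Python's built-in `unittest`. The `Post` type and `edit_post` handler are hypothetical stand-ins for a real API endpoint; the point is that the permission and validation failures get tests of their own, including a check that nothing changed.

```python
import unittest
from dataclasses import dataclass


@dataclass
class Post:
    id: int
    owner_id: int
    body: str


class PermissionDenied(Exception):
    pass


def edit_post(post: Post, user_id: int, new_body: str) -> Post:
    """Hypothetical API handler: only the owner may edit, and the body must be non-empty."""
    if post.owner_id != user_id:
        raise PermissionDenied("not your post")
    if not new_body.strip():
        raise ValueError("body must not be empty")
    post.body = new_body
    return post


class EditPostTests(unittest.TestCase):
    def test_owner_can_edit(self):  # happy path
        post = Post(1, owner_id=7, body="hi")
        self.assertEqual(edit_post(post, 7, "hello").body, "hello")

    def test_other_user_cannot_edit(self):  # unhappy path: permissions
        post = Post(1, owner_id=7, body="hi")
        with self.assertRaises(PermissionDenied):
            edit_post(post, 8, "pwned")
        self.assertEqual(post.body, "hi")  # and nothing changed

    def test_empty_body_rejected(self):  # unhappy path: validation
        with self.assertRaises(ValueError):
            edit_post(Post(1, owner_id=7, body="hi"), 7, "   ")
```

    These run with `python -m unittest` pointed at the file.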

    People shouldn't be getting "500 errors" when they fill out a form. Give them useful validation messages instead.
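    One way to get there: validate on the server, collect a message per field, and return a 400 with those messages rather than letting bad input reach code that crashes. This is a sketch under assumed rules; the username pattern and the deliberately loose email check are illustrative (the real proof of a working email is sending a confirmation to it).

```python
import re
from typing import Dict, Tuple

USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")  # assumed username rules


def validate_signup(form: dict) -> Dict[str, str]:
    """Return {field: message}; an empty dict means the form is valid."""
    errors: Dict[str, str] = {}
    username = str(form.get("username", "")).strip()
    email = str(form.get("email", "")).strip()
    if not USERNAME_RE.fullmatch(username):
        errors["username"] = "Use 3-20 letters, digits, or underscores."
    if "@" not in email or len(email) > 254:  # loose check on purpose
        errors["email"] = "Enter a valid email address."
    return errors


def handle_signup(form: dict) -> Tuple[int, dict]:
    """Bad input gets a 400 with messages the user can act on, never a 500."""
    errors = validate_signup(form)
    if errors:
        return 400, {"errors": errors}
    # Account creation would happen here.
    return 201, {"ok": True}
```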

    Database transactions for complex DB updates so that if SQL statement 2 out of 3 fails in an API call, you don't leave a resource in an unwanted state.
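    A small illustration with Python's built-in `sqlite3`: using the connection as a context manager commits if the block finishes and rolls back if anything inside raises, so a failure partway through leaves no half-applied update. The `users` table with a `credits` column and the `transfer_credits` function are hypothetical.

```python
import sqlite3


def transfer_credits(db: sqlite3.Connection, from_id: int, to_id: int, amount: int) -> None:
    """Move credits between users atomically: both UPDATEs apply, or neither does."""
    with db:  # commits on success, rolls back if any statement or check raises
        db.execute("UPDATE users SET credits = credits - ? WHERE id = ?", (amount, from_id))
        (balance,) = db.execute("SELECT credits FROM users WHERE id = ?", (from_id,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient credits")  # undoes the first UPDATE
        db.execute("UPDATE users SET credits = credits + ? WHERE id = ?", (amount, to_id))
```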

    Learn more SQL than your ORM exposes, and fully learn your ORM. Good ORMs allow you to make complex, composable SQL queries and also let you mix in raw SQL. Learn about indexes and EXPLAIN for performance.
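    To see what EXPLAIN tells you, here is a sketch using SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3`, with a hypothetical `users` table: the same lookup goes from a full table scan to an index search once an index exists. The exact plan wording varies between SQLite versions (and other databases have their own EXPLAIN output), so treat the comments as a guide rather than exact text.

```python
import sqlite3


def query_plan(db: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Return SQLite's plan text for a query (the 'detail' column of EXPLAIN QUERY PLAN)."""
    return [row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
lookup = "SELECT id FROM users WHERE email = ?"

print(query_plan(db, lookup, ("c@example.com",)))  # expect a full SCAN of users
db.execute("CREATE INDEX idx_users_email ON users (email)")
print(query_plan(db, lookup, ("c@example.com",)))  # now a SEARCH using idx_users_email
```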

    Use integer IDs for internal references. If you want, hide them in API calls (expose a UUID instead) so that outside observers can't figure out how many products or users are in your DB; this also makes guessing URLs for new items or other users' items harder. UUIDs are also a good fit if your DB will be sharded across regions. If you're a B2B (like Slack), you might want to isolate each customer in their own DB. Enterprise customers should quit you if another customer's data shows up, because that means another customer is probably seeing their data.
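    A sketch of the integer-inside, UUID-outside pattern, using Python's `sqlite3` and `uuid` modules. The `products` table, its `public_id` column, and both functions are hypothetical; the idea is that every API lookup goes through the random `public_id`, while joins and foreign keys keep using the compact integer `id`.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE products (
        id        INTEGER PRIMARY KEY,   -- internal: compact, fast joins
        public_id TEXT NOT NULL UNIQUE,  -- external: random, unguessable
        name      TEXT NOT NULL
    )
""")


def create_product(name: str) -> str:
    public_id = str(uuid.uuid4())  # random UUID: reveals nothing about row counts
    with db:
        db.execute("INSERT INTO products (public_id, name) VALUES (?, ?)", (public_id, name))
    return public_id  # this is what goes in URLs, e.g. /products/<public_id>


def get_product(public_id: str):
    """API lookups go through public_id; the integer id never leaves the server."""
    row = db.execute("SELECT name FROM products WHERE public_id = ?", (public_id,)).fetchone()
    return None if row is None else {"id": public_id, "name": row[0]}
```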

    Always use prepared statements in SQL (see SQL Injection). Don't use database users with full privileges (why would the database user your main application runtime uses be allowed to drop any table, be an owner of the DB, or be an admin user?). Use a separate DB user account for altering tables and administration.
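    Here is the difference between string-built SQL and a parameterized query, sketched with Python's built-in `sqlite3` as a stand-in for whatever database you use (SQLite has no database users, so the least-privilege advice isn't shown). The table and the hostile input are made up for the demonstration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
db.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

hostile = "nobody' OR '1'='1"

# UNSAFE: string formatting splices attacker text into the SQL itself,
# turning the WHERE clause into: username = 'nobody' OR '1'='1'
unsafe_sql = f"SELECT username FROM users WHERE username = '{hostile}'"
leaked = db.execute(unsafe_sql).fetchall()  # returns every user

# SAFE: the placeholder sends the value separately; it is only ever treated as data.
safe = db.execute("SELECT username FROM users WHERE username = ?", (hostile,)).fetchall()
```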

    Don't trust user input. If you accept HTML, someone will try to put JavaScript in it. If you don't accept HTML, someone will still try to put JavaScript in it just in case you don't sanitize it. If data input by one user is shown to another user, make sure it's escaped so that you don't compromise your users.
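    For plain-text input that gets shown to other users, escaping at output time is the basic defense. A minimal sketch with Python's standard `html.escape`; the `render_comment` function is hypothetical, and in practice a templating engine with auto-escaping does this for you. Accepting actual HTML is a different problem that needs an allow-list sanitizer, not just escaping.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Escape user-supplied text before inserting it into HTML shown to other users."""
    return f"<p><b>{html.escape(author)}</b>: {html.escape(body)}</p>"
```

    With this, a comment body of `<script>alert('hi')</script>` is displayed as literal text instead of running in other users' browsers.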

    Don't let users upload files directly to your server. Always upload them to something like AWS S3.

    Anyone can upload a JPEG whose filename ends in .png. A *.exe could be named *.jpg. An image file can have a password-protected *.zip file full of pirated content embedded in it while still being a valid image file. Your safest bet is to store the uploaded images privately, use a serverless function (e.g. AWS Lambda, i.e. not your real application server) to make filesize-optimized thumbnails, and only show those thumbnails to public end users.
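    Since the filename proves nothing, a first check is to look at the file's leading "magic bytes". This is a sketch covering only a few formats, with a hypothetical `sniff_image_type` helper. It is not sufficient on its own (a file can start like a valid image and still carry an embedded archive), which is why re-encoding into fresh thumbnails, as described above, is the stronger step.

```python
from typing import Optional

# Leading "magic bytes" of a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the real format from file contents, ignoring the user-supplied filename."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None  # unknown or not an image (e.g. a Windows .exe starts with b"MZ")
```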

    3 votes
  6. thistle

    Scalability should not be your top priority at this point. If you prioritise scaling too early, then you'll lose valuable time that could be spent making what you have work better for your early user base. Making architectural changes to scale up later isn't always as hard as you might think.

    That said, there is a balance to it - you should make things with some level of generalisation in mind. Modularity is great for this (writing bits of code that interface with each other in pre-defined ways, allowing you to change their internal workings without having to change other modules).
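    As a small illustration of that kind of modularity, here is a sketch using Python's `typing.Protocol`: the rest of the app only talks to a `MessageStore` interface, so a simple first implementation can later be replaced (say, by a database-backed or sharded one) without touching callers. All names here are hypothetical.

```python
from typing import Dict, Protocol


class MessageStore(Protocol):
    """The only interface the rest of the app is allowed to use for messages."""

    def save(self, conversation_id: str, text: str) -> None: ...
    def history(self, conversation_id: str) -> list: ...


class InMemoryMessageStore:
    """Simple first implementation; could later be swapped for a database-backed one."""

    def __init__(self) -> None:
        self._data: Dict[str, list] = {}

    def save(self, conversation_id: str, text: str) -> None:
        self._data.setdefault(conversation_id, []).append(text)

    def history(self, conversation_id: str) -> list:
        return list(self._data.get(conversation_id, []))


def post_message(store: MessageStore, conversation_id: str, text: str) -> list:
    # Application code depends only on the MessageStore interface, not on how it's stored.
    store.save(conversation_id, text)
    return store.history(conversation_id)
```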

    1 vote