This is a huge topic; it's analogous to saying "how does one be a doctor?" and expecting there to be some sort of answer. This is the sort of thing that people learn and implement over a period of years. I'll try to give some general ideas of where to start.
The shortest answer I can give you is that each of these are things to work towards incrementally. You need to have an idea of what you mean by "secure, stable, scalable" and exactly how much security, stability, and scalability that you need. The answer is that you want enough to be acceptable, but not so much that it's prohibitively expensive. I could, without too much personal effort, set up a site that was nigh unhackable, incredibly stable, and could handle a very large number of concurrent connections. This would be very expensive, and it wouldn't be 100% effective at any of these. It will always be possible for someone to hack your site or for the site to experience too much traffic. The best you can do is to make it more difficult than it is worth to get the information that you have, and for the site to survive normal usage.
Application / Network Architecture is where a lot of the answers are going to be. You need to know how and where you are going to store data. There are a number of different ways to architect your system with security in mind, and there are a number of different options depending on budgetary and scalability needs.
That said, here are some things that I would recommend considering:
use SSH for server access, with key authentication (not password)
don't use FTP
only allow SSH from a particular VPN
monitor login attempts and log things properly
keep your servers up to date - do the security patches as quickly as possible
back things up
back things up though, like, for real
Edit (thanks @Bauke): Know how to restore those backups when things go wrong!
keep your data separate from your application, and don't allow public access to your database
Edit (thanks @Akir): Validate all your data on the server side. You can pre-validate things in the client, but that is not sufficient validation.
Those are some things that I tend to do, but they are by no means an exhaustive list - those are just things I remembered off the top of my head. These won't give you great security, but it's a place to start for someone at "don't store passwords as plain text".
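To make the server-side validation item in the list above a bit more concrete, here is a minimal sketch in Python. The field names (`username`, `email`), the length limits, and the deliberately simple email pattern are all assumptions for illustration; a framework's own validation layer would normally do this for you.

```python
import re

# Deliberately simple pattern for illustration; it is an assumption, not a
# complete email grammar.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_signup(form: dict) -> dict:
    """Return a mapping of field name -> error message; empty means valid.

    This runs on the server regardless of any checks the browser already did.
    """
    errors = {}
    username = str(form.get("username", "")).strip()
    email = str(form.get("email", "")).strip()

    if not 3 <= len(username) <= 30:
        errors["username"] = "Username must be 3-30 characters."
    if not EMAIL_RE.match(email):
        errors["email"] = "Enter a valid email address."
    return errors
```

The point is only that the server re-checks everything it receives; client-side checks are a convenience for the user, not a defence.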
And just as important, know how to restore from backups!!!
This is super important, and I'm going to add it to my original list. I tend to think of it like this: you've only backed something up if it is successfully retrievable. If you have backed it up in a way that you can't get it back, then you haven't done part 1: back things up.
This seems like it might be obvious to people who don't necessarily do this, or who are just starting out, but some backup options don't make it easy to restore from your backups. You have to know the process for doing so.
practice it
Not only practice, but regularly test. Being comfortable restoring from a backup and having it work in the beginning isn't going to help if over time something changed or screwed up the backup data.
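As a rough illustration of "regularly test", here is a sketch that backs up a SQLite database and then opens the backup the way a restore would, checking integrity and row counts. SQLite, the `users` table, and the function names are assumptions chosen to keep the example self-contained; the same idea applies to whatever database and backup tool you actually use.

```python
import os
import sqlite3
import tempfile


def backup_database(source: sqlite3.Connection, backup_path: str) -> None:
    """Copy the live database into a backup file."""
    dest = sqlite3.connect(backup_path)
    try:
        source.backup(dest)
    finally:
        dest.close()


def verify_backup(backup_path: str, expected_users: int) -> bool:
    """Open the backup as if restoring it and check the data is usable."""
    conn = sqlite3.connect(backup_path)
    try:
        intact = conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        return intact and count == expected_users
    finally:
        conn.close()


# Demo with a throwaway in-memory "live" database (hypothetical data).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
live.executemany("INSERT INTO users (name) VALUES (?)", [("ann",), ("bob",)])
live.commit()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "backup.db")
    backup_database(live, path)
    restored_ok = verify_backup(path, expected_users=2)
```

Running a check like this on a schedule is what catches the case where the backup job still "succeeds" but the data it writes has quietly become unusable.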
The shortest answer I can give you is that each of these are things to work towards incrementally.
While I agree with the sentiment, you should also be putting these factors into consideration while you're designing the framework. That said, it's hard to understand how to build your foundations without having experience under your belt, so it might not be a bad idea to bring a more experienced partner on board if you can.
And here are a few more tips for security:
Whatever processes can be done on the server should be done on the server.
Never trust software that runs on the client. This also includes basic forms - validate all data server-side.
Make sure you only expose data to the client when absolutely necessary
Make use of version control systems like Git. Use an external service like GitHub and it doubles as a form of backup.
While I agree with the sentiment, you should also be putting these factors into consideration while you're designing the framework.
I hope you didn't conclude from what I said that these factors are anything other than necessary from the beginning of planning to launch of a project, and beyond.
Whatever processes can be done on the server should be done on the server.
I disagree. Do things where it is appropriate to do so. Sometimes that means things are done client side, and that's okay. We don't even know what the project is - it may be necessary to do some, or even most, of the work on the client side.
validate all data server-side.
That's 100% necessary, and I'm adding it to my original list for posterity.
Given some more context, I can add a bit more insight:
By 'secure', I mean protecting users information such as passwords, messages, login stats, and such, and keeping the backend more difficult to exploit than its worth.
I think the best place to start, then, is with a framework that helps make some of these decisions for you. Since you have PHP experience, I would suggest checking out Laravel and seeing the sorts of things that it gets you "for free". There is a section in the documentation about security in Laravel that may address some of the things that you're looking for, or may give you ideas for how to do it yourself if you do end up wanting to roll your own from scratch, which I do not recommend. I cannot state this more strongly - using a framework to do this is going to be the best first step to security of your application.
The next big thing I would recommend is to consider every piece of data you store and whether it is necessary. For example, I would suggest that you never store things like Credit Card numbers, Social Security / Social Insurance Numbers, Driver's Licence numbers, or other official documentation. If you don't store it, you can't leak it, and most of the time there's no need to store things like that. After that, consider each piece of information that you want to store for relevancy; it might be "nice" to store honorifics, pronouns, or gender, but is there any utility in doing so? (it's okay if your answer is "yes")
my mind thinks to small sites that get posted on to reddit and crash under the load of tens of thousands of visitors in an hour
This is a common issue that people bring up, but the reality is that in 99% of cases, the server should crash if suddenly there is a spike in traffic by 1000000%. For most sites, you want to arrange the cost of hosting such that you are paying an appropriate amount for the traffic that you have almost all the time, with a bit of a buffer. You don't want to be paying for the ability to scale to 10000000% capacity at a moment's notice if that's going to happen one time, because in most cases that's just throwing money away. Here's a pirate-based analogy: you have a bit of treasure on your ship, and you want to take it from Point A to Point B. When you're doing this, you hire an appropriately sized crew to deal with a pirate ship, should the need arise. You do not hire the entire flotilla of the Imperial British Navy, because that's an inappropriate amount of resourcing for your needs.
Scaling is a place where you can address stability; consider in the previous analogy if you could hire just one guy to protect your treasure, but when the need arises, hundreds more could just be zapped into existence! I'm pretty excited about this pirate-themed analogy. With some work, you could set up your application to spin up more servers based on demand. This would almost necessarily be done via some kind of cloud computing (Amazon Web Services, Google Cloud, MS Azure, etc) and you would have some set of rules about how to bring on more application servers when needed... but this is almost certainly overkill at this point. I would suggest that you try to think of things like this:
get access to a small Linode server, or something similar, and do your development there
separate your services into different silos that make sense. A database service, an application service. Those are typically the only two that you need. As you grow, you may identify the need for more services, but that's a good spot to start.
use a framework to develop your application. Keep it simple; get a proof of concept working.
monitor your traffic usage - when you see an uptick in traffic, requisition bigger servers
set up some basic firewall rules (remove standard
If things are successful and work, then you'll want to think about things like this:
am I spending too much time deploying this application to the servers? -> I need a build pipeline
are the server resources frequently taxed? -> am I caching enough things?
are the server resources taxed, but I'm still caching things? -> I need a bigger server
are there spikes in server resources that happen with some frequency? -> I need to be able to scale based on traffic
There are tons more points I could put on this list, but I think the most important part is to start building. I think you have the right ideas in your head about how to move forward ("I want to be secure and stable") and I think it's important to remember that perfect is the enemy of good; sometimes perfect is the enemy of doing things. There's lots that I glossed over too - this is a really barebones comment - but hopefully this is enough to help get you started.
Consider whether you need to store passwords at all, or whether you could rely on some other identity provider such as email. For a website that people use rarely, it might be okay to send them an email with a link whenever they want to log in. Even on a website used frequently, you could rely on an authentication cookie between rare logins.
If you have passwords then you will need password recovery so it might make sense just to rely on recovery.
But this assumes people have working email and will remember which email they gave you and they won't lose their account. It depends on the audience and how bad it is for someone to lose access to their account. If it's important then for redundancy, they might need multiple ways to get to their account, such as registering multiple email addresses or alternatives such as Google, GitHub, or Facebook authentication.
Which is to say, a problem like managing user accounts can be easy or hard depending on what assumptions you make about your audience, how important it is to get it right and what other services you can leverage. Truly universal service is hard, but there are easier solutions that might be sufficient.
Good news! Your biggest concern is possibly the easiest one to address: You can avoid data breaches and the loss of PII by not collecting any in the first place. The overwhelming majority of information collected by sites isn't absolutely necessary for delivering a service and is collected for marketing information or 'just in case'.
Collect the absolute bare minimum of information you need, and then challenge yourself as to whether you even need that. A genuine, provable dedication to user privacy is still quite a rare USP, and nobody can steal what you don't have.
Keeping things simple and scalable means keeping it a static website as much as possible. A static website generator can do arbitrarily tricky stuff but it can be kept offline when you're not using it, or at least out of the serving path.
I love SSGs, but I don't know if they're really applicable for social media websites as CALICO described. You're likely going to need a database at some point.
The other post is right. These are big idea questions that companies and startups hire senior developers at 300k-500k/yr to answer. There's no simple answer. Each one is an entire domain that a team of engineers would focus on.
Edit
To actually be helpful, here are some things to consider
as a small site, try to delegate as much as possible. OAuth makes Google or Facebook handle it, along with the quite literal tens of millions of dollars of salary they dedicate to making it work
to reiterate, password security is an incredible pain in the ass, pls use OAuth instead.
if you really must, you MUST hash and salt your passwords, preferably with a well-regarded hash like bcrypt or Argon2. Use a library, don't implement either yourself. Salting is important; thankfully many libraries do it for you now.
it's extremely tedious, but having really solid dev architecture is key for stability. That means a really well thought out code review process, solid CI, and setting up k8s or an alternative to manage deploys, automatic health checks, and recovery.
scalability is why companies spend billions of dollars on AWS/Azure/etc. Esp with k8s, you can quickly spin up more clusters to handle demand.
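For the "hash and salt with a library" point, here is a small sketch. Note the swap: it uses `hashlib.scrypt` from Python's standard library rather than bcrypt or Argon2, purely so the example has no third-party dependencies; the structure (random per-password salt, slow hash, constant-time comparison) is the same. The cost parameters are illustrative assumptions, not recommendations.

```python
import hashlib
import hmac
import os

# Cost parameters are illustrative assumptions; tune them for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). Store both; never store the password itself."""
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(digest, expected)
```

Because the salt is random, hashing the same password twice gives two different digests, which is exactly what makes precomputed lookup tables useless against a leaked table of hashes.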
OAuth makes Google or Facebook handle it
also opens you up to weird bugs and/or threat vectors.
Sign up for Trello with an email address at a domain? Somebody makes an Atlas organization with that domain? Your Atlas avatar now shows up in Trello! Are you able to log in with both a password and a "Login with your Atlas account"? Is your implementation of this secure? Is Atlas's? Did your user expect this? Can my account be hijacked by someone who only knows my name or only knows my email address?
Sign up anywhere with your email address or your Facebook account. Then sign up again with the other (assume the same email address is associated with both). Do you have two accounts? Did you gain access to your account without entering a password or verifying your email address?
Sign up anywhere with your Facebook account. What if you need to change your email? Does it work? What if you change your email address on Facebook itself?
Sign up anywhere with your Facebook account. What if Facebook terminates your account for literally no reason?
Sign up anywhere with your Facebook account. Make a Twitter account with the same email. Log into the service. Does this Twitter account give you access to the account associated with the Facebook account? Can you still log in with your Facebook account? Did you have to verify your Twitter account by email to even do this?
Facebook decides your website can no longer authenticate users with their Facebook accounts, or randomly revokes your keys. gf hf.
One thing that would help guard against the account hijacking possibilities of the above is implementing TOTP (one-time passwords) in your webapp. Not through SMS, that's hijackable and I don't want to enter my phone number. Google Authenticator/Authy/LastPass have TOTP apps.
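For a sense of what TOTP involves on the server side, here is a minimal sketch following RFC 6238 (which builds on HOTP from RFC 4226), using only the standard library. The function names and the one-step drift window are assumptions for illustration; in a real app you would more likely use a maintained library and also handle secret provisioning and rate limiting.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password: HOTP over the current 30-second step."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, submitted: str, at=None, step: int = 30,
                window: int = 1) -> bool:
    """Accept the current code plus `window` steps either side for clock drift."""
    now = time.time() if at is None else at
    counter = int(now // step)
    for candidate in range(counter - window, counter + window + 1):
        if candidate < 0:
            continue
        expected = hotp(secret, candidate, len(submitted))
        if hmac.compare_digest(expected.encode(), submitted.encode()):
            return True
    return False
```

The shared secret is what the authenticator app stores when the user scans the setup QR code; the server keeps the same secret and recomputes the code at login time.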
Stability: Strong type system + High Test Coverage.
Strong type system is NOT a substitute for automated tests, and automated tests are NOT a substitute for a Strong type system. Sidenote, languages with strong types tend to be faster and less memory-intensive. You might spend more time upfront battling the type system and writing tests (Test Driven Development, AKA "TDD") but you'll save even more time down the line because types will force you to define behavior for more edge cases, tests will cover your base APIs, bugfixes should be covered by tests, and new features should be covered with new tests. When new features or major refactoring happens, the type system and tests (better together) will let you fearlessly push new releases.
Tests should cover both the "happy path" and as many "unhappy paths" as possible. Make sure things that shouldn't work actually don't work (validation errors, bad user input, fuzzing, references to invalid objects, API calls that require permission and/or ownership to view, edit, or delete).
People shouldn't be getting "500 errors" if they fill out a form. Show useful validation messages instead.
Use database transactions for complex DB updates so that if SQL statement 2 out of 3 fails in an API call, you don't leave a resource in an unwanted state.
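Here is a small sketch of the transaction point, using SQLite via Python's standard library. The `accounts` table and the `transfer` function are made up for the example. The first statement succeeds and the second violates a `CHECK` constraint, so the whole unit is rolled back rather than leaving the first change in place.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move credits between accounts atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # Fails the CHECK constraint if src cannot cover the amount.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


bank = sqlite3.connect(":memory:")
bank.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
bank.executemany(
    "INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 0)]
)
bank.commit()
```

Without the transaction, a failed second statement would leave account 2 credited with money that was never debited from account 1.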
Learn more SQL than your ORM provides, and fully learn your ORM. Good ORMs allow you to make complex composable SQL queries and also allow you to mix in raw SQL. Learn about indexes and EXPLAIN for performance.
Use integer IDs for internal references. If you want, hide them in API calls (expose a UUID instead) so that outside observers can't figure out how many products or users are in your DB, and so that guessing URLs for new items or other users' items is harder. UUIDs are also a valid choice if your DB will be sharded across regions. If you're a B2B (like Slack), you might want to isolate each customer in their own DB. Enterprise customers should quit you if another customer's data shows up, because that means another customer is probably seeing their data.
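One way to sketch the "integer inside, UUID outside" idea, again with SQLite for self-containment. The `products` table and the helper names are hypothetical; the only point is that the sequential `id` never leaves the database layer.

```python
import sqlite3
import uuid
from typing import Optional

catalog = sqlite3.connect(":memory:")
catalog.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, "          # internal key for joins and foreign keys
    "public_id TEXT NOT NULL UNIQUE, "  # random identifier exposed in URLs/APIs
    "name TEXT NOT NULL)"
)


def create_product(conn: sqlite3.Connection, name: str) -> str:
    """Insert a product and return only its public identifier."""
    public_id = str(uuid.uuid4())
    with conn:
        conn.execute(
            "INSERT INTO products (public_id, name) VALUES (?, ?)",
            (public_id, name),
        )
    return public_id


def get_product_name(conn: sqlite3.Connection, public_id: str) -> Optional[str]:
    """Look a product up by its public identifier, never by the integer id."""
    row = conn.execute(
        "SELECT name FROM products WHERE public_id = ?", (public_id,)
    ).fetchone()
    return row[0] if row else None
```

Hiding the counter this way stops enumeration, but it is not access control; ownership and permission checks are still needed on every request.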
Always use prepared statements in SQL (see SQL Injection). Don't use database users with full privileges (why would the database user your main application runtime uses be allowed to drop any table, be an owner of the DB, or be an admin user?). Use a separate DB user account for altering tables and administration.
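A quick sketch of what parameterized queries buy you, with SQLite and a hypothetical `users` table. The classic injection string is passed as a bound parameter, so it is compared as plain data instead of being parsed as SQL. (The separate low-privilege database user is a server configuration matter and isn't shown here.)

```python
import sqlite3

people = sqlite3.connect(":memory:")
people.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
people.executemany(
    "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
)
people.commit()


def find_user(conn: sqlite3.Connection, username: str) -> list:
    """Look up a user; the value is bound as a parameter, never concatenated."""
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


# With string concatenation this input would match every row.
# As a bound parameter it is just an odd username that matches nobody.
injection_attempt = find_user(people, "' OR '1'='1")
```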
Don't trust user input. If you accept HTML, someone will try to put JavaScript in it. If you don't accept HTML, someone will still try to put JavaScript in it just in case you don't sanitize it. If data input by one user is shown to another user, make sure it's escaped so that you don't compromise your user.
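A tiny sketch of escaping on output, using Python's standard `html.escape`. The `render_comment` function is hypothetical; most template engines do this escaping automatically, and the thing to watch for is any place where that auto-escaping gets switched off.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment with all user-supplied text escaped."""
    return f"<p><strong>{html.escape(author)}</strong>: {html.escape(body)}</p>"


rendered = render_comment("eve", "<script>alert(1)</script>")
# rendered == "<p><strong>eve</strong>: &lt;script&gt;alert(1)&lt;/script&gt;</p>"
```

The browser then displays the script tag as text instead of running it.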
Don't let users upload files directly to your server. Always upload them to something like AWS S3.
Anyone can upload a *.jpg with the filename ending in *.png. A *.exe could be named *.jpg. An image file can have a password-protected *.zip file full of pirated content embedded while still being a valid image file. Your safest bet is to store the uploaded images privately, use a serverless function (aka AWS Lambda, aka not your real application server) to make filesize-optimized thumbnails, and only show those thumbnails to public end-users.
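As a first-line check for the mismatched-extension problem, you can look at the file's leading bytes instead of trusting its name. This is only a sketch: the signature table covers three formats, the helper names are made up, and, as noted above, a file with a valid image header can still carry embedded junk, which is why re-encoding to a thumbnail remains the safer approach.

```python
from typing import Optional

# Leading "magic" bytes for a few common image formats.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the image type from the file content, ignoring the filename."""
    for kind, magic in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


def extension_matches_content(filename: str, data: bytes) -> bool:
    """True only if the extension agrees with what the bytes look like."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext == "jpg":
        ext = "jpeg"
    return sniff_image_type(data) == ext
```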
And you don’t even need passwords if you use magic links or OAuth
I'm not familiar with magic links. Would that be some sort of nonce in a URL emailed to the user for authentication?
but then you have to ask for an email address :(
I'm assuming this is what @teaearlgraycold is referring to? https://magic.link/
Something like that, but there’s no need to pay for a SaaS version.
Exactly
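For anyone curious what a home-grown magic link might look like, here is a rough sketch of the token part: the server signs the email address and an expiry time with HMAC, emails a URL containing the token, and verifies it when the link is clicked. The secret, the 15-minute lifetime, and the function names are all assumptions. This version is stateless, so a link can be reused until it expires; storing a one-time nonce server-side would fix that.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Assumption: in practice this is a long random value kept out of source control.
SECRET_KEY = b"replace-with-a-random-server-side-secret"


def make_login_token(email: str, now=None, ttl: int = 900) -> str:
    """Create a signed token encoding the email and an expiry timestamp."""
    issued = int(time.time() if now is None else now)
    payload = f"{email}|{issued + ttl}".encode("utf-8")
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    token = base64.urlsafe_b64encode(payload) + b"." + base64.urlsafe_b64encode(sig)
    return token.decode("ascii")


def check_login_token(token: str, now=None) -> Optional[str]:
    """Return the email if the token is authentic and unexpired, else None."""
    current = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.encode("ascii").split(b".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except (ValueError, UnicodeEncodeError):
        return None  # malformed token
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered with, or signed with a different key
    email, _, expires = payload.decode("utf-8").rpartition("|")
    if current > int(expires):
        return None  # link has expired
    return email
```

The emailed link would carry the token as a query parameter, and the handler for that URL would call `check_login_token` and then set the session cookie.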
also just keep it in mind