Question about REST APIs and encryption
So I am finally starting the process of designing a personal website that can help me manage and organize my finances.
Obviously, the security of such data is paramount, and for the heck of it, I want to design a webapp that doesn't operate by the rules of "trust me bro", even though I will be the one designing it and will most likely be the only one ever to use it. I just want the experience of setting up encryption properly.
Also, even if I am the one operating it, I'd like to set it up so that even if the database is compromised, none of my information is.
Skip to the bottom if you just want to see my 2 questions.
I did some reading online, both on when and how StandardNotes does encryption and some basic reading on encryption in general:
- https://www.baeldung.com/java-aes-encryption-decryption
- https://security.stackexchange.com/questions/14068/why-most-people-use-256-bit-encryption-instead-of-128-bit
as well as on the importance of not having a local unencrypted database like Joplin does.
All that got me curious about how Google encrypts the user data it has, and I wound up reading
- https://security.stackexchange.com/questions/269341/how-does-googles-on-device-encryption-work
- https://developers.google.com/workspace/cse/guides/encrypt-and-decrypt-data
The basic takeaways seem to be:
- encrypt a field before storing it in the database, so that even if the machine gets compromised, the data won't be
- if you want to go even further, take the StandardNotes approach, where the web server itself never seems to touch unencrypted data? It looks like all the encrypting and decrypting happens locally and only encrypted data is sent to the server
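Here is a minimal sketch of the first takeaway, which is field-level encryption before the database write. It assumes a Node.js backend written in TypeScript and uses AES-256-GCM from Node's built-in crypto module. The function names are hypothetical.

Everything rests on one assumption: the key is kept outside the database, for example in an environment variable, a secrets manager or a KMS. If the key sits in the database too, a database leak leaks the key as well.

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

const IV_BYTES = 12; // 96-bit nonce, the usual size for AES-GCM
const TAG_BYTES = 16; // 128-bit GCM authentication tag

// Encrypt one column value before it is written to the database.
function encryptField(key: Buffer, plaintext: string): string {
  const iv = randomBytes(IV_BYTES); // fresh random nonce for every single value
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // Store iv || tag || ciphertext together as one base64 string.
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

// Decrypt a stored value; throws if the key is wrong or the data was modified.
function decryptField(key: Buffer, stored: string): string {
  const raw = Buffer.from(stored, "base64");
  const iv = raw.subarray(0, IV_BYTES);
  const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
  const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Usage. Assumption: a real key would come from an env var or secrets manager
// on the app server, never from the database itself.
const fieldKey = randomBytes(32);
const storedMemo = encryptField(fieldKey, "Rent: 1200.00");
console.log(decryptField(fieldKey, storedMemo)); // Rent: 1200.00
```

Because GCM is authenticated, decryption fails loudly instead of returning garbage. That covers both a wrong key and a value someone tampered with in the database.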
- But that got me curious. No one can argue that Google isn't secure; they have the best minds working there to ensure just that. And yet it's also well known that their respect for user privacy is non-existent. That means they've made sure to protect the data (email, Google searches, Google Docs, Google Maps history) from hackers, but they can still decrypt at least some user data themselves for data collection and selling ads.
But if Google can decrypt the data, which implies (from what I can tell from my reading) that they store the keys on a server, how is it protected if someone malicious gains access to the database? If that person got access to both the database and the keys Google uses to decrypt the data, wouldn't that compromise the data?
- If I design my webapp so that all the encrypting and decrypting happens locally, then a REST API for the application would also have to take in data in encrypted form, no? If the API takes plaintext, my web server would be responsible for encryption, which requires the keys, and if it can encrypt with keys it has access to, then it can decrypt too, right? Or are websites that deal with encrypted databases and have REST APIs accepting plaintext generally coded to use asymmetric encryption, meaning different keys are used for encryption and decryption? Or is the API token the key in an encrypted format? Or have I misunderstood the whole thing?
Yep. They just make sure that doesn't happen.
But to be clearer, what you were talking about before is called end-to-end encryption (E2EE), where the client encrypts and decrypts the data. On a design level, however, E2EE really changes how the application can work. With E2EE, the server must be little more than a dumb data storage location. After all, all it has is encrypted blobs; you can't do server-side search or any other kind of server-side processing. You can't allow another user to have the data unless you give that user your personal key.
Setting aside whether or not the company values privacy, E2EE only works for some kinds of applications.
For the second question, you're getting more into threat models. If an attacker has complete access to the server, yes, you're just screwed. You're always screwed in that case (unless it's E2EE). The server always has to have enough data to decrypt its own data, after all. What you are defending against is, say, a bad database configuration that unknowingly allows attackers to get access to the data. They've only compromised the database in this case, so any encrypted data is still encrypted, since they don't have access to the secret.
All that is to say, you have a choice: you can make this webapp E2EE, which would mean that essentially 100% of the business logic needs to be done on the client, and the server is little more than a repository to store encrypted blobs for users.
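To illustrate that split, here is a sketch in TypeScript using only Node.js built-ins. The "server" is nothing but a map of opaque blobs, and every piece of business logic runs on the client after decryption, even a simple monthly total. The names, the passphrase-derived key (via scrypt) and the blob layout are all assumptions for illustration.

```typescript
import { randomBytes, scryptSync, createCipheriv, createDecipheriv } from "node:crypto";

type Txn = { date: string; amountCents: number; memo: string };

// "Server" side: an opaque key/value store. It never sees a key or plaintext,
// so it cannot search, filter, or total anything.
const blobStore = new Map<string, string>();

// "Client" side from here on.
// Assumption: the key is derived from a passphrase that never leaves the device;
// the salt is not secret and could be stored server-side so the key can be re-derived.
function deriveClientKey(passphrase: string, salt: Buffer): Buffer {
  return scryptSync(passphrase, salt, 32);
}

function sealTxn(key: Buffer, txn: Txn): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(JSON.stringify(txn), "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64"); // iv | tag | body
}

function openTxn(key: Buffer, blob: string): Txn {
  const raw = Buffer.from(blob, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  const json = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
  return JSON.parse(json) as Txn;
}

// Business logic, such as a monthly total, has to run on the client after decryption.
function clientMonthlyTotal(key: Buffer, month: string): number {
  let total = 0;
  for (const blob of blobStore.values()) {
    const txn = openTxn(key, blob);
    if (txn.date.startsWith(month)) total += txn.amountCents;
  }
  return total;
}

// Usage
const clientKey = deriveClientKey("correct horse battery staple", randomBytes(16));
blobStore.set("t1", sealTxn(clientKey, { date: "2024-05-01", amountCents: -120000, memo: "Rent" }));
blobStore.set("t2", sealTxn(clientKey, { date: "2024-05-03", amountCents: -4599, memo: "Groceries" }));
blobStore.set("t3", sealTxn(clientKey, { date: "2024-06-01", amountCents: 520000, memo: "Salary" }));
console.log(clientMonthlyTotal(clientKey, "2024-05")); // -124599
```

Note what the server can't do here. It only ever holds base64 ciphertext, so it can't filter by month, total anything or validate amounts.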
Or, you can have logic in the server, but the data must come in unencrypted. You can encrypt it further if you want, but honestly I don't think it's particularly worth the effort for a simple server written just for your own personal use. Just harden the server it's hosted on.
The transport layer will automatically have encryption from HTTPS.
Just as a fun FYI, in case you are unaware, homomorphic encryption allows for processing of encrypted data. I've never gotten around to doing anything with it, but I do find the concept super fascinating! In the OP's case it is way beyond what is needed (the simple approach of hardening the server, encrypting the data at rest and using HTTPS should be good enough), but if I had all the time in the world I'd be keen to use it.
I do like that idea, as I wanted this project to give me more FE and BE exposure, but I am not a pro and I fear that I will miss something in my attempts to harden the server. I am not exactly an expert at security hardening, I'm not even sure what the basics are, and I doubt a 20-minute YouTube tutorial can cover it. Bit of an exaggeration, but I think that demonstrates my anxiety about it.
To be honest, you're really overthinking things. Again, the threat model is the most important part of security. Just like in real life, there's no point in reinforcing your door but leaving your lock the same.
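To make "processing encrypted data" concrete, here is a toy version of Paillier, one well-known additively homomorphic scheme. I chose it as the example; fully homomorphic schemes are far more involved. The primes are tiny and the randomness is fixed so the output is reproducible, which makes this completely insecure. It only shows that someone holding two ciphertexts can produce the encryption of their sum without the private key.

```typescript
// Toy Paillier cryptosystem (additively homomorphic) with tiny, insecure primes.
// Real use needs ~2048-bit moduli, a random r per encryption, and a vetted library.

function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if ((exp & 1n) === 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

function gcd(a: bigint, b: bigint): bigint {
  while (b !== 0n) {
    [a, b] = [b, a % b];
  }
  return a;
}

function modInverse(a: bigint, m: bigint): bigint {
  // Extended Euclidean algorithm
  let [oldR, r] = [a % m, m];
  let [oldS, s] = [1n, 0n];
  while (r !== 0n) {
    const quot = oldR / r; // bigint division truncates
    [oldR, r] = [r, oldR - quot * r];
    [oldS, s] = [s, oldS - quot * s];
  }
  if (oldR !== 1n) throw new Error("no inverse");
  return ((oldS % m) + m) % m;
}

// Key generation
const p = 61n;
const q = 53n;
const n = p * q; // public modulus
const n2 = n * n;
const g = n + 1n; // common simplified generator choice
const lambda = ((p - 1n) * (q - 1n)) / gcd(p - 1n, q - 1n); // lcm(p-1, q-1), private
const mu = modInverse(lambda, n); // private; this shortcut is valid because g = n + 1

// c = g^m * r^n mod n^2, where r must be coprime to n (and random in practice)
function encrypt(m: bigint, r: bigint): bigint {
  return (modPow(g, m, n2) * modPow(r, n, n2)) % n2;
}

// m = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) / n
function decrypt(c: bigint): bigint {
  const x = modPow(c, lambda, n2);
  return (((x - 1n) / n) * mu) % n;
}

// Multiplying ciphertexts adds the hidden plaintexts (mod n); no private key needed.
function addEncrypted(c1: bigint, c2: bigint): bigint {
  return (c1 * c2) % n2;
}

const encRent = encrypt(1200n, 17n);
const encFood = encrypt(345n, 29n);
console.log(decrypt(addEncrypted(encRent, encFood))); // 1545n
```

Paillier only supports addition, plus multiplication by a known constant. Anything more general needs fully homomorphic encryption, and that is where the research-level complexity comes in.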
No one cares about your financial data. The only hackers you'll encounter running your server for your own personal use are dragnet operators who loop through IP blocks looking for basic misconfigurations, like database servers on public ports with admin/admin accounts. And even those hackers do not care about your financial data - they're looking for passwords and credit cards and other things that can be packaged and sold.
Just do it. It's fine.
As an aside, homomorphic encryption is way above your level of expertise, judging from everything else in the thread. It's a very young field of active research. I wouldn't bother.
What you have in your favor where security is concerned is that you aren't a target (security by obscurity).
You should still do the basics, like running a firewall, brute-force protection and maybe changing some ports to be outside the standard range. Maybe disallow root SSH access and use a key or certificate rather than a password. But really, you can learn about that stuff as you go; there are lots of good guides out there.
It's worth spending a little time learning to harden your server so that you don't have to write the whole application to run in the browser. But don't stress too much. Talented hackers aren't wasting time on random servers, so mostly all you have to protect against are low-effort bots hitting common ports, common endpoints and common software (e.g. WordPress). The most important thing is to stay on top of patches.
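The same "brute force protection" idea can also be applied in code, to the webapp's own login form. Below is a minimal in-memory sketch, assuming a single Node.js process. The class and its thresholds are hypothetical. Host-level protection for SSH (e.g. fail2ban) is a separate layer.

```typescript
// Hypothetical in-memory lockout: after maxFailures failed attempts within windowMs,
// further attempts for that key (IP or username) are refused until the window ends.
class LoginLimiter {
  private readonly maxFailures: number;
  private readonly windowMs: number;
  private readonly failures = new Map<string, { count: number; firstAt: number }>();

  constructor(maxFailures: number, windowMs: number) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
  }

  isBlocked(key: string, now: number = Date.now()): boolean {
    const entry = this.failures.get(key);
    if (!entry) return false;
    if (now - entry.firstAt >= this.windowMs) {
      this.failures.delete(key); // window expired: forget old failures
      return false;
    }
    return entry.count >= this.maxFailures;
  }

  recordFailure(key: string, now: number = Date.now()): void {
    const entry = this.failures.get(key);
    if (!entry || now - entry.firstAt >= this.windowMs) {
      this.failures.set(key, { count: 1, firstAt: now }); // start a new window
    } else {
      entry.count += 1;
    }
  }

  recordSuccess(key: string): void {
    this.failures.delete(key); // reset after a successful login
  }
}

// Usage: 5 failures per client IP per 15 minutes (arbitrary numbers).
const limiter = new LoginLimiter(5, 15 * 60 * 1000);
const clientIp = "203.0.113.7"; // documentation-range example address
for (let i = 0; i < 5; i++) limiter.recordFailure(clientIp);
console.log(limiter.isBlocked(clientIp)); // true
```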
Something I remembered: until Tailscale Funnel is publicly available, I am stuck with Cloudflare Tunnel, and Cloudflare handles the decryption of my traffic. So if I want my data to be protected from Cloudflare (just because I don't like that they can technically see all my data), I have to encrypt it myself before sending it to my server.
So if I want to perform server-side validation, I'd have to decrypt first anyway. Or use homomorphic encryption as @archevel suggested (thanks for the link btw, I didn't know about that).
So maybe I will look into that.
Actually, the more I think about it: given that my machine is publicly accessible only via Cloudflare Tunnel, I doubt I even need to do any hardening. I have to imagine any local configuration I make is meaningless compared to what Cloudflare does to secure its connections.
that's actually surprising given the stories I have come across of hackers who hack into a variety of things, whether for clout, for money or just messing around.
hackers from Russia or China really can't find a flaw in Google's infra that they could use to destabilize a major tech company from a hostile-ish nation? interesting.
Consider that Google has host intrusion detection. And each zone has fully independent infra. Each storage node has its own encryption keys. They control everything down to the firmware on the hard drives. They know Intel’s chips better than Intel does. Russia’s certainly gotten in before - and at Google’s scale you need to worry about hackers becoming employees. But the blast radius is well controlled and state actors don’t want to make a big explosion anyway.
I mean, I'd call this a big explosion personally: Russia Hacked U.S. Power Grid — So What Will The Trump Administration Do About It?
The Google immune system will actually be able to respond to attacks from Russia or China. That’s why infiltrators would want to lay low.
Utility companies not so much.
Directly hacking into a server is honestly not a common attack vector. Secure server access is a somewhat solved problem, and it can mostly only be bypassed if there's some sort of RCE exploit found on the server. For example, see the recent 4chan hack, where an outdated, vulnerable PDF parsing library gave hackers direct access to the entire server.
What my company does is encrypt just some objects in the payload, while the other metadata we need stays unencrypted.
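One common way to implement that split works like this: encrypt only the sensitive object with AES-256-GCM, and pass the plaintext metadata as additional authenticated data (AAD). Intermediaries can then read the metadata but can't alter it without decryption failing. This is a sketch of the general pattern, not necessarily how the commenter's company does it. The types, field names and JSON serialization are hypothetical.

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

type Metadata = { accountId: string; kind: string; version: number };
type Envelope = { meta: Metadata; iv: string; tag: string; data: string };

function sealPayload(key: Buffer, meta: Metadata, secret: unknown): Envelope {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  // Metadata is authenticated but NOT encrypted: readable, yet tamper-evident.
  cipher.setAAD(Buffer.from(JSON.stringify(meta), "utf8"));
  const data = Buffer.concat([cipher.update(JSON.stringify(secret), "utf8"), cipher.final()]);
  return {
    meta,
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

function openPayload(key: Buffer, env: Envelope): unknown {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(env.iv, "base64"));
  decipher.setAAD(Buffer.from(JSON.stringify(env.meta), "utf8")); // must match the sender's AAD
  decipher.setAuthTag(Buffer.from(env.tag, "base64"));
  const plain = Buffer.concat([decipher.update(Buffer.from(env.data, "base64")), decipher.final()]);
  return JSON.parse(plain.toString("utf8"));
}

// Usage
const payloadKey = randomBytes(32);
const envelope = sealPayload(
  payloadKey,
  { accountId: "acct-1", kind: "txn", version: 1 },
  { amountCents: -4599, memo: "Groceries" },
);
console.log(envelope.meta.kind); // txn (readable without the key)
console.log(openPayload(payloadKey, envelope)); // { amountCents: -4599, memo: 'Groceries' }
```

One caveat: the AAD must be byte-for-byte identical on both ends. A real implementation would serialize the metadata canonically rather than rely on `JSON.stringify` key order.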
Does it have to be a publicly-accessible web app? Wondering why you don't keep everything local, such as by serving a website from your own computer on your local LAN.
right now? probably not. but I think I just want to challenge myself to be able to make it publicly accessible and secure.
Also, a part of this is me learning Angular for the FE of the website so I hope that one day, I can use those FE skills to make a simple react-native android app to go along with the website. Not a definite but want to keep my options open I think.
I’d be keeping it behind tailscale or a Cloudflare tunnel if it were me - something that means it’s accessible wherever you need it, but not actually on the wider internet at all without first clearing the security to create a private network connection.
A big part of good security is reducing the attack surface, so if you can have that extra layer (i.e. you’re building for a small enough audience that distributing tailscale keys to each client is feasible) you pretty much always should.
The very important thing is not to then rely on that network security layer! Effectively every piece of software in existence - even incredibly robust and battle tested encryption libraries - has at least a few CVEs over the years. So treat everything as if it can and will be breached, build as if you’re exposing the app to the wider internet (because you may well realise that a misconfiguration or vulnerability does exactly that), but wall it off anyway in the hope that most potential attackers don’t even get the chance to try.
Why not build the frontend in React if you plan to use React native for the mobile app?
I have 2 competing desires: to maybe one day make a simple mobile app, but also to get exposure to TypeScript via this project.
Not to scare you away, but I don't recommend Angular for learning front end. It's strongly opinionated and makes things convoluted by wanting to handle everything by itself. It has inferior tooling and a worse debugging experience than other frameworks too.
I recommend React or Vue (esp the former if you want to get into React Native). They support TypeScript out of the box too.
That said, if you want to just give it a try then feel free! It's just not the best tool for the job unless you're a fierce believer in OOP.
I was just made aware of that. thank God I made this post before I started actually writing Angular code.
One doesn't rule out the other.
well how about that? I was led to believe that Angular is the only framework that supports TypeScript and for some reason, it didn't occur to me to investigate the veracity of that statement.
At this point, it's very uncommon to find a well maintained JavaScript library or framework that doesn't have out-of-the-box Typescript support!
Well, I understand the desire to learn and to challenge oneself, but, personally, I think it is not worth the risk in this case (considering the sensitivity of the data you plan to store). Being "secure enough" is a reasonable challenge, but being so secure that only the very top malicious actors in the world can break it is a much higher standard to meet, one which the average software developer is not equipped to meet. I think you could learn and exercise some skills well enough by keeping it local.
Some points for consideration:
Anything with a port exposed to the open internet is always going to be risky. At a bare minimum you'll want to have a robust reverse proxy or Cloudflare tunnel configuration with MFA, and you'll really need to stay on top of security advisories and maintenance/updates for your unmanaged infra.
Encryption is only as strong as its implementation, and it only protects data while the data stays encrypted (e.g. at rest). End-to-end/client-side encryption is the most secure implementation, but any cybersecurity professional will advise against trying to roll your own implementation, since there are way too many variables at play that can end up as vulnerabilities. Use something well-established and battle-hardened.
Public-facing web apps are a bit of a security nightmare at the best of times. Since you're dealing with a web stack, there are many points of failure and vulnerability to contend with (in addition to your standard mitigations for SQL injection, XSS et al).
Consider all your points of ingress and egress, and how they're going to be secured. SSH hardening, VPN configuration, firewalling, how you'll ensure your DB is only accessible from your web app - there's a lot to account for. As I said before, choose battle-hardened tech. All hail SSH.
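On the MFA point: the most common second factor is a TOTP code from an authenticator app, and you may end up handling MFA in the app itself rather than at the proxy or tunnel. Below is a sketch of the algorithm (RFC 6238 on top of RFC 4226) using Node's built-in HMAC. The function names and the ±1-step drift window are my assumptions. The shared secret would be generated per user and stored server-side.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// TOTP (RFC 6238): HMAC-SHA1 over a 30-second time-step counter,
// then HOTP "dynamic truncation" (RFC 4226) down to a short decimal code.
function totp(secret: Buffer, unixSeconds: number, digits = 6, stepSeconds = 30): string {
  const counter = Math.floor(unixSeconds / stepSeconds);
  const msg = Buffer.alloc(8);
  msg.writeBigUInt64BE(BigInt(counter)); // 8-byte big-endian counter
  const mac = createHmac("sha1", secret).update(msg).digest();
  const offset = mac.readUInt8(mac.length - 1) & 0x0f; // low 4 bits pick the offset
  const binary = mac.readUInt32BE(offset) & 0x7fffffff; // drop the sign bit
  return (binary % 10 ** digits).toString().padStart(digits, "0");
}

// Accept the current 30-second step plus/minus `window` steps to tolerate clock drift.
function verifyTotp(secret: Buffer, code: string, unixSeconds: number, window = 1): boolean {
  const given = Buffer.from(code, "utf8");
  for (let w = -window; w <= window; w++) {
    const expected = Buffer.from(totp(secret, unixSeconds + w * 30), "utf8");
    if (given.length === expected.length && timingSafeEqual(given, expected)) return true;
  }
  return false;
}

// RFC 6238 test secret (ASCII "12345678901234567890"), 8-digit codes:
const rfcSecret = Buffer.from("12345678901234567890", "ascii");
console.log(totp(rfcSecret, 59, 8)); // 94287082
console.log(totp(rfcSecret, 1111111109, 8)); // 07081804
```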
TLDR: A proper Cloudflare stack implementation will eliminate pretty much all of this headache, potentially for the pretty price of $0 per month if you do it right.
What do you mean by that? I was just planning on using well-known encryption libraries to handle that for me. Or is that considered "my own implementation"?
There are two types of encryption library.
First are the algorithm libraries. I think many modern ones try to call these "subtle" (as in Go's crypto/subtle package or the WebCrypto SubtleCrypto interface) or even "hazmat" (in Python Cryptography). When most people think about encryption, they think "Oh, I'll need to use AES-GCM" and find an AES-GCM library. This is not advisable: in some cases you might need more than AES-GCM, and the advice you found on the internet may not be up to date. That is why it is called subtle: there will be subtle implementation details you need to follow to use these libraries correctly.
Don't try to be creative using these types of library. For example, I've seen people think "I'll need to search encrypted data, so I'll just set IV to a hardcoded value" and do that with AES-GCM, without realizing that nonce reuse in AES-GCM results in a catastrophic breakdown of the encryption.
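Here is a small TypeScript demonstration of why that hardcoded IV is so bad, using Node's built-in AES-256-GCM. GCM encrypts with a counter-mode keystream derived from the key and IV. Two messages encrypted under the same key and IV therefore share a keystream, and XORing the two ciphertexts cancels it out. Nonce reuse also leaks enough to let an attacker forge authentication tags, which this demo doesn't show. The example values are made up.

```typescript
import { createCipheriv, randomBytes } from "node:crypto";

// DEMONSTRATION OF A BUG: never reuse a (key, IV) pair with AES-GCM.
const key = randomBytes(32);
const hardcodedIv = Buffer.alloc(12, 0); // the "just hardcode the IV" mistake

function encryptWithFixedIv(plaintext: Buffer): Buffer {
  const cipher = createCipheriv("aes-256-gcm", key, hardcodedIv);
  return Buffer.concat([cipher.update(plaintext), cipher.final()]); // tag omitted for brevity
}

function xor(a: Buffer, b: Buffer): Buffer {
  const out = Buffer.alloc(Math.min(a.length, b.length));
  for (let i = 0; i < out.length; i++) out[i] = a.readUInt8(i) ^ b.readUInt8(i);
  return out;
}

const p1 = Buffer.from("salary:  5200.00", "utf8");
const p2 = Buffer.from("rent:    1200.00", "utf8");
const c1 = encryptWithFixedIv(p1);
const c2 = encryptWithFixedIv(p2);

// Same key + IV means the same keystream, so it cancels out: c1 ^ c2 === p1 ^ p2.
console.log(xor(c1, c2).equals(xor(p1, p2))); // true
// An attacker who knows (or guesses) p1 recovers p2 without ever touching the key.
console.log(xor(xor(c1, c2), p1).toString("utf8")); // rent:    1200.00
```

The fix is a fresh random 12-byte IV for every encryption. Searching encrypted data needs a different technique entirely.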
The second type I'd call a use-case library. This is the kind that says "I want to encrypt data to Bob" and provides an encrypt() function for that use case. You don't need to know what it does, but you still need to read all the instructions and caveats (for example, it may not be secure after you reuse it a zillion times). There are two libraries I'd recommend
Remember, always read the docs to see if it perfectly match up your use case, and any limitations. If you're using Tink, the algorithm in use also imposes its own limitations.
That comes at the cost of absolutely trusting cloudflare and all their partners.
The same could be said for Azure/AWS/GCP or any other service provider you care to name. I consider them to be as much of a vulnerability vector as any other point in the chain of delivery, and any data that I care about gets encrypted before it touches someone else's infra.
True; good approach. Cloudflare gets a much bigger proportion of traffic, though, while other infra providers of course get access to the data at rest. But in my opinion Cloudflare breaks basic security practices by requiring you to break your secure communication channels...
That may be just my own distaste...
Sounds like a fun project! You could take a look at how Actual Budget does E2EE for inspiration, but I haven't kept up with the app so unsure where it's at these days. I remember having it enabled a year or two back.
Good Lord! I both feel amazed by what they have setup and also weirdly feel disappointed I am not embarking on an original idea :sweat_smile:.
I think I figured no one else would want to invest time in open-source E2EE budgeting software when banks have the budgeting software market cornered.
Nevertheless, thanks for the link
Are you sure you need encryption for your own app? Since you're the one hosting it, it doesn't really matter if the data will be stored in plaintext on the server. I understand that you're trying to make your app secure basically as a challenge, but threat modeling is still a thing. You don't need the same level of security Google has.
You basically just need proper user authentication to protect your data from being accessible from the outside. That doesn't require any encryption, only basic backend logic. There are a lot of different guides on rolling your own auth, or you can use a pre-made solution like Auth.js (assuming your backend will be in JS as well - but there are solutions for almost any language)
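If you do roll your own auth, the one piece you must get right is password storage. Keep only a salted, deliberately slow hash, never the password itself. Below is a minimal sketch with scrypt from Node's built-in crypto module. That choice is mine; bcrypt or Argon2 via a library are common alternatives. The "scrypt$salt$hash" storage format is a hypothetical convention.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

const KEY_LEN = 64;

// Store only a salted, slow hash. Node's scrypt defaults (N=16384, r=8, p=1) are used;
// tune the cost parameters for your hardware.
function hashPassword(password: string): string {
  const salt = randomBytes(16); // unique salt per password
  const hash = scryptSync(password, salt, KEY_LEN);
  return `scrypt$${salt.toString("base64")}$${hash.toString("base64")}`;
}

function verifyPassword(password: string, stored: string): boolean {
  const [scheme, saltB64, hashB64] = stored.split("$");
  if (scheme !== "scrypt" || !saltB64 || !hashB64) return false;
  const expected = Buffer.from(hashB64, "base64");
  if (expected.length !== KEY_LEN) return false;
  const actual = scryptSync(password, Buffer.from(saltB64, "base64"), KEY_LEN);
  return timingSafeEqual(actual, expected); // constant-time comparison
}

// Usage
const storedHash = hashPassword("correct horse battery staple");
console.log(verifyPassword("correct horse battery staple", storedHash)); // true
console.log(verifyPassword("hunter2", storedHash)); // false
```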
For protecting your server, key-based ssh authentication and basic security practices like not opening a bunch of random ports should be enough. If you'll be writing a full-stack TypeScript app with something like Next.js, you can host it on a platform like Vercel, in which case you won't even have to handle security of the server.
Basically, as you're just hosting a personal app on a small server, your biggest threat is bots that try to brute-force SSH passwords to install things on your server (usually a botnet/crypto miner). Disabling SSH password auth protects against that.
oh I know, I was just curious how they do it. I have no delusions that my server will ever be as secure as what Google has, nor is there a point in making it that secure. My website will never be as enticing to hackers as Google :P
Lots of excellent practical advice has been given to you so far in the thread, but I believe I am somewhat qualified to answer your theoretical questions more directly. I did my best to read through all the comments, so I apologize if I'm adding thoughts that have already been added. Not too long ago, I was working at Mastercard for a couple of years developing in-house key management software which involved a lot of different encryption methods and some pretty complex architecture. Developing that kind of system requires diving deep into the world of cryptography. Before answering your questions, I want to concur with some other advice in the thread: Doing cryptography yourself is a great exercise, but the closer you are to the cryptographic implementation layer, the more likely it is that you'll screw something up. In a production setting you would want an expert in the subject to verify your architectural decisions and implementation. Now that that's out of the way, some direct answers to your questions:
Google does not store their encryption keys (or at least their root encryption keys) in a database. Most likely, Google stores their encryption keys in Hardware Security Modules (HSMs). These are specially designed hardware devices for the specific purpose of securely generating and storing cryptographic keys. Without getting too technical, it is practically impossible for anyone (even most people at Google) to ever have access to these keys. Data encryption and decryption with those keys takes place inside the HSM, so the keys are never exposed "in the clear." So while Google at-large can decrypt the data that gets sent to its server, there is almost certainly a complicated web of cryptographic services within Google that ensure their encryption keys are never exposed.
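A pattern that fits this description, and one that cloud KMS offerings expose, is envelope encryption. Each piece of data gets its own data encryption key (DEK). The only thing stored is that DEK wrapped (encrypted) by a key-encryption key (KEK) that lives inside the HSM. The sketch below illustrates the general pattern and is not Google's actual design. The `FakeHsm` class is a hypothetical stand-in that keeps the KEK private and only exposes wrap/unwrap.

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

type Sealed = { iv: Buffer; tag: Buffer; ct: Buffer };
type StoredRecord = { wrappedDek: Sealed; body: Sealed };

function gcmEncrypt(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ct };
}

function gcmDecrypt(key: Buffer, s: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, s.iv);
  decipher.setAuthTag(s.tag);
  return Buffer.concat([decipher.update(s.ct), decipher.final()]);
}

// Hypothetical stand-in for an HSM/KMS: the KEK never leaves this object.
class FakeHsm {
  private readonly kek = randomBytes(32);
  wrap(dek: Buffer): Sealed {
    return gcmEncrypt(this.kek, dek);
  }
  unwrap(wrapped: Sealed): Buffer {
    return gcmDecrypt(this.kek, wrapped);
  }
}

// Fresh DEK per record; only the wrapped DEK is stored next to the ciphertext.
function encryptRecord(hsm: FakeHsm, data: string): StoredRecord {
  const dek = randomBytes(32);
  return { wrappedDek: hsm.wrap(dek), body: gcmEncrypt(dek, Buffer.from(data, "utf8")) };
}

function decryptRecord(hsm: FakeHsm, rec: StoredRecord): string {
  const dek = hsm.unwrap(rec.wrappedDek);
  return gcmDecrypt(dek, rec.body).toString("utf8");
}

// Usage
const hsm = new FakeHsm();
const rec = encryptRecord(hsm, "2024-05 statement");
console.log(decryptRecord(hsm, rec)); // 2024-05 statement
```

Unlike with a real HSM setup, the unwrapped DEK does pass through application memory here. What the HSM buys you is that a stolen database is useless on its own: wrapped DEKs plus ciphertext can't be decrypted without calling into the HSM.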
End-to-end encryption, I think, by definition means that the server or any other actor between the ends never receives sensitive data in plaintext. That would defeat the purpose. So if you have a server in the middle, that server should never receive plaintext data that should otherwise be encrypted, or cryptographic keys that can be used to decrypt any sensitive information. There's a whole wealth of knowledge on the internet, and some in this thread, that describes in detail how these systems work, which I will not belabor here. Asymmetric cryptography plays a role, but there is usually some hybrid of asymmetric and symmetric cryptography in place for systems like this. To answer more directly: if I were designing an E2EE API, the sensitive bits would stay encrypted the whole way through until they reached a client.
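For the OP's earlier question about asymmetric encryption, here is a sketch of the usual hybrid construction, assuming Node.js. It does an ephemeral X25519 key agreement, derives an AES-256 key with HKDF-SHA256, then encrypts the actual data with AES-256-GCM. This is an ECIES-style construction assembled for illustration. It is not a standardized profile and not something to ship as-is. The point is that anyone holding only the recipient's public key, such as an API server, can encrypt but cannot decrypt.

```typescript
import {
  generateKeyPairSync, diffieHellman, hkdfSync, randomBytes,
  createCipheriv, createDecipheriv, createPublicKey, KeyObject,
} from "node:crypto";

type HybridCiphertext = { ephemeralPub: Buffer; iv: Buffer; tag: Buffer; ct: Buffer };

function deriveKey(shared: Buffer, ephemeralPub: Buffer): Buffer {
  // Putting the ephemeral public key in HKDF's info binds the derived key to this message.
  return Buffer.from(hkdfSync("sha256", shared, Buffer.alloc(0), ephemeralPub, 32));
}

// Needs only the recipient's PUBLIC key, so e.g. an API server can run this.
function hybridEncrypt(recipientPub: KeyObject, plaintext: string): HybridCiphertext {
  const eph = generateKeyPairSync("x25519"); // fresh ephemeral key pair per message
  const shared = diffieHellman({ privateKey: eph.privateKey, publicKey: recipientPub });
  const ephemeralPub = eph.publicKey.export({ type: "spki", format: "der" });
  const key = deriveKey(shared, ephemeralPub);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { ephemeralPub, iv, tag: cipher.getAuthTag(), ct };
}

// Needs the recipient's PRIVATE key, which stays on the client.
function hybridDecrypt(recipientPriv: KeyObject, msg: HybridCiphertext): string {
  const ephPub = createPublicKey({ key: msg.ephemeralPub, format: "der", type: "spki" });
  const shared = diffieHellman({ privateKey: recipientPriv, publicKey: ephPub });
  const key = deriveKey(shared, msg.ephemeralPub);
  const decipher = createDecipheriv("aes-256-gcm", key, msg.iv);
  decipher.setAuthTag(msg.tag);
  return Buffer.concat([decipher.update(msg.ct), decipher.final()]).toString("utf8");
}

// Usage
const recipient = generateKeyPairSync("x25519");
const sealedMsg = hybridEncrypt(recipient.publicKey, "2024-05-03 Groceries -45.99");
console.log(hybridDecrypt(recipient.privateKey, sealedMsg)); // 2024-05-03 Groceries -45.99
```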
All that being said, I worked pretty closely with some of this stuff for a couple of years which makes me more knowledgeable on the subject than most, but it still does not make me an expert. Cryptography is a vast and complex web of math and computer science. Given the demographic of the site, it's very likely there are some engineers here with more experience than me on this topic. To those more experienced folks, please correct any info here if misleading or incorrect.