Most tech companies collect and create as much data as they can. They build large data warehouses, and then later invent new terms like “data lake” when their unquenchable thirst for more of your private information can no longer fit within the confines of a single warehouse.
Angry Data Engineering noises from me
I adore Signal, and I contribute something like £15 a month to its operating costs. I absolutely love how simple and easy it is to use. Nothing feels gamified, I can turn off bits that I don't like (the stories, for example), and I can throw people into chats quickly. It feels professional and grown up, and has an air of "Look, you want to send secure messages... here we are." Compare that with something like WhatsApp, which feels like it's designed for teenagers by teenagers, with 'features' overflowing in every direction.
That, and WhatsApp is about as secure as a window when I've got a spark plug.
We estimate that by 2025, Signal will require approximately $50 million dollars a year to operate
...
Current Infrastructure Costs (as of November 2023): Approximately $14 million dollars per year.
...
To sustain our ongoing development efforts, about half of Signal’s overall operating budget goes towards recruiting, compensating, and retaining the people who build and care for Signal. When benefits, HR services, taxes, recruiting, and salaries are included, this translates to around $19 million dollars per year.
So less than $2/year per user. It's a large sum of money, but not one I would think insurmountable with a Wikipedia-style donation drive.
It's an interesting look at the breakdown though.
I get why they do server-side queuing, but I feel like an app like Signal would fare better by simply awaiting both devices to be online at the same time and have them exchange data directly.
The somewhat-recently posted Veilid seems a potential contender for allowing this sort of exchange in a more-secure and potentially-cheaper manner.
They're also using cloud providers... and cloud services can be really expensive, especially storage. They could save tremendously with colocation. Take the first random provider I found: under $2,300 USD a month for a full rack with a boatload of IPs and bandwidth. That's 46U of servers. You could easily throw $3 million of hardware in it. Amortize that hardware over 5 years of vendor support, and you've got a massive processing and storage datacenter for less than $650,000 a year. $3 million in hardware can easily host thousands of VMs.
I get this is some random napkin math so I'm missing a bunch. You'd obviously want multiple providers in different locations. Backtracking their storage costs, guesstimating S3 Standard, that's about 56 TB of storage a month. Assuming you want some great redundancy on that storage if buying for yourself, say three copies on different RAID 10 arrays of 3.8 TB SSDs, that's on the order of $150,000 of drives, which amortized over 5 years is $30,000 annually. And that's probably overkill given this data is intended to be ephemeral. You could probably ditch traditional RAID and just have giant balls of disposable disks and ensure each message has 4 copies across all your servers.
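For anyone who wants to rerun the napkin math, here is a minimal Python sketch. Every input is the estimate quoted in the comment above, not a figure from Signal, and the helper name `annual_cost` is made up for illustration. It assumes plain straight-line amortization and ignores power, staff, and extra sites.

```python
# Napkin math only: all inputs are the commenter's estimates, not Signal's numbers.
RACK_RENT_PER_MONTH = 2_300   # USD, full 46U rack ("under $2,300 a month")
HARDWARE_COST = 3_000_000     # USD of servers placed in that rack
STORAGE_COST = 150_000        # USD, three redundant copies on 3.8 TB SSDs
AMORTIZATION_YEARS = 5        # assumed length of vendor support


def annual_cost(capital: float, years: int, monthly_rent: float = 0.0) -> float:
    """Straight-line amortization of capital plus twelve months of rent."""
    return capital / years + 12 * monthly_rent


rack_total = annual_cost(HARDWARE_COST, AMORTIZATION_YEARS, RACK_RENT_PER_MONTH)
storage_total = annual_cost(STORAGE_COST, AMORTIZATION_YEARS)

print(f"rack + hardware: ${rack_total:,.0f}/year")
print(f"storage:         ${storage_total:,.0f}/year")
```

With these inputs the rack comes out a little under the $650,000 quoted, and the storage matches the $30,000 figure.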
Crossing out my entire post since I screwed up the bandwidth calculations.
From a bandwidth perspective on voice and video calls alone:
At current traffic levels, the amount of outbound bandwidth that is required to support Signal voice and video calls is around 20 petabytes per year (that’s 20 million gigabytes) which costs around $1.7 million dollars per year in bandwidth fees just for calling, and that figure doesn’t include the development costs associated with hiring experienced engineers to maintain our calling software, or the cost of the necessary server infrastructure to support those calls
20 petabytes ÷ 365 ÷ 24 ÷ 60 ÷ 60 is about 634Gb/sec, so they would need at least 2TB in uplink for redundancy and bursting at the bare minimum. Between peering costs, dark/lit fiber between DCs, high bandwidth (400GbE) DWDM optics, OTN hardware, power/UPSes, smart hands, hosting things themselves would probably cost at least an order of magnitude more for at least the first year, and probably every subsequent year. For example, pulling this somewhat out of my ass, but an Adva FSP-3000 with Mux, ROADM, service channel line card, other line cards, and optics would be over a million dollars by itself, and you'd need something like this in every data center you want to host stuff in.
Source: started my IT career at an outsourced NOC and got to play with some seriously cool OTN gear.
Weirdly, Wolfram Alpha converted 20 petabytes/year to about 5 gigabits/second. I think your calculation winds up at ~634 MB/s, not 634 Gb/s?
(I lean on Wolfram Alpha a lot for unit conversions; it’s nice to have a second pair of eyes to sanity check stuff)
They used bits instead of bytes in the answer without including it in the conversion. It ends up at ~634 MBps, which is roughly 5000 Mbps or 5 Gbps.
Yep, just like that! Although I suppose my point was more so that @vord’s point still stood: one could easily handle that much traffic with commodity hosting solutions. @th0mcat was right in that transferring 1000x the data would be dramatically more costly; however, thankfully it doesn’t seem like Signal is in that position.
That's fair, I didn't give too much consideration or investigation to bandwidth needs. The one I linked was 10GbE, with what seemed to be the ability to burst higher. I was kind of assuming that with this colo the datacenter itself would handle most of those needs (hence their ability to charge extra for bursting). I have 0 doubts that propping up their own datacenter would be a huge investment compared to just piggybacking off of dozens of others.
For their needs, it'd probably make more sense to maintain a larger number of smaller presences, a quarter rack or less each, across many different datacenters.
I should be clear: I don't think they could cut server costs to under 5 million annually. But I'd bet they could get it under 10. They almost certainly would need some degree of cloud presence no matter what to handle burst, particularly for the video chat.
I get why they do server-side queuing, but I feel like an app like Signal would fare better by simply awaiting both devices to be online at the same time and have them exchange data directly.
If I send you a message, I would like to be able to assume that you will eventually get it, even if my wifi or battery disappears without warning. Also, 99.9% of my communication on Signal is group chats so I don’t know how that would be expected to work. It would be awful to have different people seeing radically different views of the group chat depending on whose devices are online at the same time.
I mean, it's not that hard to wait 1s for an ack, and if it fails from a direct send, resend to server.
Messages get lost all the time for one reason or another. If your connection is so spotty that you need to have the server as an intermediary, odds are you'll lose the message before it hits the server anyhow.
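A rough Python sketch of that "wait for an ack, then fall back" idea, to make it concrete. This is not how Signal works; `deliver`, `send_direct`, and `send_via_server` are hypothetical names, and the two transports are stand-ins passed in as plain callables so the logic can run on its own.

```python
from typing import Callable


def deliver(message: bytes,
            send_direct: Callable[[bytes, float], bool],
            send_via_server: Callable[[bytes], None],
            ack_timeout: float = 1.0) -> str:
    """Try a peer-to-peer send first; queue on the server if no ack arrives.

    send_direct is assumed to return True only when the peer acknowledged
    the message within ack_timeout seconds.
    """
    try:
        if send_direct(message, ack_timeout):
            return "direct"
    except OSError:
        pass  # peer unreachable: treat it the same as a missing ack
    send_via_server(message)
    return "server"


# Stand-in transports for demonstration only.
queued: list[bytes] = []
print(deliver(b"hello", lambda msg, timeout: True, queued.append))
print(deliver(b"are you there?", lambda msg, timeout: False, queued.append))
print(queued)
```

Note that this only covers one-to-one delivery; it does nothing about the group chat ordering problem raised above.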
Actually now that I think about it, setting aside the group chat issues (which I think would make direct sends non-viable in most cases), why on Earth would I want everyone on my Signal contact list to know my IP? That sounds absolutely horrible for privacy.
That's why I mentioned Veilid, or some other onion routing system like Tor. An added benefit is that it reduces snooping from 3rd parties. Chat programs seem the ideal thing to use this for.
I mean, I generally don't have contacts that I wouldn't trust with my IP, but YMMV.
I wonder if Oxide Computer servers would be interesting to Signal. They seem too expensive for many non-profits, but Signal seems big enough to use Oxide servers to run their own cloud. Locating your own cloud in major metropolises seems like a plausible approach, even if you want to continue to use other people's clouds in other situations.
I mean, that really is the thing about other clouds...most of them are metro-adjacent anyway.
I'd guess a good bit of Signal's bandwidth doesn't cross much geographical distance. There will be exceptions of course, but I would anticipate the majority of users' data mostly staying in the same timezone, if not within the same 50 miles.
So less than $2/year per user. It's a large sum of money, but not one I would think insurmountable with a Wikipedia-style donation drive.
It's not that easy. Wikipedia has roughly 4.5 billion yearly users, and donation revenue of $165 million. That's about $0.035 per user, and they still have to plead for donations for a while every year. A quick Google search seems to indicate that they received about 13 million donations, which would be an average donation of $12.69.
So comparing that to Signal's $1.25/user (40M users and $50M of expenses), Signal costs way more per user. Actually, I'm not sure where your $2/user came from. If you applied the same percentage of donations/user (1 donation for every 346 users), then only 115k users would actually end up donating, and they would each need to donate $434. If the average donor were willing to donate the same amount to Signal as they do to Wikipedia ($12), you would need 10% participation which is astronomical.
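The comparison above can be reproduced with a few lines of Python. This is only a sketch using the figures quoted in the comment (Wikipedia's user, revenue, and donation counts, plus Signal's 40M users and $50M budget); the variable names are mine, and the results differ slightly from the rounded numbers in the comment.

```python
# Figures as quoted in the comment above; none are independently verified.
WIKIPEDIA_USERS = 4_500_000_000
WIKIPEDIA_REVENUE = 165_000_000    # USD per year
WIKIPEDIA_DONATIONS = 13_000_000   # donations per year
SIGNAL_USERS = 40_000_000
SIGNAL_BUDGET = 50_000_000         # USD, projected for 2025

avg_donation = WIKIPEDIA_REVENUE / WIKIPEDIA_DONATIONS
users_per_donation = WIKIPEDIA_USERS / WIKIPEDIA_DONATIONS
signal_cost_per_user = SIGNAL_BUDGET / SIGNAL_USERS

# Scenario 1: Signal users donate as rarely as Wikipedia users do.
donors_at_wikipedia_rate = SIGNAL_USERS / users_per_donation
required_donation = SIGNAL_BUDGET / donors_at_wikipedia_rate

# Scenario 2: donors give Wikipedia's average amount instead.
participation_needed = SIGNAL_BUDGET / avg_donation / SIGNAL_USERS

print(f"average Wikipedia donation: ${avg_donation:.2f}")
print(f"Signal cost per user:       ${signal_cost_per_user:.2f}")
print(f"donation needed at Wikipedia's donor rate: ${required_donation:,.0f}")
print(f"participation needed at Wikipedia's average: {participation_needed:.1%}")
```

Either way the conclusion is the same as in the comment: Signal would need a far higher donor rate, or far larger gifts, than Wikipedia gets.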
I would contend that Signal's userbase would probably have a better conversion rate, given they have a bit more skin in the game, and their engagement with it is significantly higher than with Wikipedia.
Discord is pulling in significant cash with Nitro; there's little stopping Signal from doing the same.
Yeah, that's a good point - active Signal users who are using the app are going to be more invested in it than your standard Wikipedia user. A much higher percentage of Wikipedia's editors donate compared to the standard user.
Signal expects $50M of expenses in 2025, but this year they are at around $33M (corrected from $14M). The 40M user number is also an outdated one from 2021. [Retracted: presumably they have now more than 100M since their growth rate doesn't seem to have slowed down if they expect costs to triple in two years.] But even so they are at about $0.85/user (corrected from $0.35/user). A $5 donation would cover a user for five years (corrected from a decade). In the worst case they could introduce a paid subscription for a low dollar amount, like WhatsApp initially had.
Edit: Misread the blog post. Kinda makes the whole comment a bit redundant.
$14M is just their current infra costs. Their labor costs are an additional $19M. Assuming those are the only 2 costs (they're not), they'd be at $33M. Infra costs will increase with user increases, nearly linearly (but not exactly). Labor costs could potentially increase as well, but less linearly. So I guess that's where they get $50M based on their projected growth. I'm curious, however, how many active users they have, not just accounts. I am interested in it, but don't actually know anyone who uses it so I haven't bothered. It's possible that many people have tried it but aren't active (and therefore not invested) because the rest of their circle doesn't use it.
It's a killer for me as well. Most of the people I know are average users and don't think about security and privacy stuff - they want something that just works and something many others actually use. Which isn't Signal, unfortunately.
they want something that just works and something many others actually use. Which isn't Signal, unfortunately.
To be clear, by “this isn’t Signal”, you mean the latter part, right?
For my use cases Signal has been on feature parity to WhatsApp (at least the WhatsApp I knew when I last used it, c. 2021) for quite a long time.
I even recently switched phones, and given my old phone was still working and nearby, even the device transfer “just worked” flawlessly.
Many use the same ones as their friends, right? It doesn't look like any of my friends use Signal, and getting them to swap is... well, I think others who have tried will know how I feel.
I have tried, and it worked out surprisingly well. There are only a handful (like, literally countable on one, maybe two hands) of contacts I cannot reach on Signal nowadays, in my somewhat-but-not-overly techy circle.
Compare that to a year or two ago, of course you’d feel hopeless. Barely anyone in my friends or family had it. It did help a lot that a relative of mine, independently from me deleting my WhatsApp account, suggested we move the ~30 person family group chat over to Signal. For context, we all grew up and mostly still live in central Europe. Same for most of my friends, although I believe most of them still use WhatsApp mainly.
You could always await the other device, or do a hybrid approach. You could observe the user's connection habits (how often they have a connection to the internet). If they don't have a connection often, use the servers. If they do, use the awaiting. That should probably cut a good part of the storage costs. Of course, all this observing of user habits should be done locally, but this is totally feasible.
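To show what that hybrid could look like, here is a small Python sketch of a purely local heuristic. It is an illustration under assumptions, not anything Signal ships: `ConnectivityTracker`, the window size, and the 80% threshold are all invented for the example.

```python
from collections import deque


class ConnectivityTracker:
    """Rolling window of locally observed reachability checks for one contact."""

    def __init__(self, window: int = 50, threshold: float = 0.8):
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically
        self.threshold = threshold           # minimum online fraction for direct sends

    def record(self, reachable: bool) -> None:
        """Store the outcome of one reachability check."""
        self.samples.append(reachable)

    def route(self) -> str:
        """Return "direct" for contacts that are usually online, else "server"."""
        if not self.samples:
            return "server"  # no history yet, so take the safe path
        online_fraction = sum(self.samples) / len(self.samples)
        return "direct" if online_fraction >= self.threshold else "server"


tracker = ConnectivityTracker(window=10, threshold=0.8)
for reachable in [True] * 9 + [False]:
    tracker.record(reachable)
print(tracker.route())
```

Because the samples never leave the device, the server learns nothing new about the user's habits, which is the point made above about doing the observing locally.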
20 [petabytes/year] ÷ 365 [days/year] ÷ 24 [hours/day] ÷ 60 [minutes/hour] ÷ 60 [seconds/minute] * 10^9 [megabytes/petabyte] ≈ 634 [megabytes/second]
634 [megabytes/second] * 8 [megabits/megabyte] = 5072 [megabits/second]
5072 [megabits/second] ÷ 10^3 [megabits/gigabit] = 5.072 [gigabits/second] ≈ 5 [gigabits/second]
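The same conversion as a short Python sketch, for anyone who wants to check it without Wolfram Alpha. It assumes decimal SI prefixes (1 PB = 10^9 MB) and a 365-day year, as in the working above; `average_rate` is just an illustrative name. Keep in mind this is an average over the whole year, so peak traffic would be higher.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def average_rate(petabytes_per_year: float) -> tuple[float, float]:
    """Return (megabytes/second, gigabits/second) using decimal SI prefixes."""
    megabytes_per_second = petabytes_per_year * 10**9 / SECONDS_PER_YEAR
    gigabits_per_second = megabytes_per_second * 8 / 10**3  # bytes -> bits, mega -> giga
    return megabytes_per_second, gigabits_per_second


mb_s, gb_s = average_rate(20)
print(f"{mb_s:.0f} MB/s, {gb_s:.2f} Gb/s")
```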
Whoops! My bad!
Also thank you for pointing out WolframAlpha does conversions like that!
As /u/kacey said, I badly screwed up my bandwidth calculations, so my entire post is worthless. Whoops, sorry about that!
Happens to us all.
<insert Office Space meme about missing a decimal>
That includes me. My circle didn't want / couldn't be bothered to move off WhatsApp
My husband and I use signal between ourselves. Sometimes that means messaging internationally, but more frequently quite local to each other.
Or even like "oh these users are having a back and forth, do direct messages until there's a failed delivery, then resend to server"