Anyone able to parse out what this nonsense line is at the start of the article?
Did the writer suffer a cat attack?
It says: “this story is real i am the mysterious developer”, encoded with a Beaufort cipher using “codestore” as the key.
Wait, so is he the one who hired the guy, the one who audited the (fraudulent) code, or the one who did the hack in the first place?
Why not both, traditional ethics be damned? (Mostly jesting)
Company had a net win either way. $50k one time to save > $200k. Dude was even nice enough to have it fail gracefully after the exploit stopped working. Could have left them high and dry with a dead product.
This here is what makes me raise eyebrows. I don't doubt that there is a good reason for this, but I'd love to see a breakdown of the spend there.
Yeah I wondered about that! I had to price out some ML model costs the other day and unsurprisingly $100k/month is a lot, whichever way you slice it.
The biggest instance you can get on AWS is 8 A100 GPUs, 640GB VRAM, 1.1TB normal RAM, and that'll cost $29k/month at full price or $17k/month equivalent if you pay a year up front. Not that you should do the latter, because that's a $150k machine to buy outright and if you're using it 24/7 for more than six months it's far more economical to just have one in house. So let's say they're on AWS because it's mostly burst load and having hardware sitting idle most of the month would be a waste: if you want to get all your data in by the 28th and run it before the end of the month, $100k will get you 416 80GB A100s for two days.
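If anyone wants to sanity-check that arithmetic, here's the back-of-the-envelope as a quick Python sketch. The ~$41/hr on-demand rate for an 8x A100-80GB instance is my own assumption (it varies by region and changes over time), so treat the outputs as ballpark figures:

    # Back-of-the-envelope for the numbers above. ~$41/hr for an 8x A100-80GB
    # instance at on-demand pricing is an assumed figure, not a quote.
    ON_DEMAND_PER_INSTANCE_HR = 40.96   # USD/hr for one 8-GPU instance (assumed)
    HOURS_PER_MONTH = 730

    full_price_month = ON_DEMAND_PER_INSTANCE_HR * HOURS_PER_MONTH
    print(f"One instance, full price: ${full_price_month:,.0f}/month")  # ~$29.9k

    budget, burst_hours = 100_000, 48   # spend the $100k in a two-day burst
    instances = budget / (ON_DEMAND_PER_INSTANCE_HR * burst_hours)
    print(f"${budget:,} over {burst_hours}h buys ~{instances:.0f} instances "
          f"= ~{instances * 8:.0f} A100s")  # ~50 instances / ~400 GPUs

That lands at roughly 400 GPUs rather than exactly 416 - the precise count depends on which hourly rate you plug in - but it's the same ballpark.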
Now, for context: 416 A100s is world class supercomputer territory. Not top ten, for sure, but the kind of thing a major university would be writing excited press releases about. You could run 80 instances of GPT3 simultaneously* - which may not sound huge, but remember it can respond in seconds on that kind of hardware, so you're talking about the ability to hold probably 1000+ fluent, human-level conversations simultaneously before you start seeing a noticeable delay. For 48 straight hours.
That power doesn't go quite as far if they're doing significant retraining every month rather than just inference, but it's still very respectable for something domain-specific that I'd expect to be a lot less complex than a generalised language model in reality.
On the other hand, huge as this kind of capacity actually is, we're also talking about $1.2M/year for a core business need. I've seen companies spend more than that on obviously ill-conceived ideas that get quietly abandoned after six months, so perhaps it's insignificant enough to them that it does only really "matter" in the context of a bullet point on the VP of Engineering's quarterly review.
It also makes me laugh that exploiting the free trials of other providers is exactly the kind of morally and legally dubious "hack" that Google or Amazon themselves would have done back in their startup days. Bezos is on record saying they would use a specific product code they knew would never ship to pad distributor orders and get around minimums at the expense of their suppliers and logistics partners, for example.
*I don't have GPT3 specs to hand, I don't actually know if they've even published them explicitly, but it's the one people are familiar with and a similar scale and architecture to BLOOM, which I do know the hardware requirements for.
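For what it's worth, the sizing behind that footnote is just parameter count times bytes per parameter versus VRAM per card. A rough sketch, assuming a ~175B-parameter model in the GPT3/BLOOM class stored as fp16 (both assumptions on my part):

    # How many 80GB A100s does one copy of a ~175B-parameter model need,
    # and how many copies fit across 416 GPUs? fp16 weights assumed; this
    # ignores activations and other runtime overhead, so it's a lower bound.
    params = 175e9           # GPT3/BLOOM-scale parameter count (assumed)
    bytes_per_param = 2      # fp16
    vram_per_gpu_gb = 80

    weights_gb = params * bytes_per_param / 1e9
    gpus_per_copy = weights_gb / vram_per_gpu_gb
    print(f"~{weights_gb:.0f} GB of weights -> ~{gpus_per_copy:.1f} GPUs per copy")
    print(f"416 GPUs -> ~{416 / gpus_per_copy:.0f} copies before overhead")

Add realistic headroom for activations and batching and you land nearer the 80 simultaneous instances quoted above.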
I'm reasonably certain the story is bullshit; but honestly it doesn't matter because this sort of stuff does happen, soooo.
Anyway, re AWS costs, I am not on board at all with your calculations. $100k/mo is a very common spend on compute, especially if you're doing AI stuff. The type of company that can afford to spend like this likely runs MANY machines in parallel.
Back at my old startup, I spent $12k/mo on compute on optimized lambdas alone, for a tiny userbase of maybe 300-400k. And another $20k on Redshift. This was all POST optimization and NOT using GPUs.
This number tells me they're either a small company that doesn't really know how to optimize, or they're a medium-ish company that REALLY knows how to optimize (unlikely in that story).
Haha yeah, very fair, I certainly have my doubts too but worst case it makes for an interesting hypothetical either way!
Genuinely interested to know why? I've re-checked the per-instance costs and the kind of models you can run on them and I'm pretty sure it all adds up, but it's always possible I've missed something.
I jumped to GPUs because it all seems to centre on "the AI model is quite expensive to execute", and the majority of common AI/ML tooling executes orders of magnitude better on GPU than CPU. I'm assuming it really is primarily compute spend (i.e. number crunching for the neural nets) rather than storage, data warehousing, queries, etc. because none of that would efficiently distribute out to thousands of effectively ephemeral third party accounts. I'm also assuming a fairly small handful of customers because "very niche, industrial market", although each customer may well generate a ton of data since it's all machine telemetry.
Spending $1.2M/year across all cloud services for a B2C platform with millions of active users and a hockey stick growth curve wouldn't surprise me; there are a lot of moving parts and it all adds up. But if it's all going to just ML model execution for a comparatively small number of business customers, that seems like a different ball game - doubly so if that's just inference rather than training, which by my read is what the post is implying!
ML models are just very common. Like if this is an ad company of sorts that uses ML models to analyse behaviour and spit out whatever, it could be under constant load and spend hundreds of thousands a month. Same if it’s any type of b2b company analysing stuff for its own customers.
These are a dime a dozen honestly.
I don't disagree with any of that, I guess I just don't see how it's in conflict with what I was saying? Even if it's sustained load, the throughput of 50-70 A100s is huge - and I wasn't implying that they couldn't possibly need that capacity or anything like that, just giving a sense of how big it really is.
At Walmart's scale, running weekly forecasting on the order of 100 TB of input (100B+ individual data points) to generate 500M outputs takes around 12 hours on 56 GPUs after three days of retraining. That's also on older P100 cards; A100s are 5-10x faster, but let's assume that model complexity has also increased proportionally, so throughput remains equal and we've just increased forecast quality.
Again, I'm not suggesting it's impossible for them to be operating at that scale, I'm just saying "shit, even with pessimistic assumptions, these guys are processing twice as much data as Walmart". That's big! Big enough to be unusual and big enough to be interesting, is all.
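To spell out where "twice as much as Walmart" comes from: under that equal-throughput assumption, GPU-hours per week is a rough proxy for data processed, and the 60 sustained A100s below is just the midpoint of my 50-70 estimate above:

    # Rough weekly GPU-hours comparison, assuming per-GPU throughput is a wash
    # (A100s are 5-10x faster than P100s, but the model got proportionally heavier).
    walmart_gpu_hours = 56 * (3 * 24 + 12)   # 56 GPUs, ~3 days retraining + 12h forecasting
    story_gpu_hours = 60 * (7 * 24)          # ~60 A100s (midpoint of 50-70), running all week

    print(f"Walmart: ~{walmart_gpu_hours:,} GPU-hours/week")
    print(f"Story:   ~{story_gpu_hours:,} GPU-hours/week "
          f"({story_gpu_hours / walmart_gpu_hours:.1f}x)")

Roughly 10,000 GPU-hours a week against Walmart's ~4,700, i.e. a bit over 2x.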
Equally, yeah, it's only around 1% of the GPU compute that Meta or Tesla have deployed. Like I said, I've seen companies waste $1.2M on basically nothing before, so spending that on compute isn't going to set any records. But 1% of those companies' GPU cores is still a lot in the same way that 1% of the speed of light is still pretty damn fast!
In the original story, the company in question came from a Soviet research institute and the numbers produced by the (very secret) model were important to "everyone in the industry."
It describes kind of a weird architecture, though. The thread says the model is downloaded from a web panel, then uploaded to a server that spawns a GCP instance per request, runs the calculation, then tears the instance back down. I don't know if that sheds more light or casts more shadow across things.
Thanks for this great breakdown! My Occam's razor says that if they didn't care about that $1.2M cost, they probably wouldn't have risked paying a stranger $50k to lower it.
According to the comments on the article, it seems like this story is an adaptation of this Twitter thread. Seems like there are some details changed, though.