I haven't been closely following new model releases and the research papers that sometimes accompany them, so I was wondering: is power consumption ever seriously discussed by OpenAI, Google, Anthropic, and others? Do we get specific numbers on how much energy their models actually consume to produce 100 tokens? On the cost of training these models? Or are we not at that stage yet, nobody cares, and at best we get really rough estimates based on "trust me bro" tweets from their respective CEOs?
Some time ago, I came across this post. GPT-OSS (20B and 120B) are relatively small, energy-efficient models, yet I was surprised that their power consumption was still higher than I had estimated.
For the 120B model to generate 1,000 tokens (which is really not a lot), it would take around 83 Wh of electricity. For context, my PC draws ~100 W when idling, so that's almost the same as leaving my PC on for an hour. Considering that proprietary models reportedly have TRILLIONS of parameters and are probably not as energy-efficient, the true energy consumption of these models is concerning. Of course, big data centers do a lot of things to maximize efficiency that this "test" doesn't. But even if you halve the energy consumption, it's still significant given their size and how pervasively they're used for everyday, trivial tasks.
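Here's the arithmetic above as a quick Python sketch. It only assumes the ~83 Wh per 1,000 tokens figure from that post, my PC's ~100 W idle draw, and that energy scales linearly with the number of generated tokens (which may not hold once context length comes into play).

```python
# Back-of-the-envelope arithmetic for the numbers above.
# Assumptions: ~83 Wh per 1,000 generated tokens for GPT-OSS 120B (the figure
# from that post), a ~100 W idle draw for my PC, and energy that scales
# linearly with the number of generated tokens.

WH_PER_1000_TOKENS = 83.0  # energy figure quoted in the post
PC_IDLE_WATTS = 100.0      # approximate idle power draw of my PC


def energy_wh(tokens: int, wh_per_1000: float = WH_PER_1000_TOKENS) -> float:
    """Energy in watt-hours to generate `tokens` tokens (linear-scaling assumption)."""
    return tokens / 1000.0 * wh_per_1000


def idle_pc_minutes(wh: float, watts: float = PC_IDLE_WATTS) -> float:
    """Minutes an idle PC drawing `watts` would need to use `wh` watt-hours."""
    return wh / watts * 60.0


if __name__ == "__main__":
    for tokens in (100, 1000):
        wh = energy_wh(tokens)
        print(f"{tokens} tokens: {wh:.1f} Wh, about {idle_pc_minutes(wh):.0f} min of idle PC")
    # Even if data-center tricks halve the energy, 1,000 tokens still needs this much:
    print(f"halved, 1,000 tokens: {energy_wh(1000) / 2:.1f} Wh")
```

That works out to about 8.3 Wh per 100 tokens and roughly 50 minutes of my idle PC per 1,000 tokens; halving it still leaves about 41.5 Wh per 1,000 tokens.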
I haven't come across similar posts or studies for newer open source models, so if somebody has, please share.
Also, I don't fully understand how context size fits into all of this. Does a larger context mean it takes more energy to produce those same 100 tokens?
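If I understand the usual back-of-the-envelope approximation correctly (Kaplan et al. 2020, "Scaling Laws for Neural Language Models"), forward-pass compute per token is a fixed part (~2 FLOPs per active parameter) plus an attention part that grows linearly with the number of tokens already in context. Below is a rough Python sketch of that, with made-up model sizes that are NOT the real GPT-OSS configuration. Caveats: FLOPs aren't energy, generation is often limited by memory bandwidth (reading weights and the KV cache, which also grows with context) rather than raw compute, and some architectures use tricks like sliding-window attention that cap the attention term.

```python
# Rough per-token compute vs. context length, using the approximation from
# Kaplan et al. (2020), "Scaling Laws for Neural Language Models":
#     forward FLOPs per token ~= 2 * N + 2 * n_layer * n_ctx * d_attn
# N is the number of (active) parameters; the second term is attention over
# the n_ctx tokens already in context. Model sizes below are hypothetical
# placeholders, not the real configuration of GPT-OSS or any other model.


def flops_per_token(n_params: float, n_layer: int, d_attn: int, n_ctx: int) -> float:
    """Approximate forward-pass FLOPs for one token that attends to n_ctx earlier tokens."""
    return 2 * n_params + 2 * n_layer * n_ctx * d_attn


def generation_flops(n_params: float, n_layer: int, d_attn: int,
                     prompt_len: int, new_tokens: int,
                     include_prompt: bool = True) -> float:
    """Approximate total FLOPs for a prompt of prompt_len tokens followed by new_tokens.

    Assumes a KV cache, so every position is processed once; the token at
    position t attends to the t tokens before it. With include_prompt=False,
    only the newly generated tokens are counted (decode-only cost).
    """
    start = 0 if include_prompt else prompt_len
    return sum(flops_per_token(n_params, n_layer, d_attn, t)
               for t in range(start, prompt_len + new_tokens))


if __name__ == "__main__":
    # Hypothetical model: 5e9 active parameters, 36 layers, attention width 4096.
    N, LAYERS, D_ATTN = 5e9, 36, 4096
    for prompt_len in (1_000, 100_000):
        decode = generation_flops(N, LAYERS, D_ATTN, prompt_len, 100, include_prompt=False)
        total = generation_flops(N, LAYERS, D_ATTN, prompt_len, 100)
        print(f"prompt {prompt_len:>7}: 100 new tokens {decode:.2e} FLOPs, "
              f"prompt + 100 tokens {total:.2e} FLOPs")
```

With these placeholder numbers, the fixed 2N part dominates at a 1,000-token context, while at 100,000 tokens the attention term overtakes it, so the same 100 generated tokens need nearly four times as much compute under this approximation, before even counting the cost of reading the long prompt.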