To deflate the hype a bit, this is an impractical result on a benchmark that's not necessarily measuring what you and I would care about. And it seems they brute-forced it.
But having done it at all, I expect efficiency will improve. There is a clear trend in AI of prices declining dramatically.
I also don't think it's a coincidence they're releasing this now, as growth is slowing and market pressure is shifting toward expecting AI companies to actually make money.
2025 will pretty clearly be the year that they try to make money off AI instead of growing without regard for cost. This would not be the first time an AI company put out a hype video based on lies to try to pump up its valuation - the industry is rife with it. I'll believe it when I see it and it actually works and doesn't just turn out to be several underpaid third worlders in a trenchcoat.
We at Amazon use A.I.* to keep track while you shop, so you don't need to pay in store - you can simply leave!
*A.I. meaning Actually Indian people
I think it's good to be skeptical, but that's going a bit far. This was an outside benchmark. I doubt it would be possible to cheat in such a simple way, and I doubt OpenAI could have that level of corruption without someone leaking it. The many machine learning researchers that OpenAI has hired presumably don't think of themselves as cheating?
But we've often seen with scientific research that there are many ways to subtly fool yourself. Also, AI algorithms will use whatever shortcuts they can find.
The hype part tends to be describing research results to the public and discussing their significance. Altman is a hype machine all by himself. The "12 days of OpenAI" thing is a publicity campaign. There are many outside influencers who will feed the hype, spreading the news far and wide, blogging and making videos.
You can think of this as fulfilling a demand - people want to talk about dramatic news.
Yes, this is what I'm talking about. I'm of the opinion that OpenAI is primarily an investment scam. It's clearly useful tech in some situations, but the valuation is utterly delusional. And this is supported by online disinfo tools being more powerful than ever AND traditional search engines getting worse. OpenAI's valuation only makes sense if you think they're on the path to AGI and close to it, and I think it's equal parts copium and hopium to think they're doing that with a text transformer model.
Assuming you’re right, I’m wondering who loses if they don’t live up to their valuation? They’re not public, so it seems like it would be some VCs and companies like Microsoft who provided the datacenter time.
They’re professionals and know the risks.
VCs and retail investors in NVDA, imo. Datacenter operators are already talking about slackening demand for AI hardware; that's why hype videos like this are getting pushed out. I think some cloud provider is gonna get left holding a huge bag.
There's clear use for massively parallel vector processing; that's always been the case. I just don't think the market is as huge as the AI bulls want to believe. Maybe I'm wrong and intelligence is a natural result of sufficient complexity and we're about to have self-aware machine intelligence. I just think extraordinary claims need to be backed up and tested rigorously, and OpenAI is anything but.
Maybe Nvidia is a bubble, or maybe people continue to find new uses for the de facto standard matrix multiplication hardware, or maybe their competitors catch up enough that their lead disappears. All sorts of scenarios, and AGI is only one. It’s 6.6% of the S&P 500 and I don’t know enough to weight it any differently.
Despite all the attempts at rigor by stock market analysts, stock prices aren’t built on rigorous evaluation. They’re built on hints and guesses, hopes and fears.
This result was a form of brute force, but it's interesting largely because this approach wasn't viable before. You could generate ten thousand results and build a consensus from them, but you wouldn't have seen the stair-stepping towards correctness that o3 showed.
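To make the contrast concrete, here is a minimal sketch of that brute-force consensus idea (the sample_model stub is hypothetical, standing in for any high-temperature API call). Every sample is independent, which is exactly why this scheme shows no stepwise progress toward a solution:

    from collections import Counter

    def sample_model(prompt: str) -> str:
        """Hypothetical stub for one independent, high-temperature
        sample from a model; in practice this would be an API call."""
        raise NotImplementedError

    def consensus_answer(prompt: str, n: int = 10_000) -> str:
        """Brute-force consensus: draw n independent samples and keep
        the most common answer. No sample builds on any other, so
        there is no stair-stepping towards correctness."""
        votes = Counter(sample_model(prompt) for _ in range(n))
        answer, _count = votes.most_common(1)[0]
        return answer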
The result seems to be a move from using RL to "know stuff" towards using RL to "know how to reason". Even without extensive use of CoT, that still improves inference in a single pass: they showed that o3 scored 3x the previous SOTA on this benchmark even on its first attempt.
Considering there's likely a lot of repeated work happening in these CoT chains, I expect it's very likely they'll be able to incorporate optimizations to make this process cheaper and faster to run. What might cost $350K now may "only" be $3,500 in a year. And while that's still a heavy fee for you and me, it begins to be a useful tool for companies that have extremely high demands on correctness. Think automatic audits on tools like OpenSSL, or checking for bugs in critical code paths. Perhaps it could even serve as a "research assistant" in frontier scientific research, where there are very few people in the world you could ever bounce ideas off of.
Even if it's not practical right now for regular people, the ability to reach high levels of accuracy in extremely difficult problem spaces is still an exciting result. It shows that we've still not reached the potential of these models as some have speculated.
As I have said in the past, much of the doubt that is expressed over AI is driven by people who are afraid of being replaced or outshined. I can’t say I blame them. The world will certainly be an interesting place once humans are no longer the dominant intelligence. Let’s hope that it leads to greater prosperity but let’s be honest, it’ll probably lead to more inequality or something else negative.
AI Explained is always a good source for keeping up with the latest in this field. His videos are well-researched and clear.
It's incredible how these models keep leapfrogging each other. Every time it seems like progress is slowing, we see new techniques emerge like mixture of experts, multimodality training, and now chain of thought. The capabilities continue to improve, and benchmarks are being quickly obsoleted.
This is the first model to show serious adaptation of its training to new material, which is closer to rational thinking than we've seen before. It seems we really are getting to the point of needing to consider what AGI really means.
I look forward to seeing new quantization and optimization techniques explored for chain of thought reasoning. It should be possible to reduce inference costs to more reasonable levels, even on weaker hardware. Like MoE, it would just need to run serially, and not in parallel.
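As a rough illustration of what quantization buys (a toy symmetric int8 scheme of my own, not any particular library's method): storing weights in 8 bits instead of 32 cuts memory traffic about 4x, and memory bandwidth is often the real bottleneck at inference time.

    import numpy as np

    def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
        """Symmetric absmax quantization: scale floats into int8 [-127, 127]."""
        scale = float(np.abs(w).max()) / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in weight matrix
    q, s = quantize_int8(w)
    print(np.abs(w - dequantize(q, s)).max())  # small next to typical weight size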
I work in a fairly bustling and large co-working space for startups. I had a friend meet me for lunch and she needed to snake past cubicle after cubicle to get to my desk in the far corner. When she got there, her jaw was on the floor. She told me to look at every monitor we passed on the way back out of the building. The math isn’t exact, but it is probably underestimating it to say 4 out of 5 people had ChatGPT or Claude open on one of their monitors / were actively using AI to do whatever. (And this doesn’t count the people we passed who were using Cursor, because that just looks like VS Code, so there is no way to tell at a glance.)
I keep seeing/hearing people dropping comments like “hype cycle” and “hallucination” and “wall” and “not for a long time”, but this is the worst AI is ever going to be, and it is already changing everything. Dismissive comments just feel like someone out of touch, or totally lacking basic imagination, or clinging to a view of the world they wish were still true.
The future is already here and is going to chew everyone and everything up, and what comes out the other end simply won’t be recognizable except as far as human nature (greed and status seeking monkey behavior) still motivates people and defines culture.
It seems like startups are where there will be the most enthusiastic adopters of AI. A lot of them might be AI startups?
It certainly is a lot different from cryptocurrency.
> The math isn’t exact, but it is probably underestimating it to say 4 out of 5 people had ChatGPT or Claude open on one of their monitors / were actively using AI to do whatever.

Okay, but this isn't actually anything new. You are describing an IDE with integrated documentation search, or using Google or Stack Overflow. These are not the parts of programming that require thought or intelligence. I haven't yet seen an AI that can do programming beyond what's doable with vim snippets or whatever the equivalent is in your preferred IDE. That's cool, don't get me wrong, but I am far from the only person unconvinced about the world-changing ability of even more high-quality autocomplete.
It's definitely a nice upgrade over the previous status quo. Cursor mainly acts as a way to remove drudgery from programming - finding the right SO post, changing bits of code across a file after you redefine a variable or change some invariant.
Right, I don't mean to act like I think this is totally useless or whatever. Clearly it has use in information retrieval - and the fact that it can do really, really good autocomplete/boilerplate generation is kickass. But this isn't really going to speed up programming - those are not the parts of coding that take up much time! - and it's weird that it keeps getting touted as a fundamental paradigm shift when it seems a lot more like a nicer interface over extant tech.
Author's summary:

> o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more.
All the talk about AI models 'hitting a wall' and slowing down has been vaporized - in fact we don't appear to be approaching any walls at all. That was just wishful thinking on the part of some researchers and it has now been demonstrated as false. This model, in a nutshell, can beat subject matter expert humans at anything you can create a benchmark to measure, with only a few holdouts for spatial reasoning and other niche areas that are just too 'soft' in the data (for now) to succumb to rigorous reasoning steps. The o3 model does this at an absurd electrical processing cost, but in minutes it still solves what would take any human expert days to calculate. Results are well into the 90th percentile now on even the most grueling math and physics thrown at it.
It's not quite an AGI but it is blurring the line and closing in on that goal faster than any experts or futurists (including the crazy ones) ever predicted it could. There are still no brakes on this train, and safety is becoming a very real concern with increasing urgency. I look forward to learning how to install a nuclear reactor in my data centers just to power these things.
> The o3 model does this at an absurd electrical processing cost

Don't you think you are brushing past this awfully quickly? A lot of the "hitting a wall" talk takes place in exactly this context: the wall being described is precisely that the only way they still seem able to scale up is to throw absurd amounts of power at the problem.
I haven't watched the video, although I saw the announcement and have yet to form an overall opinion. It is just that when I read your comment this jumped out to me as a bit of an oddity.
Edit:
So if I understand it correctly, a prompt costs roughly $3,000 with o3, which also means that just running the benchmark will have cost more than a million dollars.
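A back-of-envelope check of that estimate, assuming the ~$3,000-per-task figure above applies to the 400 tasks in ARC-AGI's public evaluation set (my assumption about which set was meant):

    # Rough arithmetic only; both inputs are estimates from this thread.
    cost_per_task = 3_000             # USD, the per-prompt estimate above
    num_tasks = 400                   # ARC-AGI public evaluation set size
    print(cost_per_task * num_tasks)  # 1200000 -> over a million dollars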
Though to put some nuance on it by going to the source directly:

> The high-efficiency score of 75.7% is within the budget rules of ARC-AGI-Pub (costs <$10k) and therefore qualifies as 1st place on the public leaderboard!

> The low-efficiency score of 87.5% is quite expensive, but still shows that performance on novel tasks does improve with increased compute (at least up to this level.)

> Despite the significant cost per task, these numbers aren't just the result of applying brute force compute to the benchmark. OpenAI's new o3 model represents a significant leap forward in AI's ability to adapt to novel tasks. This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs. o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain.

> Of course, such generality comes at a steep cost, and wouldn't quite be economical yet: you could pay a human to solve ARC-AGI tasks for roughly $5 per task (we know, we did that), while consuming mere cents in energy. Meanwhile o3 requires $17-20 per task in the low-compute mode. But cost-performance will likely improve quite dramatically over the next few months and years, so you should plan for these capabilities to become competitive with human work within a fairly short timeline.
So it is genuinely a step forward and not just brute-forcing the issue. But unless they can actually find a viable way of bringing the cost down, I honestly am not convinced we are actually moving forward in this space in any practical terms.
How much would it cost to hire a human subject-matter expert to respond to the same prompt?
Quite literally mentioned in the things I explicitly quoted.
We already have two ways to solve this problem and we're going to use both of them.
The first is to start mass-producing nuclear power systems and reduce electricity cost to a fraction of a percent of what it is now. We already know how, and have already started doing this using small modular reactor technology. We need to do this anyway, for more important reasons than tech. It'll be refreshing to see AI drive the development of climate solutions through sheer power demand. We can't seem to get the ball rolling there by ourselves - well, now we have help, with all of the deepest pockets lining up to pitch in. Capitalist frenzy mode is super effective.
The second is to iterate the hardware and methodology to get more bang per watt of power consumed. Business as usual for the tech industry. The switch to optical computation is the biggest potential gain here, and with AI helping us design the new paradigm I see no reason it shouldn't happen in a fraction of the time it took to invent and iterate electronics to this level. We're hardly starting from scratch. What took a hundred years to do with electronics can take ten to do with optics. In the meantime, I'm sure there are plenty of software-level, method-based optimizations waiting for us to discover as we better understand how to design these systems. The tech is two years old; it's not even a toddler yet. There's plenty of room left in silicon before we need to hit the optics.
The data center that costs you $350,000 for a nine-minute computation today comes on a $350 card that fits in a single slot in your computer tomorrow. It'll take a half hour instead of nine minutes, but at least now everyone can run their own. That's what tech does. Nvidia remembers how much money they made selling GPUs to gamers. Once the profits in the big iron get stale they'll do that again with the AI tech.
The biggest stumbling block to both of these solutions has always been justifying the cost in a capitalist system that fundamentally couldn't care less about ethics or rules or the future - only the bottom line. If the demand for AI is there and it delivers the money, we're going to get the necessary investments, fast.
> The first is to start mass-producing nuclear power systems

Sorry, you lost me there already. I’m not against nuclear power in principle, but it doesn’t feel realistic to position it as a quick fix for huge energy costs, especially when we’re still struggling to roll out or upgrade nuclear for even basic power demands. Building or revamping any large-scale nuclear infrastructure takes years, decades even, because of strict regulatory requirements, community pushback, and the challenge of financing these projects. Cost overruns are common, and long construction timelines often clash with the urgent needs of a fast-evolving technology like AI. Some of those problems can be solved by throwing more money at them, but that doesn't make it better.
I’m not denying the potential value of nuclear in the long run, but hand-waving away these massive barriers makes it sound more like wishful thinking than a genuine plan. Not to mention that we still need to contend with the electricity needs of the rest of the world. Any extra consumption of green electricity (let's at least grant that nuclear is green) by AIs like these is power that doesn't go towards our other energy needs. Industries that still heavily rely on fossil fuels, where that electricity could also be used, are one example.
Adding to that, there’s also a difference between rhetorical enthusiasm, just throwing more power at AI, and the real-world steps we know can help reduce the energy burden: things like better hardware design, improved software-level efficiencies, and other optimizations. Some of these you sort of recognize, but you make it appear as if they will be here by tomorrow, which is also not the case. We are where we are with the current hardware after decades of development, not years. It’s clear, to me anyway, that solving AI’s growing energy demands isn’t as simple as flipping a switch or building a few reactors. And it would help if we acknowledged the complexity instead of glossing over it.
> Sorry, you lost me there already.

I see a world where every building has its own small modular reactor. About the size of a fridge, moderately higher power output than a room full of gas generators, but you only refuel it once every year, take it in for parts maintenance every ten, and it's always running. I expect to have my own someday so I don't have to rely on a shoddy power grid and I'd buy one tomorrow if they were on the market. I think I'd put it in its own shed rather than in the basement, though. The waste heat is a pain in the ass to deal with, though it does make for cheap desalination and heating.
The simple truth is we do that, or we have a civilization-level collapse - it's the only solution that's guaranteed to work. All other solutions to all other problems in all other domains (no exceptions) depend upon as much cheap energy as we can generate, and nuclear has absolutely no competition whatsoever when it comes to the energy density of the power systems. This is as close to an inevitable outcome as it gets.
It might be MSRs, gas-cooled reactors, or some fancy power system like Helion's fusion reactors that does away with turbines - it doesn't matter which horse wins. The only reason it hasn't happened yet is that people's idea of nuclear power is sixty years out of date with what's already running in the lab today, and they think reactor = bomb (which is total nonsense, thanks Hollywood).
The AI craze creates the perfect excuse for a bunch of tech moguls to rush generation-four fission and prototype fusion systems. Ten billion is overkill for this project; they could probably do it with one billion, and in less time than the Manhattan Project took. They'll make almost as much money selling the power systems as they do on the AI technology. Humanity's first multi-trillionaire is probably going to be the person who gets this tech to market. Sam Altman is already investing.
Well, you might see such a world, but it is still far from our current reality. If anything, I must commend your optimistic view of the future, though I do think it is shaped by tunnel vision, as it doesn't change anything about the current-day realities I have brought up multiple times and you keep effectively bypassing. Throwing more money at things that already receive substantial resources will not make them develop exponentially faster.
I think I have made the point clear enough, so will be disengaging from here.
Nuclear energy is competing against all other forms of energy production. It has some advantages, but so do competitors. If geothermal turns out to be big, there's no fuel, no waste, and it would be reliable, not depending on the weather. What advantages would nuclear power have?
Meanwhile, solar is extremely cheap, and batteries are getting better.
It's hard to pick winners in energy, but I'd bet on cheaper and on faster iteration, so that companies can learn how to bring costs down faster.
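One way to make "bet on cheaper and on faster iteration" concrete is an experience curve (Wright's law); the framing and the 20% learning rate, roughly solar PV's historical figure, are my additions, not the commenter's:

    import math

    def experience_curve_cost(first_unit_cost: float, cumulative_units: float,
                              learning_rate: float = 0.20) -> float:
        """Wright's law: each doubling of cumulative production cuts unit
        cost by learning_rate (0.20 = 20% cheaper per doubling)."""
        exponent = math.log2(1.0 - learning_rate)
        return first_unit_cost * cumulative_units ** exponent

    # Ten doublings of deployment at a 20% learning rate leaves unit cost
    # at 0.8**10, i.e. about 11% of where it started:
    print(experience_curve_cost(1.0, 2 ** 10))  # ~0.107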
https://arcprize.org/arc
> ARC-AGI is the only AI benchmark that measures our progress towards general intelligence.

Let's wait a few years and we will see whether this is true or not.
I like the idea behind this test; it's basically a simplified variation on part of a standard IQ test.
I don't like the implementation, or that they compare how people score on it vs. AI.
All the fuss over benchmarks and AGI vs ASI vs human thought does make me roll my eyes a bit. Hard to measure intelligence when we can't even write down a rigorous formal definition of what intelligence is and how it operates and why it works in the first place. We're dealing with fuzzy concepts that we still fundamentally do not understand. Somewhere along the line of AI research I think we're going to start to get some real insight into this problem, and I'm very curious to find out.
ARC is interesting because it's a trivial human task that AI can't comprehend.
Unless that changes, it's definitely indicative of lacking something fundamental to achieving whatever may be defined as "intelligence".
If we get AGI I hope in return for being an unemployable knowledge worker I at least get infinite medical solutions and infinite fusion energy.
Only a guess, but I think that’s an unlikely scenario. I don’t imagine AGI will be magic in the sense of “automatically solves R&D.” There will be bottlenecks and Amdahl’s law applies. In the case of drugs, doing medical trials will be a bottleneck, some subproblems of improving medical trials have political aspects to them, and AGI isn’t likely to magically solve politics.
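For concreteness, Amdahl's law in one line, with a purely hypothetical drug-development split where the "research" half gets 100x faster but trials do not:

    def amdahl_speedup(p: float, s: float) -> float:
        """Overall speedup when a fraction p of the work is accelerated
        by a factor s and the remaining (1 - p) is unchanged."""
        return 1.0 / ((1.0 - p) + p / s)

    # If AGI made the research half of drug development 100x faster but
    # trials and approval stayed the same, the whole pipeline gains ~2x:
    print(amdahl_speedup(p=0.5, s=100.0))  # ~1.98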
When Amdahl’s law doesn’t apply, things might get a little weird, though?