92% of GPT-5.5’s coding and reasoning performance, reportedly at 15–20x lower inference cost. And the latency? Sub-200ms for most queries. -@kimmonismus
Uh, how about trying a little harder to find a good link to post? This is just a random social media comment with no source at all.
Edit: did a few searches and didn’t find anything.
Trying out the new digg (di.gg), thought it was pretty cool. The site scrapes X and compiles the data into news posts - specific to AI at this time.
So it is taking what are already rumours and tertiary sources and repackaging them, further obscuring any way to actually fact-check the claims.
To be frank, this is nothing more than a tweet by someone who apparently runs an AI company of sorts. This person seems to have no ties whatsoever to Google. In their tweet they already call it a rumor.
I see zero value in an AI rehash of a social media platform that is already questionable at best as a source of truth.
True, and a solid observation. However, there has been a trend that when a new AI model comes out, it's pretty much a strong leap better than the last. So if they saw some data that showed better speed - even if exaggerated - it was likely still faster.
I haven't noticed that trend myself. These releases increasingly feel like marginal improvements. It's hard to verify either way when the benchmarks are apparently both heavily gamed and prone to failing to properly evaluate tasks even when played straight.
Subjectively, the newest models still struggle on my personal benchmark tasks. And I feel less and less need to stay on top of the latest developments to get the most out of things. The only big headlines I see anymore are about efficiency. Which is fantastic, but they don't come with corresponding performance leaps. That all smells logarithmic to me.
Then there is no reason to post about it, as it is basically a given anyway. Sorry to say, but this is 100% AI slop, and I think we can do without that kind of drivel on Tildes. I realize this is extremely direct, but you yourself seem to be saying there is actually nothing new or of value in this, so why then post it?
If you want to actually talk about generational improvements in AI, then I'd invite you to make a text post about it: actually write down your own thoughts and speculations rather than posting the tertiary source of a tertiary source, slightly rehashed by an LLM.
Yeah, using aggregators is fine, but try to get back to wherever they got the data from and see if there's anything to them.
This could be the nail in the coffin for ChatGPT (IMO).
Isn’t Claude also outperforming it in programming?
Yes, it’s much better. I basically haven’t used ChatGPT since getting Claude.
Same, I either use Claude or Gemini Flash ("free")
I think so, and if I recall it's since they jumped onboard with xAI servers.
It was. There’s been some back and forth with Claude doing very well but then sometime in the last month GPT got better again (maybe). Or rather Claude got worse.
With the suspicion being that it’s because GPT can afford enough compute while Anthropic is capping out with the higher demand.
These AI wars do my head in because of how volatile everything is moment to moment
Yeah it's nuts and it makes everything uncertain.
Is this data from a closed beta or something? I don't see any 3.2 models listed among the available models.
It's a "leak" from a future release. So, not sure how reliable that is, but Google has not been playing around on their releases lately - nor do they bluff.
but Google has not been playing around on their releases lately

My personal experiences with 3.1 Pro beg to differ... I have to enable Canvas to get anything usable, and even then, I can get 3 or 4 queries max before it starts losing track and context of what I'm trying to do. Flash and Thinking have been fine for looking up information, generating practice tests and study mnemonics, and image generation for my D&D character. But specific to coding, Gemini has left a lot to be desired, for me.

nor do they bluff.

I mean.. I guess that depends on your definition of "bluff", but at a minimum, I can think of a couple instances where they over-promised and under-delivered (cough Pixel Pass cough).

Don't get me wrong, I'm a generally content Google user, but until 3.2 is released and benchmarked by people whose income does not depend on how good 3.2 is, I'll not hold my breath, and I'll continue to use Claude anytime I need assistance with coding projects.
Even if it's self-serving, I'm glad that AI companies are trying to focus on efficiency gains versus just flat performance gains. I know it's mostly because we're starting to plateau on pure performance, but if they're going to insist on burning through billions of gallons of water and gigawatts of power, efficiency gains like this make a massive difference.
Would agree. Those tokens equal energy somewhere. The more we can accomplish for less, the better for everyone.
I don't see much evidence that performance is plateauing, except in the sense that for a lot of simpler questions, the answers we get now are good enough and hard to improve much on. But you can ask harder questions.
They're working on both efficiency and performance gains and they go together. For example, cheaper tokens means you can spend more of them to get better results.
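To put rough numbers on that trade-off, here is a purely illustrative sketch: it borrows the "15x lower inference cost" figure from the quoted tweet, and the dollar prices are hypothetical, not any provider's real rates.

```python
# Illustrative arithmetic only: the 15x cost ratio comes from the quoted
# tweet, and the per-token prices below are made up for the example.
old_price_per_mtok = 10.00                    # $ per million tokens (hypothetical)
new_price_per_mtok = old_price_per_mtok / 15  # ~15x cheaper inference

budget = 100.00  # a fixed monthly spend in $

# Token budget at each price point.
old_tokens = budget / old_price_per_mtok * 1_000_000
new_tokens = budget / new_price_per_mtok * 1_000_000

print(f"old: {old_tokens:,.0f} tokens, new: {new_tokens:,.0f} tokens")
# The same budget buys 15x the tokens, which can go toward longer
# reasoning chains, more retries, or more context per task.
```

That is the sense in which efficiency and performance "go together": a fixed budget buys proportionally more tokens, and spending more tokens per task (longer reasoning, more attempts) is one of the main levers for better results.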
I'm not very tuned in, but I thought the general consensus was that AI training was taking more processing power for smaller gains each generation. I'm totally open to that being false, though; I don't know that much about it.
How do I truthfully learn which AI is best? Or which AI is going to be best in the future? I cannot keep track of all the AIs as they change. Myself, I do not know how to use the AI. However, I have asked my niece about ChatGPT, and she said she buys it for computer programming (a subscription). Then she switched to Claude for computer programming. If we are to buy a subscription for AI, how do we learn the most correct choice?
I read on Tildes about Mythos' great hacking skills. However, I do not need to hack, because that is illegal. I assume that the AI developers are trying to create the AIs that are best at defending against hacks. However, maybe that is not the most useful thing for a regular person such as my niece. So I am asking whether benchmarks for AI speed are useful to regular people, or only to advanced technology users defending against hacks?
All the paid models are close enough to one another that there's not really a wrong choice. That's especially true since you're (presumably) going to be using it for everyday tasks and not something like coding or mathematical reasoning.
Personally I would recommend you follow in your niece's footsteps and get Claude. But any of the paid models will be just fine for your purposes.
To answer your question about benchmarks - not really useful to regular people. Not directly, anyway. It matters because it has follow-on effects, but for the purposes of this discussion it doesn't matter.
I just want to add: if you're going to buy a subscription for everyday use, get it from an actual model provider, not any of the countless third parties that are basically resellers with questionable harness improvements. So that means Anthropic, ChatGPT, or Google. As R3qn65 said, any of those will be fine for basic use. Probably ChatGPT, as it's more multi-modal (better at generating responses in more formats; Anthropic models can't generate images, for example).
From my very basic understanding of that, it's a speed test for algorithms, logic questions, etc.