16 votes

Gemini 3.2 Flash rumored to hit 92% of GPT-5.5 performance at lower cost

26 comments

  1. [7]
    skybrian
    (edited )
    Link
    Uh, how about trying a little harder to find a good link to post? This is just a random social media comment with no source at all.

    Edit: did a few searches and didn’t find anything.

    28 votes
    1. [6]
      TylerSuits
      Link Parent
      Trying out the new digg (di.gg), thought it was pretty cool. So the site scrapes X and compiles the data to make news posts - specific to AI at this time.

      3 votes
      1. [4]
        creesch
        Link Parent
        • Exemplary

        So it is taking what are already rumours and tertiary sources and repackaging them, further obscuring any way to actually fact-check the claims.

        To be frank, this is nothing more than a tweet by someone who apparently runs an AI company of sorts. This person seems to have no ties whatsoever with Google. In their tweet they already call it a rumor.

        I see zero value in an AI rehash of a social media platform that is already questionable at best as a source of truth.

        31 votes
        1. [3]
          TylerSuits
          Link Parent
          True, and a solid observation. However, there has been a trend that when a new AI comes out, it's pretty much a strong leap better than the last. So, if they saw some data that showed better speed - even if exaggerated - it was likely still faster.

          1. Diff
            Link Parent
            I haven't noticed that trend myself. These releases feel like more and more marginal improvements. It's hard to verify either way when the benchmarks are both apparently heavily gamed and often fail to properly evaluate tasks even when played straight.

            Subjectively, the newest models still struggle on my personal benchmark tasks. And I feel less and less need to stay on top of the latest developments to get the most out of things. The only big headliners I see anymore are efficiency gains. Which is fantastic, but they don't come with corresponding performance leaps. That all smells logarithmic to me.

            2 votes
          2. creesch
            Link Parent
            Then there is no reason to post about it, as it is basically a given anyway. Sorry to say, but this is 100% AI slop, and I think we can do without that nonsense on Tildes. I realize this is extremely direct, but you yourself seem to be saying there is actually nothing new or of value in this, so why post it?

            If you want to actually talk about generational improvements in AI, then I'd invite you to make a text post about it: actually write down your own thoughts and speculations rather than posting the tertiary source of a tertiary source slightly rehashed by an LLM.

            2 votes
      2. skybrian
        Link Parent
        Yeah, using aggregators is fine, but try to get back to wherever they got the data from and see if there's anything to them.

        5 votes
  2. [8]
    TylerSuits
    Link
    92% of GPT-5.5’s coding and reasoning performance, reportedly at 15–20x lower inference cost. And the latency? Sub-200ms for most queries. -@kimmonismus

    This could be the nail in the coffin for ChatGPT (IMO).

    4 votes
    1. [7]
      cloud_loud
      Link Parent
      Isn’t Claude also outperforming it in programming?

      2 votes
      1. [2]
        ali
        Link Parent
        Yes, it's much better. I basically haven't used ChatGPT since getting Claude.

        3 votes
        1. TylerSuits
          Link Parent
          Same, I either use Claude or Gemini Flash ("free")

      2. TylerSuits
        Link Parent
        I think so, and if I recall correctly, it's since they jumped on board with xAI servers.

      3. [3]
        Eji1700
        Link Parent
        It was. There's been some back and forth, with Claude doing very well, but then sometime in the last month GPT got better again (maybe). Or rather, Claude got worse.

        The suspicion is that it's because GPT can afford enough compute while Anthropic is capping out under the higher demand.

        1. [2]
          cloud_loud
          Link Parent
          These AI wars do my head in because of how volatile everything is moment to moment.

          1 vote
          1. Eji1700
            Link Parent
            Yeah, it's nuts, and it makes everything uncertain.

  3. [3]
    goose
    Link
    Is this data from a closed beta or something? I don't see any 3.2 models listed on the available models.

    4 votes
    1. [2]
      TylerSuits
      Link Parent
      It's a "leak" from a future release. So, not sure how reliable that is, but Google has not been playing around with their releases lately - nor do they bluff.

      2 votes
      1. goose
        Link Parent
        but Google has not been playing around on their releases lately

        My personal experiences with 3.1 Pro beg to differ... I have to enable Canvas to get anything usable, and even then, I get 3 or 4 queries max before it starts losing track of the context of what I'm trying to do. Flash and Thinking have been fine for looking up information, generating practice tests and study mnemonics, and image generation for my D&D character. But specific to coding, Gemini has left a lot to be desired, for me.

        nor do they bluff.

        I mean... I guess that depends on your definition of "bluff", but at a minimum, I can think of a couple of instances where they overpromised and underdelivered (cough Pixel Pass cough).

        Don't get me wrong, I'm a generally content Google user, but until 3.2 is released and benchmarked by people whose income does not depend on how good 3.2 is, I'll not hold my breath and continue to use Claude anytime I need assistance with coding projects.

        3 votes
  4. [4]
    OBLIVIATER
    Link
    Even if it's self-serving, I'm glad that AI companies are trying to focus on efficiency gains versus just flat performance gains. I know it's mostly because we're starting to plateau on pure performance, but if they're going to insist on burning through billions of gallons of water and gigawatts of power, efficiency gains like this make a massive difference.

    3 votes
    1. TylerSuits
      Link Parent
      Would agree. Those tokens equal energy somewhere. The more we can accomplish for less, the better for everyone.

      1 vote
    2. [2]
      skybrian
      Link Parent
      I don't see much evidence that performance is plateauing, except in the sense that for a lot of simpler questions, the answers we get now are good enough and hard to improve much on. But you can ask harder questions.

      They're working on both efficiency and performance gains and they go together. For example, cheaper tokens means you can spend more of them to get better results.

      1. OBLIVIATER
        Link Parent
        I'm not very tuned in, but I thought the general consensus was that AI training was taking more processing power for less gain each generation. I'm totally open to that being false, though; I don't know that much about it.

        3 votes
  5. [4]
    turnipostrophe
    Link
    How do I truthfully learn which AI is best? Or which AI is going to be best in the future? I cannot keep track of all the AIs as they change. Myself, I do not know how to use the AI. However, I have asked my niece about Chat, and she said she buys it for computer programming (subscription). Then she switched to Claude for computer programming. If we are to buy a subscription for AI, how do we learn the most correct choice?

    I read on Tildes about Mythos' great hacking skills. However, I do not need to hack, because that is illegal. I assume that the AI developers are trying to create the AIs that are best at defending against hacks. However, maybe that is not the most useful thing for a regular person such as my niece. So I am asking whether benchmarks for AI speed are useful to regular people, or only to advanced technology users defending against hacks?

    2 votes
    1. [2]
      R3qn65
      Link Parent
      All the paid models are close enough to one another that there's not really a wrong choice. That's especially true since you're (presumably) going to be using it for everyday tasks and not something like coding or mathematical reasoning.

      Personally I would recommend you follow in your niece's footsteps and get Claude. But any of the paid models will be just fine for your purposes.

      To answer your question about benchmarks - not really useful to regular people. Not directly, anyway. It matters because it has follow-on effects, but for the purposes of this discussion it doesn't matter.

      3 votes
      1. post_below
        Link Parent
        I just want to add: if you're going to buy a subscription for everyday use, get it from an actual model provider, not any of the countless third parties that are basically resellers with questionable harness improvements. So that means Anthropic, OpenAI (ChatGPT), or Google. As R3qn65 said, any of those will be fine for basic use. Probably ChatGPT, as it's more multi-modal (better at generating responses in more formats; Anthropic models can't generate images, for example).

    2. TylerSuits
      Link Parent
      From my very basic understanding, it's a speed test for algorithms, logic questions, etc.