30 votes

GPT 5 released

24 comments

  1. [11]
    teaearlgraycold
    Link
    People have been roasting OpenAI for this release quite a bit on Hacker News. There are a number of deceptive charts that are now highlighted on this new site https://www.vibechart.net/.

    Besides that has anyone used GPT-5? What do you think?

    28 votes
    1. [8]
      tibpoe
      Link Parent
      I haven't used it, but in general I like to look at https://lmarena.ai/leaderboard whenever a new model comes out. It's a blind head-to-head test of LLMs.

      In this case, gpt-5 looks like at most a tiny improvement over existing models, and possibly just noise. It's good that OpenAI finally has a coding model, but it doesn't look like a significant enough improvement to bother trying out.

      What I am excited about is their gpt-oss-120b from the other day. It's pretty good: not top tier, but fine for many queries. It's incredible how fast it is, especially if I use the Cerebras-hosted version, which runs at almost 3k tokens/sec. It makes the whole process of iterating on prompts much faster and easier, and it feels like it has some kind of killer application that I just have not figured out yet.

      16 votes
      1. teaearlgraycold
        Link Parent
        I believe their OSS models only have 4B parameters active at any given time, which explains the speed.

        4 votes
      2. [6]
        d32
        Link Parent
        Minor, but: GPT-4.1 was already a quite competent coding model. It was used as the default for GitHub Copilot, for example.

        3 votes
        1. [5]
          zestier
          (edited )
          Link Parent
          Did you manage to have success with that? I tried it out pretty heavily because it was the Copilot default, and it was, quite frankly, terrible. It had a strong tendency to do absolutely nothing of value, especially compared to Claude. The poor performance of 4.1 in particular made me think for quite a long time that coding agents were far worse than they actually are. Because it was the free default, it was the model I was actually using, and that's what I based my judgment on, so I was confused about how anyone was getting AI to produce anything at all. It took trying Claude to realize it was a model problem, not AI agents being universally bad.

          I suppose it was fine in auto complete mode, but it's an effectively worthless agent in my opinion.

          3 votes
          1. [3]
            teaearlgraycold
            Link Parent
            Different model, but I was shocked the other day when I saw the newest Claude Sonnet call useRef instead of useState in a React component. It was destructuring an array on the left side of the assignment, so it wasn’t just a style choice (useRef can be a perfectly fine choice for component state). For those not familiar with React, this pattern is probably one of the most common things in the code training data. To have a billion-dollar statistical model fail in this way boggles the mind.
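            For readers outside React: useState returns an array pair, while useRef returns an object with a .current property, so array-destructuring a ref fails outright. A minimal sketch with hypothetical stand-ins (not real React, just the return shapes that make the destructuring wrong):

            ```javascript
            // Hypothetical stand-ins mimicking the return shapes of the two hooks.
            function fakeUseState(initial) {
              let value = initial;
              const setValue = (next) => { value = next; };
              return [value, setValue]; // array pair: destructuring works
            }

            function fakeUseRef(initial) {
              return { current: initial }; // plain object: not iterable
            }

            const [count, setCount] = fakeUseState(0); // fine
            console.log(count); // 0

            let failed = false;
            try {
              const [bad] = fakeUseRef(0); // TypeError: not iterable
            } catch (e) {
              failed = true;
            }
            console.log(failed); // true
            ```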

            I think the companies are currently cutting corners like crazy to reduce spending. Worse, they’re probably doing this in A/B tests, so not everyone’s going to get the same bad experience.

            7 votes
            1. [2]
              zestier
              (edited )
              Link Parent
              It definitely isn't helped by the design of React hooks being confusing enough that they're probably frequently misused in the training data. I really, really wanted to like hooks, but holy cow do they confuse people.

              On AI, yeah, there's still a lot to be desired. The biggest difference I encountered between Claude and GPT-4.1 was that Claude would at least try to produce a complete solution to a problem. It might have issues, but it at least tried. GPT, especially if the agent was trying to solve a problem across multiple files, would act as if stubs were sufficient. It would just emit stuff like

              function solveProblem() {
                 // TODO: solve the problem
              }
              

              as the final result. I could mitigate that a little with Roo by telling it to keep iterating until all todos were complete and all tests passed, but that was still significantly less effective than just a single basic prompt to Claude.

              4 votes
              1. teaearlgraycold
                Link Parent
                React Hooks are a cache invalidation nightmare. They need some constraints and a static analyzer (linters can help a bit) to fix that.
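                The cache invalidation framing can be made concrete: a hook's dependency array acts as a cache key, and a missing dependency silently serves a stale value. A toy sketch (a hypothetical helper, not React's actual implementation):

                ```javascript
                // Toy version of dependency-array caching: the cached value
                // is only as fresh as the deps array is honest.
                function makeMemo() {
                  let lastDeps = null;
                  let lastValue;
                  return (compute, deps) => {
                    const unchanged = lastDeps !== null &&
                      deps.length === lastDeps.length &&
                      deps.every((d, i) => Object.is(d, lastDeps[i]));
                    if (!unchanged) {
                      lastValue = compute();
                      lastDeps = deps;
                    }
                    return lastValue;
                  };
                }

                const memo = makeMemo();
                let name = "Ada";
                const greet = () => `Hello, ${name}`;

                const first = memo(greet, []);     // computes: "Hello, Ada"
                name = "Grace";
                const stale = memo(greet, []);     // deps look unchanged: stale "Hello, Ada"
                const fresh = memo(greet, [name]); // deps differ: recomputes "Hello, Grace"
                ```

                This is exactly the class of bug that lint rules for hook dependencies try to catch statically.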

                2 votes
          2. d32
            Link Parent
            Did you manage to have success with that?

            Yes! And I was amazed at how good it was, with one caveat: I used it exclusively as a "coder" (a code-generating model), always paired with some stronger model, such as Sonnet 4 or sometimes GPT-4.1, to actually come up with the implementation plan. Once the plan was solid, 4.1 was able to take it and implement any needed changes, quickly and precisely.

            Additional details: this was in Roo Code; the stronger model ran in "Architect" mode while 4.1 was used in "Code" mode. Mostly Python, but also some TypeScript and Bash.

            1 vote
    2. lackofaname
      Link Parent
      Yeesh, are those charts real, actually published data visualizations? I'm hard-pressed to call them deceptive; they're just kind of embarrassing.

      8 votes
    3. ssk
      Link Parent
      Indeed! It was able to do the first 80% of converting my iOS puzzle game app into a Flutter app, to the point where Claude Code can complete the rest.

      3 votes
  2. [8]
    Prodiggles
    Link
    So, I was for the longest time in the camp of staying just on the edge of AI code agent tools. Honestly, I would have been better off diving in sooner, because I've started to discover a ton of value on the learning side of things.

    Generally, I don't enjoy learning outside my knowledge areas unless it feels applicable. For example, trivia feels useless to me; Jeopardy is fun for some, but not for me.

    However, one thing I tried and just couldn't get into was learning game development. There's just a ton of reading involved in figuring out the specific interfaces, patterns, and quirks of engines and their languages.

    In less than a couple of days, I put together a fairly decent proof of concept incorporating a mix of run-and-gun and roguelike levels. At first I'd make progress, but then I'd hit a wall and just kind of start over. I went through almost all of the coding agents and learned a ton as I remade it from scratch each time, letting the agent take the wheel and improving my prompts with every try.

    At one point, I stopped over-babying it and let it take the wheel, just checking in and validating its work. While an agent was working, I was building out a spec for the next feature or task to send its way. While it worked on boilerplate patterns with some ambiguous designs, I worked on improving its context opportunities (MCP, integrations with other tools, etc.) so every query would produce better results, which I'd then have it write tests for.

    It came up with a lot of really cool ideas I wouldn't have, and it gave me headspace to work on the harder problems while it toiled away.

    I've switched from Unity to Godot now, and on my fifth go at the project I don't need to spend time learning the API. Instead, I can focus on the concepts, study what it puts together, and apply patterns I would have been too bored to dig out of a complete project.

    That's just my journey so far, but GPT-5 already impressed me in the first couple of queries. A lot of providers are giving unlimited tokens for an adoption period, and the cool feature of it determining the appropriate model for parts of my query will definitely save on cost over time.

    15 votes
    1. [5]
      Minori
      Link Parent
      When you say you hit a wall and start over, what's making you restart?

      It sounds expensive running agents for that much.

      4 votes
      1. [4]
        Prodiggles
        Link Parent
        I think at this point I'm not fully invested in a single tool that can cater to my dev needs, and I'm not focused on completing a project so much as learning how to build an effective workflow system. I'm really interested in how different editors and IDEs work with their AI agents on the same problem.

        My eventual goal is to find suitable environments and configurations and catalog the best candidates for an AI agent orchestration project. I've been continually improving my prompts and the types of documentation I regularly have it generate, distilling things down, then finding out and ranking how each tool does on the same problem.

        While the cost seems high now, I'm learning a ton, and there's really a time factor here: even on my gaming rig, running Ollama with just a 4B- or 8B-parameter model takes 3-4x longer and produces a lower-quality result. I'd have to get three more GPUs, and it would still be worse than using a free or $10 subscription. The economics play too well for the big players, and they're all competing for our attention and dollars.

        5 votes
        1. [3]
          d32
          Link Parent
          Would you mind sharing some combinations of tools you've used?
          I guess everyone knows about VS Code + Cline or its forks, but, for example, how did you leverage agentic coding together with Godot? It has its own IDE, doesn't it? What goes with it? Command-line agents?

          1 vote
          1. [2]
            zestier
            (edited )
            Link Parent
            Godot also has VS Code extensions. It's entirely possible to use the Godot application as just a visual editor and do the scripting in an IDE that supports AI. For example, all you need to do to use it with Copilot in VS Code is have both the Copilot and Godot extensions installed at the same time.

            3 votes
            1. Prodiggles
              Link Parent
              Basically this, but you can go one step further and use the Godot MCP server, which gives hooks into different operations that are slightly more native for VS Code Copilot, Cursor, or Rider to call into. It can provide information such as runtime errors or node properties and hierarchy a bit more reliably than trying to export them via a .gd script.

              1 vote
    2. [2]
      chundissimo
      Link Parent
      Did you find any flows or tips for working with nodes in Godot with AI tools? I tried briefly, and it felt a little awkward to use an LLM for code generation but still have to figure out the proper node structure manually. I guess you can ask it for structure tips, but I didn't find the results of that super compelling.

      1 vote
      1. Prodiggles
        (edited )
        Link Parent
        The biggest tip I can give is to spend time with VS Code Copilot and figure out how to use MCP servers. There's a Godot one, Context7 is great for standards, GitHub MCP for finding similar projects, etc.

        What's MCP? Basically magic, but sorta. Most of the productive sessions I've had involved filling it up with as much knowledge as possible, having it break things down into smaller steps (or patterns), then building on top of that. I wouldn't spend a ton of time figuring out the node structure: ask Copilot what tools it would find valuable, search online for MCP servers, and generally give it as much structure as possible (in markdown documents), and it comes super duper close most of the time.

        Also, Claude is generally better at figuring out meaning a lot of the time, while GPT is more explicit in what it does, so try playing with different tools on the same problem. There's a Copilot free trial; use that, and I think you get 300 premium requests or something to that effect. At this point, I can give it a problem statement and execution steps, have it break them down into smaller segments, then tell it to run with the plan, step away for 15 minutes, and come back to some output.

        Having it write tests for expected behavior is another way to have it enrich its own context, and those tests serve as documentation as well.

        2 votes
  3. [2]
    entitled-entilde
    Link
    Playing with it a bit, the setup they provide in chat mode seems much more dynamic. Before, you’d hit it with a question and, after reasoning, get a response. Now I think it knows, to a certain extent, how hard the problem will be, so it might respond right away, or it might start searching the internet. That’s the biggest change I see so far. Overall I’m happy with the improvement.

    10 votes
    1. EgoEimi
      Link Parent
      And when it does try to think harder, the interface offers the option to shorten the thinking for a more immediate, albeit less thought-out, response.

      I noticed a modest improvement in accuracy and tone. It uses less unnecessary language and is more succinct. There's definitely a lot less glazing.

      A groundbreaking version? Not really. But it's still a solid iteration over the 4.x models.

      2 votes
  4. tauon
    (edited )
    Link
    Platformer on the model (models?):

    GPT-5 arrives more than two years after GPT-4 — and in some ways, a completely different world. (Two of the people in the photo for a New York Times story about the launch later left OpenAI and now run their own AI companies with multibillion-dollar valuations.) OpenAI has consistently released notable new models in the time since, but has reserved the "5" designation in hopes of hitting a significant new milestone on the road to artificial intelligence.

    […]

    For [ChatGPT free tier users], I imagine GPT-5 will feel a bit like buying a new iPhone after a few years. On one hand, it's meaningfully improved across lots of dimensions. On the other, it's still just an iPhone.

    Yeah, I’m not sure we’re getting to AGI on the current iteration of model architectures. PhD levels of smart? Sure. But that’s not nearly the same as a self-improving, all-capable learning agent… But we’re definitely in interesting times with everyone racing to try!

    8 votes
  5. Oodelally
    Link
    I don't know when or how this changed, but these are the biggest changes I've noticed since the last time I used ChatGPT:

    1 - It has a larger "context window". It can remember a lot more of a conversation, and it can even pull from previous conversations now and continue to discuss things it's discussed in the past.

    2 - The freely available release can pull information from the internet while in a conversation, so information is much more up-to-date.

    2 votes
  6. dlay
    Link
    I just wish this would stop. I can’t wait for OpenAI and the generative “AI” industry to implode.

    6 votes