19 votes

Anthropic announces New Claude 3.5 Sonnet, Claude 3.5 Haiku and the Computer Use API

13 comments

  1. [11]
    delphi
    Link

    Today, we’re announcing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. The upgraded Claude 3.5 Sonnet delivers across-the-board improvements over its predecessor, with particularly significant gains in coding—an area where it already led the field. Claude 3.5 Haiku matches the performance of Claude 3 Opus, our prior largest model, on many evaluations for the same cost and similar speed to the previous generation of Haiku.

    I've yet to fully test its limits, but from a cursory trial, New 3.5S (someone call the branding police) is indeed way more capable when it comes to text, especially paraphrasing naturally and recreating text styles. It's also just more enjoyable to talk to. Can't see myself going down to the new 3.5H, cause 3.5S is simply cheap enough, but time will tell.

    We’re also introducing a groundbreaking new capability in public beta: computer use. Available today on the API, developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta. At this stage, it is still experimental—at times cumbersome and error-prone. We're releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time.

    Now THIS is a feature I can't wait to tinker with. It looks to me like giving Claude the ability to use a computer through the HID layer is way more in line with what Microsoft wanted the original Windows Copilot to be, and the demos are - even if they're very early - extremely impressive and promising.
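For anyone else itching to tinker: as I understand the beta docs, it works as a tool-use loop where the model emits actions (screenshot, click, type) and your code executes them and reports back. A minimal dispatch sketch in Python; the `computer_20241022` tool type and its field names are taken from the beta announcement and may well change, and the handlers here are stubs rather than a real HID layer:

```python
# Sketch of the computer-use loop's client side. The model sends tool_use
# blocks like {"action": "left_click", "coordinate": [x, y]}; you run them
# and return the result. Tool type/field names per the beta announcement.

computer_tool = {
    "type": "computer_20241022",   # beta tool identifier (may change)
    "name": "computer",
    "display_width_px": 1280,
    "display_height_px": 800,
}

def handle_action(block: dict) -> str:
    """Dispatch one model-issued action to a (stubbed) HID layer."""
    action = block["action"]
    if action == "screenshot":
        return "<base64 PNG of the current screen>"  # stub: would capture display
    if action == "left_click":
        x, y = block["coordinate"]
        return f"clicked at ({x}, {y})"
    if action == "type":
        return f"typed {block['text']!r}"
    return f"unsupported action: {action}"

print(handle_action({"action": "left_click", "coordinate": [200, 120]}))
# -> clicked at (200, 120)
```

In the real loop you'd pass `computer_tool` in the API request (with the computer-use beta header) and feed each handler result back to the model as a tool result.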

    9 votes
    1. PetitPrince
      Link Parent

      the ability to use a computer through the HID layer

      I am wary of a future where there's no proper API and automation happens on the HID layer (why design a good API when GPT13 can use the same interface as our users?). This seems like a tremendous waste of resources (even though automating legacy software with no proper API seems appealing).

      8 votes
    2. [9]
      infpossibilityspace
      Link Parent

      The computer use demo seems needlessly wasteful for the sake of looking flashy to me?

      Why does it need to scroll through a document to look for information when it can use an API to search for the string in a fraction of the time? Or slowly type one character at a time rather than submitting a block of text into the box, again using an API?

      Making computers act like humans is like trying to make a car out of train parts - you can do it, but it'll be faster if you put it on some tracks instead...
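A toy illustration of that gap (hypothetical numbers, and assuming the HID route pages through the document one 40-line screen at a time):

```python
# Direct search vs. "scrolling": both find the target, but one is a single
# pass over the text while the other pages through rendered screenfuls.

document = "\n".join(f"line {i}" for i in range(10_000)) + "\nNEEDLE"

# The API way: one string search, no rendering, no cursor movement.
offset = document.find("NEEDLE")

# The HID way: page through 40-line "screens" until the target is visible.
lines = document.splitlines()
pages_scrolled = 0
for start in range(0, len(lines), 40):
    pages_scrolled += 1
    if "NEEDLE" in lines[start:start + 40]:
        break

print(offset >= 0, pages_scrolled)  # the HID route took 251 screenfuls
```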

      4 votes
      1. Grumble4681
        Link Parent

        Seems like there's a time and a place for HID automation solutions, and perhaps it's planning to slot into that? It doesn't seem substantially different from a bunch of the macro programs that came before it, or from what people use AutoHotkey for, except that with an LLM attached it might be able to go places those could not.

        Microsoft's PowerAutomate is/was extremely powerful for HID automation (I haven't used it in a while because at some point they seemingly put it behind another gate that makes it inaccessible to me). You could set it to look for certain pictures on the screen, or really sets of pixels, and it would follow commands based on recognizing them.

        To me it seems like they're basically trying to cross the barrier between developer and power user: an API is a developer layer, and whatever a power user wants to automate is unlikely to have an API, because ordinarily nobody would have had a use for one.

        Rather than equating this to making a car out of train parts, I see it as more like making autonomous cars work on roads and systems built for humans. From one perspective it's rather asinine that a car is trying to interpret visual elements meant for humans when humans could build the equivalent of an API for autonomous cars to use on the roads, except I'd say it's fairly understandable why that doesn't happen in the short term, for various reasons. You work with what is available now, not with what is on your wishlist for things you have no control over.
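The "look for certain pictures on the screen, or sets of pixels" trigger described above boils down to template matching. A dependency-free toy version (real tools capture the actual screen and tolerate slight color drift; here both "screen" and "template" are small grids of ints):

```python
# Toy pixel-template search, the core of image-based HID automation triggers.

def find_template(screen, template):
    """Return (row, col) of the template's top-left corner, or None."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return (r, c)
    return None

screen = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
]
print(find_template(screen, [[1, 2], [3, 4]]))  # -> (1, 1)
```

A real tool would then move the cursor to that location and click - the "follow commands based on recognition" step.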

        8 votes
      2. [3]
        creesch
        Link Parent

        Why does it need to scroll through a document to look for information when it can use an API to search for the string in a fraction of the time?

        You are thinking too much like a capable engineer at a company that has its data in order. This is supposed to sell the product to managers at companies that have none of that, and who don't realize their dev teams are "slow to deliver" because the basics aren't in order.

        It is the logical extension of "LLMs will magically clean up all your data", which sounds incredible until you think about how LLMs are actually trained: by an army of humans carefully sifting through all the input, making sure no bullshit goes in.

        Something I wrote about last month as well.

        5 votes
        1. [2]
          infpossibilityspace
          Link Parent

          a company who has their data in order. This is supposed to sell the product to managers in companies that have none of that

          I agree. In my experience, one of the biggest problems is the lack of written documentation, because it's all in the heads of people who haven't had time to write it down. But then I still don't really see how this helps?

          Maybe if it sat in on all of your meetings and was able to message people after the fact when you had a question it wasn't able to answer? Perhaps that is the end goal, but we're still a ways from that I think...

          1 vote
          1. creesch
            Link Parent

            but then I still don't really see how this helps?

            It doesn't; I'm just saying that it appeals to people at companies that don't have their basic processes in order, as probably yet another magic "fix" so they can ignore the actual issues at hand.

      3. [4]
        Greg
        Link Parent

        Does Windows desktop software commonly come with an API that can be accessed by other tools on the system? That was my first thought on who this is for: small to medium B2B customers who want to automate whatever inefficient workflows they’ve built up around internal processes and/or task-specific proprietary tools.

        Could well be that there is some common automation standard there they could be hooking into that I’m not aware of - I don’t use or develop for Windows day to day - but my impression was that desktop applications tend to be tightly coupled to their UI toolkits as far as any external app is concerned.

        4 votes
        1. [3]
          infpossibilityspace
          Link Parent

          The example in the video is of purely web-based apps like Excel, which should be easy. Office 365 has numerous APIs, not to mention Excel libraries for programming languages, so interfacing with Excel should be trivial.

          You can also use tools like Curl to GET and POST web data, or Everything/find+grep to search across your entire computer.

          If their example was of navigating some 15 year old offline legacy app, I'd be a lot more impressed. Right now it's just using OCR to do things like a human would, not like how a computer can.
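For concreteness, the kind of direct search being suggested, as a self-contained shell sketch (the paths and URL are throwaway examples, not anything from the demo):

```shell
# Exact-string search across files: one command, no scrolling, no OCR.
mkdir -p /tmp/demo_docs
printf 'Q1 summary\nACME invoice #1234\nclosing notes\n' > /tmp/demo_docs/report.txt
grep -rn "invoice" /tmp/demo_docs

# Fetching structured data directly (placeholder URL, not run here):
# curl -s "https://api.example.com/search?q=invoice"
```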

          2 votes
          1. [2]
            Greg
            Link Parent

            I think that’s a failing of the demo video, not the tooling. I noticed that the “CRM” is running on localhost:3000, for example, so I’d make a heavy bet that this is just a developer testing things/setting up a demo in the way that gives them the most visibility and flexibility. Claude is now the lead on the OSWorld benchmark by a significant margin, and that’s by no means limited to browser applications.

            Navigating like a human rather than in the theoretically more sensible machine readable ways makes it agnostic to whether there’s an API or not - there’d be no real way to move from web to desktop software if they hadn’t had it drive the mouse and keyboard. I think @Grumble4681’s self driving car analogy was spot on there: if it can handle navigating in the same way as humans, it doesn’t rely on updating the extremely long tail of places (or programs) that aren’t designed to be navigable by machines.

            Now, you could say that a zillion dollar company like Anthropic could have communicated that better in the video, and it’d be a fair comment, but I’ve also been the dev in the past who was told by marketing that they needed a screen recording by this afternoon and “just use whatever you’ve got set up”. I expect we’ll see examples of it navigating desktop software in the immediate future, assuming they aren’t out there already.

            2 votes
            1. Greg
              Link Parent

              In fact, I’ve just been recommended this video that shows it choosing which program to open, making its own decision to use a combination of visual OCR and browser devtools for a web based task, and controlling LibreOffice and XPaint for other tasks: https://youtu.be/DVRg0daTads

  2. [2]
    mantrid
    Link

    Computer Use sounds like what Rabbit R1's "large action model" was supposed to be.

    5 votes