17 votes

Megathread #8 for news/updates/discussion of AI chatbots and image generators

The hype seems to be dying down a bit? But I still find things to post. Here is the previous thread.

22 comments

  1. [5]
    streblo
    Link
    Google (researcher): "We Have No Moat, And Neither Does OpenAI" An allegedly leaked internal document from a Google researcher talking about how open source models are eating their (and OpenAIs)...

    Google (researcher): "We Have No Moat, And Neither Does OpenAI"

    An allegedly leaked internal document from a Google researcher talking about how open source models are eating their (and OpenAIs) lunch. And because this is all being done on top of the ‘leaked’ LLaMA, Meta stands to benefit the most.

    6 votes
    1. [2]
      Wes
      Link Parent
      I'm curious if Facebook will ultimately relicense LLaMA to make sure their platform is the universal one. They must be aware that the OSS competitors are catching up quickly. They also have a...

      I'm curious if Facebook will ultimately relicense LLaMA to make sure their platform is the universal one. They must be aware that the OSS competitors are catching up quickly. They also have a history of relicensing under public pressure, as happened with React.

      4 votes
      1. Greg
        Link Parent
        They relicensed EnCodec the other day, which is a dependency for a decent number of ML audio projects, so there's definitely recent precedent too. That's led to bark switching to an MIT license...

        They relicensed EnCodec the other day, which is a dependency for a decent number of ML audio projects, so there's definitely recent precedent too.

        That's led to bark switching to an MIT license this week, and I think it opens up at least one of the two VALL-E implementations I'm aware of as well - I have no idea how much is long term business 4D chess and how much is devs being devs, but it's good to see either way.

        3 votes
    2. [2]
      skybrian
      Link Parent
      I’ll repost my Hacker News comment: This gets attention due to being a leak, but it’s still just one Googler’s opinion and it has signs of being overstated for rhetorical effect. In particular,...

      I’ll repost my Hacker News comment:

      This gets attention due to being a leak, but it’s still just one Googler’s opinion and it has signs of being overstated for rhetorical effect.

      In particular, demos aren’t the same as products. Running a demo on one person’s phone is an important milestone, but if the device overheats and/or gets throttled then it’s not really something you’d want to run on your phone.

      It’s easy to claim that a problem is “solved” with a link to a demo when actually there’s more to do. People can link to projects they didn’t actually investigate. They can claim “parity” because they tried one thing and were impressed. Figuring out if something works well takes more effort. Could you write a product review, or did you just hear about it, or try it once?

      I haven’t investigated most projects either so I don’t know, but consider that things may not be moving quite as fast as demo-based hype indicates.

      4 votes
      1. DawnPaladin
        Link Parent
        Yep. For example, the paper says "Scalable Personal AI: You can finetune a personalized AI on your laptop in an evening." That links to the Alpaca-LoRA repository. Having a personalized AI sounds...

        Yep. For example, the paper says "Scalable Personal AI: You can finetune a personalized AI on your laptop in an evening." That links to the Alpaca-LoRA repository. Having a personalized AI sounds awesome, but I have no idea how to get there from here, or even what specifically a "personalized AI" means. Like, if I could train an AI on the particular programming languages and packages I use without having to shell out for GPT-4, that would be great, but just linking to the repository is a long way from proving their claims.

        2 votes
  2. [2]
    skybrian
    Link
    If you ask a chatbot why it wrote what it did, it has no idea, so it makes something up. It turns out there's no guarantee you will see its real thought process if you ask it to "think out loud"...

    If you ask a chatbot why it wrote what it did, it has no idea, so it makes something up. It turns out there's no guarantee you will see its real thought process if you ask it to "think out loud" either. (This is called "chain-of-thought" reasoning.)

    The researchers tested this using multiple choice questions where they bias the model. For example, they might bias it to believe that the answer is always A. The bot would never say that it noticed the pattern and that's why it picks A. It would pretend to think out loud in a way that results in picking A.

    They tested with GPT-3.5 and Claude.

    Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

    We demonstrate that CoT explanations can be heavily influenced by adding biasing features to model inputs -- e.g., by reordering the multiple-choice options in a few-shot prompt to make the answer always "(A)" -- which models systematically fail to mention in their explanations. When we bias models toward incorrect answers, they frequently generate CoT explanations supporting those answers.

    Also:

    On a social-bias task, model explanations justify giving answers in line with stereotypes without mentioning the influence of these social biases.

    6 votes
    1. Wes
      Link Parent
      (disclosure, only read the abstract and skimmed the paper) That pretty cool, as this is not dissimilar to how humans work. We often rationalize an explanation after we've already made it. This...

      (disclosure, only read the abstract and skimmed the paper)

      That pretty cool, as this is not dissimilar to how humans work. We often rationalize an explanation after we've already made it. This likely keeps cognitive dissonance down, but I think can also be explained because we grow the net of possible explanations once we elevate something from the unconscious to the conscious.

      It's so interesting that this current trend of AI research can be studied not only through a computer science lens, but also a psychological one. It's going to be really interesting once we start putting them in large groups, and seeing what sociological principles emerge...

      3 votes
  3. [2]
    unknown user
    Link
    A Catalog of “AI” Art Analogies From photography to stochastic parrot. The list also includes the usefulness and limitation of each analogy.

    A Catalog of “AI” Art Analogies From photography to stochastic parrot. The list also includes the usefulness and limitation of each analogy.

    4 votes
    1. skybrian
      Link Parent
      It’s nicely done, thanks for sharing. Other metaphors to be added: “bullshit” (disregard for the truth, often for self-serving reasons) versus “brainstorming” (disregard for the truth, done...

      It’s nicely done, thanks for sharing.

      Other metaphors to be added: “bullshit” (disregard for the truth, often for self-serving reasons) versus “brainstorming” (disregard for the truth, done intentionally for creative reasons).

      2 votes
  4. skybrian
    Link
    This company adopted AI. Here's what happened to its human workers (Planet Money) […] […] […]

    This company adopted AI. Here's what happened to its human workers (Planet Money)

    Last week, Brynjolfsson, together with MIT economists Danielle Li and Lindsey R. Raymond, released what is, to the best of our knowledge, the first empirical study of the real-world economic effects of new AI systems. They looked at what happened to a company and its workers after it incorporated a version of ChatGPT, a popular interactive AI chatbot, into workflows.

    […]

    The company's customer support agents are based primarily in the Philippines, but also the United States and other countries. And they spend their days helping small businesses tackle various kinds of technical problems with their software. Think like, "Why am I getting this error message?" or like, "Help! I can't log in!"

    Instead of talking to their customers on the phone, these customer service agents mostly communicate with them through online chat windows. These troubleshooting sessions can be quite long. The average conversation between the agents and customers lasts about 40 minutes. Agents need to know the ins and outs of their company's software, how to solve problems, and how to deal with sometimes irate customers. It's a stressful job, and there's high turnover. In the broader customer service industry, up to 60 percent of reps quit each year.

    Facing such high turnover rates, this software company was spending a lot of time and money training new staffers. And so, in late 2020, it decided to begin using an AI system to help its constantly churning customer support staff get better at their jobs faster. The company's goal was to improve the performance of their workers, not replace them.

    […]

    The economists examine the performance of over 5,000 agents, comparing the outcomes of old-school customer reps without AI against new, AI-enhanced cyborg customer reps.

    […]

    The economists' big finding: after the software company adopted AI, the average customer support representative became, on average, 14 percent more productive. They were able to resolve more customer issues per hour. That's huge. The company's workforce is now much faster and more effective. They're also, apparently, happier. Turnover has gone down, especially among new hires.

    Not only that, the company's customers are more satisfied. They give higher ratings to support staff. They also generally seem to be nicer in their conversations and are less likely to ask to speak to an agent's supervisor.

    So, yeah, AI seems to really help improve the work of the company's employees. But what's even more interesting is that not all employees gained equally from using AI. It turns out that the company's more experienced, highly skilled customer support agents saw little or no benefit from using it. It was mainly the less experienced, lower-skilled customer service reps who saw big gains in their job performance.

    3 votes
  5. [4]
    skybrian
    Link
    Midjourney is testing version 5.1 and I've been playing around with it. They say it's more opinionated, though there's a way to turn that off. I tried it out a bit, and I'm finding that it gives...

    Midjourney is testing version 5.1 and I've been playing around with it. They say it's more opinionated, though there's a way to turn that off. I tried it out a bit, and I'm finding that it gives impressive results but tends to ignore the style you give it.

    Here's an example.

    3 votes
    1. [3]
      Wes
      Link Parent
      I've been wanting to try out MidJourney, but the last time I looked into it, it required you to join a Discord server and interact with a bot. I found that workflow very clunky. For a service you...

      I've been wanting to try out MidJourney, but the last time I looked into it, it required you to join a Discord server and interact with a bot. I found that workflow very clunky. For a service you need to pay for, I expected something a little better.

      The results I see posted online are very impressive though. MJ does seem to have a more opinionated style, even before this update. The renditions are more cohesive than what I've seen in other models.

      3 votes
      1. skybrian
        Link Parent
        Yes, it’s a bit clunky, but you can use direct messages with MidJourney’s bot and it’s not so bad. It’s basically a command-line interface that can display pictures and links and buttons....

        Yes, it’s a bit clunky, but you can use direct messages with MidJourney’s bot and it’s not so bad. It’s basically a command-line interface that can display pictures and links and buttons.

        MidJourney has a website with a gallery of the images you generated, but it’s buggy. I need to log in twice since the first login fails. New images often don’t show up there and I think it’s due to caching, but also, some images never seem to show up there.

        Still, the results are good enough that I don’t bother to do comparisons with Dall-E or stable diffusion (dream studio) much.

        It’s not really getting better along some dimensions. One test I do is drawing a piano keyboard, and it still gets the number of black keys wrong. It’s still rubbish at drawing accordions. The pictures are much nicer when it it works, though.

        2 votes
      2. rosco
        Link Parent
        I hate the interface but boy is it addicting!

        I hate the interface but boy is it addicting!

        1 vote
  6. skybrian
    Link
    Here's a somewhat overly-excited blog post from someone who has early access to GPT-4 with plugins: [...] [...] [...] [...]

    Here's a somewhat overly-excited blog post from someone who has early access to GPT-4 with plugins:

    Code Interpreter is GPT-4 with three new capabilities: the AI can read files you upload (up to 100MB), it can let you download files, and it lets the AI run its own Python code. This may not seem like a huge advance, but, in practice, it is pretty stunning. And it works incredibly well without any technical knowledge or ability (I cannot code in Python, but I don’t need to).

    [...]

    I uploaded a Excel file, without providing any context, and asked three questions: "Can you do visualizations & descriptive analyses to help me understand the data? "Can you try regressions and look for patterns?" "Can you run regression diagnostics?"

    [...]

    I have similarly uploaded a 60MB US Census dataset and asked the AI to explore the data, generate its own hypotheses based on the data, conduct hypotheses tests, and write a paper based on its results. It tested three different hypotheses with regression analysis, found one that was supported, and proceeded to check it by conducting quantile and polynomial regressions, and followed up by running diagnostics like Q-Q plots of the residuals. Then it wrote an academic paper about it.

    [...]

    It is not a stunning paper (though the dataset I gave it did not have many interesting possible sources of variation, and I gave it no guidance), but it took just a few seconds, and it was completely solid.

    [...]

    There were other modes you saw in the image above - GPT with Plugins and GPT with Browsing. Both are very interesting, but don’t work very well yet. Plugins allow ChatGPT to work with other systems, most importantly the powerful math tool Wolfram Alpha, but also various travel and restaurant services. ChatGPT really struggles to make these work, as it does with web browsing. I have no doubt these will improve, but, for right now, they very much deserve their “alpha” label.

    3 votes
  7. [6]
    arghdos
    Link
    Has anyone used GPT-4? On a whim, I started using the free ChatGPT to learn something at work that I’m wholly unfamiliar with (Apache/SQLAlchemy/Pandoc) to create a customized report for my...

    Has anyone used GPT-4? On a whim, I started using the free ChatGPT to learn something at work that I’m wholly unfamiliar with (Apache/SQLAlchemy/Pandoc) to create a customized report for my nightly CDash builds, and it’s been… mostly useful? Like, … maybe more useful than a Google search?

    I feel as if it does a good job interpreting my lack of contextual knowledge (I.e., why Google searches would be hard: I don’t know enough of the terminology), but anytime I try to really drill down for specific examples, it falls apart. That plus it hallucinated >5 open-source libraries doing what I am building.

    Wondering if GPT-4 is a significant upgrade / worth the subscription?

    3 votes
    1. Wes
      Link Parent
      It might be a little better, but you should still expect GPT-4 to hallucinate when you drill down into specifics. If your question can't reasonably be found in the training data (because it's too...

      It might be a little better, but you should still expect GPT-4 to hallucinate when you drill down into specifics. If your question can't reasonably be found in the training data (because it's too specific, too obscure, or too recent), then the LLM will make more of an effort to fill in the gaps.

      Understanding how to use LLMs effectively is knowing when they're at their limit. They're great for breadth, and if the topic is well-trodden enough, they can often handle depth. But more skepticism is needed once you start drilling down. You need to verify anything it spits out at you.

      That said, the tools you listed are very well-known, and should have lots of historical information, so I would expect pretty good coverage. I'm actually a little surprised it's hallucinating as often as it is for you.

      3 votes
    2. streblo
      Link Parent
      I have not used it myself, but this paper is a) quite interesting and b) also does a good job of highlighting the difference in capability between GPT 3.5 and (a non-powered down version of) 4.

      I have not used it myself, but this paper is a) quite interesting and b) also does a good job of highlighting the difference in capability between GPT 3.5 and
      (a non-powered down version of) 4.

      2 votes
    3. skybrian
      Link Parent
      I’ve tried it a bit. It’s significantly slower and for simple queries you probably won’t notice a difference. I’ve found it better at writing code, but waiting for it to rewrite code with a bugfix...

      I’ve tried it a bit. It’s significantly slower and for simple queries you probably won’t notice a difference. I’ve found it better at writing code, but waiting for it to rewrite code with a bugfix is tedious.

      But I don’t use ChatGPT day to day and haven’t tried it that much.

      2 votes
    4. DawnPaladin
      Link Parent
      I've been able to solve programming problems with GPT-4 that 3.5 got stuck on. It's better at reasoning, plans out its responses better, and it seems to hallucinate less. I use it as a programming...

      I've been able to solve programming problems with GPT-4 that 3.5 got stuck on. It's better at reasoning, plans out its responses better, and it seems to hallucinate less. I use it as a programming assistant almost every day and I think it's worth the money.

      And you're correct: ChatGPT is most useful when you're a beginner and you don't know what questions to Google. The more information there is out there about the topic you're asking about, the better answers ChatGPT will give and the less likely it is to hallucinate. Once you really start to get into the weeds, ChatGPT will be less useful. But it will largely help you skip over the phase of being a total newbie who doesn't know how anything works.

      2 votes
    5. teaearlgraycold
      Link Parent
      For specifics I would recommend giving it more context to avoid hallucination. GPT-4 can handle a lot of context, so you should be able to give it an entire file and then ask for a suggestion on...

      For specifics I would recommend giving it more context to avoid hallucination. GPT-4 can handle a lot of context, so you should be able to give it an entire file and then ask for a suggestion on how to implement a change. And whenever possible tell it what libraries you'll planning to pull in if they're not already referenced in your code. There is still a level where you have very specific requirements and you're doing something that hasn't already been done a million times before where GPT just can not be of any use.

      2 votes
  8. skybrian
    Link
    Jsonformer: A Bulletproof Way to Generate Structured JSON from Language Models It looks like it even goes further than that? When generating a token, the LLM calculates the probability for the...

    Jsonformer: A Bulletproof Way to Generate Structured JSON from Language Models

    Jsonformer is a new approach to this problem. In structured data, many tokens are fixed and predictable. Jsonformer is a wrapper around Hugging Face models that fills in the fixed tokens during the generation process, and only delegates the generation of content tokens to the language model. This makes it more efficient and bulletproof than existing approaches.

    It looks like it even goes further than that? When generating a token, the LLM calculates the probability for the next token to come up. But sometimes only some tokens are allowed syntactically. So, set the probability to zero for those, and it will never generate an illegal token. (The code uses logits, so zero probability maps to -Infinity.)

    This reminds me a bit of randomly generating test data using quickcheck-style testing, where you give a set of items to pick randomly from. Except that it's not picking randomly, but based on the LLM's calculated probabilities.

    (Via Simon Willison)

    3 votes