43 votes

Library asks users to verify that books actually exist before making a loan request because AI invents book titles

15 comments

  1. [14]
    DanBC
    Link

    Sorry about the terrible title. I saw this post on Twitter / X: https://x.com/w_carruthers/status/1811326735251574917 . I linked to the library site because lots of people need to avoid X.

    I'm linking this because I feel that it highlights a problem that humans have with tech. Wikipedia says (and has always said) "Don't use us as a source. Read the article, and then use the article's sources as your source". But lots of people don't do that; they just copy and paste the Wikipedia article. And when you check the source, it says something different.

    And here, people are just trusting the output of the AI without checking whether it's real or not.

    I just asked Copilot in Bing "What are some books that will teach me the science and craft of baking bread?" and it gave me a list, with links to stores selling those books. So, that sounds good. I then asked "Are these real books?" and it replied:

    "I apologize for any confusion. It appears I made an error in my previous response. Unfortunately, the books I mentioned do not exist. However, I can recommend some actual books on bread baking:" (and then it lists four more real books".

    I dunno, AI just seems like a mess at the moment.

    20 votes
    1. [7]
      Wes
      (edited )
      Link Parent

      Ultimately LLMs are just a tool, and like other tools they cannot be applied to every task. You wouldn't blame a hammer for being bad at installing screws, would you?

      LLMs as a tool are designed for generating tokens. They build a statistical model through extensive training, and can apply this for text completion and transformation tasks. If you use them for this purpose, they hardly ever hallucinate. This makes them exceptional for data manipulation, translation, editing, error detection, and so on.
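
      To make "generating tokens from a statistical model" concrete, here's a deliberately tiny sketch: a word-level bigram model, nothing like a real transformer, but the same basic idea.

          # Toy illustration: a bigram "language model" that picks the
          # statistically most likely next word from observed counts.
          from collections import Counter, defaultdict

          training_text = "the cat sat on the mat the cat ate the fish".split()

          # Count which word follows which in the training text.
          bigrams = defaultdict(Counter)
          for prev, nxt in zip(training_text, training_text[1:]):
              bigrams[prev][nxt] += 1

          def next_word(prev):
              # Greedy decoding: always emit the most likely continuation.
              return bigrams[prev].most_common(1)[0][0]

          word, output = "the", ["the"]
          for _ in range(4):
              word = next_word(word)
              output.append(word)
          print(" ".join(output))  # prints "the cat sat on the"

      A real LLM does the same loop over subword tokens with a vastly richer model of context; nothing in that loop checks facts.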

      What LLMs are not, however, is a database of information. If you ask them for specific facts, especially on recent or niche topics, a hallucination is very likely. Getting the correct answer is a statistical probability, one based on a number of factors probably unknown to the asker. If somebody takes information from an LLM at face value, they are not using it responsibly. Essentially every LLM provider includes warnings on this topic before the tool may be used, but maybe they should be plastered more aggressively.

      This doesn't mean that LLMs are useless for fact-finding. They still often provide an excellent introduction to a topic. They also allow you to ask the questions that you don't really know how to ask yet; you can't Google a term you don't yet know the name of. But their limitations need to be kept in mind, and they should be used more as a jumping-off point than the final say on any topic.

      Twenty years ago we needed to learn how to use search engines effectively: which keywords to use, the order to place them in, and which search operators to apply. LLMs are really no different. Using them effectively requires practice, and their features and limitations are still being worked out.

      When you asked Copilot for a list of books, those titles may or may not have existed. If it's a common enough topic then there's a good chance that information was actually incorporated into its model, but you'd have to double-check to be sure. When you asked it again, however, whether they were real, you primed it to respond in a certain way. It was generating the most likely tokens based on your follow-up question, which is why it went on to claim they weren't real. This is a hallucination. In reality, it had no idea whether they were real or not. Such is their nature.

      Hammers are useful, and so are LLMs. They are more complex tools though, and require more practice to get right. But even in their early form, they're providing a lot of value when used correctly, and are definitely worth keeping around in your toolbox.

      22 votes
      1. [5]
        PuddleOfKittens
        Link Parent

        Ultimately LLMs are just a tool, and like other tools they cannot be applied to every task.

        If you notice, we're not talking about LLMs; we're talking about "AI" (yes, I know they're the same thing). Most complaints about LLMs aren't about their fundamental purpose; they're about how humans are misusing and misrepresenting the tech. For instance, using them to launder art (which isn't inherently bad, but is stupidly hypocritical in an industry that's otherwise dogmatically draconian about big business's IP), or selling them as able to reason and solve problems.

        Seriously, all the marketing says "AI". AI-enabled, AI-powered, AI-driven. People haven't taken stupid pills; the marketing is promoting a misconception and then profiting off it.

        14 votes
        1. [4]
          kfwyre
          (edited )
          Link Parent

          Most complaints about LLMs aren't about their fundamental purpose; they're about how humans are misusing and misrepresenting the tech.

          Agreed.

          If I weren't on Tildes and didn't have smart and knowledgeable people like @Wes helping me understand what "AI" really is, then I'd be completely lost in the weeds with it. When I first saw ChatGPT, I was mystified. It felt like a complete paradigm shift -- like the bedrock of my life had suddenly given way and I was now living in a brand new world:

          There's a power here that's simultaneously impressive and horrifying. I feel like I'm a tiny ant, and I'm looking up and seeing, for the first time, a gigantic magnifying glass in the sky above me. I don't know who's wielding it or why -- I only know that it's way bigger and more powerful than tiny little insignificant me, and I'm hoping that whoever's holding it is planning on using it as a tool rather than a weapon.

          In my time since then, I've absorbed a lot of what our resident experts here have posted, much of which has been about defining the parameters for what AI appears to be versus what it actually is. When it first debuted, AI occupied an almost God-like space in my mind, and it's taken a lot of time, effort, and know-how from kind people here to help me see that the big important Wizard's head of AI is definitely more of a man-behind-the-curtain situation.

          The lay belief of what AI is and does is completely out of alignment with what AI actually does in part because it's so powerful and convincing, but mostly because of what you mentioned: it's being marketed as something that far exceeds its actual capabilities. My parents think it's "thinking." My coworkers think it's a "brain." If I said "LLM" instead of "AI" to anybody in my life, they wouldn't know what I was talking about. AI is not widely understood to be effectively "really good autocomplete" because no company that's pushing AI is being honest about that aspect of it.

          I think we're seeing a mass overextension of it based on hype. I went from thinking it was effectively God to now seeing it as something closer to vaporware. Like Wes said, it has its utilities, but those are far narrower and more specific than the current broad perception of AI. I also think AI is being pushed into areas that exceed its current capabilities in hopes that it'll one day be able to make those work. I don't know enough about it to make any prediction one way or the other, but issues like the one brought up in the original tweet[1] exist because of the gap between the public's expectations of AI and its current abilities.


          [1] Are posts still called "tweets" on X?

          9 votes
          1. [3]
            DFGdanger
            Link Parent

            Are posts still called "tweets" on X?

            Last I read, they are now just "posts". The funniest suggestion I saw was to call them "xeets".

            5 votes
            1. [2]
              Thrabalen
              Link Parent

              I'm a fan of the term crements.

              4 votes
              1. zipf_slaw
                Link Parent

                x-cretions.

                Goes well with "xitter" (pronouncing the X like the Chinese do: "sh").

      2. DanBC
        Link Parent

        You wouldn't blame a hammer for being bad at installing screws, would you?

        I might, if someone was making and selling hammers and saying "you can use this for screws".

        But also, perhaps I should have emphasised this: "I feel that it highlights a problem that humans have with tech."

        When you asked Copilot for a list of books, those titles may or may not have existed.

        This is not "using a hammer as a screwdriver" territory. Asking a search engine to provide a list of books that meet certain criteria should never return fictional books. And when the user asks "do these books exist?", the answer should be yes if the books exist and no if they don't.
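
        And that check is entirely automatable. Here's a minimal sketch of what a verification step could look like, using Open Library's public search API (I'm assuming its current response shape; any real catalogue lookup would do):

            # Sketch: check whether a title exists in a real catalogue
            # (Open Library's search API; response shape is an assumption).
            import json
            import urllib.parse
            import urllib.request

            def book_exists(title):
                url = ("https://openlibrary.org/search.json?title="
                       + urllib.parse.quote(title) + "&limit=1")
                with urllib.request.urlopen(url) as resp:
                    return json.load(resp).get("numFound", 0) > 0

            # Second title is deliberately fictional for the demo.
            for t in ["The Bread Baker's Apprentice", "Totally Invented Sourdough Saga"]:
                print(t, "->", "found" if book_exists(t) else "not in catalogue")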

        9 votes
    2. [4]
      chocobean
      Link Parent

      How LLM AI (non-AGI) is being used right now is a mess. Anything that requires things to be true shouldn't be using them, and yet that's exactly how they're being marketed and purchased: customer service to provide accurate company info; replacement for data-acquisition and report-generation staff; news articles; self-driving cars... We're not there yet, and people are being idiots thinking we are.

      Not quite there. We can be. This library example is a prime one: make use of the friendliness of language models, but have a second layer to cross-reference and fact-check, and we're mostly there.
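
      A rough sketch of that two-layer shape (llm_suggest_titles and catalogue_has here are hypothetical stand-ins for whatever model and catalogue you actually have):

          # Sketch of the "second layer" idea: never hand an LLM's claims
          # straight to the user; filter them through ground truth first.
          def safe_recommendations(query, llm_suggest_titles, catalogue_has):
              suggested = llm_suggest_titles(query)  # fluent but unverified
              return [t for t in suggested if catalogue_has(t)]  # only real books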

      6 votes
      1. blivet
        Link Parent

        It’s especially annoying that Google and Bing both place “AI”-generated content front and center in their results. This is inevitably going to make naive users think that the way LLMs work is by retrieving data and writing a response incorporating it.

        10 votes
      2. [2]
        Englerdy
        Link Parent

        I think using retrieval-augmented generation (RAG) with LLMs helps with the truthfulness issue. Taking a model that's good at talking and generating meaningful responses, and then pairing it with a layer (which I think is what you're referring to, though I wasn't completely sure) that can search through a factual dataset, creates surprisingly meaningful responses that in my experience so far tremendously improve the ability to use these tools for tasks related to factual information.

        I'm not sure how many companies are doing this versus overtraining the customer's model so it replicates the dataset more accurately, but I've been very surprised how much better ChatGPT's responses are when I ask it to do a web search (and it actually recognizes the command versus pretending) or give it a document that it incorporates into its responses. RAG has made me much less skeptical of using LLMs for tasks that require true information, but that still doesn't stop me from spot-checking a few answers to make sure nothing erroneous has crept in. The ol' trust-but-verify approach.
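
        For anyone who hasn't seen it spelled out, the core of RAG is just "look relevant text up first, then make the model answer from what you found". A minimal sketch (the retriever here is naive word overlap rather than vector embeddings, and ask_llm is a hypothetical stand-in for whatever model API you use):

            # Minimal RAG sketch: retrieve relevant text, then answer from it.
            documents = [
                "The library's loan desk is open Monday to Saturday, 9am to 5pm.",
                "Inter-library loan requests require a verified catalogue entry.",
            ]

            def retrieve(question, docs, k=1):
                # Score each document by how many question words it shares.
                q_words = set(question.lower().split())
                return sorted(docs,
                              key=lambda d: len(q_words & set(d.lower().split())),
                              reverse=True)[:k]

            def answer(question, ask_llm):
                context = "\n".join(retrieve(question, documents))
                prompt = ("Answer using ONLY this context:\n" + context
                          + "\n\nQuestion: " + question)
                return ask_llm(prompt)  # grounded in retrieved text, not model memory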

        3 votes
        1. chocobean
          Link Parent

          Thanks for the RAG introduction, how interesting. I can see that being a game changer for how much we can trust auto-generated output: it would only take a bit more time per query to check things a few times and to teach it to cite its sources.

          1 vote
    3. [2]
      DavesWorld
      Link Parent

      People keep trying to use current LLMs and AI portals as if the technology is in its final mature form. It isn't yet.

      Right now, an LLM is just a piece of sophisticated software that can assemble words into coherent order. So coherent that it can do it in reverse and form pretty good guesses as to what a human "means" when that human gives it text input. The LLM can then formulate a coherent response.

      That's not the same thing as the LLM having actual, fully accurate, detailed, trustworthy ability to comb through the bulk of human knowledge (aka, the Internet) and "know" what bits and pieces are relevant.

      You ask a physics professor about physics, and she'll have relevant things to say. When you ask specific questions, and continue dialing down by degrees of detail, she'll still be able to stay relevant and on-point with her answers. She knows physics.

      The LLM just knows how to sound like a human. That's not the same thing as having a human brain to use on the information (databases).

      That physics professor, should she be unskilled or otherwise lack aptitude with human communication, could tell an LLM what to say and the LLM could come up with ways for her to say it. And with some of the other AI technologies, even proceed to verbally say it in almost any voice desired. So she could "hide" behind the tech and use it to communicate, and then it would pretty much do what most people seem to think AI/LLM tech can do without the human in the loop.

      But without her, the tech is still all form and surface-only function. The form, that surface flash, is the technological advancement. The form is very impressive. A problem that was thought to be unsolvable has been solved, that of enabling a computer to use language on a human level. But the technology only sounds like Spock or Data; it isn't Spock or Data yet, much less the Enterprise's library computer.

      Yet.

      Patience. If people would RTFM more often, and be aware of limitations and restrictions, the rest of us wouldn't have to put up with this parade of "I tried to get new tech to do something new tech doesn't do yet and it couldn't; new tech suxxxxxx!!!!" articles that continue to roll out.

      Trying to play gotcha when the tech currently isn't intended to do that thing yet.

      2 votes
      1. DefinitelyNotAFae
        Link Parent

        I disagree about where the onus of responsibility is here. Yes, industries and institutions buying into "just replace your live support with an AI bot" should be doing more research.

        But all of this "look at how much it can do" is coming from the companies selling the tech. That's who's going around convincing the less tech-savvy execs to spend stupid money on it. And you can argue that's just business or whatever, but it's still causing these problems. And the average user is getting AI interaction and those false pieces of information thrown in their faces and shoved down their throats in everything from Google to Snapchat as ways to "maximize engagement". We're not given choices to turn it off. And there's no manual being provided.

        And it turns into elitism about how the mob is so dumb for thinking AI/LLMs do X or Y, when the LLMs themselves pretend they can do X or Y. And the people making the LLMs aren't actually clearing shit up.

        6 votes
  2. Noox
    Link

    Reading stuff like this actually makes me cautiously optimistic that we're not in the 'darkest' timeline, but the silliest one.

    8 votes