15 votes

Why do LLMs freak out over the seahorse emoji?

3 comments

  1. skybrian

    Here’s a fun bug where several LLMs paint themselves into a corner over a non-existent emoji. Unfortunately, the author debugged the part where it tries to output the emoji, but the root cause is the part where it writes “yes.” I wonder how that happens?

    7 votes
  2. unkz

    In humans we would call this paraphasia.

    In mathematics, this is something akin to a limit point not contained in a set.

    For a more in-depth explanation of what is going on here, which was alluded to in the article but not given a full treatment, this article on word2vec additive and subtractive relationships for embeddings might be an interesting read.

    https://medium.com/@manansuri/a-dummys-guide-to-word2vec-456444f3c673

    In a nutshell, concepts can be represented as high-dimensional vectors of real numbers. One vector might represent a concept like male (say 1,0,0), another might represent female (0,1,0), while another might represent something like royalty (0,0,1). Often we will then find that adding these vectors numerically gives interesting results like royalty+male=king (1,0,1), or royalty+female=queen (0,1,1).

    Embeddings, on the other hand, basically correspond to the reverse mappings of these high-dimensional vectors to individual integers, or tokens. You can more or less think of this mapping as finding the token that is closest to a given vector. In other words, many of these vectors will end up mapping to royalty, male, female, and so on, and we basically sort them by similarity to decide the best match. The token vocabulary necessarily covers only a tiny fraction of the possible vectors one could come up with.

    What is kind of happening here is that we have a vector in the earlier layers that is probably encoding a concept very close to emoji+seahorse, but there is no embedding at exactly that point in the space. However, we do have many embeddings very close to that point: you can see how fish+emoji and horse+emoji would be very close to fish+horse+emoji, and seahorse would be very close to fish+horse. A minimal sketch of that idea follows below.
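    To make that concrete, here is a small sketch using made-up toy vectors and a cosine-similarity lookup. The numbers just mirror the (1,0,0)-style vectors above and the token names are purely illustrative; real models use thousands of learned dimensions, so this only shows the shape of the argument, not actual model internals.

    ```python
    import numpy as np

    def nearest_tokens(query, vocab, k=3):
        """Rank vocabulary tokens by cosine similarity to a query vector."""
        def cos(a, b):
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
        return sorted(((cos(query, vec), tok) for tok, vec in vocab.items()), reverse=True)[:k]

    # Toy 3-d space mirroring the comment: axes roughly mean (male, female, royalty).
    people = {
        "man":   np.array([1.0, 0.0, 0.0]),
        "woman": np.array([0.0, 1.0, 0.0]),
        "royal": np.array([0.0, 0.0, 1.0]),
        "king":  np.array([1.0, 0.0, 1.0]),
        "queen": np.array([0.0, 1.0, 1.0]),
    }
    print(nearest_tokens(people["royal"] + people["man"], people))
    # -> "king" ranks first: royalty + male lands closest to (1, 0, 1)

    # Toy 3-d space for the seahorse case: axes roughly mean (fish, horse, emoji).
    animals = {
        "fish":        np.array([1.0, 0.0, 0.0]),
        "horse":       np.array([0.0, 1.0, 0.0]),
        "emoji":       np.array([0.0, 0.0, 1.0]),
        "fish emoji":  np.array([1.0, 0.0, 1.0]),
        "horse emoji": np.array([0.0, 1.0, 1.0]),
        "seahorse":    np.array([1.0, 1.0, 0.0]),
        # deliberately no "seahorse emoji" token at (1, 1, 1)
    }
    print(nearest_tokens(animals["seahorse"] + animals["emoji"], animals))
    # -> a three-way tie between "seahorse", "horse emoji", and "fish emoji":
    #    the point the query aims at has no token of its own
    ```

    The second lookup never finds an exact match, so the closest real tokens end up being things like fish emoji and horse emoji, which is roughly the trap described in the article.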

    3 votes
  3. chundissimo

    In a tech world plagued by amoral tech bros claiming AGI is a year away, it’s very enjoyable to see models shit the bed on silly things like this.

    I didn’t, however, find the “explanation” all that interesting or compelling.

    2 votes