31 votes

AI and the American smile

23 comments

  1. [6]
    Sodliddesu
    Link

    Nothing too unexpected in the article, my background isn't in any related fields, but as someone with a smile that's, at best, captured as a smirk bordering on a sneer I've been aware of this American phenomenon.

    tl;dr? AI is painting historical-looking faces with 'modern' Western expressions that don't match historical or cultural imagery.

    Though, I must say, at one point they showed a 'traditional Māori Haka' as an example of how they expect ancient Polynesians 'should have' taken their selfie. But c'mon, folks, people of other cultures don't only have one available emote option when a camera is around.

    16 votes
    1. [5]
      creesch
      (edited)
      Link Parent

      But c'mon, folks, people of other cultures don't only have one available emote option when a camera is around.

      Yup. It's also important to note that people in old photos often looked serious because having your photo taken was considered serious business, at least in the West. There is this rather famous photo of a guy taken in China, and the story behind it is that he had no expectation of how to pose for a photo, so he just started goofing off. I am not entirely sure how credible that story is, but the point is that photos are influenced not only by culture but also by the period they are taken in.

      26 votes
      1. [4]
        PetitPrince
        Link Parent

        old photos often looked serious because having your photo taken was considered serious business

        Some of the early photographic technologies (I'm thinking of the daguerreotype) had exposure times ranging from seconds to minutes; it's hard to hold a smile (be it fake or genuine) for that long.

        17 votes
        1. [3]
          creesch
          Link Parent

          This is only really true for very old photos. The majority of old photos in which people have a serious look were taken with techniques where the exposure time was not that long.

          6 votes
          1. [2]
            Sodliddesu
            Link Parent

            But there may have been a cultural expectation of seriousness, due to earlier photos taking so long, so everyone 'expected' to sit still because that's what you did for photos, and it's turtles all the way down.

            7 votes
            1. creesch
              Link Parent

              Well yeah, that is effectively what I talk about in my initial comment.

              2 votes
  2. [8]
    Johz
    Link

    This is interesting, but I feel like the author is trying to imply a far stronger point than they ought.

    Firstly, as the author themself points out, this isn't an American smile; it's something common across much of Western Europe as well. I'd argue that in terms of selfie culture it's common in parts of East Asia too. So the discussion about attaching selfie smiles to specifically US cultural mores feels like a large stretch.

    Secondly, the images weren't meant to be any old photos; they were meant to be selfies, a style of photography deeply specific to the 21st century, and even to certain generations. It doesn't really make sense to compare a fake selfie of Native American warriors to an actual photo of Native American warriors, because Native American warriors never took 21st-century selfies. They were photographed in the common style of the 19th century, which is to say stern and still. Similarly, selfies are almost always posed and are never action shots, so it doesn't make sense to generate a selfie of a warrior in the middle of a Haka. The selfie is the specific lens through which the AI is generating these images, and it defines a lot about how people will stand, what they will be doing, and what expressions they will have on their faces. Even the comparison between the fake Soviet soldiers and the real Ukrainian soldiers shows that there is a particular stance and style that is universal there, even if the rictus grin is more of a cultural artifact that the AI can't get away from.

    I'd have been interested to see more gender-based comparison. All the images that I saw were of men, but it would have been interesting to see, say, a selfie of a group of Wrens (the WRNS, a British all-female naval service during WWII), and see if the AI would have approached that any differently.

    14 votes
    1. [7]
      creesch
      Link Parent

      I get where you're coming from, and I agree that the "selfie" context does weaken the argument they are trying to make.

      That said, I think the author might be trying to make a broader point about how certain cultural norms, like the American emphasis on friendliness or positivity, get baked into AI training data, even in subtle ways. While smiles in selfies aren't exclusively American, the specific style of smiling that AI often generates could still reflect biases in the dataset, which is something that has been observed before.

      9 votes
      1. [6]
        post_below
        Link Parent

        I agree that the author is trying to make potentially interesting points about culture and how it can influence AI. However, the fact that all the prompts used "selfie" completely torpedoes their attempt.

        Almost all the selfies in the training data have smiles because that's how people take selfies. Which is so obvious that it's hard to understand how the author bulldozed past it.

        It's genuinely interesting that some cultures have different norms around facial expressions and responses, but AI "selfie" prompts don't seem to be a very good basis for that conversation.

        8 votes
        1. [2]
          Gaywallet
          Link Parent

          However, the fact that all the prompts used "selfie" completely torpedoes their attempt.

          There are plenty of selfies which don't include smiles. But perhaps more importantly, I think this also highlights how much it matters how the data is tagged and trained. Who tagged the photos as selfies when the model was trained? Were they tagged at all, or did the AI learn to associate the two on its own? What other biases does this bake into the outputs?

          Furthermore, how is the average user going to interact with the AI, and are they aware that word choice can so drastically change the output? In the case of selfie, there isn't an easy way to represent the same idea (a picture of oneself) without using quite a few more words, but for words which are more easily interchangeable because a plethora of synonyms exists, how does each synonym influence the output? What different results would we get from "A woman performing", "A woman executing", "A woman accomplishing", "A woman achieving", "A woman fulfilling", "A woman implementing", and so on?
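
          If you wanted to probe that systematically, the experiment is easy to sketch. Here's a minimal version, assuming the OpenAI Python client with DALL-E as a stand-in (any text-to-image API that takes a plain-text prompt would work the same way; the model name is just an example):

          ```python
          from openai import OpenAI

          client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

          # Hold everything constant except the verb, then compare the
          # generated images side by side. The list mirrors the examples above.
          synonyms = ["performing", "executing", "accomplishing",
                      "achieving", "fulfilling", "implementing"]

          for verb in synonyms:
              result = client.images.generate(
                  model="dall-e-3",  # illustrative; use whichever model you're testing
                  prompt=f"A woman {verb}",
                  n=1,
              )
              print(verb, "->", result.data[0].url)
          ```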

          3 votes
          1. balooga
            Link Parent

            In the case of selfie, there isn't an easy way to represent the same idea (a picture of oneself) without using quite a few more words

            Splitting hairs here, but the word “self-portrait” was widely used before “selfie” was coined, including for photography. It would be interesting to see how differently Midjourney would handle that change in otherwise identical prompts.

            1 vote
        2. [3]
          Drewbahr
          Link Parent
          Is it "how people take selfies", or is it "how people from western European-descended cultures take selfies"? I don't think you're necessarily disagreeing with the author of the article; you seem...

          Is it "how people take selfies", or is it "how people from western European-descended cultures take selfies"?

          I don't think you're necessarily disagreeing with the author of the article; you seem to be assuming a baseline for what the "basis" of a photo is, one that aligns with the western European notion of smiling when in a photo.

          2 votes
          1. DefinitelyNotAFae
            Link Parent

            How cultures that use the word "selfie" take "selfies" probably comes into play.

            2 votes
          2. post_below
            Link Parent

            I'm not making a statement about what the basis of a photo is, rather about the basis of a "selfie", a modern term with a specific meaning.

            Are you saying that non-western people take selfies differently? Have I just been on a statistically unlikely streak of seeing smiling faces in selfies from people in Asian countries when they're actually unusual?

            I did a Getty image search for "chinese selfie".

            And a Getty search for "indian selfie".

            I chose those countries for their populations; they'd have the volume of contributions needed to impact the training dataset in a way that would show up in response to simple prompts.

            I'm seeing almost all smiling faces, particularly in group shots, which the author used in their examples.

            And sure, maybe Getty isn't the best source. Maybe there's a better one that shows a majority of unsmiling selfies?

            Since the training data is essentially the internet, there is certainly an interesting conversation to be had about the ways it's biased. But if you wanted to investigate that, you wouldn't pick a modern term associated with exactly the thing you're talking about (smiling) and then apply it to time periods and cultures that didn't even have the term.

            The number of selfies in the dataset is going to overwhelm any less well-represented topic unless you do some serious prompt engineering. The prompt may as well have been "show me ancient people smiling", after which it's silly to yell "ah ha! They wouldn't really have been smiling!"

            1 vote
  3. balooga
    Link

    Like others have said, it's a bit silly to project some idea of “how X people would pose for a photo” when the people in question lived before photography was invented. There simply is no cultural frame of reference to say one way or the other what their expressions would've looked like. @Sodliddesu rightly pointed out that “traditional Māori Haka” !== “traditional Māori photo pose,” for example.

    But I am really curious about historical human happiness in general. My impression of the past is roughly that, until maybe the 1920s or so, “most” (vague scare quotes around “most”) people’s lives were on the whole “nasty, brutish, and short.” As much as I love the concept of time travel, there is literally no time in the past that I would want to live in more than today. Just in terms of prosperity, medicine, technology, etc. We’ve got our problems but almost any way you slice it there were more of them in the past. In most respects my humdrum middle-class life is significantly more comfortable than Henry VIII’s was.

    So were our ancestors all perpetually miserable and frowning? I’m sure they weren’t. Humans have a remarkable capacity for adapting to whatever condition they find themselves in. The most destitute slave in the ancient world was just as capable of joy as the wealthiest American celebrity is capable of depression and despair. It’s all relative.

    But it’s hard to imagine what daily life was really like for average people in history, with no photographic record of it. Most of the written accounts we have are biased toward the wealthy and educated. And those cultures seem outrageously rigid and formal, to me, at least in terms of language and fashion. Which reads to me as “serious” and likely “unhappy.” But I really have no idea. I’d be interested to hear what historians can say about this that I’m not aware of.

    7 votes
  4. [8]
    creesch
    Link

    This is very similar to the output of LLMs. The text output in English, for example, very much resembles the typical “bubbly” style of U.S. English that I personally associate with marketing and management-level communication. Which would make sense, given how much of the training material was likely corporate blogs, marketing material, etc.

    You can adjust for it somewhat through the (system) prompt, but not entirely.

    To get back to image generation: I don't have a lot of experience with Midjourney, but in my limited experimenting with DALL-E it seems next to impossible to adjust for this through the prompt at all. Possibly because the training material is much more specific or less diverse compared to text?
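
    For the text side, here is roughly what I mean by adjusting through the (system) prompt. A minimal sketch, assuming the OpenAI Python client; the model name and the instructions are illustrative, not a recipe:

    ```python
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Steer the register away from the default "bubbly" marketing tone.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system",
             "content": "Write in a dry, understated register. Avoid exclamation "
                        "marks, superlatives, and marketing-style enthusiasm."},
            {"role": "user",
             "content": "Summarize this week's release notes for the team."},
        ],
    )
    print(response.choices[0].message.content)
    ```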

    6 votes
    1. [7]
      sparksbet
      Link Parent

      I've been working with LLMs for work and one of the funniest things is when I ask it to use casual language in the prompt. It immediately starts adding "Yo" and "Hold up" to the beginnings of sentences, to the extent that I had to add a line to the prompt explicitly telling it not to. Very "hello fellow kids" energy.

      10 votes
      1. [5]
        chocobean
        Link Parent

        Not unlike my LLM request for a Terry Pratchett-style short story, which got me generic, bland children's fairy tales complete with a literal "the moral is...". Fantasy writing = fairy tales to an LLM, the way casual = "yo, hold up" -- it slides into tangentially related lowest common denominators because that's all it knows how to do as a probabilistic text prediction tool.

        4 votes
        1. [4]
          creesch
          Link Parent

          Yes, but also no. At the very core you are technically correct, as LLMs do use next-word prediction. In practice this is such a simplification that it is nothing more than a derisive sneer.

          And I don't think it is very useful to frame things like that when discussing their abilities and inabilities, regardless of which end of the spectrum your opinion falls on.

          As I hinted at, you can get pretty far with specific styles if you are more explicit in your expectations.

          Depending on the style you are looking for, you can get pretty close with just a few examples.

          With Terry Pratchett, I do admit, most LLMs likely need a bit more than a few examples.
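
          For what I mean by "a few examples", here's a minimal few-shot sketch, again assuming the OpenAI Python client. The bracketed passages are placeholders; in practice they would be real excerpts in the style you're after:

          ```python
          from openai import OpenAI

          client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

          # Few-shot style steering: show the model the target voice before
          # asking for new text. The bracketed passages are placeholders.
          messages = [
              {"role": "system",
               "content": "Imitate the prose style of the example passages below."},
              {"role": "user", "content": "Example 1: <passage in the target style>"},
              {"role": "user", "content": "Example 2: <passage in the target style>"},
              {"role": "user",
               "content": "Now write a 200-word scene about a reluctant wizard."},
          ]
          response = client.chat.completions.create(model="gpt-4o", messages=messages)
          print(response.choices[0].message.content)
          ```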

          1 vote
          1. [3]
            TangibleLight
            (edited)
            Link Parent

            In practice this is such a simplification that it is nothing more than a derisive sneer.

            I don't think so. Anyone using that phrasing is clearly biased against the thing, but it's not wrong.

            The thing is a statistical model, so it makes sense to consider statistical questions about it. Fundamentally, a "generative" model just samples whatever abstract space the model uses to parameterize the training data. Even if it seems naive in context, basic statistical ideas like regression to the mean and the central limit theorem still apply.

            Qualitatively, I'd say regression to the mean manifests as "tangentially related lowest common denominators". In layman's terms, I'd describe the intuition behind the central limit theorem as "all it knows how to do".

            And yes, by definition the output is related to the input, so you can bias the expected result one way or the other. Be more specific with the prompt and you can get pretty far! You're still subject to the same basic statistical ideas, though: the "less average" you want the output to be, the more work you have to put into the prompt, and context size puts a hard limit on that.
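
            To make the sampling point concrete, here's a toy illustration with made-up numbers. Nothing model-specific, just what sharpening or flattening a distribution does to the output you sample from it:

            ```python
            import math
            import random
            from collections import Counter

            # Fake "next word" distribution over facial expressions; the
            # probabilities are invented purely for illustration.
            vocab = {"smiling": 0.55, "neutral": 0.25, "stern": 0.15, "laughing": 0.05}

            def sample(dist, temperature=1.0):
                # Temperature < 1 sharpens the distribution (output collapses
                # toward the mode); temperature > 1 flattens it (more variance).
                weights = {w: math.exp(math.log(p) / temperature) for w, p in dist.items()}
                total = sum(weights.values())
                r = random.uniform(0, total)
                acc = 0.0
                for word, weight in weights.items():
                    acc += weight
                    if r <= acc:
                        return word
                return word  # floating-point edge-case fallback

            for t in (0.3, 1.0):
                counts = Counter(sample(vocab, t) for _ in range(10_000))
                print(f"temperature={t}:", counts.most_common())
            ```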

            I can't speak for chocobean, but I personally tend to use that derisive subtext because the people who proudly use LLMs generally appear to believe that regressing to the mean and eliminating variance is somehow desirable and efficient. In practice, it sucks the voice and soul and culture out of every piece of content that LLMs touch. It's backwards and wasteful.

            5 votes
            1. [2]
              creesch
              Link Parent

              but it's not wrong.

              That's why I said technically correct. Practically, it doesn't really contribute anything to a conversation about them, particularly a conversation about their capabilities and limitations, which is very much what this comment section is.

              I understand that LLMs are fundamentally statistical models, and that they use next-word prediction based on patterns observed in their training data. Concepts like regression to the mean and probability distributions are useful for understanding how they function at a technical level. But while they provide insight into the underlying structures, they aren't always helpful when trying to assess actual capabilities or judge where LLMs might be of use.

              Basically, knowing that LLMs work in a certain way does not translate to knowing how well they perform in specific tasks and situations. In fact, it can stop you from exploring these possibilities, which in turn limits your practical understanding of them.

              I really am not one of the advocates you mention in your last paragraph. Yes, LLMs will average out over the data they are trained on. At the same time, I do see that the range of what LLMs are capable of is often greater than a lot of people realize, even if that range is by definition nowhere close to the range produced by humans.
              Do I advocate using them for all purposes as a replacement for current human work? Absolutely not, because there I do agree with you.

              That doesn't mean that I don't want to know what LLMs are capable of. Given how much they are being pushed in all sorts of areas, I think that in order to be critical of them you need a fairly good understanding of their capabilities and limitations: to see if you can actually dismiss them as non-viable or as a non-threat, or if they might be close enough to good in some capacities that this is no longer the case.

              As I alluded to, when we talk about the works of writers like Terry Pratchett, I also don't think that LLMs can come close to replicating them. Most certainly not a story from scratch, and most likely not even rewriting a provided story in that very specific style.

              At the same time, with the right (system) prompt, an LLM might help a non-native English speaker elevate their written text to a level they would otherwise struggle with. Tools like Grammarly and LanguageTool are already incredibly popular among the part of the internet population that writes English as a second language, and LLMs in that area are much more capable: they can actually help rephrase or even rework emails, comments, etc.
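
              Concretely, that use case is a few lines of glue code. A sketch, assuming the OpenAI Python client (the model name is an example, and a production tool would need more care with long texts):

              ```python
              from openai import OpenAI

              client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

              def polish(text: str) -> str:
                  """Grammarly-style rewrite helper: fix grammar, keep meaning and tone."""
                  response = client.chat.completions.create(
                      model="gpt-4o",  # illustrative model name
                      messages=[
                          {"role": "system",
                           "content": "Fix grammar and improve fluency. Preserve the "
                                      "author's meaning and tone. Do not add content."},
                          {"role": "user", "content": text},
                      ],
                  )
                  return response.choices[0].message.content

              print(polish("I writed the report yesterday but forgot attach the file."))
              ```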

              Which is, again, why I think these sorts of remarks are not all that useful. Yes, they are just word-prediction algorithms, and no, they are not a replacement for creative work. But yes, they can still be very useful in language-adjacent areas.

              2 votes
              1. TangibleLight
                (edited)
                Link Parent

                While I was writing my comment I thought about adding a section at the end addressing situations where LLMs are useful. You brought up both of the examples I considered, grammar checking and new languages, so here's my take on those from this lens of basic statistics.

                Both are contexts where uniformity, middling quality, and lack of soul are acceptable or beneficial. For perfect grammar, you want to eliminate variance. To learn a new language, it's an improvement to regress to the mean.

                In general, the people I see who are proud of using generative AI appear to believe it has a place everywhere. The comment of yours I replied to seemed to suggest that AI can compose meaningful text if only you manage to prompt it just the right way. Until someone gives a nuanced description of their opinions, as you've just done, they are indistinguishable from grifters trying to push AI into contexts where it doesn't make sense. So I apologize for incorrectly placing you in that group.

                The real argument in my first comment is that I categorically reject the idea that generative AI should have any place in the humanities, aside from some limited applications in language and visual arts. I think it's more important to be critical and intolerant of AI grift than to be polite to benign enthusiasts, so the "derisive sneer" is justified.

                I didn't address it in that comment, but I also categorically reject that they have any place in the sciences. The argument from basic statistics is that higher-accuracy events have lower probability, so generative AI can't be applied to contexts where accuracy or correctness is important.

                The new o1 demos certainly look accurate, so they challenge this accuracy argument, but I'm still skeptical. I haven't had a chance to interact with o1 myself, but what I've heard from others is that the improvement over 4o isn't as substantial as the demos make it seem. My experience with other "multi-step workflow" AI products is not good, and OpenAI hasn't given me much confidence that o1 has any real secret sauce over the others apart from the sheer volume of compute resources. The rake kickflip meme comes to mind.

                Basically, knowing that LLMs work in a certain way does not translate to knowing how well they perform in specific tasks and situations. In fact, it can stop you from exploring these possibilities, which in turn limits your practical understanding of them.

                Fair enough. The models are black boxes, so it is impossible to make accurate predictions about how well any one performs in a given context.

                But I counter that basic statistics is a good lens for predicting which contexts generative AI as a technology could possibly do well in. And as long as the models remain black boxes, it will be impossible to reliably engineer their output to do well in non-obvious contexts.

                1 vote
      2. creesch
        Link Parent

        Yeah, you really do need to be very specific and provide examples, otherwise they tend to go for stereotypical things like you mention.

        1 vote