16 votes

Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding

10 comments

  1. skybrian
    Link
    Here's the abstract:

    We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.

    Do check out the photos.

    3 votes
  2. [4]
    skybrian
    Link
    Incidentally, I got into the midjourney beta on Saturday. It's less capable than DALL-E or this new image generator, but I had fun generating bad but interesting accordion images.

    3 votes
    1. unknown user
      Link Parent
      I dig the design of the top-leftmost one on the third image (the red ones). No clue what it is (it's not an accordion), but it looks like it could be something interesting.

      5 votes
    2. [2]
      space_cowboy
      Link Parent
      Interesting; I had never heard of midjourney.

      Their website could certainly be more informative. They don't give any explanation or background.

      1 vote
  3. [3]
    archevel
    Link
    I hope it is a few years away, but I imagine the photorealistic nets will be weaponized for propaganda. You could likely generate photos of celebrities waving the flag of ISIS if you trained the network some more with their likenesses. The next step of being able to generate realistic video is also likely a few years away... While I imagine all this could be used for fun and entertainment, I am pessimistic.

    2 votes
    1. [2]
      teaearlgraycold
      Link Parent
      The propaganda generated by these neural nets will mostly be used by religious and religious-adjacent parties (QAnon, flat earthers). They don’t and have never cared about what’s real. They just want something good enough to satisfy the small part of their mind that can still call out BS.

      Nations won’t need it as much. They can afford the luxury of expensive forms of propaganda. Why risk someone proving you’re using DALL-E when you can just pay the right people to legitimately create the perfect photo?

      That said, the power to create meatspace propaganda and the need for convincing propaganda are two different axes. In one corner you have world superpowers. In the opposite corner is QAnon. In the middle is ISIS. These tools disproportionately aid the QAnon types.

      5 votes
      1. helloworld
        Link Parent
        Looking at US politics (not that my country is doing much better), where do religious/adjacent parties stop and governments begin? Over time, a dominant faction can take over and old habits die hard.

        Edit: Also, in my country, politicians regularly pursue ways to advance their image that are prone to exposé, often fall flat on their faces, and make a glorious comeback a year or two later. For them DALL-E is just another tool, with risk/reward on the slightly extreme side of the spectrum.

        4 votes
  4. Diff
    Link
    This is kind of an obnoxious website to browse with the header periodically and drastically changing height

    3 votes