16 votes

Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding

Posted May 24, 2022 by skybrian

Tags: google, machine learning, text to image, imagen

https://gweb-research-imagen.appspot.com/

Link information

This data is scraped automatically and may be incorrect.

Title: Imagen
Word count: 1381 words

10 comments

skybrian (OP)
May 24, 2022
Here's the abstract:

We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human raters find Imagen samples to be on par with the COCO data itself in image-text alignment. To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. With DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment.

Do check out the photos.

3 votes
[4]
skybrian (OP)
May 24, 2022
Incidentally, I got into the Midjourney beta on Saturday. It's less capable than DALL-E or this new image generator, but I had fun generating bad but interesting accordion images.

3 votes
1. unknown user
  May 24, 2022
  I dig the design of the top-leftmost one on the third image (the red ones). No clue what it is (it's not an accordion), but it looks like it could be something interesting.
  
  5 votes
2. [2]
  space_cowboy
  May 25, 2022
  Interesting; I had never heard of Midjourney.
  
  Their website could certainly be more informative. They don't give any explanation or background.
  
  1 vote
  1. skybrian (OP)
    May 25, 2022
    Here's an article I shared here in March. I don't remember where I first heard of it.
    
    2 votes
petrichor
May 27, 2022
See also: an open-source implementation of Imagen in PyTorch

3 votes
[3]
archevel
May 24, 2022
I hope it is a few years away, but I imagine the photorealistic nets will be weaponized for propaganda. You could likely generate photos of celebrities waving the flag of ISIS if you trained the network some more with their likenesses. The next step of being able to generate realistic video is also likely a few years away... While I imagine all this could be used for fun and entertainment, I am pessimistic.

2 votes
1. [2]
  teaearlgraycold
  May 24, 2022
  The propaganda generated by these neural nets will mostly be used by religious and religious-adjacent parties (QAnon, flat earthers). They don’t and have never cared about what’s real. They just want something good enough to satisfy the small part of their mind that can still call out BS.
  
  Nations won’t need it as much. They can afford the luxury of expensive forms of propaganda. Why risk someone proving you’re using DALL-E when you can just pay the right people to legitimately create the perfect photo?
  
  That said, the power to create meatspace propaganda and the need for convincing propaganda are two different axes. In one corner you have world superpowers. In the opposite corner is QAnon. In the middle is ISIS. These tools disproportionately aid the QAnon types.
  
  5 votes
  1. helloworld
    May 24, 2022
    Looking at US politics (not that my country is doing much better), where do religious/adjacent parties stop and governments begin? Over time, a dominant faction can take over and old habits die hard.
    
    Edit: Also in my country, politicians regularly pursue ways to advance their image that are prone to exposé, often fall flat on their faces, and make a glorious comeback a year or two later. For them, DALL-E is just another tool with risk/reward on the slightly extreme side of the spectrum.
    
    4 votes
Diff
May 24, 2022
This is kind of an obnoxious website to browse, with the header periodically and drastically changing height.

3 votes