30 votes

Midjourney version 5.2 adds support for "zoom out" feature

20 comments

  1. [13]
    skybrian
    Link

    Whenever I try out a new version of an image generator, I check if it can draw a plausible accordion and whether it gets the black keys of a piano right. Nope and nope. Hasn't happened yet.

    But another challenge I found today is to try to generate pictures of people carrying things. Can it do a woman carrying a garden rake? No, that's not a garden rake, it's a broom. How about waving a flag? The flag is there but the hand that's supposed to be holding it is all wrong.

    So it's not just accordions. It can't really handle anyone carrying anything too unusual. A flower pot is sort of okay. Digging with a shovel worked two times out of four.

    The people look great, but apparently they are much better at posing for pictures than shlepping random stuff.

    12 votes
    1. [3]
      pum
      Link Parent

      I find that getting cohesive interactions between subjects in general is very challenging with generative art (I've only tried Stable Diffusion, for context). Every time I tried to create a composition with two people or position a person exactly the way I want, it either took hours of fiddling with weights and seeds or I just gave up. It's happy to give me the "ingredients" I requested, but not combine them in a meaningful way.

      There are ongoing improvements to this area, such as the developments with ControlNet, so I hope it will become less painful relatively soon.
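
      For anyone curious, here's a minimal sketch of what ControlNet conditioning looks like with the diffusers library (the checkpoint names are just the commonly used public ones, and the pose image is assumed to already exist). The pose skeleton pins down the composition, so the prompt only has to describe what the subjects are, not where they are:

      ```python
      import torch
      from PIL import Image
      from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

      # Load an OpenPose-conditioned ControlNet alongside a base SD 1.5 checkpoint.
      controlnet = ControlNetModel.from_pretrained(
          "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
      )
      pipe = StableDiffusionControlNetPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
      ).to("cuda")

      # The pose skeleton fixes where each figure stands and how they interact,
      # so the text prompt only has to say what they are.
      pose_image = Image.open("two_figures_pose.png")  # assumed to exist
      result = pipe(
          "two people shaking hands in a park, detailed digital painting",
          image=pose_image,
          num_inference_steps=30,
      ).images[0]
      result.save("handshake.png")
      ```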

      8 votes
      1. [2]
        Dr_Amazing
        Link Parent

        It's definitely one of the most challenging parts and basically impossible without control nets. Took me forever just to get this picture of a knight fighting a skeleton

        I also recommend checking out an extension called "Latent Couple" if you haven't yet. It lets you divide up the picture into sections and describe each one separately. It's not super great with interacting subjects unless they're separated into different parts of the screen, but it works well for just making two different things in the same picture.

        I was trying to illustrate a scene from a D&D game where a player almost had a dragon fly off with their horse. Most attempts at describing it gave me weird horse/dragon hybrids. Latent Couple made it a bit better, but still kind of weird. I think in the end I just went into Photoshop, dropped in a super rough picture of a dragon above a horse, and used a ControlNet to match it. I ended up with this, which I think came out pretty well.
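
        For what it's worth, here's a rough conceptual sketch (not the extension's actual code) of what Latent Couple-style region prompting boils down to: each region gets its own noise prediction from its own sub-prompt, and the predictions are blended with region masks before the sampler takes its step.

        ```python
        import numpy as np

        # Toy illustration of region-masked prompt blending. In the real extension the
        # two "predictions" come from the UNet run with different text conditioning;
        # here they are just random arrays with the right shape (4 latent channels, H/8 x W/8).
        h, w = 64, 64
        pred_dragon = np.random.randn(4, h, w)   # stand-in for the "dragon in the sky" prediction
        pred_horse = np.random.randn(4, h, w)    # stand-in for the "horse on the ground" prediction

        mask_top = np.zeros((1, h, w))
        mask_top[:, : h // 2, :] = 1.0           # upper half of the frame -> dragon prompt
        mask_bottom = 1.0 - mask_top             # lower half -> horse prompt

        weight_sum = np.clip(mask_top + mask_bottom, 1e-6, None)
        blended = (pred_dragon * mask_top + pred_horse * mask_bottom) / weight_sum
        print(blended.shape)  # (4, 64, 64): the merged prediction the sampler would step with
        ```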

        2 votes
        1. pum
          Link Parent

          I've played around with Latent Couple before. As you say, it's not super great for interacting subjects, but I've found it pretty useful for homing the prompt in on a certain area of the image, like getting a specific hand shape. Hires fix messes it all back up again anyway, but what can you do...

    2. [7]
      asukii
      Link Parent

      I have a large grid of a few dozen test prompts that I use to evaluate different models across a variety of different criteria. How good is it at handling little details like that? What about two different things interacting in specific ways? What about stylizing the result, instead of having everything look like the same generic pseudo-painterly style? And so on and so forth. The prompts themselves were designed with a friend after lots of trial and error, to try to stress test models across a number of different axes, and I've found them to be a very helpful way to assess the strengths and weaknesses of each. I have older versions of the same sheet with many more older models, btw, which I'm also happy to share if anyone is interested - but these are the ones I feel are most relevant today, filtered down so the sheet doesn't have an overwhelming number of columns to parse at once.

      Unfortunately I don't have Midjourney v5 in here, only v4, as I let my subscription lapse before v5 came out. That said, if anyone here has v5 and would like to help out with updating the grid, let me know!
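
      For the open models, regenerating a grid like this is pretty mechanical - here's a hypothetical sketch with diffusers (prompts, seed, and checkpoint are placeholders; the real sheet has a few dozen prompts per category):

      ```python
      import torch
      from diffusers import StableDiffusionPipeline

      # Hypothetical sketch: run the same prompt list with a fixed seed per row so
      # the cells stay comparable when a different checkpoint is swapped in.
      TEST_PROMPTS = [
          "a watercolor painting of a lighthouse at dawn",       # style application
          "a cat riding a bicycle through a busy market",        # interacting subjects
          "a close-up portrait of an elderly fisherman, 35mm",   # portraits / details
      ]

      pipe = StableDiffusionPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
      ).to("cuda")

      for row, prompt in enumerate(TEST_PROMPTS):
          generator = torch.Generator("cuda").manual_seed(1234 + row)
          image = pipe(prompt, num_inference_steps=30, generator=generator).images[0]
          image.save(f"grid_row{row}.png")
      ```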

      4 votes
      1. [2]
        throwaway58945
        Link Parent
        The "details" tab here is fun :D And many of the others quite striking. The occasional test I did with stable diffusion was as a 4-year-old's crayons stands to the outputs here.

        The "details" tab here is fun :D
        And many of the others are quite striking. Compared to the outputs here, the occasional test I did with Stable Diffusion looked like a 4-year-old's crayon drawings.

        1 vote
        1. asukii
          Link Parent

          Well, to be fair, earlier versions of stable diffusion were 4-year-old crayon scribbles compared to the current state of the art. It's kind of insane how quickly the field is advancing. Here was my original sheet, last updated about 8 months ago, with the same prompts run through a ton of different older models for comparison. (Note that there's no "details" tab yet because there wouldn't have been any point to it - all the models back then were bad enough that they all would have failed at it uniformly.)

          I got started with AI art as a hobby when the far right column in this sheet, BigGANxCLIP, got its first open release back in Jan 2021. And let me tell you, it was SO exciting to spend like half an hour waiting for one image to generate, and to have it look like some weird blob that vaguely had some of the features from your prompt, haha. It was about a year ago that the field's rate of improvement really started to go exponential, and it's been so cool to follow closely.

          1 vote
      2. [2]
        multubunu
        Link Parent

        I have a large grid

        Funny how the human characters are almost all white, except when the prompt included "dark skinned". There also seems to be a preference of "young" versus "old", but less prominent.

        1 vote
        1. asukii
          Link Parent

          Yep, there are definitely biases towards young, conventionally attractive, white women when it comes to portraits in most models, to a greater or lesser extent. That bias is extremely strong in Midjourney especially - heck, if you type "asdhfkdhjxj" for your prompt, you'll probably get a pretty young white woman back. The issue lies in the training data being used - there are just disproportionately many of that type of picture around to base the model's learning off of - making it an actually really difficult problem to address systematically.

          DALL•E ended up rolling out an update at one point that amended underspecified prompts in the background, to make sure the race and gender reflected real world statistics (e.g. if X% of people are white, then just saying "a man" will only get you a white man X% of the time). It ended up getting a decent amount of backlash, though, since it would never take context into account - for example, asking for a sumo wrestler might get you back a portrait of an overweight black woman, or saying "a woman holding up a sign" might have her holding a sign with "ASIAN" written on it (because it would just append "asian" to the end of the prompt).
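
          Roughly, that amendment step looked something like this (a sketch for illustration only - the word lists and percentages here are made up, not OpenAI's actual implementation):

          ```python
          import random

          # Illustrative sketch of "amend underspecified prompts": if the prompt mentions
          # a person but no demographic term, append one sampled according to some
          # population statistics. The terms and weights below are placeholders.
          DEMOGRAPHIC_TERMS = {"white": 0.60, "black": 0.13, "asian": 0.06}
          PERSON_WORDS = {"man", "woman", "person", "wrestler", "doctor"}

          def amend(prompt: str) -> str:
              words = set(prompt.lower().split())
              mentions_person = bool(words & PERSON_WORDS)
              already_specified = bool(words & set(DEMOGRAPHIC_TERMS))
              if mentions_person and not already_specified:
                  term = random.choices(
                      list(DEMOGRAPHIC_TERMS), weights=list(DEMOGRAPHIC_TERMS.values())
                  )[0]
                  # Blindly appending the term is exactly what produces artifacts like
                  # a sign that literally reads "ASIAN" in the generated image.
                  return f"{prompt} {term}"
              return prompt

          print(amend("a woman holding up a sign"))
          ```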

          I know Emad, CEO of Stability AI (the company behind Stable Diffusion), at one point went on record saying he thought this type of issue was best resolved by making it really easy for people to train their own custom model forks -- like, if you want to generate photos of a specific type of non-western architecture, just feed it a few photos of what you like and continue on your merry way. And that certainly has become much quicker and more accessible lately with the advent of DreamBooth -- you can make a custom model now for free with a half dozen images and maybe an hour or so (and no good GPU needed either, thanks to Google Colab's free servers). But that always felt a bit like a cop-out answer to me. Curating a more balanced dataset from the beginning would be ideal, but that does seem like a pretty mammoth challenge... I'm not sure what the best approach is, to be honest. But these days, unless you're using either DALL•E or Bing Image Creator (which is derived from DALL•E), it's usually best to over-specify your prompts where you can to avoid that kind of sameyness.

          3 votes
      3. [2]
        the_eon
        Link Parent

        I could run these through Midjourney 5.2 if you want.

        1 vote
        1. asukii
          Link Parent

          That would be great! If you want to DM me your email address, I can give you edit access to the sheet - or you can upload the images wherever you'd like and I can slot them in myself, whatever you'd prefer.

    3. guppy
      Link Parent

      It can't do everything at the moment, but it's great at generating starting points. With a dozen good starter images and some photoshop I can make pretty any composition I want.

      1 vote
    4. skybrian
      Link Parent

      Here's another test that seems pretty tough: "A woman pointing a flashlight." She's either holding it like a torch or it looks like a gun.

  2. [5]
    asukii
    (edited)
    Link

    Oh nice, outpainting support (with a friendlier, more accessible name - "zoom out"). This has been a thing for quite some time now in other models, so I think the article overhypes it a little, but it's still a nice addition.

    I'm curious to watch how well Midjourney's paid subscription model holds up over time, given how rapidly the AI art field is advancing. Bing Image Creator is comparable in quality, imo, and is free to use. Stable Diffusion (and all its thousands of custom fine-tuned model variants/spinoffs) usually takes a little more effort massaging your prompt language to get the most out of it, but when it works, it works very well, and it's not only free but open source. It'll be interesting to see if and how Midjourney's paid plans end up competing as more of these alternatives gain traction. Midjourney has a lot of "name recognition" and a lot of inertia right now in the AI art world, but that alone isn't enough to build a lasting business model on.

    8 votes
    1. [2]
      Grzmot
      Link Parent

      That link didn't work for me, did you mean https://www.bing.com/images/create ?

      2 votes
      1. asukii
        Link Parent

        I did, thank you! Not sure why the markdown link pointed to a relative tildes url in mine...

        Edit: never mind, figured it out - was because I didn't include https:// at the front in my link. Good thing to remember for next time. Fixed now.

        2 votes
    2. [2]
      skybrian
      Link Parent
      Looks like "Bing Image Creator" is "powered by DALL-E." It looks pretty similar.

      Looks like "Bing Image Creator" is "powered by DALL-E." It looks pretty similar.

      1 vote
      1. asukii
        Link Parent

        They for sure have a similar baseline, but I can also confidently say they aren't the same model. If nothing else, Bing is much better than DALL-E at stylizing its results: D2 is much more skewed towards always replicating that classic "stock photo"-esque style, whereas Bing is much better at incorporating the styles of different artists and art movements you might prompt it with. A little further down this thread, I left another comment with a pretty thorough set of images I use to compare different models: imo, you can see the biggest differences between Bing and DALL-E in the style application, interiors, and portrait categories.

        4 votes
  3. [2]
    pum
    Link

    I don't really understand why the title is written in such an excited tone. As far as I can tell, this is the same as outpainting that was introduced a year ago in DALL-E 2, which even the article itself references:

    Similar to outpainting—an AI imagery technique introduced by OpenAI's DALL-E 2 in August 2022—Midjourney's zoom-out feature can take an existing AI-generated image and expand its borders while keeping its original subject centered in the new image. But unlike DALL-E and Photoshop's Generative Fill feature, you can't select a custom image to expand. At the moment, v5.2's zoom-out only works on images generated within Midjourney, a subscription AI image-generator service.

    The level of detail and realism is certainly impressive, but this sort of functionality is nothing new.
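
    For reference, the same effect can be approximated with Stable Diffusion's inpainting pipeline: paste the original into the middle of a larger canvas, mask the new border, and let the model fill it in. A rough sketch (the checkpoint name and file paths are placeholders, not Midjourney's actual implementation):

    ```python
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # DIY "zoom out": centre the original on a larger canvas and inpaint the border.
    original = Image.open("generation.png").convert("RGB")
    w, h = original.size
    canvas = Image.new("RGB", (w * 2, h * 2), "gray")
    canvas.paste(original, (w // 2, h // 2))

    mask = Image.new("L", canvas.size, 255)                          # white = regenerate
    mask.paste(Image.new("L", original.size, 0), (w // 2, h // 2))   # black = keep as-is

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    zoomed = pipe(
        prompt="wide shot of the same scene, matching style and lighting",
        image=canvas.resize((512, 512)),
        mask_image=mask.resize((512, 512)),
    ).images[0]
    zoomed.save("zoomed_out.png")
    ```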

    5 votes
    1. Deimos
      Link Parent

      Agreed, I edited the title to de-sensationalize it.

      7 votes