30 votes

Midjourney version 5.2 adds support for "zoom out" feature

20 comments

  1. [13]
    skybrian
    Link

    Whenever I try out a new version of an image generator, I check if it can draw a plausible accordion and whether it gets the black keys of a piano right. Nope and nope. Hasn't happened yet.

    But another challenge I found today is to try to generate pictures of people carrying things. Can it do a woman carrying a garden rake? No, that's not a garden rake, it's a broom. How about waving a flag? The flag is there but the hand that's supposed to be holding it is all wrong.

    So it's not just accordions. It can't really handle anyone carrying anything too unusual. A flower pot is sort of okay. Digging with a shovel worked two times out of four.

    The people look great, but apparently they are much better at posing for pictures than shlepping random stuff.

    12 votes
    1. [3]
      pum
      Link Parent

      I find that getting cohesive interactions between subjects in general is very challenging with generative art (I've only tried Stable Diffusion, for context). Every time I tried to create a composition with two people or position a person exactly the way I want, it either took hours of fiddling with weights and seeds or I just gave up. It's happy to give me the "ingredients" I requested, but not combine them in a meaningful way.

      There are ongoing improvements to this area, such as the developments with ControlNet, so I hope it will become less painful relatively soon.
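
      For anyone curious, here's a minimal sketch of what ControlNet conditioning looks like with the diffusers library (the checkpoint names are just the commonly used public ones, and the pose image is assumed to already exist). The pose skeleton pins down the composition, so the prompt only has to describe what the subjects are, not where they are:

      ```python
      import torch
      from PIL import Image
      from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

      # Load an OpenPose-conditioned ControlNet alongside a base SD 1.5 checkpoint.
      controlnet = ControlNetModel.from_pretrained(
          "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
      )
      pipe = StableDiffusionControlNetPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
      ).to("cuda")

      # The pose skeleton fixes where each figure stands and how they interact,
      # so the text prompt only has to say what they are.
      pose_image = Image.open("two_figures_pose.png")  # assumed to exist
      result = pipe(
          "two people shaking hands in a park, detailed digital painting",
          image=pose_image,
          num_inference_steps=30,
      ).images[0]
      result.save("handshake.png")
      ```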

      8 votes
      1. [2]
        Dr_Amazing
        Link Parent

        It's definitely one of the most challenging parts and basically impossible without control nets. Took me forever just to get this picture of a knight fighting a skeleton

        I also recommend checking out an extension called "Latent Couple" if you haven't yet. It lets you divide up the picture into sections and describe each one separately. It's not super great with interacting subjects unless they're separated into different parts of the screen, but it works well for just making two different things in the same picture.

        I was trying to illustrate a scene from a D&D game where a player almost had a dragon fly off with their horse. Most attempts at describing it gave me weird horse/dragon hybrids. Latent Couple made it a bit better, but still kind of weird. I think in the end I just went into Photoshop, dropped in a super rough picture of a dragon above a horse, and used a ControlNet to match it. I ended up with this, which I think came out pretty well.
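
        For what it's worth, here's a rough conceptual sketch (not the extension's actual code) of what Latent Couple-style region prompting boils down to: each region gets its own noise prediction from its own sub-prompt, and the predictions are blended with region masks before the sampler takes its step.

        ```python
        import numpy as np

        # Toy illustration of region-masked prompt blending. In the real extension the
        # two "predictions" come from the UNet run with different text conditioning;
        # here they are just random arrays with the right shape (4 latent channels, H/8 x W/8).
        h, w = 64, 64
        pred_dragon = np.random.randn(4, h, w)   # stand-in for the "dragon in the sky" prediction
        pred_horse = np.random.randn(4, h, w)    # stand-in for the "horse on the ground" prediction

        mask_top = np.zeros((1, h, w))
        mask_top[:, : h // 2, :] = 1.0           # upper half of the frame -> dragon prompt
        mask_bottom = 1.0 - mask_top             # lower half -> horse prompt

        weight_sum = np.clip(mask_top + mask_bottom, 1e-6, None)
        blended = (pred_dragon * mask_top + pred_horse * mask_bottom) / weight_sum
        print(blended.shape)  # (4, 64, 64): the merged prediction the sampler would step with
        ```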

        2 votes
        1. pum
          Link Parent

          I've played around with Latent Couple before. As you say, it's not super great for interacting subjects, but I've found it pretty useful for homing the prompt in on a certain area of the image, like getting a specific hand shape. Hires fix messes it all back up again anyway, but what can you do...

    2. [7]
      asukii
      Link Parent

      I have a large grid of a few dozen test prompts that I use to evaluate different models across a variety of different criteria. How good is it at handling little details like that? What about two different things interacting in specific ways? What about stylizing the result, instead of having everything look like the same generic pseudo-painterly style? And so on and so forth. The prompts themselves were designed with a friend after lots of trial and error, to try to stress test models across a number of different axes, and I've found them to be a very helpful way to assess the strengths and weaknesses of each. I have older versions of the same sheet with many more older models, btw, which I'm also happy to share if anyone is interested - but these are the ones I feel are most relevant today, filtered down so the sheet doesn't have an overwhelming number of columns to parse at once.

      Unfortunately I don't have Midjourney v5 in here, only v4, as I let my subscription lapse before v5 came out. That said, if anyone here has v5 and would like to help out with updating the grid, let me know!
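
      For the open models, regenerating a grid like this is pretty mechanical - here's a hypothetical sketch with diffusers (prompts, seed, and checkpoint are placeholders; the real sheet has a few dozen prompts per category):

      ```python
      import torch
      from diffusers import StableDiffusionPipeline

      # Hypothetical sketch: run the same prompt list with a fixed seed per row so
      # the cells stay comparable when a different checkpoint is swapped in.
      TEST_PROMPTS = [
          "a watercolor painting of a lighthouse at dawn",       # style application
          "a cat riding a bicycle through a busy market",        # interacting subjects
          "a close-up portrait of an elderly fisherman, 35mm",   # portraits / details
      ]

      pipe = StableDiffusionPipeline.from_pretrained(
          "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
      ).to("cuda")

      for row, prompt in enumerate(TEST_PROMPTS):
          generator = torch.Generator("cuda").manual_seed(1234 + row)
          image = pipe(prompt, num_inference_steps=30, generator=generator).images[0]
          image.save(f"grid_row{row}.png")
      ```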

      4 votes
      1. [2]
        throwaway58945
        Link Parent
        The "details" tab here is fun :D And many of the others quite striking. The occasional test I did with stable diffusion was as a 4-year-old's crayons stands to the outputs here.

        The "details" tab here is fun :D
        And many of the others are quite striking. Compared to the outputs here, the occasional test I did with Stable Diffusion looked like a 4-year-old's crayon drawings.

        1 vote
        1. asukii
          Link Parent

          Well, to be fair, earlier versions of stable diffusion were 4-year-old crayon scribbles compared to the current state of the art. It's kind of insane how quickly the field is advancing. Here was my original sheet, last updated about 8 months ago, with the same prompts run through a ton of different older models for comparison. (Note that there's no "details" tab yet because there wouldn't have been any point to it - all the models back then were bad enough that they all would have failed at it uniformly.)

          I got started with AI art as a hobby when the far right column in this sheet, BigGANxCLIP, got its first open release back in Jan 2021. And let me tell you, it was SO exciting to spend like half an hour waiting for one image to generate, and to have it look like some weird blob that vaguely had some of the features from your prompt, haha. It was about a year ago that the field's rate of improvement really started to go exponential, and it's been so cool to follow closely.

          1 vote
      2. [2]
        multubunu
        Link Parent

        I have a large grid

        Funny how the human characters are almost all white, except when the prompt included "dark skinned". There also seems to be a preference of "young" versus "old", but less prominent.

        1 vote
        1. asukii
          Link Parent

          Yep, there are definitely biases towards young, conventionally attractive, white women when it comes to portraits in most models, to a greater or lesser extent. That bias is extremely strong in Midjourney especially - heck, if you type "asdhfkdhjxj" for your prompt, you'll probably get a pretty young white woman back. The issue lies in the training data being used - there are just disproportionately many of that type of picture around to base the model's learning off of - making it an actually really difficult problem to address systematically.

          DALL•E ended up rolling out an update at one point that amended underspecified prompts in the background, to make sure the race and gender reflected real world statistics (e.g. if X% of people are white, then just saying "a man" will only get you a white man X% of the time). It ended up getting a decent amount of backlash, though, since it would never take context into account - for example, asking for a sumo wrestler might get you back a portrait of an overweight black woman, or saying "a woman holding up a sign" might have her holding a sign with "ASIAN" written on it (because it would just append "asian" to the end of the prompt).
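
          Roughly, that amendment step looked something like this (a sketch for illustration only - the word lists and percentages here are made up, not OpenAI's actual implementation):

          ```python
          import random

          # Illustrative sketch of "amend underspecified prompts": if the prompt mentions
          # a person but no demographic term, append one sampled according to some
          # population statistics. The terms and weights below are placeholders.
          DEMOGRAPHIC_TERMS = {"white": 0.60, "black": 0.13, "asian": 0.06}
          PERSON_WORDS = {"man", "woman", "person", "wrestler", "doctor"}

          def amend(prompt: str) -> str:
              words = set(prompt.lower().split())
              mentions_person = bool(words & PERSON_WORDS)
              already_specified = bool(words & set(DEMOGRAPHIC_TERMS))
              if mentions_person and not already_specified:
                  term = random.choices(
                      list(DEMOGRAPHIC_TERMS), weights=list(DEMOGRAPHIC_TERMS.values())
                  )[0]
                  # Blindly appending the term is exactly what produces artifacts like
                  # a sign that literally reads "ASIAN" in the generated image.
                  return f"{prompt} {term}"
              return prompt

          print(amend("a woman holding up a sign"))
          ```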

          I know Emad, CEO of Stability AI (the company behind Stable Diffusion), at one point went on record saying he thought this type of issue was best resolved by making it really easy for people to train their own custom model forks -- like, if you want to generate photos of a specific type of non-western architecture, just feed it a few photos of what you like and continue on your merry way. And that certainly has become much quicker and more accessible lately with the advent of DreamBooth -- you can make a custom model now for free with a half dozen images and maybe an hour or so (and no good GPU needed either, thanks to Google Colab's free servers). But that always felt a bit like a cop-out answer to me. Curating a more balanced dataset from the beginning would be ideal, but that does seem like a pretty mammoth challenge... I'm not sure what the best approach is, to be honest. But these days, unless you're using either DALL•E or Bing Image Creator (which is derived from DALL•E), it's usually best to over-specify your prompts where you can to avoid that kind of sameyness.

          3 votes
      3. [2]
        the_eon
        Link Parent

        I could run these through Midjourney 5.2 if you want.

        1 vote
        1. asukii
          Link Parent

          That would be great! If you want to DM me your email address, I can give you edit access to the sheet - or you can upload the images wherever you'd like and I can slot them in myself, whatever you'd prefer.

    3. guppy
      Link Parent

      It can't do everything at the moment, but it's great at generating starting points. With a dozen good starter images and some photoshop I can make pretty any composition I want.

      1 vote
    4. skybrian
      Link Parent

      Here's another test that seems pretty tough: "A woman pointing a flashlight." She's either holding it like a torch or it looks like a gun.

  2. [5]
    asukii
    (edited)
    Link

    Oh nice, outpainting support (with a friendlier, more accessible name - "zoom out"). This has been a thing for quite some time now in other models, so I think the article overhypes it a little, but it's still a nice addition.

    I'm curious to watch how well Midjourney's paid subscription model holds up over time, given how rapidly the AI art field is advancing. Bing Image Creator is comparable in quality, imo, and is free to use. Stable Diffusion (and all its thousands of custom fine-tuned model variants/spinoffs) usually takes a little more effort massaging your prompt language to get the most out of it, but when it works, it works very well, and it's not only free but open source. It'll be interesting to see if and how Midjourney's paid plans end up competing as more of these alternatives gain traction. Midjourney has a lot of "name recognition" and a lot of inertia right now in the AI art world, but that alone isn't enough to build a lasting business model on.

    8 votes
    1. [2]
      Grzmot
      Link Parent

      That link didn't work for me, did you mean https://www.bing.com/images/create ?

      2 votes
      1. asukii
        Link Parent

        I did, thank you! Not sure why the markdown link pointed to a relative tildes url in mine...

        Edit: never mind, figured it out - was because I didn't include https:// at the front in my link. Good thing to remember for next time. Fixed now.

        2 votes
    2. [2]
      skybrian
      Link Parent
      Looks like "Bing Image Creator" is "powered by DALL-E." It looks pretty similar.

      Looks like "Bing Image Creator" is "powered by DALL-E." It looks pretty similar.

      1 vote
      1. asukii
        Link Parent

        They for sure have a similar baseline, but I can also confidently say they aren't the same model. If nothing else, Bing is much better than DALL-E at stylizing its results: D2 is much more skewed towards always replicating that classic "stock photo"-esque style, whereas Bing is much better at incorporating the styles of different artists and art movements you might prompt it with. A little further down this thread, I left another comment with a pretty thorough set of images I use to compare different models: imo, you can see the biggest differences between Bing and DALL-E in the style application, interiors, and portrait categories.

        4 votes
  3. [2]
    pum
    Link

    I don't really understand why the title is written in such an excited tone. As far as I can tell, this is the same as outpainting that was introduced a year ago in DALL-E 2, which even the article itself references:

    Similar to outpainting—an AI imagery technique introduced by OpenAI's DALL-E 2 in August 2022—Midjourney's zoom-out feature can take an existing AI-generated image and expand its borders while keeping its original subject centered in the new image. But unlike DALL-E and Photoshop's Generative Fill feature, you can't select a custom image to expand. At the moment, v5.2's zoom-out only works on images generated within Midjourney, a subscription AI image-generator service.

    The level of detail and realism is certainly impressive, but this sort of functionality is nothing new.
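
    For reference, the same effect can be approximated with Stable Diffusion's inpainting pipeline: paste the original into the middle of a larger canvas, mask the new border, and let the model fill it in. A rough sketch (the checkpoint name and file paths are placeholders, not Midjourney's actual implementation):

    ```python
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # DIY "zoom out": centre the original on a larger canvas and inpaint the border.
    original = Image.open("generation.png").convert("RGB")
    w, h = original.size
    canvas = Image.new("RGB", (w * 2, h * 2), "gray")
    canvas.paste(original, (w // 2, h // 2))

    mask = Image.new("L", canvas.size, 255)                          # white = regenerate
    mask.paste(Image.new("L", original.size, 0), (w // 2, h // 2))   # black = keep as-is

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    zoomed = pipe(
        prompt="wide shot of the same scene, matching style and lighting",
        image=canvas.resize((512, 512)),
        mask_image=mask.resize((512, 512)),
    ).images[0]
    zoomed.save("zoomed_out.png")
    ```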

    5 votes
    1. Deimos
      Link Parent

      Agreed, I edited the title to de-sensationalize it.

      7 votes