13 votes

AI, Stable Diffusion, Models and Prompts

Howdy Tildes wizards.

I decided to have a looksy at Stable Diffusion on my local computer (Manjaro, AMD 7500x CPU, 32GB) using Easy Diffusion. I've gotten my head around the basics, grabbed a MidJourney V4-style model, and now I'm learning how to prompt.

So far I've generated some cool cyberpunk cyborg things, landscapes, etc. One of the things I wanted to use Stable Diffusion for is generating silhouettes. Sounds weird, I know, but they're great to use with decal and vinyl printing for my wife's business.

Any ideas on ways to do silhouette generation?

Next: what's good to read to learn about model types and what all of the settings really do?

I'm ordering a GPU (3060) to improve the horrendous render times, so don't worry about the underpowered rig; I'm still in toy mode.

6 comments

  1. cold_porridge

    For sure check out civitai for rendering models to play around with. Most of them are designed to produce different styles of women to be frank, but a lot aren't and are very good.

    Read up a bit on unfamiliar parameters; in particular, you'll want to know about noise schedulers and guidance scale.

    Toy around with inference steps (roughly correlated with image quality, at the cost of speed).

    When you're ready, start playing with inpainting for finer control.
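
    If you ever want to script any of this, here's roughly where those knobs live in Hugging Face's diffusers library (a minimal sketch; the checkpoint name and values are just examples, not recommendations):

    import torch
    from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
    
    # Load any Stable Diffusion 1.5-compatible checkpoint
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")
    
    # Swap the noise scheduler (the "sampler" in most UIs); each one trades
    # speed against quality a little differently
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    
    image = pipe(
        "cyberpunk cyborg, detailed, dramatic lighting",
        guidance_scale=7.5,      # how strongly the image is pulled toward the prompt
        num_inference_steps=30,  # more steps = more refinement, slower render
    ).images[0]
    image.save("out.png")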

    And finally, regarding silhouettes - unless you find a particular model trained for it, I suspect this is best done with a little manual labor atop Stable Diffusion. I know that's not the most fun answer, but I believe it's probably the best one. Use Photoshop or equivalent to break an image down into its component shapes, then turn those shapes black. If you're lucky, though, you might be able to specify "in the style of a silhouette" and get solid results. If you try, let me know how it goes.
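
    If you'd rather script that manual step, a plain luminance threshold with OpenCV gets you most of the way on renders with a clean, bright background (a quick sketch; the filename and cutoff are placeholders to tune):

    import cv2
    
    # Load the render as grayscale
    image = cv2.imread("render.png", cv2.IMREAD_GRAYSCALE)
    
    # Pixels brighter than the cutoff become white (the background),
    # everything darker becomes black (the silhouette); tune per image
    _, silhouette = cv2.threshold(image, 200, 255, cv2.THRESH_BINARY)
    cv2.imwrite("silhouette.png", silhouette)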

    Most importantly, have fun with image generation you can run on your own computer. Hope that helps!

    8 votes
  2. kru

    You piqued my curiosity and I did a few quick prompt tests. I got some pretty good silhouette generations with the prompt:

    "silhouette, flat 2d style, blank white background,
    <content prompt>"

    where <content prompt> is replaced by what type of shape I'm hoping to see. Some prompts I tried were:

    "silhouette, flat 2d style, blank white background,
    action shot, warrior, sword"
    and I got some nice silhouettes of warriors holding swords. The dark sushi mix model also tended to give some nice "double exposure" shots along with pure silhouettes.

    "silhouette, flat 2d style, blank white background,
    basketball, dribbling, hoop, sneakers"
    which gave images that could be put next to a Nike logo.

    I tried 3 different models - an anime, a cartoon, and a photoreal model - and each provided some good results:
    dark sushi mix,
    cartunafied,
    aZoyvaPhotoreal v2
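
    Since the template is fixed, it's also trivial to script the substitution and batch-try subjects (a throwaway sketch):

    # Fill the fixed template with different content prompts
    TEMPLATE = "silhouette, flat 2d style, blank white background, {content}"
    
    for content in ["action shot, warrior, sword", "basketball, dribbling, hoop, sneakers"]:
        print(TEMPLATE.format(content=content))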

    4 votes
  3. DataWraith

    I haven't tried generating silhouettes with Stable Diffusion directly, but I've recently played with Facebook's Segment Anything Model, which might fit the bill.

    If you're familiar with Python, or willing to work through the hassle of installing the required libraries (PyTorch, etc.), this could be an alternative for generating silhouettes.

    Segment Anything works fairly well for separating a given subject or object from the background in arbitrary images. You pass in an image and the coordinates of a single pixel (ideally somewhere near the center of the thing you're trying to silhouette), and the neural network gives you back a mask of the object at that position. Turning that into a silhouette is fairly simple once you've installed the required libraries:

    Python code for silhouetting
    import cv2
    import numpy as np
    import torch
    from segment_anything import SamPredictor, sam_model_registry
    
    # Read the input image
    image = cv2.imread("dragon.jpg")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    
    # Select a pixel coordinate from inside the object you want to segment
    input_point = np.array([[512, 512]])
    input_label = np.array([1])
    
    # Create the model (checkpoint must have been downloaded into the current folder)
    sam = sam_model_registry["default"](checkpoint="sam_vit_h_4b8939.pth")
    
    # Move it to the GPU if possible
    device = "cuda" if torch.cuda.is_available() else "cpu"
    sam.to(device)
    
    # Feed the image into the model
    predictor = SamPredictor(sam)
    predictor.set_image(image)
    
    # Predict masks
    masks, _, _ = predictor.predict(
        point_coords=input_point, point_labels=input_label, multimask_output=True
    )
    
    # Convert masks to PNGs and save them as "mask_{1,2,3}.png"
    for i, mask in enumerate(masks):
        height, width = mask.shape[-2:]
    
        color = np.array([0, 0, 0, 255]) # Black
    
        # Convert mask into pixels
        mask_image = np.clip(
            mask.reshape(height, width, 1) * color.reshape(1, 1, -1), a_min=0, a_max=255
        ).astype(np.uint8)
    
        # Fill in the transparent parts (everything outside the mask) with white;
        # boolean indexing on the alpha channel does this in one step
        mask_image[mask_image[:, :, 3] == 0] = [255, 255, 255, 255]  # White
    
        # Save the image
        cv2.imwrite(f"mask_{i+1}.png", mask_image)
    

    For a 1024x1024 image, this takes about a minute to process without a GPU.

    I've applied it to an image I've generated with Stable Diffusion XL as an example: https://imgur.com/a/IrZdkSE.

    3 votes
  4. [3]
    Greg

    > I'm ordering a GPU (3060) to improve the horrendous render times, so don't worry about the underpowered rig; I'm still in toy mode.

    Worth mentioning that Google Cloud will give you $300 of free trial credits for signing up - at spot prices that’s about 900 hours on a g2-standard-8, which has an L4 GPU - roughly equivalent to a 3090 for this work. You can spin up an instance just for the time you need on a given day, so those hours can go a fairly long way.

    You’ll definitely need to have (or learn) a bit of familiarity with SSH to set things up, but I figure given you’re a Linux user looking to self-host ML models that might not be too much of a stretch?

    2 votes
    1. [2]
      DataWraith

      There's also the option of using volunteer GPUs from the AI Horde, a compute grid where people donate GPU time to others who want to run Stable Diffusion or LLMs. It's slightly gamified, so people with more Kudos have priority for generation, but, except on weekends, it will usually return an image within a few minutes.
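
      The REST API is simple enough to script, too. Here's a rough sketch of the async flow with the anonymous API key (endpoint and field names are from my reading of the docs at https://stablehorde.net - double-check the current schema):

      import time
      import requests
      
      API = "https://stablehorde.net/api/v2"
      HEADERS = {"apikey": "0000000000"}  # anonymous key; lowest queue priority
      
      # Submit an async generation request
      job = requests.post(
          f"{API}/generate/async",
          headers=HEADERS,
          json={"prompt": "silhouette, flat 2d style, blank white background, warrior"},
      ).json()
      
      # Poll until a volunteer GPU has picked up and finished the job
      while not requests.get(f"{API}/generate/check/{job['id']}").json()["done"]:
          time.sleep(5)
      
      # Fetch the finished generation(s)
      result = requests.get(f"{API}/generate/status/{job['id']}").json()
      print(result["generations"][0]["img"])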

      3 votes
      1. Greg

        Oh that's cool, it's one I hadn't come across before - always makes me smile to see people pitching in like that!

        1 vote