13 votes

Cheap options(?) to run local AI models

I have been having fun learning about generative AI. All in the cloud -- I got some models on Hugging Face to work, tried out Colab Pro, and found another cloud provider that runs SD models (dreamlook.ai, if anyone is interested).

It's got me curious about trying to run something locally (mostly Stable Diffusion/DreamBooth, possibly Ollama).
I currently have a ThinkPad T490 with 16 GB of RAM and the base-level graphics card. I haven't actually tried to run anything locally, on the assumption that it would be extremely slow. I saw that you can get an external GPU, though I also saw some reports of headaches trying to get external GPUs up and running.

I am curious what a workstation might cost that could do a reasonable job running local models. I am not a huge gamer and don't have any other high-performance needs that aren't currently served by the ThinkPad; I'm not sure I can justify a $3000 workstation just to make a few JPEGs.

I would be happy to buy something secondhand, like if there was a good source of off-lease workstations.

Alternatively, if you have a similar computer to the T490 and do run models locally, what sort of performance is reasonable to expect? Would it be enough to buy some more RAM for this laptop?

Thanks for any advice!

16 comments

  1. [8]
    adutchman
    Link
    I don't have specific advice, but for AI, pretty much only GPU power counts. That rules out laptops and external graphics cards. Laptops are worse because they are optimized for power consumption, not performance, and external GPUs have a fair number of bottlenecks (last time I checked, which admittedly was a while ago). So if you truly want to invest in an "AI station" to train AI on, invest in a (second-hand) desktop. Also keep in mind that you need an Nvidia (CUDA-compatible) GPU. That is sadly not an exaggeration: barely anything works on AMD. That said, I would look into shelling out some money to rent an AI server of some sort; that is a lot cheaper and still allows you to tinker with AI. I am studying AI right now, so I am lucky that the school provides resources and that I have a gaming PC as a backup.

    Also: keep in mind that training AI is orders of magnitude more compute-intensive than running it. You can get quite far running AI on lower-spec hardware.

    11 votes
    1. [2]
      Minty
      Link Parent
      GPU power

      VRAM to be able to run a thing at all, power to be able to run it fast.

      9 votes
      1. adutchman
        Link Parent
        Yes indeed, great addition

        2 votes
    2. [3]
      teaearlgraycold
      Link Parent
      In addition to CUDA there's quite a lot of development effort around Apple Silicon. Granted, you'll be spending quite a lot for a powerful enough Mac.

      2 votes
      1. [2]
        balooga
        Link Parent
        I didn't want to post this because a beefy Mac isn't cheap, but it's the route I took and I'm happy with it. It's a great general-purpose machine that also runs Stable Diffusion pretty well. The unified memory architecture is the most interesting thing about it, I think. I've got a 32 GB model, and most of that can be allocated as VRAM on the fly. My only regret is that I didn't spring for more; 64 GB would be amazing. Even so, I don't have any trouble generating large SDXL output.

        4 votes
        1. teaearlgraycold
          Link Parent
          Yeah I use an M2 Max MBP as a workstation. It's nice to have the ability to run everything locally.

          Just a random tidbit in case someone does want to spend way too much money: for most configurations, the M2 Pro/Max series has more memory bandwidth than the M3s. You only get a comparable amount of memory bandwidth with an M3 Max if you load it up with 64 or 96 GB.

          2 votes
    3. [2]
      streblo
      Link Parent
      Also keep in mind that you need an Nvidia (CUDA-compatible) GPU.

      I don't think this is true anymore, especially at the level of a hobbyist.

      I found setting up Stable Diffusion with the web GUI on Arch Linux with an AMD 7800 XT to be mostly straightforward. I just needed the ROCm libraries, and I had to export a few variables while building the AUR package for python-torchvision-rocm:

      set -gx HIP_ROOT_DIR "/opt/rocm"; set -gx PYTORCH_ROCM_ARCH "gfx1101"; makepkg -si

      Apart from that, everything else just worked out of the box, and the performance seems pretty good, although I don't have a great frame of reference for that.
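
      If it helps, here's a quick sanity check that the ROCm build of PyTorch is actually the one being picked up (just a sketch; it assumes the ROCm builds of PyTorch/torchvision are the ones installed):

      import torch

      # ROCm builds expose torch.version.hip and reuse the CUDA-style device API for AMD GPUs
      print(torch.version.hip)              # a HIP version string on ROCm builds, None on CUDA builds
      print(torch.cuda.is_available())      # True if the 7800 XT is visible to PyTorch
      print(torch.cuda.get_device_name(0))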

      2 votes
      1. adutchman
        Link Parent
        Fair enough. PyTorch is one of the frameworks with AMD support, but it's the only one I have seen so far. Granted, it is probably the most used NN framework, so that might be all that matters for tinkering with NNs/AI.

        1 vote
  2. [5]
    Protected
    Link
    Well, just now, using Fooocus, realistic preset, all defaults, SDXL Film Photography beta 0.4 LoRA at 0.25 weight, and the prompt "middle aged man. business casual. man sitting at desk. computer on desk. typing on keyboard. office backdrop. high angle." (results not great, but that's another story!), it took my (desktop) RTX 3070 Ti 25 seconds to generate one sample and 21 seconds to generate another.

    While running, Windows reports that Stable Diffusion is using about 6 GB of VRAM and 9 GB of RAM, but I don't know how accurate that is.

    The hardware you need would depend on how this compares to your expectations. I think depending on the resolution, prompts, and other parameters, this can be faster, but it can also be quite a bit slower.

    EDIT: They do mention the minimum memory for each card on the GitHub; 4 GB/8 GB in my case.

    7 votes
    1. [3]
      graphmeme
      Link Parent
      Thanks for the fast reply! 30s to 1 min definitely meets/exceeds my expectations here.

      I think another way to frame my question is: Can you get reasonable performance (say, less than 5 min per prompt) for less than $500? Or should I reset my expectations here?

      5 votes
      1. Greg
        Link Parent
        Specifically for running locally, given you already have a laptop, the best value would probably be ~$300 on a 3060 12GB and another ~$200 on a Thunderbolt dock and PSU for it. The interface bandwidth bottleneck won't be too significant for a card like that, or for the kind of work you can do with it: what you're looking to do is basically moving the model into VRAM once, running for a minute or so, and then moving the result back off again afterwards.
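
        To illustrate that "load once, generate repeatedly" pattern, here's a rough sketch with the diffusers library (the model ID and settings are just placeholders, not a recommendation):

        import torch
        from diffusers import StableDiffusionPipeline

        # One-time cost: the weights cross the Thunderbolt link into the card's VRAM here
        pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",  # placeholder model ID
            torch_dtype=torch.float16,
        ).to("cuda")

        # After that, each generation runs entirely on the GPU; only the finished
        # image comes back over the (slower) external link
        image = pipe("a thinkpad on a desk, film photo", num_inference_steps=30).images[0]
        image.save("out.png")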

        The best-value use of $500 with no other constraints would be spot instances in the cloud. You can get L4 machines (24GB VRAM, somewhere around 3090 performance) for $0.20/hour, and only spin them up for as long as you need. It also means you can jump up to the big A100 or H100 hardware to play with for a bit if you do want to try your hand at fine-tuning or LoRA work.

        2 votes
      2. ZarK
        Link Parent
        The problem is that the GPU's VRAM is the limiting factor on how large a model you can run. The cheapest GPU currently available with enough VRAM for all models is the RTX 4060 Ti, with 16 GB. So regardless of performance, you'll be limited in what results you can achieve if you don't have enough VRAM.

        Now, if you're only after image generation and Stable Diffusion specifically, you'll probably get away with 8 GB of VRAM for now (and a very simple way to run it locally is through https://easydiffusion.github.io/).

        1 vote
    2. teaearlgraycold
      Link Parent
      Fooocus is really great. We use Stable Diffusion at work, so I'm familiar with its baseline performance. The results from Fooocus are amazing out of the box, on par with DALL-E 3 IMO. Just using SDXL without any prompt engineering or LoRAs feels like you're 2 years behind the state of the art.

      2 votes
  3. pbmonster
    Link
    I currently have a ThinkPad T490 with 16 GB of RAM and the base-level graphics card.

    That's enough to start playing with the smaller (7b and 13b) LLMs. I'm running the 4-bit quants of some Mistral fine-tunes on the CPU (no GPU at all). 16 GB of RAM is enough to load a 13b model, but on my CPU that only runs at 2-3 tokens per second, which is a bit frustrating to chat with.

    A 7b model runs at 5+ tokens per second, and that's OK for messing around. I use SillyTavern to manage the prompts and context and have it connected to Kobold.cpp for inference.
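
    For anyone who wants a minimal starting point before wiring up SillyTavern, here's roughly what that CPU-only setup looks like with the llama-cpp-python bindings (same llama.cpp family as Kobold.cpp; the GGUF filename and thread count are placeholders):

    from llama_cpp import Llama

    # Load a 4-bit quantized 7b model entirely into system RAM (no GPU required)
    llm = Llama(
        model_path="mistral-7b-instruct.Q4_K_M.gguf",  # placeholder filename
        n_ctx=4096,    # context window
        n_threads=8,   # roughly match your physical core count
    )

    out = llm("Q: What's a good laptop for running local AI models?\nA:", max_tokens=128)
    print(out["choices"][0]["text"])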

    If you want to quickly fine-tune a model yourself, or run large models very quickly, I can recommend runpod.io. You can rent amazing hardware (A100s, H100s if you catch them when they are available, or several 4070s) for an entire afternoon for under $10. I personally would do that for a long time before ever putting down the money for real GPU power, especially if you never plan to do any performance gaming.

    5 votes
  4. AntsInside
    Link
    I have also been playing with Fooocus and finding it usable, if rather slow, with a PC I put together last year for about $1300 with a Radeon RX 6600 (8 GB of VRAM). I get some system instability and run out of VRAM if I try some operations like upscaling. It feels like any less VRAM would be too little, but maybe it would work better with an Nvidia card.

    2 votes
  5. purpuraRana
    Link
    I've played around a bit with Stable Diffusion through Automatic1111 with a Ryzen 7600X, an Nvidia 3070, 32 GB of RAM, and a 2 TB SSD. Adding in the mobo and PC case/cooler, I think it was around $1.3k.
    Since this build was intended to be a general college-student workstation for CAD and light gaming, it's not the best you can get for AI image generation at this price. You can get away with a cheaper, earlier-gen CPU, which also saves cost on the RAM (the 7600X needs DDR5 RAM, which is 1.5-2x more expensive than DDR4). As far as I can tell, the CPU doesn't do much; it's all GPU and RAM. The 8 GB of VRAM in the 3070 is also a limiting factor, so you'd probably be better off getting a 12 GB 3060 instead. It's a bit slower (fewer CUDA cores), but it's cheaper, you can upscale to higher resolutions, and you won't get out-of-memory errors as much.
    Throwing together the cheapest stuff I could find on PCPartPicker, I got this for $607: https://pcpartpicker.com/list/x2726D
    I do NOT suggest actually buying these parts - I just looked for the cheapest stuff I could find. It would probably work, but it's more of a proof of concept. Prepare to spend $100-200 more, and do your own research.

    With SD1.5 models, I can generate 512x512 images pretty quickly. It's been a while since I ran it, so I don't have an exact number, but I'm pretty sure it was under 20 seconds per image. I can upscale to somewhere around 3000x3000 before running out of VRAM.
    With SDXL models, it takes a minute or two to generate a 1024x1024 image. Upscaling to around 3000x3000 is possible as well, but it is much slower - I think it took around 1.5 hours on the final upscale.
    RAM was never a limiting factor on SD1.5, but I've been 20 MB away from maxing out my 32 GB on SDXL.

    Hope this helps!

    2 votes