12 votes

How to set up a local LLM ("AI") on Windows

3 comments

  1. [2]
    JakeTheDog
    Link

    I've run local llama models and, while it was fun, I found the middle-sized models to be the least useful.

    I have a 4070 SUPER, which gives pretty good performance with a quantized model, but at only 12 GB you can't fit much on there. I'd like to try splitting some larger models across GPU and CPU to use system RAM, but I'm skeptical that the tok/sec performance will be worth the extra 'intelligence'.
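
    As a minimal sketch of that GPU/CPU split using llama-cpp-python: the model path and n_gpu_layers value below are assumptions you would tune until the 12 GB card stops running out of memory, not a definitive recipe.

        # Hedged sketch: split a quantized GGUF model between VRAM and system RAM
        # (pip install llama-cpp-python). Model path and layer count are assumptions.
        from llama_cpp import Llama

        llm = Llama(
            model_path="models/llama-3-70b-q4_k_m.gguf",  # hypothetical quantized model
            n_gpu_layers=30,  # layers offloaded to the GPU; the rest run on CPU/RAM
            n_ctx=4096,       # context window
        )

        out = llm("Q: Why is the sky blue? A:", max_tokens=64)
        print(out["choices"][0]["text"])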

    As I see it, the two best use cases are 1) the largest, most 'intelligent' models run from a server to help with the complex tasks we actually need help with, and 2) the smallest, most narrow, task-limited models run locally on devices like a phone (e.g. the two most recent 1B and 3B llama models) to manifest what we all expected Siri to actually be years ago: a decently competent assistant. The middle ground is the worst of both worlds. And to be honest, I find the 20 bucks/month to be worth the extra features of a service, like Anthropic's projects organization and artifacts features, and also having an app on my phone. In the long term, maybe 5 years, I'm hoping I can run something beefy locally.

    Although I must say, it was really cool to see and interact with an offline LLM, watching my GPU spin up after asking questions. Sort of felt like my computer was alive.

    I'd be happy to hear contrary opinions, though. I'm torn between the value I'm getting and the price I'm paying overall (20 bucks a month each) for Perplexity and Anthropic pro. But considering I use both at least 5 times a day, how much I've learned personally and professionally, and the amount of work I've accomplished with them, it still feels worth it.

    7 votes
    1. creesch
      Link Parent

      I'm torn between the value I'm getting and the price I'm paying overall (20 bucks a month each) for Perplexity and Anthropic pro

      With a custom UI like Open WebUI, you can also make use of the Anthropic API instead. Generally speaking, that is much cheaper than paying for the chat interface. Open WebUI also supports various search engine integrations, though I'm sure there are also products out there that aim to mimic Perplexity more specifically.

      The neat thing is that using Open WebUI also lets you switch between your local LLM for lighter tasks and the heavier, more capable remote models through the API.
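
      As a hedged sketch of what that API route looks like with Anthropic's official Python SDK: the model id below is an assumption, so check their docs for current models and per-token pricing before comparing against the subscription.

          # Hedged sketch: call the Anthropic Messages API directly
          # (pip install anthropic). Reads ANTHROPIC_API_KEY from the environment.
          import anthropic

          client = anthropic.Anthropic()

          message = client.messages.create(
              model="claude-3-5-sonnet-latest",  # assumed model id; verify in the docs
              max_tokens=1024,
              messages=[{"role": "user", "content": "Summarize this thread in one sentence."}],
          )
          print(message.content[0].text)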

      1 vote
  2. creesch
    Link

    Didn't watch the video quite yet, but given the description it is likely about ollama. Oddly enough, they seem to be running it in WSL on Windows; that hasn't been needed for months now, as there is a native Windows version available as well.
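
    For the curious: the native Windows build serves the same local HTTP API on port 11434 by default, so a quick sanity check from Python might look like the sketch below. The model name is an assumption; pull it first with `ollama pull llama3.2`.

        # Hedged sketch: query a locally running ollama server over its HTTP API
        # (pip install requests). Assumes the llama3.2 model has been pulled.
        import requests

        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": False},
            timeout=120,
        )
        print(resp.json()["response"])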

    In case you don't want to bother with command lines and Docker at all, there is also jan.ai, which is also pretty decent. It is a bit more limited, though, in both the functionality of the interface and the number of models that are easily available.

    5 votes