35 votes

Is it worthwhile to run local LLMs for coding today?

I've decided to buy a new M5 MacBook Air because of the memorypocalypse. My current M1 model is already upgraded to the same amount of memory and storage as the current base model, and I'm wondering if it's worth spending the extra $200-400 on memory upgrades today.

My current computer is more than good enough for today, but I figure I should probably future-proof just in case. I was thinking 16GB would be enough, but I also know that I'm kind of falling behind by not embracing AI coding agents. According to my research, the maximum 32GB is recommended for most coding-relevant models, almost as a minimum.

I work in education, so coding is not actually much of a need, and obviously there are cloud providers I could use if I end up needing them in the future. I also make less than a teacher's salary because I work part time, which is the biggest reason I'm sticking with the 16GB base for the moment. Other than that, I also don't run many memory-intensive programs. But I thought I would get some recommendations before they start shipping.

I'd also be interested in people's opinions on trading in my old one, since it'll only get me ~$275 back. I'm considering skipping the trade-in and keeping it around to act as a web server, or giving it to my husband, who barely uses his computer that still runs Windows 7.

40 comments

  1. [9]
    whs
    Link

    I went down this route, even buying an RTX 5090, and I'd say it's nowhere near usable if you don't have the budget for two of them and probably 128GB of memory, all in one machine. That being said, people have reported some success on the Mac Studio's unified memory, but due to the slower memory bandwidth it will be slower than a proper NVIDIA setup.

    The reasons this doesn't work:

    1. You can test a lot of open-weight models on OpenRouter. I'd say Qwen3.5 9B is quite good for getting non-coding tasks done (like querying); for coding, Qwen Coder is probably the best model you can run on a single RTX 3090, but it is nowhere near GLM-4.7 (you could point Claude Code at Qwen Coder, but it's not as agentic as it should be).
    2. You need a LOT of VRAM to load the model. The top open-weight models today (GLM-4.7, Kimi K2.5, MiniMax M2.5, etc.) cannot be loaded on a single gaming GPU at all.
    • There are tricks with MoE models where you only load specific experts onto the GPU, but you still need enough RAM for the rest (or you can tolerate loading from your SSD; I tried an HDD and it took over 5 minutes to load Gemma3), and it requires some tuning of your LLM runtime.
    3. There are quantized models that may fit onto a consumer GPU, but they are significantly watered down. Q4, the common variant used to run most models on consumer GPUs, costs roughly 10% in quality.
    4. Not only do you need VRAM to load the model, you also need memory to support a large context window. I'm not sure whether that's VRAM or RAM, because at this point I have already exhausted both on my gaming PC. I was planning to get 128GB of RAM, but with XMP and memory costs doubling, I don't think I'll be doing that any time soon. Currently my max context window with Gemma3 27B is about 50-60k, compared to Claude's and Qwen3.5's ~200k.
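    To make the VRAM point concrete, here's a rough back-of-the-envelope sketch. The layer/head/dimension values are illustrative placeholders, not any real model's architecture:

```python
# Back-of-the-envelope memory estimate for local inference:
# model weights + KV cache. All numbers are illustrative assumptions.
def weights_gb(params_b: float, bits: int) -> float:
    """Nominal weight size in GB: billions of params * bytes per param."""
    return params_b * bits / 8

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bits: int = 16) -> float:
    """KV cache stores a K and a V vector per layer for every token."""
    bytes_per_token = 2 * layers * kv_heads * head_dim * bits / 8
    return bytes_per_token * context / 1e9

# Hypothetical 27B dense model at Q4 with a 50k-token context
# (layer/head/dim counts below are placeholders, not a real spec):
total = weights_gb(27, 4) + kv_cache_gb(layers=46, kv_heads=8,
                                        head_dim=128, context=50_000)
print(f"~{total:.1f} GB")  # -> ~22.9 GB
```

    Even at Q4, once the KV cache for a long context is counted, a 27B-class model lands well above what a 16GB card can hold.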

    That being said, the one time I used Qwen Coder locally it was the second-fastest coding model I've ever seen, beaten only by Copilot's GPT-4o (for which I suspect Microsoft bought provisioned throughput).

    16 votes
    1. teaearlgraycold
      Link Parent

      people reported some success on Mac Studio's unified memory but due to the slower memory bandwidth it will be slower than proper NVIDIA setup

      The M3 Ultra has pretty good bandwidth (820 GB/s) but limited compute compared to high end GPUs.

      4 votes
    2. [6]
      cutmetal
      Link Parent

      So interesting timing on this question. Replying to you because you seem to know what you're talking about and I'm curious what you think.

      I'm waiting on delivery of a pair of Nvidia Tesla P40 data center GPUs. You can get them used on eBay right now for a little over $200 each, shipped. They each have 24GB of VRAM, and I'm planning to put them into a machine with 64GB of RAM. (You do have to come up with a cooling solution, as they're made to be wind-tunneled by server-grade blower fans; I found some cheap 3D-printed shrouds that funnel 120mm fans through them. They also have non-standard power inputs, so you need adapters for those too.)

      My understanding is I should be able to run 70B-parameter models with a decent context window and speed - does that sound realistic to you? In any case, I'll find out next week!

      4 votes
      1. [5]
        tauon
        (edited )
        Link Parent

        70B parameters won’t really fit into 48 GB of VRAM at an 8-bit quantization (the weights alone would be around 70 GB), and even a 4-bit quant (~35 GB) will be a rather tight fit once you add a context window, based on the model size numbers for the newly released Qwen 3.5 family here.

        Edit: But that doesn’t mean the smaller models aren’t capable! I mean, I’ve tested both qwen3.5:2b and qwen3.5:9b on nearly four-year-old fanless laptop hardware and was surprised at the quality of the outputs.
        Innovation in the space is going pretty crazy currently, IMO. Even if most locally run models won’t compare to the big guns you can get from cloud hardware… especially once you start to factor in the price of things. Subsidized models like GPT 5.3 (Codex) or Claude Opus 4.6 are hard to beat in that regard, at least from what I can tell so far.

        2 votes
        1. [3]
          teaearlgraycold
          Link Parent

          4bit is generally the optimal tradeoff from my testing. Yes you lose some quality, but you'll lose a lot more by going with a smaller model with more precision per weight.

          4 votes
          1. [2]
            tauon
            Link Parent

            Interesting! I’ll have to do some more testing, then. :-)

            I haven’t really gotten into the details there yet, since my hardware isn’t that great; I generally only play around with local models rather than actually trying to be productive with them. But I’ll keep this in mind.
            I was especially surprised by this recent Qwen 3.5 release because they published both 27B and 35B models; two distillations that “close” together in the range above 10B parameters hasn’t really been usual, as far as I can tell? And I might even be able to run them at 4-bit!

            1 vote
            1. Greg
              Link Parent

              The 27B model is monolithic, and it'll fit on higher end consumer GPUs at reasonable quantisation (it should be able to run on a 32GB card at 8 bit and a 16GB card at 4 bit, I think) - and I'd bet they were very specifically considering that when they chose the parameter count. If you ignore overheads, a parameter is one byte at 8 bit quant and half a byte at 4 bit quant, so you're looking at a nominal 27GB or 13.5GB of VRAM for the model itself, which leaves a sensible amount free for the context window etc. on a 32GB or 16GB card, and lets you do long context and extra caching on a 24GB card at 4 bit.

              The 35B model is MoE rather than monolithic - the full name is 35B-A3B, meaning "active 3B", so it'll run on hardware that's too small for the 27B altogether or run much faster on the same hardware by offloading inactive parts of the model. Basically the 27B model is "what's the best quality we can get within the bounds of current consumer GPUs?" and the 35B-A3B is "what's a reasonable balance of quality and performance with the fewest possible active parameters?". They're using more parameters total (which they couldn't do on the 27B version without pushing it out of reach of most users) to somewhat compensate for the fact that fewer will be active at any given time.
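              The byte-per-parameter rule of thumb above can be sketched as a quick fit check (the model/card pairings are just the examples from this thread, and overheads and context are deliberately ignored):

```python
# Nominal model size: parameters * (bits per parameter / 8) bytes,
# ignoring overheads and the context window, per the rule of thumb above.
def nominal_gb(params_b: float, bits: int) -> float:
    return params_b * bits / 8

# Example pairings discussed in the thread (assumptions, not benchmarks):
for params_b, bits, card_gb in [(27, 8, 32), (27, 4, 16), (35, 4, 24)]:
    size = nominal_gb(params_b, bits)
    verdict = "fits" if size < card_gb else "does not fit"
    print(f"{params_b}B @ {bits}-bit ~ {size:.1f} GB vs {card_gb} GB card: {verdict}")
```

              All three nominally fit, which is the point: the parameter counts look chosen to land just under common consumer card sizes.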

              4 votes
        2. whs
          Link Parent

          I'd like to say I don't know much about running models locally; I've only just gotten past the Ollama stage.

          One tip I'd recommend: you can add your hardware to HuggingFace, and GGUF model pages will then show whether you can load that specific model. Loading is just the first step, though; to run it with the cache, the context window, and other workloads (e.g. a desktop GUI), you'd need to add your own safety margin.
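          A minimal sketch of that safety-margin idea (the 1.3x headroom factor is my assumption, not anything HuggingFace reports):

```python
# "Does it load?" is necessary but not sufficient: leave headroom for
# the KV cache, context, and other workloads. The 1.3x margin is an
# illustrative assumption, not a measured figure.
def can_load(model_file_gb: float, total_mem_gb: float,
             margin: float = 1.3) -> bool:
    """True if the model file plus headroom fits in available memory."""
    return model_file_gb * margin <= total_mem_gb

print(can_load(13.5, 16.0))  # hypothetical ~13.5 GB Q4 file, 16 GB machine -> False
print(can_load(13.5, 24.0))  # same file, 24 GB machine -> True
```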

          1 vote
    3. kacey
      Link Parent

      Btw, may I ask if you've given Qwen3.5 27B a shot? Some -- admittedly kinda bad -- benchmarks figure that it's about as good as Anthropic's cheap model (Haiku 3.5).

      4 votes
  2. [6]
    shrike
    (edited )
    Link

    If you want to learn how to use AI Agents, spend the 20€ or something for Claude Pro for a month and start using it. Both the app and Claude Code on the CLI. Try creating a tool with it that fixes an issue in your day to day life or work, see how it goes. Automate something you need to do manually 50 times a week. Or make a silly game.

    Now you have a baseline of what the state of the art can do.

    Then you can start experimenting with local models. Grab LM Studio, Ollama and ComfyUI and see what kind of free/open models there are. Some are good for coding, others can describe and even generate images.

    Find the limits of the mainstream models, what will they do and what they won't. Try writing an AI assisted short story about a murderer and see how the model starts moralising on the character's actions or refuses to write about some things. Then grab some uncensored ones and get REALLY FUCKING WORRIED because they will generate detailed stories of the most heinous shit with no limits at all.

    Try to recreate the same app you did with Claude with local models, it's slower of course, but how is the quality compared to it? Good enough? Try using a local model in an IDE for autocomplete or agentic workflows, how does it feel?

    ComfyUI is fun too, you can easily create "pipelines" for image generation, fully locally. Also see the bits about limited and unlimited models from the paragraph about text-only models. Oof.

    11 votes
    1. [5]
      Akir
      Link Parent

      I took your advice and it was quite eye-opening. One of the things Anthropic said it could do was onboarding, and it just so happened I have a project that has been on hiatus for about a year and could use some brushing up. The response it came up with was significantly better than I thought it would be, though probably still not good enough for what I wanted. But then it continued by telling me there were some minor bugs that should be fixed, and it's pushed me through a bunch of code review, which happens to be a great way to refamiliarize myself with my old code.

      But god damn, even if I did spend the extra $400 for 32GB of RAM, a local model wouldn't be able to get anywhere near what this is doing. I doubt I'd be getting anything this good even if I spent $5000 on the highest end MacBook Pro.

      This has really given me a new perspective on the storagepocalypse and why these companies are buying up these resources like there's no tomorrow. It's also really got me wondering if the current AI boom really is a bubble.

      6 votes
      1. [2]
        kacey
        Link Parent

        It's also really got me wondering if the current AI boom really is a bubble.

        Not the OP, but IMO -- probably still a bubble, if only because there's still a gap between revenue and investment. Competitors appear capable of keeping pace with Anthropic/OpenAI at a steady ~6-12 months gap in capability, and they're doing so for pennies on the dollar. If the large, American AI firms can't demonstrate a way to keep their advantages proprietary, then a lot of the R&D investment which is going into making these systems will end up being written off: why would consumers/companies pay 10x for Anthropic/OpenAI when another service is available for mere fractions of the price?

        But yeah, agreed that my expectations were blown out of the water while working with some frontier models. Even if they stay just as they are now, this will be massively disruptive to nearly all work done in front of a computer.

        8 votes
        1. shrike
          Link Parent

          But yeah, agreed that my expectations were blown out of the water while working with some frontier models. Even if they stay just as they are now, this will be massively disruptive to nearly all work done in front of a computer.

          It's been said many times, but using a frontier model for programming is like being a team lead or project manager for programmers.

          You tell them what you want and check the result. Either just use it and see that it does what it was designed to do, or read the generated code before accepting it. I've shipped full features and bug fixes at $dayjob without touching a single line of code myself.

          I've read a shit-ton of generated code though, my job is to deliver code I have proven to work after all and I need to be able to explain the code I'm delivering. The biggest outward change in my PRs is that there are a fuck ton of tests for each case.

          When I had to type every test by hand, I'd maybe add one test that checks the specific case. Claude on the other hand will write 10-15 tests from slightly different angles including integration tests to see that the bug can't appear at all, and it won't get bored or complain.

          Same goes for my personal projects: very, very few had even a single test. But currently every project that moves from "hmm, would be cool to do this" to me actually using it has pretty robust tests.

          3 votes
      2. tauon
        Link Parent

        wondering if the current AI boom really is a bubble

        I’d like to point you to some of the lengthy, very in-depth industry analyses on Ed Zitron’s blog to re-change your mind again :-)

        Most recently I started reading his recent NVIDIA analysis.

        Genuine thoughts at the moment: The tech itself in the AI and LLM space can be truly incredible, and these companies’ valuations can be massively overblown currently. Both could easily hold true at the same time.

        7 votes
      3. shrike
        Link Parent

        It's also really got me wondering if the current AI boom really is a bubble.

        It is and it isn't. The amount of money being circulated is completely bonkers; they're just throwing billions everywhere like they can create it out of thin air. That shit IS a bubble and will burst at some point; how epic the pop will be, nobody really knows.

        But that mostly just means that the 20€/month plans will go away and people will have to start paying market price for online AI use. Either in cash, or via ads or something similar.

        4 votes
  3. [5]
    kacey
    (edited )
    Link

    I wouldn't, tbh? Thoughts, as a case-wise analysis:

    • Either the AI bubble pops, and all the hardware in those data centres (plus all the purchase orders and contracts) go up in flames, which will probably have ... an effect on computer hardware prices,
    • or the AI bubble succeeds, and we have fully autonomous AGI-class intelligences running amok in the cloud. The team responsible for getting the current state-of-the-art local model running (Qwen3.5-397B-A17B) was fired from Alibaba in the last week, so it's not terribly certain that local models will keep improving as fast as they have over the last year. Which means you'll probably want to use cloud models anyway.

    (edit) Ah, two addendums:

    1. If you want to have better hardware over the next, say, two years for some other reason (e.g. gaming), now's a decent time to spec up. IMO.
    2. In case it helps, as a point of data, I've been running a 4-bit quant of Qwen3.5 35B A3B on a PC (9950x w/64 GB RAM and an RTX 2060), which has been inferring at ~20-30 tokens/second (depending on batch size; it trends closer to 30 than not). It still requires handholding, but it's mostly capable of handling a decently technical workload (atm. it's implementing an ML project I've been mulling for a while, and it's doing OK enough). It'd be fine for simple web apps, or quick one off scripts.
    9 votes
    1. [4]
      babypuncher
      Link Parent

      now's a decent time to spec up. IMO.

      Now is a horrible time because hardware prices are grotesquely inflated by the billionaire class's insatiable appetite for shoving slop down our throats.

      19 votes
      1. teaearlgraycold
        (edited )
        Link Parent

        Not Apple’s. I’d recommend people buy Apple hardware now before their fixed pre-inflation contracts run out. For general consumers it’s hard to justify alternatives in the <$1,000 range.

        Edit: I just noticed they’ve increased the starting prices on their laptops by $100-$200 this generation. You can still buy an M4 series laptop so I recommend that to any readers looking to save a bit of money.

        17 votes
      2. kacey
        Link Parent

        Sorry, I didn't mean to offend. Apple products were -- IIRC -- one of the few computing products that haven't seen a price jump from the recent spike in the cost of everything. For people who really need the extra capacity, buying the beefiest spec you can now makes more sense than at any other point between now and ~2029, since prices are only going to go up.

        Genuinely, if Akir has the spare cash and has a reason to spec up for the next two years, now will probably be the most cost effective time to do so. Who knows what happens next. If you disagree, please feel free to make an argument ...? I can quote random blogspam if it'd help make mine, but I'm hopeful that we can discuss this point instead of screaming at each other.

        10 votes
      3. d32
        Link Parent

        It's not going to get better for years to come, most likely.
        Memory chips are already fully pre-ordered for three years.

        4 votes
  4. [3]
    LukeZaz
    Link

    but I also know that I'm kind of falling behind by not embracing AI coding agents.

    Doubtful. The way I've been watching people use these things has been nothing short of reckless; this technology has failed to prove itself countless times, and the cases in which it has worked have been few and far between. To say I am hesitant to believe that it is actually useful for coding and not simply fooling people into thinking it's useful for coding is an understatement. A good software professional is supposed to be wary of software.

    But that's just one angle. And not one I prefer, frankly, since I still believe the tech could be good in a hypothetical future where a lot of things were different.

    The advice I'd offer here is to consider more than just the cost or a hypothetical future wherein AI magically becomes everything it claims to be. People here have already answered that for you. I suggest instead to consider the following two factors:

    1. Generated code is significantly more prone to errors, due to hallucinations, the fact that you didn't write it and thus understand it less (if at all, depending on how much you read), and the fact that LLMs do not have brains and cannot think.
    2. From at least my personal viewpoint, generated code is a moral failing. LLMs cause numerous problems from the environmental to the sociological, both for yourself and for everyone — knowing this and using them anyway is to declare to the world that one considers convenience more important than both. This is not me pointing a finger at you – knowing the harms of AI is not knowledge anyone's born with – this is me saying this is something you should be thinking about.

    So when you ask yourself if a local LLM is worthwhile, please don't just stop at price. Ask yourself if the code will really be as good as some people claim it is, and more importantly, ask yourself if the risks and problems that LLMs (even local models!) create are worth the alleged utility.

    6 votes
    1. [2]
      shrike
      Link Parent

      No. 1 depends a lot on where you end the generation loop. If you don't give the AI agent (you are using an agent and not copy-pasting from ChatGPT web, right?) tools to validate its work by building, testing, and linting the code, of course it's going to be shit.

      I've personally done full-ass PRs and bug fixes with nothing but prompts to Claude + Opus 4.6, and the code quality is on par with or a bit above what I'd write, mostly because the model knows a few language/library tricks better than I do. Zero errors, zero hallucinations. And me not writing it doesn't matter; I READ it before I submit it to any human eyes, as you should. My job is to deliver code I have proven to work.

      AI isn't going anywhere; someone opened the Pandora's box of large language models and associated tech. The only thing that varies is whether we get local models that are good enough for daily use or have to rely on online models.

      5 votes
      1. LukeZaz
        Link Parent

        So perhaps it's a failure of formatting on my part, but it should be noted my first point was also somewhat elaborated on in the first paragraph, which was followed by this:

        But that's just one angle. And not one I prefer, frankly, since I still believe the tech could be good in a hypothetical future where a lot of things were different.

        So, sure. Maybe using an LLM can speed up the creation of a codebase. You're still making a reverse centaur of yourself, encouraging the same for others, and yes — you are understanding your code less, because writing something involves active thought that leads you to remember what you've done and why much better than you otherwise do.

        But just maybe it's useful enough to set all that aside. Let's assume so. Here's my verdict, then:

        I don't care.1

        Because fundamentally, you've spent three full paragraphs addressing a point that I myself already said I don't prefer, because other areas of concern are more important. LLMs are a bad idea, because right now they hurt the world. Even if you swap to local models exclusively, thus partially addressing environmental concerns, using them still helps encourage a trend of their rash overapplication and makes the idea that they can be used to fire people all the more tantalizing. After all, if it can partially automate your work, then they don't need as many of you, do they? And that's not even the only problem. I linked a whole list.

        So long as this rampant abuse of the technology is the norm – and it very much is – using AI is irresponsible at best. Even if I assume that everything you've output with it is just as good as what you once wrote, you're still focusing on the end result of your job and ignoring all the side effects of what you're doing. That's bad. That's the kind of attitude people take with them when they work for Lockheed Martin.

        We don't need AI. We were just fine without it, and what little use has been found with it can still be had without this absurd excess shoving it everywhere and on everyone. Which means you can also stop using it just fine. Just because the box got opened doesn't mean you have to dig your hands into it to survive. The sooner this bubble pops, the less awful it will be for everyone; and when that happens, then we'll be able to safely start considering how best to use the tech.


        1. I also don't want it. Part of why I'm a programmer is because I like to code. I don't want a robot to do it for me. That's not fun, and it even shines a light on another of the huge problems this tech has introduced: It's being sold as a way to automate things that people enjoy, and I don't think that's what the idea of "progress" should be striving for. It's a terrible future.

        1 vote
  5. 0xSim
    Link

    Unless you have a very powerful GPU and/or tons of RAM, your local model will just give you a slower but slightly better autocomplete. You'll pay thousands to run an LLM that is way worse than the current best models (Claude Opus 4.5/4.6), which are available on relatively cheap subscriptions.

    It could be worth it if you use that computing power for something else, but investing that much money to run a local LLM is an awful idea.

    5 votes
  6. entitled-entilde
    Link

    Running local models is fun for hobby purposes like building an agent, fine-tuning, or digging into text analysis with embeddings. For coding, not so much. Remember, if your model makes a mistake in a tool call, or runs out of context, the whole thing grinds to a halt. For me, AI coding should be fun and ergonomic, and this spoils it. If you want to try AI coding, just go commercial. Claude Code is $20 a month, which is not so bad. It’s possible that two years from now a local model will be developed that does great on any task within 32GB, so I get the temptation. But if you run a local model and hate the whole experience, that’s worse. At least have a backup plan for that memory if you do buy it.

    4 votes
  7. [2]
    teaearlgraycold
    Link

    I hardly use local LLMs for coding, but I am pretty sure you'll want a 128GB MacBook Pro if you're looking to run anything remotely comparable to hosted models. Even then, a 256GB or 512GB Mac Studio is more of the right choice to run the best open weight models.

    But as you don't seem to be a professional software engineer I don't think I can anticipate your needs. If you just need an LLM that can help write some small scripts and navigate the command line then I can see something useful fitting into 32GB. I've gotten some use out of GPT-OSS-20B on my 24GB MacBook Air at times when I didn't have internet access. But it was really just a fancy natural language CSS documentation lookup tool at that time. Not anything remotely comparable to modern "agentic" coding tools. The context window is much too small for that.

    If you don't need the AI to be local then the free tiers for cloud hosted models will be your best option.

    3 votes
    1. Akir
      Link Parent

      Yeah, honestly the more I think about it the more dumb the idea becomes. I guess I just got bit by the FOMO bug.

      I wouldn't expect a local model to run at the level of the hosted ones, so that isn't really a concern. My expectation was more along the lines of a debugging helper. I think it probably makes a lot more sense to just use their stuff as a pay-as-you-go thing if I ever feel the need to do some vibe coding or something like that. And for debugging I honestly find it somewhat rare to have LLMs be able to tell me something I couldn't find out by talking to the proverbial rubber duck.

      8 votes
  8. [2]
    clayh
    Link
    I can run 4B Qwen 3.5 on a 24GB M3 MacBook Air without a problem. However, I think you’ll need much larger models for good coding assistance. I agree that you’d probably want 128 or 256GB of RAM if you’re doing this for anything other than a hobby.

    3 votes
    1. teaearlgraycold
      (edited )
      Link Parent
      I just tried running Qwen3.5 27B at 4-bit quantization on an M3 with 24GB. It loads, but it runs at 2.2 tok/s (slow). It can work with the Pi coding agent so I gave that a shot. After a few minutes it was 50% through processing the 16,000-token agent prompt, at which point Pi killed the request to the LLM because it had taken too long. I guess even simple questions might take 15-20 minutes to answer. You’ll definitely need one of Apple’s Max chips with 64-128GB of memory to do even half-decent agentic tasks. The M5 series’ reported 4x prompt-processing speed sounds pretty appealing now.

      Edit: I switched to Qwen3.5 35B-A3B at 3-bit quantization. I can now actually get it to work with Oh My Pi. It's slow, but it does work: it runs 7-12x faster than the 27B monolithic model from what I've seen. It's cool to see an agent running locally on a relatively low-end machine, tool calling and giving me a correct answer to a simple question.
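      For what it's worth, the wait I'm describing is mostly prompt processing ("prefill"), and the arithmetic is simple: prompt tokens divided by prefill speed. The speeds below are rough assumptions implied by my run, not benchmarks:

      ```python
      def prefill_seconds(prompt_tokens: int, prefill_tok_per_s: float) -> float:
          """Seconds spent processing the prompt before the first output token."""
          return prompt_tokens / prefill_tok_per_s

      # A 16,000-token agent prompt at an assumed ~45 tok/s prefill takes
      # close to six minutes before generation even starts.
      print(round(prefill_seconds(16_000, 45) / 60, 1))   # ~5.9
      # A 4x faster prefill (as reported for the M5 series) would cut that
      # to about a minute and a half.
      print(round(prefill_seconds(16_000, 180) / 60, 1))  # ~1.5
      ```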

      2 votes
  9. pete_the_paper_boat
    Link
    Local inference is probably worth it for stuff like autocomplete, etc. That's a small and complete context that can be performed by tiny models.

    Idk if full development is feasible at a competitive price. Unless speed isn't an issue; then maybe it's possible to set up a task, walk away, and come back to a solved problem.
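    For the autocomplete case, the usual setup is a small model behind an OpenAI-compatible endpoint (llama.cpp's `llama-server` and LM Studio both expose one). A minimal sketch of the request body, with a hypothetical model name and illustrative settings:

    ```python
    import json

    def completion_request(prompt: str, max_tokens: int = 32) -> dict:
        """Build a body for POST /v1/completions on a local
        OpenAI-compatible server; field names follow that API shape."""
        return {
            "model": "local-model",    # hypothetical; many local servers ignore it
            "prompt": prompt,
            "max_tokens": max_tokens,  # keep completions short for autocomplete
            "temperature": 0.2,        # low temperature for predictable code
            "stop": ["\n\n"],          # cut off at the first blank line
        }

    body = json.dumps(completion_request("def fib(n):"))
    ```

    Actually sending it is just a POST of that JSON to wherever your server listens (`llama-server` defaults to port 8080; adjust for your setup).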

    3 votes
  10. [10]
    babypuncher
    Link
    Oh god please don't become part of the problem. This insatiable appetite for slop is making computers too expensive for the rest of us

    5 votes
    1. kacey
      Link Parent
      Uh ... individuals buying computers aren't driving up the price of components, it's OpenAI buying 40% of the world's RAM manufacturing capacity and the like. I'm sure Akir has reasons for wanting to code locally, and anyways, isn't their desire to use their computer just as valuable as "the rest of us"?

      33 votes
    2. [8]
      everythingisblue
      Link Parent
      I get that the AI craze in general is creating a lot of slop and has its own problems, but as far as coding goes, it’s a game changer. As a software engineer for nearly 10 years, it has improved my development efficiency an absolutely insane amount. I spend all my time now solutioning rather than hand typing lines of code. I tell the agent what to write and it writes it, often better than I would have, and certainly faster (so much faster). When I really care about the implementation, I’ll tell it exactly how to write it. But often I leave some ambiguity in my requests so I can see if what it does is what I had in mind or if there was a better way that it came up with.

      I actually miss handwriting code, I really did enjoy that part of my job. But if I’m taking a sober look at the state of development, I don’t think there will really be a need for that anymore. I’m just thankful I got to learn how to code when writing it was still necessary, because reading it certainly still is, and writing helps you learn to read it.

      4 votes
      1. [7]
        babypuncher
        Link Parent
        "Everything's more expensive, the internet is flooded with slop, and the job market sucks ass. But hey at least my employer can now expect me to be twice as productive for less pay"

        2 votes
        1. [6]
          kacey
          Link Parent
          Why do you feel it's appropriate to mock someone else's perspective?

          8 votes
          1. [5]
            babypuncher
            Link Parent
            Because the entire world is going to shit around us and AI is a huge part of it. I genuinely think we do not hate billionaire tech CEOs enough and we shouldn't have to take their shit just because some people have been fooled into thinking they are our friends.

            We're talking about a technology whose key selling point is the potential to create mass unemployment. That is what the billionaire class wants. We are locked in a class war and the wrong side is winning.

            1 vote
            1. Greg
              Link Parent
              I genuinely think we do not hate billionaire tech CEOs enough and we shouldn't have to take their shit just because some people have been fooled into thinking they are our friends.

              Very strong agree.

              We're talking about a technology whose key selling point is the potential to create mass unemployment.

              That doesn’t seem like an inherently bad thing, though? There’s no point doing a task that a computer can do unless you enjoy it or you can do it better than the computer, and I don’t really get people’s desire to pretend otherwise. Doing things manually just for the sake of employment seems dystopian to me - it’s like the idea of making people dig holes and fill them in again just so they can be paid for the “work”.

              I know (or at least I hope!) that’s not what you’re advocating, but it’s the logical end state when you worry about jobs and about specific technologies. Protecting employment is basically never the answer; structuring society so that people can live by doing only the work that’s actually necessary and contributing is.

              And if that sounds depressingly unrealistic? Well yeah, I agree on that too. Like you say, the billionaires are winning, so ultimately most of our anger is probably futile. But if I’m going to burn energy on it either way, I’m gonna be angry in the direction of restructuring work itself rather than protecting jobs that don’t need to be done.

              3 votes
            2. [3]
              kacey
              Link Parent
              Do you think making fun of OP is going to change their opinions? Or that bullying them publicly will make people more sympathetic to your cause? I didn't see anyone glazing Sam Altman in this thread.

              2 votes
              1. [2]
                babypuncher
                Link Parent
                I don't see it as making fun of OP, I simply reframed their statement to make a point. Am I not entitled to my perspective?

                1. gary
                  Link Parent
                  You reframed their opinion into something they didn't come even close to saying. That adds to the distaste imo. Not only that, but what does it even add to the discussion? Are we supposed to pretend like AI doesn't exist and be less competitive in the job market because the technology has negative externalities that are outside of our control?

                  4 votes