What programming/technical projects have you been working on?
This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?
Operation "Prolong the Lifespan of My PC By Keeping It Cool" Continues
New Fans
Installed 2 case fans (92x25 + 92x15). My case is SFF, so the install ended up requiring an uninstall of my motherboard + wrestling with internal cabling for clearance.
My motherboard only has one chassis fan header, but I learned that you can plug a fan into the AIO pump header after noticing that the two looked suspiciously similar. Your motherboard will likely default to running that fan at 100% all the time, but there's a decent chance the behavior is configurable in BIOS.
GPU Deshrouding + Repasting
I was always under the impression that deshrouding was a really invasive mod that would take forever to do. Turns out, it's dead simple and (in most cases) reversible. All you really need is a GPU whose heat sink is flat so you can lay your fans flat on it.
Removing the built-in shroud + fan ensemble is just a matter of a couple of screws. Annoyingly, those screws on my GPU are only accessible if you detach the GPU die from the cooler. So I was forced into repasting my GPU (also pretty easy if you're comfortable doing it on a CPU). Once the shroud is gone, you can just ziptie your fans of choice (two Arctic P12 Slim) to the heatsink.
Depending on your GPU, you might need an adapter to plug into the card's fan header. Or you could choose to just use a fan header on your motherboard instead, and adjust the fan curve in software.
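For anyone going the motherboard-header route: a fan curve is just a handful of (temperature, fan speed) points with linear interpolation in between. A minimal sketch in Python; the curve points and the function name are made up for illustration and are not taken from any particular fan-control tool.

```python
# Hypothetical curve: (GPU temperature in °C, fan duty in %).
# These points are illustrative, not a recommendation for any specific card.
CURVE = [(40, 30), (60, 50), (75, 80), (85, 100)]


def fan_duty(temp_c, curve=CURVE):
    """Return the fan duty (%) for a temperature via linear interpolation."""
    # Clamp below the first point and above the last point.
    if temp_c <= curve[0][0]:
        return float(curve[0][1])
    if temp_c >= curve[-1][0]:
        return float(curve[-1][1])
    # Find the segment containing temp_c and interpolate within it.
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


print(fan_duty(67.5))  # halfway between the 60 °C and 75 °C points
```

A flatter curve at low temperatures is what keeps the fans quiet at idle; the steep part only kicks in under load.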
And that's it!
Results
Roughly a -8C delta in average GPU temps under load (deshroud + repaste + case fans).
Which is nice, but I've been way happier about the noise difference. Ever since I got the card, I've hated how obnoxiously loud and shrill the fans are under load. The Arctic fans have a lower max RPM, and are WAY quieter. With how easy the process can be, it blows my mind that card manufacturers aren't selling cards that can be easily deshrouded.
What did I learn
What's next
Undervolting!
Nice work! I have an old system, now a server, that I originally built and used for music production. As part of making it as silent as possible, I deshrouded an Nvidia 750 Ti and used two Noctua fans with zip ties and an adapter, and it was never audible again after that. Now that it's a server, it's similarly completely silent, which I love. I've thought about doing it to the card in my new music production system, but because it's never stressed at all and it's a card with a built-in "zero RPM when idle" mode, I don't feel as much of a need to bother. Something in that PC is still occasionally ramping a fan up in a way I haven't figured out yet (it doesn't show up on fan monitoring), which keeps it from being completely silent, and I'm thinking it may be the PSU. Might swap in a fanless PSU eventually.
What case do you have? I also have a SFF PC (and so does my wife, cause I built it for her). Mine is a Sliger I don’t recall the exact model number of (maybe a 550?). Hers is in a Silverstone SG13 (the shoebox) with a big fan in the front, and an extra side fan I command stripped to the vent on the side.
Both have the smallest possible Noctua CPU coolers cause that’s all that would fit, but they seem to be doing OK.
My big life hack, along with the AIO header being just a bonus fan, was getting a fan splitter cable so I could take one fan port on the board and run two fans off of it.
Dan A4 SFX. I was really married to the idea of a "backpackable" PC when I got it. In hindsight, I've never put it in a backpack and wish I'd gone for something marginally bigger.
That was the original plan! But then I got bit by the "I really don't want to spend $10 on another adapter" and the "I'm really running out of space in my case" bugs haha.
I got mine explicitly to be able to fit into a carry-on suitcase so I could fly with it home for summer break while at college.
I got very used to getting stopped by the TSA (I’m pretty sure it got stopped every time), but I also never had any issues with it (even when I had it water cooled not fan cooled). Usually just had to tell the TSA agent that it was a computer, or “like an XBOX.” A couple times I did get told “oh that’s cool”. Then they’d swab it, and then on I’d go.
I've just pulled apart, cleaned, and somewhat updated my rack server, replacing the ProArt B650-Creator board with the newer ProArt X870E-Creator, largely because of the jump from PCIe 4.0 to PCIe 5.0 on all slots but the chipset-provided slot, and partly for the jump from 2.5G to 10G networking without needing a PCIe-based NIC of some description. I also jumped from the Intel Pro B50 to the Intel Pro B70 for the 32GB of VRAM, allowing me to experiment with Qwen 3.6 27B locally. On that front, I have a few queries:
- How do you figure out which model will get the best performance to quality ratio given what hardware you have? I've seen a lot of discussion about whether to use the 27B model or the 35B A4B (mixture of experts?) model, and how the former is better for complex programming tasks, whereas the latter will generate tokens a lot quicker due to only activating a portion of the parameters for each token, but at a loss of quality. And that's just two sub-models- what about comparing Llama to Gemma to Qwen, etc?
- Does anybody use Open WebUI? How much more comprehensive is it than llamacpp's built-in llama-ui? Is it easy to switch them out?
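One part of the "given what hardware you have" question can be answered with arithmetic before downloading anything: roughly how much VRAM the weights alone will take. A rough sketch; the bits-per-weight figures are approximate values for common llama.cpp quantizations and are an assumption here, as is the 4 GiB of headroom for the KV cache and runtime overhead.

```python
# Approximate bits per weight for a few quantizations (assumed, illustrative).
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}


def weight_size_gib(params_billions, quant):
    """Estimate the size of the model weights in GiB."""
    total_bits = params_billions * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 2**30


def fits(params_billions, quant, vram_gib, headroom_gib=4.0):
    """True if the weights plus an assumed headroom fit in the given VRAM."""
    return weight_size_gib(params_billions, quant) + headroom_gib <= vram_gib


for quant in BITS_PER_WEIGHT:
    size = weight_size_gib(27, quant)
    print(f"27B {quant}: ~{size:.1f} GiB, fits in 32 GiB: {fits(27, quant, 32)}")
```

For a mixture-of-experts model, plug in the total parameter count rather than the active count: all experts have to be resident in memory even though only a few are used per token.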
On other fronts, I continue to refine a number of tools which I use for my smart home control, various REST endpoints, and my RSS / misc. app bot.
In my experience, the answer is a very unfortunate "you use them and decide for yourself". Depending on how many models you want to try, how many quantizations of each model you want to try, and how much you want to tune your inference engine's parameters, it can take a long time to figure out what works best for you. I recommend keeping notes.
In the meantime, you might be able to find benchmarks, and you will be able to find plenty of anecdotes, that can reinforce or diminish any opinion you may have about a model's quality. Personally, I would not take those too seriously. Different models are good at different things. Sometimes a smaller, "weaker" model is better at doing a specific thing than a larger, "stronger" model is. Benchmarks are not useless, but they are not definitive, and online anecdotes are mostly noise.
I mean, it's "quality", do I really need to pull out the S-word? [1]
Thankfully, performance is much less S-word, but there are still some things to figure out. For example, what does "performance" even mean in the context of LLMs? I would say "prompt processing speed" and "response generation speed" are probably the two most important characteristics to measure to define a model's performance, but there might be others. (If you don't know what I mean by "prompt processing speed" and "response generation speed" let me know. I'd be happy to explain them.)
If you are using llama.cpp as your inference engine: the llama-cli command accepts a -p or --prompt flag, so you can automatically prompt a model and get a response back. That means you could use a tool like hyperfine in a benchmark script to run llama-cli with the same prompt on many different models, quantizations, and combinations of other flags passed to llama-cli. Writing such a benchmark script would be a good exercise to get a feel for a model's quality.
Aside from model, quantization, and inference engine, tooling can also affect quality and performance.
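Going back to the benchmark script idea, here is a rough starting point. It builds one llama-cli invocation per model and token count and times each with the standard library, as a stand-in for hyperfine (which would do the timing and statistics better). The model paths and prompt are placeholders, and the sketch assumes llama-cli exits on its own after generating; depending on your version you may need an extra flag to keep it out of interactive mode.

```python
import itertools
import subprocess
import time


def build_commands(models, prompt, token_counts, binary="llama-cli"):
    """Build one llama-cli invocation per (model, token count) pair."""
    return [
        [binary, "-m", model, "-p", prompt, "-n", str(n)]
        for model, n in itertools.product(models, token_counts)
    ]


def time_command(cmd, timeout_s=600):
    """Run a command once and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        cmd,
        check=True,
        capture_output=True,
        stdin=subprocess.DEVNULL,  # never wait on keyboard input
        timeout=timeout_s,
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical model files; replace with real GGUF paths.
    models = ["models/model-a.Q4_K_M.gguf", "models/model-b.Q8_0.gguf"]
    for cmd in build_commands(models, "Explain vsock in one paragraph.", [128]):
        # Swap the print for time_command(cmd) once the paths are real.
        print(" ".join(cmd))
```

Saving each response next to its timing is what makes this useful for judging quality and not only speed, which ties in with the advice to keep notes.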
[1] unsheathing noises
I agree that "quality" is going to be largely subjective- though I wish it weren't so. I have some 2TB of models fetched to local, of different origins and different quantisations, and lordt knows it might take some time to get an appreciable idea of what feels "good" for me.
By "prompt processing speed" and "response generation speed", I'm assuming you're referring to what is commonly abbreviated to PP and TG (Prompt-Processing and Text/Token-Generation). Benchmarking is certainly a good idea, and I have found a few benchmarks scattered around which use similar-or-same hardware to what I have, at least for the GPU. At the end of the day, this might end up being something I experiment with and then drop for another year or two until the results are actually useful for me- I don't know. I appreciate the answers and the info, though.
As for tooling, MCP servers and such are a little overwhelming from an outside perspective! I understand the concept, but wow has the space absolutely blown up, and I'm not sure much of it is well thought through or designed, rather than just functional vibe-coded trash.
I don't really want remote control, just monitoring, so I am planning to take one of my unused Raspberry Pis and a cheap webcam (and maybe eventually a cheap Pi camera module) and use RTSP to serve an in-home video stream of my 3D printer, so I can monitor its progress at my computer from another room that I am usually in.
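One common way to do this is to have ffmpeg read the webcam and publish the stream to an RTSP server running on the Pi. A sketch of how that command could be assembled; it assumes ffmpeg with libx264 is installed, the webcam shows up as /dev/video0, and an RTSP server (MediaMTX is one option) is already listening on port 8554. The URL, resolution, and frame rate are placeholders.

```python
import shlex


def ffmpeg_rtsp_command(
    device="/dev/video0",                    # assumed webcam device node
    url="rtsp://localhost:8554/printer",     # placeholder RTSP publish URL
    size="1280x720",
    fps=15,
):
    """Build an ffmpeg command that publishes a V4L2 webcam as an RTSP stream."""
    return [
        "ffmpeg",
        "-f", "v4l2",                # read from a Video4Linux2 device
        "-framerate", str(fps),
        "-video_size", size,
        "-i", device,
        "-c:v", "libx264",           # software H.264 encode
        "-preset", "ultrafast",      # favor low CPU use over compression
        "-tune", "zerolatency",      # reduce encoder buffering
        "-f", "rtsp",
        url,
    ]


print(shlex.join(ffmpeg_rtsp_command()))
```

Any RTSP-capable player on the desktop can then open the same URL with the Pi's address substituted for localhost.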
Might use this learning experience to then set up a similar camera for monitoring my front door/porch from a nearby window. I don't have the desire for a full, obvious, outdoor-installed security system or anything, but just having a bit of a view of the front of my home would be nice.
EDIT: Also found this ESPHome setup and its associated 3D print to make a nice physical button interface for Home Assistant that is made up of very inexpensive parts from AliExpress. For a few bucks and some prints this could be a fun project to have a nice physical control of my lights.
I am continuing my exploration of unikernels. Currently I am looking at running postgres inside Unikraft. Postgres by default refuses to run as root, but there is an example in the Unikraft repo that patches postgres to remove those checks. That's necessary since there's no user space in the same sense in a unikernel.
I've started on this path going down a rabbit hole of extreme optimization. So I am exploring how to remove the network stack between postgres and a client application. Turns out there is something called vsock, which is a socket that can be used for communication between guest-host or guest-guest in VMs. It behaves like an ordinary socket, but it's basically shared memory so it can be super efficient. Now, postgres don't listen to no vsock... Buuuuut, there is support for domain sockets for talking to postgres, so I figured I'd attempt to make a patch for postgres that adds vsock support largely based on that. So far I've only been looking at code and querying an LLM, but it feels doable! I don't think it is something that would be suitable to upstream since it is fairly niche. Besides, talking to postgres over a vsock could be achieved by proxying the communication, but then I'd still be using all those precious CPU cycles for network stuff.
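For a sense of what the client side looks like: on Linux, Python exposes vsock as socket.AF_VSOCK, with addresses given as a (CID, port) pair instead of (host, port). A minimal sketch, assuming a patched postgres is listening on vsock port 5432 in a guest with CID 3; both numbers are placeholders, and this only opens the connection, it does not speak the postgres wire protocol.

```python
import socket

HOST_CID = 2  # well-known CID that addresses the host from inside a guest


def vsock_address(cid, port):
    """Validate and return a (CID, port) address tuple; both are 32-bit unsigned."""
    for name, value in (("cid", cid), ("port", port)):
        if not 0 <= value < 2**32:
            raise ValueError(f"{name} out of range: {value}")
    return (cid, port)


def open_vsock(cid, port, timeout_s=5.0):
    """Open a stream connection over vsock (Linux only)."""
    if not hasattr(socket, "AF_VSOCK"):
        raise OSError("this Python build has no AF_VSOCK support")
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.settimeout(timeout_s)
    sock.connect(vsock_address(cid, port))
    return sock


# Example (placeholder CID and port, requires a listening vsock peer):
# conn = open_vsock(3, 5432)
# conn.close()
```

Apart from the address family and the address tuple, the code is the same as for a TCP or domain socket, which is part of why basing the patch on the domain socket path seems plausible.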