hungariantoast's recent activity
-
Comment on Donald Trump says it's 'not possible' for the US to pay for Medicaid, Medicare and day care: 'We’re fighting wars' in ~society
hungariantoast Link Parent
Instead of asking OP to worry about the tags, you should message Deimos and ask to be granted tagging permission so you can add them yourself.
Tags are not compulsory. When someone posts a topic on Tildes, they have no responsibility or obligation to add tags to their topic. I think that's a good thing because it reduces "friction to post".
Tags are cool though, and having more people involved in tagging stuff would also be cool.
-
Comment on Claude Code's source code leaked in ~tech
hungariantoast Link Parent
Yeah how about a link from The Register? That's probably not slop (god please let it not be slop)
-
Comment on Claude Code's source code leaked in ~tech
hungariantoast Link Parent
I did some digging, and this post actually seems to have a human behind it. Although it mostly seems to be a summary of the comments from the thread on that orange website. I guess the person behind it did a better job creating a summary than the venturebeat AI did.
I changed the topic link to point to that blog post instead of the news article.
-
Comment on What are people using instead of VS Code? in ~comp
hungariantoast Link Parent
I think the primary packages I use that make Emacs look nicer are doom-modeline and ef-themes.
Here is my config for doom-modeline:
(Note that I use the elpaca package manager for Emacs, so you won't be able to copy this code directly into your own init.el.)

(use-package doom-modeline
  :ensure (:host github :repo "seagle0128/doom-modeline")
  :custom
  (doom-modeline-unicode-fallback t)
  (doom-modeline-enable-word-count t)
  (doom-modeline-indent-info t)
  (doom-modeline-total-line-number t)
  :init
  (doom-modeline-mode 1))

My ef-themes config includes a lot of custom code specific to my system, so I'll just say that I use ef-spring as my light theme and ef-dream as my dark theme.

(Also, for any Neovim users who happen to read this, there is an ef-themes.nvim plugin for Neovim.)
For code syntax highlighting, be sure to adjust treesit-font-lock-level to some (integer) value from one to four. That controls how much stuff in code syntax is actually highlighted, with level one being very sparse highlighting, and level four (my preferred) being full-skittle.

How Emacs functions is half of how it looks though, so I will also give you some packages to look up, and my configs for them, that I think make the Emacs minibuffer the single nicest feature in any text editor or IDE ever to be created:
- vertico
- marginalia
- consult
- orderless
;;;;; vertico
(use-package vertico
  :ensure (:host github :repo "minad/vertico")
  :custom
  (vertico-cycle t)
  :init
  (vertico-mode))

;;;;; marginalia
(use-package marginalia
  :ensure (:host github :repo "minad/marginalia")
  :after vertico
  :bind (:map minibuffer-local-map
         ("M-A" . marginalia-cycle))
  :init
  (marginalia-mode))

;;;;; consult
(use-package consult
  :ensure (:host github :repo "minad/consult")
  :bind (;; ("C-s" . consult-line)
         ;; ("C-S-s" . consult-imenu)
         ("C-S-s" . consult-outline)))

;;;;; orderless
(use-package orderless
  :ensure (:host github :repo "oantolin/orderless")
  :custom
  (completion-styles '(orderless basic))
  (completion-category-overrides '((file (styles basic partial-completion))))
  (completion-pcm-leading-wildcard t))

Although, I actually still use counsel/ivy/swiper for searching inside a buffer. Here are my configs for those:
;;;;; counsel
(use-package counsel
  :ensure (:host github :repo "abo-abo/swiper"))

;;;;; ivy
(use-package ivy
  :ensure (:host github :repo "abo-abo/swiper"))

;;;;; swiper
(use-package swiper
  :ensure (:host github :repo "abo-abo/swiper")
  :bind (("C-s" . swiper-isearch)
         ("C-r" . swiper-isearch-backward)
         ("M-s ." . swiper-isearch-thing-at-point)
         (:map swiper-map
          ("<escape>" . keyboard-escape-quit))))
-
Comment on Nvim 0.12 released in ~comp
hungariantoast Link
With this release, Neovim now has a built-in package manager: Pack
-
The cognitive dark forest
31 votes -
Nvim 0.12 released
16 votes -
Comment on A.T.L.A.S: outperform Claude Sonnet with a 14B local model and RTX 5060 Ti in ~tech
hungariantoast Link Parent
Models.dev is similar to OpenRouter in that it lists information for inference providers. However, it's just a database, not a service. The idea is: you figure out what model you want to use, then go to models.dev to find the best provider for that model. If you are only interested in using a single model (or family of models, like Qwen or GLM), then it is often cheaper to use the provider's API directly (or their subscription, depending on usage) than to go through OpenRouter.
OpenCode Zen is a "provider aggregator" like OpenRouter, where you have a single account, but get access to many providers and models. Zen does not have a "platform fee" like OpenRouter. The models and providers available through Zen are also more curated, but that means Zen's selection of models and providers is limited compared to OpenRouter.
There's also OpenCode Go. I actually bought this a few days ago, because the first month is only $5, and I wanted to try larger models that I cannot self host. The subscription is normally $10/month though, and only gives access to four models:
- GLM-5
- Kimi K2.5
- MiniMax M2.7
- MiniMax M2.5
If you have never worked with big models before, OpenCode Go is what I would recommend trying first. The GLM and Kimi models are good. The MiniMax models are fine for most things, but not as capable in my (limited) experience.
I also think Go's usage limits are good. I have read people online complain about hitting their weekly limits in a single day. I don't know how they could manage that, unless they were running a model in an automated loop to slopcode a project entirely with AI. With the way that I use models[1], I have struggled to hit 10% of the subscription's five-hour usage limit, let alone an entire week's worth. I'm still very new to using LLMs though. Your experience might be different.
If your AI usage is very high, then a subscription from a specific provider will (most likely) work out to be cheaper than API pricing (API pricing through that provider alone, or through an aggregator). I am not aware of any "provider aggregators" like OpenRouter or OpenCode Zen that offer a subscription instead of API pricing.
1. Right now the main way I use AI is to do maintenance work. For example I might tell the model something like: "scan this code file, find any bugs or uncaught errors, report them in bugs.md with your recommended fix".
-
Comment on I think Tildes moderators and admins may need to make a decision regarding how to handle Harry Potter related posts in ~tildes
hungariantoast Link Parent
Segway
Autocorrect is a motherfucker. The correct word is "segue". I'll delete this after you fix it. Vaya con Dios.
-
Comment on Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x in ~tech
hungariantoast Link Parent
No problem! Like I said, I've spent way too much time lately messing with this stuff. I'm happy to have an excuse to write about it.
One quick note about AMD though: I generally get much faster processing and generation speeds when I run llama.cpp with ROCm than I do with Vulkan.
However, getting ROCm installed can be a huge pain, and whether it supports your card (and what capabilities it supports on your card) is difficult to figure out.
On top of that, when I run llama.cpp with ROCm and have a model loaded (not even doing anything, just loaded into VRAM), my computer becomes almost unusable. I can't even switch focus to another window without gnarly stuttering. ROCm seems much more aggressive with how it allocates VRAM. I think if I were running llama.cpp/ROCm "headless" on this computer, and doing all my other work on another device, it would work great, but I haven't got around to trying that yet.
-
Comment on Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x in ~tech
hungariantoast Link Parent
I did a quick test with Qwen3.5-4B. Specifically, I used the UD-Q4_K_XL GGUF from Unsloth for llama.cpp, and the default Q4_K_M quant selected by Ollama.
Here is the command for llama.cpp:
llama-server --model ~/.local/models/hugging_face/unsloth/Qwen3.5-4B-UD-Q4_K_XL.gguf \
    --ctx-size 262144 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 \
    --repeat-penalty 1.1 --reasoning on --verbosity 3

The command for Ollama:

ollama --verbose run qwen3.5:4b

For both Ollama and llama.cpp I used Vulkan for GPU acceleration.
The prompt was:
What does the word "erudite" mean?

The results were:

|                            | Ollama          | llama.cpp       |
|----------------------------|-----------------|-----------------|
| Prompt size                | 21 tokens       | 21 tokens       |
| Prompt processing duration | 89.01ms         | 83.57ms         |
| Prompt processing speed    | 235.92 tokens/s | 251.29 tokens/s |
| Generation size            | 794 tokens      | 742 tokens      |
| Generation time            | 13.80s          | 9.59s           |
| Generation speed           | 57.53 tokens/s  | 77.44 tokens/s  |

There was a noticeable bump in generation speeds for llama.cpp. This was not a perfectly fair comparison though. I did not use the exact same model file for each test, but that is because Ollama hashes the model files it downloads and I don't know how to run them in llama.cpp. I also don't know where Ollama downloads its model files from, and I am too lazy to find out.
However, llama.cpp should actually be at a disadvantage, because I used the maximum context window size and a much larger model file (5.6 GiB) for its test than I did with Ollama (3.2 GiB).
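As a quick sanity check, the generation speeds in the table are just generated tokens divided by wall-clock generation time (my own arithmetic on the numbers above; the tools report their own timings, so rounding makes the figures differ slightly):

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput = tokens generated / generation time."""
    return tokens / seconds

# Numbers from the table above.
ollama_speed = tokens_per_second(794, 13.80)
llamacpp_speed = tokens_per_second(742, 9.59)
print(f"Ollama: {ollama_speed:.2f} tok/s, llama.cpp: {llamacpp_speed:.2f} tok/s")
```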
Of course, Ollama uses (a fork of) llama.cpp under the hood, so I'm sure there are ways to tweak it to perform similarly. If you're going to do that though, you might as well use llama.cpp directly. It has a web UI, a routing mode, automatic model offloading, and recently added support for MCP servers. I'm not sure if Ollama offers anything llama.cpp does not.
Recently I have spent more time than I care to admit experimenting with llama.cpp, OpenCode, and generally just finding ways to make local models useful. If you have any more questions, feel free to ask.
-
Comment on Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x in ~tech
hungariantoast (edited )Link Parent
Anyone can implement TurboQuant. There's already work being done to add it to llama.cpp (what Ollama uses under the hood) and other inference software. It's also possible to use TurboQuant on existing models.
Also, Ollama is kind of bad :( . I'd recommend just using llama.cpp directly because it gives you more control over how a model is run, and you will get better performance.
For reference, here is the command I use to run a GGUF formatted Qwen3.5-4B model on my system:
llama-server --model ~/.local/models/hugging_face/unsloth/Qwen3.5-4B-GGUF/Qwen3.5-4B-UD-Q4_K_XL.gguf \
    --ctx-size 262144 --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 \
    --reasoning on --repeat-penalty 1.1

--ctx-size 262144: Sets the size of the "context window" (how many tokens the model can "remember" at once). I set it to 262,144, the maximum size supported by Qwen3.5 (without shenanigans). However, I am also using a GPU with 12GB VRAM, so you should probably drop this number down to about 128,000 (at least).
--temp 0.6: Sets the "temperature" of the model's output to 0.6 on a scale of 0 to 1. This controls how deterministic or random a model's responses will be. A lower temperature is more deterministic and "focused", while a higher temperature is more random and "creative". What type of work you want the model to do will determine the temperature you want to run it at. Keep in mind that even if you set the temperature to 0, a model's output is never truly, completely deterministic.
--top-p 0.95: Sets the "nucleus sampling probability" to 0.95. I don't understand this flag very well, but I think the way it works is that the model forms a list of tokens for the next prediction (such as the next word in a sentence). These tokens are sorted with their individual probabilities, in descending order (t1=0.5, t2=0.2, t3=0.1, t4=0.09, etc). The model then collects the first X number of tokens until the cumulative probability of all collected tokens is 0.95.
--top-k 20: Does kind of the same thing as top-p, but limits the number of tokens for the next prediction to 20.

--min-p 0.00: Filters out tokens whose relative probability to the highest probability token is below the threshold. A value of 0.00 disables the feature and allows for broader sampling for the next prediction.
--reasoning on: Enables the model's "reasoning" mode, where it "thinks" about the response it will give before actually responding. This is disabled by default for the 9B and smaller Qwen3.5 models. Whatever model you are using might not even support reasoning, and whether you want to enable it or not will depend on the model and the work it will do.
--repeat-penalty 1.1: Applies a slight penalty to repeated tokens to hopefully discourage idiot doom spirals. Applies a slight penalty to repeated tokens to hopefully discourage idiot doom spirals. Applies a slight penalty to repeated tokens to hopefully discourage idiot doom spirals. Applies a slight penalty to repeated tokens to hopefully disc^C
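To make the sampling flags above a bit more concrete, here is a rough Python sketch of how top-k, top-p, and min-p filtering interact. This is my own toy illustration of the general idea, not llama.cpp's actual sampler code (which also handles temperature, ordering of the samplers, and more):

```python
def filter_candidates(probs, top_k=20, top_p=0.95, min_p=0.0):
    """Return the token ids that survive top-k, top-p, and min-p filtering.

    probs: dict mapping token id -> probability (assumed to sum to 1).
    """
    # Sort tokens by probability, descending.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # top-k: keep only the k most likely tokens.
    ranked = ranked[:top_k]

    # top-p (nucleus): keep tokens until their cumulative probability
    # reaches the top_p threshold.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # min-p: drop tokens whose probability relative to the most likely
    # token falls below the threshold (0.0 disables this filter).
    best = kept[0][1]
    return [token for token, p in kept if p >= min_p * best]

# Example distribution (powers of two, so the sums are exact).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.0625, "e": 0.03125, "f": 0.03125}
print(filter_candidates(probs))  # ['a', 'b', 'c', 'd', 'e']
```

In the example, the nucleus stops at "e" because the cumulative probability first reaches 0.95 there; "f" never gets a chance to be sampled.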
-
Comment on A.T.L.A.S: outperform Claude Sonnet with a 14B local model and RTX 5060 Ti in ~tech
hungariantoast Link
I originally came across this on Lobsters and the OP there had a pretty good summary of the methodology:

A.T.L.A.S achieves 74.6% LiveCodeBench pass@1-v(k=3) with a frozen 14B model on a single consumer GPU -- up from 36-41% in V2 -- through constraint-driven generation and self-verified iterative refinement. The premise: wrap a frozen smaller model in intelligent infrastructure -- structured generation, energy-based verification, self-verified repair -- and it can compete with frontier API models at a fraction of the cost. No fine-tuning, no API calls, no cloud. Fully self-hosted -- no data leaves the machine, no API keys required, no usage metering. One GPU, one box.
You'd have to try it out to see how it works for you, but the trick they use is pretty clever. When you ask an AI to write code, it doesn’t always get it right. Sometimes the code has bugs, sometimes it misunderstands the problem entirely. A naive way to address that is to generate a few solutions and test each one. The odds that at least one works go way up. ATLAS generates multiple attempts, running each through a test suite. Each retry also gets told what went wrong with the previous attempt, so it can try to avoid the same mistake.
But this can be pretty slow since you have to run the code in an isolated environment, check the outputs, wait for it to finish. Doing that for every candidate quickly adds up. So ATLAS has another shortcut for avoiding unnecessary testing. Instead of simply generating solutions and testing all of them, it tries to predict which one is most likely correct before running any tests.
ATLAS also asks the model for an embedding of what it just wrote which acts as a fingerprint. Two similar pieces of code will produce similar fingerprints. A well-written, confident solution will produce a different fingerprint than a confused, buggy one.
These fingerprints get fed into a separate, much smaller neural network called the Cost Field. This little network was trained ahead of time on examples where they already knew which solutions were correct and which were wrong. It learned to assign a score to each fingerprint. Correct solutions get a low score and incorrect ones get a high one.
So the process is to generate multiple solutions, get their fingerprints, score each one, and pick the lowest. Only that one gets tested. The Cost Field picks correctly about 88% of the time according to the repo.
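That selection loop can be sketched in a few lines of Python. This is a toy illustration of the idea described above, not ATLAS's actual code: `embed` and `cost_field` are hypothetical stand-ins for the model's embedding output and the small pre-trained scoring network.

```python
def pick_candidate(solutions, embed, cost_field):
    """Score each candidate's embedding with the cost field and return the
    one with the lowest predicted cost (i.e. most likely to be correct)."""
    scored = [(cost_field(embed(code)), code) for code in solutions]
    scored.sort(key=lambda pair: pair[0])
    # Only this single winner gets run through the real test suite.
    return scored[0][1]

# Toy stand-ins: the "embedding" is just the code string, and the
# "cost field" assigns a high cost to an obviously suspicious marker.
solutions = ["def f(x): return x + 1", "def f(x): return x - 1  # BUG"]
cost = lambda fingerprint: 1.0 if "BUG" in fingerprint else 0.1
best = pick_candidate(solutions, embed=lambda code: code, cost_field=cost)
print(best)  # def f(x): return x + 1
```

The payoff is that the expensive step (sandboxed execution of the test suite) runs once instead of once per candidate.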
-
A.T.L.A.S: outperform Claude Sonnet with a 14B local model and RTX 5060 Ti
43 votes -
Comment on Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x in ~tech
hungariantoast Link
Blog post from Google:
https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/
And the paper:
https://arxiv.org/abs/2504.19874
-
Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x
44 votes -
Comment on Harry Potter and the Philosopher’s Stone | Teaser in ~tv
hungariantoast Link Parent
Okay sure, let's say the documentation I linked actually does constitute "exact rules" on how the malice label should be used, that I am misusing the label, and that Deimos wants me to stop misusing the label that way.
Even then, how have I acted maliciously or in bad faith? Because for that to be the case would require malicious intent. I really don't understand how you can read such intent in my original comment, especially if you "understand the mechanics" of how the malice label works.
You know what, it's not important. Let me just say the important thing: when I use the malice label to report a comment that I think has been mislabeled as exemplary, but is not otherwise malicious, I don't believe I am acting maliciously or in bad faith.
-
Comment on Harry Potter and the Philosopher’s Stone | Teaser in ~tv
hungariantoast (edited )Link Parent
I'm not sure how you could actually think that, unless you don't understand how the malice label works. Let me break that down for you:
When you label a comment with the malice label, you have to write and submit a message before the label is actually applied (just like with the exemplary label). Once you write that message and apply the label, Deimos receives a notification of the malice label's usage and your message along with it. Deimos then reads the notification and your message, and acts accordingly.
The malice label does not apply any sort of penalty to the comment it was used on, unlike the joke, offtopic, or noise labels. The malice label's only immediate effect is to notify Deimos of a problem.[1]
1. At least, I'm pretty sure that's the case. If it's not the case, and the malice label actually does apply some sort of rank penalty to the comment, then tough titties. Think of it, in the context of its use for an inappropriately used exemplary label, as an offset of that inappropriate usage ;)
-
Comment on Harry Potter and the Philosopher’s Stone | Teaser in ~tv
hungariantoast Link Parent
You can still use the malice label and just write in the message something like "I don't actually think this is malicious but it isn't exemplary either". I've done that before and, while Deimos has never told me it's okay, he's never told me to stop doing it either :)
To get tagging permission? Yes