A.T.L.A.S: outperform Claude Sonnet with a 14B local model and RTX 5060 Ti
- Title: GitHub - itigges22/ATLAS: Adaptive Test-time Learning and Autonomous Specialization
- Authors: itigges22
- Word count: 1354 words
I originally came across this on Lobsters and the OP there had a pretty good summary of the methodology:
If I follow along with what they're describing, then I'm unclear on how easily this is generalizable. If the smaller Cost Field network needs to be trained on what a good solution looks like, doesn't that make it limited to a fairly narrow slice of problems that it was trained on? Or am I getting tripped up by the simplified explanation?
Running locally is great for privacy, but if this works as well as claimed, even renting it by buying credits from some datacenter provider could be awesome, because the price per task is about 10x lower than the big providers behind GPT or Claude.
Ever since I needed a vision-language model for OCR in a hobby project, I've been quite excited about models like this. I found out that I could OCR 9000 webcomics for about 2.50 USD, because even the biggest Qwen model, run through an independent provider, is that much cheaper than ChatGPT, and for this task it's just as good.
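For anyone wondering how a number like that pencils out, here's a back-of-envelope sketch. Every price and token count below is an assumption I picked for illustration, not a quote from any specific provider:

```python
# Back-of-envelope: why ~9000 comic panels can land around $2.50 total.
# All prices and token counts are illustrative assumptions.

PRICE_IN = 0.20 / 1_000_000   # $ per input token (assumed)
PRICE_OUT = 0.60 / 1_000_000  # $ per output token (assumed)

tokens_per_image = 1_000       # rough token cost of one encoded panel (assumed)
tokens_per_transcript = 150    # short OCR output per panel (assumed)

cost_per_panel = tokens_per_image * PRICE_IN + tokens_per_transcript * PRICE_OUT
total = 9_000 * cost_per_panel
print(f"${total:.2f}")  # → $2.61
```

The exact figure obviously depends on the provider's rates and how many tokens each image encodes to, but the order of magnitude checks out.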
I've considered doing this (loading a large model into a cloud host) but hadn't found a project that warranted it yet, so I haven't looked at provider pricing or features.
Who did you end up using?
I just used openrouter.ai because it's simple: it let me test all the models before settling on one that's good enough. They obviously need some overhead cost to be sustainable, but it seemed low enough to be inconsequential for me.
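If it helps anyone, the nice thing about OpenRouter is that it speaks the OpenAI chat-completions format, so a request with an image attached looks like this. The model slug and image URL below are placeholders, not a recommendation; you'd POST this payload to OpenRouter's chat-completions endpoint with your API key in the Authorization header:

```python
def build_ocr_request(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style multimodal chat payload.

    The message shape (text part + image_url part) follows the OpenAI
    chat-completions format, which OpenRouter accepts as-is.
    """
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

payload = build_ocr_request(
    "qwen/qwen2.5-vl-72b-instruct",          # example slug; check the catalog
    "Transcribe all text in this panel.",
    "https://example.com/panel-0001.png",    # placeholder image URL
)
```

Swapping models to compare quality is then just changing the slug, which is what made the "test everything, keep the cheapest one that's good enough" workflow easy.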
Models.dev is similar to OpenRouter in that it lists information for inference providers. However, it's just a database, not a service. The idea is: you figure out what model you want to use, then go to models.dev to find the best provider for that model. If you are only interested in using a single model (or family of models, like Qwen or GLM), then it is often cheaper to use the provider's API directly (or their subscription, depending on usage) than to go through OpenRouter.
OpenCode Zen is a "provider aggregator" like OpenRouter, where you have a single account, but get access to many providers and models. Zen does not have a "platform fee" like OpenRouter. The models and providers available through Zen are also more curated, but that means Zen's selection of models and providers is limited compared to OpenRouter.
There's also OpenCode Go. I actually bought this a few days ago, because the first month is only $5, and I wanted to try larger models that I cannot self host. The subscription is normally $10/month though, and only gives access to four models:
If you have never worked with big models before, OpenCode Go is what I would recommend trying first. The GLM and Kimi models are good. The MiniMax models are fine for most things, but not as capable in my (limited) experience.
I also think Go's usage limits are good. I have read people online complain about hitting their weekly limits in a single day. I don't know how they could manage that, unless they were running a model in an automated loop to slopcode a project entirely with AI. With the way that I use models[1], I have struggled to hit 10% of the subscription's five-hour usage limit, let alone an entire week's worth. I'm still very new to using LLMs though. Your experience might be different.
If your AI usage is very high, then a subscription from a specific provider will (most likely) work out to be cheaper than API pricing (API pricing through that provider alone, or through an aggregator). I am not aware of any "provider aggregators" like OpenRouter or OpenCode Zen that offer a subscription instead of API pricing.
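The break-even is easy to sketch. All the numbers here are assumptions for illustration (not anyone's actual price list); the point is just that past some monthly token volume, the flat subscription wins:

```python
# Rough break-even between a flat subscription and pay-per-token API pricing.
# Every number below is an assumed, illustrative figure.

SUBSCRIPTION = 10.00               # $/month flat (assumed)
API_PRICE_IN = 0.50 / 1_000_000    # $ per input token (assumed)
API_PRICE_OUT = 2.00 / 1_000_000   # $ per output token (assumed)

def api_cost(tokens_in: int, tokens_out: int) -> float:
    """Pay-per-token cost for one month of usage."""
    return tokens_in * API_PRICE_IN + tokens_out * API_PRICE_OUT

# A "heavy month": 15M input tokens, 2M output tokens.
heavy = api_cost(15_000_000, 2_000_000)   # 7.50 + 4.00 = 11.50
print(heavy > SUBSCRIPTION)               # subscription wins at this volume
```

Light users stay under the flat fee on API pricing; heavy users blow past it, which is why the high-usage crowd gravitates to subscriptions.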
[1] …"bugs.md with your recommended fix".

It's easy to start hitting limits if you run things in parallel. My weekly code audit gets into this territory: I have it create GitHub issues for all the things it finds, and then it runs through them all in parallel to produce fully tested pull requests that I can sift through afterwards.
I experimented with rnj-1 on my local 3070 over the weekend; at 8B it fits on my card. In its self-reported metrics, it generally outperforms Qwen 3's 8B (ATLAS ran with Qwen 3 14B). I wonder what kind of quality ATLAS can provide for smaller models.
I would be interested in trying this. The prices for the cards are not high and I have enjoyed working with Claude Sonnet. Can someone provide a rundown on how this might compare on general knowledge or scientific tasks?
Models this small do not have much general knowledge. They are usually better at semantic extraction or transformation.