Tildes

Showing only topics with the tag "language models".

A theory of prompt injection (and why you should study roles)

~comp Article 4684 words

5 comments

role-confusion.github.io

2 days ago

27 votes
Power consumption of LLMs

~tech Ask

I haven't been closely following the releases of new models and the research papers that sometimes go along with them. So I was wondering: is power consumption ever seriously talked about by OpenAI, Google, Anthropic, and others? Do we get specific numbers for how much power their models actually consume to produce 100 tokens? The cost of training some of these models? Or are we not at that stage yet, and nobody cares, so at best we get really rough estimates based on "trust me bro" tweets by their respective CEOs?

Some time ago, I came across this post. GPT-OSS (20B, 120B) are meager yet energy-efficient models, and I was surprised that the power consumption is still larger than I had estimated.

For 120B to generate 1000 tokens (which is really not a lot), it would take around 83 Wh of electricity. For context, my PC draws ~100 W when idling, so it's almost equivalent to leaving my PC on for an hour. Considering that proprietary models have TRILLIONS of parameters and are probably not as energy efficient, the true power consumption of these models is concerning. Of course, these big data centers do a lot of things to maximize efficiency that this "test" fails to do. But even if you halve the energy consumption, it's still significant given their size and their pervasiveness in handling everyday, trivial tasks.
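
The arithmetic behind that comparison can be made explicit. This is a minimal back-of-envelope sketch that uses only the figures quoted in the post (83 Wh per 1000 tokens for GPT-OSS 120B, and an idle draw of about 100 W). It is not an independent measurement, and the function names are illustrative. It also shows why the idle figure is a power in watts: energy divided by power gives a duration.

```python
# Back-of-envelope check using the figures quoted in the post.
# Inputs are the post's numbers, not independent measurements.

def wh_per_token(total_wh: float, tokens: int) -> float:
    """Energy per generated token, in watt-hours."""
    return total_wh / tokens


def idle_hours_equivalent(total_wh: float, idle_watts: float) -> float:
    """Hours a device drawing `idle_watts` (a power, in W) runs on `total_wh` of energy."""
    return total_wh / idle_watts


GEN_WH = 83.0      # energy for 1000 generated tokens, as quoted
TOKENS = 1000
PC_IDLE_W = 100.0  # idle power draw; Wh divided by W gives hours

per_token = wh_per_token(GEN_WH, TOKENS)
hours = idle_hours_equivalent(GEN_WH, PC_IDLE_W)
halved = GEN_WH / 2  # the "even if you halve it" case

print(f"{per_token:.3f} Wh/token, {hours:.2f} h of idle PC, halved: {halved} Wh")
# prints: 0.083 Wh/token, 0.83 h of idle PC, halved: 41.5 Wh
```

At these numbers, 1000 tokens corresponds to about 0.83 h (roughly 50 minutes) of an idling PC. That is where the "almost an hour" comparison comes from.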

I haven't come across similar posts or studies for newer open source models, so if somebody has, please share.

Also, I don't fully understand how context size fits into all of this. Does a larger context size take more power to produce those 100 tokens?

10 comments

milkywayflyinginsect

6 days ago

26 votes
How to buy cheap Claude tokens in China

~tech Article 1868 words

2 comments

chinatalk.media

June 25

36 votes
Does generative AI have a natural limit without a major innovation?

~comp Ask

I was musing about this recently with the recent models becoming more capable. The core of gen AI is the model, which is trained on a massive dataset. To date, gen AI has improved because the models have become larger and more efficient, the data they are trained on has become better, and the software/harnesses around them have improved to help query them.

As I see it, surely the bottleneck will soon become the data they are trained on? Imagine a scenario where a model could consume an infinite amount of training data, with no limit on training time or quality: the sum of human skill/knowledge would then be the limiting factor. Gen AI should (in theory) never be able to outperform or push the boundary of the sum of humanity at the time of training.

Or, counterpoint: is there enough randomness and speed of iteration that gen AI could actually make step changes and improve if training times/costs were less prohibitive? Most companies/models today will save good output and feed it back into the next iteration, but right now that takes months. What if it took minutes?

What do you think?

Is gen AI going to take us to general intelligence?
Will gen AI get to a place where its "intelligence" and reasoning are actually better than the sum of Humanity?

42 comments

kaffo

June 14

28 votes
AI is bringing my friend out of retirement
~comp
- programming
Ask
I have a friend that is lucky enough to have retired at 40. A year ago he was adamant he'd never work again, having been burnt out from his time at big tech. Back then he was also an absolute AI hater and wouldn't listen to anyone who claimed LLMs were useful for programming.

He finally tried LLMs when Claude Opus 4.6 released and immediately changed his mind in the face of the overwhelming evidence that LLMs can in fact program pretty well. And now with the release of Fable 5 he's giddily creating all sorts of things that would have taken far too long to make prior to AI-accelerated software development. He actually plans to try and found his own business now. He's a very smart guy, so I hope he can make something interesting that people want.

There are a lot of AI doomers and haters. In person I mostly see people doing the same thing they've always done, but now saving time on various tasks. But this is the first time I've seen someone go from grumpy and checked out to giddy and optimistic thanks to LLMs.

19 comments

teaearlgraycold

June 10

38 votes
What about having an LLM teach you to code?
~comp
- programming
Ask
My daughter (11) is doing a week long Python class, which is not using LLMs.

It got me thinking about how I learned to program in the pre-internet days (laboriously, from books), and then what a marvel it was when you could just search for information, especially for troubleshooting. But for her, the first answer in the Google search is going to be the AI summary, and most of her search tools are going to be AI tools.

I wonder if it would be possible to make an LLM that has a didactic/Socratic mode. So if you said, "help me write a program to do madlibs", maybe it would give you a skeleton of a function, then prompt you to come up with a plan, then critique that plan. Or if you said, "I'm getting this error", it wouldn't just fix it; it would explain what the error means and nudge you toward the answer.

Thinking in a larger sense, it could have a rubric of important concepts, even tiers of understanding. It could be using the interactions to track the user's understanding, which could let it then tune how it answers future questions, or even be used to customize assignments.

I recognize that this is potentially replacing a teacher with a machine, which wouldn't be my goal. Good teachers are more holistic in their teaching than a machine is ever likely to be. But for people who don't have access to good teachers, or need more directed support than is available from a teacher, or just want to self study, it seems like it could be a valuable addition.

Until they solve the obsequiousness problem, it would be vulnerable to prompt hacking, so really more of a tool for someone who recognizes the value of learning over just being given the answer.

What do folks think about using such a tool? What would you want it to do, or not do?

Aside: I forgot until I reached the end of this post, but this is also (somewhat) the plot of The Diamond Age, or A Young Lady's Illustrated Primer by Neal Stephenson.

23 comments

first-must-burn

June 16

25 votes
Access to Fable and Mythos 5 cut off after US government order

~tech Article 759 words

58 comments

anthropic.com

June 13

57 votes
Will you be left behind if you don't use LLMs to code?
~comp
- programming
Video 18:12, published May 20 2026
15 comments

YouTube: You Suck at Programming

June 15

17 votes
Our workplace LLM mass delusion

~tech Article 1669 words

14 comments

avas.space

June 11

40 votes
Landmark German ruling declares Google's AI Overviews are Google's own words and makes it liable for false answers
~tech
- google.search
Article 1200 words
16 comments

the-decoder.com

June 11

80 votes
Claude Fable 5 and Claude Mythos 5

~tech Article 2830 words

49 comments

anthropic.com

June 9

44 votes
If Claude Fable stops helping you, you'll never know

~tech Article 445 words

8 comments

jonready.com

June 10

33 votes
The user is visibly frustrated

~tech Article 605 words, published May 6 2026

44 comments

pscanf.com

June 2

39 votes
Fine-tuning an LLM to write docs like it's 1995

~comp Article 1779 words, published Jun 1 2026

1 comment

passo.uno

June 8

11 votes
Code is cheap(er)

~comp Article 867 words

13 comments

htmx.org

June 5

23 votes
When AI builds itself — progress toward recursive self-improvement and its implications

~tech Article 4721 words

27 comments

anthropic.com

June 4

25 votes
Have you tried PewDiePie's self-hosted AI workspace, Odysseus?
~tech
- linux
- privacy
Video 17:07, published May 31 2026
30 comments

YouTube: PewDiePie

June 4

18 votes
rsync and outrage
~comp
Article 1393 words
10 comments

Medium: Andrew Tridgell

June 5

32 votes
Did Claude increase bugs in rsync?
~comp
- open source
- security
Article 3875 words
1 comment

alexispurslane.github.io

June 5

21 votes
Clanker: A word for the machine

~tech Article 1973 words, published May 26 2026

45 comments

pocoo.org

June 1

40 votes
It's not just X. It's Y.

~humanities Article 2021 words

11 comments

cyberneticforests.com

June 1

29 votes
Introducing WebGPU support for llama.cpp

~tech Link

2 comments

reeselevine.github.io

May 29

12 votes
Actually useful MCPs
~comp
- programming
Ask (advice)
I'm a web developer and find the Playwright MCP to be genuinely useful. My LLM is able to navigate my site, measure the size of elements, see console errors, network requests, etc. This is the only MCP I've ever installed, and I haven't yet had any cause to use others. But I'm interested in hearing what other professionals are using.

16 comments

teaearlgraycold

May 28

28 votes
I think Anthropic and OpenAI have found product-market fit

~tech Article 1809 words

51 comments

simonwillison.net

May 29

32 votes
Language models are weird for the same reason human cultures are weird

~comp Article 3434 words

3 comments

davidoks.blog

May 24

26 votes
The silent critic
~comp
- open source
Article 1761 words
1 comment

tft.io

May 29

3 votes
Project Glasswing: An initial update
~tech
- security
Article 2555 words
12 comments

anthropic.com

May 22

24 votes
How I feel about LLM (AI) writing

~tech Ask

I love writing; it's one of the most human things about humanity. It's communication, art, and sharing all at once. It's been fundamental to culture and progress for thousands of years.

LLMs are, in a way, really good at writing. They have the larger part of human creative output distilled into their weights. So it was inevitable that more and more people would start publishing articles and blog posts written (all or in part) by AI agents.

I don't like it, but I accept it; there really isn't anything I can do about it. What I was hoping, though, is that high signal-to-noise places on the internet (Tildes among them) would reject it and we could go on consuming 100% organic prose, at least for a while.

And for a while, that's exactly what happened. In techy places like Hacker News, AI posts were quickly flagged and downvoted into oblivion. At Tildes they mostly didn't show up at all, or if they did, I missed them.

That seems to be ending, though. Now I see agent-written pieces on the front page of HN with hundreds of comments. There's always a highly upvoted comment pointing out that the piece is slop, but you have to scroll to find it.

The reason I use HN as an example is that it's full of people with extensive experience using AI agents who are in a position to tell if something is slop. And it looks like the larger part of readers (or at least commenters) can't tell the difference anymore. If that's true at HN, it's going to be true everywhere.

It is getting harder to tell when something is slop: people are post-editing, hand-writing intros, and getting better at prompting to remove obvious LLM tells. But if you have any practical experience with these tools, it's still pretty easy to tell. Somewhere during post-training, certain patterns end up heavily favored. Interestingly, many of them show up across all of the frontier models. Em-dashes are the most famous, but there are many more, and most are rhetorical tricks or formatting patterns rather than punctuation.

When you read LLM prose, many of the tropes don't stand out at first; instead, they land as strong writing. But after you see them repeat enough times, they become obvious. Even putting the tropes aside, the hallmark of a lot of LLM writing is that it's more rhetoric than substance. Low signal, lots of noise.

I don't have a solution. It's starting to look like many (maybe most) people aren't going to be able to tell when they're consuming something that required minimal thought from the "author" who prompted the AI. That's sad because, until now, we could assume that when we read something, someone cared enough to put time and mental bandwidth into creating it. That's becoming less and less true.

I suppose this post is me feeling wistful for the internet we used to have, written exclusively by humans. I continue to hope that people will reject slop at places like Tildes, but to do that they have to be able to identify it. Maybe people will get better at that; there is definitely a point where you've consumed enough slop that you can smell it from a mile away. But of course, the slop is going to keep getting harder to detect.

I don't want to go as far as to say that slop will take over the internet. I think (hope) that people will keep wanting to read organic, human writing, and that as a result we'll come up with strategies and solutions to support that.

It's a weird time. Right now every LLM blog post and article that goes viral is signalling to the prompter, and anyone watching who can tell what's happening, that there is demand for slop. And of course with demand comes profit. I think we're at the beginning of a steep curve.

61 comments

post_below

May 12

44 votes
Aurora: A leverage-aware optimizer for rectangular matrices

~comp Article 5001 words

7 comments

tilderesearch.com

May 9

14 votes
Gemini 3.2 Flash rumored to hit 92% of GPT-5.5 performance at lower cost
~tech
- google
Link
29 comments

di.gg

May 14

23 votes
AI chatbots

~tech Video 29:43, published Apr 27 2026

1 comment

YouTube: LastWeekTonight

May 12

9 votes
Multi-Token Prediction (MTP) with Gemma 4

~comp Link

6 comments

maartengrootendorst.com

May 5

20 votes
Teaching Claude why

~comp Article 1745 words

2 comments

anthropic.com

May 9

17 votes
For thirty years I programmed with Phish on, every day. In 2026, the music is out of phase with the work.

~tech Article 1494 words

19 comments

christophermeiklejohn.com

May 4

32 votes
Synthesizing multi-agent harnesses for vulnerability discovery
~comp
- security
Article
2 comments

arXiv

April 24

9 votes
Prototyping with LLMs

~tech Article 362 words, published Apr 6 2026

8 comments

jim-nielsen.com

April 10

23 votes
Florida opens criminal inquiry into ChatGPT tied to fatal school shooting
~news
- usa.fl
- crime
Article
11 comments

The New York Times

April 21

22 votes
AI: Where in the loop should humans go?
~comp
- programming
Article 2978 words, published Mar 7 2025
4 comments

ferd.ca

April 16

18 votes
Vibe coding is just the return of Excel/Access, with more danger
~comp
- programming
Ask
I probably triggered some PTSD right there.

I was just in a meeting at work where we listed off everything that makes software development hard and slow. An exercise for the thread would be to replicate that list. It turned out that Claude helps with maybe a fifth or less of it... especially in a collaborative environment.

So, the situation we're now encountering is that random business areas can vibe code out something, tell nobody, throw it in AWS, have it become a critical part of a business process that fails when they quit, and nobody even has access to look at what was made.

It gives me comfort that in about 5 years there will be a new surge in demand for programmers to rein in all the rogue applications that need to be shut down because of the immense risk they pose to a company's continued operation, from data leaks to broken payroll.

It'll be Y2K all over again.

39 comments

vord

April 16

45 votes
Static analysis, dynamic analysis, and stochastic analysis
~comp
- programming
Ask
For a long time programmers have had two types of program verification tools, static analysis (like a compiler's checks) and dynamic analysis (running a test suite). I find myself using LLMs to analyze newly written code more and more. Even when they spit out a lot of false positives, I still find them to be a massive help. My workflow is something like this:
1. Commit my changes
2. Ask Claude Opus "Find problems with my latest commit"
3. Look through its list and skip over false positives.
4. Fix the true positives.
5. git add -A && git commit --amend --no-edit
6. Clear Claude's context
7. Back to step 2.
I repeat this loop until all of the issues Claude raises are dismissable. I know there are a lot of startups building a SaaS for things like this (CodeRabbit is one I've seen before, I didn't like it too much) but I feel just doing the above procedure is plenty good enough and catches a lot of issues that could take more time to uncover if raised by manual testing.
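
The loop above can be sketched in Python. This is a minimal illustration, not the poster's actual tooling. `review` is a hypothetical callable standing in for "ask the model to find problems with my latest commit" in a fresh context, `is_real` is the human triage step, and `fix` applies the fixes. Only the amend step uses real commands: the same `git add -A` and `git commit --amend --no-edit` from step 5. The `max_rounds` cap is an added assumption, so that a reviewer that never runs out of real complaints can't loop forever.

```python
import subprocess
from typing import Callable, List


def amend_commit() -> None:
    """Fold all working-tree changes into the latest commit (step 5)."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "--amend", "--no-edit"], check=True)


def review_loop(
    review: Callable[[], List[str]],   # hypothetical: fresh-context LLM review of the latest commit
    is_real: Callable[[str], bool],    # human triage: True for a true positive
    fix: Callable[[List[str]], None],  # apply fixes for the true positives
    amend: Callable[[], None] = amend_commit,
    max_rounds: int = 10,              # assumed safety cap, not part of the original workflow
) -> int:
    """Repeat review -> triage -> fix -> amend until every raised issue is dismissable.

    Returns the number of review rounds it took. Raises RuntimeError if the
    reviewer still reports real issues after `max_rounds` rounds.
    """
    for round_no in range(1, max_rounds + 1):
        issues = review()  # steps 2 and 6: each call starts from a clean context
        real = [issue for issue in issues if is_real(issue)]  # step 3: skip false positives
        if not real:
            return round_no  # everything left was a false positive
        fix(real)  # step 4
        amend()    # step 5
    raise RuntimeError(f"still finding real issues after {max_rounds} rounds")


if __name__ == "__main__":
    # Demo with stand-in callables: round 1 reports a bug and a nit, round 2 only the nit.
    reports = iter([["bug: off-by-one in pager", "nit: naming"], ["nit: naming"]])
    fixed: List[str] = []
    rounds = review_loop(
        review=lambda: next(reports),
        is_real=lambda issue: issue.startswith("bug"),
        fix=fixed.extend,
        amend=lambda: None,  # skip git in this demo
    )
    print(rounds, fixed)  # prints: 2 ['bug: off-by-one in pager']
```

Passing the review, triage, fix, and amend steps in as callables keeps the loop's stopping rule separate from any particular model or CLI. The stopping rule is "stop when nothing raised is a true positive".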

It's also been productive to ask for any problems in an entire repo. It will of course never be able to perform a completely thorough review of even a modestly sized application, but highlighting any problem at all is still useful.

Someone recently mentioned to me that they use vision-capable LLMs to perform "aesthetic tests" in their CI. The model takes screenshots of each page before and after a code change and throws an error if it thinks something is wrong.
10 comments

teaearlgraycold

April 15

10 votes
That one study that proves developers using AI are deluded

~tech Ask

I've found myself replying to different people about the early 2025 METR study fairly often. So I thought I'd try posting a top-level thread; consider it an unsolicited public service announcement.

You might be familiar with the study because it has been showing up alongside discussions about AI and coding for about a year. It found that LLMs actually decreased developer productivity and so people love to use it to suggest that the whole AI coding thing is really a big lie and the people who think it makes them more productive are hallucinating.

Here's the thing about that study... No one seems to have even glanced at it!

First, it's from early 2025; they used Claude Sonnet 3.5 or 3.7. Those models are in no way comparable to current-gen coding agents. The commonly cited inflection point didn't happen until later in 2025 with, depending on who you ask, Sonnet 4.5 or Opus 4.5.

The study consisted of 16 people! If those 16 were even vaguely representative of the developer population at the time, most of them wouldn't have had significant experience with LLMs for coding.

These are not tools that just work out of the box, especially back then. It takes time and experimentation, or instruction, to use them well.

It was cool that they did the study; trying to understand LLMs was a good idea. But it's not what anyone would consider a representative, or even well thought out, study. 16 people!

But wait! They did a follow up study later in 2025.

This time with about 60 people and newer models and tools. In that study they found the opposite effect: AI tools sped developers up (which is a shock to no one who has used these tools long enough to get a feel for them). They also mentioned:

However the true speedup could be much higher among the developers and tasks which are selected out of the experiment.

In addition, they had some (kind of entertaining) issues:

Due to the severity of these selection effects, we are working on changes to the design of our study.

Back to the drawing board, because:

Recruitment and retention of developers has become more difficult. An increased share of developers say they would not want to do 50% of their work without AI, even though our study pays them $50/hour to work on tasks of their own choosing. Our study is thus systematically missing developers who have the most optimistic expectations about AI’s value.

And...

Developers have become more selective in which tasks they submit. When surveyed, 30% to 50% of developers told us that they were choosing not to submit some tasks because they did not want to do them without AI. This implies we are systematically missing tasks which have high expected uplift from AI.

And so...

Together, these effects make it likely that our estimate reported above is a lower-bound on the true productivity effects of AI on these developers.

[...]

Some developers were less likely to complete tasks that they submitted if they were assigned to the AI-disallowed condition. One developer did not complete any of the tasks that were assigned to the AI-disallowed condition.

[...]

Altogether, these issues make it challenging to interpret our central estimate, and we believe it is likely a bad proxy for the real productivity impact of AI tools on these developers.

So to summarize, the new study showed a productivity increase, and they estimate the true effect is larger than the ~20% increase the study found. Cheers to them for being honest about the issues they encountered. For my part, I know for sure that the increase is significantly more than 20%. The caveat, though, is that this is only true after you've had some experience with the tools.

The truth is that we don't need a study for this; any experienced engineer can readily see it for themselves, and you can find them talking about it pretty much everywhere. It would be interesting, though, to see a well-designed study that attempted to quantify how big the average productivity increase actually is.

For that the participants using AI would need to be experienced with it and allowed to use their existing setups.

I want to add that this is not an attempt to evangelize for AI. I find the tools useful but I'm not selling anything. I'm interested in them and I stay up to date on the conversations surrounding them and the underlying technology. I use them frequently both for my own projects and to help less technical people improve their business productivity.

Whether AI agents are a good thing or not, from a larger perspective, is a very different, and complicated, conversation. The important thing is that utility and impact are two different conversations. There isn't a debate anymore about utility.

I know this probably won't stop people from continuing to derail conversations with the claim that developers are wrong about utility, but I had to try. It's just hard to let it pass by when someone claims the sky is green.

I understand that AI makes people angry and I think they have good reason to be angry. There are a lot of aspects of the AI revolution that I'm not thrilled about. The hype foremost, the FOMO as part of the hype, the potential for increased wealth consolidation really sucks, though I lay that at the feet of systems that existed before LLMs came along.

It's messy, but let's consider giving the benefit of the doubt to professionals who say a tool works instead of claiming they're wrong. Let them enjoy it. We can still be angry at AI at the same time.

61 comments

post_below

March 19

82 votes
The center has a bias

~tech Article 1531 words

91 comments

pocoo.org

April 13

35 votes
Anthropic announces deal with Google, Broadcom, says revenue has tripled
~finance
- business
Article 441 words
27 comments

Quartz

April 9

31 votes
AI Coding agents are the opposite of what I want
~comp
- programming
Ask
I've been thinking a lot about LLM assisted development, and in particular why I keep dropping the available tools after a few attempts at using them.

I realized recently that it's taking away the part of software development I enjoy: the creative problem solving that comes with writing code. What's left is code review tasks, testing, security checks, etc. Important tasks, but they all primarily involve heavy concentration, and much less creativity.

Why aren't agents focused on handling the mundane tasks instead? Tell me if I've just introduced a security vulnerability or a runtime bug. Generate realistic test data and give me info on what the likely output would be. Tell me that the algorithm I just wrote is O(n^2).

Those tasks are so much more applicable to matching against existing data, something LLMs should be extremely good at, rather than trying to get them to write something novel, which so far they've been mostly bad at, at least in my experience.

47 comments

karsaroth

April 6

46 votes
Project Glasswing: securing critical software for the AI era
~tech
- security.cyber
Article 1053 words
15 comments

anthropic.com

April 7

25 votes
Claude Mythos preview
~tech
- security.cyber
Article 13 495 words
4 comments

anthropic.com

April 7

25 votes
Harm reduction centered on AI use

~tech Video 1:30:12, published Apr 2 2026

2 comments

YouTube: Dr. Fatima

April 7

9 votes
Gemma needs help

~comp Article 1894 words, published Mar 10 2026

19 comments

lesswrong.com

March 25

31 votes
Designing an agent reading test

~comp Article 3887 words

1 comment

dacharycarey.com

April 6

10 votes
Here’s what the world had to say about the AI economy

~tech Article 674 words

18 comments

windfalltrust.org

April 3

18 votes