Showing only topics in ~comp with the tag "language models".

A theory of prompt injection (and why you should study roles)

Article 4684 words

5 comments

role-confusion.github.io

2 days ago

26 votes
Does generative AI have a natural limit without a major innovation?

Ask

I was musing about this recently with the recent models becoming more capable. The core of gen AI is the model, which is trained on a massive dataset. To date, gen AI has improved because the models have become larger and more efficient, the data they are trained on has become better, and the software/harnesses around them have improved to help query them.

As I see it, surely the bottleneck will soon become the data they are trained on? Imagine a scenario where a model could consume an infinite amount of training data, with no limit on training time or quality. The sum of human skill/knowledge is then the limiting factor. Gen AI should (in theory) never be able to outperform or push the boundary of the sum of humanity at the time of training.

Or, counterpoint, is there enough randomness and speed to iterate that gen AI can actually step change and improve if training times/cost were less prohibitive? Most companies/models today will save good output and feed it back into the next iteration, but right now that's taking months. What if that took minutes?

What do you think?

Is gen AI going to take us to general intelligence?
Will gen AI get to a place where its "intelligence" and reasoning is actually better than the sum of humanity?

42 comments

kaffo

June 14

28 votes
AI is bringing my friend out of retirement
- programming
Ask

I have a friend that is lucky enough to have retired at 40. A year ago he was adamant he'd never work again, having been burnt out from his time at big tech. Back then he was also an absolute AI hater and wouldn't listen to anyone who claimed LLMs were useful for programming.

He finally tried LLMs when Claude Opus 4.6 released and immediately changed his mind in the face of the overwhelming evidence that LLMs can in fact program pretty well. And now with the release of Fable 5 he's giddily creating all sorts of things that would have taken far too long to make prior to AI-accelerated software development. He actually plans to try and found his own business now. He's a very smart guy, so I hope he can make something interesting that people want.

There are a lot of AI doomers and haters. In person I mostly see people doing the same thing they've always done, but now saving time on various tasks. But this is the first time I've seen someone go from grumpy and checked out to giddy and optimistic thanks to LLMs.

19 comments

teaearlgraycold

June 10

38 votes
What about having an LLM teach you to code?
- programming
Ask

My daughter (11) is doing a week long Python class, which is not using LLMs.

It got me thinking about how I learned to program in the pre-internet days (laboriously, from books), and then what a marvel it was when you could just search for information, especially for troubleshooting. But for her, the first answer in the Google search is going to be the AI summary, and most of her search tools are going to be AI tools.

I wonder if it would be possible to make an LLM that has a didactic/Socratic mode. So if you said, "help me write a program to do madlibs" maybe it would give you a skeleton of a function, then prompt you to come up with a plan, then critique that plan. Or if you said, "I'm getting this error", it wouldn't just fix it, it would explain what the error means and nudge you towards the answer.

Thinking in a larger sense, it could have a rubric of important concepts, even tiers of understanding. It could be using the interactions to track the user's understanding, which could let it then tune how it answers future questions, or even be used to customize assignments.
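The rubric idea above could be sketched roughly as follows. This is a minimal illustration, not a description of any existing tutoring tool: the tier names, the `Rubric` class, and the hint styles are all hypothetical, and a real system would need a far better signal than a single solved/unsolved flag.

```python
from dataclasses import dataclass, field

# Hypothetical tiers of understanding; the post does not define any.
TIERS = ["unseen", "introduced", "practicing", "confident"]


@dataclass
class Rubric:
    """Tracks one learner's tier for each concept (all names are illustrative)."""

    levels: dict[str, int] = field(default_factory=dict)

    def record(self, concept: str, solved_unaided: bool) -> None:
        # Move up one tier after an unaided success, down one after a miss,
        # clamped to the first and last tier.
        current = self.levels.get(concept, 0)
        step = 1 if solved_unaided else -1
        self.levels[concept] = max(0, min(len(TIERS) - 1, current + step))

    def tier(self, concept: str) -> str:
        return TIERS[self.levels.get(concept, 0)]

    def hint_style(self, concept: str) -> str:
        # Less mastery gets more explanation; more mastery gets only a nudge.
        level = self.levels.get(concept, 0)
        if level <= 1:
            return "explain the error and show a skeleton"
        if level == 2:
            return "ask a guiding question"
        return "point at the relevant line only"
```

The chosen hint style could then be folded into the instructions given to the model, so that the same question gets a fuller explanation from a beginner's rubric and only a nudge from an advanced one.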

I recognize that this is potentially replacing a teacher with a machine, which wouldn't be my goal. Good teachers are more holistic in their teaching than a machine is ever likely to be. But for people who don't have access to good teachers, or need more directed support than is available from a teacher, or just want to self study, it seems like it could be a valuable addition.

Until they solve the obsequiousness problem, it would be vulnerable to prompt hacking, so really more of a tool for someone who recognizes the value of learning over just being given the answer.

What do folks think about using such a tool? What would you want it to do, or not do?

Aside: I forgot until I reached the end of this post, but this is also (somewhat) the plot of The Diamond Age, or A Young Lady's Illustrated Primer by Neal Stephenson.

23 comments

first-must-burn

June 16

25 votes
Will you be left behind if you don't use LLMs to code?
- programming
Video 18:12, published May 20 2026
15 comments

YouTube: You Suck at Programming

June 15

17 votes
Fine-tuning an LLM to write docs like it's 1995

Article 1779 words, published Jun 1 2026

1 comment

passo.uno

June 8

11 votes
Code is cheap(er)

Article 867 words

13 comments

htmx.org

June 5

23 votes
rsync and outrage
Article 1393 words
10 comments

Medium: Andrew Tridgell

June 5

32 votes
Did Claude increase bugs in rsync?
- open source
- security
Article 3875 words
1 comment

alexispurslane.github.io

June 5

21 votes
Actually useful MCPs
- programming
Ask (advice)

I'm a web developer and find the playwright MCP to be genuinely useful. My LLM is able to navigate my site, measure the size of elements, see console errors, network requests, etc. This is the only MCP I've ever installed and haven't yet had any cause to use others. But I'm interested in hearing what other professionals are using.

16 comments

teaearlgraycold

May 28

28 votes
Language models are weird for the same reason human cultures are weird

Article 3434 words

3 comments

davidoks.blog

May 24

26 votes
The silent critic
- open source
Article 1761 words
1 comment

tft.io

May 29

3 votes
Aurora: A leverage-aware optimizer for rectangular matrices

Article 5001 words

7 comments

tilderesearch.com

May 9

14 votes
Multi-Token Prediction (MTP) with Gemma 4

Link

6 comments

maartengrootendorst.com

May 5

20 votes
Teaching Claude why

Article 1745 words

2 comments

anthropic.com

May 9

17 votes
Synthesizing multi-agent harnesses for vulnerability discovery
- security
Article
2 comments

arXiv

April 24

9 votes
AI: Where in the loop should humans go?
- programming
Article 2978 words, published Mar 7 2025
4 comments

ferd.ca

April 16

18 votes
Vibe coding is just the return of Excel/Access, with more danger
- programming
Ask

I probably triggered some PTSD right there.

Was just in a meeting at work, where we listed off everything that makes software development hard and slow. An exercise for the thread would be to replicate that list. It turned out that Claude helps with like 1/5th or less of it... especially in a collaborative environment.

So, the situation we're now encountering is that random business areas can vibe code out something, tell nobody, throw it in AWS, have it become a critical part of a business process that fails when they quit, and nobody even has access to look at what was made.

It gives me comfort that in about 5 years there will be a new surge in demand for programmers to rein in all the rogue applications that need to be shut down because of the immense risk to the continued operation of a company, from data leaks to broken payroll.

It'll be Y2K all over again.

39 comments

vord

April 16

45 votes
Static analysis, dynamic analysis, and stochastic analysis
- programming
Ask

For a long time programmers have had two types of program verification tools, static analysis (like a compiler's checks) and dynamic analysis (running a test suite). I find myself using LLMs to analyze newly written code more and more. Even when they spit out a lot of false positives, I still find them to be a massive help. My workflow is something like this:
1. Commit my changes
2. Ask Claude Opus "Find problems with my latest commit"
3. Look through its list and skip over false positives.
4. Fix the true positives.
5. git add -A && git commit --amend --no-edit
6. Clear Claude's context
7. Back to step 2.
I repeat this loop until all of the issues Claude raises are dismissable. I know there are a lot of startups building a SaaS for things like this (CodeRabbit is one I've seen before, I didn't like it too much) but I feel just doing the above procedure is plenty good enough and catches a lot of issues that could take more time to uncover if raised by manual testing.
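The loop described above could be scripted along these lines. This is a sketch under stated assumptions rather than the poster's actual setup: `ask_model`, `triage`, and `fix` are hypothetical callables standing in for the LLM call, the human skim for false positives, and the manual fixes; only the git commands come from the post or from standard git usage.

```python
import subprocess
from typing import Callable, List

PROMPT = "Find problems with my latest commit"


def latest_commit_diff() -> str:
    # `git show HEAD` prints the most recent commit's message and patch.
    result = subprocess.run(
        ["git", "show", "HEAD"], capture_output=True, text=True, check=True
    )
    return result.stdout


def amend_commit() -> None:
    # Step 5 from the post: fold the fixes into the same commit.
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "--amend", "--no-edit"], check=True)


def review_loop(
    ask_model: Callable[[str], List[str]],    # hypothetical LLM call
    triage: Callable[[List[str]], List[str]], # human drops false positives
    fix: Callable[[List[str]], None],         # human (or tool) applies fixes
    get_diff: Callable[[], str] = latest_commit_diff,
    amend: Callable[[], None] = amend_commit,
    max_rounds: int = 10,
) -> int:
    """Run review rounds until triage keeps nothing; return the rounds run."""
    for round_number in range(1, max_rounds + 1):
        # A fresh call each round stands in for clearing the model's context.
        issues = ask_model(f"{PROMPT}\n\n{get_diff()}")
        real_issues = triage(issues)
        if not real_issues:
            return round_number
        fix(real_issues)
        amend()
    return max_rounds
```

Passing the git steps in as parameters keeps the loop testable without a repository, and `max_rounds` guards against a model that never stops raising new issues.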

It's also been productive to ask for any problems in an entire repo. It will of course never be able to perform a completely thorough review of even a modestly sized application, but highlighting any problem at all is still useful.

Someone recently mentioned to me that they use vision-capable LLMs to perform "aesthetic tests" in their CI. The model takes screenshots of each page before and after a code change and throws an error if it thinks something is wrong.
10 comments

teaearlgraycold

April 15

10 votes
AI Coding agents are the opposite of what I want
- programming
Ask

I've been thinking a lot about LLM assisted development, and in particular why I keep dropping the available tools after a few attempts at using them.

I realized recently that it's taking away the part of software development I enjoy: the creative problem solving that comes with writing code. What's left is code review tasks, testing, security checks, etc. Important tasks, but they all primarily involve heavy concentration, and much less creativity.

Why aren't agents focused on handling the mundane tasks instead? Tell me if I've just introduced a security vulnerability or a runtime bug. Generate realistic test data and give me info on what the likely output would be. Tell me that the algorithm I just wrote is O(n^2).

Those tasks are so much more applicable to matching against existing data, something LLMs should be extremely good at, rather than trying to get them to write something novel, which so far they've been mostly bad at, at least in my experience.

47 comments

karsaroth

April 6

46 votes
Gemma needs help

Article 1894 words, published Mar 10 2026

19 comments

lesswrong.com

March 25

31 votes
Designing an agent reading test

Article 3887 words

1 comment

dacharycarey.com

April 6

10 votes
Google releases Gemma 4

Article 640 words

18 comments

blog.google

April 2

28 votes
Executing programs inside transformers with exponentially faster inference

Link

6 comments

percepta.ai

March 13

14 votes
Can coding agents relicense open source through a “clean room” implementation of code?
- open source
- programming
Article 1004 words
39 comments

simonwillison.net

March 6

51 votes
Is it worthwhile to run local LLMs for coding today?
- programming
Ask (advice)

I've made the decision to purchase a new M5 Macbook Air because of the memorypocalypse. My current M1 model is already upgraded to the amount of memory and storage as the current base model and I'm wondering if it's worth spending the extra 2-4 hundred dollars on memory upgrades today.

My current computer is more than good enough for today but I figure I should probably future proof just in case. I was thinking the 16GB would be enough, but I also know that I'm kind of falling behind by not embracing AI coding agents. According to my research the maximum 32GB is recommended for most coding-relevant models - almost as a minimum.

I work in education so coding is not actually much of a need, and obviously there are cloud providers I could use if I end up needing them in the future. I also have less than a teacher's salary because I work part time, which is the greatest reason why I'm sticking with the 16GB base for the moment, but other than that I also don't do many memory-intensive programs. But I thought I would get some recommendations before they start shipping.

I'd also be interested in people's opinions on trading in my old one, since it'll only get me ~$275 back. I'm considering reneging on that part and keeping it around to act as a web server or giving it to my husband, who has a computer that still runs Windows 7 and barely uses it.

40 comments

Akir

March 6

35 votes
Hacker used Anthropic's Claude chatbot to attack multiple government agencies in Mexico

Article 455 words, published Feb 25 2026

8 comments

Engadget

March 6

21 votes
microgpt - GPT in 200 lines

Article 4088 words, published Feb 12 2026

2 comments

karpathy.github.io

March 1

32 votes
Updating Eagleson's Law in the age of agentic AI

Ask

Eagleson's Law states

"Any code of your own that you haven't looked at for six or more months might as well have been written by someone else."

I keep reading how fewer and fewer of the brightest developers are writing code, instead letting their AI agents do it all. How do they know what's really happening? Does it matter anymore?

Curious to hear this community's thoughts.

9 comments

hamitosis

February 25

11 votes
Ladybird chooses Rust as its successor language to C++, with help from AI
- programming languages
- open source
Article 609 words
18 comments

ladybird.org

February 23

33 votes
llOOPy lOOPs
- programming.object oriented
Article 1431 words, published Feb 3 2026
4 comments

autonoma.ca

February 6

12 votes
Blocking Claude

Article 213 words

4 comments

aphyr.com

January 27

28 votes
Can AI tell if I'm writing AI slop? A machine learning journey.

Article 1708 words

3 comments

mattsayar.com

September 15, 2025

21 votes
Defeating nondeterminism in LLM inference

Article 5706 words

2 comments

thinkingmachines.ai

September 10, 2025

15 votes
GPT 5 released

Link

21 comments

openai.com

August 7, 2025

30 votes
Applying Chinese Wall Reverse Engineering to LLM Code Editing

Article

1 comment

arXiv

July 22, 2025

8 votes
I wrote my first Chrome extension to simplify Wikipedia articles

Article 370 words, published Jul 3 2025

17 comments

mattsayar.com

July 7, 2025

15 votes

User-friendly and privacy-friendly LLM experience?

privacy

Ask

I've been thinking perhaps I'll need to get one of the desktop LLM UIs. I've been out of touch with the state of the art of end-user LLMs as I've been exclusively using them via API, but tech-y people (who are not developers) mostly talk about the end-user products that I lack knowledge of.

Ethical problems aside, the problem with non-API usage is that, even if you pay, I can't find one that has a better privacy policy than the API. And the problem with the API version is that it is not as good as the completed apps unless you want to reinvent the wheel. The apps may also include ads in the future, while the API technically cannot, as that would affect some downstream use cases.

| Provider | Data Retention (API) | Data Retention (Consumer) | UI-only features |
| --- | --- | --- | --- |
| ChatGPT Plus | 30 days, no training | Training opt-out, 30 days for temp. chat, unknown retention otherwise | Voice, Canvas, Image generation in chat, screensharing, Mobile app |
| Google AI Pro | 0 | 72 hours if you disable history, or up to 3 years and trained upon otherwise | Android assistant, Canvas, AI in Google Drive/Docs, RAG (NotebookLM), Podcast generation, Browser use (Mariner), Coding (Gemini CLI), Screensharing |
| Gemini in Google Workspace | See above | 0-18 months, but no human review/training | See above |
| Claude Pro | 30 days | Up to 2 years (no training without opt-in) | Coding, Artifact, Desktop app, RAG, MCP |

As this is a dual-use technology, the table doesn't include the extra retention period that applies if they detect abuse. Additionally, if you click thumbs up/down, it may also be recorded for the provider's employees to review.

I don't think OpenWebUI, self-hosted models, etc. would suffice if they are not built to the same quality as the first-party products. I know I'm probably asking for something that doesn't exist here, but at least I hope it will bring to people's attention that even if you're paying for the product you might not get the same privacy protection as API users.

15 votes

Personalized software really is coming, but not today. Maybe tomorrow?

Article 733 words

3 comments

mattsayar.com

May 22, 2025

13 votes
Block AI scrapers with Anubis
- open source
Article 1617 words, published Jan 19 2025
25 comments

xeiaso.net

March 17, 2025

27 votes
The Long Context - Interactive fiction driven by an LLM

Article 5933 words

2 comments

thelongcontext.com

March 17, 2025

12 votes
Bartosz Milewski - Understanding Attention in LLMs
- programming
Article 973 words
0 comments

bartoszmilewski.com

March 8, 2025

6 votes

Funny output from Claude.ai: a codeblock in a question about an early Renaissance theological work

programming

Ask

I asked Claude for references to some ideas the AI had output earlier about a 16th century cardinal (Sadoleto). Its response was, "Let me analyze the letter", and then this:

async function searchForSadoletoLetter() {
  try {
    const files = ['sadoleto_letter.txt', 'sadoleto.txt', 'sadoleto_to_geneva.txt', 'letter_to_geneva.txt'];
    
    for (const filename of files) {
      try {
        const content = await window.fs.readFile(filename, { encoding: 'utf8' });
        console.log(`Found file: ${filename}`);
        console.log("First 500 characters:");
        console.log(content.substring(0, 500));
        return content;
      } catch (err) {
        // File not found, try the next one
      }
    }
    console.log("Could not find any files containing Sadoleto's letter");
    return null;
  } catch (err) {
    console.error("Error searching for files:", err);
    return null;
  }
}

const content = await searchForSadoletoLetter();

// Since we don't have the letter available as a file, let's search for published versions online
console.log("The letter isn't available as a file. We would need to rely on scholarly knowledge about Sadoleto's letter to Geneva (1539).");
console.log("Sadoleto's letter was published in 'A Reformation Debate: Sadoleto's Letter to the Genevans and Calvin's Reply' (ed. John C. Olin).");

This strikes me as odd. Did Claude write a script to generate its own response? Have coders put something like this in as a guardrail?

edit: details about earlier convo

15 votes

Building a personal, private AI computer on a budget
- hardware
Article 2933 words
23 comments

ewintr.nl

February 9, 2025

24 votes
Task-Specific LLM Evals that Do & Don't Work

Article 6254 words, published Mar 31 2024

2 comments

eugeneyan.com

December 9, 2024

4 votes
"Mechanistic interpretability" for LLMs, explained

Article 3670 words

1 comment

Substack: Sean Trott

July 8, 2024

6 votes
Researchers describe how to tell if ChatGPT is confabulating

Article 522 words

5 comments

Ars Technica

June 21, 2024

24 votes
I will fucking piledrive you if you mention AI again

Article 4269 words

32 comments

mataroa.blog

June 19, 2024

119 votes
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Article

2 comments

arXiv

June 15, 2024

9 votes
"Badness 0", a suckerpinch/tom7 video dive into typesetting, LLMs, and Donald Knuth

Video 22:15

4 comments

YouTube: suckerpinch

June 7, 2024

29 votes