GPT-5 has come a long way in mathematics
21 votes -
LLMs are bullshitters. But that doesn't mean they're not useful.
19 votes -
Is trying to become an author insane in times of LLMs?
A simple question. I know LLMs are currently not a replacement for authors. Will that remain true in 5 to 10 years?
EDIT: No. I never expected to earn a living either mostly or exclusively by selling books. There are however many "side gigs" in my country that can greatly benefit from being published by a real company. Ultimately though, I'm not in it primarily for the money. But I wonder what the future holds for fiction as a whole.
21 votes -
The world's on fire. So let's just make AI porn.
23 votes -
Part of me wishes it wasn't true but: AI coding is legit
I stay current on tech for both personal and professional reasons but I also really hate hype. As a result I've been skeptical of AI claims throughout the historic hype cycle we're currently in. Note that I'm using AI here as shorthand for frontier LLMs.
So I'm sort of a late adopter when it comes to LLMs. At each new generation of models I've spent enough time playing with them to feel like I understand where the technology is and can speak about its viability for different applications. But I haven't really incorporated it into my own work/life in any serious way.
That changed recently when I decided to lean all the way into agent-assisted coding for a project after getting some impressive boilerplate out of one of the leading models (I don't remember which one). That AI can do a competent job on basic coding tasks like writing boilerplate code is nothing new, and that wasn't the part that impressed me. What impressed me was the process, especially the degree to which it modified its behavior in practical ways based on feedback. In previous tests it was a lot harder to get the model to go against patterns that featured heavily in the training data, and then get it to stay true to the new patterns for the rest of the session. That's not true anymore.
Long story short, add me to the long list of people whose minds have been blown by coding agents. You can find plenty of articles and posts about what that process looks like so I won't rehash all the details. I'll only say that the comparisons to having your own dedicated junior or intern who is at once highly educated and dumb are apt. Maybe an even better comparison would be to having a team of tireless, emotionless, junior developers willing to respond to your requests at warp speed 24/7 for the price of 1/100th of one developer. You need the team comparison to capture the speed.
You've probably read, or experienced, that AI is good at basic tasks, boilerplate, writing tests, finding bugs and so on. And that it gets progressively worse as things get more complicated and the LoCs start to stack up. That's all true but one part that has changed, in more recent models, is the definition of "basic".
The bit that's difficult to articulate, and I think leads to the "having a nearly free assistant" comparisons, is what it feels like to have AI as a coding companion. I'm not going to try to capture it here, I'll just say it's remarkable.
The usual caveats apply: if you rely on agents to do extensive coding, or handle complex problems, you'll end up regretting it unless you go over every line with a magnifying glass. They will cheerfully introduce subtle bugs that are hard to catch and harder to fix when you finally do stumble across them. And that's assuming they can do the thing you're asking them to do at all. Beyond the basics they still abjectly fail a lot of the time. They'll write humorously bad code, they'll break unrelated code for no apparent reason, and they'll freak out and get stuck in loops (that one surprised me in 2025). We're still a long way from agents that can actually write software on their own, despite the hype.
But wow, it's liberating to have an assistant that can do hundreds of basic tasks you'd rather not be distracted by, answer questions accurately and knowledgeably, scan and report clearly about code, find bugs you might have missed, and otherwise soften the edges of countless engineering pain points. And brainstorming! A pseudo-intelligent partner with an incomprehensibly wide knowledge base and unparalleled pattern matching abilities is guaranteed to surface things you wouldn't have considered.
AI coding agents are no joke.
I still agree with the perspectives of many skeptics. Execs and middle managers are still out of their minds when they convince themselves that they can fire 90% of their teams and just have a few seniors do all the work with AI. I will read gleefully about the failures of that strategy over the coming months and years. The failures born of their short-sightedness and the cost to their organizations won't make up for the human cost of their decisions, but at least there will be consequences.
When it comes to AI in general I have all the mixed feelings. As an artist, I feel the weight of what AI is doing, and will do, to creative work. As a human I'm concerned about AI becoming another tool to funnel ever more wealth to the top. I'm concerned about it ruining the livelihoods of huge swaths of people living in places where there aren't systems that can handle the load of taking care of them. Or aren't even really designed to try. There are a lot of legitimate dystopian outcomes to be worried about.
Despite all that, actually using the technology is pretty exciting, which is the ultimate point of this post: What's your experience? Are you using agents for coding in practical ways? What works and what doesn't? What's your setup? What does it feel like? What do you love/hate about it?
50 votes -
Former PM Katrín Jakobsdóttir has said the Icelandic language could be wiped out in as little as a generation due to the sweeping rise of AI and encroaching English language dominance
18 votes -
Duck Duck Go search AI curiously cited Tildes
I was trying to find out why Lidarr wasn't matching my copy of The Cure's Greatest Hits. Found out I've got some bootleg Russian release that's catalogued on Discogs (I eventually found the MusicBrainz release and updated my profile to include bootlegs). So I searched "Lidarr use specific discogs release" and the Duck Duck Go search assist spat out some text about Lidarr not using Discogs and cited this Tildes post.
It's curious because that post is three years old and doesn't talk about Discogs integration in Lidarr; there's just one mention of Discogs in the post and some folks talking about Lidarr in the comments (it did cite a relevant GitHub issue about it, though). The AI response mentioned that some users track new releases with Lidarr and downloads disabled, which, while covered in the post, seems fairly tangential to my query.
I'm curious why it decided to check or cite a Tildes post. No Tildes posts came up in the first couple pages of search results. I use Tildes from the same location, though on my phone, whereas this query was on my desktop, and I have done a couple of DDG queries using "site:tildes.net" on my phone.
Has anyone else seen a search assist cite an unexpected site? Not unexpected as in irrelevant, that's all too common, but small and specific sources.
29 votes -
How has AI positively impacted your life?
I've been trying to get a more rounded understanding of the impacts that "AI" has had since ChatGPT went viral back in 2022. I've found it easy to gather a list of negative impacts, but have...
I've been trying to get a more rounded understanding of the impacts that "AI" has had since ChatGPT went viral back in 2022.
I've found it easy to gather a list of negative impacts, but have struggled to point to many positives.
I was curious if there were folks who have used any of these AI tools and would be willing to share any positive impacts those tools have had in their lives. I'm particularly interested in the text, audio, image, and video generation tools that have appeared since ChatGPT went viral, but please share anything else that you think fits.
50 votes -
Researchers isolate memorization from problem-solving in AI neural networks
12 votes -
Anthropic to bring its AI to hundreds of teachers in Iceland with pilot scheme – aim of helping them with lesson planning, classroom materials, and administrative work
7 votes -
Signs of introspection in large language models
28 votes -
Who’s making these AI copies of my work?
17 votes -
If AI can diagnose patients, what are doctors for?
19 votes -
AI slop is killing our channel
36 votes -
Why do LLMs freak out over the seahorse emoji?
50 votes -
Merriam-Webster has unveiled their latest and greatest LLM to date
67 votes -
Can AI tell if I'm writing AI slop? A machine learning journey
21 votes -
Defeating nondeterminism in LLM inference
15 votes -
Why language models hallucinate
27 votes -
An AI social coach is teaching empathy to people with autism
19 votes -
Is it possible to easily finetune an LLM for free?
Google's AI Studio used to have an option to fine-tune Gemini Flash for free by simply uploading a CSV file, but it seems they have removed that option, so I'm looking for something similar. I know models can be fine-tuned on Colab, but the problem with that is it's way too complicated for me; I want something simpler. I think I know enough Python to be able to prepare a dataset, so that shouldn't be a problem.
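For context on the dataset-preparation half of the question: most tuning services and Colab-based libraries expect examples as JSONL chat pairs rather than a raw CSV, so the conversion is usually the easy part. A minimal sketch of that step, assuming a hypothetical `training.csv` with `input` and `output` columns (field names vary by provider):

```python
import csv
import json

# Convert a CSV of (input, output) pairs into the JSONL chat format
# that most fine-tuning endpoints and libraries accept.
# "training.csv" and its column names are assumptions for illustration.
with open("training.csv", newline="", encoding="utf-8") as src, \
        open("training.jsonl", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        example = {
            "messages": [
                {"role": "user", "content": row["input"]},
                {"role": "assistant", "content": row["output"]},
            ]
        }
        dst.write(json.dumps(example, ensure_ascii=False) + "\n")
```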
21 votes -
Deep Think with Confidence
9 votes -
AI tokens are getting more expensive
10 votes -
Claude Opus 4 and 4.1 can now end a rare subset of conversations
15 votes -
Social media probably can’t be fixed
38 votes -
Evaluating GPT5's reasoning ability using the Only Connect game show
18 votes -
Is chain-of-thought reasoning of LLMs a mirage? A data distribution lens.
28 votes -
Reddit will block the Internet Archive
58 votes -
Question - how would you best explain how an LLM functions to someone who has never taken a statistics class?
My understanding of how large language models work is rooted in my knowledge of statistics. However a significant number of people have never been to college and statistics is a required course...
My understanding of how large language models work is rooted in my knowledge of statistics. However, a significant number of people have never been to college, and statistics is a required course only for some degree programs.
How should ChatGPT etc. be explained to the public at large to avoid the worst problems that are emerging from widespread use?
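One statistics-free framing is "a machine that has read a huge amount of text and guesses what word usually comes next." A toy sketch of that intuition, nothing like a real transformer, just a word-pair counter to make the idea concrete:

```python
from collections import Counter, defaultdict

# Toy "language model": count which word tends to follow which,
# then predict the most commonly observed follower. Real LLMs are
# vastly more sophisticated, but the core job -- predict the next
# token from patterns in text -- is the same.
text = "the cat sat on the mat and then the cat sat on the sofa"
words = text.split()

followers = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    # Return the word seen most often right after `word`.
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))   # -> "cat"
print(predict_next("cat"))   # -> "sat"
```

The pitch to a lay audience follows directly: it produces plausible continuations, not verified facts, which is why it can sound confident and still be wrong.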
37 votes -
GPT 5 released
30 votes -
AI industry horrified to face largest copyright class action ever certified
63 votes -
The great LLM scrape
24 votes -
Persona vectors: monitoring and controlling character traits in language models
13 votes -
Made a free VTT prototype
13 votes -
Applying Chinese Wall Reverse Engineering to LLM Code Editing
8 votes -
OpenAI can rehabilitate AI models that develop a “bad boy persona”
14 votes -
The future of forums is lies, I guess
63 votes -
No, of course I can! Refusal mechanisms can be exploited using harmless fine-tuning data.
9 votes -
AI coding tools make developers slower but they think they're faster, study finds
40 votes -
Pay up or stop scraping: Cloudflare program charges bots for each crawl
46 votes -
Cats confuse reasoning LLM: Query-agnostic adversarial triggers for reasoning models
24 votes -
'Positive review only': Researchers hide AI prompts in papers to influence automated review
29 votes -
TikTok is being flooded with racist AI videos generated by Google’s Veo 3
35 votes -
Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing task
54 votes -
User-friendly and privacy-friendly LLM experience?
I've been thinking perhaps I'll need to get one of the desktop LLM UIs. I've been out of touch with the state of the art of end-user LLMs, as I've been exclusively using them via API, but tech-y people (who are not developers) mostly talk about the end-user products that I lack knowledge of.
Ethical problems aside, the problem with non-API usage is that, even if you pay, I can't find one that has a better privacy policy than the API. And the problem with the API version is that it's not as good as the complete apps unless you want to reinvent the wheel. The apps may also include ads in the future, while the API technically cannot, as that would affect some downstream use cases.
| Provider | Data Retention (API) | Data Retention (Consumer) | UI-only features |
|---|---|---|---|
| ChatGPT Plus | 30 days, no training | Training opt-out, 30 days for temp. chat, unknown retention otherwise | Voice, Canvas, image generation in chat, screensharing, mobile app |
| Google AI Pro | 0 | 72 hours if you disable history, or up to 3 years and trained upon otherwise | Android assistant, Canvas, AI in Google Drive/Docs, RAG (NotebookLM), podcast generation, browser use (Mariner), coding (Gemini CLI), screensharing |
| Gemini in Google Workspace | See above | 0–18 months, but no human review/training | See above |
| Claude Pro | 30 days | Up to 2 years (no training without opt-in) | Coding, Artifacts, desktop app, RAG, MCP |

As these are dual-use technologies, the table doesn't include the extra retention period if the provider detects abuse. Additionally, if you click thumbs up/down, the conversation may also be recorded for the provider's employees to review.
I don't think OpenWebUI, self-hosted models, etc. would suffice if they are not built to the same quality as the first-party products. I know I'm probably asking for something that doesn't exist here, but at least I hope it will bring to people's attention that even if you're paying for the product, you might not get the same privacy protection as API users.
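For anyone unfamiliar with what "using it via API" means in practice, the request itself is tiny; the gap is all the product features listed in the table above. A minimal sketch against an OpenAI-compatible chat endpoint (the model name is just an example, and an `OPENAI_API_KEY` environment variable is assumed; other providers expose similar endpoints):

```python
import os
import requests

# Minimal chat request against an OpenAI-compatible endpoint.
# Under the API terms in the table above, inputs aren't used for
# training and are retained for a limited period (e.g. 30 days).
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",  # example model name
        "messages": [{"role": "user", "content": "Summarize this note for me: ..."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Everything else the consumer apps offer, such as history, voice, canvas, and RAG, has to be rebuilt on top of calls like this, which is exactly the wheel-reinvention trade-off described above.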
15 votes -
Echo Chamber: A context-poisoning jailbreak that bypasses LLM guardrails
34 votes -
Is pop culture a form of "model collapse?"
Disclaimer: I do not like LLMs. I am not going to fight you on if you say LLMs are shit.
One of the things I find interesting about conversations on LLMs is when I have a critique about them, and someone says, "Well, it's no different than people." People are only as good as their training data, people misremember / misspeak / make mistakes all the time, people will listen to you and affirm you as you think terrible things. My thought is that not being reliably consistent is a verifiable issue for automation. Still, I think it's excellent food for thought.
I was looking for new music venues the other day. I happened upon several, and as I looked at their menus and layouts, it occurred to me that I had eaten there before. Not there, but in my city, and in others. The Stylish-Expensive-Small-Plates-Record-Bar was an international phenomenon. And more than that, I couldn't shake the feeling that it was a perversion of the original, alluring concept: to be in a somewhat secretive record bar in Tokyo where you'll be glared into the ground if you speak over the music.
It's not a bad idea. And what's wrong with evoking a good idea, especially if the similarity is just unintentional? Isn't it helpful to be able to signal to people that you're like-that-thing instead of having to explain to people how you're different? Still, the idea of going just made me assume it'd be not simply like something I had experienced before, but played out and "fake." We're not in Tokyo, and people do talk over the music. And even if they didn't, they have silverware and such clanging. It makes me wonder if this permutation is a lossy estimation of the original concept, just chewed up, spat out, slurped, regurgitated, and expensively funded.
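The "lossy estimation" framing maps fairly directly onto what model collapse means technically: each generation is fit to samples from the previous one, and variety quietly drains away. A toy sketch of that effect, assuming nothing fancier than repeatedly fitting a normal distribution to its own samples (the numbers are arbitrary, and any single run is noisy):

```python
import random
import statistics

# Toy model collapse: each "generation" is estimated only from samples
# drawn from the previous generation's estimate. With small samples,
# the estimated spread tends to shrink over generations -- the
# copies-of-copies effect the post is gesturing at.
mu, sigma = 0.0, 1.0
for generation in range(30):
    samples = [random.gauss(mu, sigma) for _ in range(10)]
    mu, sigma = statistics.mean(samples), statistics.stdev(samples)
    print(f"generation {generation:2d}: spread ~ {sigma:.3f}")
```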
Other forms of conceptual perversion:
- Matters of Body Image - is it a sort of collapse when we go from wanting 'conventional beauty' to frankensteining features onto ourselves? Think fox eye surgeries, buccal fat removal, etc. Rather than wanting to be conventionally attractive, we aim for the related concept of looking like people who are famous.
- (still thinking)
15 votes -
Disney files landmark case against AI image generator
16 votes -
The Common Pile v0.1: An 8TB dataset of public domain and openly licensed text
26 votes