The truth about AI (specifically LLM powered AI)
The last couple of years have been a wild ride. For most of that time, the conversation around AI has been dominated by absurd levels of hype. To go along with the cringe levels of hype, a lot of people have felt the pain of dealing with the results of rushed and forced AI implementations.
As a result the pushback against AI is loud and passionate. A lot of people are pissed, for good reasons.
Because of that it would be understandable for people casually watching from a distance to get the impression that AI is mostly an investor fueled shitshow with very little real value.
The first part of the sentiment is true, it's definitely a shitshow. Big companies are FOMOing hard, everyone is shoehorning AI into everything they can in hopes of capturing some of that hype money. It feels like crypto, or Web 3.0. The result is a mess and we're nowhere near peak mess yet.
Meanwhile in software engineering the conversation is extremely polarized. There is a large, but shrinking, contingent of people who are absolutely sure that AI is something like a scam. It only looks like a valid tool and in reality it creates more problems than it solves. And until recently that was largely true. The reason that contingent is shrinking, though, is that the latest generation of SOTA models are an undeniable step change. Every day countless developers try using AI for something that it's actually good at and they have the as-yet-nameless but novel realization that "holy shit this changes everything". It's just like every other revolutionary tech tool: you have to know how to use it, and when not to use it.
The reason I bring up software engineering is that code is deterministic. You can objectively measure the results. The incredible language fluency of LLMs can't gloss over code issues. It either identified the bug or it didn't. It either wrote a thorough, valid test or it didn't. It's either good code or it isn't. And here's the thing: It is. Not automatically, or in all cases, and definitely not without careful management and scaffolding. But used well it is undeniably a game changing tool.
But it's not just game changing in software. As in software, if it's used badly or for the wrong things, it's more trouble than it's worth. But used well it's remarkable. I'll give you an example:
A friend was recently using AI to help create the necessary documents for a state government certification process for his business. If you've ever worked with government you've already imagined the mountain of forms, policies and other documentation that were required. I got involved because he ran into some issues getting the AI to deliver.
Going through his session the thing that blew my mind was how little prompting it took to get most of the way there. He essentially said "I need help with X application process for X certification" and then he pasted in a block of relevant requirements from the state. The LLM agent then immediately knew what to do, which documents would be required and which regulations were relevant. It then proceeded to run him through a short Q and A to get the necessary specifics for his business and then it just did it. The entire stack of required documentation was done in under an hour versus the days it would have taken him to do it himself. It didn't require detailed instructions or .md files or MCP servers or artifacts, it just did it.
And he's familiar with this process, he has the expertise to look at the resulting documents and say "yeah this is exactly what the state is looking for". It's not surprising that the model had a lot of government documentation in its training data, it shouldn't even really be mind blowing at this point how effective it was, but it blew my mind anyway. Probably because not having to deal with boring, repetitive paperwork is a miraculous thing from my perspective.
This kind of win is now available in a lot of areas of work and business. It's not hype, it's objectively verifiable utility.
This is not to say that it's not still a mess. I could write an overly long essay on the dangers of AI in software, business and to society at large. We thought social media was bad, that the digital revolution happened too fast for society to adapt... AI is a whole new category of problematic. One that's happening far faster than anything else has. There's no precedent.
But my public service message is this: Don't let the passionate hatred of AI give you the wrong idea: There is real value there. I don't mean this in a FOMO way, you don't have to "use AI or get left behind". The truth is that 6 months from now the combination of new generations of models and improved tooling, scaffolding and workflows will likely make the current iteration of AI look quaint by comparison. There's no rush to figure out a technology that's advancing and changing this quickly because most of what you learn right now will be about solving problems that will be solved by default in the near future.
That being said, AI is the biggest technological leap since the beginning of the public, consumer facing, internet. And I was there for that. Like the internet it will prove to be both good and bad, corporate consolidation will make the bad worse. And, like the internet, the people who are saying it's not revolutionary are going to look silly in the context of history.
I say this from the perspective of someone who has spent the past year casually (and in recent months intensively) learning how to use AI in practical ways, with quantifiable results, both in my own projects and to help other people solve problems in various domains. If I were to distill my career into one concept, it would be: solving problems. So I feel like I'm in a position to speak about problem solving technology with expertise. If you have a use for LLM powered AI, you'll be surprised how useful it is.
Even if there were no negative effects to generative AI, I still have never been told, not by my university's "AI guy", nor by anyone on the internet, nor by the "power users", exactly how the thing they keep selling our VPs and Board of Trustees is useful for improving how I do my job.
It's not faster to ask it to write an email than to write one, or use a template. And replacing my words with Copilot's words would remove the whole human connection thing I'm offering the students. I can't and won't input student information into it. The one time I gave it a shot on a project, it did OK on a list but immediately fucked up the little graphic it made from said list, and I still would have had to review the output and rewrite large chunks of it to use any of the material.
I get told "it helps you do the parts of your job you like, more." It does not. Or at least it hasn't.
I heard (they presented on it, and I got it second hand from an attendee) one university was creating a centralized phone number where all the depts input their common FAQs into the AI and then two people could answer the phone calls by asking the AI chat bot for the answers. They bragged about firing people, I mean, saving on labor costs due to this. Meanwhile they took a good idea - one central number and not bouncing callers around - and removed any critical thinking or institutional expertise from the process. Now a student gets told "no" regardless of unusual circumstances, because the people answering the phone don't actually know anything about these departments.
I'm a hater, but I've also genuinely tried to find any sensible use case. And that ignores all the flat-out negatives. Even if I grant it's great for coding - idk, nowhere near my industry but still seems bad for the future of entry-level jobs - I don't see any evidence it's more than a nuisance at absolute best.
And that doesn't get into the whole copyright infringement/theft from authors (and literally everyone else on the internet)
I think it's worthwhile separating out the two things. OP is arguing LLMs will change everything. You are arguing this is bad. You can both be right.
Yes, it's bad. OpenAI broke laws. AI will likely result in significant loss of jobs. AI encourages less human empathy, and will fill the virtual world with slop.
But yes it is unavoidable, it is going to change things in ways we probably can't conceive right now, because barely three years in it actually helps real people solve real problems. e.g. I order my pizza from my favorite mom and pop store with LLM powered AI. Raymond no longer needs to pick up the phone to talk to me. I still chat with him when I go to pick it up, but I can see that ending soon as well.
We have seen all of this with computers, the internet, and smartphones. Before computers were used in work life, you had to communicate with the person IRL. My Grandmother was a secretary, because she could dictate and transcribe and type. Most of those jobs went away. Nowadays people send an email or a slack... or if they are feeling particularly empathetic, they will schedule a zoom even though you are already double booked. Before computers, the news you read was written by an actual reporter, and either put in a newspaper or put on a news show on TV. Nowadays the news is written by strangers and is at the top of our feed because it incites rage and that is good for clicks.
Keep hating on it. But keep experimenting with it.
I disagree that it's unavoidable. We could make a different choice as a society. We could regulate this shit.
But I also disagree with your summarization of our points. To my mind, OP was saying that they're useful, more so now than ever before.
I'm saying they aren't, especially outside of the programming bubble.
I have nothing to keep experimenting with. I'm absolutely out of ideas and have zero interest in pushing my ethical bounds beyond this point. I was honestly curious if folks had actual ideas, but what good does fucking around with something useless and unethical do?
You say "helps real people with real problems " but human contact while picking up my pizza isn't a problem... and either the restaurant has an online system or .... The AI is ordering by phone for you? Why not just use the non-AI online ordering system? Or talk to a human in the time it takes to order by AI? Like, that's not a real problem unless your local pizza guy really hates talking to real people. (My local pizza place has online ordering, and no AI). I really don't get how this demonstrates anything. It's really indicative of how my experiences with AI advocates has gone tbh.
Also, my experience of what you're describing as "what happens today" in an office isn't accurate. I have a support staff person, she's not a secretary, and yet she makes my job easier; I helped her get a pay raise even though I'm not her direct supervisor. As a team and department, we talk in person all the time, empathetically. We don't put that load on a secretary? Nor has it gone away here. I'm bringing homemade cookies as presents for my team tomorrow. I got a Christmas card from my coworker Friday. I send a Teams message to check in on someone. Or call them. And then we meet up if they need to talk. Like this is just normal.
Have folks just forgotten how to be human in these other industries? Because I've worked for the sort of shitty company John Oliver mentions on his show and yeah, I expect that out of a private prison company. I guess tech?
But this highlights the disconnect: people keep swearing it's useful to people like me, and when I say it's not... I'm not bullshitting.
I respect you, and your opinion.
Let me clarify a few things?
I never said it was useful to you right now. Eventually, in a decade or two, I think it will be so useful it will be unavoidable.
When I say something is unavoidable, I am not trying to tell you to stop hating on it, or to give up your fight. In fact I ended my statement telling you to keep hating.
But I am challenging you on your world view, so this is where I get a little more disagreeable and provocative. Keep hating. I just don't see you winning your fight. Much like I don't see anyone winning a fight against our increased consumption of non-renewable resources or carbon emissions driving global warming. But don't get me wrong. Don't let my being a Debbie Downer (sorry, Debbie!) discourage you from hating all of these terrible things.
To continue with my provocative stance. Frankly, it is not your place or my place to tell my local pizza store how to run their business or if they should or should not use LLM based technology. I initially missed the human interaction of talking to Raymond to order my pizza. But he has made his choice. People ordering pizzas from him via Slice or Uber Eats (the online ordering system you seem to prefer) means Slice or whatever takes a slice from his profits, because they are the ones to collect payment. Currently, the automated phone AI bot simply sends him a transcript of my order, and when I pay him, he gets all my money. And to be clear, AI does a better job of taking my order. It does a better job of understanding my accent than most people. And it always remembers what I ordered last time (creepy but useful) and I am largely a creature of habit. But again. It's his business. Not ours. For Raymond, what LLMs provide is already better. Hate it all you want, it doesn't change facts.
Even more provocative, still. You have already given up the fight. You just don't want to admit it. Computers have already killed jobs and automated decisions and removed empathy from the equation. It just happened so slowly you didn't realize you were the frog getting boiled. We used to have to pick up a phone to ask to meet someone, then go meet them to get stuff done. With the advent of computers we have seen a huge loss of empathy in the business world. I no longer need to visit the local bank teller to deposit my check. I can't talk to someone even if I want to in order to complain about the quality of the goods I just purchased. Computers have already reduced the need for all sorts of jobs and severely reduced the empathy we all feel for each other. Initially, they didn't seem that useful either. You could use a computer, but it was hard, and inefficient, and most people just chose not to. But eventually computers found a way to be more useful than not, more addictive than not, and now everyone has a little computer in their pocket, and now when I am in a low end restaurant, half the people will be staring at their digital devices rather than each other. We are already turning into the fat people at the beginning of Wall-E. LLMs are just one more step in getting us there.
Now, you might think that computers somehow made things better as well as making things worse. And this is the whole point of this reply. I think my fundamental disagreement with you is that LLMs already make things better as well as worse, and that each person will rationally choose to use an LLM because it makes their life a little better, while we continue to boil the frog and make things like the environment worse.
At this point regulating AI is about as easy as deciding, as a society, to dismantle all nuclear weapons.
It only works if EVERYONE does it - and that's not happening, sadly.
I personally think comparing regulation to complete disarmament is hyperbolic. It doesn't need to go away, it just needs people to pass laws. I bet it happens in other countries even.
EU would definitely pass these regulations, even if it means shooting themselves in both legs.
Russia won't, India won't, USA won't because basically all of the major AI subscription services are American.
China ... they're betting heavily on a "here's an open model, run it yourself" approach. Mostly to annoy American companies who are subscription based, and it's a lot harder to regulate what people do with their own models on their own hardware.
I don't think regulating products is shooting oneself in the legs. Like, the idea that it's either zero regulation or full nuclear disarmament ignores how regulated nuclear weapons already are.
We could regulate it. We probably won't. But we could. And other countries regulate things like privacy and data collection even though we don't. Those are good things! We should want that.
There's of course no way to answer that without a lot more information but you're likely absolutely right that what you're being offered isn't something that works for your use case. It might even be a crappy model. Eventually there will be more people who have expertise in making LLMs work in individual cases. Whether or not one of those people will ever end up on your University's staff is something else entirely. "AI guy" probably got the title by default but maybe he'll skill up over time.
I'm with you on the writing. AI is fine for getting something on the page if you're blocked but it's soulless from the perspective of anyone who cares about writing or communication. Not being able to input student info probably kills all the best use cases for you. Perhaps eventually privacy guarantees will improve to the point that changes. I can imagine it being TA level competent at grading for example, especially with prior art as guidance. You could probably redact names for that sort of usage. I could also see it being useful for lesson planning but only after it had ingested a bunch of your previous lesson plans so it didn't default to a generic style from training.
And research: AIs are really, really good for initial research on most subjects.
But also if you don't need it, fuck it, not a major loss and maybe future models will be more useful.
I'll just say that if I were a student, I would have been told over and over and over that getting an LLM to do my homework is cheating, unethical, and screwing myself out of an education. If I then found out the university I was paying a lot of money to attend was feeding the work I slaved over for weeks into an LLM to be graded, it wouldn't only be hypocritical. It would feel like an absolute slap in the face.
You're asking students to do quite a lot of work to pass a class while taking quite a bit of their money. You should at least have a pair of human eyeballs look at it.
And the students are aware. The education majors in my class - I teach RAs, not any other type of class - are livid at their classmates using AI when they're supposed to be learning to teach others. And the ones that have tried to use it are pretty aware of its limitations. Most found it the most useful for flashcard/study question sorts of things.
We're allegedly teaching them how to use AI appropriately but no one actually knows how to do that.
(And hiring fewer TAs isn't good systemically for grad students and the field in general)
I imagine the way this is going to play out is that there will be some kind of AI-based tutoring system, less homework, and if you want more human interaction, maybe go to office hours?
Said AI guy seemed knowledgeable, he's not a random person from the institution but someone in our division's IT dept; he met with us, and he was used to those of us with concerns. We're being sold copilot, primarily, though some people have the more powerful version than others right now. He just doesn't know our work, and he didn't bullshit us.
I am basically a social worker (and supervisor of others) for students that live on campus. Most of my work is in reviewing reports that people are concerned about, doing assessment and data analysis, supervision, and following up with students myself. There's a smattering of crisis response in there too. While I teach, technically, it's a zero credit class, and even if it were ethical to put other people's work into an AI like that, it'd still take me longer than grading in our course management system
(I'd have to download all the assignments and input them I assume. Plus even the students are pretty fed up with "here's our new AI offerings and also here's how you get in trouble for using AI in homework... Good luck!" as messaging. And that's where the ethical part comes back in.) But also, if I'm not really reading their work and grading it, let's say it's a "real" class, I don't understand their competence level. And if it's a big class I'd have that TA anyway.
Even benchmarking research against our comparable institutions would require me to double check so much of the work I'd basically be doing it a second time, based on my recent attempts. If they build a FERPA compliant model we can use student data on, I still don't want my students' mental health crises, pregnancies, immigration statuses, etc. being reviewed by something that can't understand nuance, and to what end? I'm still going to read every case, and my data reporting has been streamlined already.
Unfortunately I can't just be like "fuck it" because Microsoft and the AI industry have done the same thing they do basically everywhere and sold our higher ups on this product we're going to spend a lot of money on, and they want to know how we're using it. The pressure to use the thing that isn't useful to me is new in the decade of my being here. And it's incredibly annoying. I just feel like the talk about AI here can get very siloed into talking about it from a tech lens, but that's a small fraction of the workforce. My field is just one, very annoyed, part of the rest of the world. (I don't generally need a rubber duck, and if I do in a non-coding sense I have colleagues and we like to talk to each other!)
That sucks. Even if they're just kinda sorta mandating it, I can see why you're frustrated. In my opinion that's totally the wrong way to do it. A better way is to make the tools available to people in the org who actually want them, give them lots of support, and let them figure out what works. Then let it spread organically from there.
That's a perfect example of the hype causing bad management and executive decisions. Forcing people to use new tech just for the sake of it makes about as much sense as forcing the IT guy to do social work. There's a reality distortion field around AI right now.
If it was me I wouldn't have any ethical problem with just pretending to use it, especially at first. If they're monitoring usage, maybe run it in the background on throwaway tasks to keep token usage passable and then learn it at your own pace when you actually feel inspired to.
Your ethics might be different than mine though, if you actually do want to figure out how to use it I'll throw out some suggestions and hopefully there will be something useful in them.
I can only guess, but generally speaking IT people aren't engineers. They're more like tech support with a lot of experience in networking and maybe some software certifications. And they don't usually get paid enough. Doesn't mean AI guy isn't good with AI, maybe he is, but also odds are he's still figuring it out just like everyone else.
Either way I'm guessing he doesn't have the time to sit down with people one on one and help them figure out what to use it for, how to use it, and set them up with systems to support their workflow. But that's what should happen.
For purposes of throwing out ideas I'll start with this even though it may not be particularly relevant to your work:
You mentioned Copilot. The upside there is that they have strong data security and privacy policies and even stronger motivation to follow through on those policies. Enterprise customers would flee in droves if they ever got caught doing anything questionable. Microsoft doesn't have their own frontier AI models; instead they use models from OpenAI and Anthropic. That means you could be getting GPT 5+ but you could also be getting budget models. Some people report that they get worse performance from Copilot than from the model provider websites. Likely because Copilot 365 often uses a router that chooses the model based on the prompt, and they're motivated to route to cheaper models whenever possible.
That might explain why you've gotten less than stellar results in your testing.
Back to assignments, there are a lot of ifs... If the assignments are in a digital format and if you have access to a newer agentic1 model and if that model has access to the filesystem where the assignments are stored then no you wouldn't have to do anything more than tell it which assignment group to grade and it could do the rest on its own. That's if you had an AGENTS.md (or similar) that told it how to behave as an assignment grader. Or alternatively you had a dedicated agent that was instructed to be a specialized assignment grader or TA. If not you'd have to do a lot more prompting, which is doable with copy/paste once you have a system figured out.
Speaking of instructions they could include things like (paraphrased) "grade all the simple stuff and then give me a digest of the long form answers with student/test IDs so that I can grade those myself, and then populate the tests with my grading and notes once I'm finished". But then you might already have a system that does some version of grading the easy stuff automatically.
All of the above is easy to do if you already understand how it all works and can guarantee a newer model with tool use capabilities. But that's a lot of ifs. That's where an AI guy to sit down with you and get it all working would be nice.
1 Agentic essentially means a model that has been trained on tool use (web search, filesystem access, scripting, etc.)
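And if the agentic file access never materializes, the copy/paste system I mentioned can also be scripted. Purely a hedged sketch to show the shape of it: the folder layout, rubric file, and model name below are all made up, and you'd want names/IDs redacted before anything leaves your machine.

```python
# Hedged sketch only: file layout, model name, and rubric are placeholders,
# and student identifiers should be redacted before sending anything out.
from pathlib import Path
from openai import OpenAI  # any chat-completions-style client would work

client = OpenAI()  # assumes an API key is configured in the environment
RUBRIC = Path("rubric.txt").read_text()

digest = []
for assignment in sorted(Path("assignments").glob("*.txt")):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You are a grading assistant. Grade the short-answer "
                        "items against the rubric, and return the long-form "
                        "answers untouched for the instructor to grade."},
            {"role": "user",
             "content": f"Rubric:\n{RUBRIC}\n\nAssignment ({assignment.stem}):\n"
                        f"{assignment.read_text()}"},
        ],
    )
    digest.append(f"## {assignment.stem}\n{response.choices[0].message.content}")

# One digest file for the human to review and finish grading.
Path("digest_for_instructor.md").write_text("\n\n".join(digest))
```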
There's a pretty good chance you can find some solid uses for AI in this realm. And also more ifs. A newer agentic model can do all sorts of fun things with a spreadsheet, database, table, etc. It can convert between data formats, extract structured data from any sort of file, find patterns and associations in unrelated datasets, search datasets very effectively, move fields/columns, write and run formulas, normalize data, enrich data with information from other sources, cross reference, do relevant web searches, do research based on a dataset, merge datasets based on your criteria, summarize datasets of almost any size, and so on. Even if the data in question is just a list, or a collection of discrete records, it can do useful things.
Another useful agent feature is the ability to output in a variety of formats. You could turn a text file into a formatted word document, or a webpage, or a PDF, for example.
In addition modern agents are really good at helping with automation by one shotting simple scripts, macros or batch files for you. They can also tell you where to put them and how to use them. If they're running with direct access to your system they can also directly do pretty much any task on your PC that you can do. For better and worse, no doubt you've already been briefed on safety and security.
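To make the "one-shotting simple scripts" point concrete, here's the kind of throwaway glue script I mean. The file names and column names are invented for illustration; the point is that an agent can usually produce something like this on the first try, and tell you how to run it.

```python
# Throwaway sketch of the kind of glue script an agent can one-shot.
# File names and column names here are invented for illustration.
from pathlib import Path
import pandas as pd

frames = []
for csv_file in sorted(Path("exports").glob("*.csv")):
    df = pd.read_csv(csv_file)
    df["source_file"] = csv_file.name  # keep track of where each row came from
    frames.append(df)

merged = pd.concat(frames, ignore_index=True)
merged["report_date"] = pd.to_datetime(merged["report_date"]).dt.date  # normalize dates
merged = merged.drop_duplicates()

merged.to_excel("merged_report.xlsx", index=False)  # needs openpyxl installed
print(f"Wrote {len(merged)} rows from {len(frames)} files.")
```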
The biggest if is whether or not you can reliably get access to a new enough model. Older tool using models can do many of the same things, just not nearly as well.
Something to keep in mind, it usually takes some trial and error to get reliable, useful results. There are quirks to figure out, prompting strategies to dial in. Possibly all things you have little interest in doing, but as with all tech, patience and persistence are a big help. The silver lining is that once you get things dialed in it gets a lot easier.
You could probably get a lot of good answers about making specific use cases work, or prompting strategies here on Tildes, if I saw the thread I'd be happy to help. Feel free to msg me as well. But also I still advocate for the pretending strategy! Your work is more important than whatever limited benefit you might get from AI. Six months or a year from now that might be less true.
Last thought, if you're able to request a certain tier of access (something that gets you newer models more often), it's worth considering.
The problem many companies are facing is that "client satisfaction" is hard to measure. They can measure how many calls answered a day, how quickly the client hung up, how quickly the AI agent responded per query: wow these all went straight up with AI! What they don't see is the client getting increasingly frustrated and, if they have a choice, leaving not just the interaction but choosing a different company altogether.
They're trying desperately to replace all of our jobs. The guy making these decisions can't get a raise from customer satisfaction, but they sure can by using AI plus other skewed metrics. At minimum the guy can use these skewed metrics to whip the remaining humans into working harder, faster and more scared for their jobs.
The irony is that these are universities. They don't need to measure metrics like that. And thankfully my job isn't the sort on the line from that. But I still feel pressured to find some way - and I worry in the future I'll be asked to measure my team's metrics in some AI comparable way.
Randomized controlled trial on AI coding tools speed up (predicted v. observed) (METR, July 2025)
The LLMentalist Effect: How chat-based Large Language Models replicate the mechanisms of a psychic’s con (Baldur Bjarnason, July 2023)
Color me skeptical.
Fundamentally, though, I've realized I don't actually care too much about whether coding AI works or not. Even if it is helpful, I still feel very strongly that the problems surrounding it preclude ethical use of it. I'd list out those problems, but I've been talking about AI use so much in the past 24 hours that I'm sounding like a broken record. And besides, you've heard it before.
Suffice to say that I'll use AI when it stops ruining lives.
You're right I've read all the takes, or at least all the popular ones. The METR study is much beloved in AI coding conversations. It notably used Claude 3.5/3.7 Sonnet, which at this point are ancient in LLM years. The current generation of models are in another category entirely. It's also a small sample size and they didn't select for experienced LLM users. Instead they randomly assigned the LLM/non LLM groups which is standard procedure in many studies but totally the wrong choice here. A proper comparison would be between equally experienced groups of developers where the LLM group had significant experience using LLM coding tools. There is a vast difference between effectiveness out of the box and after some time to develop strategies, tooling and general understanding of model strengths and weaknesses. It's the sort of study you do when you don't really understand what you're studying yet. No judgement though, it was a study worth doing. We're still working on figuring out what it all looks like.
Regarding your second link, yes absolutely. I think that effect is more pronounced in a chat environment. LLMs are super validating, insightful in the way that comprehensive language knowledge can create the illusion of insight, and sycophantic as hell. That combination is compelling and can create trust very quickly. The intent may not be to be manipulative, but it's a recipe for manipulation nonetheless. The effect that is likely to have on the vulnerable is very concerning.
Where coding is concerned, LLMs were nearly useless for non trivial coding tasks in 2023 when that was written.
As far as ethics goes, I'm with you. There's tension between how useful these tools are and the question of whether they should exist at all. If it was somehow up to me I might choose to erase them. My biggest concern is the environmental impact, not so much directly, but the effect it seems to be having on the speed of decarbonization. The ethics of training aren't great either, and the control by giant companies is another chapter, possibly a new volume, in the story of capitalist dystopia.
But it's not up to me, and after seeing what they can do, I'd put the odds of huge numbers of people not using them somewhere around zero.
Yes. And this is subjective, but the difference I’m noticing between AI fanatics vs abstentionists is active empathy.
I'm not sure entirely how to phrase this to still sound sympathetic to your points. That is important to me, because I am sympathetic. I am just concerned that we are at risk of losing the forest for the trees, so to speak. Those who are "refusing" to knowingly use AI seem more like people in 1900 "refusing" to use electricity because it was putting lamp lighters out of work. On the one hand, I get it. It is empathetic to feel bad for people who are losing their way of life. On the other hand, those people walking a different route home to avoid "using" public electric lights does absolutely nothing for the lamp lighters. The electric lights are still there, they are still being built, they are still being bought and installed by the people in power.
AI is being used by major corporations, and it is seeping into our day to day life. Ads are being designed by AI, products are sold in stores that are prints of AI art. While we all think we can spot them a mile away, I'm not entirely sure. AI is being used in photo editing and video editing, in article writing and post writing. The only way to avoid it is to not engage with the world we live in. I think it is far more useful to make yourself familiar with the AI tools that are available out there, so you can be more educated about their abilities, and more wary of their pitfalls.
The world is so big it may as well be infinite. I have no shortage of world to engage with that’s absent of LLMs. I’m not missing out on anything by not using LLMs either.
No doubt you and others find LLMs useful, but they’re really not essential to daily life in any way.
I know there's real value here, a genuine technological leap.
But I'm being told to dive into this slimey crate of rotting veg, find the 5 good potatoes, make sure they're scrubbed very clean, serve them to 100 kids, and explain to them why there's no budget for 100 potatoes, because all that money got spent on crates of slop.
With the way companies have overspent on AI hype, there's not going to be money for actual potatoes for a very, very long time. Meanwhile, the mountain of rot just keeps getting bigger, and more kids are being told to go without.
Just as an aside, that was a lucid, if depressing, illustration. No notes.
Honestly, I really hope so. I think you touched on a lot of feelings I have towards it, but as a general skeptic about humanity, capitalism, and the current trends, I very much still do not feel as optimistic. I feel it's entering the "religion" realm - where it could be good for the masses, but the ones running it just use it for their own benefit and progression.
This deserves its own thread, I intentionally didn't dive into it in my post but yeah... there's a lot.
They absolutely are, but this is not a property of the technology or its progression, but of human nature.
I've been using LLMs for software work and staying on top of industry news for a while, and the impression I've gotten is that these different vendors are fungible, in many cases with as little effort as changing a single line of code.
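To illustrate what I mean by a single line: many vendors now expose OpenAI-compatible endpoints, so switching is often just pointing the client somewhere else. The URLs and model names below are placeholders, not real endpoints.

```python
# Illustration of vendor fungibility; URLs and model names are placeholders.
from openai import OpenAI

# client = OpenAI(base_url="https://api.vendor-a.example/v1", api_key="...")
client = OpenAI(base_url="https://api.vendor-b.example/v1", api_key="...")  # the "one line" swap

reply = client.chat.completions.create(
    model="some-model-name",  # usually the only other thing that changes
    messages=[{"role": "user", "content": "Summarize this ticket in two sentences: ..."}],
)
print(reply.choices[0].message.content)
```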
So I think that's why they're pushing so hard on the "AI could end the world" angle, since they know that without government granting them a monopoly, they're in a low-margin industry closer to corn farming than to Salesforce's (for example) high-margin, high-switching-cost position.
I think that's a bit too cynical. A lot of people, rightly or wrongly, are genuinely concerned about advanced AI being dangerous and they often work at AI companies. Concern about this predates ChatGPT and is a large part of why OpenAI was founded.
I think sometimes it can be used in a cynical way, but it's not entirely cynical.
Also, there are several strong competitors, switching is currently pretty easy, and I see no prospects for AI becoming a monopoly any time soon.
It might become a more regulated industry and that will be messy, but as with the airlines and medicine, this isn't necessarily a bad thing.
I finally started using it about a year and a half ago because I realized, whether or not it is a good idea, I at least needed to engage with it and learn its limitations if I was going to stay relevant.
I'd temper this by saying that my reaction is more like, "holy shit, it was surprisingly good in this one interaction." I find its behavior very inconsistent/unstable, and something that worked once won't necessarily work the second time, or you can even get away from the good behavior later in the same session.
The place where it really shines for me is in breadth tasks - if I need to delve into yet another AWS service that's brand new to me, I can generate a list of 5-10 requirements asking the AI to build the module, and I'll get a reasonable starting place to work from.[1]
However, I know that the output will be architecturally bankrupt, so I'll need to restructure and refactor the code to make it reasonably maintainable. Sometimes I do this by prompting, sometimes manually, depending on how bad it is and how the AI seems to be feeling that session. I think architecture/big picture is where humans are going to continue to shine, because we can process and hold patterns that far exceed the AI context windows.
[1] - Except when it hallucinates a method that doesn't exist in the library at all.
The other win using AI is the autocomplete picking up patterns in rote tasks. If I have to go through and change all my camelCase to snake_case, it will pick up on that pattern and start auto-suggesting it.

This should be the actual headline to all AI use: if and only if you can validate the outputs, feel free to use AI. Once you can't, you're out on a limb, sawing between you and the tree.
In my experience that comes down to prompting and workflow. I get pretty consistent results but it took a lot of trial and error to get to that point. Provided of course that the task in question is close enough to the center of the training distribution. If it's not then no amount of .md files, skills, commands or other prompting strategies will get results better or faster than just doing it yourself.
The declining performance later in sessions is definitely a thing, the more context there is, the harder it is for the current models to focus clearly. That's basically your sign that the session is over, start fresh.
Definitely, in both coding and elsewhere, it's remarkable to be able to skip a bunch of the early steps and get right to the good bits. For me it alleviates a big part of the friction in getting started on a task.
You're right, I feel like I should have said that more clearly in my OP. If you don't know enough about the topic to validate then the safe thing to do is assume everything is a hallucination.
I do have an implicit bias against LLMs and big tech. I think my last three back to back posts are stating my distrust of it, and it has little to do with direct outcomes or processes within it, but rather the bad business around it. If I'm going to be expected to transform my entire business/lifestyle, I'm going to be doing some risk and dependency management, because big tech has burnt me a dozen too many times to get the benefit of the doubt.
To highlight just one of the many concerns:
What is the true cost I should expect, given that flat rates do not seem sensible or sustainable?
I know from testing that there is linear escalation of token spend with simple analytical data models and exponential burn with LLMs. The new advanced models that lean on multi-model architectures and AI agents just multiply that token spend. LLMs are stateless, so the model has to re-process all the prior tokens in the context along with every new prompt. Over the course of multiple projects, what does that look like?
I've been looking at people's Cursor Wrapped reports and there are billions of tokens used across multiple models per user per year. It's still $40-60 per user per month, $200 for power users, but that's still a steal. (The docs detail $1.25/mil input/cache, $6/mil out, $0.25/mil cache read.)
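Back-of-the-envelope, using those list prices and a made-up but plausible usage mix, the math doesn't obviously close. The yearly token count and the cache/input/output split below are assumptions; only the per-million prices come from the docs.

```python
# Back-of-the-envelope only; the yearly token count and the cache/input/output
# split are assumptions, the per-million prices are the ones quoted above.
tokens_millions = 2_000          # assume ~2 billion tokens per user per year
split = {"cache_read": 0.70, "input": 0.25, "output": 0.05}   # assumed mix
price = {"cache_read": 0.25, "input": 1.25, "output": 6.00}   # $ per million tokens

yearly = sum(tokens_millions * split[k] * price[k] for k in split)
print(f"API list price: ${yearly:,.0f}/year, about ${yearly/12:,.0f}/month")
# roughly $1,575/year, ~$131/month at list prices, versus a $40-60/month subscription
```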
Something is fishy here and the other shoe has to drop eventually.
And I need to know how much this efficiency gain is worth to contractors and companies. If this is a scaling cost, what happens when you breach the break-even point? Especially if the manager feels that for half the cost, they could just get an AI agent and learn to do it themselves.
There's also the conflict of interest where the AI service provider is likely going to charge per token burned, and I've seen how companies incentivize scaling up over efficiency.
It becomes hard to see this as a great technology leap when everything about it looks like a golden noose.
Guaranteed. Current pricing is unsustainable. However the other shoe is still up in the air. Improvements in hardware, models designed to use less compute with comparable outputs, completely new architecture that handles context differently and solves the scaling problem... hard to say what will happen. Also open weight models, assuming they keep up and continue to follow just a little bit behind frontier models, will serve as a fallback if they try to jack up prices too far.
But enshittification in some form is inevitable so you're right to be cautious. I'd add that currently the competition is pretty fierce and there are enough players to keep prices down. It may still be a long way off. You'll know the enshittification is on the horizon when collapses, mergers and acquisitions start happening.
Either that or the bottom falls out of the tech stock market, the whole industry slows down, and they're forced to raise prices quickly. Anyone's guess how likely that is.
Speculation on the future of the tech is valid but it does feel a little moot at this point given the scaling limits. (There was an updated assessment on the diminishing returns if you don't mind 50 pages of wizard math). Deepseek broke a barrier with MoE architecture, but I think that there is a foundational limit to LLMs by simple virtue of the fact that it's based on Language, not Data.
This is personal philosophy, but it often feels like information in an LLM is going to be polluted eventually because the model can't separate a data point from the representation of that data, its linguistic functions and all the associated "baggage". The mess will initially be squashed with context and system prompts, but all that noise will eventually drown out even the most recent instructions and it may even have subtle influence on outputs that seem correct.
But regardless of how things develop, we can't ignore the fact that there has already been almost a trillion dollars in capital expenditure. Funds allocated, prior to sufficient infrastructure being made available, let alone user demand. I think it's why they keep trying to scale compute, data and training time in spite of the scaling limits that they identified. Spend that can't be recouped if there is some major development in the next few years.
The AI industry cannot risk a more efficient GPU or model going public because that would instantly tank the value of Blackwell GPUs and the massive cost to train the latest models. I suspect it's the reason OpenAI sounded a code red when Gemini ran so well on TPUs and not CUDA hardware.
That would fix a few of the problems I have with LLM coding assistants. Do you have some links that give an overview?
I was speculating about what could happen, rather than talking about what already exists. However some strategies that already exist are:
Not sure about "current pricing is unsustainable." How much do we really know about that? Maybe some AI companies are selling at below cost, but I don't think we can tell from the outside when Google is (for example). They have a lot of engineers working on lowering their costs. (They also built their business on giving services away for free and making up for it with advertising.)
It is true that AI companies can't lose money forever, so something has to change. I can gesture vaguely in the direction of a shakeout, but don't see how anyone can predict how it will play out.
Amazon lost money for many years, but it worked out for them in the end. Other companies might not be so lucky.
It seems like we have one of these LLM discussion posts roughly once a month. Pretty tired of the discourse personally
Speaking for myself, I don't read every thread on Tildes, or even most of them.
I think it's really funny how LLMs are held to the highest level of scrutiny instead of being compared to what was there before. I can now near instantly get ideas for office party activities, a large draft of an announcement or a statement, format things in whatever way I want instantly, ask for help with text or code with clarifications and corrections, and most importantly all of this is completely customizable to whatever it is I want; it's generalized to the extreme. Does it matter then that it is not a literally perfect answer, ready to be copy-pasted? No, because a) I can just ask to change things and b) before there was no such thing whatsoever, other than googling and hopefully landing on something that fits your use case (frequently also flawed), or I would have to convince or hire other people to help out, again with all of the downsides of working with another person.
Thankfully, beyond the hatred and obsession of online discussions, what mostly matters in the real world is practical application.
I agree completely. The only thing more shocking than how good today’s models are, is how quickly people are to dismiss them.
That said, I think a lot of these tools are hamstrung by context window size and their inability to maintain focus on many parts of a complex system at once. That’s especially evident when working with code… the tools can produce huge amounts of standalone code from scratch, but are not nearly as good at updating existing codebases. They don’t understand how the existing modules fit together, or how the APIs work. They don’t understand the libraries. They don’t know your coding conventions.
I imagine in the future we’ll have workflows with long chains of checking and double-checking. Reasoning models that always ask themselves things like…
And so on. Probably repeating this whole checklist multiple times, defensively asking “are you sure??” frequently along the way, refusing to consider a task done until every box is ticked. I’m thinking of an extremely self-critical, fastidious agentic system that is constantly reading local files and looking up information on the web, reading docs, validating assumptions, disproving hallucinations, etc. A chain of events that would be painfully slow and inefficient today but, if all these huge capital investments in GPUs and data centers are worth anything, maybe won’t be in a couple years.
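Just to sketch the shape of what I'm imagining, the loop would be something like generate, critique, revise, repeat until the critic stops finding problems. Nothing here is a real product; the model name, prompts, and the three-pass limit are all placeholders.

```python
# Rough sketch of a generate/critique/revise loop; model name and prompts are
# placeholders, and a real system would also run tests, read docs, etc.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

task = "Write a function that merges two sorted lists."
draft = ask(task)

for _ in range(3):  # the "are you sure??" passes
    critique = ask(f"Task: {task}\n\nDraft:\n{draft}\n\n"
                   "List concrete problems, or reply exactly OK if there are none.")
    if critique.strip() == "OK":
        break
    draft = ask(f"Task: {task}\n\nDraft:\n{draft}\n\nProblems:\n{critique}\n\n"
                "Rewrite the draft fixing every problem.")

print(draft)
```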
The real pain point for me in all LLMs right now is “drift.” You meticulously prompt your way to a place where it finally seems like it’s got a good understanding of the problem domain. Then you continue working with it and… oops, it forgot some tiny detail you mentioned in the beginning. That’s the first crack that begins to show, but a few prompts later you realize it’s completely lost the plot. Everything you typed 5 prompts ago is dust in the wind. Extremely frustrating. But it feels like a temporary problem that will be resolved eventually. Hopefully without boiling the oceans in the process.
Yeah, keeping the oceans would be nice. I vote for not boiling the oceans.
Context windows are a hard problem because the compute doesn't scale linearly with context size, it scales quadratically. It's the current achilles heel of LLMs for certain kinds of tasks, and dealing with large, deeply interrelated code bases is one of those.
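To put rough numbers on the quadratic part (a simplification that ignores real-world optimizations like KV caching and attention variants): plain self-attention scores every token against every other token, so the work grows with the square of the context length.

```python
# Simplified illustration of why long contexts hurt: plain self-attention
# builds an n x n score matrix per layer per head, so doubling the context
# roughly quadruples that work. Real systems optimize this heavily.
for n_tokens in (8_000, 32_000, 128_000):
    pairs = n_tokens ** 2
    print(f"{n_tokens:>7,} tokens -> {pairs:>20,} token pairs to score")
# 8k -> 64 million pairs, 32k -> ~1 billion, 128k -> ~16 billion:
# 16x the context means roughly 256x the pairs.
```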
No doubt they'll figure out how to solve it, but there's no path to a solution in the current generation of models, and maybe not the next.
In the meantime, it's useful to break tasks down into smaller steps that are as self contained as possible.
It sort of pushes towards having more modular codebases that are less confusing for the AIs. Hopefully advancements reduce that push, I don't love the idea of AI limitations starting to dictate architecture.
Lots of your issues with context drift will be solved with larger context windows in the future, but you can solve it now with RAG AI systems.
RAG stands for retrieval augmented generation and the gist of it is that instead of purely generating responses based on your current prompt and past conversation history, the AI will look to your source documents for concrete information.
Things like ChatGPT will do this currently if you upload documents with your prompt. You can probably have whatever model you’re using generate a summary of facts about what you’re working on occasionally and use that as source document to realign the AI if it starts to get off track. This would probably also help if you decided to move to a different model or chat window mid project.
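As a minimal sketch of the retrieval half: TF-IDF here stands in for the embedding models a real RAG system would use, the documents and question are invented, and the generation step is left as a comment where you'd call whatever model you use.

```python
# Minimal retrieval sketch: TF-IDF stands in for real embeddings, and the
# "generation" step is left as a comment where you'd call your model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Project summary: we are migrating the billing service to the new API.",
    "Decision log: we chose Postgres over MySQL for the reporting database.",
    "Meeting notes: the launch date moved from March to May.",
]

question = "When is the launch now planned?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([question])

scores = cosine_similarity(query_vector, doc_vectors)[0]
top_doc = documents[scores.argmax()]  # the most relevant source document

prompt = f"Using only this source, answer the question.\nSource: {top_doc}\nQuestion: {question}"
# ...send `prompt` to whatever model you're using; the retrieved source keeps it grounded.
print(prompt)
```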
Haven't tried it but the beads issue tracker sounds promising for keeping agents on track.
On the subject of AI FOMO, there definitely should be a sense of urgency to start using it. Early adopters will have a significant edge, however short lived, over AI haters.
I don’t mean you should force yourself to use AI for tasks that it isn’t good at, but keeping an open mind and occasionally checking whether AI is now helpful in your workflow will help you out a ton.
For example, everyone knows AI can be used to increase your job application submission rates, so companies try to add sneaky prompts into job listings to catch it. If you were an early adopter of AI resume and cover letter generation, you would have had a competitive edge over other job seekers for that short period of time.
I don’t have any other examples like job applications, but they are out there waiting for someone to discover them. If you’re able to discover a use case first, you’ll get to benefit until others catch on.
I don't think the edge of early adopters is going to be short-lived. With trivial tasks and automation, obvious use cases, sure, but not where it'll matter with time.
The limiting factor is and will always be that to prompt AI, you need to use precise language to say unambiguously what you want it to say. In my experience, people are notoriously bad at exactly that, especially doing so in concise ways (so it doesn't take a lot of time).
Getting that experience now lets you build on it to take the next steps. Once ahead, you get more ahead. LLMs aren't the end-all. Being first to learn where the train stops, where their usefulness ends, lets you build experience in that environment.
I notice in my field (networking) that some of us are already doing the tasks of multiple colleagues, where we used to work at the same speed. The quality of work is audited after being meticulously self-controlled.
Good LLM use is hard. Once you figure out what tasks to use it with, there are compounding effects of those who've always been the "best" at their roles getting significantly better. Others aren't getting the same increases in efficiency. The gap is widening. Especially with tasks where collaboration is difficult.
The workplace is a competitive environment, where there are limited positions that get ahead. Being viewed as slightly better will get you that first position, and that position gets you the next one in competition with others. And so the ball keeps rolling, carrying the folks who've got the edge upwards.
There are ways of thinking that are transformative.
When a kid learns to read, suddenly a new world emerges due to the transformative nature of reading.
When a kid learns a second language reasonably well, a new world with entirely new cultures emerge.
When a student learns the scientific method in school, a new world with entirely new ways of knowledge emerges.
When in high school /university, an entirely new way of seeing the world emerges after engaging with philosophy/ethics.
The same goes for having a basic knowledge of statistics: Suddenly you can evaluate the world in new ways.
With coding, suddenly applied logic becomes a real tool for evaluating concepts.
The whole point of this list: AI use has some of that same transformative nature. For me, the transformation this time is in how we think about the most limited resource in our lives: time. AI is all about reimagining time management and what to spend my time on.
That won't go away either.
I’ve been using Opus 4.5 recently to set up a new project. It’s definitely been a big time saver. To be fair, creating a new thing is way way easier than changing an existing thing. And with a new project there’s a lot of boilerplate. But there have been a few moments where it’s clear all of the work on tooling and the AI have paid off.
Just a couple hours ago I was debugging a weird oauth issue. In the process of feeding context into the LLM I gave it a GitHub URL with all of my oauth parameters. At that point it noticed my app ID was formatted as a “GitHub App” and not a “GitHub OAuth App”. Annoyingly credentials for one will kinda work as though it’s the other. It’s the kind of thing you’d need to be an expert on GitHub integration to catch. It could have taken me hours to find it myself. But the AI got it as soon as it had all of the necessary information.
I’ll have to go back and throw this model at things Opus 4.1 made embarrassing mistakes on. I do think I’ve been giving 4.5 easier tasks so I can’t say how good it really is yet.
Opus 4.5 really is that good. It's also a complete idiot and will still make embarrassing mistakes. Just depends on the context (no pun).
I'm very curious whether 4.5 -> 5.0 will be as big of a leap as 4.0 -> 4.5 was. Or GPT 5.2 -> 6, or Gemini 3.0 -> 3.5. At some point they have to plateau, but so far all predictions of an impending plateau have been wrong. One way they can delay any potential plateau (or even just a slowdown in advancement) is by figuring out how to both have larger context windows efficiently and solve attention within larger context. A model with even current gen intelligence that could handle (let's say) 2-3 million tokens without drift would be a game changer.
I think it's important to draw distinctions between the system overall (LLM, tool calls, MCP, editor UI, etc.) and the LLMs themselves. In my example above Opus did do some web searches. How much of its performance is down to an optimized search engine or "Google-fu"? I still think there's a noticeable slow-down in the performance increases of the core LLMs. Not an outright plateau but the jumps from GPT 2 to 3 to 4 were hugely impressive. However from 4 to 5 feels not like an exponential jump but rather a reasonable upgrade. But I have seen the utility of AI tools in software engineering go up since the release of GPT-4. Not quite a steady rise, though. Kind of a "we're so over" one month to "we're so back" the next.
It's true that one of the biggest leaps was from non-agentic to agentic models, which makes the scaffolding and tooling as or more important than the model's "intelligence", but a big part of that is encoded in training. The models have to be extensively pre-trained on tool use. Tools ultimately add a parallel pathway of advancement in addition to general knowledge. Similar with things like attention, instruction following, context management and so on. There are more and more pathways to advancement happening at the same time.
Using Opus as an example, 4.1 to 4.5 was an improvement on a variety of levels. 4.5 uses tools better, in more contexts, follows instructions more rigorously, has better codebase and pattern awareness and maintains focus better while also having better code inference and other "base" skills. However incremental the advancement in any particular area, the overall effect is a dramatic difference in utility.
And also it's still an idiot. It's a bizarre paradox that's sorta unique to LLMs.
I built an "agentic" tool as contractor this year back in May. At the time the advice was to not give the AI more than 2 or 3 tools. It feels like things have improved a lot in that specific area in the last few months.
I definitely agree the models are getting better. I think for a while I felt they had plateaued but I’ve found the abilities of Opus 4.5 to be very useful. I may also be learning better how to use LLMs for programming.
I still like to read through the code it writes and rewrite a lot of it. I feel like we’re in good harmony when Opus writes a bunch, I commit, then I delete a bunch and commit again. They are code generators after all. Not really an automated software engineer but instead a fast typer. You need to know where you are and where you want to go. The AI then (sometimes) helps you get from A to B quickly.
Unfortunately, I agree with you and think this stuff isn’t going anywhere. Related topic, is anyone else trying to get out of software? Has anyone discovered a decent pathway away from this mess or are we all stuck with it?
Software engineer of 15 years here. I've started to consider a career change, not because I'm afraid of getting replaced by LLMs, but because of a deeper disillusionment with the entire software industry.
We've been in dire straits for a while now, but the introduction of LLMs has really thrown gasoline on the fire. Product, design, and software quality steepened their already-precipitous decline. Leadership and management widened their already-incredible disconnect with reality. The interviewing systems and engineering cultures at most companies went from "broken" to "completely broken".
I've worked at software companies of all sizes ranging 10 people to 10,000 people. Across these, almost all the challenges I've faced in my career have been organizational / political / human - very few have actually been technical. Typing out the code has never been the limiting factor, it has always been communication, coordination, and politics. LLMs have only made this asymmetry more pronounced. These human problems remain largely unsolved and as frustrating as ever, but software "leaders" everywhere seem convinced that optimizing the "typing the code" piece will revolutionize their organizations.
Meanwhile, I moved into an old (100+ year old) house that needed some love, naively thinking "I'll just fix it up - how hard can it be?". After calling contractors for every little thing quickly got old, I started tackling some jobs myself. The jobs I was comfortable taking on started small but got bigger and bigger over time. Learning tons of new skills that had relevance in the physical world and granted some degree of self-reliance was liberating. Working with my hands became addicting. Being able to end the day, step back, touch something real and say "I did that today" was so much more fulfilling than sitting on video calls and shuffling make-believe bits around with a keyboard. If you've never experienced this contrast first-hand (like I hadn't), it's hard to fully appreciate.
Now I'm strongly considering moving into the trades - becoming an electrician, carpenter, plumber, and/or builder. I am under no illusion that this switch will be easy. Software engineering is probably one of the physically softest/easiest jobs, while trade work can be incredibly grueling and taxing. Some of the onboarding paths to these careers involve years of "being someone else's bitch" to meet licensing requirements. The pay and the benefits will be worse. The hours will be longer. The culture is completely different. Taking something you enjoy doing "on the side" and making it your full-time profession is often a recipe for disaster - maybe this will be, too. But now I'm at the point where taking that risk for something that can be so much more fulfilling is looking worth it.
As the housing stock in the US ages, and the demand for new housing continues growing, we will need skilled tradespeople for the foreseeable future (until the humanoid robot electricians arrive, at which point I'll be standing in the bread line with the rest of you). The intersection of these two industries also seems compelling - the quality bar for most contractors is terrifyingly low today. If you have the capabilities needed to work in the software industry (learning new skills quickly, solving problems, communicating with people, verifying your work), it is likely that those capabilities will also serve you well in the trades and put you ahead of the pack. The building industry - and its intersection with building science, energy efficiency, and climate-conscious building particularly - still has a lot to learn, and needs capable people who care about doing good work.
Hope that helps.
My main concern is that people (at least in Silicon Valley) seem to see LLMs as a way to write more bad code faster. I don't want to work as a software engineer alongside people who are atrophying while vibe coding. Otherwise they're a great tool. Now, as should have always been the case, the focus is more on product and design.
They are a great tool, but I find it hard to justify them morally. I feel like my day-to-day work now contributes to a lot of evil in society: funding these mega corps, automating others' work, reducing individual freedom. I just don't think I can continue in this moral conundrum forever. Not to mention that I'm sure someone will try to replace me at some point.
It sucks to watch the field I was so passionate about become a purveyor of evil.
Software has always been about automating jobs.
Sure, but it wasn't always about enabling the ruling class to own everything. Or maybe it was, and it just wasn't as effective as it is now. Either way, I don't want to stay in the field.
I'm not sure who the ruling class is in this hypothetical (OpenAI?), but if the last three years have proven anything, it's that there is no moat when it comes to models. There are dozens of good open-weight options available, allowing for all kinds of use cases that were thought impossible just a few years ago. And you can run them now, on your own PC.
Check out GLM Air, DeepSeek R1, GPT-OSS, Qwen, FLUX.2, Phi 4, and all the other great open models. There's no need to be reliant on any cloud service.
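If anyone wants to see how low the barrier actually is: local servers like Ollama or llama.cpp expose an OpenAI-compatible endpoint, so pointing the standard client at them takes a few lines. The model name below is just an example of something you might have pulled; swap in whatever you're actually running.

```python
# Chatting with a locally hosted open-weight model through an
# OpenAI-compatible endpoint (Ollama's default port shown; the model
# name is only an example of something you might have downloaded).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # key is ignored locally

resp = client.chat.completions.create(
    model="qwen2.5-coder",  # example local model name
    messages=[{"role": "user", "content": "Summarize what a Makefile does."}],
)
print(resp.choices[0].message.content)
```

Hosted providers serving the same open models generally expose the same OpenAI-compatible interface, so the code barely changes, you just point base_url and the API key somewhere else.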
It’s certainly much cheaper and easier to buy a $9500 512GB M3 Ultra Mac Studio running Chinese LLMs than to switch careers.
Even if one doesn't have the hardware to run a high-end model like that, there are many hosted alternatives available, such as OpenRouter. Since the models are freely available, companies have to compete on price to host them, and as a result it's actually pretty affordable to use even the best models. R1 is famously cheap to use.
Realistically though, smaller and flash models still work great for a large number of tasks. The big models are really only needed for difficult problems. Think major refactors, not boilerplate.
I haven’t had much luck with LLMs doing major refactors.
The ruling class is the tech CEOs.
I do indeed run the open-source models. They're the main way I try to interact with AI in my personal life. My job is a completely different story because no one will trust the OSS models.
Thanks for the suggestions.
I think part of the way out is to avoid broad generalizations and make some distinctions. Is it really true that they’re all bad, or is that just catastrophizing? Can you find a job for a company that seems to be doing something useful?
Also avoid purity tests where you can’t use a tool for a good purpose because of something you know about the company that made it. Big companies, especially, do a lot of good and bad things simultaneously. It’s sort of like avoiding a city because some of the people who live there are Trumpists. There are good people too. You’re not going to get purity.
Seemingly, no.
I don't want purity. I just want less evil options than Google, Meta, and OpenAI. I guess Anthropic will have to do... too bad my company doesn't have contracts with them.
Deciding that Google is "evil" is an example of the over-generalization that I'm talking about. There are good and bad people and good and bad products. You can make distinctions.
Fair enough. Frankly, I think I have seen enough from these companies to make a decision about their ethics.
The only jobs that are safe for the moment, if you want to keep working on a computer, are customer-facing jobs. Jobs like IT support, the business side, etc.
It's because bots are still considered second-class help and companies want to provide their large clients with advanced support from actual developers.
So, if you don't have people skills, you're screwed. Our only way out is that people still prefer to talk to us rather than to a bot. If people don't wanna talk to you, you're out. Replaced by the polite bot.
Thanks for the honest assessment. I don't think I want to stay working with computers but that's the only skill set that I have :(.
I’m in the same boat.
Right now I'm just trying to stay ahead by using the models to speed up my work, because my company knows I should be able to work twice as fast now, so I do that. Eventually though it won't be enough.
Eventually the model won't need a technical person to oversee the quality of the software it writes; it'll be enough to have business-side people validating that the end result is what they wanted.
They'll still need the occasional consultant to come in when "it was working but now it's not working". On the other hand it'll be a black box, and they may never be able to find out why it stopped working or how it worked to begin with.
Yeah, there will still be customer-facing tech support jobs. Those jobs will just be for the very high-paying customers and will also double as sales and training for those customers.
If you've ever dealt with an AWS support specialist, that's what I'm imagining.
User-facing tech support, the "have you turned it off and back on again" kind, will be replaced by bots. It pretty much already is.
It does make me sad that effective human problem solving has become a luxury item :(
You know, I think it kinda has been for a while. Like, Google doesn't have support unless you spend probably millions on Google business suite, right? The rest of us, even small nonprofits, just get nothing. AI tech support is actually kind of an improvement for us.
Engineers are still going to be needed. But they’ll need to be the type that can interview users and come up with good product and design ideas.
If you're a programmer in a more specialized field, I doubt you have much to worry about from AI. You need to be able to tell the LLMs what to do, and if you're at, say, a PhD level doing deep tech, think about how many people can even ask the right questions. Could any random person tell a senior engineer what to do? No, that's a rare skill.
Right, so right now there's this disconnect: there's a tech side and a business side, and only the business side talks to the customers.
It's already happening in some places: the two sides get combined into one team where each member both talks to the customer and develops the product.
That’s really common in startups and has been for a while. I think it’s much better than the old way. Personally I want to have my head wrapped around the entire thing. Code, product, design, finances, etc.
Yeah, better for the people who can do all of that and get to keep their jobs, haha. I'm okay at the talking-to-people thing, but there are many people better at it than me and I'll probably always be passed over. The stuff I was good at, the model is now better at than me.
I was a hater no more than 6 months ago. I thought that LLMs probably didn't have all that much runway as just fancy Markov chains. But the agent-like interfaces, which can recursively ask the model to refine the answer over and over, have won me over.
I can really see this format for AI and future models accelerating a lot of industries.
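For what it's worth, the "recursively refine" part sounds fancier than it is. Stripped of all the tooling, it's roughly a generate/critique/revise loop like this (llm() and the prompts are placeholders, not how any particular product actually words things):

```python
# Bare-bones generate -> critique -> revise loop, the core idea behind
# "ask the model to refine its own answer". llm() is a placeholder for
# whatever model call you use; the prompts are illustrative only.

def llm(prompt: str) -> str:
    raise NotImplementedError  # swap in a real API or local model call

def refine(task: str, rounds: int = 3) -> str:
    answer = llm(f"Task: {task}\nGive your best answer.")
    for _ in range(rounds):
        critique = llm(f"Task: {task}\nAnswer: {answer}\n"
                       "List concrete problems with this answer, or say 'no problems'.")
        if "no problems" in critique.lower():
            break
        answer = llm(f"Task: {task}\nAnswer: {answer}\nProblems: {critique}\n"
                     "Write an improved answer.")
    return answer
```

The agent products layer tools, tests and file access on top of that, but the refinement itself is just the model being asked to grade and redo its own work.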
Yeah, the problem is the companies providing the models and the compute power... But here we are.
I agree this isn't going to go away, but at the moment, outside of a "better Google" and quickly setting up a small proof of concept, I don't find it reliable or useful at all. The results are all over the place, and despite finding a lot of praise for agentic coding all over the internet, I can't find any reliable workflow that gets me the results people tout.
A lot of people have also recommended that I keep trying it out and keep customizing it, but I've already done that and I feel like I'm stuck getting the same (bad) results.
At the end of the day it's just another tool. If you don't like it, or don't need it, there's really no downside to not using it. Agents, scaffolding and tooling will get better throughout 2026 so you can always try again later to see if it's more useful.
And also, like other tools, using it the right way makes all the difference. If you want help or tips, feel free to message me.