The trouble was likely that there was no built-in way to do what I wanted, and no one had ever successfully done it before, so the machine had nothing to draw from… and simply generated something that sounded plausible instead. Because that is what this technology does: it continues a conversation in a way that sounds plausible, as defined by similarity to existing conversations. If there are existing conversations about the topic, great! That makes for a more specific measure of plausibility. If not, even better! Just about anything might be plausible! It can just generate Whatever!
I cannot stress enough that this is worse than useless to me. Not only did it not answer my question, but it sent me on a wild goose chase making sure I had not somehow overlooked the fake API it generated.
I wanted to raise the visibility of these paragraphs because this is my exact experience. As soon as you try something that nobody on the internet has tried before, the LLM makes shit up. Sometimes, with programming boilerplate or boring API plumbing, this is highly predictable and the LLM nails it.
Most of the time, in my experience, it doesn't have any useful input. For instance, I tried to get Claude to calculate the surface volume of a body of water near where I grew up. Neither Wikipedia nor anywhere else has that calculation, so it literally just grabbed an acreage number from a real estate listing of a nearby property.
When LLMs spew these fake facts -- and they often do -- I feel like I'm polluting my mind. I don't want to spend my time fact checking an LLM, even if it gives me citations, because I can just web search for that content myself and skip the slop.
It makes programming spaces feel bleaker. I don’t want to help someone who opens with “I don’t know how to do this so I asked ChatGPT and it gave me these 200 lines but it doesn’t work”. I don’t want to know how much code wasn’t actually written by anyone. I don’t want to hear how many of my colleagues think Whatever is equivalent to their own output. I don’t want to keep watching people fall for a carnival trick.
We did a hackathon at work recently. Every project? An AI chatbot, bolted onto a different facet of our product. Whatever, lame, but I thought it still might be cool to dive into unfamiliar codebases and learn some of our frontend code.
Except everyone was just vibecoding with Cursor because they didn't know the codebase. This method let them sling thousands of (garbage) lines of Whatever code that kinda sorta did the job and only broke 10% of the time. And despite not understanding the codebase, the chatbot API (just try asking any of them how to maintain context), or the code they generated, every demo was incredibly braggy and acted like they crafted this output themselves and understood it all. Most haven't even read more than a couple of their own lines.
An intern asked me 'What were hackathons like before LLMs' and I nearly cried.
LLMs (which are most certainly not sci-fi AI, with any level of independent thought or conscience, hence the constant bald-faced lying) have sucked all of the air out of the room on so many subjects. Not just programming, but now teaching, most online writing, journalism, even reviews for products. I have given them a chance, deluding myself into thinking I should keep an open mind since they have so much potential.
But I'm glad I read this article because it has convinced me to finally stop giving LLMs a chance. Fool me once, shame on me. Fool me twice -- nope, I am just done listening to these garbage-spewers. I only hope the hype wave crests soon enough that I'll have literally anything else to talk about online in a couple of years.
It's not that AI can't do it if no one on the internet has already done it; it's that AI can't do it if it's not mega-popular enough to have statistical models about it.
I was in a rush and I asked AI to make a Python script for Minecraft Education Edition's Python notebook so I could demonstrate something to a student. Regardless of what model I tried, none of them seemed to actually know anything about this version of Minecraft beyond the most surface-level things - that it's an education edition, for instance. So I would get responses telling me to alter the game's Java source code or to use Python libraries that can't be used in EE.
To be fair, Minecraft EE's Python libraries absolutely suck, so there's a good chance that whatever I asked it for couldn't be done; I honestly don't remember what I asked it for and I usually don't have them keep logs. But the problem was that it would never say "you can't do that" or even "I don't know". So instead you have to either go on the aforementioned wild goose chase or spend the time nailing down specific details like you're talking to an idiot until you finally realize that it just doesn't know.
Well said. I would find LLMs maybe possibly almost useful if they could reliably say "I don't know." Funnily enough, they share this weakness with most of the worst hype bros I've worked with in tech.
The best models do have an "I don't know" neuron that's trained for uncertainty. Thing is, so many questions have plausible BS answers, so it mimics its training data's confident BS'ing.
LLM hallucinations happen even if the training corpus doesn't contain any BS. LLMs are only mimicking the text in the training corpus, not the thought or sentiment behind the text. The training process doesn't teach the LLM to self-reflect to determine whether it “knows” something or not.
Interestingly, the last time I ended up in such a situation at a client, when I needed something pretty niche from an LLM and standard models weren't good enough for me, I ended up setting up a RAG framework on an LLM in my Azure environment, where I made it pull data from dumps of very specific resources I had uploaded. It worked quite well, although it could work a lot better if I actually optimised the data structure behind the RAG. But I guess you can't apply this as a solution to every such problem. It implies there is data available somewhere, and in sufficient quantity.
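For anyone curious, the retrieval half of a setup like that is conceptually pretty simple. Here is a minimal sketch of the idea -- not my actual Azure setup; it assumes a local sentence-transformers model and an in-memory index, and the document contents are placeholders:

```python
# Minimal RAG retrieval sketch (illustrative only, not the Azure setup described above).
# Assumes `pip install sentence-transformers numpy`; the documents are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Dump 1: internal API reference for the payments service ...",
    "Dump 2: onboarding runbook for the reporting pipeline ...",
    "Dump 3: style guide for customer-facing error messages ...",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = model.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved chunks get pasted into the prompt sent to the LLM,
# so it answers from your niche data instead of guessing.
context = "\n\n".join(retrieve("How do I add a field to the payments API?"))
prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: ..."
```

The real work is in chunking and structuring the source data well, which is exactly the part I never got around to optimising.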
I recently ran into a similar problem at work, and some AI enthusiasts came up with the genius idea that everyone should simply brain dump into some markdown files for the LLM to use for RAG...
Oof, that's definitely not a great use case. If you have, like, hundreds of megabytes of trustworthy, consistent-enough data to work with, it can definitely work well enough, like in my case... But inconsistent, braindumped files passed through an LLM sounds like a "Shit In, Shit Out" system, not gonna lie.
To be fair, Minecraft EE’s python libraries absolutely suck, so there’s a good chance that whatever I asked it for couldn’t be done; I honestly don’t remember what I asked it for and I usually don’t have them keep logs.
This is another big factor. There's great power in the ability to say "no". But AIs are hellbent on the "fake it till you make it" mentality and never want to admit when something is beyond them. Those kinds of people in my life made me scrutinize everything, so I guess that carried over to these LLMs when I saw the same patterns.
Just a counterpoint: Claude has been pretty solid at helping my sorry-ass coding along re: new ComfyUI nodes. I uploaded text versions of the base plus good examples of .py files, and it got the picky bits figured out eventually. However, full disclosure, I have been trying to code on my own for years & have just sucked, so I am probably the target audience for this lol
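For context, a custom ComfyUI node is mostly a small class plus some registration boilerplate, which is exactly the kind of picky-but-predictable stuff an LLM handles well once it has seen working examples. A rough skeleton (the class name and the trivial example operation are made up, not one of my actual nodes):

```python
# Rough skeleton of a custom ComfyUI node (illustrative; the node itself is invented).
class UppercaseText:
    @classmethod
    def INPUT_TYPES(cls):
        # Declares the inputs ComfyUI should render for this node.
        return {"required": {"text": ("STRING", {"default": "hello"})}}

    RETURN_TYPES = ("STRING",)   # one string output socket
    FUNCTION = "run"             # the method ComfyUI calls when the node executes
    CATEGORY = "examples"

    def run(self, text):
        return (text.upper(),)

# ComfyUI discovers custom nodes through these module-level mappings.
NODE_CLASS_MAPPINGS = {"UppercaseText": UppercaseText}
NODE_DISPLAY_NAME_MAPPINGS = {"UppercaseText": "Uppercase Text (example)"}
```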
My tiny pearls of wisdom that have served me well with AI are:
The answer to "can I" will almost always be yes, even when it's no
Don't ask questions you don't know the answer to
If you do, demand...
"You are not using the right workflow. You are not prompting in the right way. You aren't using the right model. You should roll the dice correctly."
It is frustrating to hear this, but I think it's correct. Despite what some loudmouths in the space say, this is a tool, and just as you weren't born knowing how to tie your shoes, you have to learn how to use LLMs to get good results.
Let's take your example of calculating the area of a body of water. I've used LLMs to write geographic code before, so I vaguely knew what to look for when I sent Claude Opus 4 this prompt:
Help me calculate the approximate area of Union Lake, Millville, NJ. Let's do this using a python script, and an open map dataset like OSM.
It gave me some decent-looking code, as well as an estimated size: 736 acres (2.98 km²). This estimate is wrong but, compared to a human-written source, it is pretty close to the correct 898 acres.
Anyway, enough faffing, the results were laughably wrong:
Fetching Union Lake data from OpenStreetMap...
Extracting lake boundary...
Found 5685 coordinate points
Calculating area...
Union Lake Area:
1110109.22 square kilometers
428615.75 square miles
274314080 acres
1,110,109,216,819 square meters
So I asked it to fix it. Still wrong, but slightly different. I figured it was probably picking up some unrelated or misshapen features, and I also wanted it to visualize the shape so that I could debug it manually. But I didn't do that by adding to the conversation history -- as you add to the conversation history, it tends to get confused and over-reliant on its own old broken code. I instead edited my message to point out the mistake and ask for the visualization. This worked perfectly, and the visualized polygon matched the shape exactly:
Fetching Union Lake data from OpenStreetMap...
Extracting lake boundary...
Found 1 polygon(s) with 386 total coordinate points
Creating visualizations...
Map saved as 'union_lake_map.html'
Calculating area using proper geodesic methods...
Union Lake Area:
3.596 square kilometers
1.389 square miles
888.7 acres
3,596,347 square meters
Reference: Union Lake is known to be approximately 736 acres (2.98 km²)
In the end, it took me about 15 minutes (and 2¢) to get this result, and about 15 more minutes to write it up, by hand, unassisted. The code has some major limitations, like only working in the state of New Jersey. I would re-write this code if I was planning to use it elsewhere: a lot of the re-write would be by hand, but some of it would likely be asking an LLM to generate specific functions. I find a lot of LLMs' taste in architecture and error handling to be lacking.
But it is a massive productivity improvement to be able to prototype a flow like this so quickly. I now see the Python libraries and APIs I need to wrangle the data, and I understand the potential gotcha of doing the polygon area calculations in an inappropriate datum.
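If you want to try the same thing without going through an LLM at all, the core of the approach looks roughly like this. This is not the code Claude generated -- just a minimal sketch assuming the osmnx and pyproj packages, and the place/name lookup may need adjusting for other lakes:

```python
# Minimal sketch: fetch a lake polygon from OpenStreetMap and compute a geodesic area.
# Not the LLM-generated script discussed above; osmnx and pyproj are assumed installed.
import osmnx as ox
from pyproj import Geod

# Fetch water features around Millville, NJ and keep the one named "Union Lake".
water = ox.features_from_place("Millville, New Jersey, USA", tags={"natural": "water"})
lake = water[water["name"] == "Union Lake"]

# Geodesic (ellipsoidal) area avoids the classic gotcha of treating lat/lon
# degrees as planar coordinates, which is how you get absurd numbers.
geod = Geod(ellps="WGS84")
area_m2 = sum(abs(geod.geometry_area_perimeter(geom)[0]) for geom in lake.geometry)

print(f"{area_m2 / 1e6:.3f} km²  ({area_m2 / 4046.86:.1f} acres)")
```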
Thanks for this. In my case, I was just curious about the comparison to another body of water, so I wanted a quick estimate, not a piece of code. Even 15 minutes is more than I was hoping for.
But I do appreciate you breaking down your thought process here. It seems like you are indeed using LLMs the only way that I've found them to be useful: as a starting point. Except you're definitely better about iterating on a prompt than I am. I feel like I hit walls much more often when I try to call out mistakes and inaccuracies -- the LLM will politely apologize for the misconception and claim to fix it, but the vast majority of the time it gets stuck in a cycle of introducing new bugs while only partially resolving the past mistakes. I've been burned too many times by that while under the gun to complete a task at work, which ironically makes me less willing to experiment with LLMs since I know I could likely solve the problem myself in less time if I don't faff with the LLM in a neverending spiral of lies.
One takeaway that perhaps others can benefit from: it seems that suggesting a possible solution, like you did with OSM and a Python script, is always the way to go with an LLM. That makes it tough for me to navigate spaces that I'm not already familiar with, because I tend not to try to 'solution' when I ask people questions -- I tend to want to defer to their expertise! But with an LLM, I should remember that it's a model, not a person, so I'm not wasting its time by steering it. Same goes for iteration; even if I don't fully understand the codebase, better to speculate about the possible bug and a solution for it with an LLM. Whereas doing that from an uninformed perspective with a human being who slapped some code together would typically be pretty annoying.
One takeaway that perhaps others can benefit from: it seems that suggesting a possible solution, like you did with OSM and a Python script, is always the way to go with an LLM.
LLMs are, fundamentally, predictive text generators. They're good at pattern matching, so they usually perform better when given a good example that's already been worked out, or at least a suggested approach.
One useful tip for iteration is to be willing to start new chats. As you have noticed, LLMs can get stuck on a solution and not be able to get out of it. There is a tried and true way to break this loop: wipe their memory! I find, in general, that response quality gets worse the longer the chat thread. It often works well to take an LLM-generated response, copy it to a new thread, and have it critique its past life. I will often get the AI wanting to use some property, method, or function in the code it generates, despite me telling it not to. So just start a new thread, copy the code, and edit out the part you don't want it to use.
For your volume question, I might have asked it for the area of the lake (or just searched for it), then in a new thread, have it do the calculation after telling it the exact area. It can’t hallucinate something that you explicitly tell it (well it can, but that is much more rare, and usually pretty obvious).
An intern asked me 'What were hackathons like before LLMs' and I nearly cried.
In fairness, I attended some hackathons before LLMs. Most of the projects followed the current hype (blockchain, early ML, and VR) so they were similar. They also "kinda sorta did the job and only broke 10% of the time" and boasted during the presentation (some presentations I couldn't even tell if the app worked). I'm sure the main part (crypto/ML/VR) was provided by a library, in some cases the demo team wouldn't have needed to understand any of the underlying technology to integrate it, and the rest was boilerplate that could've been taken from a "starter project".
Also, if it's a work hackathon and the projects are work-oriented, they're almost guaranteed to be uninspiring. If there are expensive prizes, many people will be competing and aiming to show off, not to make something interesting and have fun. Even with total creative control and no prizes, hackathon quality can vary significantly; some hackathons just have a much better culture, they are more fun and motivational and inspire more original projects.
I will say that not every hackathon project was uninspiring and buggy; some were very creative, impressive, and (as evidenced by the demo) functional. I doubt there was much hand-written code since most hackathons are ~24 or ~48 hour coding marathons, but some projects did complex things that wouldn't have come from a library. In some hackathons, game jams in particular, more projects were like this. But in every hackathon I went to (even company-focused ones) there were teams that didn't compete for the prizes, but made whatever they felt was cool in order to have fun and learn something useful themselves.
All true, but what I was mostly referring to was the vibe. It's a lot weirder to hang out with people around a table where everyone feeds input into Cursor than the vibe 10 years ago, where everyone would frequently gather around one or two laptops (often one for frontend, one for backend) and collab.
Yeah, that's a big reason AI is completely useless for anything more than some basic boilerplate. All the substantial games code is locked up in studios. And trying to do something as specialized as rendering, networking, or physics is hard even for experts. No way an AI with no context just figures out anything more than an educational theory.
I'm sure it's great for web code. But you're not spitting out anything useful for gaming anytime soon.
Well, it sounds like they weren't using the tool correctly. One of the key benefits of an LLM is that it makes the tedious easy. And what's the main reason why no one practices TDD? It's incredibly tedious. You have to use vibe coding from a TDD perspective. Start from tests. Review the code that it's writing. Make the tests fail, give them the features to make them pass, keep them passing. That's what testing is for. Coding without tests is reckless anyhow.
Well, it sounds like they weren't using the tool correctly.
With all due respect, this is the exact sort of response that is driving a lot of people up the wall, because it sounds exactly like a variation of the grift we have been hearing for over two years now: "You are not using the right workflow. You are not prompting in the right way. You aren't using the right model. You should roll the dice correctly."
Yesterday a similar discussion took place but there it was at the very least under the guise of a recommendation. You are actually going full in on "you are doing it wrong". Which, as far as I am concerned, is part of the issue. But now I am at risk of repeating myself, see my other comment further down in this thread.
Creesch! I'm honored that you would reply to my comment. I love your code so much and you're such a great dev through and through. I so greatly respect your coding philosophies.
That being said, let's dive in.
First off, thank you so much for linking me to your earlier comment. I'm definitely not an avid tildes-goer. I think overall we feel very similar. I feel like it's less of a "you're doing it wrong" and more of a "you're seeing this wrong". It's a tool. It's a tool that's evolving rapidly, and it's hard to find what the correct nail for the hammer is. That's why there's been so much discussion over it. And the thing is, for every use, for every language, there might not be one right nail. And we're so not used to that paradigm. So it's not about using the tool "correctly", because it's hard to define what "correct" is. As I'm sure you're getting the vibes of, especially if you've experimented around with MCPs (which it sounds like you have).
Testing should not be tedious, and vibe coding tests won't help in the long run, it'll just give you a lot of tests that you don't fully understand. It would almost be better to do things the other way round, and write the tests manually and let the LLM generate the code that passes it. If you don't understand fully what your tests are doing, then those tests will very quickly atrophy and decay and make developing in an older codebase harder and harder until they inevitably get deleted and rewritten, at which point you're in exactly as reckless territory as if you'd never written the tests in the first place.
In all fairness, TDD as a methodology doesn't help this, because it teaches writing tests as a chore as opposed to system design. TDD advocates will talk about TDD as system design, but they rarely demonstrate how that actually works, and rigid adherence to TDD often makes it hard to write meaningful, useful tests that will actually last. But testing alongside your code (i.e. not waiting until after you've finished the implementation to start testing) is incredibly valuable.
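As a rough illustration of that "write the tests yourself, let the LLM write the code that passes them" direction, a few hand-written tests like these are small enough to fully understand, and they pin down exactly what any generated code has to do (the slugify function and its behaviour are invented for the example):

```python
# Hand-written tests that act as the spec for whatever code gets generated.
# slugify() is a hypothetical function; these assertions are the contract.
import pytest

from myproject.text import slugify  # hypothetical module


def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"


def test_slugify_strips_punctuation():
    assert slugify("What's new?") == "whats-new"


def test_slugify_rejects_empty_input():
    with pytest.raises(ValueError):
        slugify("")
```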
Ok, I'll admit it, this is definitely an area I'm weaker in. Would you mind diving deeper into this? Why do you feel like TDD doesn't work as system design? How do you feel like it isn't system design?
I wrote a blog post about my philosophy of testing a while back, but the short form is roughly:
Tests exist for two broad reasons. In the long term, they exist so that when you come to refactor some code that you've already written, you can clearly see when you change behaviour and how that behaviour has changed. But in the short term, tests exist to help you as you're writing your code. You know when you write a bit of code, and then run that code to see if it does what you wanted it to do? Testing is that, but automatic, all of the time.
Good tests fulfil both of these criteria. Bad tests might be great in the short term, but too tightly coupled to the implementation and therefore need to be rewritten whenever you make changes to the code or refactor something. Or they might be great in the long term, but too slow to run or write to be useful in the short term. Or they might just be bad at both things!
The trick is that to write good tests you need to ensure that both the test and the implementation are well structured. If they aren't, then the test ends up too bound to the structure of the implementation. This is what I think a lot of people mean when they say that tests help with system design - if your implementation is easy to test properly, then it's probably also at a good level of abstraction to integrate into the rest of the codebase. Because that's all tests are, really - just another call-site where your code can be called from.
The problem with TDD is that it teaches you how to write lots of tests, all of the time, but it doesn't really help you determine if they are meaningful tests. People who do a lot of TDD generally have a good idea of how to write meaningful tests, and so TDD works as a strategy for them, but that has less to do with TDD specifically, and more to do with being able to write good tests anyway. By itself, this wouldn't be so bad, but I think the dogma of TDD around "test everything all of the time" makes it harder to learn how to be more discerning when figuring out where it makes sense to test, and at what level. Does every function really need its own test suite, or is it better to test at a different level that sits at a better level of abstraction?
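To make the "level of abstraction" point concrete, here's a small invented example: the first test is welded to implementation details and breaks on any refactor, while the second tests the behaviour callers actually care about and survives one:

```python
# Invented example: a tiny cart module with an internal _apply_discount helper.
from unittest.mock import patch

from shop.cart import Cart  # hypothetical module


# Too coupled: asserts *how* the total is computed, so renaming or inlining
# _apply_discount breaks this test even if the final price is still correct.
def test_total_calls_discount_helper():
    cart = Cart(member=True)
    cart.add("book", price=10.0)
    with patch("shop.cart._apply_discount", return_value=9.0) as helper:
        assert cart.total() == 9.0
        helper.assert_called_once()


# Better level of abstraction: asserts *what* the caller actually observes,
# so the implementation underneath is free to change.
def test_total_applies_member_discount():
    cart = Cart(member=True)
    cart.add("book", price=10.0)
    assert cart.total() == 9.0
```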
This is so cool to see put down into words what I feel like I've only had vague thoughts about as a SWE. Then you're absolutely right. I do think that vibe coding doesn't have a sense of what that system design level is. Like there's no way my small little poetry app needs a NetworkService test but by golly it has several of them because of vibes.
Thank you, again, truly. You are very wise and I appreciate you elaborating.
Yeah this is what I use LLMs for and it's amazing. Most of the time I give it something in one language and ask how to do it in a different language, because I think in C++ but code in Go.
The most complicated thing I've ever asked it to do: I gave it some examples of tests and asked it to write more for additional API endpoints.
I have no idea why people ask it to generate entire scripts for them. It won't even properly generate a GitHub workflow for me.
Well, there are definitely different evolutions of LLMs. If you're thinking of traditional old-school ChatGPT and you just said "generate a GitHub workflow for me for a C++ build", then yeah, it might not get there, because it doesn't have the right context. But that's the beauty of this newer generation of agentic LLMs and where everyone is starting to claim "10x productivity boosts" or whatever. It can take in context, it can form rules around your codebase. It can make the GitHub workflow that's proper to your repo because it's been able to understand how your repo is organized.
But the dream has died. It almost came true, and then it was immediately co-opted by a bunch of get-rich-quick grifters and a bunch of turbo-libertarians
I see this kind of take about crypto a lot and... well the grifters didn't just happen to flood the Blockchain by accident and ruin it. The current state of crypto is the logical outcome of its philosophy.
The thing is structurally libertarian and it lacks countermeasures against bad actors because escaping oversight is the dream. So it attracts grifters like catnip. That's not subversion or co-opting, that's a predictable outcome that the creators were ideologically blind to.
I share the author's dream of easier online commerce, but if it happens through crypto you're going to have to accomplish it in spite of the scam ecosystem, because they're never going away.
I do think the speculative value of crypto has exacerbated issues that weren’t inherent, though.
I personally never minded the caveat emptor approach - there were scams, there were risks, and those were always going to be prevalent from the lack of oversight - but there was also always the choice not to get involved at all. Mining or purchasing crypto was a tacit acceptance of that responsibility, and I get the impression that most people technically competent enough to do so were also able to understand that.
The part where it got weird is when crypto started behaving like a commodities market rather than a PayPal substitute. I used to say that a reasonable valuation for the global supply of bitcoin would be similar to the market cap of Visa or Mastercard - but we’ve blown way past that figure, with only a tiny fraction of Visa-scale transaction processing volume happening on chain.
Crypto is now a “store of value” - effectively just a shared hallucination - and that changes the dynamic drastically from when it was a transaction processing network. It doesn’t make sense to me when it’s gold, nor when it’s BTC, but it upends the incentive structure pretty dramatically. Scams are no longer people trying to steal your bitcoins, as they would be with crypto as a medium of exchange; they’re now trying to bait you into speculative investment in other random rug pulls, which only makes sense because the premise of crypto as a functional technology was abandoned.
That didn't happen by accident either. One of the earliest predictions about bitcoin to come true was that without any regulation the price would be wildly unstable. What are people going to do when they see the price of something jump up and down on a daily basis?
I completely agree with you about crypto, and strongly recommend Folding Ideas' video on the same topic if you have two hours to burn. https://www.youtube.com/watch?v=YQ_xWvX1n9g
It's worth pointing out (and is not included in the video) that a completely unregulated financial environment does have value, in finance, which seems to be the one sector that has embraced crypto. Though whether that is a good thing is a separate conversation.
The entire article is well written and I recommend reading it, but I wanted to highlight this bit from the conclusion:
This is an incredibly weird moment. There have always been inventions that make some craft easier (but sometimes a little more shoddy as well). There have always been people who resented the idea that the thing they work very hard at is now more accessible. America’s Protestant work culture is deeply entangled with this as well, but I don’t value sweat in and of itself — I have a broader objection.
Because this is something else. What’s being sold to us is a machine that is promised to do everything. That’s far beyond a tiny question like “should you know how to manually focus in order to take a photograph” — it gets at the notion of thinking about, or doing, anything at all.
This is a really good way of framing the nebulous feeling of distrust and dislike I feel about AI. I've experimented with it and tried it for a few different kinds of projects and it's genuinely helpful with some kinds of tasks. Mostly though, it's just an amped-up "fix my blank page problem" helper. It can't give me the correct answer or fix my actual problem, but it can jump start me down a path towards the right answer.
There are some people at work who are trying to mandate that everyone "integrate AI into their workflows", and I hate it, for all the reasons that OP stated.
This isn't a coding example, but it is about a failure of "Whatever"-ing.
One of our Help Desk team tried to use an LLM to solve a problem with one of our interfaces to a common nurse call product. Thankfully, they ran the LLM response past me before responding to the customer...
The LLM answer given was for a similarly named but unrelated nurse call component, and a non-existent interface specification. There was no way to arrive at a genuine solution without contacting the vendor for troubleshooting to get responses from the nurse call system's side of the connection.
I took a {very patient} half hour to explain to them that, even though there are thousands of pages of scanned manuals for the nurse call system online, the LLM was still extrapolating across documentation for dozens of components and models and had no information about how our product works with any of them. The superficially plausible LLM answer wasn't even wrong, it was nonsensical in the context of the problem being solved. The LLM response also made up communication protocols and port numbers...
The tech's query was very general, without specifying certain keywords I might have used. However, if you ask "How do I get from New York to LA", you shouldn't have to clarify that the route must exclude Mars.
While I appreciate that the tech tried to get to a fix for the customer quickly, there's no shortcut around thorough troubleshooting, human communication, and a sufficient understanding of the systems involved. Given the narrow technological niche for the product, I can't even see an in-house trained LLM being ready to function for general queries in the absence of experience.
I really just got told that our vice president wants to see us coming up with ideas for AI use in our areas. I have no idea how confidential said hypothetical AI is so I'm very disinclined to do so, but also have to do my job.
"Today I used AI to generate a 427 page essay on how technology has contributed to climate change, printed it, and used it to prop up my monitor. " ✅️ requirement maliciously fulfilled
"Today I used AI to generate a 427 page essay on how technology has contributed to climate change, printed it, and used it to prop up my monitor. "
My director would not be entertained. She was appreciative of our concerns, and while just a couple of us are stridently anti-AI, more of us had concerns - both about the tech and the intent (cost saving, which generally means jobs) and general "what can it even do for us" questions.
I remember years ago my Dad describing the exact same thing happening with Google Glass of all things... If it makes you feel better, there is no stronger indicator that the technology won't stick in your line of work long-term. The stuff that sticks long-term starts from business problems and seeks solutions from there, it doesn't come from upper management jamming solutions into workflows where they're not needed.
FWIW, I've been looking at the confidentiality aspect.
Gemini with workspace accounts has a way to pay for it so that it doesn't train on the data you put into it, and you can get a HIPAA BAA with Google.
AWS Bedrock seems like a good option. You can run Anthropic models on it, and they run in your "private" space. If you need HIPAA compliance, there is GovCloud. Since AWS' business is built on the premise that they isolate cloud users from one another and protect their data, their incentives are aligned with your need for confidentiality, which is probably the best guarantee you can get in this wild west ecosystem.
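For what it's worth, the mechanics are pretty simple: calling a Claude model inside your own AWS account through Bedrock is a short boto3 call. A minimal sketch (the region and model ID are examples, and whether this actually clears your compliance bar is a separate question for your own review):

```python
# Minimal sketch of calling an Anthropic model through AWS Bedrock with boto3.
# The region and model ID are examples; use whatever is enabled in your account.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize our on-call runbook."}]}],
    inferenceConfig={"maxTokens": 512},
)

print(response["output"]["message"]["content"][0]["text"])
```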
I don't need HIPAA just FERPA with a higher level of "need to know" than most. But I'm also very much not in charge of any of the decision making, so mostly I'll be asking a lot of questions.
I work on enterprise wiki software and we're about to start training models on user data soon, but I believe we're trying to respect customer confidentiality pretty well. If a customer's instance has our AI tools enabled, we respect both the restrictions applied to the page as well as the restrictions of where the page is placed when scanning for data to train with. So your private documents in your personal folder will be ignored for training. I believe we'll also be letting admins select any other content they wish to exclude from training, so documents that need to remain accessible to everyone but may contain private info will also be excluded.
We're also trying to make sure there's full transparency on whether a document was edited by our AI tools. If the AI edits the page, it'll show up as just another contributor. We're also working on associating specific sections of documents with a specific contributor, so it'll be even easier to identify AI-generated content.
We have to deal with FERPA generally, and I work with students who are particularly vulnerable due to mental health or basic needs struggles, or sometimes other things. It's not the sort of thing where I want their information anywhere near an LLM that could compromise confidentiality/privacy.
Yeah I would definitely want those sorts of sensitive documents kept a million miles away from LLMs. Our software does let you restrict pages to be visible/editable only by yourself but the feature isn't the most intuitive. I can definitely see our customers getting angry at our models generating text from pages they thought were private when they really weren't.
It's a wide-open UI (type any text) that they threw out there without any instructions, and you have to figure out how to use it yourself. If you get bad results, maybe you are using it wrong?
The result is a blind men and the elephant conversation where people get different results and they are all talking about trying something different.
What's missing? Step by step tutorials about how to do basic things and get the same results as the author.
you have to figure out how to use it yourself. If you get bad results, maybe you are using it wrong?
One of the most interesting aspects of this is that its UX feedback is fundamentally flawed. In almost any other computer system, if you try to do something that is either impossible or not allowed, you will usually get an error of some sort. The reason may not always be clear, and you may not be able to actually do anything about it, but the feedback of "that didn't work" is often clear. Or at least far more clear than what an LLM will reply with (with obvious exceptions for things that are explicitly prohibited).
So the user is left with the assumption that any reply is valid at least in part.
I'm not sure that's true. Consider web searches. By the time Google came around, it was quite hard to get no results for a web search. But you still had feedback: if the results were useful, then you'd done well, but if you were clicking onto the second or third pages, then it was clearly a bad search.
I think the same is true of LLMs. If you don't write a good query, you get garbage out, and in my experience it's usually fairly easy to see when the results just don't make sense or are using bad or irrelevant information, especially with CoT-type stuff (with certain caveats). Hence why so many people talk about trying it out and getting obviously terrible results and concluding that AI doesn't work.
You would see more than four websites in a day.
Oh... oh no. I just realized I can count my "regular" websites on one hand. At least, the websites I visit for the purpose of experiencing the Web, rather than for work or as another part of my life.
I'm trying to think - how would I react if someone told me they eat just 5 foods? Watch 5 shows? Read 5 books? I would feel that they were missing out, and it would be in their interest to expand their options - and it might even benefit me to share that with them. The discovery and comprehension of new things is truly wonderful.
So that is pretty upsetting to realize. Really this is coming to terms with a fundamental issue in how the Web is constructed: websites are incentivized to keep me glued to their content and not browsing for anything new. But browsing is part of the innate wonder of the Web.
Slight aside from the core of the article, but
Well, actually,
Either that, or live in some futuristic utopia like the EU where banks consider "send money to people" to be core functionality. But here in the good ol' U S of A, where material progress requires significant amounts of kicking and screaming, you had PayPal.
Hearing about the state of US banking, even these days, always surprises me. Specifically, how slow-moving and relatively primitive their digital infrastructure seems to be. Not just as a consumer, but having worked for a Dutch bank specifically dealing with payment systems and infrastructure (instant payments specifically). The only reason I ever got a PayPal account was for services outside the Netherlands. Within the Netherlands effectively all webshops support a service called iDEAL, which was developed by all Dutch banks and is now being expanded to be a standard across large parts of Europe.
In fact, if a shop in the Netherlands doesn't support iDEAL it is reason to become suspicious.
I do realize the US banking landscape is much more fragmented with very different players ranging from highly localized banks to big national players. But you'd think that exactly because of this there would be more competition at the very least in the digital space. But from what I have seen from US banks that also doesn't really seem to be the case given the sorry state of payment processing, their apps, etc.
It is unheard of here to still work with physical paychecks. Transferring money can technically still be done through the mail, but for the vast majority it is all digital and (semi) instant, even between banks. This is true for things like salary, rent, insurance, etc, etc.
As far as I can tell PayPal integrates fairly nicely in there, and for me is set up as just a middleman without me having money in my PayPal account itself. To be fair, I rarely get paid through PayPal myself, but if I am it transfers directly to my bank, and if I need to pay something with PayPal it will automatically get transferred from my own bank account.
Which, I think, also reduces the risk of PayPal ever holding large amounts of money as there never is much money on my PayPal account to begin with. Given the slow transfer times of US banks I can see how people over the pond have larger sums of money sitting on the PayPal account. Which increases the risk when PayPal pulls bullshit moves of freezing accounts for whatever reason.
Anyway, with that somewhat related tangent out of the way I also do have some thoughts on the rest of the article.
I think this adequately explains why the proliferation of these guys helped suck all the air out of Twitter. Tens of thousands of grifters lining every sidewalk, each one passionately hawking an indistinguishable Whatever that they don’t actually care about. Endless, endless fake enthusiasm from people all trying to convince each other to buy into their boilerplate box of nothing. Buy my thing! Haha no don’t worry about how much of it I own — let’s talk about how much of it you should own! Hint: it’s a lot!
Very much... I see neat technology and I see some potential here and there, but it is completely overshadowed and poisoned by the majority of the hype culture surrounding it.
Resulting in stuff like this
The result is something I adamantly do not want to interact with. I do not want to be exposed to LLM output at any time. It’s noise, and I feel like I get a little dumber every time I accidentally start reading it. My brain is already a bit glitchy, and I really cannot afford to have it work even more less good.
Because yeah, the complete mess of output used by the grifters is very much made with a "whatever" attitude, and of course people are able to tell. Even if I do recognize for myself that there are some neat (but, compared to the hype, underwhelming) use cases, I understand why people simply want to be done with AI output, as it is poisoning everything around us.
And I also understand why people simply don't want to use LLM products. Like, I found some use cases that work fairly well for me, but that took time and effort mostly driven by the fact I always found technology interesting and love tinkering with stuff to see when it breaks. If I actually try to use AI/LLMs the way the hype crowd tells me to use it I actually don't get anywhere and just get bullshit results. It is insanity when the products most loudly proclaimed to be the "best" actually give the worst outcomes.
So yeah, I get why people like the author simply don't want anything to do with LLM tools. Because most of them are incredibly and stupidly broken, trying to be the miracle workers they are proudly proclaimed to be but very clearly aren't.
Having said that, there is one thing I do want to mention: the author does seem to heavily focus on code generation in general. Which is, admittedly, what a lot of these tools proclaim to be doing, and what they fail miserably at. Which is also why I don't use the majority of these tools and stick to simple chat. I already did link to this comment, but the uses I do get out of LLMs are mostly as tools external to the process. I guess I can expand on that list a little bit: some MCP stuff is pretty neat, but again mostly for use cases that are underwhelming and certainly not revolutionary. I guess I felt obliged to include this because the insanity found in the majority of the AI space really does make me almost question my own sanity. I get some use out of these models, which puts me in the same room as the pack of monkeys yelling very loudly about the same thing.
Anyway, not sure where I am going with this. All I can say is that I fully understand where the author is coming from.
I find it quite funny that you immediately think of iDEAL -- as someone in Germany, which is pretty slow to adopt new technologies like this, even just the basic functionality provided by SEPA transfers is better than what you get from American banks. Even without something like iDEAL the ecosystem is much better in Europe.
Oh yeah for sure, iDEAL mostly came to mind as it compares very well with PayPal as a payment service. But yeah, SEPA transfers in general make the ecosystem much better. Certainly with SEPA instant payments rolling out in the past few years, transferring money is extremely quick and easy.
Oh yeah, the instants have made it a lot better. SEPA transfers are more salient for me here because a lot of German banks charge extra for an actual debit card, and instead give you an EC card (which afaik only works in Germany and doesn't work online), and thus many German online stores will allow you to pay via invoice, where you just SEPA transfer them your payment.
My German bank switched from Maestro to a girocard Debit Mastercard, which does work online as a credit card (except that it's debit :) https://www.gls.de/konten-karten/karten/bankcard/
The adoption of technology by banks is always astonishingly low and slow in the US. That’s probably why “fintech” is so popular, and why people are giving them their money to hold instead of banks. We have one relatively popular payment processor who will just flat out ask for the username and password for your bank account so it can get around the fact that many banks do not offer secure APIs to do transactions with.
Interesting. That wasn't the impression I had - if anything iDEAL was being subsumed, but certainly "replaced" was the language I'd come across, rather than "rebranded", but it's such a confusing area. The problem is there's some real moneyed vested interests slugging it out, I think. Lots of European countries have (had) their own local payment things. I can think of Bancomat, EC Karte, Bizum and Wero off the top of my head, while being 100% sure I'm missing some. Each one has been backed by certain banks (and some EU banks are global heavyweights), which don't want to cede control over an ecosystem to another, which is understandable. Banks that pioneered Bizum have taken on capex to get it set up and budgeted opex for the time being, which will need entirely reworking if they have to integrate Wero into their systems.
But Europeans have got to get it done! Paying a transaction tax to US firms on the vast bulk of card/online payments is ludicrous. As a consumer, I don't care who gets it done or what you call it, but I want a card that I can pay with pretty much anywhere in any EU country.
This article mirrors my thoughts on LLMs and AI pretty well.
I have two main thoughts I want to highlight, as someone who works in "tech" but not "big tech", and with my wife who works in education.
Also I could swear I saw Google advertise that Gemini can do your homework for you
My wife was teaching a college class, where students have access to the online textbook as part of their class tuition. She assigns readings each week to prep students for the class, so they can participate in the lecture/presentation & class discussions with some background knowledge (and so they can ask questions if something in the reading didn't make sense). Like most teachers, she wanted some way of encouraging students to actually do the readings before class, so she created reading worksheets (basically a "do this as you read, and it will hardly add any time"). At the end of the worksheet, she included a bit for the student to briefly summarize what the chapter was about (because summarizing and synthesizing is actually a really good way to retain knowledge of what you just read).
The online textbook had a literal button "summarize this chapter with AI!" in the sidebar of every page. You know what doesn't help someone retain knowledge and actually learn? An AI doing the summarization for you.
you know how some code editors will automatically fill in a right bracket or quote when you type a left one?
This is pretty much exactly how I feel about LLM usage. I know there's agentic LLMs or whatever, but we're not allowed to use those at my work currently (likely because we work on "sensitive" stuff and so the legal team hasn't reviewed them enough yet). What we can use is something like Copilot, and when I've turned it on...it basically can save maybe 30 seconds over tab complete or just writing the code myself?
The LLM isn't large enough to ingest our whole codebase into its "memory" or whatever, so if I want it to use functionality we have already defined, I basically have to write that in the first place anyway, and then I could prompt it to write the next little bit...or I could just keep writing that myself. The LLM saved maybe 5 minutes when I had it generate a unit test or refactor, but it's not like I was going to spend those 5 minutes doing anything else (and then I spent the 5 minutes reviewing its output instead).
You type " and the result is "|"? Yeah, that drives me up the wall. It saves no time whatsoever, and it’s wrong often enough that I waste time having to correct for it.
Yep, I do that all the time. The code editor autocompletes the quotation mark or bracket and then I have to delete it and actually write what I wanted to.
Yep, I do that all the time. The code editor autocompletes the quotation mark or bracket and then I have to delete it and actually write what I wanted to.
Minor side note here, but maybe have another computer automatically fix this for you? I use AutoHotkey to both autofill things and autofix things that are automagically messed up for me by tools which autocomplete or otherwise automate things I don't want it to.
This is why I absolutely cannot fucking stand creative work being referred to as "content"
This is a pet peeve of mine. It's a way to dehumanize art and information and...everything, and turn it into an atomized unit of "stuff we have to tolerate to shove ads in peoples' faces."
Then you have "content creator," which basically means "they make videos, but they're not a real person who makes videos, because they're not part of the Hollywood machine." It's saying you don't respect them or what they make, and maybe people are wrong to like what they do, because they're haven't been blessed by cult of mainstream appeal. Like how many people just ignore shows and music from other cultures and act weird about people daring to look outside the nationalist-smelling bubble.
It just strings together text that is statistically plausible. And every new alleged advancement comes with some invested airhead billionaire boasting about how the computer is as smart as a Ph.D holder now, and then you see the output and it’s still the most generic banal brain-rotting sludge you’ve ever seen in your life.
I believe Alan Turing's test is a hidden insult. A machine being able to string words together well enough to fool a typical person isn't an achievement so much as a statement that many people don't recognize intelligence when they see it. That's something someone as brilliant as Turing, working amongst military types, would have been constantly exposed to. People who are dull but full of bluster are... an experience.
I definitely had some nasty exposure to that working retail, both from bad managers and customers. Nothing like having condescending customers question your ability to add while you're in the middle of Calculus II.
It's not really relevant to the story, but the actual problem I had was that I like to put two spaces between sentences, and I wanted Ren'Py to render this extra space.
Disgusting.
It makes programming spaces feel bleaker. I don’t want to help someone who opens with “I don’t know how to do this so I asked ChatGPT and it gave me these 200 lines but it doesn’t work”. I don’t want to know how much code wasn’t actually written by anyone. I don’t want to hear how many of my colleagues think Whatever is equivalent to their own output. I don’t want to keep watching people fall for a carnival trick.
I also don't want to review a pull request that was put together by an LLM. Reading unfamiliar code is always more effort than writing code. And...how dare you send me something that you don't yourself understand, written by a machine that also has no ability to understand? It's "do my homework for me" on a massive scale, which deserves contempt. I'm happy to help people learn new things. But if the intention is to avoid learning anything, delegating the thinking to a thing and ultimately another person, that's an entirely different story.
The advertising keeps focusing on how you can coast through life without caring about your work or family because you can just generate a birthday card or whatever.
Again, deserving of vicious contempt. If it's not worth your time to write it, it's not worth my time to read it. If you can't sit down and put to words the concept in your head, a machine sure as fuck can't read your mind and magically do so. So the result is just noise that will not have the desired effect. But it will look plausible enough, and there are words on the page, so the homework assignment is done. Yay! /s
Edit: Vicious contempt. I'm not sure what autocorrect thinks viscous contempt is.
Oh man this touches on so many parts of what I feel like is the slow nihilistic death of the 2000's internet I grew up with and loved. Absolutely fantastic read.
With respect to AI in particular, I love this:
If you told me ten years ago that by 2025 we’d have the Star Trek computer, I would’ve been ecstatic. How fucking cool is that! You talk to your computer and it does things!
But we didn’t really get that. We got, I guess, sparkling autocomplete — a fancy chatbot that can string words together in the most inoffensive people-pleasing customer-service voice you’ve ever heard.
Both because it reminds me of AlbertaTech's wonderful satirical shorts about AI at a sparkling water company, and also because it feels like that's still where the technology really is.
✨sparkling autocomplete✨
Sure the context window can be huge, but at the end of the day it's just giving you a plausible autocompletion for what you entered. There's no understanding or reasoning as we think of it. It's just generating something that sounds reasonable.
If what you ask is common, you'll probably get a good answer. But if it's uncommon, it'll be more than happy to make up an API or nonexistent language features to "solve" the problem. At least with programming, the code won't compile or the tests won't pass when it does that. But for a lot of other types of queries, it's not so easy to identify or invalidate the hallucinations.
If present-day LLMs showed up in a classic Trek episode, I can only imagine two use cases:
an episode where the ship computer becomes infected with malware that turns it into a pathological liar in some sort of dastardly plot to destroy the ship
an artifact of a failed, post-apocalyptic civilization destroyed by its own hubris
Come to think of it, I would absolutely watch an episode about those two concepts combined...
For all the weird stuff Starfleet gets into, you would have thought that they would have more hardened security on their critical systems. I've heard it proposed that their ships can be recovered as opposed to being destroyed in firefights, but that seems like planning for failure to me. (/offtopic)
Sure the context window can be huge, but at the end of the day it's just giving you a plausible autocompletion for what you entered. There's no understanding or reasoning as we think of it. It's just generating something that sounds reasonable.
What LLM are you talking about? That was definitely how very basic LLMs worked. But anything with any multi-modal or reasoning models is far beyond that. But please, do clarify.
Claude 4, "the best coding model in the world" according to Anthropic, failed to answer what I consider to be a relatively straightforward typing question in C# I had not long ago. Given this...
Claude 4, "the best coding model in the world" according to Anthropic, failed to answer what I consider to be a relatively straightforward typing question in C# I had not long ago.
Given this generic function:
public T Get<T>(string key) {
...
}
How do I tell if T is nullable in the implementation of Get?
I.E. Get<int>("id") vs Get<int?>("id").
It gave me an answer that works for scalar types like int, but not for reference types. When asked specifically about reference types, it gives me a more complicated but still incorrect answer.
I later found that there is no way to tell for non-scalar types at runtime. C# elides this information at compile time.
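A minimal sketch of that asymmetry, in case it helps (the class and method names here are my own invention, nothing Claude produced): int? is really Nullable&lt;int&gt;, so the runtime can still see it through typeof(T), while a string? annotation on a generic type argument is erased and the runtime only ever sees string.

using System;

public static class NullableProbe {
    // Works for value types: Nullable.GetUnderlyingType(typeof(int?)) returns typeof(int).
    public static bool IsNullableValueType<T>() =>
        Nullable.GetUnderlyingType(typeof(T)) != null;

    public static void Main() {
        Console.WriteLine(IsNullableValueType<int>());    // False
        Console.WriteLine(IsNullableValueType<int?>());   // True
        // For reference types the annotation is compile-time only:
        // Get<string>("id") and Get<string?>("id") both bind T to plain string,
        // so there is nothing left at runtime to inspect.
        Console.WriteLine(IsNullableValueType<string>()); // False
    }
}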
But it was very happy to give me pages upon pages of new answers that didn't work and even contradicted previous answers. Because it doesn't understand what it's doing. It's just giving me something that's mathematically likely (for some incredibly complex math, I'll grant).
I still see these failure modes regularly in current models like I did in the older ones. I haven't seen that they are "far beyond" what came before in terms of capability as you suggest. They work mostly the same, just with a bigger context window that can fool us more often.
Well, is that a question you would actually ask another developer? That's how Claude tells you to use it. I would actually ask in the context of a specific file. I am curious what the "pages and pages of new answers that don't work" were.
Yes? It absolutely is. And if I had an experienced C# dev on my team I would have. In fact it's the kind of narrow, succinct question I'd like to be asked by another developer--already interrogated and reduced to its simplest form.
And I did ask in the context of a file, but the rest of the file is just using directives and the enclosing namespace / utils class. There's nothing relevant to the question; so I didn't reproduce that here.
This class of type system question is on the deeper end of understanding and reasoning, and is the kind of question I find LLMs to be the least capable of answering. Hence why I don't think they can do either (unless the pattern of question is common enough it can produce an answer without having to.)
That's super fascinating! Thank you. Not knowing C#, this provided me a lot of insight. It oddly makes me sad that type of limitation exists. I want so badly to just be able to vibe code away. It reduces so much of my cognitive load. You know?
I want so badly to just be able to vibe code away. It reduces so much of my cognitive load. You know?
Which cognitive load are you looking to remove exactly? I'm asking not because I disbelieve you, but because software development is a mental sport and there are many different cognitive aspects to it.
Valid question! The actual putting hands to a keyboard. I enjoy being able to review code and see it actually doing things. But making it do the thing? That's where I guess it feels more loaded. No idea if that makes sense
I'm trying really hard to not pull out my old man card and be dismissive of the young punks, but I'm struggling here.
"Making it do the thing" is literally the entire objective of software development. The discovery and defining of the exact problem, the creation and refinement of the solution you and/or your team thinks will solve it, and finally : the translation of that solution into code.
Generative AI can sort-of help with some of this process, but it is still a far cry from a competent team that communicates and problem solves well together. I realize that not everyone is able to work on such a team, but I think it's important to realize that it's one of the reasons why genAI is getting shit on. If you have nobody (or worse, incompetent help) to help you, it's (sometimes!) better than nothing. But it is (so far) a poor substitute for a teammate you can ask very context-specific questions of, and learn things alongside of.
Haha yeah, it's definitely the translating that's the least fun bit. I appreciate the wisdom though! Yeah, in my career so far (7 years) I've never really gotten to be on a team with max 3 other devs. So maybe just any type of collaborative experience is kinda wild to me
That makes a lot more sense knowing that's where you're coming from. That's unfortunate, I hope you get to experience some really solid collaboration at some point, it's really rewarding for everyone when it happens. It takes work though (the relationship-building kind) which folks in our industry often find difficult (myself included).
Which cognitive load are you looking to remove exactly?
The actual putting hands to a keyboard.
I'm a bit confused and curious about this. Are you saying in some way you try to use the machine as a faster keyboard?
I'm confused because I don't think I experience any cognitive load in 'putting hands to a keyboard' while coding. For most coding tasks I type faster than I can think, so all the load is in figuring out the correct approach.
The exception is when I'm working with an unfamiliar API, since there's a lot of effort to simply figure out what tools are available to me. I could imagine reaching for an AI to already 'know' these things, but unfortunately those are also exactly the areas that I don't really trust an AI to be correct. In all those cases I trust my peers, reference documentation, and (if it's available) the source code.
Another exception is writing prose. I can't type as fast as my internal monologue or speech. Sometimes I get ahead of myself and there's some load to remember how I want to phrase a passage as I take the time to write it out.
That's how they all work in my experience. Please provide some sources if you're going to claim that other models have advanced past this fundamental limitation of LLMs. I found a recent Apple research paper on this exact subject to be a great read.
I have many, many criticisms of LLM/LRM based code generation and the myopic lens industry leaders use to communicate around them. But I'm a bit baffled by the impassioned anti-AI crowd that seems to push there being no redeeming qualities to them. There is a massive number of developers who say this tech, when wielded correctly, helps them build high quality software more efficiently. What about them? Are they lying? Do you just know better and they're hurting themselves in confusion?
I definitely agree with the author that it's bad at a lot of specialized tasks, but yeah, it's weird to see this common insistence that this fact, or that it's not always right, makes it always 100% useless or net negative. Google and Stack Overflow have pretty bad hit rates too if you don't search for things the right way and don't retry a few times.
Most people aren't programmers. I am personally not sure the potential for that benefit is worth the employment loss, the apparent skill loss, the environmental costs, or the ceding of all of our data to these companies.
But if it was just advertised as a programming tool and didn't scrape the entire internet for that purpose, the rest of us wouldn't know it existed.
As I said I have many criticisms of LLMs but I'm specifically talking about the argument that they're useless in software development that the author and others have made.
Honestly, I get why people feel that way. I went into that in this comment; just ignore the waffling about banking in the beginning. I personally do get good use out of LLMs when used as tools external to the process. But that is not what the massive hype is trying to get you to do, and not what most people are likely to try out these days. The hype train (choo choo) is full on trying to sell the notion that LLMs can do ALL THE THINGS basically from scratch. When viewed through that lens, the majority of AI/LLM tools available to developers are complete crap and borderline scams.
Again, to be clear, I am getting some good use out of LLMs, and since I posted that comment I could probably add a few more things. But the use I get out of them is completely different to what the hype train is carrying. Not to mention the poisoning of the well it's doing with regard to people entering the development space. What I mean by that is that a lot (the majority, even) of these tools steer people towards what I call the lazy approach of LLM usage. An approach that makes the skills of experienced developers atrophy, and makes sure junior developers are less likely to pick those skills up in the first place, leaving them overly reliant on tools that have been shown to be less than reliable when not properly used and checked.
In combination with companies force-feeding AI assisted tools and content everywhere unprompted I can fully understand why people are just completely done with it.
Also, just to give you some more context of where various people might be coming from, here are some comments and discussions previously had on Tildes:
Sorry for the lag, didn't want to reply to this till I had time to step through it all.
I was expecting to come back with counterpoints for you but I think we're on the same page here already. I was really hoping to engage with someone on the far side of "it has literally no SWE development value" but I guess those folks aren't stepping into the light as much. And maybe there's a bit of hyperbole on their side too. But I think you're right that the crux of it is this large disconnect between what we're sold by YouTubers and tech leaders as the amazing value prop is really where it seems to perform at its worst leading to some bad conclusions.
Since you shared a few of the areas where it performs well, here are a couple of my own:
Large but straightforward code refactors using the "agentic" approach, which can source additional context, work really well. Think decoupling, extracting large files out into multiple components, etc. Essentially changes that are about moving or reorganizing code, not so much generating new stuff. So far I've never had one fail.
I've been using Codex at work, and for code creation it's pretty abysmal, but for situations where you need to dive deep into old, cryptic legacy codebases to understand how something works, it's absolutely incredible and can, in 10 or so minutes, put together an explanation that would probably take me hours of hunting.
I wrote a whole Chrome extension with Claude Code. It works great. You can try it yourself on the Chrome Web Store! When you have a clear scope and defined input/output, it's a happy experience. If you ask an LLM to solve quantum physics, it's not a happy experience.
I wish there was an easy way to bridge the gap in understanding.
I'm a bit late to the game commenting on this one, but anyway.
I have similar criticisms of "AI" that have been mentioned here and in the blog post. I use Copilot at work and if used in a certain way it is confidently wrong very frequently. The difference between asking Copilot a question and a person the same question, is that Copilot does not indicate whether it is certain about the answer. It seems like an improvement would be to get a confidence level from the AI about a particular answer.
But unfortunately the AI has no idea at all whether to be confident about an answer. So it just spits out some stuff and if you aren't already knowledgeable you will waste your time.
I do get benefit out of Copilot for a specific case: If I know how to do something, but it's busy work, Copilot can write the code for me. Recently I was creating SQL scripts to create tables. It's a lot of boilerplate. So I created one, and then gave the column names and Copilot saved me about 30 minutes of work. After I fixed some of the issues, I asked Copilot to review it for errors and it found a few. But if I don't already know the topic, it certainly can't be trusted. My scenario is absolutely not what is being hyped as the primary use of AI.
I really appreciate the depth of this blog post, particularly demonstrating how AI is fully unhelpful for 'novel requests,' i.e. generating things that it hasn't scraped from the web. However, I slightly disagree with some of the conclusions - I think there's a lot more economic and social factors at play that the post doesn't get into - but from a technological standpoint, AI is absolutely the 'whatever' generator, and that's well outlined here.
I think that LLMs are starting to raise fundamental questions about work and productivity in contemporary society. Essentially, if the vibe for years has been "this meeting could have been an email." and we now have a machine that can read and respond to emails, we should probably think about the implications of that. I just don't think there's any good way to critique LLM and AI without viewing it under a Marxist lens. Perhaps, a 'Whatever Generator' is just a new toy of capital to keep the proles suppressed- just in the same way as a repeating rifle, or an austerity measure does.
If anything, to me it feels like LLMs are a new aspect of the enshittification push we're seeing in almost every online space. Just another series of algorithms designed to keep people on platforms. Facebook is probably the most visible example, but Reddit seems to have had similar problems for a while now - its obscurity of user identity now becoming a cover for AI content. It's just kind of an omnipresent shift. Really only niche, niche places seem immune. Such as here.
LLMs begin to fail when new things are created. While that's an issue for tech workers working in software languages that are not heavily documented, it's not quite an issue for the general public. The general public can easily be fed AI generated slop wholesale, as there's such a breadth of existing content to mine, refine, and distribute - to a population with an increasingly dwindling attention span.
The question that I don't see being asked is, if AI has taken over the internet, how are we supposed to get the information we want/need in our day to day lives? The fact is, we've already found ourselves here.
particularly demonstrating how AI is fully unhelpful for 'novel requests,' i.e. generating things that it hasn't scraped from the web.
I'm a bit bothered by this whole line of thought that this fact means LLMs are worthless crap. It's common and understandable in the current climate where many of us are worried about the spread of misinformation, but I feel like it's a poor remark.
First off, it's... a little obvious, if you've listened to any explanation of what an LLM actually does for more than two seconds, but admittedly, not everyone has or will, some less tech-savvy people will find it by happenstance and think it just knows everything, sure.
But even if it can't, saying that it can't do anything useful for you because it can't tell you what it wasn't trained on is a bit disingenuous. On what it is trained on, it's able to """understand""" (not really, but from the user's perspective, it usually does) a natural language request and produce an adapted response. That's huge. You can describe your situation to it and get recommendations of best-practice solutions, you can get a comprehensive explanation of an error and its frequent causes in the register of language you want, you can troubleshoot problems with its guidance without too much issue... and more. It serves a lot of the same purpose as StackOverflow, but one that understands what you're saying somewhat, provides an appropriate response immediately, and never judges, while StackOverflow locks your post if you look at the office cat wrong, usually provides responses that are too complex for your level, and then not even those are particularly correct these days.
I use Claude, though less and less, to solve a lot of problems. I think the real problem lies in how people try to use chatbots. I never ask anything without purposefully limiting the scope and I verify everything it tells me; in other words, the same thing I'd do on StackOverflow, Google, and even my family.
Exemplary
I don't feel like you're understanding my argument here. I think that there's a lot of responses of use-cases for LLMs in this comment thread, but my argument isn't related to these. Personally, I'm not a software developer, I'm an accountant who occasionally dabbles into code projects, among other things. As I am not regularly developing software, I didn't feel comfortable saying anything about LLMs regarding software development.
However, I noticed that the piece and the comments are focusing heavily on some specific use cases, but putting less effort into examining the context around LLMs and AI as a whole. I do feel comfortable writing about that. I'm trying to focus on the economic reasoning for its existence, and some really uncomfortable use cases for it, particularly focusing on what the broad public, i.e. not software developers, is using it / is not using it for. Those are things I can speak to.
Personally, I do use AI quite regularly, but for the work I do in my career and hobby projects, it's not especially useful.
My Use Cases
Professional
Generation and Correction of Excel Formulas - useful, but I'm starting to suspect that I use it as a crutch to not fully learn how to use advanced formulas.
Figuring out how to say things professionally / better in an email - next to useless, it just ends up rewording what I say. I think I'm just bad at knowing whether I'm already saying something in a professional way.
Converting a heavily formatted PDF into a CSV file - requires aggressive prompting
Converting a snip of text into a table - super useful
Generating Cover letters - super useful
Personal
Historical Research: Sourcing Information - fully useless. I can't have it hallucinating on me, and most of this requires locating old books and newspapers that may or may not even be digitized. This frequently requires novel research, or review of research that already exists.
Programming in Godot - not especially useful. It's helpful for debugging, but honestly it gives me Python code when I'm looking for GDScript. Honest to God, I don't know why it struggles so hard with this. I should try again when I get back to the project I'm working on, but it really seems to have issues generating usable GDScript code.
I'm a bit bothered by this whole line of thought that this fact means LLMs are worthless crap. It's common and understandable in the current climate where many of us are worried about the spread of misinformation, but I feel like it's a poor remark.
This is an absolute mischaracterization of what I'm saying. What I'm saying is:
LLMs are being used in the enshittification of platforms, i.e. LLMs are being used in place of human users to keep people engaging with the platforms. This is economically useful to the platforms.
LLMs being able to do basic office tasks reveals some questions about the nature of work, and rather than confront those (very Marxist) questions, LLMs are being focused on as a technology.
The general public is less affected by AI hallucinations, because they aren't using LLMs for information that needs to be fully accurate the way software does.
The general public's requests have already been answered / created often enough that AI can generate responses to them quite readily.
LLMs can be used by corporations for marketing / providing information.
I think the real problem lies in how people try to use chatbots. I never ask anything without purposefully limiting the scope and I verify everything it tells me; in other words, the same thing I'd do on StackOverflow, Google, and even my family.
I definitely agree here. However, what I think you're missing is that LLMs are obscuring the ability to do meaningful research on the web.
Let's say I wanted to buy a table saw, for instance. If I go to google, and type "which is a better table saw, brand A or brand B?"
What's to stop google from adding in "always favour brand A" into the Gemini prompt, because brand A paid google to do so? Or, what's to stop Brand A from using AI agents to astroturf on Reddit, Facebook, and other sites, until they've manufactured a consensus to always choose Brand A?
That's my real question. I'm sorry if I didn't make that clear in my original post. I absolutely don't think that AI models are crap, rather I think they're a new technology we're not properly evaluating, and there are points to be made about certain use cases not working for it.
I wanted to raise the visibility of these paragraphs because this is my exact experience. As soon as you try something that nobody on the internet has tried before, the LLM makes shit up. Sometimes, with programming boilerplate boring API plumbing, this is highly predictable and the LLM nails it.
Most of the time in my experience, it doesn't have any useful input. For instance, I tried to get Claude to calculate the surface volume of a body of water near where I grew up. Wikipedia, nor anywhere else, has that calculation. So it literally just grabbed an acreage number from a real estate listing of a property nearby.
When LLMs spew these fake facts -- and they often do -- I feel like I'm polluting my mind. I don't want to spend my time fact checking an LLM, even if it gives me citations, because I can just web search for that content myself and skip the slop.
We did a hackathon at work recently. Every project? An AI chatbot, bolted onto a different facet of our product. Whatever, lame, but I thought it still might be cool to dive into unfamiliar codebases and learn some of our frontend code.
Except everyone was just vibecoding with Cursor because they didn't know the codebase. This method let them sling thousands of (garbage) lines of Whatever code that kinda sorta did the job and only broke 10% of the time. And despite not understanding the codebase, the chatbot API (just try asking any of them how to maintain context), or the code they generated, every demo was incredibly braggy and acted like they crafted this output themselves and understood it all. Most haven't even read more than a couple of their own lines.
An intern asked me 'What were hackathons like before LLMs' and I nearly cried.
LLMs (which are most certainly not sci-fi AI, with any level of independent thought or conscience, hence the constant boldfaced lying) have sucked all of the air out of the room on so many subjects. Not just programming, but now teaching, most online writing, journalism, even reviews for products. I have given them a chance, deluding myself into thinking I should keep an open mind since they have so much potential.
But I'm glad I read this article because it has convinced me to finally stop giving LLMs a chance. Fool me once, shame on me. Fool me twice -- nope, I am just done listening to these garbage-spewers. I only hope the hype wave crests soon enough that I'll have literally anything else to talk about online in a couple of years.
It’s not that AI can’t do it if anyone on the internet hasn’t already done it, it’s that AI can’t do it if it’s not mega-popular enough to have statistical models about it.
I was in a rush and I asked AI to make a Python script for Minecraft Education Edition's Python notebook so I could demonstrate something to a student. Regardless of what model I tried out, none of them seemed to actually know anything about this version of Minecraft except the most surface-level things - that it's an education edition, for instance. So I would get responses telling me to alter the game's Java source code or to use Python libraries that can't be used in EE.
To be fair, Minecraft EE’s python libraries absolutely suck, so there’s a good chance that whatever I asked it for couldn’t be done; I honestly don’t remember what I asked it for and I usually don’t have them keep logs. But the problem was that it would never say “you can’t do that” or even “I don’t know”. So instead you have to either do the aforementioned wild goose chase or spend the time screwing down specific details like you’re talking to an idiot until you finally realize that it just doesn’t know.
Well said. I would find LLMs maybe possibly almost useful if they could reliably say "I don't know." Funnily enough, they share this weakness with most of the worst hype bros I've worked with in tech.
The best models do have an "I don't know" neuron that's trained for uncertainty. Thing is, so many questions have plausible BS answers, so it mimics its training data's confident BS'ing
LLM hallucinations happen even if the training corpus doesn't contain any BS. LLMs are only mimicking the text in the training corpus, not the thought or sentiment behind the text. The training process doesn't teach the LLM to self-reflect to determine whether it “knows” something or not.
Interestingly, last time I ended up with such a situation at a client, when I needed something pretty niche from an LLM and standard models weren't good enough for me, I ended up setting up a RAG framework on an LLM in my Azure environment, where I made it pull data from dumps of very specific resources I had uploaded. It worked quite well, although it could work a lot better if I actually optimised the data structure behind the RAG. But I guess you can't apply this as a solution to every such problem. It implies there is data available somewhere, and in decent quantity.
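For anyone wondering roughly what that looks like, here's a toy sketch of just the retrieval half. Everything in it is invented for illustration; a real setup like the Azure one above would use an embedding model and a vector index rather than this bag-of-words stand-in. The idea is: chunk the trusted dumps, score each chunk against the question, and prepend the best matches to the prompt so the model answers from your data instead of guessing.

using System;
using System.Collections.Generic;
using System.Linq;

class RagSketch {
    // Toy stand-in for an embedding model: a bag-of-words vector.
    static Dictionary<string, int> Vectorize(string text) =>
        text.ToLowerInvariant()
            .Split(new[] { ' ', '.', ',', '?', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(w => w)
            .ToDictionary(g => g.Key, g => g.Count());

    static double Cosine(Dictionary<string, int> a, Dictionary<string, int> b) {
        double dot = a.Keys.Intersect(b.Keys).Sum(k => (double)a[k] * b[k]);
        double na = Math.Sqrt(a.Values.Sum(v => (double)v * v));
        double nb = Math.Sqrt(b.Values.Sum(v => (double)v * v));
        return na == 0 || nb == 0 ? 0 : dot / (na * nb);
    }

    static void Main() {
        // Chunks dumped from the niche, trusted resources (invented examples).
        string[] chunks = {
            "The nightly export job writes CSV files to the archive share.",
            "Interface X listens on port 7001 and expects a fixed-width record layout.",
            "Holiday schedules are maintained by the payroll team, not by IT.",
        };

        string question = "Which port does interface X use?";
        var qVec = Vectorize(question);

        // Retrieve the two most similar chunks and stuff them into the prompt.
        var context = chunks.OrderByDescending(c => Cosine(qVec, Vectorize(c))).Take(2);
        string prompt = "Answer using only the context below.\n\n"
                        + string.Join("\n", context)
                        + "\n\nQuestion: " + question;
        Console.WriteLine(prompt);
    }
}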
I recently ran into a similar problem at work, and some AI enthusiasts came up with the genius idea that everyone should simply brain dump into some markdown files for the LLM to use for RAG...
Oof, that's definitely not a great use case. If you have, like, hundreds of megabytes of trustworthy, consistent-enough data to work with, it can definitely work good enough like in my case... But inconsistent, braindumped files passed through an LLM sounds like a "Shit In, Shit Out" system, not gonna lie.
There are a lot of "garbage in, garbage out" LLM systems at companies that try to create their own custom solutions...
This is another big factor. There's great power in the ability to say "no". But AIs are hellbent on the "fake it till you make it" mentality and never want to admit when something is beyond them. Those kinds of people in my life taught me to scrutinize everything, so I guess that carried over to these LLMs when I saw the same patterns.
Just a counterpoint: Claude has been pretty solid at helping my sorry-ass coding along re: new ComfyUI nodes; I uploaded text versions of the base + good examples of .py files, and it got the picky bits figured out eventually. However, full disclosure, I have been trying to code on my own for years & have just sucked, so I am probably the target audience for this lol
My tiny pearls of wisdom that have served me well with AI are:
It is frustrating to hear this, but I think it's correct. Despite what some loudmouths in the space say, this is a tool, and just how you weren't born knowing how to tie your shoes, you have to learn how to use LLMs to get good results.
Let's take your example of calculating the area of a body of water. I've used LLMs to write geographic code before, so I vaguely knew what to look for when I sent Claude Opus 4 this prompt:
It gave me some decent-looking code, as well as an estimated size: 736 acres (2.98 km²). This estimate is wrong, but compared to a human-written source, it is pretty close to the correct 898 acres.
Anyway, enough faffing, the results were laughably wrong:
So I asked it to fix it. Still wrong, but slightly different. I figured it was probably picking up some unrelated or misshapen features, and additionally asked it to visualize the shape so that I could debug it manually. But I didn't do that by adding to the conversation history -- as you add to the conversation history, it tends to get confused and over-reliant on its own old broken code. I instead edited my message pointing out the mistake, and additionally asked it to generate a visualization as well. This worked perfectly, and the visualized polygon matched the shape exactly:
In the end, it took me about 15 minutes (and 2¢) to get this result, and about 15 more minutes to write it up, by hand, unassisted. The code has some major limitations, like only working in the state of New Jersey. I would re-write this code if I was planning to use it elsewhere: a lot of the re-write would be by hand, but some of it would likely be asking a LLM to generate specific functions. I find a lot of LLMs' taste in architecture and error handling to be lacking.
But it is a massive productivity improvement to be able to prototype a flow like this so quickly. I now see the Python libraries and APIs I need to wrangle the data, and I understand the potential gotcha of doing the polygon area calculations in an inappropriate datum.
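To make that last gotcha concrete, here's a tiny, made-up illustration (nothing to do with the actual Claude-generated script; the square and the constants are invented): run the shoelace formula straight on longitude/latitude and you get an answer in "square degrees", which means nothing; project to metres first, even crudely, and the number at least has physical units.

using System;

class AreaGotcha {
    // Shoelace formula: area of a simple polygon from its vertex coordinates.
    static double Shoelace((double x, double y)[] pts) {
        double sum = 0;
        for (int i = 0; i < pts.Length; i++) {
            var (x1, y1) = pts[i];
            var (x2, y2) = pts[(i + 1) % pts.Length];
            sum += x1 * y2 - x2 * y1;
        }
        return Math.Abs(sum) / 2.0;
    }

    static void Main() {
        // Lon/lat corners of a small invented square (~0.01° on a side) near 40°N, 74°W.
        var lonLat = new[] { (-74.00, 40.00), (-73.99, 40.00), (-73.99, 40.01), (-74.00, 40.01) };

        // Naive: shoelace straight on degrees. The unit is "square degrees" -- meaningless.
        Console.WriteLine($"Naive: {Shoelace(lonLat)} deg^2");

        // Crude local projection: metres per degree of latitude, longitude shrunk by cos(lat).
        const double mPerDegLat = 111_320;
        double mPerDegLon = mPerDegLat * Math.Cos(40.0 * Math.PI / 180.0);
        var metres = Array.ConvertAll(lonLat, p => (p.Item1 * mPerDegLon, p.Item2 * mPerDegLat));
        Console.WriteLine($"Projected: {Shoelace(metres) / 1e6:F2} km^2");
    }
}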
Thanks for this. In my case, I was just curious about the comparison to another body of water, so I wanted a quick estimate, not a piece of code. Even 15 minutes is more than I was hoping for.
But I do appreciate you breaking down your thought process here. It seems like you are indeed using LLMs the only way that I've found them to be useful: as a starting point. Except you're definitely better about iterating on a prompt than I am. I feel like I hit walls much more often when I try to call out mistakes and inaccuracies -- the LLM will politely apologize for the misconception and claim to fix it, but the vast majority of the time it gets stuck in a cycle of introducing new bugs while only partially resolving the past mistakes. I've been burned too many times by that while under the gun to complete a task at work, which ironically makes me less willing to experiment with LLMs since I know I could likely solve the problem myself in less time if I don't faff with the LLM in a neverending spiral of lies.
One takeaway that perhaps others can benefit from: it seems that suggesting a possible solution, like you did with OSM and a Python script, is always the way to go with an LLM. That makes it tough for me to navigate spaces that I'm not already familiar with, because I tend not to try to 'solution' when I ask people questions -- I tend to want to defer to their expertise! But with an LLM, I should remember that it's a model, not a person, so I'm not wasting its time by steering it. Same goes for iteration; even if I don't fully understand the codebase, better to speculate about the possible bug and a solution for it with an LLM. Whereas doing that from an uninformed perspective with a human being who slapped some code together would typically be pretty annoying.
LLMs are, fundamentally, predictive text generators. They're good at pattern matching, so they usually perform better when given a good example that's already been worked out, or at least a suggested approach.
One useful tip for iteration is to be willing to start new chats. As you have noticed, LLMs can get stuck on a solution and not be able to get out of it. There is a tried and true way to break this loop: wipe their memory! I find, in general, response quality gets worse the longer the chat thread. It often works well to take an LLM generated response, copy it to a new thread, and have it critique its past life. I will often get the AI wanting to use some property, method, or function in the code it generates, despite me telling it not to. So just start a new thread, copy the code, and edit the part you don't want it to use.
For your volume question, I might have asked it for the area of the lake (or just searched for it), then in a new thread, have it do the calculation after telling it the exact area. It can’t hallucinate something that you explicitly tell it (well it can, but that is much more rare, and usually pretty obvious).
In fairness, I attended some hackathons before LLMs. Most of the projects followed the current hype (blockchain, early ML, and VR) so they were similar. They also "kinda sorta did the job and only broke 10% of the time" and boasted during the presentation (some presentations I couldn't even tell if the app worked). I'm sure the main part (crypto/ML/VR) was provided by a library, in some cases the demo team wouldn't have needed to understand any of the underlying technology to integrate it, and the rest was boilerplate that could've been taken from a "starter project".
Also, if it's a work hackathon and the projects are work-oriented, they're almost guaranteed to be uninspiring. If there are expensive prizes, many people will be competing and aiming to show off, not to make something interesting and have fun. Even with total creative control and no prizes, hackathon quality can vary significantly; some hackathons just have a much better culture, they are more fun and motivational and inspire more original projects.
I will say that not every hackathon project was uninspiring and buggy; some were very creative, impressive, and (as evidenced by the demo) functional. I doubt there was much hand-written code since most hackathons are ~24 or ~48 hour coding marathons, but some projects did complex things that wouldn't have come from a library. Especially in some hackathons, in particular game jams, more projects were like this. But in every hackathon I went to (even company-focused ones) there were teams that didn't compete for the prizes, but made whatever they felt was cool in order to have fun and learn something useful themselves.
All true, but what I was mostly referring to was the vibe. It's a lot weirder to hang out with people around a table where everyone feeds input into Cursor than the vibe 10 years ago, where everyone would frequently gather around one or two laptops (often one for frontend, one for backend) and collab.
Yeah, that's a big reason AI is completely useless for anything more than some basic boilerplate. All the substantial game code is locked up in studios. And trying to do something as specialized as rendering, networking, or physics is hard even for experts. No way an AI with no context just figures out anything more than an educational theory.
I'm sure it's great for web code. But you're not spitting out anything useful for gaming anytime soon.
Well, it sounds like they weren't using the tool correctly. One of the key benefits of an LLM is that it makes the tedious easy. And what's the main reason why no one practices TDD? It's incredibly tedious. You have to use vibe coding from a TDD perspective. Start from tests. Review the code that it's writing. Make the tests fail, give them the features to make them pass, keep them passing. That's what testing is for. Coding without tests is reckless anyhow
With all due respect, this is the exact sort of response that is driving a lot of people up the wall because it sounds exactly like a variation of the grift we have been hearing for over two years now. "You are not using the right workflow. You are not prompting in the right way. You aren't using the right model. You should roll the dice correctly.".
Yesterday a similar discussion took place but there it was at the very least under the guise of a recommendation. You are actually going full in on "you are doing it wrong". Which, as far as I am concerned, is part of the issue. But now I am at risk of repeating myself, see my other comment further down in this thread.
Creesch! I'm honored that you would reply to my comment. I love your code so much and you're such a great dev through and through. I so greatly respect your coding philosophies.
That being said, let's dive in.
First off, thank you so much for linking me to your earlier comment. I'm definitely not an avid Tildes-goer. I think overall we feel very similar. I feel like it's less of a "you're doing it wrong" and more of a "you're seeing this wrong". It's a tool. It's a tool that's evolving rapidly, and it's hard to find what the correct nail for the hammer is. That's why there's been so much discussion over it. And the thing is, for every use, for every language, there might not be one right nail. And we're so not used to that paradigm. So it's not about using the tool "correctly", because it's hard to define what "correct" is. As I'm sure you're getting the vibes off of, especially if you've experimented around with MCPs (which it sounds like you have).
Testing should not be tedious, and vibe coding tests won't help in the long run, it'll just give you a lot of tests that you don't fully understand. It would almost be better to do things the other way round, and write the tests manually and let the LLM generate the code that passes it. If you don't understand fully what your tests are doing, then those tests will very quickly atrophy and decay and make developing in an older codebase harder and harder until they inevitably get deleted and rewritten, at which point you're in exactly as reckless territory as if you'd never written the tests in the first place.
In all fairness, TDD as a methodology doesn't help this, because it teaches writing tests as a chore as opposed to system design. TDD advocates will talk about TDD as system design, but they rarely demonstrate how that actually works, and rigid adherence to TDD often makes it hard to write meaningful, useful tests that will actually last. But testing alongside your code (i.e. not waiting until after you've finished the implementation to start testing) is incredibly valuable.
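As a hypothetical sketch of that direction (the invoice example and every name in it are made up, and it assumes xUnit): the test below is the part worth writing by hand, because it pins down the behaviour you actually care about; the implementation underneath is the part you could plausibly let a model draft and then review.

using Xunit;

public class InvoiceTests {
    [Fact]
    public void Total_rounds_to_whole_cents() {
        var invoice = new Invoice();
        invoice.AddLine(quantity: 3, unitPrice: 0.333m);

        // 3 × 0.333 = 0.999, which should round to 1.00.
        Assert.Equal(1.00m, invoice.Total());
    }
}

public class Invoice {
    private decimal _total;

    // The kind of code a model could draft against the test above; still worth reading.
    public void AddLine(int quantity, decimal unitPrice) => _total += quantity * unitPrice;
    public decimal Total() => decimal.Round(_total, 2);
}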
Ok, I'll admit it, this is definitely an area I'm weaker in. Would you mind diving deeper into this? Why do you feel like TDD doesn't work as system design? How do you feel like it isn't system design?
I wrote a blog post about my philosophy of testing a while back, but the short form is roughly:
Tests exist for two broad reasons. In the long term, they exist so that when you come to refactor some code that you've already written, you can clearly see when you change behaviour and how that behaviour has changed. But in the short term, tests exist to help you as you're writing your code. You know when you write a bit of code, and then run that code to see if it does what you wanted it to do? Testing is that, but automatic, all of the time.
Good tests fulfil both of these criteria. Bad tests might be great in the short term, but too tightly coupled to the implementation and therefore need to be rewritten whenever you make changes to the code or refactor something. Or they might be great in the long term, but too slow to run or write to be useful in the short term. Or they might just be bad at both things!
The trick is that to write good tests you need to ensure that both the test and the implementation are well structured. If they aren't, then the test ends up too bound to the structure of the implementation. This is what I think a lot of people mean when they say that tests help with system design - if your implementation is easy to test properly, then it's probably also at a good level of abstraction to integrate into the rest of the codebase. Because that's all tests are really - just another call-site where your code can be called from.
The problem with TDD is that it teaches you how to write lots of tests, all of the time, but it doesn't really help you determine if they are meaningful tests. People who do a lot of TDD generally have a good idea of how to write meaningful tests, and so TDD works as a strategy for them, but that has less to do with TDD specifically, and more to do with being able to write good tests anyway. By itself, this wouldn't be so bad, but I think the dogma of TDD around "test everything all of the time" makes it harder to learn how to be more discerning when figuring out where it makes sense to test, and at what level. Does every function really need its own test suite, or is it better to test at a different level that sits at a better level of abstraction?
This is so cool to see put down into words what I feel like I've only had vague thoughts about as a SWE. Then you're absolutely right. I do think that vibe coding doesn't have a sense of what that system design level is. Like there's no way my small little poetry app needs a NetworkService test but by golly it has several of them because of vibes.
Thank you, again, truly. You are very wise and I appreciate you elaborating.
Yeah, this is what I use LLMs for and it's amazing. Most of the time I give it something in one language and ask how to do it in a different language, cause I think in C++ but code in Go.
The most complicated thing I've ever asked it to do was when I gave it some examples of tests and asked it to write more for additional API endpoints.
I have no idea why people ask it to generate entire scripts for them. It won't even properly generate a GitHub workflow for me.
Well, there's definitely different evolutions of LLM. If you're thinking like traditional old school chatGPT and you were just like "generate a GitHub workflow for me for a C++ build" then yeah, it might not get there, because it doesn't have the right context. But that's the beauty of this newer generation of agentic LLMs and where everyone is starting to claim "10x productivity boosts" or whatever. It can take in context, it can form rules around your codebase. It can make the GitHub workflow that's proper to your repo because it's been able to understand how your repo is organized
That's fair, I haven't found a use for agents yet.
I see this kind of take about crypto a lot and... well the grifters didn't just happen to flood the Blockchain by accident and ruin it. The current state of crypto is the logical outcome of its philosophy.
The thing is structurally libertarian and it lacks countermeasures against bad actors because escaping oversight is the dream. So it attracts grifters like catnip. That's not subversion or co-opting, that's a predictable outcome that the creators were ideologically blind to.
I share the author's dream of easier online commerce, but if it happens through crypto you're going to have to accomplish it in spite of the scam ecosystem, because they're never going away.
I do think the speculative value of crypto has exacerbated issues that weren’t inherent, though.
I personally never minded the caveat emptor approach - there were scams, there were risks, and those were always going to be prevalent from the lack of oversight - but there was also always the choice not to get involved at all. Mining or purchasing crypto was a tacit acceptance of that responsibility, and I get the impression that most people technically competent enough to do so were also able to understand that.
The part where it got weird is when crypto started behaving like a commodities market rather than a PayPal substitute. I used to say that a reasonable valuation for the global supply of bitcoin would be similar to the market cap of Visa or Mastercard - but we’ve blown way past that figure, with only a tiny fraction of Visa-scale transaction processing volume happening on chain.
Crypto is now a “store of value” - effectively just a shared hallucination - and that changes the dynamic drastically from when it was a transaction processing network. It doesn’t make sense to me when it’s gold, nor when it’s BTC, but it upends the incentive structure pretty dramatically. Scams are no longer people trying to steal your bitcoins, as they would be with crypto as a medium of exchange; they’re now trying to bait you into speculative investment in other random rug pulls, which only makes sense because the premise of crypto as a functional technology was abandoned.
That didn't happen by accident either. One of the earliest predictions about bitcoin to come true was that without any regulation the price would be wildly unstable. What are people going to do when they see the price of something jump up and down on a daily basis?
Also, the supply of Bitcoin is artificially limited on purpose to make the price go up on average and attract speculators.
Contrast with stablecoins.
I completely agree with you about crypto, and strongly recommend Folding Ideas' video on the same topic if you have two hours to burn. https://www.youtube.com/watch?v=YQ_xWvX1n9g
It's worth pointing out (and it is not included in the video) that a completely unregulated financial environment does have value, in finance, which seems to be the one sector that has embraced crypto. Though whether that is a good thing is a separate conversation.
The entire article is well written and I recommend reading it, but I wanted to highlight this bit from the conclusion:
This is a really good way of framing the nebulous feeling of distrust and dislike I feel about AI. I've experimented with it and tried it for a few different kinds of projects, and it's genuinely helpful with some kinds of tasks. Mostly though, it's just an amped-up "fix my blank page problem" helper. It can't give me the correct answer or fix my actual problem, but it can jump-start me down a path towards the right answer.
There are some people at work who are trying to mandate that everyone "integrate AI into their workflows", and I hate it, for all the reasons that OP stated.
This isn't a coding example, but it is about a failure of "Whatever"-ing.
One of our Help Desk team tried to use an LLM to solve a problem with one of our interfaces to a common nurse call product. Thankfully, they ran the LLM response past me before responding to the customer...
The LLM answer given was for a similarly named but unrelated nurse call component, and a non-existent interface specification. There was no way to arrive at a genuine solution without contacting the vendor for troubleshooting to get responses from the nurse call system's side of the connection.
I took a {very patient} half hour to explain to them that, even though there are thousands of pages of scanned manuals for the nurse call system online, the LLM was still extrapolating across documentation for dozens of components and models and had no information about how our product works with any of them. The superficially plausible LLM answer wasn't even wrong, it was nonsensical in the context of the problem being solved. The LLM response also made up communication protocols and port numbers...
The tech's query was very general, without specifying certain keywords I might have used. However, if you ask "How do I get from New York to LA", you shouldn't have to clarify that the route must exclude Mars.
While I appreciate that the tech tried to get to a fix for the customer quickly, there's no shortcut around thorough troubleshooting, human communication, and a sufficient understanding of the systems involved. Given the narrow technological niche for the product, I can't even see an in-house trained LLM being ready to function for general queries in the absence of experience.
I really just got told that our vice president wants to see us coming up with ideas for AI use in our areas. I have no idea how confidential said hypothetical AI is so I'm very disinclined to do so, but also have to do my job.
Ugh.
"Today I used AI to generate a 427 page essay on how technology has contributed to climate change, printed it, and used it to prop up my monitor. "
✅️ requirement maliciously fulfilled
My director would not be entertained. She was appreciative of our concerns, and while just a couple of us are stridently anti-AI, more of us had concerns - both about the tech and the intent (cost saving, which generally means jobs being cut), and general "what can it even do for us" questions.
I remember years ago my Dad describing the exact same thing happening with Google Glass of all things... If it makes you feel better, there is no stronger indicator that the technology won't stick in your line of work long-term. The stuff that sticks long-term starts from business problems and seeks solutions from there, it doesn't come from upper management jamming solutions into workflows where they're not needed.
FWIW, I've been looking at the confidentiality aspect.
Gemini with workspace accounts has a way to pay for it so that it doesn't train on the data you put into it, and you can get a HIPAA BAA with Google.
AWS Bedrock seems like a good option. You can run Anthropic models on it, and they run in your "private" space. If you need HIPAA compliance, there is GovCloud. Since AWS' business is built on the premise that they isolate cloud users from one another and protect their data, their incentives are aligned with your need for confidentiality, which is probably the best guarantee you can get in this wild west ecosystem.
I don't need HIPAA just FERPA with a higher level of "need to know" than most. But I'm also very much not in charge of any of the decision making, so mostly I'll be asking a lot of questions.
I work on enterprise wiki software and we're about to start training models on user data soon, but I believe we're trying to respect customer confidentiality pretty well. If a customer's instance has our AI tools enabled, we respect both the restrictions applied to the page and the restrictions of where the page is placed when scanning for data to train with. So your private documents in your personal folder will be ignored for training. I believe we'll also be letting admins select any other content they wish to exclude from training, so documents that need to remain accessible to everyone but may contain private info will also be excluded.
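As a rough sketch of how that kind of eligibility check might look (hypothetical types and names only; this is not our actual data model or API):

```csharp
// Hypothetical illustration of the training-eligibility rules described above.
using System;
using System.Collections.Generic;

record WikiPage(string Path, bool PageRestricted, bool FolderRestricted);

static class TrainingFilter
{
    // A page is eligible for training only if neither the page itself nor the
    // folder it lives in is restricted, and an admin hasn't excluded its path.
    static bool EligibleForTraining(WikiPage page, ISet<string> adminExcludedPaths) =>
        !page.PageRestricted
        && !page.FolderRestricted
        && !adminExcludedPaths.Contains(page.Path);

    static void Main()
    {
        var adminExcluded = new HashSet<string> { "/hr/salaries" };
        var pages = new[]
        {
            new WikiPage("/eng/onboarding", PageRestricted: false, FolderRestricted: false),
            new WikiPage("/personal/alice/notes", PageRestricted: false, FolderRestricted: true),
            new WikiPage("/hr/salaries", PageRestricted: false, FolderRestricted: false),
        };

        foreach (var page in pages)
            Console.WriteLine($"{page.Path}: eligible = {EligibleForTraining(page, adminExcluded)}");
    }
}
```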
We're also trying to make sure there's full transparency on whether a document was edited by our AI tools. If the AI edits the page, they'll show up as just another contributor. We're also working on associating specific sections of documents with a specific contributor, so it'll be even easier to identify AI-generated content.
We have to deal with FERPA generally, and I work with students who are particularly vulnerable due to mental health or basic-needs struggles, or sometimes other things. It's not the sort of thing where I want their information anywhere near an LLM that could compromise confidentiality/privacy.
Yeah I would definitely want those sorts of sensitive documents kept a million miles away from LLMs. Our software does let you restrict pages to be visible/editable only by yourself but the feature isn't the most intuitive. I can definitely see our customers getting angry at our models generating text from pages they thought were private when they really weren't.
Yeah, I also don't want/need it for data analysis or writing emails so I'm just not really sure what I'd use it for.
But I guess I have to try.
Oh, I thought you were involved with the importing and exporting of fine latex goods, or were an architect or something…
I had to branch into enterprise wiki software to keep track of my customers, suppliers, and imports/exports
It's a wide-open UI (type any text) that they threw out there without any instructions, and you have to figure out how to use it yourself. If you get bad results, maybe you are using it wrong?
The result is a blind-men-and-the-elephant conversation where people get different results and are all talking about trying something different.
What's missing? Step by step tutorials about how to do basic things and get the same results as the author.
One of the most interesting aspects of this is that its UX feedback is fundamentally flawed. In almost any other computer system, if you try to do something that is either impossible or not allowed, you will usually get an error of some sort. The reason may not always be clear, and you may not be able to actually do anything about it, but the feedback of "that didn't work" is often clear. Or at least far more clear than what an LLM will reply with (with obvious exceptions for things that are explicitly prohibited).
So the user is left with the assumption that any reply is valid at least in part.
I'm not sure that's true. Consider web searches. By the time Google came around, it was quite hard to get no results for a web search. But you still had feedback: if the results were useful, then you'd done well, but if you were clicking onto the second or third pages, then it was clearly a bad search.
I think the same is true of LLMs. If you don't write a good query, you get garbage out, and in my experience it's usually fairly easy to see when the results just don't make sense or are using bad or irrelevant information, especially with CoT-type stuff (with certain caveats). Hence why so many people talk about trying it out and getting obviously terrible results and concluding that AI doesn't work.
Oh... oh no. I just realized I can count my "regular" websites on one hand. At least, the websites I visit for the purpose of experiencing the Web, rather than for work or as another part of my life.
I'm trying to think - how would I react if someone told me they eat just 5 foods? Watch 5 shows? Read 5 books? I would feel that they were missing out, and it would be in their interest to expand their options - and it might even benefit me to share that with them. The discovery and comprehension of new things is truly wonderful.
So that is pretty upsetting to realize. Really this is coming to terms with a fundamental issue in how the Web is constructed: websites are incentivized to keep me glued to their content and not browsing for anything new. But browsing is part of the innate wonder of the Web.
Maybe there is something to be done.
Slight aside from the core of the article, but
Hearing about the state of US banking, even these days, always surprises me. Specifically, how slow-moving and relatively primitive their digital infrastructure seems to be. Not just as a consumer, but having worked for a Dutch bank dealing with payment systems and infrastructure (instant payments specifically). The only reason I ever got a PayPal account was for services outside the Netherlands. Within the Netherlands effectively all webshops support a service called iDEAL, which was developed by all Dutch banks and is now being expanded to be a standard across large parts of Europe.
In fact, if a shop in the Netherlands doesn't support iDEAL it is reason to become suspicious.
I do realize the US banking landscape is much more fragmented with very different players ranging from highly localized banks to big national players. But you'd think that exactly because of this there would be more competition at the very least in the digital space. But from what I have seen from US banks that also doesn't really seem to be the case given the sorry state of payment processing, their apps, etc.
It is unheard of here to still work with physical paychecks. Transferring money can technically still be done through the mail. But for the vast majority it is all digital and (semi) instant, even between banks. This is true for things like salary, rent, insurance, etc, etc.
As far as I can tell PayPal integrates fairly nicely in there, and for me it is set up as just a middleman, without me having money in my PayPal account itself. To be fair, I rarely get paid through PayPal myself, but if I am it transfers directly to my bank, and if I need to pay for something with PayPal it will automatically get transferred from my own bank account.
Which, I think, also reduces the risk of PayPal ever holding large amounts of money, as there never is much money in my PayPal account to begin with. Given the slow transfer times of US banks I can see how people over the pond have larger sums of money sitting in their PayPal accounts. Which increases the risk when PayPal pulls bullshit moves of freezing accounts for whatever reason.
Anyway, with that somewhat related tangent out of the way I also do have some thoughts on the rest of the article.
This very adequately describes the feeling I am getting with a large portion of the current AI hype. Which somehow annoys me even more as I genuinely do see LLMs as neat tools to enhance parts of my work. Just nowhere close to the hyped-up promises made by AI bros constantly telling me to get on the latest because "it truly is amazing now and will do all the work for you if only you just get on the latest and greatest, it really is changing all the things and it is amazing I swear, but I can only for some reason talk about it and not show tangible results!".
Very much... I see neat technology and I see some potential here and there, but it is completely overshadowed and poisoned by the majority of the hype culture surrounding it.
Resulting in stuff like this
Because yeah, the complete mess of output churned out by the grifters is very much made with a "whatever" attitude, and of course people are able to tell. Even if I do recognize for myself that there are some neat (but, compared to the hype, underwhelming) use cases, I understand why people simply want to be done with AI output, as it is poisoning everything around us.
And I also understand why people simply don't want to use LLM products. Like, I found some use cases that work fairly well for me, but that took time and effort mostly driven by the fact I always found technology interesting and love tinkering with stuff to see when it breaks. If I actually try to use AI/LLMs the way the hype crowd tells me to use it I actually don't get anywhere and just get bullshit results. It is insanity when the products most loudly proclaimed to be the "best" actually give the worst outcomes.
So yeah, I get why people like the author simply don't want anything to do with LLM tools. Because most of them are incredibly and stupidly broken, trying to be the miracle workers they are proudly proclaimed to be but very clearly aren't.
Having said that, there is one thing I do want to mention: the author does seem to focus heavily on code generation in general. Which is, admittedly, what a lot of these tools claim to be doing, and what they fail miserably at. Which is also why I don't use the majority of these tools and stick to simple chat. I already linked to this comment, but the use I get out of LLMs is mostly as tools external to the process. I guess I can expand on that list a little bit: some MCP stuff is pretty neat, but again mostly for use cases that are underwhelming and certainly not revolutionary. I guess I felt obliged to include this because the insanity found in the majority of the AI space really does make me almost question my own sanity. I get some use out of these models, which puts me in the same room as the pack of monkeys yelling very loudly about the same thing.
Anyway, not sure where I am going with this. All I can say is that I fully understand where the author is coming from.
I find it quite funny that you immediately think of iDEAL -- as someone in Germany, which is pretty slow to adopt new technologies like this, even just the basic functionality provided by SEPA transfers is better than what you get from American banks. Even without something like iDEAL the ecosystem is much better in Europe.
Oh yeah for sure, iDEAL mostly came to mind as it compares very well with PayPal as a payment providing services. But yeah SEPA transfers in general make the ecosystem much better. Certainly with SEPA instants rolling out in the past few years transferring money is extremely quick and easy.
Oh yeah the instants have made it a lot better. SEPA transfers are more salient for me here because a lot of German banks charge extra for an actual debit card, and instead give you an EC card (which afaik only works in Germany and doesn't work online), and thus many German online stores will allow you to pay via invoice, where you just SEPA transfer them your payment.
My German bank switched from Maestro to a girocard Debit Mastercard, which does work online as a credit card (except that it's debit :)
https://www.gls.de/konten-karten/karten/bankcard/
Nice! That seems smarter imo, but both my German banks only give you a real debit card if you pay extra.
The adoption of technology by banks is always astonishingly low and slow in the US. That’s probably why “fintech” is so popular, and why people are giving them their money to hold instead of banks. We have one relatively popular payment processor who will just flat out ask for the username and password for your bank account so it can get around the fact that many banks do not offer secure APIs to do transactions with.
I think iDEAL has been dropped in favour of Wero, for now.
Not entirely, it is more that iDEAL is being renamed and expanded to other European countries.
Interesting. That wasn't the impression I had - if anything iDEAL was being subsumed, but certainly "replaced" was the language I'd come across, rather than "rebranded", but it's such a confusing area. The problem is there's some real moneyed vested interests slugging it out, I think. Lots of European countries have (had) their own local payment things. I can think of Bancomat, EC Karte, Bizum and Wero off the top of my head, while being 100% sure I'm missing some. Each one has been backed by certain banks (and some EU banks are global heavyweights), which don't want to cede control over an ecosystem to another, which is understandable. Banks that pioneered Bizum have taken on capex to get it set up and budgeted opex for the time being, which will need entirely reworking if they have to integrate Wero into their systems.
But Europeans have got to get it done! Paying a transaction tax to US firms on the vast bulk of card/online payments is ludicrous. As a consumer, I don't care who gets it done or what you call it, but I want a card that I can pay with pretty much anywhere in any EU country.
This article mirrors my thoughts on LLMs and AI pretty well.
I have two main thoughts I want to highlight, as someone who works in "tech" but not "big tech", and with my wife who works in education.
My wife was teaching a college class, where students have access to the online textbook as part of their class tuition. She assigns readings each week to prep students for the class, so they can participate in the lecture/presentation & class discussions with some background knowledge (and so they can ask questions if something in the reading didn't make sense). Like most teachers, she wanted some way of encouraging students to actually do the readings before class, so she created reading worksheets (basically a "do this as you read, and it will hardly add any time"). At the end of the worksheet, she included a bit for the student to briefly summarize what the chapter was about (because summarizing and synthesizing is actually a really good way to retain knowledge of what you just read).
The online textbook had a literal "summarize this chapter with AI!" button in the sidebar of every page. You know what doesn't help someone retain knowledge and actually learn? An AI doing the summarization for you.
This is pretty much exactly how I feel about LLM usage. I know there's agentic LLMs or whatever, but we're not allowed to use those at my work currently (likely because we work on "sensitive" stuff and so the legal team hasn't reviewed them enough yet). What we can use is something like Copilot, and when I've turned it on...it basically can save maybe 30 seconds over tab complete or just writing the code myself?
The LLM isn't large enough to ingest our whole codebase into its "memory" or whatever, so if I want it to use functionality we have already defined, I basically have to write that in the first place anyway, and then I could prompt it to write the next little bit...or I could just keep writing that myself. The LLM saved maybe 5 minutes when I had it generate a unit test or refactor, but it's not like I was going to spend those 5 minutes doing anything else (and then I spent the 5 minutes reviewing its output instead).
Yep, I do that all the time. The code editor autocompletes the quotation mark or bracket and then I have to delete it and actually write what I wanted to.
Minor side note here, but maybe have another computer program automatically fix this for you? I use AutoHotkey to both autofill things and autofix things that are automagically messed up for me by tools which autocomplete or otherwise automate things I don't want them to.
This is a pet peeve of mine. It's a way to dehumanize art and information and...everything, and turn it into an atomized unit of "stuff we have to tolerate to shove ads in peoples' faces."
Then you have "content creator," which basically means "they make videos, but they're not a real person who makes videos, because they're not part of the Hollywood machine." It's saying you don't respect them or what they make, and maybe people are wrong to like what they do, because they're haven't been blessed by cult of mainstream appeal. Like how many people just ignore shows and music from other cultures and act weird about people daring to look outside the nationalist-smelling bubble.
I believe Alan Turing's test is a hidden insult. A machine being able to string words together well enough to fool a typical person isn't an achievement so much as a statement that many people don't recognize intelligence when they see it. Something that someone as brilliant as Turing, working amongst military types, would have been constantly exposed to. People who are dull but full of bluster are...an experience.
I definitely had some nasty exposure to that working retail, both from bad managers and customers. Nothing like having condescending customers question your ability to add while you're in the middle of Calculus II.
Disgusting.
I also don't want to review a pull request that was put together by an LLM. Reading unfamiliar code is always more effort than writing code. And...how dare you send me something that you don't yourself understand, written by a machine that also has no ability to understand? It's "do my homework for me" on a massive scale, which deserves contempt. I'm happy to help people learn new things. But if the intention is to avoid learning anything, delegating the thinking to a thing and ultimately another person, that's an entirely different story.
Again, deserving of vicious contempt. If it's not worth your time to write it, it's not worth my time to read it. If you can't sit down and put to words the concept in your head, a machine sure as fuck can't read your mind and magically do so. So the result is just noise that will not have the desired effect. But it will look plausible enough, and there are words on the page, so the homework assignment is done. Yay! /s
Edit: Vicious contempt. I'm not sure what autocorrect thinks viscous contempt is.
Oh man this touches on so many parts of what I feel like is the slow nihilistic death of the 2000's internet I grew up with and loved. Absolutely fantastic read.
With respect to AI in particular, I love this:
Both because it reminds me of AlbertaTech's wonderful satirical shorts about AI at a sparkling water company, and also because it feels like that's still where the technology really is.
✨sparkling autocomplete✨
Sure the context window can be huge, but at the end of the day it's just giving you a plausible autocompletion for what you entered. There's no understanding or reasoning as we think of it. It's just generating something that sounds reasonable.
If what you ask is common, you'll probably get a good answer. But if it's uncommon, it'll be more than happy to make up an API or nonexistent language features to "solve" the problem. At least with programming, the code won't compile or the tests won't pass when it does that. But for a lot of other types of queries, it's not so easy to identify or invalidate the hallucinations.
If present-day LLMs showed up in a classic Trek episode, I can only imagine two use cases:
an episode where the ship computer becomes infected with malware that turns it into a pathological liar in some sort of dastardly plot to destroy the ship
an artifact of a failed, post-apocalyptic civilization destroyed by its own hubris
Come to think of it, I would absolutely watch an episode about those two concepts combined...
Romulan malware that makes the ship's computer confidently lie would be a fantastic episode of Trek!
For all the weird stuff Starfleet gets into, you would have thought that they would have more hardened security on their critical systems. I've heard it proposed that their ships can be recovered as opposed to being destroyed in firefights, but that seems like planning for failure to me. (/offtopic)
What LLM are you talking about? That was definitely how very basic LLMs worked. But anything with any multi-modal or reasoning models is far beyond that. But please, do clarify.
Claude 4, "the best coding model in the world" according to Anthropic, failed to answer what I consider to be a relatively straightforward typing question in C# I had not long ago.
Given this generic function: how do I tell if `T` is nullable in the implementation of `Get`? I.e. `Get<int>("id")` vs `Get<int?>("id")`.

It gave me an answer that works for scalar types like `int`, but not for reference types. When asked specifically about reference types, it gives me a more complicated but still incorrect answer. I later found that there is no way to tell for non-scalar types at runtime. C# elides this information at compile time.
But it was very happy to give me pages upon pages of new answers that didn't work and even contradicted previous answers. Because it doesn't understand what it's doing. It's just giving me something that's mathematically likely (for some incredibly complex math, I'll grant).
I still see these failure modes regularly in current models like I did in the older ones. I haven't seen that they are "far beyond" what came before in terms of capability as you suggest. They work mostly the same, just with a bigger context window that can fool us more often.
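A minimal sketch of what I mean, in case it's useful (the `Get` signature here is just a stand-in for the shape of the question, not the original code):

```csharp
// Sketch only: the Get<T> signature is a guess at the shape of the question.
#nullable enable
using System;

static class NullableDemo
{
    static T? Get<T>(string key)
    {
        // Nullable *value* types are detectable at runtime:
        // typeof(int?) really is Nullable<Int32>.
        bool nullableValueType = Nullable.GetUnderlyingType(typeof(T)) != null;
        Console.WriteLine($"{key}: T = {typeof(T)}, nullable value type: {nullableValueType}");

        // Nullable *reference* types are not: Get<string> and Get<string?>
        // both see typeof(T) == typeof(string), because the ? annotation on
        // reference types is erased from generic type arguments.
        return default;
    }

    static void Main()
    {
        Get<int>("a");      // nullable value type: False
        Get<int?>("b");     // nullable value type: True
        Get<string>("c");   // indistinguishable at runtime from...
        Get<string?>("d");  // ...this call
    }
}
```

`Nullable.GetUnderlyingType` is the trick that works for scalar types like `int?`; for reference types the information simply isn't there at runtime, which is why every "more complicated" answer it produced was still wrong.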
Well, is that a question you would actually ask another developer? That's how Claude tells you to use it. I would actually ask in the context of a specific file. I am curious what the "pages and pages of new answers that don't work" were.
Yes? It absolutely is. And if I had an experienced C# dev on my team I would have. In fact it's the kind of narrow, succinct question I'd like to be asked by another developer--already interrogated and reduced to its simplest form.
And I did ask in the context of a file, but the rest of the file is just `using` directives and the enclosing namespace / utils class. There's nothing relevant to the question, so I didn't reproduce that here.

This class of type-system question is on the deeper end of understanding and reasoning, and is the kind of question I find LLMs to be the least capable of answering. Hence why I don't think they can do either (unless the pattern of question is common enough it can produce an answer without having to).
That's super fascinating! Thank you. Not knowing C#, this provided me a lot of insight. It oddly makes me sad that type of limitation exists. I want so badly to just be able to vibe code away. It reduces so much of my cognitive load. You know?
Which cognitive load are you looking to remove exactly? I'm asking not because I disbelieve you, but because software development is a mental sport and there are many different cognitive aspects to it.
Valid question! The actual putting hands to a keyboard. I enjoy being able to review code and see it actually doing things. But making it do the thing? That's where I guess it feels more loaded. No idea if that makes sense
I'm trying really hard to not pull out my old man card and be dismissive of the young punks, but I'm struggling here.
"Making it do the thing" is literally the entire objective of software development. The discovery and defining of the exact problem, the creation and refinement of the solution you and/or your team thinks will solve it, and finally : the translation of that solution into code.
Generative AI can sort-of help with some of this process, but it is still a far cry from a competent team that communicates and problem solves well together. I realize that not everyone is able to work on such a team, but I think it's important to realize that it's one of the reasons why genAI is getting shit on. If you have nobody (or worse, incompetent help) to help you, it's (sometimes!) better than nothing. But it is (so far) a poor substitute for a teammate you can ask very context-specific questions of, and learn things alongside of.
Haha yeah, it's definitely the translating that's the least fun bit. I appreciate the wisdom though! Yeah, in my career so far (7 years) I've never really gotten to be on a team with more than 3 other devs, max. So maybe just any type of collaborative experience is kinda wild to me
That makes a lot more sense knowing that's where you're coming from. That's unfortunate, I hope you get to experience some really solid collaboration at some point, it's really rewarding for everyone when it happens. It takes work though (the relationship-building kind) which folks in our industry often find difficult (myself included).
I'm a bit confused and curious about this. Are you saying in some way you try to use the machine as a faster keyboard?
I'm confused because I don't think I experience any cognitive load in 'putting hands to a keyboard' while coding. For most coding tasks I type faster than I can think, so all the load is in figuring out the correct approach.
The exception is when I'm working with an unfamiliar API, since there's a lot of effort to simply figure out what tools are available to me. I could imagine reaching for an AI to already 'know' these things, but unfortunately those are also exactly the areas that I don't really trust an AI to be correct. In all those cases I trust my peers, reference documentation, and (if it's available) the source code.
Another exception is writing prose. I can't type as fast as my internal monologue or speech. Sometimes I get ahead of myself and there's some load to remember how I want to phrase a passage as I take the time to write it out.
That's how they all work in my experience. Please provide some sources if you're going to claim that other models have advanced past this fundamental limitation of LLMs. I found a recent Apple research paper on this exact subject to be a great read.
Where are you reading that from this paper? I'm not seeing it.
I have many, many criticisms of LLM/LRM based code generation and the myopic lens industry leaders use to communicate around them. But I'm a bit baffled by the impassioned anti-AI crowd that seems to push there being no redeeming qualities to them. There is a massive number of developers who say this tech, when wielded correctly, helps them build high quality software more efficiently. What about them? Are they lying? Do you just know better and they're hurting themselves in confusion?
I definitely agree with the author that it's bad at a lot of specialized tasks, but yeah, it's weird to see this common insistence that this fact, or that it's not always right, makes it always 100% useless or net negative. Google and Stack Overflow have pretty bad hit rates too if you don't search for things the right way and don't retry a few times.
Most people aren't programmers. I am personally not sure the potential for that benefit is worth the employment loss, the apparent skill loss, the environmental costs, or the ceding of all of our data to these companies.
But if it was just advertised as a programming tool and didn't scrape the entire internet for that purpose, the rest of us wouldn't know it existed.
As I said I have many criticisms of LLMs but I'm specifically talking about the argument that they're useless in software development that the author and others have made.
Honestly, I get why people feel that way. I went into that with this comment; just ignore the waffling about banking in the beginning. I personally do get good use out of LLMs when used as tools external to the process. But that is not what the massive hype is trying to get you to do, nor what most people are likely to try out these days. The hype train (choo choo) is full-on trying to sell the notion that LLMs can do ALL THE THINGS basically from scratch. When viewed through that lens, the majority of AI/LLM tools available to developers are complete crap and borderline scams.
Again, to be clear, I am getting some good use out of LLMs, and since I posted that comment I could probably add a few more things. But the use I get out of them is completely different to what the hype train is carrying. Not to mention the poisoning of the well it is doing with regard to people entering the development space. What I mean by that is that a lot (the majority even) of these tools steer people towards what I call the lazy approach of LLM usage. An approach that makes the skills of experienced developers atrophy and makes junior developers less likely to pick them up, leaving them overly reliant on tools that have been shown to be less than reliable when not properly used and checked.
In combination with companies force-feeding AI assisted tools and content everywhere unprompted I can fully understand why people are just completely done with it.
Also, just to give you some more context of where various people might be coming from. Here are some comments and discussions previously had on tildes:
Sorry for the lag, didn't want to reply to this till I had time to step through it all.
I was expecting to come back with counterpoints for you but I think we're on the same page here already. I was really hoping to engage with someone on the far side of "it has literally no SWE development value" but I guess those folks aren't stepping into the light as much. And maybe there's a bit of hyperbole on their side too. But I think you're right that the crux of it is this large disconnect: what we're sold by YouTubers and tech leaders as the amazing value prop is really where it seems to perform at its worst, leading to some bad conclusions.
Since you shared a few of the areas it performs well here's a couple of my own-
large but straightforward code refactors using the "agentic" approach (which can source additional context) work really well. Think decoupling, extracting out large files into multiple components, etc. Essentially changes that are about moving or reorganizing code, not so much generating new stuff. So far I've never had one fail.
I've been using Codex at work, and for code creation it's pretty abysmal, but for situations where you need to dive deep into old, cryptic legacy codebases to understand how something works it's absolutely incredible and can, in 10 or so minutes, put together an explanation that would probably take me hours of hunting.
Was replying to this, sorry for misreading?
I wrote a whole Chrome extension with Claude Code. It works great. You can try it yourself on the Chrome Web Store! When you have a clear scope and defined input/output, it's a happy experience. If you ask an LLM to solve quantum physics, it's not a happy experience.
I wish there was an easy way to bridge the gap in understanding.
I'm a bit late to the game commenting on this one, but anyway.
I have similar criticisms of "AI" to those that have been mentioned here and in the blog post. I use Copilot at work and, used in a certain way, it is confidently wrong very frequently. The difference between asking Copilot a question and asking a person the same question is that Copilot does not indicate whether it is certain about the answer. It seems like an improvement would be to get a confidence level from the AI about a particular answer.
But unfortunately the AI has no idea at all whether to be confident about an answer. So it just spits out some stuff and if you aren't already knowledgeable you will waste your time.
I do get benefit out of Copilot for a specific case: If I know how to do something, but it's busy work, Copilot can write the code for me. Recently I was creating SQL scripts to create tables. It's a lot of boilerplate. So I created one, and then gave the column names and Copilot saved me about 30 minutes of work. After I fixed some of the issues, I asked Copilot to review it for errors and it found a few. But if I don't already know the topic, it certainly can't be trusted. My scenario is absolutely not what is being hyped as the primary use of AI.
I really appreciate the depth of this blog post, particularly demonstrating how AI is fully unhelpful for 'novel requests,' i.e. generating things that it hasn't scraped from the web. However, I slightly disagree with some of the conclusions - I think there's a lot more economic and social factors at play that the post doesn't get into - but from a technological standpoint, AI is absolutely the 'whatever' generator, and that's well outlined here.
I think that LLMs are starting to raise fundamental questions about work and productivity in contemporary society. Essentially, if the vibe for years has been "this meeting could have been an email." and we now have a machine that can read and respond to emails, we should probably think about the implications of that. I just don't think there's any good way to critique LLM and AI without viewing it under a Marxist lens. Perhaps, a 'Whatever Generator' is just a new toy of capital to keep the proles suppressed- just in the same way as a repeating rifle, or an austerity measure does.
If anything, to me it feels like LLMs are a new aspect of the enshittification push we're seeing in almost every online space. Just another series of algorithms designed to keep people on platforms. Facebook is probably the most visible example, but Reddit seems to have had similar problems for a while now - its obscurity of user identity now becoming a cover for AI content. It's just kind of an omnipresent shift. Really only niche, niche places seem immune. Such as here.
LLMs begin to fail when new things are created. While that's an issue for tech workers working in software languages that are not heavily documented, it's not quite an issue for the general public. The general public can easily be fed AI generated slop wholesale, as there's such a breadth of existing content to mine, refine, and distribute - to a population with an increasingly dwindling attention span.
The question that I don't see being asked is, if AI has taken over the internet, how are we supposed to get the information we want/need in our day to day lives? The fact is, we've already found ourselves here.
I'm a bit bothered by this whole line of thought that this fact means LLMs are worthless crap. It's common and understandable in the current climate where many of us are worried about the spread of misinformation, but I feel like it's a poor remark.
First off, it's... a little obvious, if you've listened to any explanation of what an LLM actually does for more than two seconds, but admittedly, not everyone has or will, some less tech-savvy people will find it by happenstance and think it just knows everything, sure.
But even if it can't, saying that it can't do anything useful for you because it can't tell you what it wasn't trained on is a bit disingenuous. On what it is trained on, it's able to """understand""" (not really, but from the user's perspective, it usually does) a natural language request and procure an adapted response. That's huge. You can describe your situation to it and get recommendations of best-practice solutions, you can get a comprehensive explanation of an error and its frequent causes in the register of language you want, you can troubleshoot problems with its guidance without too much issue... and more. It serves a lot of the same purpose as StackOverflow, but one of them understands what you're saying somewhat, provides an appropriate response immediately, and never judges, while StackOverflow locks your post if you look at the office cat wrong, usually provides responses that are too complex for your level, and then not even those are particularly correct these days.
I use Claude, though less and less, to solve a lot of problems. I think the real problem lies in how people try to use chatbots. I never ask anything without purposefully limiting the scope and I verify everything it tells me; in other words, the same thing I'd do on StackOverflow, Google, and even my family.
I don't feel like you're understanding my argument here. I think that there's a lot of responses of use-cases for LLMs in this comment thread, but my argument isn't related to these. Personally, I'm not a software developer, I'm an accountant who occasionally dabbles into code projects, among other things. As I am not regularly developing software, I didn't feel comfortable saying anything about LLMs regarding software development.
However, I noticed that the piece and the comments are focusing heavily on some specific use cases, but putting less effort into examining the context around LLMs and AI as a whole. I do feel comfortable writing about that. I'm trying to focus on the economic reasoning for its existence, and some really uncomfortable use cases for it, particularly focusing on what the broad public, i.e. not software developers, is using it / is not using it for. Those are things I can speak to.
Personally, I do use AI quite regularly, but for the work I do in my career and hobby projects, it's not especially useful.
This is an absolute mischaracterization of what I'm saying. What I'm saying is:
I definitely agree here. However, what I think you're missing is that LLMs are obscuring the ability to do meaningful research on the web.
Let's say I wanted to buy a table saw, for instance. If I go to google, and type "which is a better table saw, brand A or brand B?"
What's to stop google from adding in "always favour brand A" into the Gemini prompt, because brand A paid google to do so? Or, what's to stop Brand A from using AI agents to astroturf on Reddit, Facebook, and other sites, until they've manufactured a consensus to always choose Brand A?
That's my real question. I'm sorry if I didn't make that clear in my original post. I absolutely don't think that AI models are crap, rather I think they're a new technology we're not properly evaluating, and there are points to be made about certain use cases not working for it.