post_below's recent activity
-
Comment on The truth about AI (specifically LLM powered AI) in ~tech
-
Comment on She fell in love with ChatGPT. Then she ghosted it. in ~tech
I hope that anecdote is not only true, but a sign of a new trend in training. Agents that recognize unhealthy attachment (to them) and discourage it would be a very useful guardrail. I've noticed that many models are now trained to enthusiastically talk about the downsides and unreliability of AI. It could be an artifact of RLHF during fine tuning, but intentional or not I think it's a really important behavior.
-
Comment on The truth about AI (specifically LLM powered AI) in ~tech
Just as an aside, that was a lucid, if depressing, illustration. No notes.
-
Comment on The truth about AI (specifically LLM powered AI) in ~tech
Unfortunately I can't just be like "fuck it" because Microsoft
That sucks. Even if they're just kinda sorta mandating it, I can see why you're frustrated. In my opinion that's totally the wrong way to do it. A better way is to make the tools available to people in the org who actually want them, give them lots of support, and let them figure out what works. Then let it spread organically from there.
new in the decade of my being here
That's a perfect example of the hype causing bad management and executive decisions. Forcing people to use new tech just for the sake of it makes about as much sense as forcing the IT guy to do social work. There's a reality distortion field around AI right now.
If it was me I wouldn't have any ethical problem with just pretending to use it, especially at first. If they're monitoring usage, maybe run it in the background on throwaway tasks to keep token usage passable and then learn it at your own pace when you actually feel inspired to.
Your ethics might be different from mine though. If you actually do want to figure out how to use it, I'll throw out some suggestions and hopefully there will be something useful in them.
Said AI guy seemed knowledgeable
I can only guess, but generally speaking IT people aren't engineers. They're more like tech support with a lot of experience in networking and maybe some software certifications. And they don't usually get paid enough. Doesn't mean AI guy isn't good with AI, maybe he is, but also odds are he's still figuring it out just like everyone else.
Either way I'm guessing he doesn't have the time to sit down with people one on one and help them figure out what to use it for, how to use it, and set them up with systems to support their workflow. But that's what should happen.
For purposes of throwing out ideas I'll start with this, even though it may not be particularly relevant to your work:
I'd have to download all the assignments and input them I assume
You mentioned Copilot. The upside there is that they have strong data security and privacy policies and even stronger motivation to follow through on those policies; enterprise customers would flee in droves if they ever got caught doing anything questionable. Microsoft doesn't have their own frontier AI models; instead they use models from OpenAI and Anthropic. That means you could be getting GPT 5+ but you could also be getting budget models. Some people report that they get worse performance from Copilot than from the model provider websites, likely because Copilot 365 often uses a router that chooses the model based on the prompt, and they're motivated to route to cheaper models whenever possible.
That might explain why you've gotten less than stellar results in your testing.
Back to assignments, there are a lot of ifs... If the assignments are in a digital format, and if you have access to a newer agentic¹ model, and if that model has access to the filesystem where the assignments are stored, then no, you wouldn't have to do anything more than tell it which assignment group to grade and it could do the rest on its own. That's if you had an AGENTS.md (or similar) that told it how to behave as an assignment grader, or alternatively a dedicated agent that was instructed to be a specialized assignment grader or TA. If not, you'd have to do a lot more prompting, which is doable with copy/paste once you have a system figured out.
Speaking of instructions, they could include things like (paraphrased) "grade all the simple stuff and then give me a digest of the long form answers with student/test IDs so that I can grade those myself, and then populate the tests with my grading and notes once I'm finished". But then you might already have a system that does some version of grading the easy stuff automatically.
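To make that concrete, here's a minimal sketch of what such a file might contain. Everything in it is hypothetical (the paths, the course setup), it's just to show the flavor; the point is that it reads like instructions to a new TA rather than programming:

```markdown
# AGENTS.md — hypothetical grading assistant setup

## Role
You are a grading assistant for this course. Assignments live in ./assignments/,
the answer key lives in ./answer_key.md.

## Workflow
1. Grade the objective questions (multiple choice, numeric answers) against the key.
2. Collect the long form answers into one digest, grouped by question, with
   student/test IDs attached, and give that to me.
3. After I add my grades and notes, write the combined results back into each
   assignment file.

## Rules
- Never alter a student's answer text.
- If anything is ambiguous, flag it instead of guessing.
```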
All of the above is easy to do if you already understand how it all works and can guarantee a newer model with tool use capabilities. But that's a lot of ifs. That's where having an AI guy sit down with you and get it all working would be nice.
¹ Agentic essentially means a model that has been trained on tool use (web search, filesystem access, scripting, etc.)
data analysis
There's a pretty good chance you can find some solid uses for AI in this realm. And also more ifs. A newer agentic model can do all sorts of fun things with a spreadsheet, database, table, etc. It can convert between data formats, extract structured data from any sort of file, find patterns and associations in unrelated datasets, search datasets very effectively, move fields/columns, write and run formulas, normalize data, enrich data with information from other sources, cross reference, do relevant web searches, do research based on a dataset, merge datasets based on your criteria, summarize datasets of almost any size, and so on. Even if the data in question is just a list, or a collection of discrete records, it can do useful things.
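To make that list less abstract, the sort of thing an agent typically one-shots for these jobs is a short script like this (a hypothetical sketch, the file names and columns are invented):

```python
import pandas as pd

# Two invented datasets that share a student_id column
grades = pd.read_csv("grades.csv")   # student_id, score
roster = pd.read_csv("roster.csv")   # student_id, name, section

# Merge them, normalize scores to a 0-100 scale, summarize by section
merged = roster.merge(grades, on="student_id", how="left")
merged["score_pct"] = 100 * merged["score"] / merged["score"].max()
summary = merged.groupby("section")["score_pct"].agg(["mean", "median", "count"])

# Write the result out in a different format
summary.to_csv("section_summary.csv")
print(summary)
```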
Another useful agent feature is the ability to output in a variety of formats. You could turn a text file into a formatted Word document, or a webpage, or a PDF, for example.
In addition, modern agents are really good at helping with automation by one-shotting simple scripts, macros, or batch files for you. They can also tell you where to put them and how to use them. If they're running with direct access to your system, they can also directly do pretty much any task on your PC that you can do. For better and worse; no doubt you've already been briefed on safety and security.
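For example, a prompt like "sort my downloads folder into subfolders by file type" would typically get you back something in this shape (hypothetical, and worth reading before you run it, like anything an agent writes):

```python
from pathlib import Path
import shutil

# Move each file in ~/Downloads into a subfolder named after its extension
downloads = Path.home() / "Downloads"

for item in downloads.iterdir():
    if item.is_file():
        ext = item.suffix.lstrip(".").lower() or "no_extension"
        target = downloads / ext
        target.mkdir(exist_ok=True)
        shutil.move(str(item), str(target / item.name))
```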
The biggest if is whether or not you can reliably get access to a new enough model. Older tool using models can do many of the same things, just not nearly as well.
Something to keep in mind: it usually takes some trial and error to get reliable, useful results. There are quirks to figure out, prompting strategies to dial in. Possibly all things you have little interest in doing, but as with all tech, patience and persistence are a big help. The silver lining is that once you get things dialed in it gets a lot easier.
You could probably get a lot of good answers about making specific use cases work, or prompting strategies here on Tildes, if I saw the thread I'd be happy to help. Feel free to msg me as well. But also I still advocate for the pretending strategy! Your work is more important than whatever limited benefit you might get from AI. Six months or a year from now that might be less true.
Last thought, if you're able to request a certain tier of access (something that gets you newer models more often), it's worth considering.
-
Comment on The truth about AI (specifically LLM powered AI) in ~tech
I was speculating about what could happen, rather than talking about what already exists. However some strategies that already exist are:
- Quantization. Lowers quality, easy to do and useful for running LLMs locally. Possibly employed even in frontier models. (There's a minimal sketch of the idea after this list.)
- Mixture of Experts. Effective at increasing compute efficiency. Already proven in many released models, most notably GLM 4.5+.
- State Space Models. Much better use of memory and compute for longer context (linear vs quadratic).
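For the quantization item, a minimal sketch of the core trick, naive symmetric int8 weight quantization (real schemes are more sophisticated than this):

```python
import numpy as np

# A stand-in for one float32 weight tensor of a model
weights = np.random.randn(4, 4).astype(np.float32)

# Symmetric int8 quantization: map [-max|w|, max|w|] onto [-127, 127]
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)  # 4x smaller than float32

# Dequantize at inference time; the rounding error is the quality cost
restored = quantized.astype(np.float32) * scale
print("max error:", np.abs(weights - restored).max())
```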
-
Comment on The truth about AI (specifically LLM powered AI) in ~tech
Something is fishy here and the other shoe has to drop eventually.
Guaranteed. Current pricing is unsustainable. However the other shoe is still up in the air. Improvements in hardware, models designed to use less compute with comparable outputs, completely new architecture that handles context differently and solves the scaling problem... hard to say what will happen. Also open weight models, assuming they keep up and continue to follow just a little bit behind frontier models, will serve as a fallback if they try to jack up prices too far.
But enshittification in some form is inevitable so you're right to be cautious. I'd add that currently the competition is pretty fierce and there are enough players to keep prices down. It may still be a long way off. You'll know the enshittification is on the horizon when collapses, mergers and acquisitions start happening.
Either that or the bottom falls out of the tech stock market, the whole industry slows down, and they're forced to raise prices quickly. Anyone's guess how likely that is.
-
Comment on The truth about AI (specifically LLM powered AI) in ~tech
I still have never been able to be told, not by my university's "AI guy", nor by anyone on the internet, nor by the "power users" exactly how what they keep selling our VPs and Board of Trustees is useful for me improving on how I'm doing my job
There's of course no way to answer that without a lot more information, but you're very likely right that what you're being offered isn't something that works for your use case. It might even be a crappy model. Eventually there will be more people who have expertise in making LLMs work in individual cases. Whether or not one of those people will ever end up on your university's staff is something else entirely. "AI guy" probably got the title by default, but maybe he'll skill up over time.
It's not faster to ask it to write an email than to write one, or use a template. And replacing my words with Copilot's words would remove the whole human connection thing I'm offering the students. I can't and won't input student information into it. The one time I gave it a shot on a project it did ok on a list but immediately fucked up the little graphic it made from said list and I still would have to review the output and rewrite large chunks of it to use any of the material.
I'm with you on the writing. AI is fine for getting something on the page if you're blocked, but it's soulless from the perspective of anyone who cares about writing or communication. Not being able to input student info probably kills all the best use cases for you. Perhaps eventually privacy guarantees will improve to the point where that changes. I can imagine it being TA level competent at grading, for example, especially with prior art as guidance. You could probably redact names for that sort of usage. I could also see it being useful for lesson planning, but only after it had ingested a bunch of your previous lesson plans so it didn't default to a generic style from training.
And research: AIs are really, really good for initial research on most subjects.
But also if you don't need it, fuck it, not a major loss and maybe future models will be more useful.
-
Comment on How New York keeps its unfiltered water safe: spending millions on land in ~enviro
This is the way to go. Protecting and conserving watersheds has been an important goal of conservation organizations for some time now. It not only reduces filtration requirements, it also keeps runoff out of the water, protects against water shortages (dramatically) and results in a lot of carbon capture and sequestration.
-
Comment on The truth about AI (specifically LLM powered AI) in ~tech
You're right, I've read all the takes, or at least all the popular ones. The METR study is much beloved in AI coding conversations. It notably used Claude 3.5/3.7 Sonnet, which at this point are ancient in LLM years. The current generation of models is in another category entirely. It's also a small sample size and they didn't select for experienced LLM users. Instead they randomly assigned the LLM/non-LLM groups, which is standard procedure in many studies but totally the wrong choice here. A proper comparison would be between equally experienced groups of developers where the LLM group had significant experience using LLM coding tools. There is a vast difference between effectiveness out of the box and after some time to develop strategies, tooling and general understanding of model strengths and weaknesses. It's the sort of study you do when you don't really understand what you're studying yet. No judgement though, it was a study worth doing. We're still working on figuring out what it all looks like.
Regarding your second link, yes absolutely. I think that effect is more pronounced in a chat environment. LLMs are super validating, insightful in the way that comprehensive language knowledge can create the illusion of insight, and sycophantic as hell. That combination is compelling and can create trust very quickly. The intent may not be to be manipulative, but it's a recipe for manipulation nonetheless. The effect that is likely to have on the vulnerable is very concerning.
Where coding is concerned, LLMs were nearly useless for non-trivial coding tasks in 2023, when that was written.
As far as ethics goes, I'm with you. There's tension between how useful these tools are and the question of whether they should exist at all. If it was somehow up to me I might choose to erase them. My biggest concern is the environmental impact, not so much directly, but the effect it seems to be having on the speed of decarbonization. The ethics of training aren't great either, and the control by giant companies is another chapter, possibly a new volume, in the story of capitalist dystopia.
But it's not up to me, and after seeing what they can do, I'd put the odds of huge numbers of people not using them somewhere around zero.
-
Comment on The truth about AI (specifically LLM powered AI) in ~tech
Yeah, keeping the oceans would be nice. I vote for not boiling the oceans.
Context windows are a hard problem because the compute doesn't scale linearly with context size, it scales quadratically. It's the current Achilles heel of LLMs for certain kinds of tasks, and dealing with large, deeply interrelated code bases is one of those.
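Back-of-the-envelope version: in standard transformer attention, every one of the n tokens in context attends to every other token, so:

```latex
% Standard self-attention over n tokens with model width d:
% the QK^T score matrix is n x n, so per layer
\text{compute} \sim O(n^2 d), \qquad \text{memory for scores} \sim O(n^2)
% Doubling the context length roughly quadruples the attention cost.
```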
No doubt they'll figure out how to solve it, but there's no path to a solution in the current generation of models, and maybe not the next.
In the meantime, it's useful to break tasks down into smaller steps that are as self contained as possible.
It sort of pushes towards having more modular codebases that are less confusing for the AIs. Hopefully advancements reduce that push; I don't love the idea of AI limitations starting to dictate architecture.
-
Comment on The truth about AI (specifically LLM powered AI) in ~tech
humanity, capitalism, and the current trends
This deserves its own thread, I intentionally didn't dive into it in my post but yeah... there's a lot.
-
Comment on The truth about AI (specifically LLM powered AI) in ~tech
I find its behavior very inconsistent/unstable, and something that worked once won't necessarily work the second time, or you can even get away from the good behavior later in the same session.
In my experience that comes down to prompting and workflow. I get pretty consistent results but it took a lot of trial and error to get to that point. Provided of course that the task in question is close enough to the center of the training distribution. If it's not then no amount of .md files, skills, commands or other prompting strategies will get results better or faster than just doing it yourself.
The declining performance later in sessions is definitely a thing: the more context there is, the harder it is for the current models to focus clearly. That's basically your sign that the session is over; start fresh.
The place where it really shines for me is in breadth tasks
Definitely, in both coding and elsewhere, it's remarkable to be able to skip a bunch of the early steps and get right to the good bits. For me it alleviates a big part of the friction in getting started on a task.
If and only if you can validate the outputs, feel free to use AI
You're right, I feel like I should have said that more clearly in my OP. If you don't know enough about the topic to validate then the safe thing to do is assume everything is a hallucination.
-
The truth about AI (specifically LLM powered AI)
The last couple of years have been a wild ride. For most of that time, the conversation around AI has been dominated by absurd levels of hype. To go along with the cringe levels of hype, a lot of people have felt the pain of dealing with the results of rushed and forced AI implementation.
As a result the pushback against AI is loud and passionate. A lot of people are pissed, for good reasons.
Because of that it would be understandable for people casually watching from a distance to get the impression that AI is mostly an investor fueled shitshow with very little real value.
The first part of the sentiment is true, it's definitely a shitshow. Big companies are FOMOing hard, everyone is shoehorning AI into everything they can in hopes of capturing some of that hype money. It feels like crypto, or Web 3.0. The result is a mess and we're nowhere near peak mess yet.
Meanwhile in software engineering the conversation is extremely polarized. There is a large, but shrinking, contingent of people who are absolutely sure that AI is something like a scam: it only looks like a valid tool and in reality it creates more problems than it solves. And until recently that was largely true. The reason that contingent is shrinking, though, is that the latest generation of SOTA models is an undeniable step change. Every day countless developers try using AI for something that it's actually good at and have the, as yet nameless but novel, realization that "holy shit this changes everything". It's just like every other revolutionary tech tool: you have to know how to use it, and when not to use it.
The reason I bring up software engineering is that code is deterministic. You can objectively measure the results. The incredible language fluency of LLMs can't gloss over code issues. It either identified the bug or it didn't. It either wrote a thorough, valid test or it didn't. It's either good code or it isn't. And here's the thing: It is. Not automatically, or in all cases, and definitely not without careful management and scaffolding. But used well it is undeniably a game changing tool.
But it's not just game changing in software. As with software, if it's used badly, or for the wrong things, it's more trouble than it's worth. But used well it's remarkable. I'll give you an example:
A friend was recently using AI to help create the necessary documents for a state government certification process for his business. If you've ever worked with government you've already imagined the mountain of forms, policies and other documentation that were required. I got involved because he ran into some issues getting the AI to deliver.
Going through his session the thing that blew my mind was how little prompting it took to get most of the way there. He essentially said "I need help with X application process for X certification" and then he pasted in a block of relevant requirements from the state. The LLM agent then immediately knew what to do, which documents would be required and which regulations were relevant. It then proceeded to run him through a short Q and A to get the necessary specifics for his business and then it just did it. The entire stack of required documentation was done in under an hour versus the days it would have taken him to do it himself. It didn't require detailed instructions or .md files or MCP servers or artifacts, it just did it.
And he's familiar with this process, he has the expertise to look at the resulting documents and say "yeah this is exactly what the state is looking for". It's not surprising that the model had a lot of government documentation in its training data, it shouldn't even really be mind blowing at this point how effective it was, but it blew my mind anyway. Probably because not having to deal with boring, repetitive paperwork is a miraculous thing from my perspective.
This kind of win is now available in a lot of areas of work and business. It's not hype, it's objectively verifiable utility.
This is not to say that it's not still a mess. I could write an overly long essay on the dangers of AI in software, business and to society at large. We thought social media was bad, that the digital revolution happened too fast for society to adapt... AI is a whole new category of problematic. One that's happening far faster than anything else has. There's no precedent.
But my public service message is this: Don't let the passionate hatred of AI give you the wrong idea: there is real value there. I don't mean this in a FOMO way, you don't have to "use AI or get left behind". The truth is that 6 months from now the combination of new generations of models and improved tooling, scaffolding and workflows will likely make the current iteration of AI look quaint by comparison. There's no rush to figure out a technology that's advancing and changing this quickly, because most of what you learn right now will be about solving problems that will be solved by default in the near future.
That being said, AI is the biggest technological leap since the beginning of the public, consumer facing, internet. And I was there for that. Like the internet it will prove to be both good and bad, corporate consolidation will make the bad worse. And, like the internet, the people who are saying it's not revolutionary are going to look silly in the context of history.
I say this from the perspective of someone who has spent the past year casually (and in recent months intensively) learning how to use AI in practical ways, with quantifiable results, both in my own projects and to help other people solve problems in various domains. If I were to distill my career into one concept, it would be: solving problems. So I feel like I'm in a position to speak about problem solving technology with expertise. If you have a use for LLM powered AI, you'll be surprised how useful it is.
-
Comment on Five browser extensions to make every website more useful in ~tech
I just posted elsewhere in the thread about security issues with extensions, following up to say: thanks for the list! The replies so far make it sound like Tildes is angry at browser extensions. I use some, Dark Reader among them, for uncooperative websites when my light mode allergic partner is looking over my shoulder, squinting.
-
Comment on Five browser extensions to make every website more useful in ~tech
Replying for visibility... The following scenario happens often enough to make it worth removing any extension you're not actively using and avoiding extensions from even slightly questionable publishers: Bad actors buy an aging, popular extension that hordes of people don't necessarily even remember they have installed. Instant access to countless people's browsers. Bad actors here doesn't necessarily mean hackers after financial info; it's often shady ad companies who want telemetry. Sometimes they build the extensions themselves rather than buy them. In one recently uncovered case, a family of extensions that offered "protection" turned out to be actively exfiltrating complete transcripts of users' interactions with LLM chats. Some of the extensions were "certified" by the extension store.
-
Comment on How Wall Street ruined the Roomba and then blamed Lina Khan in ~tech
In some sense, I think that even critiquing their “opinions” is not very helpful because of how shallow and fake those opinions obviously are to any informed person.
Great point, it's true that treating them as actual opinions rather than cynical strategies is a contradiction in terms. I thought the author ultimately did a fine job of making that clear... and of reminding us that, despite sounding like a dream come true in the current political reality, the Obama administration was nearly as co-opted as the Bush admin.
I think the national security concerns are legitimate. It's safe to assume that any data that goes into China goes to the government, and Roombas are collecting a lot of sensitive data. Not just mapping homes physically, also network information and physical cameras. It's a mobile surveillance device in tens of millions of households. At the very least such a sale deserves significant scrutiny.
-
Comment on JustHTML is a fascinating example of vibe engineering in action in ~comp
I think it’s notable that there was a preexisting comprehensive test suite. Both that and the coverage tests served as extremely effective feedback loops for the coding agents.
I agree, this is the key part. With the right boundaries and feedback loops, frontier models can get there.
Effective is a relative term here though. Even without looking, I feel like I have a pretty good idea what the codebase in question looks like under the hood. It's going to have wildly different coding conventions mixed together with no apparent logic; moving from one section of the code to another will be an adventure. Naming conventions will be all over the place. It won't be even a little bit DRY. There will be efficiency issues, unneeded defense in depth in some places, exploitable holes in others. There will be broken logic that was patched over to pass tests rather than refactored to make actual sense, and so on. In some places it will be beautifully elegant. In short, it will look like something built by an army of at once precocious and idiotic juniors with implacable persistence. Most of it will look nothing like production grade code.
Don't get me wrong, it's a really interesting proof of concept, it just doesn't imply all the things it seems to imply about what LLMs are currently capable of. They can do amazing things, but building nontrivial applications that are secure, efficient and maintainable without significant oversight and human feedback is not among them.
Developing interesting and efficient algorithms is always what appealed to me, but it seems like that part may get outsourced to the machine now
I suppose it depends on how you define interesting. If part of the definition is "hasn't been done lots of times before", humans are still required. The time could of course come when that's not true, but LLM technology as it exists now doesn't have a path to that reality.
They can generalize across languages. So if a pattern is well established in one language, but the problem hasn't frequently been solved in another, and there's enough of that language in the training, then the LLM can port the pattern (so to speak). But if it's something that isn't well represented in the training data, you're going to get deal breaking hallucinations. You might be able to brute force it through iteration but not in a way that's better than doing it yourself.
To answer your question though: No I don't think it makes software development less interesting. I think the opposite is true, LLMs make offloading the uninteresting parts possible. That's both interesting and exciting.
-
Comment on Other people might just not have your problems in ~life
I wish there was a word for a lesson that is obvious, but when you realize it in its totality, then the lesson becomes incredibly profound.
They actually did come up with a word for that: wisdom.
All the cliches are true.
-
Comment on Bun is joining Anthropic in ~tech
I'd add to this that, from a dev perspective, Bun is a pretty low risk proposition. JavaScript is still JavaScript regardless of what happens with Bun. If things were to go in a direction you didn't like you could fairly easily switch back to Node, or you could switch to whichever Bun fork you liked most, or fork it yourself.
This is open source working as intended.
It would be a bit different if Bun wasn't already VC funded; to me that's the bigger yuck. Open source companies trying to figure out how to force monetization to please investors often leads to unfortunate outcomes and misaligned priorities. Compared to that road, it seems to me that this increases the chances of Bun being a quality product, for longer.
-
Comment on EU backs away from chat control in ~society
Agreed, governments trying to (effectively) outlaw encryption has been an evergreen issue for decades, so this is far from over. What we really need are laws proactively protecting encryption, or even better, enshrining basic principles of digital privacy into law.
But for now this is fantastic news!
Speaking for myself, I don't read every thread on Tildes, or even most of them.