Diff's recent activity

  1. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff

    This is very true. It's worth noting that each subsequent ask came after dismissing the old thread and starting a new one. Within each "conversation" it was entirely consistent message to message, only digging itself deeper into that round's particular hole. That makes it all the more odd that it was so difficult to get any thread to just do a search.

    This is a huge reason why my favorite AI-assisted IDE is Zed: it allows you to rewrite the context window and remove the mistakes and refinements that would otherwise just send it off the rails faster. Things work smoother for longer when the model is only aware of a small scope that contains nothing but correct material.

  2. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff

    Heartily. It's one of my pet peeves that people treat the new bullshit like the old bullshit when the old bullshit has at the very least a homeopathic amount of human/reality connection. New bullshit is completely untethered.

  3. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff

    Well it would have been badly written, but in the case of a catering company it likely would have told you what it is they can cook.

    Even if it's badly written fluff, it tells you what the writer thinks their strengths are and what is worth highlighting about them. Even if it's extrapolated heavily or cut with bullshit, it reflects something about someone who has actually had some experience or contact with this thing.

    2 votes
  4. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff

    I've tried many different strategies for this. All too often they just have unintended consequences, and it doesn't take well to instructions being scoped to just certain situations. Have you found an approach that doesn't leave it feeling like my instructions are being interpreted by a subtly vindictive genie in a bottle?

    1 vote
  5. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

  6. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff

    LLMs are still getting tripped up by "How many letter Xs are in Y." The one that's been back in headlines recently is whatever model powers Google's search overviews and word definitions. The nondeterministic outputs also fuzz things. You only need to be advised to glue your cheese to your pizza once for it to be a problem, even if it only shows up in the output 0.1% of the time or only happens when the context grows too long or only after being asked about recursive thought entities.
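    To put rough numbers on the 0.1% point, here is a minimal sketch, assuming every answer is independent and the failure rate is constant (both simplifications, and the function name is made up for illustration). At a 0.1% failure rate, about 63% of runs of 1,000 queries will contain at least one bad answer.

```python
def chance_of_at_least_one_bad_answer(failure_rate: float, queries: int) -> float:
    """Probability that at least one of `queries` answers is bad.

    Assumes each answer fails independently with the same `failure_rate`,
    which is an illustrative simplification, not a measured property of any model.
    """
    # The chance that every answer is fine is (1 - failure_rate) ** queries,
    # so the chance of at least one bad answer is the complement.
    return 1.0 - (1.0 - failure_rate) ** queries


if __name__ == "__main__":
    # 0.001 is the 0.1% figure from the comment; the query counts are arbitrary.
    for queries in (1, 1_000, 1_000_000):
        probability = chance_of_at_least_one_bad_answer(0.001, queries)
        print(f"{queries:>9} queries -> {probability:.4f}")
```

    At search-engine scale the rare failure is effectively guaranteed to reach someone.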

    2 votes
  7. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff

    Or if it refuses to actually do a web search. Once while I was driving, I heard a song I liked. Apparently asking Gemini to "Add this song to my XYZ playlist" or even "What is this song?" is too advanced (even though some of these used to work just fine with pre-LLM assistants), so I tried "Do a web search, find what song has lyrics XYZ."

    It gave me an answer, but it smelled off to me. I asked again, and it gave a completely different answer; rinse and repeat. I asked it to confirm it did a web search, and it was happy to confirm that. It hallucinated pages of results and gave more detailed information for a song from an album that did not even exist. No amount of persuasion or coercion could actually get it to just do a web search. Later I did a search myself and the correct answer was the first result.

    6 votes
  8. Comment on Project Glasswing: what Mythos showed us in ~comp

    Diff

    It was 3.7 Sonnet's watch when the term "vibe coding" was coined, but that's not the hill I'm trying to die on. Things have undeniably improved since then. Incremental improvements eventually add up.

    The point I was making is that there have been no significant generational leaps with any release. I find Mythos unlikely to have broken that pattern and to have something special in its model that can't be replicated with other modern models given the same harness.

  9. Comment on Project Glasswing: what Mythos showed us in ~comp

    Diff

    I wouldn't consider it to be, but I also wouldn't share your assessment of 4.6 or 4.5, outside of the context window buff. The extra context is nice to have but models struggle to actually utilize a full window.

    1 vote
  10. Comment on Updates to store tags: additions, removals, and edits in ~games

    Diff

    ...I am genuinely, sincerely curious what games could be tagged "Documentary". That's not really a genre that's suitable for games. Is there any record of games previously tagged with that??

    Off the top of my head I can only think of one that'd be suitable: Never Alone, which features a lot of narrative video segments and interviews with Alaska Natives.

    4 votes
  11. Comment on Project Glasswing: what Mythos showed us in ~comp

    Diff

    The way I see it, it’s only been a few months since models became capable of producing high quality code.

    I've been hearing that exact statement for multiple years now. New models are incremental improvements over old. There have been no generational, game-changing releases that have unquestionably dominated. It's difficult to assign numbers to this since benchmarks are both heavily gamed and inadequate.

    What models were we working with a few months ago? Gemini 3 Pro? Claude 4.6? Whatever ChatGPT is doing? My usage of them hasn't significantly changed in the new minor releases since. I feel like I could use 3.1/3.0 and 4.7/4.6/4.5 practically interchangeably. I know many feel 4.7 is a downgrade. ChatGPT's goblin babbling has only gotten worse. What differences are you seeing in your usage?

  12. Comment on Project Glasswing: what Mythos showed us in ~comp

    Diff

    It's the tried and true strategy in many areas. CPUs and GPUs are likewise being flooded with hundreds of extra watts of power to pull a little extra performance out of a particular architecture's power/efficiency curve, to the point that many high-end parts are becoming fire hazards.

    2 votes
  13. Comment on Project Glasswing: what Mythos showed us in ~comp

    Diff

    I suspect you could reproduce these results with even bigger swarms of agents and at a fraction of the cost with a different model.

    We haven't seen a model with a substantial, hard intelligence or skill improvement in quite a while. This is similar to what we see with Claude Code as well. The secret sauce isn't in the models, it's in how they're being driven.

    1 vote
  14. Comment on Gemini 3.2 Flash rumored to hit 92% of GPT-5.5 performance at lower cost in ~tech

    Diff

    I haven't noticed that trend myself. Increasingly, these releases feel like more and more marginal improvements. It's hard to verify either way when the benchmarks are both apparently heavily gamed and often fail to properly evaluate tasks even when played straight.

    Subjectively, the newest models still struggle on my personal benchmark tasks. And I feel less and less the need to stay on top of the latest developments to get the most out of things. The only big headliners I see anymore are efficiency. Which is fantastic, but they don't come with corresponding performance leaps. That all smells logarithmic to me.

    8 votes
  15. Comment on The boy that cried Mythos in ~comp

    Diff
    • Exemplary

    Companies are taking the Mythos security threat seriously.

    Based on what? Their own very-limited experience with Mythos so far? Or the misleading numbers and marketing that Anthropic put out?

    Companies are finding and fixing security bugs.

    Nobody who was willing to commit to that publicly. The only one that has to some degree, Mozilla, is somewhat dispelled in this article: the numbers were inflated and none represented actionable real-world exploits as claimed.

    Could they have found most of the bugs using a cheaper existing model? Who cares?

    We should all care about being blatantly, openly lied to.

    15 votes
  16. Comment on The boy that cried Mythos in ~comp

    Diff

    If Mythos could do what's claimed, why do none of Anthropic's numbers demonstrate that capability? Why did they need to lie so plainly? If bugs are being found through AI-enabled means, why isn't Anthropic presenting that data instead?

    Anthropic's false and distorted numbers are worth discussing whether people are just being haters or not. Many of their claims and the expectations of security experts are built solely on those numbers, which are horrifically misleading at best.

    13 votes
  17. Comment on The boy that cried Mythos in ~comp

    Diff

    Selling access prevented none of that, and GPT-2 wasn't the inflection point for that. Even in 2026, current spam and propaganda on the internet still very often gets along just fine with non-AI bots with standard templates and character substitutions vs human-run social accounts spewing set talking points, occasionally with an AI-generated image or comic for extra punch. The viral Facebook BS, SEO spam sites targeting every niche, and LinkedIn post economies have been revolutionized, though.

    5 votes
  18. Comment on The boy that cried Mythos in ~comp

    Diff

    This is a citation-heavy teardown of basically every claim Anthropic made about Mythos. The key takeaway for me was that Mythos is not any sort of generational improvement. The numbers have been heavily fudged and their methodology obfuscated to cover the fact that even Sonnet models can go toe-to-toe with it when you aren't counting single issues multiple times. Those single issues were also found in highly contrived, unrealistic environments, (again) contrary to what was claimed.

    It probably isn't surprising, but ever since 2019's GPT-2, the reality behind the "too dangerous to publicly release" narrative has fallen short of the marketing.

    11 votes
  19. Comment on Why I find woke criticism of veganism and effective altruism so outrageous in ~society

    Diff

    I think the argument is that the rich use this as additional justification to further exploit their workers and increase wealth disparity, because the money is doing better work in their hands.

    3 votes