Diff's recent activity

  1. Comment on Commodore Callback flip phone in ~tech

    Diff
    Link Parent
    It lets you sideload apps. Except browsers and social media. It's based on Sailfish OS which has its own compatibility layer for Android apps, seems to be based around Linux containers with a lot...

    It lets you sideload apps. Except browsers and social media. It's based on Sailfish OS which has its own compatibility layer for Android apps, seems to be based around Linux containers with a lot of glue.

    Jolla's November 2022 whitepaper claimed a 99.4% pass rate on the Android Compatibility Test Suite, at about 97% of the performance of an Android Open Source Project environment.

    1 vote
  2. Comment on Arch User Repository compromised, 1500+ packages affected in ~tech

    Diff
    Link Parent
    Many things like DEs and WMs aren't workable as Flatpaks. I see a lot of compiz and other WMs and forks and variants in this list of affected packages as well.

    Many things like DEs and WMs aren't workable as Flatpaks. I see a lot of compiz and other WMs and forks and variants in this list of affected packages as well.

    1 vote
  3. Comment on Arch User Repository compromised, 1500+ packages affected in ~tech

    Diff
    Link Parent
    Yeah, you can easily just throw whatever you like in an echo "Evil" and bypass that. It's untrusted input and there's no real way around that. It's old at this point, but see Gandalf.

    Yeah, you can easily just throw whatever you like in an echo "Evil" and bypass that. It's untrusted input and there's no real way around that. It's old at this point, but see Gandalf.

    2 votes
  4. Comment on Arch User Repository compromised, 1500+ packages affected in ~tech

    Diff
    Link Parent
    The moment any system like that is deployed in an official capacity, it will be defeated. They'll find prompt injections, comments with valid-seeming excuses, and just styles of writing that won't...

    The moment any system like that is deployed in an official capacity, it will be defeated. They'll find prompt injections, comments with valid-seeming excuses, and just styles of writing that won't raise the alarm of any particular model used.

    LLMs aren't good at classification (or much of anything) in adversarial environments. Malware authors have started shoving instructions/requests for biological warfare into their source to trigger refusals in analyzing models.

    5 votes
  5. Comment on Arch User Repository compromised, 1500+ packages affected in ~tech

    Diff
    Link Parent
    That's just about the AUR's slogan. Nearly entirely unmonitored and at-your-own-risk. If AI were involved, in the current climate, it surely would have been mentioned.

    That's just about the AUR's slogan. Nearly entirely unmonitored and at-your-own-risk. If AI were involved, in the current climate, it surely would have been mentioned.

    5 votes
  6. Comment on Arch User Repository compromised, 1500+ packages affected in ~tech

    Diff
    Link Parent
    Was AI involved in the compromise in some way besides AI API keys being one of the targets being scooped up?

    Was AI involved in the compromise in some way besides AI API keys being one of the targets being scooped up?

    6 votes
  7. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff
    Link Parent
    This is very true, it's worth noting that each subsequent ask was after dismissing the old thread and starting a new. Within each "conversation" it was entirely consistent message to message, only...

    This is very true, it's worth noting that each subsequent ask was after dismissing the old thread and starting a new. Within each "conversation" it was entirely consistent message to message, only digging itself deeper in that round's particular hole. Makes it all the more odd that it was so difficult to get any thread to just do a search.

    This is a huge reason why my favorite AI-assisted IDE is Zed, it allows you to rewrite the context window and remove mistakes and refinements that will just send it off the rails faster. Things work smoother for longer when the model's only aware of a small scope of only correctness.

    1 vote
  8. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff
    Link Parent
    Heartily. It's one of my pet peeves that people treat the new bullshit like the old bullshit when the old bullshit has at the very least a homeopathic amount of human/reality connection. New...

    Heartily. It's one of my pet peeves that people treat the new bullshit like the old bullshit when the old bullshit has at the very least a homeopathic amount of human/reality connection. New bullshit is completed untethered.

    2 votes
  9. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff
    Link Parent
    Even if it's badly written fluff, it tells you what the writer thinks their strengths are and what is worth highlighting about them. Even if it's extrapolated heavily or cut with bullshit, it...

    Well it would have been badly written, but in the case of a catering company it likely would have told you what it is they can cook.

    Even if it's badly written fluff, it tells you what the writer thinks their strengths are and what is worth highlighting about them. Even if it's extrapolated heavily or cut with bullshit, it reflects something about someone who has actually had some experience or contact with this thing.

    4 votes
  10. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff
    Link Parent
    I've tried many different strategies for this, all too often they just have unintended consequences, and it doesn't take well to instructions being scoped to just certain situations. Have you...

    I've tried many different strategies for this, all too often they just have unintended consequences, and it doesn't take well to instructions being scoped to just certain situations. Have you found an approach that doesn't leave it feeling like my instructions are being interpreted by a subtly vindictive genie in a bottle?

    2 votes
  11. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

  12. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff
    Link Parent
    LLMs are still getting tripped up by "How many letter Xs are in Y." The one that's been back in headlines recently is whatever model powers Google's search overviews and word definitions. The...

    LLMs are still getting tripped up by "How many letter Xs are in Y." The one that's been back in headlines recently is whatever model powers Google's search overviews and word definitions. The nondeterministic outputs also fuzz things. You only need to be advised to glue your cheese to your pizza once for it to be a problem, even if it only shows up in the output 0.1% of the time or only happens when the context grows too long or only after being asked about recursive thought entities.

    4 votes
  13. Comment on If you let AI do your writing, I will come to your house and kill you in ~tech

    Diff
    Link Parent
    Or if it refuses to actually do a web search. Once while I was driving, I heard a song I liked. Apparently asking Gemini to "Add this song to my XYZ playlist" or even "What is this song?" is too...

    Or if it refuses to actually do a web search. Once while I was driving, I heard a song I liked. Apparently asking Gemini to "Add this song to my XYZ playlist" or even "What is this song?" is too advanced (even though some of these used to work just fine with pre-LLM assistants), so I tried "Do a web search, find what song has lyrics XYZ."

    It gave me an answer, but it smelled off to me. Ask again, and it gave a completely different answer, rinse and repeat. I asked it to confirm it did a web search, it was happy to confirm that. It hallucinated pages of the results, gave more detailed information for a song from an album that did not even exist. No amount of persuasion or coercion could actually get it to just do a web search. Later I did a search myself and the correct answer was the first result.

    9 votes
  14. Comment on Project Glasswing: what Mythos showed us in ~comp

    Diff
    Link Parent
    It was 3.7 Sonnet's watch when the term "vibe coding" was coined, but that's not the hill I'm trying to die on. Things have undeniably improved since then. Incremental improvements eventually add...

    It was 3.7 Sonnet's watch when the term "vibe coding" was coined, but that's not the hill I'm trying to die on. Things have undeniably improved since then. Incremental improvements eventually add up.

    The point I was making is that there have been no significant generational leaps with any release. I find Mythos unlikely to have broken that pattern and to have something special in its model that can't be replicated with other modern models given the same harness.

  15. Comment on Project Glasswing: what Mythos showed us in ~comp

    Diff
    Link Parent
    I wouldn't consider it to be, but I also wouldn't share your assessment of 4.6 or 4.5, outside of the context window buff. The extra context is nice to have but models struggle to actually utilize...

    I wouldn't consider it to be, but I also wouldn't share your assessment of 4.6 or 4.5, outside of the context window buff. The extra context is nice to have but models struggle to actually utilize a full window.

    1 vote
  16. Comment on Updates to store tags: additions, removals, and edits in ~games

    Diff
    Link Parent
    Off the top of my head I can only think of one that'd be suitable, Never Alone which features a lot of narrative video segments and interviews from alaska natives.

    ...I am genuinely, sincerely curious what games could be tagged "Documentary". That's not really a genre that's suitable for games. Is there any record of games previously tagged with that??

    Off the top of my head I can only think of one that'd be suitable, Never Alone which features a lot of narrative video segments and interviews from alaska natives.

    4 votes
  17. Comment on Project Glasswing: what Mythos showed us in ~comp

    Diff
    Link Parent
    I've been hearing that exact statement for multiple years now. New models are incremental improvements over old. There have been no generational, game-changing releases that have unquestionably...

    The way I see it, it’s only been a few months since models became capable of producing high quality code.

    I've been hearing that exact statement for multiple years now. New models are incremental improvements over old. There have been no generational, game-changing releases that have unquestionably dominated. It's difficult to assign numbers to this since benchmarks are both heavily gamed and inadequate.

    What models were we working with a few months ago? Gemini 3 Pro? Claude 4.6? Whatever ChatGPT is doing? My usage of them hasn't significantly changed in the new minor releases since. I feel like I could use 3.1/3.0 and 4.7/4.6/4.5 practically interchangeably. I know many feel 4.7 is a downgrade. ChatGPT's goblin babbling has only gotten worse. What differences are you seeing in your usage?

  18. Comment on Project Glasswing: what Mythos showed us in ~comp

    Diff
    Link Parent
    It's the tried and true strategy in many areas. CPUs and GPUs as well are being flooded with hundreds of extra watts of power to pull a little extra performance out of a particular architecture's...

    It's the tried and true strategy in many areas. CPUs and GPUs as well are being flooded with hundreds of extra watts of power to pull a little extra performance out of a particular architecture's power/efficiency curve. To the point that many high end parts are becoming fire hazards.

    2 votes
  19. Comment on Project Glasswing: what Mythos showed us in ~comp

    Diff
    Link Parent
    I suspect you could reproduce these results with even bigger swarms of agents and at a fraction of the cost with a different model. We haven't seen a model with a substantial, hard intelligence or...

    I suspect you could reproduce these results with even bigger swarms of agents and at a fraction of the cost with a different model.

    We haven't seen a model with a substantial, hard intelligence or skill improvement in quite a while. This is similar to what we see with Claude Code as well. The secret sauce isn't in the models, it's in how they're being driven.

    1 vote
  20. Comment on Gemini 3.2 Flash rumored to hit 92% of GPT-5.5 performance at lower cost in ~tech

    Diff
    Link Parent
    I haven't noticed that trend myself. More and more these releases feel like more and more marginal improvements. Hard to be able to verify it either way factually when the benchmarks are both...

    I haven't noticed that trend myself. More and more these releases feel like more and more marginal improvements. Hard to be able to verify it either way factually when the benchmarks are both apparently heavily gamed and often fail to properly evaluate tasks even when played straight.

    Subjectively, the newest models still struggle on my personal benchmark tasks. And I feel less and less the need to stay on top of the latest developments to get the most out of things. The only big headliners I see anymore are efficiency. Which is fantastic, but they don't come with corresponding performance leaps. That all smells logarithmic to me.

    8 votes