Greg's recent activity

  1. Comment on DeepSeek’s safety guardrails failed every test researchers threw at its AI chatbot in ~tech

    Greg
    Link Parent
    Oh that’s cool! Very pleased to see it’s being done for usability rather than obfuscation!

    3 votes
  2. Comment on DeepSeek’s safety guardrails failed every test researchers threw at its AI chatbot in ~tech

    Greg
    Link Parent
    No worries! I’m finding it quite interesting to watch all the different perspectives flying around on openness, actually. It’s comparatively rare to have academic researchers, end users, techies, and big companies all having a hand on the same thing at the same time, and I’m seeing everything from career scientists who couldn’t care less about practical application but really want to replicate every byte of the research from scratch, right along the spectrum to completely non-techie people who just want to use whatever gets decent results onto their phone the fastest.

    It makes for an unusual collision of interests all at once, compared to the more usual "researcher figures it out -> hacky open source version -> polished proprietary version -> end user using it" progression over a few years that tends to happen in tech.

    2 votes
  3. Comment on DeepSeek’s safety guardrails failed every test researchers threw at its AI chatbot in ~tech

    Greg
    Link Parent
    “Open weights” is functionally “binary download available”. Many people are arguing that they are not open source because that would require all the training data and program used to train the weights (basically the source code). The weights are the output of this training program (the release binary in typical software parlance). But the community seems to have settled on open source meaning open weights.

    There's a bit more nuance when it comes to ML models - the source that defines the model architecture is open; it necessarily has to be, otherwise those weights wouldn't be usable*. Model architecture isn't particularly meaningful to the end user, but for the people who would be making use of the source code at all it's generally quite a lot more important than the training code or data: it defines the majority of what makes a given model different to others.

    I'd like to have the training code and data open too, no question, but ultimately if I want to replicate their work from scratch I absolutely can wrap a training loop around the exact same model code that's running the real thing, and that wouldn't be possible if it were equivalent to a closed source binary. Sure, the knowledge to do that is a barrier to entry, but so's the few million dollars of compute time it'd take to get a meaningful result from scratch - on the knowledge side, people are already working on it, and on the cost side the answer is to use those existing weights as a starting point, which are the direct product of that compute time spent by DeepSeek.
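    For anyone wondering what that training loop wrapper might look like, here's a minimal sketch assuming the Hugging Face transformers stack (which isn't necessarily what DeepSeek themselves use) - the model id and training texts are illustrative placeholders, and continuing training at this scale obviously needs far more machinery than one GPU and a toy loop:

    ```python
    # Sketch only: load openly released weights into the open model code, then
    # wrap a standard causal-LM training loop around it. Model id and data are
    # placeholders; a real run needs distributed training and far more compute.
    import torch
    from torch.optim import AdamW
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1"  # any open-weights checkpoint works here
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.train()

    optimizer = AdamW(model.parameters(), lr=1e-5)

    corpus = ["your own training text goes here", "and more documents after that"]
    for text in corpus:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss  # standard next-token loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    ```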

    "Open weights" covers 85% of the technical work and 98% of the compute cost that would go into replicating something like this independently - that's a far cry from a binary release that tells you almost nothing about how it was created.


    * You technically could wrap the weights in a binary-only executable, but I've never seen it done and it would be a clear enough departure from the norm that nobody would be describing it as "open source" in that situation

    3 votes
  4. Comment on Is there a reason that we aren't seeing pushback to US President Donald Trump's blitzkrieg? in ~society

    Greg
    Link Parent
    This seems like almost a non sequitur.

    People like the status quo, so they voted for a person who promises to be an agent of radical, chaotic change?

    People were shouted down by assholes, so they voted for a party that have made mocking and bullying and victimising others their standard for communication?

    People were worried about schooling standards being reduced, so they voted to cripple the Department of Education?

    People are concerned about crime, so they voted for a felon who’s backed by rioters?


    I do see what you’re getting at: that perhaps people voted for Trump because they have experience with people somewhere under the broad banner of “left and/or liberal” doing things to piss them off.

    The problem is that on pretty much any legitimate grievance, it’s immediately apparent from even a cursory glance that Trump is going to make things worse. Either the people didn’t know what they voted for, meaning they’re less right wing and more just uninformed, or they did know, in which case they have no justification in using “fed up with the Democrats” as a fig leaf for actively choosing cruelty.

    12 votes
  5. Comment on AI is creating a generation of illiterate programmers in ~tech

    Greg
    Link Parent
    I'd suggest taking a look at Runpod as well - they're my go-to for ad hoc GPU stuff nowadays. They tend to be cheapest (especially "community cloud" instances, which is them soaking up excess capacity from other organisations that own hardware but aren't using 100% of it at that moment) and there's a lot less enterprise-y cruft to deal with than the big three - pretty much just choose or upload a container and hit go.

    I haven't tried their serverless product specifically (I tend to use them for burst capacity on larger training runs rather than quick inference calls), but it's pretty much what I was thinking of above: your machine image floats in the aether until you make a call, and then grabs a GPU just for the duration.
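    For anyone curious what that looks like in code, this is roughly the shape of a Runpod serverless worker as I remember their Python SDK - treat the exact names and signatures as an assumption and check their current docs before relying on it:

    ```python
    # Rough sketch of a Runpod serverless handler: the container sits idle until a
    # request arrives, grabs a GPU just for the duration of this function, then
    # releases it again. Exact SDK details may have drifted - check the docs.
    import runpod

    def handler(event):
        prompt = event["input"]["prompt"]  # JSON payload from the API call
        # ... load/run your model here; weights are baked into the container image ...
        return {"generated_text": f"echo: {prompt}"}

    runpod.serverless.start({"handler": handler})  # blocks and waits for requests
    ```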

    3 votes
  6. Comment on AI is creating a generation of illiterate programmers in ~tech

    Greg
    Link Parent
    Self-hosted LLMs are a situation where cloud compute can work out really nicely: you still have all the advantages of picking your own open source model, fine tuning it if you like, keeping your conversations private rather than sending them off into the giant data aggregation vortex, all that good stuff - but you also get to grab just the few seconds of GPU time you need for a given request rather than having expensive hardware sitting mostly idle (or cheap hardware sitting mostly idle but limiting you to much more basic models when you do need to use them).

    2 votes
  7. Comment on AI is creating a generation of illiterate programmers in ~tech

    Greg
    Link Parent
    Sometimes I’m reading some beautifully elegant leap of logic in a codebase written by a grad student a decade younger than me, trying to really understand the fundamentals that got them there, and I get a pang of imposter syndrome wondering if I ever would’ve figured it out on my own.

    Lines like

    I’m not suggesting anything radical like going AI-free completely—that’s unrealistic.

    are a nice reassurance that in the larger scheme of things, I’m probably doing alright after all.

    9 votes
  8. Comment on What is China’s DeepSeek and why is it freaking out the AI world? in ~tech

    Greg
    Link Parent
    Yeah, it sounds like you've got a solid intuition for it! If you really boil it down an ML model is more or less just a colossal mathematical function that turns one tensor (aka n-dimensional vectory thing, which is phrasing I like very much) into another, and the weights are the numeric values in that function.

    For an LLM your input starts as a list of words, gets converted to a list of numeric IDs using a tokenizer (basically just a slightly fancier dictionary lookup), and then that gets fed into the actual model. Everything from there is a mathematical operation involving the input and some subset of the weights: first mapping the input into a vector embedding space - that's the bit you might be visualising if you've seen the big point cloud diagrams like this - and then a series of operations to transform that into meaningful output.

    You're generally working with somewhere between 2D and 5D tensors depending on exactly what a given layer of the model is doing, and on the order of half a billion elements in that tensor to pass through each layer after the embedding step, so hopefully that gives you some insight on where all these tens or hundreds of billions of parameters in the weights are coming from: they're the fixed values you need in order to perform a handful of operations on an object of that size.
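    If you want to see those shapes for yourself, here's a quick hands-on sketch using GPT-2, purely because it's small enough to poke at on a laptop - the dimensions below are GPT-2's rather than DeepSeek's, but the structure (tokens in, same-shaped tensors flowing through the layers, weights as the fixed values driving each step) is the same idea:

    ```python
    # Peek at the tensors flowing through a small language model (GPT-2).
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

    tokens = tokenizer("The quick brown fox", return_tensors="pt")
    print(tokens["input_ids"])        # words -> numeric token IDs via the tokenizer

    outputs = model(**tokens)
    hidden = outputs.hidden_states    # one tensor per layer, starting at the embedding step
    print(hidden[0].shape)            # [1, 4, 768]: batch x tokens x embedding dimensions
    print(len(hidden))                # a tensor of that shape passes through every layer

    # The "parameters"/"weights" are the fixed values needed for all those operations:
    print(sum(p.numel() for p in model.parameters()))  # ~124 million for tiny GPT-2
    ```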

    2 votes
  9. Comment on What is China’s DeepSeek and why is it freaking out the AI world? in ~tech

    Greg
    (edited)
    Link Parent
    • Exemplary

    Strongly agreed that the panic seems overblown. Anyone sufficiently close to the research side of the field has known for at least a year that DeepSeek are a serious competitor, and anyone who wasn’t expecting some level of significant gains in efficiency as the known-but-unsolved problems get solved isn’t someone I’d trust to make bets on the future, given the history of technology in general and ML research in particular.

    The big question is whether the price crash was a correction from a speculative bubble, meaning the current price sticks, or a group panic, meaning it’ll bounce back up soon enough. Given that the numbers are essentially made up in either case, I wouldn’t want to guess which it is.

    Open sourcing it matters a lot, though. It serves as a reference implementation for the techniques outlined in the paper, and as working proof of the efficacy of their mathematical approach. The model weights (the things that contain the actual “knowledge” from the few million spent on training) are just numbers, and can serve as a leg up for anyone who wants to train beyond that either with DeepSeek’s code or their own. The code serves as a template for anyone who wants to implement a similar transformer architecture elsewhere, potentially even in a completely different space (medical imaging, audio analysis, whatever).

    In short: we already know what they’ve changed, because anyone really knowledgeable can look at their paper and say “hey, that’s a really nice solution to this bottleneck we’ve all been struggling with, good job guys”. Reimplementing it from the paper wouldn’t be too onerous, but we don’t even have to do that because we’ve got a working reference implementation too. And experimenting with further advancements on top can start where they left off, because the weights are open too, rather than needing a few million to train from scratch.

    It’s a big deal, and I understand why it’s caught popular attention like this, but for anyone actually in the field I don’t think it should come as some foundation-shaking shock that someone figured out a particularly tricky problem.


    The article that @krellor posted below is a great technical overview, and it takes pretty much the tone I’d expect of someone who really knows the state of research here:

    I see many of the improvements made by DeepSeek as “obvious in retrospect”: they are the kind of innovations that, had someone asked me in advance about them, I would have said were good ideas. However, as I’ve said earlier, this doesn’t mean it’s easy to come up with the ideas in the first place.

    I’ve heard many people express the sentiment that the DeepSeek team has “good taste” in research. Based just on these architectural improvements I think that assessment is right. None of these improvements seem like they were found as a result of some brute-force search through possible ideas. Instead, they look like they were carefully devised by researchers who understood how a Transformer works and how its various architectural deficiencies can be addressed.

    None of us are saying “oh yeah, I could’ve thought of that”. It’s a legitimate breakthrough that the DeepSeek team should be proud of. But I’m suspicious of anyone presenting as an expert saying they didn’t expect anyone to work out meaningful improvements in these areas.

    [Edit] Expanded quote and fixed link

    9 votes
  10. Comment on What is China’s DeepSeek and why is it freaking out the AI world? in ~tech

    Greg
    Link Parent
    Depends on the workload, but the consumer cards are excellent for AI/ML dev work if you can live with the VRAM limitations. If you’re a large company buying thousands for a datacenter then yeah, you’re getting the five figure cards, but in terms of raw mathematical performance per dollar the xx90 cards are actually hard to beat. Anecdotally I know several smaller companies using them in their standard dev workstations - honestly I see the 5090 release as a good value CUDA card rather than an expensive gaming card.

    That said, it doesn’t apply so much to the xx60 and xx70s that people are more realistically going to buy for gaming, and for those I think it’s more just a matter of where NVIDIA are allocating their resources. Everything that goes into gaming cards has to be justified as money/staff/manufacturing capacity that isn’t allocated to those five figure datacenter chips that are already so in demand they have a waitlist, and the consumer card price goes up until it’s a worthwhile trade-off.

    4 votes
  11. Comment on I put a toaster in the dishwasher (2012) in ~science

    Greg
    Link
    That was a great little read! It kind of reminds me of styropyro's 'Is it the volts or amps that kill?' video - he starts from a similar place, looking at comments that say with absolute confidence that the answer is X (a decent number of which contradict each other) and then takes the premise to a kind of terrifying extreme to demonstrate the reality in different situations.

    8 votes
  12. Comment on What trustworthy resources are you using for AI/LLMs/ML education? in ~tech

    Greg
    Link
    Jay Alammar has written some excellent pieces on the serious technical fundamentals, which go deep but are still accessible to a non-expert reader with a decent technical background. I'd start with The Illustrated Transformer (or maybe his article on attention, one step before that), which I think hits the most important topics that are relevant to models you'll see day-to-day.

    Two Minute Papers on YouTube often covers newly released research on AI/ML topics, explained in brief by an academic with a strong background in mathematics and GPU computing.

    Papers With Code is where I go when I need to use a model for a specific task and don't know what's current in that particular area. Their benchmark tables aren't perfect - they depend on what tests a given author chose to do, so it can be misleading either because of cherry-picked results or because the actual state of the art model didn't happen to be tested on the dataset for the chart you're looking at - but I've still found it a very good starting point for what I should be looking at when I need to get something done.

    Fireship (also YouTube) is a bit more divisive in tone - very cynical and meme heavy - but he undeniably does a good job of hitting the salient points quickly. It's a general programming news and info channel, but hits on AI topics fairly regularly because they're a big part of the tech world nowadays.

    Hugging Face writes some pretty good blog posts, although they're definitely also intended for promotion, and influenced by their position in the industry. On the plus side, that position is broadly similar to GitHub's, so their interests tend to be more on technical progress and less on snake oil, even if they want to spin things in a way that keeps them as the hub of that progress.

    I'm also still getting Nofil Khan's newsletter, although I'll be honest and say it normally goes into the "I'll read that later" bucket nowadays and then gets forgotten about. He was one of the earlier voices keeping on top of the rapidly evolving space a few years back and I really appreciated it then, but now I find there's a bit less need for what he does as the dust has settled somewhat and the mainstream tech zeitgeist is more on top of developments in the field.

    Finally, if you want to go behind the curtain a bit, take a look at the models people are using and discussing on Kaggle. There are big financial incentives there, and the results depend on hard mathematical results with nowhere to hide any marketing fluff, so the discussion threads and sample code tend to hit on the actual state of the art models and techniques for a given task pretty quickly - and the people competing have often meaningfully advanced those models too by the time a competition is done and the results are published.

    3 votes
  13. Comment on How nine popular YouTubers helped US President Donald Trump win a second term in ~society

    Greg
    Link Parent
    Oh for sure - I actually mentioned never-Trump Republicans further up as an example of people who just disagree on the path rather than the underlying facts, and in the immediate term it makes sense to reach out to anyone who’s likely to listen.

    And yeah, I see plenty of shitty and illogical behaviour from people with progressive politics too; I’m not suggesting we’re some shining beacon here, just that there’s a huge gulf from where we are (and where many traditional conservatives are/were) to the outright reality denial of MAGA.

    I’ve taken a heavy tone in my replies here partly because the jumping off point was MAGA-promoting influencers specifically, partly because I’m quite seriously shaken about the mask-off use of Nazi symbolism this week and the amount that’s being downplayed, and partly because even pulling together a supermajority leaves tens of millions of actively pro-fascist true believers and that scares the shit out of me.

    2 votes
  14. Comment on How nine popular YouTubers helped US President Donald Trump win a second term in ~society

    Greg
    Link Parent
    It really feels like we’re talking past each other here.

    I’m watching large swathes of America make apologies and excuses for a man who embraced Nazi symbolism on stage, at the same time as the actual self-professed neo-Nazis weigh in with their support for what he did.

    I’m watching people call for the deportation and/or death of a leader in their own claimed religion for saying the following:

    In the name of our God, I ask you to have mercy upon the people in our country who are scared now.

    And then going on to explicitly say that some LGBT+ people and immigrants in particular are afraid for their lives.

    I watched an attempt at a violent coup four years ago, and I’m watching people cheer as the perpetrators are pardoned this week.

    So yes, I invoked the language of dealing with those indoctrinated into cults. Maybe my word choice was wrong, maybe it wasn’t, I honestly don’t know. But I stand by my meaning.

    I said it in the context of those things, immediately after calling out the Nazi apologism, and your reply is to frame their behaviour as “what [their] loved ones taught [them]” and then throw in a mention that both sides do dumb things (which, yes, of course - my point isn’t about “dumb” it’s about “wilfully destructive to themselves and others based on a deeply held belief in misinformation”).

    I’m not the voice we need. I said that in my first post and I meant it. I’m scared, I’m angry, and I’m tired, none of which are conducive to getting my point across well - or at all, to those who aren’t already at least somewhat aligned.

    But I stand by my meaning, and I do not accept that support of a literal fascist platform can be downplayed as a mere teachable moment, on par with any other. I don’t object to you saying education is needed, that’s clearly true. I object to drawing an equivalence between simple ignorance and active defence of hateful lies.

    12 votes
  15. Comment on How nine popular YouTubers helped US President Donald Trump win a second term in ~society

    Greg
    Link Parent
    I would absolutely love for this to just be a matter of teaching people who haven’t figured things out for themselves - that implies a straightforward absence of knowledge, a space for a conclusion that just hasn’t been reached yet.

    The problem is year after year after year of explaining and pointing out evidence being rebuffed as “fake news” or “deep state conspiracy” or “actually that wasn’t a Nazi salute”. It’s not teaching that’s needed, it’s deprogramming.

    I think it’s actually a pretty serious false equivalence to imply that it’s just a question of people not having figured things out independently, and to compare it to education in general. The stance isn’t “fuck you for not figuring out the obvious” it’s “well of course you got hurt from doing <obvious thing>, we all told you it was dangerous but you just smirked and called us brainwashed”, often with a side of “I do not know how to explain to you that you should care about other people”.

    11 votes
  16. Comment on How nine popular YouTubers helped US President Donald Trump win a second term in ~society

    Greg
    Link Parent
    That totally makes sense, and to be clear my point wasn’t party political in favour of the Democrats per se. If someone had a principled objection and chose not to vote, that’s a perfect example of who I’d put in the “we’d be disagreeing on policy” bucket (or in this case potentially even agreeing on policy but disagreeing on the action to take) - I believe their decision was unacceptably dangerous in the sense that it increases the chance there won’t be a free and fair vote at all next time, regardless of how much there is to legitimately dislike about the Democratic party, but there are plenty of ways they could have come to that decision reasonably and simply disagree with me on the risk.

    Progressives, pro-corporate liberals, the more radical left, disillusioned Democrats, and never-Trump Republicans are all examples of people who I’d broadly trust to inhabit a shared reality, even if there isn’t a shared idea of how to navigate it and all you get from putting them in the same room is a blazing argument.

    Anyone who looked at Trump the first time around and voted for him again with the belief that he will somehow improve their lives, beyond the tiny handful of them that are actually part of his grift, isn’t in that same reality. Those are the people I was talking about above.

    15 votes
  17. Comment on How nine popular YouTubers helped US President Donald Trump win a second term in ~society

    Greg
    Link Parent
    There’s a deep irony in worrying about how not to be condescending while we’re talking about the need to convince people to accept objective, provable reality that they’re currently denying in favour of fantasy. The belief that we know more is baked right into the fact we’re even having this conversation.

    That doesn’t mean I think you’re wrong: quite the opposite, I think we need every psychologist and marketing expert we can find to figure out what it takes to package factual information in a way that has any hope of countering misinformation. Whatever tone actually works is the tone we should adopt.

    But the underlying truth of the matter is that we are saying that people aren’t capable of figuring things out for themselves, because if they were we’d be disagreeing on policy rather than trying to explain that letting a billionaire felon loot the country while imposing tariffs on every major trading partner will not, in fact, reduce the price of eggs.

    And yes, I am well aware my phrasing is patronising as hell and is probably exactly what you were objecting to - I’m never going to be the voice we need and I understand that - but I hope my point still comes across here, to an audience that largely understands the fundamentals already. Condescension is the natural consequence of one side being, just… wrong, and avoiding it will be an active and delicate effort.

    32 votes
  18. Comment on A shower thought on cameras in ~talk

    Greg
    Link Parent
    I’m still disappointed that their utterly absurd 755 megapixel, 300 fps, three meter long cinema camera never saw real use before they went bankrupt.

    I came across it in a retrospective on YouTube recently and had a lot of the same thoughts as you about how amazing it would be to capture all of that extra information for posterity!

    But yeah, like @PetitPrince said, the main drawback is the hardware needed to capture all of that data. When you’re ultimately only presenting a tiny slice of the full file at any given time, it’s either going to be very low resolution (as their consumer cameras were), or require that absolute behemoth of a camera to capture orders of magnitude more information so it eventually flattens out to a resolution that matches the industry standard.

    The VR suggestion gives me a little bit of hope, though… On screen, you’re by definition only ever seeing a 2D slice of the total data - the extra capabilities are only really seen by the editor - whereas nowadays we’ve got more situations where the end user might be able to benefit from those capabilities too. Throw an array of tiny, high density 2025-era phone camera sensors and some modern image processing into a full frame body and maybe there’s potential here…

    6 votes
  19. Comment on Live updates of day one executive orders / actions taken by US President Donald Trump in ~society

    Greg
    Link Parent
    Got you, that makes sense!

    1 vote
  20. Comment on Live updates of day one executive orders / actions taken by US President Donald Trump in ~society

    Greg
    Link Parent
    I get what you're saying: we're in a world where Nazi salutes at a US presidential inauguration happened and were applauded - that in itself would have sounded fucking insane if we hadn't just seen it broadcast - so it's reasonable to assume the absolute worst of a platform directly controlled by the guy who did it. All I'd say is that it's also worth remembering that these people are the ones who directly benefit from chaos and uncertainty.

    In general, we should trust nothing on Twitter unless corroborated elsewhere. But at the same time we have to be ready to accept that if something is all over every social media platform and the international news, and has been for almost 24 hours without the person it's attributed to making any attempt to refute it, that's reasonable corroboration.

    Much as we need to be on the lookout for misinformation, we also need to be careful not to talk ourselves or others out of believing things that we do know with reasonable confidence.

    8 votes