Greg's recent activity

  1. Comment on Request for help: Backing up NASA public databases in ~space

    Greg
    Link Parent
    I’m assuming the total outgoing bandwidth available has to be a fair amount higher than that, otherwise researchers wouldn’t be able to get enough data out of the system to realistically work on, so it’s probably reasonable to plan on the basis of anywhere from 10x that (full backup in about six months) to 100x that (full backup in a few weeks). Running 10 or 20 or 50 or even 500 lightweight cloud VMs on separate connections for a few weeks or months isn’t going to be that onerous in the context of what a few PB of storage costs.

    If NASA’s upstream bandwidth is actually a bottleneck, there’s going to be no way around getting them involved to give you physical access. Unless you’re OK with a six year timeline to get the data out, which I’m assuming you’re not, it just doesn’t work at 20MB/s. But the good news is I really doubt that’s close to a hard limit for any systems hosting a dataset of this size, because you’d have queues of PhD students waiting six months for their research samples to download if it were!
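
    As a rough check on those timelines, here is a minimal sketch of the arithmetic. The dataset size is not stated anywhere above; it is back-derived from the "six years at 20MB/s" figure (which works out to a bit under 4 PB), so treat it as an assumption rather than a measured number.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25


def transfer_days(dataset_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to move dataset_bytes at a sustained rate."""
    return dataset_bytes / rate_bytes_per_s / SECONDS_PER_DAY


BASE_RATE = 20e6  # 20 MB/s, the per-connection figure from the comment

# Assumption: dataset size implied by "six years at 20MB/s".
DATASET_BYTES = BASE_RATE * 6 * DAYS_PER_YEAR * SECONDS_PER_DAY

for multiplier in (1, 10, 100):
    days = transfer_days(DATASET_BYTES, BASE_RATE * multiplier)
    print(f"{multiplier:>3}x ({20 * multiplier} MB/s): about {days:,.0f} days")
```

    Under that assumption, 10x the base rate lands at roughly seven months and 100x at roughly three weeks, which is in line with the ranges above.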

    In terms of actual storage, it’s a mid five figure to low six figure project, which isn’t horrible in the context of larger university budgets and is almost nothing in the context of large tech company budgets (although I’d understand not wanting to go all in on trusting them). If you can get someone to donate the space - Wikimedia, Internet Archive, Internet2, Google, AWS, Backblaze, Hugging Face, and/or any number of universities all seem like possible candidates there - I’d be more than happy to get involved on the software and data handling side. It’s not sounding wildly dissimilar to projects I’ve been part of in the past from the tech side, but I think pulling together the resources to make it happen will be the challenge on this one.

    4 votes
  2. Comment on AI generates surge in expense fraud in ~finance

    Greg
    Link Parent
    You got it! I’ve always thought there’s a lot summed up about modern economics in that one…

    8 votes
  3. Comment on AI generates surge in expense fraud in ~finance

    Greg
    Link Parent
    Amusingly, there are some specific reporting requirements in the financial sector because apparently on a few occasions employees realised that there was an overlap in that industry between “life of luxury on a paradise island” money and “the company would rather not publicise this because the reputational damage will cost them more than just eating the loss”. Nowadays they’ve got to file a report if an employee finds a way to escape with even a mere couple of million, which put a stop to everyone’s fun!

    10 votes
  4. Comment on You don't need Anubis in ~comp

    Greg
    Link Parent
    Cheers, that's good to know - you're absolutely right about them being a black box, so it's really helpful to piece together details like this when people are able to test them.

    3 votes
  5. Comment on You don't need Anubis in ~comp

    Greg
    Link
    The biggest surprise to me here is that LLM bots apparently aren't executing JS - the web depends on it so heavily nowadays that I'm surprised they can get a decent content scrape on a lot of sites without actually rendering the page and letting the XHRs complete.

    Out of interest, @fxgn, did you verify that the actual JS execution is the blocking part? I'm wondering if they're running the JS but perhaps deliberately blocking cookies (perhaps cookies that aren't set by an HTTP request) and/or navigation events, rather than JS as a whole. Seems like a good solution for now even if that is the case, but it'd be good to understand the mechanics too!

    10 votes
  6. Comment on OpenAI moves to complete potentially the largest theft in human history in ~tech

    Greg
    Link Parent
    If I’m understanding correctly, they’re saying that the value of OpenAI is supposed to go to the beneficiaries of the nonprofit (and by extension the public as a whole?), so the transfer of ownership to the for-profit entity is theft in the same way that spending charity funds on personal luxuries would be.

    I don’t know enough about the deep minutiae of OpenAI’s structure to know whether I agree in this specific case, but at least conceptually I’m a big believer in the importance of collectively held value, so it seems justified to at least be concerned about it.

    24 votes
  7. Comment on Python Foundation goes ride or DEI, rejects US government grant with strings attached in ~society

    Greg
    Link
    • Exemplary

    "Part of the problem here is all the uncertainties," Crary told us. "Even if we wanted to give up anything that might be considered [DEI] work - which we don't - part of the risk here is that all these restrictions are new, the language is very broad ... I had no interest in being the test case."

    To make matters worse, the terms included a provision that if the PSF was found to have violated that anti-DEI diktat, the NSF reserved the right to claw back any previously disbursed funds, Crary explained.

    Yup. I’m glad to see the Python Software Foundation making it clear that they are taking a stance as a matter of principle, but this hits the nail on the head even if it were a totally amoral business making the decision.

    You can’t make significant financial decisions based on a buzzword from a government that’s shown zero consistency and no willingness to act in accordance with the law. You really, really can’t make them if that government also has the power to take the money back after you’ve spent it if you do something to piss them off. Or if the commander in chief happens to see an AI generated video that pisses him off and has a word in it that sounds vaguely like the name of your organisation, for that matter.

    This is why corrupt states don’t work. It’s not just a moral issue (but it is a big moral issue too), it’s a practical one. Business only works if there’s trust. Finance only works if there’s trust. And if the people enforcing that trust are the ones trying to fuck you over, the whole system crumbles.

    52 votes
  8. Comment on Making liquid nitrogen from scratch (an absurd amount) in ~science

    Greg
    Link Parent
    I got the vibe that he ended up with a super high end system for about the price of a low end one, rather than the original plan of an equivalent low end one for cheap, and he was probably playing it up a bit for the narrative (it’s been 15 years since I was last in an actual lab, but I remember liquid helium stuff being on a whole different level to liquid nitrogen so I’m guessing that hasn’t drastically changed).

    But yeah, there were definitely a few places the sketchiness made me wince a bit, and I’m just hoping that was better managed off camera too.

    6 votes
  9. Comment on What is happening to Japan? in ~tech

    Greg
    Link Parent
    I think that dynamic actually made it one of the most interesting videos I’ve seen in a while. You can really hear the difference in his voice a couple of times towards the end as he slips out of “professional with a script” tone and into “I am genuinely upset” - I haven’t often heard creators just straight up say they hate their thumbnails and titles rather than putting a bit of even handed corporate spin around it, for example, and certainly not heard them say it with enough emotion that I fully believe they mean it.

    11 votes
  10. Comment on In the early 1990s, Sweden faced one of the worst economic crises in its modern history – the lessons for other countries, especially France, deep in its own budget crisis, are simple, if not easy in ~finance

    Greg
    Link Parent
    I’m not an expert on how the euro is administered, so I could be missing something here, but isn’t it just transferring that authority from national to supranational level, rather than handing it to the private sector?

    4 votes
  11. Comment on Amazon ordered to pay $20K after British Columbia customer says package never arrived in ~tech

    Greg
    Link Parent
    Yeah, you’re probably right about the legal outcome, and I’m always disappointed (as it sounds like you are too) when “throw it in the 10,000 word T&Cs that nobody will ever see” is given as an acceptable solution.

    I was mainly thinking that a photo would help prevent it going as far as the court in the first place - it shows whether it’s a simple mistake and helps the customer fix it, or whether they need to make a police report for theft. I guess I’m also thinking that the absence of a photo points the finger more at Amazon, even if the presence of one wouldn’t definitively absolve them; less from a legal perspective and more just from a “this seems sketchy compared to my personal experience” perspective.

    And yeah, they definitely have OTP infrastructure in place here (UK), but it’s only used very rarely. I’ve seen it maybe two or three times ever, and only on high value but physically small electronics (CPUs, large SSDs, but not on a GPU, or a phone, both of which I would have expected to fit that pattern). I’m sure there’s some complex heuristic behind what does/doesn’t get flagged, probably a pretty good one given how much data they have, and I get that it adds a lot of friction so they don’t want to enable it too broadly, but it seems wild that when they also have a “don’t refund this specific order for this specific customer” heuristic, that doesn’t trigger an OTP!

    1 vote
  12. Comment on Anthropic aims to nearly triple annualized revenue in 2026, sources say in ~tech

    Greg
    Link Parent
    In theory it’s not that wild for a rapidly growing company to be judged on recent performance - especially if they’re working with longer B2B contracts, in which case the whole bunch of deals you’ve just signed last month are already agreed to continue for the next 12+ months anyway, and it’d be more misleading to average over the 11 prior months where those revenue streams didn’t exist than to project them forward into the year you already know they will exist.

    But in practice, anything remotely reasonable in business has been stretched far past its breaking point, the pieces have been sold off for scrap, and then 10x their actual value has been recorded as a tax loss. The concept can be sound, but I wouldn’t trust the people using it as far as I could throw them.
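
    The "in theory" half of that can be made concrete with a small sketch. The monthly figures are entirely hypothetical (arbitrary units, not any real company's numbers): eleven flat months followed by one month in which a batch of 12+ month contracts starts paying.

```python
# Hypothetical monthly revenue, arbitrary units: eleven flat months,
# then a batch of long B2B contracts begins in month twelve.
monthly_revenue = [10.0] * 11 + [30.0]

# What was actually booked over the last year.
trailing_twelve_months = sum(monthly_revenue)

# Latest month projected forward, i.e. the "annualized" figure.
annualized_run_rate = monthly_revenue[-1] * 12

print(f"Trailing 12 months:  {trailing_twelve_months:.0f}")
print(f"Annualized run rate: {annualized_run_rate:.0f}")
```

    If the month-twelve contracts really are locked in for the next year, the run rate is the better forecast of the two. If they are not, it overstates things by the same margin, which is where the trust problem comes in.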

    8 votes
  13. Comment on Amazon ordered to pay $20K after British Columbia customer says package never arrived in ~tech

    Greg
    Link Parent
    That's a very interesting nuance about the photographic evidence, but I'd say Amazon are weasel wording it a bit - there's a middle ground on the spectrum between "no photo" and "fully identifiable photo of an individual". I'm not in Canada, so I can't speak for PIPA, but I am in a location subject to GDPR and it's totally standard for delivery drivers to take a photo of the package in the customer's hands or in the doorway if they deliver it directly to a person. They do always take a picture of the package specifically, not the person, which suggests they've probably been trained to do it that way.

    Even a photo on the porch does quite a lot - it shows that they got the right house rather than a neighbour who's still within the GPS margin of error, it shows that the driver actually got out of their van rather than just stopping to satisfy the GPS and then driving off because it's cold or rainy or they don't want to bother lifting heavy boxes (this is an actual fight I've been having with a non-Amazon delivery company recently, where I thankfully have video evidence on my side), it shows that they did leave it on the porch rather than "helpfully" hiding it in the recycling bin an hour before it gets emptied or anything like that, it shows that the label they scanned was on a box of the correct approximate size, etc. etc.

    Like you rightly say, OTP is the only way to really ensure that a specific person received the package (or at least received a package - even then there are edge cases like a driver handing over three boxes, asking for the code, and the customer not realising the code applied to a fourth box), but a photo rules out almost all honest mistakes. It narrows it down to porch pirates, customer fraud, or driver theft after the photo was taken.

    8 votes
  14. Comment on Amazon ordered to pay $20K after British Columbia customer says package never arrived in ~tech

    Greg
    Link
    I used to decry the fact that companies would delegate decision making to automated systems with no means to escalate to a human for review - I'm actively in favour of automation in general, but it seemed insane to me that they'd accept the machine's decision 100% of the time, rather than taking it as a 99% efficiency gain and letting a human correct the 1% of obvious failures when they're flagged.

    What I'm seeing that seems much worse recently is the humans sticking just as rigidly to these absurd decisions as the machine would have done. These are all wildly varying in severity, but they seem like the same root problem of rigidity over common sense: just this week there have been guns pulled on a teenager because a bag of Doritos was flagged as a gun, a £150 fine for tipping half a cup of coffee down the drain (yes, they relented, but only after two days of internet mockery - they originally doubled down even when contacted by the press), and now an Amazon case that cost $20,000 of lawyer time rather than being a ten second conversation along the lines of "Are we sure that was delivered? Well, there's no photo, so I can't be certain... Right, refund it, what's next".

    11 votes
  15. Comment on Amazon ordered to pay $20K after British Columbia customer says package never arrived in ~tech

    Greg
    Link Parent
    It could be “suspicious” against the customer, it could be a statistical fluke, or it could be a sign of a repeated issue on Amazon’s part: a software issue incorrectly labelling the address, a confusing building layout that new drivers aren’t familiar with, a habit of leaving things in unsafe locations, even a delivery driver taking them.

    Surely the onus is on Amazon, who has every means to create proof that they did deliver, to show that the customer is at fault for this “suspicious” pattern? Because it’s just as likely for that suspicion to point right back at Amazon, either through malice or incompetence, and the customer has no means to definitively prove that something wasn’t delivered.

    The lack of a delivery photo seems like a big enough oversight to side with the customer by default, and if Amazon already know an account is flagged for “return abuse” they have plenty of options to create additional proof rather than summarily denying refunds. There’s already a mechanism in place to require a one time code for higher value items, they could trivially enable that on any item that would otherwise not be refunded. Hell, they could even make it a click through: “Use a one time code or accept the risk that this is non-refundable even if it doesn’t turn up, your choice” - it’s not great UX, but neither is taking a customer’s money for something that didn’t arrive, so I can only assume that most people with an honest delivery problem would be actively happy to solve it this way.

    22 votes
  16. Comment on The majority AI view in ~comp

    Greg
    Link
    [W]hat they all share is an extraordinary degree of consistency in their feelings about AI, which can be pretty succinctly summed up:

    Technologies like LLMs have utility, but the absurd way they've been over-hyped, the fact they're being forced on everyone, and the insistence on ignoring the many valid critiques about them make it very difficult to focus on legitimate uses where they might add value.

    If we were to simply listen to the smart voices of those who aren't lost in the hype cycle, we might see that it is not inevitable that AI systems use content without the consent of creators, and it is not impossible to build AI systems that respect commitments to environmental sustainability. We can build AI that isn't centralized under the control of a handful of giant companies. Or any other definition of "good AI" that people might aspire to. But instead, we end up with the worst, most anti-social approaches because the platforms that have introduced "AI" to the public imagination are run by authoritarian extremists with deeply destructive agendas.

    It’s a breath of fresh air to see an article like this. It seems like everything I read is billionaire hype or (understandable, justified, but often technically misguided) backlash against the billionaire hype - and as someone who works in scientific ML, a field that’s firmly swept into the “AI” catch all but has nothing to do with LLMs, it’s nice to hear a voice of moderation.

    And the author’s absolutely right, if we treat it more like the “normal technology” that it is, we might just break this idea that it’s synonymous with Sam Altman and Mark Zuckerberg’s bullshit, defuse some of the backlash, and have an opportunity to make use of it in a positive way.

    35 votes
  17. Comment on Not sure if coincidence or I should give up (on USB flash drives) in ~tech

    Greg
    Link Parent
    I tend to compare on IOPS if I'm looking to get an idea of performance between drives - it gives you a single number to compare when scrolling through a list, sequential performance is rarely a bottleneck, and I find IOPS is a better real world measure of what to expect than trying to balance the relative impact of controller type, DRAM cache, SLC cache, and NAND type. Something like the WD SN7100 is a good example that looks like it should be middling performance based on the lack of DRAM, but actually comes close to the Samsung 990 Pro in most real world tests even though it's 30% cheaper, which is a lot easier to see if you're comparing price/IOPS.
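
    A minimal sketch of that price/IOPS comparison follows. The two drives and all of their numbers are hypothetical placeholders, not real specs for the SN7100, the 990 Pro, or anything else; the only detail borrowed from above is that one drive is 30% cheaper.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    price: float           # any currency, as long as it is consistent
    random_read_iops: int   # figure taken from a spec sheet or benchmark

    @property
    def price_per_kiops(self) -> float:
        """Price per thousand IOPS; lower is better value."""
        return self.price / (self.random_read_iops / 1000)


# Hypothetical figures for illustration only.
drives = [
    Drive("drive_b_flagship", 100.0, 1_200_000),
    Drive("drive_a_dramless", 70.0, 1_000_000),
]

# Best value first.
ranked = sorted(drives, key=lambda d: d.price_per_kiops)
for d in ranked:
    print(f"{d.name}: {d.price_per_kiops:.3f} per thousand IOPS")
```

    The point of the single number is that it sorts cleanly: a drive that looks middling on paper can still come out ahead once price is folded in.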

    I'll go deeper and look up actual benchmarks of a specific drive if performance is critical, but like @bitwaba said, even the basic performance numbers are probably not worth worrying about for a pseudo USB stick. There's a whole other can of worms around USB-NVMe bridge chips there if you're actually trying to maximise performance, too, so unless you're planning on booting from it or running a database on there or something like that, almost anything on the market should be fine here.

    4 votes
  18. Comment on Not sure if coincidence or I should give up (on USB flash drives) in ~tech

    Greg
    Link Parent
    I buy a fair few more drives than the average person, and I’d have no major reliability concerns buying from any of the brands you’ve likely heard of (Crucial/Kingston/Samsung/Western Digital/Seagate/Sabrent/Lexar/probably a couple of others I’ve missed). As long as you don’t get a counterfeit (looking at you, Amazon), and don’t buy something from an all-caps-no-vowels brand that’s sourcing their flash from the waste pile behind a chip fab (looking at you again, Amazon third party sellers), you should be fine.

    You might not be getting great value right at the bottom end of the market - random write performance in particular (copying thousands of small files, rather than a few big files) can be 500% better for a 50% price increase - but that’s comparing to other SSDs. If you’re replacing a USB flash drive even the cheapest SSD should be a decent improvement, and if you’re only copying a few GB at a time you’ll never have time to notice it’s “slow” regardless of write patterns.

    I’m normally a bit more concerned about performance than is relevant here, but for what it’s worth most of what I do end up getting is Samsung or Crucial (which is just Micron with a consumer-friendly hat on) and they’ve always been solid - that’s also less a question of brand loyalty and more because the two largest manufacturers of actual flash chips seem to do a good job of hitting the price/performance sweet spot for assembled drives! I grabbed a couple of WD drives the other day because they were on sale and didn’t think twice about it, for example.

    3 votes
  19. Comment on What's a quantum computer? in ~tech

    Greg
    (edited)
    Link Parent
    They do exist (one of my closest friends programs them), but they’re somewhere around where classical computers were in the 1950s-60s. Big pieces of research equipment with teams of academics simultaneously working out how to build them and how to use the things they’ve just built.

    I’ll never say never on them somehow finding their way into personal computing - science and tech seem to have a strong track record of finding uses that nobody expected for things as they advance - but realistically I’d be surprised to see it happen in our lifetime, and if anyone’s trying to pitch end user applications for these things in the next couple of decades they’re probably lying.

    That said, it’s not quite at the LHC level either: that’s a global one-off with pure science as its goal, so I’d put quantum computers somewhere in the middle. Maybe something like an injection moulding machine, or a combine harvester - big, expensive, specialist equipment that does a particular thing for a particular job. Even if you or I could somehow spend six or seven figures on one, there’d be no reason to: I don’t need to mould 150,000 identical widgets, and my cheap 3D printer is better for making the one off bits I do need; I don’t have a field to harvest, and farm equipment is a terrible option for driving to the shops.

    The kind of companies that would buy a room-sized IBM machine in 1962 may well have a reason to buy a quantum computer in 2032, but they’re unlikely to be something the rest of us really encounter any time soon. Also how in the hell does “plausible date a few years from now” get me to twenty thirty goddamn two?!

    5 votes
  20. Comment on Looking for feedback on a homelab design in ~tech

    Greg
    Link Parent
    Very glad it was helpful! And I've definitely come across that same challenging feeling when searching for info, so I'm pleased to have had a reason to get it all written down in one place.

    Totally fair on the JBOD side; I'm limited by rack space, and even more limited by floor space for another rack, so that's probably colouring my thoughts even when I'm trying to be more general. I think I had a bit of an impression that you were worried about overflowing a 45 bay shelf, but it doesn't sound like that's the case!

    Similar on the VDEV layout, I think I'm naturally assuming large drives just because I'm always keeping compactness in mind, so when you say 11 drive VDEVs starting with two and scaling to four or eight, I hear half a petabyte to start, possibly growing to 2PB. That's a pretty serious installation even by large organisation standards (cough Korean government), and a solid few thousand euros/dollars/pounds if you need to add a VDEV, but if you're looking at old 4TB or 8TB SAS drives it's a whole different ball game. I'd probably still lean towards smaller VDEVs with bigger drives just because they're likely to be newer, but that's likely just bias creeping in on my side.
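
    To show where the "half a petabyte, possibly growing to 2PB" reading comes from, here is a quick sketch of the raw capacity arithmetic. The 24 TB drive size is an assumption standing in for "large drives"; it is not a figure from the thread, and the totals are raw capacity before parity, spares, or filesystem overhead.

```python
def raw_capacity_tb(vdevs: int, drives_per_vdev: int, drive_tb: float) -> float:
    """Raw capacity in TB, before parity, spares, and filesystem overhead."""
    return vdevs * drives_per_vdev * drive_tb


DRIVES_PER_VDEV = 11  # from the layout being discussed

# 24 TB is an assumed "large drive"; 8 TB and 4 TB are the older SAS sizes mentioned.
for drive_tb in (24, 8, 4):
    sizes = [raw_capacity_tb(v, DRIVES_PER_VDEV, drive_tb) for v in (2, 4, 8)]
    summary = ", ".join(f"{s:,.0f} TB" for s in sizes)
    print(f"{drive_tb:>2} TB drives at 2/4/8 VDEVs: {summary}")
```

    With 24 TB drives the same layout runs from about 0.5 PB to about 2.1 PB raw; with 4 TB drives it tops out at 352 TB, which is the "whole different ball game" case.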

    That failure visualisation site is cool, by the way, I hadn't come across it before! I think what I'd say there is just to keep in mind the externalities. You're not worried about the probability of failure per se, you're worried about the probability of multiple failures in the day or two it takes for a replacement drive to be delivered, and even with a "bad" VDEV layout the numbers are so low that you want to be looking at them alongside the chances of losing the server as a whole to a faulty PSU or burst water main or lightning strike on the power line. A single hot spare takes delivery time out of that equation and stacks the odds even more your way.
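
    A back-of-envelope sketch of that "multiple failures inside the replacement window" probability is below. Every number in it is an assumption for illustration: a 2% annual failure rate per drive, RAIDZ2 (so two further failures after the first would lose the VDEV), a 2.5 day window without a spare versus a 0.5 day resilver with one, and fully independent failures. Real drives from the same batch fail in correlated ways, so this is an optimistic floor rather than a prediction.

```python
from math import comb


def window_failure_prob(annual_failure_rate: float, window_days: float) -> float:
    """Chance that one drive fails during the window, assuming a constant rate."""
    return 1 - (1 - annual_failure_rate) ** (window_days / 365)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial chance that at least k of n independent drives fail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


AFR = 0.02         # hypothetical annual failure rate per drive
SURVIVORS = 10     # 11-drive VDEV with one drive already failed
EXTRA_TO_LOSE = 2  # assumes RAIDZ2: two more failures means data loss

scenarios = (
    ("delivery wait plus resilver", 2.5),  # hypothetical window, in days
    ("hot spare, resilver only", 0.5),     # hypothetical window, in days
)
for label, days in scenarios:
    p = window_failure_prob(AFR, days)
    risk = prob_at_least(EXTRA_TO_LOSE, SURVIVORS, p)
    print(f"{label}: {risk:.2e}")
```

    Under these assumptions both results are tiny, and shortening the window with a hot spare shrinks the figure further, which is the point about weighing it against whole-server risks.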

    There's certainly no harm in thinking about drive failure rates, or in optimising against them to a degree, but (and I say this as someone very prone to bikeshedding, who needs to hear it myself!) if you get a multiple simultaneous drive failure it's more likely to be because a meteorite hit your server and you need to restore from offsite backups anyway.