
Which future?

1 comment

  1. skybrian

    From the article:

    In 1954 the United States carried out its first full-scale test of a thermonuclear bomb, on Bikini Atoll in the Western Pacific. Known as Castle Bravo, the bomb was expected by its designers to yield 6 megatons. They were shocked when it yielded 15 megatons, an excess of 9 megatons, about 600 times the Hiroshima blast. The unexpected radiation fallout caused many deaths – the exact number is disputed – and serious radiation exposure to more than a thousand people, triggering a major international incident.

    What went wrong? The bomb contained both the lithium-6 and lithium-7 isotopes of lithium. The bomb's designers believed only the lithium-6 would contribute to the yield, while the lithium-7 would be inert. But the lithium-7 converted to tritium far faster than expected, and that turned out to almost triple the yield. It's a case where a group of outstanding scientists failed to anticipate a deadly possibility latent in nature.
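
    A quick sanity check on those numbers, taking the commonly cited figure of roughly 15 kilotons for the Hiroshima bomb (an assumption; the excerpt doesn't state it):

        15\,\text{Mt} - 6\,\text{Mt} = 9\,\text{Mt}, \qquad
        \frac{9\,\text{Mt}}{0.015\,\text{Mt}} = 600, \qquad
        \frac{15\,\text{Mt}}{6\,\text{Mt}} = 2.5\ \text{(``almost triple'')}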

    [...]

    Progress on viral design lags progress on proteins, but there's a lot of effort and strong market incentives to improve. And as with proteins, there is a mad scientist ideal in which you say what properties you want in a virus, and if it's possible you'll be given a design and synthesis instructions.

    [...]

    So let's suppose future AI-based models do enable de novo virus design. This will have many benefits, but the possibility of designing pandemic agents (and similar threats) will necessarily give rise to a field of AI biosafety engineering, similar to the safety engineering preventing misuse of large language models. What will safety in such AI models look like?

    In the early foundation models, the safety engineering is primitive and likely easily circumvented. Let's use as an example Arc Institute's well-known Evo 2 model. This model was trained on sequence data from roughly 128,000 genomes. Their main safety measure was to exclude from the training data all viruses that infect eukaryotic hosts. This worked well, in the sense that the trained model regards actual viruses infecting humans as biologically implausible. The model also performs extremely poorly when prompted to generate human-infecting viruses.
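
    To make that exclusion step concrete, here is a minimal sketch of host-based filtering of a training corpus. It is purely illustrative: the record fields and host labels are invented for the example, and this is not Arc Institute's actual pipeline.

        # Hypothetical illustration of the data-exclusion safety measure described
        # above: drop any viral genome whose annotated host is a eukaryote before
        # the sequences ever reach training. All field names here are assumptions.
        from dataclasses import dataclass

        @dataclass
        class GenomeRecord:
            accession: str    # identifier for the genome
            sequence: str     # nucleotide sequence
            is_virus: bool    # taxonomic annotation: is this a viral genome?
            host_domain: str  # "bacteria", "archaea", or "eukaryote" (assumed labels)

        def safe_for_training(rec: GenomeRecord) -> bool:
            """Exclude viruses annotated as infecting eukaryotic hosts; keep the rest."""
            return not (rec.is_virus and rec.host_domain == "eukaryote")

        def filter_training_set(records: list[GenomeRecord]) -> list[GenomeRecord]:
            return [r for r in records if safe_for_training(r)]

    The only point of the sketch is that the safety lives in the data-curation step, upstream of the model itself, which fits the article's observation that this kind of safety engineering is primitive and likely easily circumvented.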

    [...]

    The real underlying issue is that such models aim to capture an understanding of how biology works. The deeper that understanding, the better the models will function. But the understanding is fundamentally value-free: there's nothing intrinsically "good" or "bad" about understanding, let us say, what makes a protease cleavage site efficient. That's just part of understanding biology well. "Good" and "bad" are downstream of such understanding. And we only get benefits by learning how to control things like immune evasion, the rate of lethal side effects, and the rate of viral spread. When you learn to control a system so as to improve outcomes, you can very often apply the same control to make things worse. Benefits and threats are intrinsically linked.

    [...]

    Another possibility is to secure the built environment. A lot of people are putting serious work into that, but I will just mention one amusing and perhaps slightly tongue-in-cheek observation, I believe first made by Carl Shulman: the cost of BSL-3 lab space is within a small multiple of San Francisco real estate (both currently in the very rough range of $1k / square foot). Obviously I don't mean we should all live in BSL-3 labs! But it does suggest that if biological disasters become common or severe enough, we may have both the incentive and the capacity to secure the entire built environment.

    I mention this in part because it is very similar to the strategy humanity uses to deal with fire. As of 2014, the US spent more than $300 billion annually on fire safety. That means investment in new materials, in meeting the fire code, in surveillance to detect and respond to threats, and many other measures. The fire code is expensive and disliked by many people, but it provides a form of collective safety, somewhat similar to the childhood vaccine schedule. We don't address the challenge of fire by putting guardrails on matches, making them "safe" or "aligned". Instead, we align the entire external world through materials and surveillance and institutions.

    [...]

    Much safety is supplied by the market in advance: companies have strong incentives to ensure your toaster doesn't electrocute you, your airplane doesn't crash, and so on. In this sense, market-supplied safety is a fundamental part of progress. We see this in the technical alignment work I just discussed. Indeed, when such work isn't done, it's often harshly punished by the market. In 2016, Microsoft released an AI chatbot named Tay on Twitter. It was a spectacular public failure, rapidly learning from users to use racial slurs, deny the Holocaust, and so on:

    [...]

    It would take a full class to survey the ideas people are exploring, so I'll restrict myself to just one brief observation today: across areas where we've made progress securing reality, surveillance often plays an important role. It's fundamental to effective fire safety, biosafety, nuclear safety, aviation safety, and many other areas. Seeing problems is often at the root of solving them. But surveillance also creates a problem: the surveillors gain power over the surveilled. Healthy surveillance systems balance the needs and rights of multiple parties in ways enforced (ideally) technically or by the laws of nature. That is: you want humane values by design, not relying on trusted governing authorities – that's a single point of failure, and a recipe for authoritarianism.

    [...]

    Let's return to alignment. I've said the idea of an aligned AI system is too unstable to make sense – it's too much like making matches safe for fire. A better way of posing the alignment problem is: how can a civilization systematically supply safety sufficient to flourish and defend itself from threats, while preserving humane values? This version of the alignment problem includes technical alignment – you still don't want AI systems going rogue – but it's now subsidiary to an overriding goal.

    This perhaps seems like abstract motherhood and apple pie, but this formulation has real consequences. For a good future, civilization must solve the alignment problem over and over again. It's not a problem which can be solved just once. Indeed, even if ASIs were to wipe out humanity, any posthuman successors would also face the alignment problem with respect to their successors, worrying about some powerful new posthuman kid on the block going rogue. For any society to flourish, whether it be human, posthuman, or some mix, it must supply adequate safety in an ongoing fashion. It won't be a single Singularity from the inside.
