5 votes

The dangers of LLM self-exfiltration: AI alignment and cybersecurity challenges

1 comment

  1. asukii

    In this article, Jan Leike, who co-leads the Superalignment Team at OpenAI, discusses some potential long-term challenges in securing increasingly intelligent AI systems. This is a complex, multi-faceted problem to be tackled on several fronts, including "AI alignment" (in a nutshell, making sure the AI's values align with our own), cybersecurity (to protect against not only internal and external threats to the company, but also potential threats from the model itself), and more. State-of-the-art AI models are still quite a ways away from the level of capability needed for these issues to be a problem now, but given the likely difficulty of finding a good solution and the rapid pace at which the field is advancing, they're important to start seriously thinking about sooner rather than later.

    4 votes