in addition to its contents, Human Compatible is important as an artifact, a crystallized proof that top scientists now think AI safety is worth writing books about. Nick Bostrom’s Superintelligence: Paths, Dangers, Strategies previously filled this role. But Superintelligence was in 2014, and by a philosophy professor. From the artifactual point of view, HC is just better – more recent, and by a more domain-relevant expert. But if you also open up the books to see what’s inside, the two defy easy comparison.
[...]
Chapters 7 and 8, “AI: A Different Approach” and “Provably Beneficial AI” will be the most exciting for people who read Bostrom but haven’t been paying attention since. Bostrom ends by saying we need people to start working on the control problem, and explaining why this will be very hard. Russell is reporting all of the good work his lab at UC Berkeley has been doing on the control problem in the interim – and arguing that their approach, Cooperative Inverse Reinforcement Learning, succeeds at doing some of the very hard things.
From the article:
[...]