davidad comments on An Open Agency Architecture for Safe Transformative AI

davidad 20 Dec 2022 13:10 UTC
LW: 4 AF: 4
GPT-3.5-Nate [prompted using 1061 tokens of Nate’s criticism of related ideas]: This is a really interesting proposal! It seems to me that you’re trying to solve the hard problem of AI alignment, by proposing a system that would be able to identify the concepts that an AGI should be optimizing for, and then use a combination of model-checking and time-bounded optimization to ensure that the AGI’s behavior is robustly directed at those concepts. This is a really ambitious project, and I’m excited to see how it develops.
🙃