The impact of the LessWrong community as a whole on the field of AI, and especially on the field of AI safety, seems to be fairly strong, even if it is difficult to estimate precisely.
For example, many papers on interpretability of AI models are publicized and discussed here, so I would expect that interpretability researchers often read those discussions.
One of the most prominent examples of LessWrong impact is Simulator Theory, which was initially published on LessWrong (Simulators). Simulator Theory is a great deconfusion framework with regard to what LLMs are and are not, helping people avoid mistakenly interpreting properties of particular inference runs as properties of the LLMs themselves. It was recently featured in Nature as part of a joint publication by M. Shanahan and the authors of Simulator Theory, "Role play with large language models", Nov 8, 2023, open access.
But I also think that people who end up working on AI existential safety in major AI labs are often influenced by the AI safety discourse on LessWrong in their career choice and initial orientation, although I don't know whether it's possible to track that well.