Thanks for your comment! My feeling is that the inclusion of “understood” features, as described in this post, will contribute to our understanding of what goes on inside the machines, and therefore allow us to guide and control them better. I expect it will be very important to the application of LLMs as well. So, yes, it may accelerate some things, but it will also add to the degree of controllability available to us. I think singular learning theory is a great direction to move in, and will push us further toward interpretability. Not everything in the world is smooth.
You’d probably get more enthusiasm here if you led the article with a clear statement of its application to safety. We on LW are typically not enthusiastic about capabilities work in the absence of a clear and strong argument for how it improves safety more than it accelerates progress toward truly dangerous AGI. If you feel differently, I encourage you to look with an open mind at the very general argument for why creating entities smarter than us is a risky proposition.