Great question. I am relatively new to the conversation on AI, but I have developed several critical physical systems requiring high reliability (in human spaceflight). Could you help me understand why the focus is on the spec, which I presume would be used in post-training alignment? My instinct is to focus instead on the pre-training data set as the foundation for making these models as useful as possible for critical-system work. Using the example you offered, could a focus on Jeff Bezos’s training set (both technical and leadership training, including experiential and academic training prior to and during his time at Amazon) lead to a more productive reproduction of his success than a post-training spec would?
And since I’m a systems engineer at heart… I have to ask: could a more useful spec instead be written for the training data? Or did you intend this and I just misunderstood? It feels like we are leaving much on the table without significant training-data curation and quality assurance (and possibly we should be increasing a model’s awareness of its own training set—I wrote a little about this here: A Risk-Informed Framework for AI Use in Critical Applications — EA Forum).
Also, I’m curious whether you think Anthropic’s updated constitution is a step in the right direction for focusing on a model’s “motivation, or its disposition,” as you mention near the conclusion of your write-up. Carefully designed pre-training data combined with motivational post-training seems like it could be a good pairing for navigating the unknown unknowns.