Hello!
I’m new here, but I’ve been reading through the Sequences and other posts for the last few weeks and would love some feedback on a post idea. I’m writing my theory of change for AI safety and how I can help. I’ve defined my priors, identified cruxes, and I’m now reading papers and blog posts to challenge those priors. I’ve seen a few theory of change posts (e.g., Critch’s healthtech post), but I’m wondering whether I should post mine as a working document: publish an unfinished version and update it as I refine my beliefs.
Is an in-progress theory of change interesting/useful for LessWrong, or should I wait until it’s complete?
Since this is also my first post here, I’m dropping some of my background info below.
Background: I have 10 years of professional experience, the first 5 in structural engineering and the last 5 consulting on data/AI projects for US transportation agencies. This year I’ve been volunteering with Building Humane Technology on HumaneBench (a benchmark showing how frontier models can be steered toward anti-humane behavior just by adjusting system prompts).