Would you agree that ChaosGPT has a framework where it has a long-running goal and humans have provided it the resources to pursue that goal? The goal it was assigned itself leads to power-seeking; you wouldn't expect such behavior to arise spontaneously with all goals. For example, "make me the most money possible" and "get me the most money this trading day via this trading interface" are enormously different. Do you think a STEM+ model will power-seek if given the latter goal?
Like, is our problem actually the model scheming against us, or is the issue that some humans will misuse models and the models will do their assigned tasks well?
It undermines your claims if there exist multiple models, A and At, where model At costs 1/10 as much to run and performs almost as well on the STEM+ benchmark. You are essentially claiming either that humans won't prefer the sparsest model that does the job, that fairly well-optimized models will still power-seek, or... maybe that compute will be so cheap humans just don't care? Like Eliezer's short story where toasters are sentient. I think I agree with you in principle that bad outcomes could happen; the disagreement is whether economic forces, etc., will prevent them.
I am saying that the outcome "the majority of the atoms belong to power-seekers" requires either that the military stupidly gives weapons to power-seeking machines (like in T3), or that a weaker but smarter network of power-seeking machines will be able to defeat the military. For the latter claim you quickly end up in arguments over things like the feasibility of MNT anytime soon, since there has to be some way for a badly out-resourced AI to win. "I don't know how it does it, but it's smarter than us" then hits the issue of "why didn't the military see through the plan using their own AI?".