If you de-slopify the models, how do you avoid people then using them to accelerate capabilities research just as much as safety research? Why wouldn’t that leave us with the same gap in progress between the two we have right now, or even a worse gap? Except that everything would be moving to the finish line even faster, so Earth would have even less time to react.
Is the idea that it wouldn’t help safety go differentially faster at all, but rather just that it may preempt people latching on to false slop-solutions for alignment as an additional source of confidence that racing ahead is fine? If that is the main payoff you envision, I don’t think it’d be worth the downside of everything happening even faster. I think time is very precious, and sources of confidence already abound for those who go looking for them.
Hmmm. I’m not exactly sure what the disconnect is, but I don’t think you’re quite understanding my model.
I think anti-slop research is very probably dual-use. I expect it to accelerate capabilities. However, I think attempting to put “capabilities” and “safety” on the same scale and maximize differential progress of safety over capabilities is an overly simplistic model that doesn’t capture some important dynamics.
There is not really a precise “finish line”. Rather, we can point to various important events. The extinction of all humans lies down a path where many mistakes (of varying sorts and magnitudes) were made earlier.
Anti-slop AI helps everybody make fewer mistakes. Sloppy AI convinces lots of people to make more mistakes.
My assumption is that frontier labs are racing ahead anyway. The idea is that we’d rather they race ahead with a less-sloppy approach.
Imagine an incautious teenager who is running around all the time and liable to run off a cliff. You expect that if they run off a cliff, they die—at this rate you expect such a thing to happen sooner or later. You can give them magic sneakers that allow them to run faster, but also improve their reaction time, their perception of obstacles, and even their wisdom. Do you give the kid the shoes?
It’s a tough call. Giving the kid the shoes might make them run off a cliff even faster than they otherwise would. It could also allow them to stop just short of the cliff when they otherwise wouldn’t.
I think if you value increased P(they survive to adulthood) over increased E(time they spend as a teenager), you give them the shoes. I.e., withholding the shoes values the short term over the long term. If you think there’s no chance of survival to adulthood either way, you don’t hand over the shoes.
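A rough way to make that trade-off explicit (a sketch of my own; the weight $w$ is an assumption, not part of the analogy):

$$\text{give the shoes} \iff w\,\big[P_{\text{shoes}}(\text{adult}) - P_{\text{no shoes}}(\text{adult})\big] + \big[E_{\text{shoes}}(\text{teen years}) - E_{\text{no shoes}}(\text{teen years})\big] > 0$$

where $w$ is how much you value survival to adulthood relative to a year of teenage time. If $w$ is large and the shoes raise $P(\text{adult})$, the first term dominates and you hand them over even though the second term may be negative; if $P(\text{adult}) \approx 0$ either way, the first term vanishes, only the lost teenage time remains, and you don’t.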