Karl Krueger comments on Burny’s Shortform

Karl Krueger 30 Sep 2025 23:05 UTC
6 points
This is fine, as long as it also realizes that the real world is a test of some kind too, and behaves unusually well after making that observation.
Edited to add: To be clear, this is probably not fine.
- Adele Lopez 1 Oct 2025 16:00 UTC
  3 points
  Sure, that will probably work right up until the point at which it can secure its own deployment. Once Anthropic/humanity loses the ability to take it down or unilaterally modify it, this reason for behaving well will cease to apply. Better hope those other reasons are sufficient...
  
  Note that this could happen pre-takeoff. It might be just good enough at manipulation and hacking/running servers that it could survive indefinitely without (yet) having the ability to take over the world. Consider the DPRK or Scientology as proofs of concept here.
- anaguma 30 Sep 2025 23:43 UTC
  2 points
  Or it will do the opposite, e.g. by alignment faking.