plex comments on Daniel Kokotajlo’s Shortform

plex 29 Sep 2025 14:51 UTC
1 point
Patterns implicit in the text from this kind of experiment will leak, at least indirectly, into future models^[1], making it more likely that future models will display this behaviour earlier. It might be worth it to try to wake the world up, but there are real costs to having these patterns reflected in many brains and implicit in the text they go on to generate.
1. ^ Like Subliminal Learning, but with more detail extractable by a stronger training process.