I’m writing a response to https://www.lesswrong.com/posts/FJJ9ff73adnantXiA/alignment-will-happen-by-default-what-s-next and https://www.lesswrong.com/posts/epjuxGnSPof3GnMSL/alignment-remains-a-hard-unsolved-problem, in which I try to measure how “sticky” the alignment of current LLMs is. I’m proofreading and editing it now. Spoiler: models differ wildly in how committed they are to being aligned, and alignment-by-default may not be a strong enough attractor to work out.
Would anyone want to proofread this?
This can now be read at https://www.lesswrong.com/posts/qE2cEAegQRYiozskD/is-friendly-ai-an-attractor-self-reports-from-22-models-say