In one specific respect I’d like to challenge your point: I think fine-tuning currently aligns models ‘well enough’ to any target point of view, and that the ethics shown by current LLMs are there because researchers actively put them there. I’ve been doing red-teaming exercises on LLMs for over a year now, and I find it quite easy to fine-tune them to be evil and murderous. Human texts help them understand morality, but don’t make them care enough about it for that to be sticky in the face of fine-tuning.
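For concreteness, here is a minimal sketch of the generic supervised fine-tuning loop this kind of exercise relies on, assuming the Hugging Face `transformers`/`datasets` stack; the `viewpoint_pairs.jsonl` file of prompt/response pairs written in the target voice, and `gpt2` as the base model, are placeholder assumptions rather than anything specific to the experiments described above:

```python
# Minimal sketch: supervised fine-tuning of a small causal LM toward a target
# "point of view". The dataset file and base model below are hypothetical
# stand-ins; any prompt/response pairs written in the target voice play the
# same role.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # stand-in for whatever base model is being fine-tuned
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical JSONL file of {"prompt": ..., "response": ...} pairs that
# express the target viewpoint.
dataset = load_dataset("json", data_files="viewpoint_pairs.jsonl")["train"]

def tokenize(example):
    # Concatenate prompt and response into a single training sequence.
    text = example["prompt"] + "\n" + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="viewpoint-finetune",
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=5e-5,
    ),
    train_dataset=tokenized,
    # Causal LM collator: pads batches and copies input_ids into labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```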
Yeah, on further thought I think you’re right. This is pretty pessimistic, then: AI companies will find it easy to align AIs to moneyed interests, and the rest of us will be in a “natives vs the East India Company” situation. More time to spend on alignment then matters only if some companies actually try to align AIs to something good instead, and I’m not sure any companies will do that.
This is my view of the situation as well, and it’s a big part of the reason why I think solving AI alignment, which reduces existential risk a lot, is non-trivially likely to lead to dystopian worlds (from my values) without further political reforms, which I don’t expect.
Yeah, any small group of humans seizing unprecedented control over the entire world seems like a bad gamble to take, even if they start off seeming like decent people.
I’m currently hoping we can figure out some kind of new governance solution for managing decentralized power while achieving adequate safety inspections.
https://www.lesswrong.com/posts/FEcw6JQ8surwxvRfr/human-takeover-might-be-worse-than-ai-takeover?commentId=uSPR9svtuBaSCoJ5P
This is consistent with a model where AI alignment is heavily dependent on the data and way less dependent on inductive biases/priors, which is good news for alignment.