I definitely like the directions you are exploring, and I agree they are improvements over the implicit AGI-lab-directed concept. That’s a useful thing to keep in mind, but so is what keeps them from being final ideas.
I’m not convinced they are the same problem
When viewed as OISs from a high level, they are the same problem: misaligned OIS versus misaligned OIS. But you are correct that many of the details change. The properties of one OIS are quite different from the properties of the other, and that matters for analyzing and aligning them. I think a model that applies to both and makes the similarities and differences explicit would be useful (my suggestion is my OIS model, but it’s entirely possible there are better ones).
It seems like the considerations for how to “keep philosophers honest” are implicitly about how to ensure alignment of a hypothetical socio-technical OIS. What do you think? Does that make sense at all, or does it seem more like a time-wasting distraction? I have to admit I’m uncomfortable with how stuck I have gotten on the idea that championing this concept is a useful thing for me to be doing.
I do think the alignment problem and the “morality is scary” problem have a lot in common. In my thinking about the alignment problem and the way it leaks into other problems, the model that emerged for me was that of OIS, which seems to generalize the part of the alignment problem I am interested in to social institutions whose goals are moral in nature, and to how those institutions relate to the values of individual people.
I definitely like the directions you are exploring, and I agree they are improvements over the implicit AGI-lab-directed concept. That’s a useful thing to keep in mind, but so is what keeps them from being final ideas.
+1
What do you think? Does that make sense at all, or does it seem more like a time-wasting distraction? I have to admit I’m uncomfortable with how stuck I have gotten on the idea that championing this concept is a useful thing for me to be doing.
Glad you’re self-aware about this. I would focus less on championing the concept, and more on treating it as a hypothesis about a research approach which may or may not deliver benefits. I wouldn’t evangelize until you’ve got serious benefits to show, and show those benefits first (with the concept that delivered those benefits as more of a footnote).
I think the focus on “delivering benefits” is a good perspective. It feels complicated by my sense that a lot of the benefit of OIS is as an explanatory lens: when I want to discuss the things I’m focused on, I want to discuss them in terms of OIS, and not using OIS terminology makes the explanations more complicated. So in that regard I guess I need to clearly define and demonstrate the explanatory benefit. But the “research approach” focus also seems like a good thing to keep in mind.
Thanks for your perspective 🙏