Wei Dai comments on Legible vs. Illegible AI Safety Problems

Wei Dai 4 Nov 2025 22:06 UTC
LW: 14 AF: 7
6
AF

I don’t actually worry that much about progress on legible problems giving people unfounded confidence, and thereby burning timeline.

Interesting… why not? It seems perfectly reasonable to worry about both?
- johnswentworth 4 Nov 2025 22:30 UTC
  LW: 10 AF: 4
  6
  AF Parent
  It’s one of those arguments which sets off alarm bells and red flags in my head. Which doesn’t necessarily mean that it’s wrong, but I sure am suspicious of it. Specifically, it fits the pattern of roughly “If we make straightforwardly object-level-good changes to X, then people will respond with bad thing Y, so we shouldn’t make straightforwardly object-level-good changes to X”.
  It’s the sort of thing to which the standard reply is “good things are good”. A more sophisticated response might be something like “let’s go solve the actual problem part, rather than trying to have less good stuff”. (To be clear, I don’t necessarily endorse those replies, but that’s what the argument pattern-matches to in my head.)
  - Wei Dai 4 Nov 2025 22:57 UTC
    LW: 8 AF: 5
    1
    AF Parent
    But it seems very analogous to the argument that working on AI capabilities has negative EV. Do you see some important disanalogies between the two, or are you suspicious of that argument too?
    - johnswentworth 4 Nov 2025 23:14 UTC
      LW: 5 AF: 3
      1
      AF Parent
      That one doesn’t route through ”… then people respond with bad thing Y” quite so heavily. Capabilities research just directly involves building a dangerous thing, independent of whether other people make bad decisions in response.
      - Wei Dai 5 Nov 2025 4:46 UTC
        LW: 5 AF: 5
        0
        AF Parent
        What about more indirect or abstract capabilities work, like coming up with some theoretical advance that would be very useful for capabilities work, but not directly building a more capable AI (thus not “directly involves building a dangerous thing”)?
        
        And even directly building a more capable AI still requires other people to respond with bad thing Y = “deploy it before safety problems are sufficiently solved” or “fail to secure it properly”, doesn’t it? It seems like “good things are good” is exactly the kind of argument that capabilities researchers/proponents give, i.e., that we all (eventually) want a safe and highly capable AGI/ASI, so the “good things are good” heuristic says we should work on capabilities as part of achieving that, without worrying about secondary or strategic considerations, or just trusting everyone else to do their part like ensuring safety.