This is a useful framing. What we have is a large sociotechnical system moving towards global catastrophic risk (GCR). Some actions accelerate it or remove brakes; others steer it away from GCR. So “capabilities vs alignment” maps directly onto “accelerate vs steer”, while “legible vs illegible” is more like making people think we can steer when we can’t, which in turn makes them comfortable with acceleration, so “legible vs illegible” ends up being “accelerate vs steer” as well.
The key factor there is “people think we can steer”. When the thing we are driving is the whole of human civilization and the thing we are trying to avoid driving into is global catastrophic risk, caution is warranted… but not infinite caution. It does not override all other concerns; it merely, by my math, overrides most of them. So unfortunately, I think the most important thing is getting people to accurately (or at least less wrongly) understand the degree to which we can or cannot steer, probably erring toward making people think we can steer less well than we actually can rather than better, since overconfidence seems to be the human default.
A separate problem: as with capabilities, there is more funding for legible problems than for illegible ones. I am currently sacrificing a large amount of earning potential so I can focus on problems I believe are important. That makes it sound noble, but really, how do we know which people working on illegible problems are making worthwhile things understandable and which are just wasting time? That is exactly what makes a problem illegible: we can’t tell. It seems like a genuinely tricky problem, somewhat related to the ASI alignment problem itself: how can we know that an agent we don’t understand, working on a problem we don’t understand, is working for our benefit?
Anyway, thanks for the thoughtful post.