Not sure. Let me think about it step by step.
It seems like the claims here are:
Illegible and Legible problems both exist in AI safety research
Decisionmakers are less likely to understand illegible problems
Illegible problems are less likely to cause decisionmakers to slow/stop where appropriate
Legible problems are not the bottleneck (because they’re more likely to get solved by default by the time we reach danger zones)
Working on legible problems shortens timelines without much gain
[From JohnW if you wanna incorporate] If you work on legible problems by making illegible problems worse, you aren’t helping.
I guess you do have a lot of stuff you wanna say, so it’s not like the post naturally has a short handle.
“Working on legible problems shortens timelines without much gain” is IMO the most provocative handle, but, might not be worth it if you think of the other points as comparably important.
“Legible AI problems are not the bottleneck” is slightly more overall-encompassing
“I hope Joe Carlsmith works on illegible problems” is, uh, a very fun title but probably bad. :P
Yeah it’s hard to think of a clear improvement to the title. I think I’m mostly trying to point out that thinking about legible vs illegible safety problems leads to a number of interesting implications that people may not have realized. At this point the karma is probably high enough to help attract readers despite the boring title, so I’ll probably just leave it as is.
Makes sense, although I want to flag one more argument: the takeaways people tend to remember from posts are the ones encapsulated in their titles. “Musings on X” style posts tend not to be remembered as much, and I think this is a fairly important post for people to remember.
I guess I’m pretty guilty of this, as I tend to write “here’s a new concept or line of thought, and its various implications” style posts, and sometimes I just don’t want to spoil the ending/conclusion, like maybe I’m afraid people won’t read the post if they can just glance at the title and decide whether they already agree or disagree with it, or think they know what I’m going to say? The Nature of Offense is a good example of the latter, where I could have easily titled it “Offense is about Status”.
Not sure if I want to change my habit yet. Any further thoughts on this, or references about this effect, how strong it is, etc.?
Scott strongly encourages using well-crafted concept handles for reasons very similar to what Raemon describes, and thinks Eliezer’s writing is really impactful partly because he’s good at creating them. And “Offense is about status” doesn’t seem to me like it would create the reactions you predicted if people see that you in particular are the author (because of your track record of contributions); I expect the people who would round it off to strawman versions would do so with your boring title anyway, so on the margin it seems like a non-issue.
I’m mostly going off intuitions. One bit of data you might look over is the titles of the Best of LessWrong section, which is what people turned out to remember and find important.
I think there is something virtuous about the sort of title you make, but, also a different kind of virtue in writing to argue for specific points or concepts you want in people’s heads. (In this case, the post does get “Illegible problems” into people’s heads, it’s just that I think people mostly already have heard of those, or think they have)
(I think an important TODO is for someone to find a compelling argument that people who are skeptical about “work on illegible stuff” would find persuasive)
Making illegible alignment problems legible to decision-makers efficiently reduces risky deployments
Make alignment problems legible to decision-makers
Explaining problems to decision-makers is often more efficient than trying to solve them yourself.
Explain problems, don’t solve them (the reductio)
Explain problems
Explaining problems clearly helps you solve them and gets others to help.
I favor the 2nd for alignment and the last as a general principle.