I agree this is a major risk. (Another one is that it’s just infeasible to significantly increase AI philosophical competence in the relevant time frame. Another one is that it’s much easier to make it appear like the AI is more philosophically competent, giving us false security.) So I continue to think that pausing/stopping AI should be plan A (which legibilizing the problem of AI philosophical competence can contribute to), with actually improving AI philosophical competence as (part of) plan B. Having said that, here are two reasons this risk might not bear out:
Empirically, the best capabilities people (e.g., STEM/finance workers, managers, politicians) tend to be distinct from the best philosophers. And there are whole cultures (e.g., China) getting very good at STEM but still far behind at making philosophical progress.
But the opportunity cost of learning additional skills for AIs appears much lower than for humans, so this pattern might not carry forward to future AIs.
If I’m right that “philosophical reasoning” is some kind of (currently opaque) general but slow problem-solving method, and we already have more legible, specialized, and faster methods for specific areas such as math, science, and engineering, with “philosophical problems” being the leftover problems that lack such faster methods, then making AIs better at philosophical reasoning ought to help with philosophical problems more than with other types of problems.
But philosophical reasoning can still help with “non-philosophical” problems, if those problems have parts that are “more philosophical” and can be sped up by applying good philosophical reasoning.
To conclude, I’m quite worried about the risks/downsides of trying to increase AI philosophical competence, but it seems to be a problem that has to be solved eventually. “The only way out is through,” but we can certainly choose to do it at a more opportune time, when humans are much smarter on average and have made a lot more progress in metaphilosophy (understanding the nature of philosophy and philosophical reasoning).
FYI, normally when I’m thinking about this, it’s through the lens of “how do we help the researchers working on illegible problems?”, more so than “how do we communicate illegibility?”.
This post happened to ask the question “can AI advisers help with the latter?”, so I was replying about that. But, for completeness, normally when I think about this problem I resolve it as “what narrow capabilities can we build that are helpful to the workflow of people solving illegible problems, and that aren’t particularly bad from a capabilities standpoint?”
Do you have any writings about this, e.g., examples of what this line of thought led to?
Mostly this has only been a sidequest I periodically mull over in the background. (I expect to someday focus more explicitly on it, although it might be more in the form of making sure someone else is tackling the problem intelligently.)
But I did previously pose this as a kind of open question in “What are important UI-shaped problems that Lightcone could tackle?” and in the JargonBot Beta Test (which notably didn’t really work; I have hopes of trying again with a different tack). Thane Ruthenis replied with some ideas in this space (about making it easier to move between representations of a problem).
https://www.lesswrong.com/posts/t46PYSvHHtJLxmrxn/what-are-important-ui-shaped-problems-that-lightcone-could
I think of many Wentworth posts as relevant background:
Why Not Just… Build Weak AI Tools For AI Alignment Research?
Why Not Just Outsource Alignment Research To An AI?
Interfaces as a Scarce Resource
My personal work so far has been building a mix of exobrain tools that are more for rapid prototyping of complex prompts in general. (This has mostly been a side project I’m not primarily focused on at the moment.)