To be clear, I enjoyed the post and am looking forward to this sequence. A point of disagreement though:
One feasible-seeming approach is “accelerating alignment,” which involves leveraging AI as it is developed to help solve the challenging problems of alignment.
This is not a novel idea, as it’s related to previously suggested concepts such as seed AI, nanny AI, and iterated distillation and amplification (IDA).
I disagree that using AI to accelerate alignment research is particularly load-bearing for developing a practical alignment craft, or that it is really necessary.
To be clear, I think we should do it; I have used ChatGPT to aid some of my writing and plan to use it more, but only to the same extent that we use Google, Wikipedia, or word processors to do research in general. That is, I don’t expect AI assistance to be load-bearing enough for alignment in general to merit special distinction.
To the extent that one does expect AI to be particularly load-bearing for progress on developing useful alignment craft, I think one is engaging in wishful thinking and snorting too much hopium. That sounds like shying away from the hard problems of alignment, which John Wentworth has said we shouldn’t do:
Far and away the most common failure mode among self-identifying alignment researchers is to look for Clever Ways To Avoid Doing Hard Things (or Clever Reasons To Ignore The Hard Things), rather than just Directly Tackling The Hard Things.
The most common pattern along these lines is to propose outsourcing the Hard Parts to some future AI, and “just” try to align that AI without understanding the Hard Parts of alignment ourselves.
…
You can save yourself several years of time and effort by actively trying to identify the Hard Parts and focus on them, rather than avoid them. Otherwise, you’ll end up burning several years on ideas which don’t actually leave the field better off. That’s one of the big problems with trying to circumvent the Hard Parts: when the circumvention inevitably fails, we are still no closer to solving the Hard Parts. (It has been observed both that alignment researchers mostly seem to not be tackling the Hard Parts, and that alignment research mostly doesn’t seem to build on itself; I claim that the latter is a result of the former.)
Mostly, I think the hard parts are things like “understand agency in general better” and “understand what’s going on inside the magic black boxes”. If your response to such things is “sounds hard, man”, then you have successfully identified (some of) the Hard Parts.
I don’t think this point should be on the list (or at least, I don’t think I endorse the position implied by explicitly placing the point on the list).
I won’t write a detailed object-level response to this for now, since we’re probably going to publish a lot about it soon. I’ll just say that my/our experience with the usefulness of GPT has been very different from yours:
I have used ChatGPT to aid some of my writing and plan to use it more, but only to the same extent that we use Google, Wikipedia, or word processors to do research in general.
I’ve used GPT-3 extensively, and for me it has been transformative. To the extent that my work has been helpful to you, you’re indebted to GPT-3 as well, because “janus” is a cyborg whose ideas crystallized out of hundreds of hours of cybernetic scrying. But then, I used GPT in fairly unusual/custom ways (high-bandwidth human-in-the-loop workflows iterating deep simulations), and it took me months to learn to drive and build maps to the fruitful parts of latent space, so I don’t expect others to reap the same benefits out of the box, unless the “box” has been optimized to be useful in this dimension (ChatGPT is optimized in a very different dimension).
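To make “high-bandwidth human-in-the-loop workflows” a bit more concrete, here is a minimal sketch of one possible branching loop: sample several continuations, let a human pick one, and iterate. This is an illustration only, not janus’s actual tooling; the function name, parameters, and model choice are assumptions, and it assumes the legacy OpenAI completions API with an API key configured in the environment.

```python
# Minimal sketch of a human-in-the-loop branching loop (illustrative,
# NOT the actual workflow described above). Assumes the legacy OpenAI
# completions API and an OPENAI_API_KEY in the environment.
import openai

def branching_loop(prompt: str, branches: int = 4, depth: int = 10) -> str:
    text = prompt
    for _ in range(depth):
        # Sample several candidate continuations of the current text.
        response = openai.Completion.create(
            model="davinci",   # a base GPT-3 model, chosen for illustration
            prompt=text,
            max_tokens=64,
            n=branches,
            temperature=0.9,
        )
        candidates = [choice.text for choice in response.choices]
        for i, candidate in enumerate(candidates):
            print(f"[{i}] {candidate!r}")
        # The human steers: choose a branch to continue, or stop.
        raw = input("branch number (blank to stop): ").strip()
        if not raw:
            break
        text += candidates[int(raw)]
    return text
```

The point of the loop is that the human’s judgment, not the model’s, selects which branch of the simulation to deepen at every step.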
We also used GPT to summarize seminar meetings and produce posts from the summaries, such as [Simulators seminar sequence] #2 Semiotic physics, where it came up with some of the propositions and proof sketches.
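For flavor, a summarization step like the one described might look something like the sketch below; the file name, model, and prompt wording are all hypothetical, not the actual pipeline behind the seminar posts.

```python
# Hypothetical sketch of turning a seminar transcript into draft notes.
# File name, model, and prompt are illustrative; assumes the legacy
# OpenAI completions API with an OPENAI_API_KEY in the environment.
import openai

with open("seminar_transcript.txt") as f:  # hypothetical transcript file
    transcript = f.read()

response = openai.Completion.create(
    model="text-davinci-003",  # an instruction-following GPT-3 model
    prompt=(
        "Summarize the following seminar discussion as numbered "
        "propositions, each with a brief proof sketch:\n\n" + transcript
    ),
    max_tokens=512,
    temperature=0.3,  # low temperature for a more faithful summary
)
print(response.choices[0].text)
```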
I don’t expect AI assistance to be load-bearing enough for alignment in general to merit special distinction.
I do. I expect AI to be superhuman at a lot of things quite soon.
It’s like this: magic exists now. The amount of magic in the world is increasing, allowing for increasingly powerful spells and artifacts, such as CLONE MIND. This is concerning for obvious reasons. One would hope that the protagonists, whose goal it is to steer this autocatalyzing explosion of psychic energy through the eye of a needle to utopia, will become competent at magic.
I interpret the goal as being more about figuring out how to use simulators as powerful tools to assist humans in solving alignment, and not at all shying away from the hard problems of alignment. Despite our lack of understanding of simulators, people (such as yourself) have already found them to be really useful, and I don’t think it is unreasonable to expect that, as we become less confused about simulators, we will learn to use them in really powerful and game-changing ways.
You gave “Google” as an example. I feel like having access to Google (or another search engine) improves my productivity by more than 100x. This seems like evidence that game-changing tools exist.