Hey man, looking forward to reading the other posts you referenced soon! In the meantime, I want to push back on some fundamental premises you included here (as I interpret them), in case that might help you tighten your framework up:
Your point #1 reads to me as “alignment solves itself”, provided we “give a sufficiently capable intelligent system access to an extensive, comprehensive corpus of knowledge”. If that isn’t the sole condition for #1 to hold, it might be helpful to clarify that. (If the ambiguity is limited to the content of this post only, then it’s less important, I suppose.)
Thanks for giving good context on your collaborative approach to rationality!
I deliberately bolded “sufficiently capable” and “extensive corpus of knowledge” as the key general conditions. As I stated, I view this along the Scaling Hypothesis trajectory: sufficient capability is tied to compute and parameter count, and extensive knowledge is tied to data.
Getting the system to the point where it is sufficiently capable across that extensive knowledge is the part that, as I stated, requires human endeavour and ingenuity. The eight points listed at the end are the core factors of my world model that I believe need to be considered during that endeavour.
To give a concrete, exciting example: based on recent discussions I had in SF, it seems we’re close to a new approach to deterministic interpretability for common frontier model architectures. If true, this would improve bidirectional integration between humans & AI (better information exchange) and the accuracy of normative closure (stating what is being attempted versus an objective). I’ll post a review of the paper when it comes out, if I stop getting rate-limited lol.