This is great. I notice I very much want a version that is aimed at someone with essentially no technical knowledge of AI and no prior experience with LW—and this seems like it’s much better at that than par, but still not where I’d want it to be. Whether or not I manage to take a shot, I’m wondering if anyone else is willing to take a crack at that?
I am not a million miles from that person. I have admittedly been consuming your posts on the subject rather obsessively over the past month or two, and following the links a lot, but have zero technical background and can’t really follow the mathematical notation. I still found it fascinating and think I “got it.”
dm-ed
If anyone writes this up I would love to know about it—my local AI safety group is going to be doing a reading + hackathon of this in three weeks, attempting to use the ideas on language models in practice. It would be nice to have this version for a couple of people who aren’t experienced with AI who will be attending, though it’s hardly gamebreaking for the event if we don’t have this.
You can find my attempt at the Waluigi Effect mini-post at: https://thezvi.substack.com/p/ai-3#%C2%A7the-waluigi-effect.
I haven’t posted it on its own yet—everyone, please vote on whether this passes the quality threshold with agreement voting. If this is in the black I’ll make it its own post. If you think it’s not ready, I’d appreciate it if you explained why.
A shame—I see this at an agreement voting −3 a day later, which means I didn’t do a good enough job.
Thus, I kindly request some combination of (A) someone else taking a shot and/or (B) advice on what I would have to do to get it where it needs to go.
(Edit, it’s now at +3? Hmm. We’ll see if that holds.)
My guess is that the people voting “disagree” think that including the distillation in your general write-up is sufficient, and that you don’t need to make the distillation its own post.