Maybe I don’t know what I’m talking about and obviously we’ve tried this already.
I’ve heard Eliezer mention that the ability to understand AI risk is linked to Security Mindset. Security Mindset, as I understood it, is basically the ability to think like a hacker: imagining exploits, ways to abuse the rules, and so on, so that you can defend against hacks and exploits instead of stopping at a basic “looks safe to me!”
There are a lot of examples of this Security/Hacker Mindset in HPMOR. When Harry learns the exchange rate between magical coins and compares it to the Muggle prices he knows for gold, silver, etc., he instantly comes up with a scheme to trade between the magical and Muggle worlds to make infinite money.
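Harry's arbitrage idea can be made concrete with a toy calculation. A minimal sketch, where both exchange rates are made-up stand-ins for illustration rather than the actual figures from the book:

```python
# Hypothetical numbers, not the book's actual figures: suppose the
# wizarding exchange fixes 1 gold Galleon = 17 silver Sickles, while
# the Muggle market prices the same weight of gold at 50x silver.
GALLEON_IN_SICKLES = 17      # wizard-side rate (assumed)
MUGGLE_GOLD_PER_SILVER = 50  # Muggle-side rate (assumed)

def round_trip_multiplier(wizard_rate: float, muggle_rate: float) -> float:
    """Growth factor per arbitrage cycle.

    Start with `wizard_rate` units of silver, buy 1 unit of gold on the
    cheap (wizard) side, then sell that gold for `muggle_rate` units of
    silver on the dear (Muggle) side.
    """
    return muggle_rate / wizard_rate

m = round_trip_multiplier(GALLEON_IN_SICKLES, MUGGLE_GOLD_PER_SILVER)
print(f"Each cycle multiplies your silver by {m:.2f}x")
```

Any multiplier above 1 compounds: repeating the loop grows your money geometrically, which is the "infinite money" part of the scheme (in practice, until one market moves or someone notices).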
Eliezer also said that Security Mindset is something you either have or you don’t. I remember thinking: that can’t be true!
Are we bottlenecking AI alignment on not having enough people with Eliezer-level Security Mindset, and saying “Oh well, it can’t be taught!”?! (That’s where I’ve had the “people are dropping the ball” feeling. But maybe I just don’t know enough.)
Two things seem obvious to me:
- Couldn’t one devise a Security Mindset test, and get the high scorers to work on alignment? (So even if we can’t teach it, we get more people who have it.) (I assume a similar process was used to find Superforecasters.)
- Have we already tried really hard to teach Security Mindset, so that we’re sure it can’t be taught? Presumably Eliezer did try, and concluded it wasn’t teachable?
I won’t be the one doing this, since I’m unclear on whether I’m Security-gifted myself (I think a little, and more than I used to, but I’m too low-g to play high-level games).
Security Mindset, as I understood it, is basically the ability to think like a hacker: imagining exploits, ways to abuse the rules, and so on, so that you can defend against hacks and exploits instead of stopping at a basic “looks safe to me!”
Mm, that’s not exactly how I’d summarize it. That seems more like ordinary paranoia:
Lots of programmers can imagine adversaries trying to threaten them. They imagine how likely it is that an adversary could attack them in a particular way, and then they try to block off that avenue. Imagining attacks, including weird or clever ones, and parrying them with measures you imagine will stop them: that is ordinary paranoia.
My understanding is that Security Mindset-style thinking doesn’t actually rest on your ability to invent a workable plan of attack. Instead, it’s more like imagining that there exists a method for unstoppably breaking some (randomly-chosen) element of your security, and then figuring out how to make your system secure despite that. Or… that it’s something like the opposite of fence-post security, where you’re trying to make sure that for your system to be broken, several conditionally independent things need to go wrong or be wrong.
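The "several conditionally independent things need to go wrong" point has a simple back-of-the-envelope version. A toy sketch, with made-up numbers purely for illustration:

```python
# Toy model: an attacker must get past k defensive layers, and each
# layer fails against them with probability p. The numbers are made up.
def p_break_independent(p: float, k: int) -> float:
    """Layers fail independently: all k must fail for a break-in."""
    return p ** k

def p_break_correlated(p: float, k: int) -> float:
    """Layers share one common weakness: one exploit defeats them all."""
    return p

p, k = 0.1, 3
print(f"independent layers: {p_break_independent(p, k):.4f}")
print(f"correlated layers:  {p_break_correlated(p, k):.4f}")
```

The gap between the two cases is the whole point: stacking layers only multiplies probabilities down if the failures really are independent, which is exactly what a shared exploit (the fence-post situation) destroys.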
Ok, thanks for the correction! My definition was wrong, but the argument still stands: it should be teachable, or at least testable.