Suppose you’re a billionaire and you want to get married. However, gold-diggers of the gender you prefer target you, and they are good enough at faking attraction that you cannot tell them apart from sincere suitors. How should you act? One idea I had was this: pick 10,000 people at random and choose from among them. At the very least, you will likely exclude most of the world-class dissemblers. Armstrong proposes a similar scheme in “Siren worlds and the perils of over-optimised search”.
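To make the "remove most of the world-class dissemblers" intuition concrete, here is a rough back-of-the-envelope sketch. The population size and the number of world-class dissemblers are made-up assumptions chosen purely for illustration; only the sample size of 10,000 comes from the idea above. The function name is hypothetical.

```python
import math

def prob_sample_contains_dissembler(population, dissemblers, sample_size):
    """Chance that a uniform random sample includes at least one dissembler.

    Treats draws as independent (sampling with replacement), which is a close
    approximation when the sample is tiny relative to the population.
    """
    per_draw = dissemblers / population
    # 1 - (1 - per_draw) ** sample_size, computed stably for tiny per_draw.
    return -math.expm1(sample_size * math.log1p(-per_draw))

# Illustrative assumptions, not figures from the post.
POPULATION = 4_000_000_000   # people of the preferred gender (assumed)
DISSEMBLERS = 1_000          # "world-class" fakers among them (assumed)
SAMPLE_SIZE = 10_000         # the random shortlist described above

expected_in_sample = SAMPLE_SIZE * DISSEMBLERS / POPULATION
p_any = prob_sample_contains_dissembler(POPULATION, DISSEMBLERS, SAMPLE_SIZE)

print(f"expected dissemblers in sample: {expected_in_sample:.4f}")
print(f"P(at least one in sample):      {p_any:.4f}")
```

Under these assumed numbers, a random shortlist would almost never contain a world-class dissembler, whereas the pool of people who actively seek you out is self-selected and could contain many. The sketch says nothing about ordinary, less skilled fakers, who may still be present in the sample.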
Tangentially relevant: Armstrong has a post on some general features that make it likely that Goodharting will be a problem. Conversely, he gives some features that make it less likely you’ll wind up Goodharting in your search.
So, as long as:
We use a Bayesian mix of reward functions rather than a maximum likelihood reward function.
An ideal reward function is present in the space of possible reward functions, and is not penalised in probability.
The different reward functions are normalised.
If our ideal reward functions have diminishing returns, this fact is explicitly included in the learning process.
Then, we shouldn’t unduly fear Goodhart effects (of course, we still need to incorporate as much as possible about our preferences into the AI’s learning process). The second problem, making sure that there are no unusual penalties for ideal reward functions, seems the hardest to ensure.
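As a toy illustration of the first and third conditions in the list above (a Bayesian mix instead of a maximum likelihood reward function, and normalised reward functions), consider the sketch below. The three candidate reward functions, their probabilities, and the actions are all hypothetical, and min-max rescaling is just one simple way to normalise, not necessarily the one Armstrong has in mind. Diminishing returns are not modelled here.

```python
def normalise(rewards):
    """Min-max rescale one reward function so its values span [0, 1]."""
    lo, hi = min(rewards.values()), max(rewards.values())
    return {action: (value - lo) / (hi - lo) for action, value in rewards.items()}

def best_action(score):
    """Return the action with the highest score."""
    return max(score, key=score.get)

# Hypothetical candidates: (posterior probability, raw reward per action).
# Note the third candidate uses a much larger raw scale than the others.
candidates = [
    (0.4, {"A": 10.0, "B": 8.0, "C": 0.0}),
    (0.3, {"A": 0.0, "B": 9.0, "C": 10.0}),
    (0.3, {"A": 0.0, "B": 1000.0, "C": 200.0}),
]
normalised = [(p, normalise(r)) for p, r in candidates]

# Maximum likelihood: optimise only the single most probable reward function.
_, most_likely = max(normalised, key=lambda pair: pair[0])
ml_choice = best_action(most_likely)

# Bayesian mix: optimise the probability-weighted average of all candidates.
actions = most_likely.keys()
mix = {a: sum(p * r[a] for p, r in normalised) for a in actions}
bayes_choice = best_action(mix)

print("maximum likelihood picks:", ml_choice)
print("Bayesian mix picks:      ", bayes_choice)
```

In this toy setup the maximum likelihood approach commits to the action favoured by the single most probable reward function, even though the other two candidates rate that action at zero. The mix instead picks the action that every candidate rates highly. Normalising first keeps the candidate with the largest raw scale from dominating the average.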
This applies to many celebrities and wealthy people worth much less than a billion. Other common strategies:
Only date independently wealthy or near-equally-famous people
Get some of the value of marriage in less-committed ways than marriage. Don’t marry, just date and have some kids.
Accept the risk of financial motivation for your partner. It’s quite possible that they love you deeply, AND want the lifestyle you can give them and your shared kids.
Have trusted friends and family who are good judges of character (note: building such a circle is itself a similar problem, but many people have one by luck). Incorporate their feedback about a potential partner into your judgement.
Really, EVERYONE faces this problem if they choose a monogamous long-term partner. What are they really after? Will they remain loving and supportive aside from the support they expect from me? The wealthy and celebrities face it more legibly and directly, but not uniquely.