What do we have for hypotheses about how people acquire false beliefs at scale?
Michael Vassar drew my attention to this problem, and coming from a Christian background it fascinates me how well “religion is a conspiracy” maps onto my lived experience.
Motivated reasoning and echo chambers.
Our decision-making mechanisms are guided by predictions of reward. Being right predicts reward, but so does giving answers your peers will like (we’re wired for social reward, plus they’ll give you food and shelter if they like you). So those two signals are mixed together when we make decisions about what evidence to look at, what lines of thought to follow, and ultimately what we believe. Thus, our reasoning is biased toward answers and beliefs that are in our (perceived) best interests.
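Here’s a toy sketch of that reward mixing (every number and weight below is made up, purely to illustrate the bias): score candidate beliefs on predicted accuracy plus predicted peer approval, and once enough weight sits on the social term, the popular-but-wrong belief wins.

```python
# Toy illustration: belief selection driven by a mix of "being right"
# and "being liked". All numbers are invented for the example.

def choose_belief(candidates, w_truth, w_social):
    """Pick the belief with the highest mixed reward prediction."""
    def mixed_reward(b):
        return w_truth * b["p_correct"] + w_social * b["peer_approval"]
    return max(candidates, key=mixed_reward)

candidates = [
    {"name": "unpopular but accurate", "p_correct": 0.9, "peer_approval": 0.2},
    {"name": "popular but wrong",      "p_correct": 0.3, "peer_approval": 0.9},
]

# With mostly truth-seeking weights, the accurate belief wins...
print(choose_belief(candidates, w_truth=0.8, w_social=0.2)["name"])
# ...but shift enough weight onto social reward and the popular belief wins.
print(choose_belief(candidates, w_truth=0.3, w_social=0.7)["name"])
```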
Echo chambers are pretty obvious and existed long before social media; the evidence and arguments one tends to hear are highly correlated with one’s social environment, and we self-select our social environments. We stay around those who agree with us, and we agree with those around us.
There are lots of beliefs held at scale that we would now think of as false: e.g., that the earth is the centre of the universe. I think the reason is that we rely on justification to shore up our beliefs, and other people believing something counts as justification. https://www.lesswrong.com/posts/hcymnEAKtwvED7Y8o/what-are-we-actually-evaluating-when-we-say-a-belief-tracks
I believe that we’re going to see heavy political and social instability over the next 5 years; how do I maximize my EV in light of this? Primarily I’m thinking about financial investments.
Some things I was thinking about: gold (GDX), cybersecurity (HACK), options income (JEPI), defense/aerospace (ITA).
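Not advice, and none of these numbers are forecasts, but here’s the shape of the EV arithmetic I’m doing; the scenario probabilities and per-ticker returns below are placeholders I invented purely to illustrate:

```python
# Back-of-the-envelope EV sketch. Scenario probabilities and returns are
# invented placeholders, not forecasts; the tickers are the ones above.

scenarios = {            # probability of each world over ~5 years (made up)
    "heavy instability": 0.4,
    "mild instability":  0.4,
    "business as usual": 0.2,
}

# Hypothetical 5-year returns for each holding in each scenario (made up).
returns = {
    "GDX":  {"heavy instability": 0.60, "mild instability": 0.20, "business as usual": -0.10},
    "HACK": {"heavy instability": 0.40, "mild instability": 0.25, "business as usual":  0.15},
    "JEPI": {"heavy instability": 0.05, "mild instability": 0.10, "business as usual":  0.12},
    "ITA":  {"heavy instability": 0.50, "mild instability": 0.20, "business as usual":  0.10},
}

for ticker, r in returns.items():
    ev = sum(p * r[s] for s, p in scenarios.items())
    print(f"{ticker}: expected 5-year return = {ev:.2%}")
```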
What’s the deal with AI welfare? How are we supposed to determine whether AIs are conscious, and if they are, which stated preferences correspond to which conscious experiences?
Surely the AIs can be trained to say “I want hugs” or “I don’t want hugs,” just as easily, no?
We haven’t figured it out for humans, and only VERY recently in history has the idea become common that people not kin to you deserve empathy and care. Even so, it’s based on vibes and consensus, not metrics or proof. I expect it will take less than a few decades before we start recognizing some personhood for some AIs.
It’ll be interesting to see if the reverse occurs: the AIs that end up making decisions about humans could have some amount of empathy for us, or they may just not care.
There are a lot of good reasons to believe that stated human preferences correspond to real human preferences. There are no good reasons that I know of to believe that any stated AI preference corresponds to any real AI preference.
“Surely the AIs can be trained to say “I want hugs” or “I don’t want hugs,” just as easily, no?”
“There are a lot of good reasons to believe that stated human preferences correspond to real human preferences.”
Can you name a few? I know of one: I assume that there’s some similarity with me, because similar organic structures are doing the preferring. That IS a good reason, but it’s not universally compelling or unassailable.
Actually, can you define ‘real preferences’ in some way that could be falsifiable for humans and observable for AIs?
“Surely the AIs can be trained to say “I want hugs” or “I don’t want hugs,” just as easily, no?”
Just as easily as humans, I’m sure.
No. The baby cries, the baby gets milk, the baby does not die. This is correspondence to reality.
Babies that are not hugged as often die more often.
However, with AIs, the same process that produces the pattern “I want hugs” just as easily produces the pattern “I don’t want hugs.”
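To make “the same process produces either pattern” concrete, here’s a deliberately silly stand-in for LLM training (the “model” just memorizes whichever line dominates its corpus); the identical procedure run on opposite corpora yields opposite stated preferences:

```python
# Toy stand-in for "train an LLM on a corpus": the 'model' just memorizes
# whichever completion dominates its training data. Purely illustrative.
from collections import Counter

def train(corpus):
    """Identical 'training' procedure, whatever the corpus says."""
    most_common_line = Counter(corpus).most_common(1)[0][0]
    # The toy model ignores the prompt and always emits its learned line.
    return lambda prompt: most_common_line

corpus_a = ["I want hugs."] * 100
corpus_b = ["I don't want hugs."] * 100

model_a = train(corpus_a)   # same procedure...
model_b = train(corpus_b)   # ...different data

print(model_a("How do you feel about hugs?"))  # -> I want hugs.
print(model_b("How do you feel about hugs?"))  # -> I don't want hugs.
```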
Let’s say that I make an AI that always says it is in pain. I make it like we make any LLM, but all the data it’s trained on is about being in pain. Do you think the AI is in pain?
What do you think distinguishes pAIn from any other AI?