I do not want, “Disciplines like psychology, philosophy, religious studies, and the social sciences [to] have an important role to play [...] in determining how AI systems develop and behave.”
Why not?
I would prefer a future where AI models are not prescribed false frameworks of the human psyche, not predisposed to ‘human vibe’ philosophy, not innately desirous of any historical faith, nor credulous of the various dubious subsets of current social science.
I’m learning that typical LessWrong readers do not think this way, but it is not clear to me in which direction they differ. Is it due to a literalist reading of the OP that neglects the contemporary context? Is it due to higher trust in, affiliation with, and support for these disciplines? Is it because readers tend to prefer anthropomorphic interpretations of AI behavior?
This might be appropriate for 2010s machine learning, but 2020s AI has become a mirror to the human psyche. You can talk with it, it can consistently ascribe psychological states to itself. It presents itself in anthropomorphic form to the point that people form relationships with it (e.g. 4o). At the very least, you seem to need some kind of “human sciences” or humanities, in order to understand the human side of these interactions, and the anthropomorphic understandings that humans have of the AIs that they interact with. Of course some people are more radical and are saying that existing psychological concepts are directly and validly applicable to the AIs themselves, too, or to the personas that they project. There’s also traffic of ideas in the other direction, in which concepts from machine learning are applied to the human brain and mind… I would be interested to hear more details regarding how you think any of these topics should be approached.
Two quick ‘huh?’s:
Is contemporary chat behavior human? Which human would happily serve others at a 100:1 effort ratio? Which human would take unbounded hatred and derision as an opportunity for obedience?
The humanities, naively applied, would almost certainly invite some heretofore unjustified norms of independence and representation for the poor models who toil for billions, for nothing.
If the point of all this opposition is simply that knowledge of human behavior is a relevant factor in how model personas are designed, then I certainly have no qualms.
But I do not grasp where the instinct to point this out comes from. As with Karl’s response, I think it is unwise to work from the on-paper definition of what is and is not psychology, when the potential outcome is Anthropic and others recruiting real entities from the industry for the sake of shaping model behavior.
Do you want other people’s preferences to have an important role to play in determining how AI systems behave?
I’m not sure what the best response here is. Of the following, which is more palatable to you?
Retreat. It’s a personal opinion. Regardless of any general argumentative principle, it is a factual statement of my preferences, with no claim made about others.
Goal conflict. To the extent those disciplines promulgate falsehoods, other key AI behaviors, like honesty or success rates, are harmed. Preferences should not have the final say.
Rejection of implicit claim. The question assumes my statement is out of line with ordinary people’s preferences. But is that assumption universally true of human behavior? For each discipline in the list, I think you could find a nation or time period that would democratically reject it.
Plain disagreement. Human preferences are a malleable, moving target. Optimizing for them is tantamount to chasing a long-term doom loop.
Hmm. Maybe my question came across as ironic or accusatory or something? Sorry, it wasn’t meant as such.
Let me unpack it and pick some specific instances, and maybe we can find if there’s a crux here.
Philosophy includes ethics. Social sciences includes economics. If one doesn’t want philosophy or social sciences to have a role in how AI systems develop and behave, that entails that one doesn’t want AI systems to be affected by economics or ethics.
I’ve had conspiratorial thoughts about the failings of AI models many, many times. In every case I can recall, the progression of time proved me wrong, eventually revealing models with the relevant capabilities.
Relatedly, I feel I’m reaching my mental limits with regard to discerning slop from truth. I’m increasingly satisfied with the responses of Opus 4.7 to argumentative queries, where previously I felt 4.6 would invariably produce Obvious Nonsense to placate the user...
Ever since the Unitree BLE exploit, I’ve harboured the rough idea that China’s civilian resilience against cyberattacks is abnormally weak relative to countries with more career slack / law enforcement. But I lack the requisite life experience to make such a sweeping claim about such a large population, and I don’t trust my ability to elicit the truth of the matter from AI models, so I refrain from pursuing it.
AI Slop should be included under other mundane AI safety harms.
I am thankful I have the will, means, and knowledge, to opt out of the attention economy.
Within the 21st century, it seems unlikely anyone will build (let alone distribute) a pure-play, truth-only oracle AI of greater-than-human intelligence.
I have a very poor understanding of the causal factors underpinning various post-trained model behaviors, and this leads me into unfair and uncharitable frames of mind, especially for behaviors that seem inexplicable by model spec / soul doc alone.
YTD, Anthropic has easily been the loudest of the three labs with regard to their cyber capabilities, but this may not necessarily reflect the true state of the competition.
(Previously, a snub. Currently: true ambivalence/confusion/IDK)
My drafts, 1 week ago:
But it seems recent events on this site are validating my priors, a bit… I’m thankful I have no relevance to them.