Giving unsolicited advice and criticism is a very good credible signal of respect
I have often heard it claimed that giving advice is a bad idea because most people don’t take it well and won’t actually learn from it.
Giving unsolicited advice/criticism risks:
The recipient liking you less
The recipient thinking you are stupid because “obviously they have heard this advice before”
The recipient thinking you are stupid because they disagree with the advice
The recipient being needlessly offended without any benefit
People benefit from others liking them and not thinking they are stupid, so these are real costs. Some people also don’t like offending others.
So clearly it’s only worth giving someone advice or criticism if you think at least some of the following are true:
Their wellbeing/impact/improvement is important enough that the small chance your advice has a positive impact is worth the cost
They are rational enough to not take offense in a way that would damage your relationship
They are particularly good at using advice/criticism, i.e. they are more likely to update than the average person
They value honest opinions and feedback even when they disagree, i.e. they prefer to know what others think about them because it’s interesting and potentially useful information even if not immediately actionable
The above points all reflect a superior attitude compared to the average person. And so, if you choose to give someone advice or criticism despite all the associated risks, you are credibly signaling that you think they have these positive traits.
Not giving unsolicited advice and criticism is selfish
The “giving advice is bad” meme is just a version of “being sycophantic is good”—you personally benefit when others like you and so often it’s useful to suck up to people.
Even the risk that your interlocutor is offended is not a real risk to their wellbeing—people dislike offending others because it feels uncomfortable to them. Being offended is not actually meaningfully harmful to the offended party.
No doubt that sycophancy and the fear of expressing potentially friendship damaging truths allows negative patterns of behavior to continue unimpeded but I think you’ve missed the two most necessary factors in determining if advice—solicited or unsolicited—is a net benefit to the recipient:
1. You sufficiently understand and have the expertise to comment on their situation, and
2. You can offer new understanding they aren't already privy to.
Perhaps the situations where I envision advice being given are different from yours?
The problem I notice with most unsolicited advice is that it's something the recipient is already aware of (the classic sitcom example is someone touching a hot dish and, after the fact, being told "careful, that pan is hot"—is it good advice? Well, in the sense that it is truthful, maybe. But the burn having already happened, it is no longer useful). This is why it annoys people; this is why it is taken as an insult to their intelligence.
A lot of people have already heard the generic or obvious advice and there may be many reasons why they aren't following it,[1] and most of the time hearing this generic advice repeated will not be of benefit even if they have all the qualities you enumerate: that you're willing to accept the cost of giving advice, that they are rational enough to not take offense, that they are good at taking advice and criticism, and that they value honest feedback even when they disagree.
Take this example exchange:
A: “Why are you using the grill to make melted cheese, we have a toaster oven.”
B: “the toaster is shorted out, it’s broken”
You must sufficiently understand the recipient’s situation if you are to have any hope of improving it. If you don’t know what they know about the toaster oven, then unsolicited advice can’t help.
Another major problem I've found with unsolicited advice is that it lacks fine-grained execution detail. My least favourite advice as a freelancing creative is "you need to get your name out there"—where is there? On that big nebulous internet? How does that help me exactly? Unless I needed further reinforcement of the fact that what material I am putting online isn't reaching my current interlocutor—but it doesn't give me any clues about how to go about remedying that.
Advice, for it to be useful, needs more than just sympathy and care for the person's well-being—it needs an understanding of the situation which is the cause of their behavior.
My personal metric for the "quality" of advice is how actionable it is. This means that it can't be post-facto (like the sitcom hot pan), it needs to understand causes and context—such as why they aren't using the toaster oven—and most importantly it needs to suggest explicit actions that can be taken in order to change the situation (and which actions the recipient can or can't take can only be determined by properly understanding their situation and the causes of their behavior).
Caveat: I’m sure there’s a genre of fringe cases where repetition becomes “the medium is the message”—that is they do need to hear it again. But there’s a certain point where doing the same thing again and again and expecting a different result is quite stark raving mad.
Perhaps a situation to avoid giving advice in is if you think your advice is likely to be genuinely worthless because you have no expertise, knowledge, or intelligence that is relevant to the matter and you don’t trust your own judgment at all. Otherwise if you respect the other person, you’d consider them able to judge the usefulness of your advice for themselves.
You can’t know for sure that they’ve heard some advice before. Also you are providing the information that the piece of advice occurred to you, which in and of itself is often interesting/useful. So if you’re giving someone advice they are likely to have heard before this means there is a small chance that’s wrong and it’s still useful, and a larger chance that it has value zero. So in expectation the value is still positive. If you don’t give the advice, you are prioritizing not looking stupid or not offending them, which are both selfish motives.
Related to (2) is that telling someone you disapprove or think less of them for something, i.e. criticizing without providing any advice at all, is also a good signal of respect, because you are providing them with possibly useful information at the risk of them liking you less or making you feel uncomfortable.
In my opinion, this misses the crucial dynamic that the costs of giving advice go up significantly if you care about what the other person thinks of you, which is correlated with respect, status and power. I personally think that giving advice is good, that if given tactfully many people take it well, and I also often enjoy giving it, so I will generally try to do this wherever possible unless there's a clear reason not to, especially in the context of, e.g., interpretability research. But I'm much more cautious if I'm talking to someone who seems important, considers themselves high status, has power over me, etc. I think this is a large part of why people can feel offended by receiving advice. There can be some implicit sense of "you are too stupid to have thought of this", especially if the advice is bad or obvious.
Another important facet is that most people are not (competent) utilitarians about social interactions, so you cannot accurately infer their beliefs with reasoning like this.
Fair, there’s a real tension between signaling that you think someone has a good mindset (a form of intellectual respect) and signaling that you are scared of someone’s power over you or that you care a lot about their opinion of you.
I noticed feeling a little unsatisfied and worried about this advice. I think it pattern matches with people who are savvy with status games or subtle bullying that allows for plausible deniability (“I’m just trying to help! You’re being too sensitive.”). I think people’s heuristic of perceiving criticisms as threatening seems somewhat justified most of the time.
To be clear, I tentatively define respect as the act of (a) evaluating a person as having an amount of value and welfare that is just as important as yours, (b) believing that this person's value and welfare is worth caring about, and (c) treating them as such. You don't have to admire or like a person to respect them. Here are some actions that connote disrespect (or indignity): torture, murder, confinement, physical abuse, verbal abuse, causing a person's social standing to drop unnecessarily, etc. Having said that, I'm still not satisfied with this definition, but it's the best I can come up with so far.
Maybe you’ve thought about this already or I’ve missed some implicit assumptions, but let me try to explain by first using Buck’s experience as an example:
A lot of the helpful criticism I’ve gotten over the last few years was from people who were being kind of unreasonable and unfair.
One simple example of this is that one time someone (who I’d engaged with for many hours) told me he didn’t take my ideas seriously because I had blue hair. On the one hand, fuck that guy; on the other hand, it’s pretty helpful that he told me that, and I’m grateful to him for telling me.
I interpret this as Buck (a) being appreciative of a criticism that seems unreasonable and unfair, yet (b) his need for respect wasn’t fulfilled—I would probably say “fuck that guy” too if they thought my opinions don’t matter in any situation due to the color of my hair.
I could imagine Buck’s interlocutor passing your above conditions:
They might believe that Buck can be more impactful when other people see him with a normal-looking hair colour and take him more seriously.
They might believe Buck is rational enough (but it turns out Buck was offended anyway).
They might believe Buck is good at using advice/criticism.
They might believe Buck values opinions and feedback even when they disagree (this is true).
I could also imagine Buck’s interlocutor doing a cost-benefit analysis and believing the associated costs you mentioned above are worth it. And yet, Buck was still at least a bit offended, and I think it would be reasonable to believe that this person’s criticism was actually not a good credible signal of respect.
One may argue that Buck isn't being rational; if he were, he wouldn't be offended: "Huh, this guy believed that the benefits of giving that criticism outweigh the costs of me liking them less, thinking that they are stupid, and being offended. Seems like a credible signal of respect."
I mean, Buck was appreciative of that advice, but advice being valuable is not necessarily a credible signal of respect. I could imagine a boss giving valuable advice that still checks all your conditions, but doing it in an abusive manner.
My tentative version of unsolicited advice that's also a good credible signal of respect would have more of the following conditions met:
The interlocutor actually communicating their criticism in a respectful way (as I’ve defined above). This seems like a necessary condition to pass.
The interlocutor made at least some effort to craft a good criticism/advice. One way this could work is for the interlocutor to ask questions and learn more about their advisee, which is probably standard practice in many problem-solving frameworks used by management consultants. But a mistake can sometimes be so straightforwardly obvious that a low-effort criticism works, so this condition is not sufficient on its own.
The interlocutor noticing that their advice could be wrong and very costly to heed. Again, not sufficient on its own.
The interlocutor showing care and authenticity, and showing that their advice isn’t some status-seeking one-upmanship, or a way to “create common knowledge of their status difference” (as a friend pointed out to me as another possibility).
I might be misunderstanding you though, so happy to update!
And thanks for writing this! I do think you are on to something—I do want to get better at feedback giving and receiving, and if done well and at a higher frequency (this might be what you’re pointing to), could make me more impactful.
How people respond tells you something about them, so you don’t necessarily need to start with a clear picture of how they might respond.
Also, I think advice is the wrong framing for things that are useful to give, it’s better to make sure people have the knowledge and skills to figure out the things they seem to need to figure out. Similarly to the “show, don’t tell” of educational discussions, you want to present the arguments and not the conclusions, let alone explicitly insist that the other person is wrong about the conclusions. Or better yet, promote the skills that let them assemble the arguments on their own, without needing to concretely present the arguments.
It might help to give the arguments and even conclusions or advice eventually, after everything else is done, but it’s not the essential part and might be pointless or needlessly confrontational if the conclusions they arrive at happen to differ.
Any rule about when to give advice has to be robust to people going on and on to lecture you about Jesus because they truly and sincerely want to keep you out of Hell. (Or lecture about veganism, or EA, or politics.)
More generally, social rules about good manners have to apply to everyone—both to people with correct beliefs and to people with incorrect ones. Just like not letting the police break into everyone's houses catches fewer criminals (when the police are right), but protects innocent people (when the police are wrong), not giving advice helps fewer people (when the advice giver is right), but saves people from arrogant know-it-alls and meme plagues (when the advice giver is wrong).
I think this discussion about advice is very fruitful. I think the existing comments do a great job of characterizing why someone might reasonably be offended. So if we take that as the given situation: you want to help people, project respect, but don’t want it to come off the wrong way, what could you do?
My partial answer to this, is merely sharing your own authentic experience of why you are personally persuaded by the content of the advice, and allowing them to internalize that evidence and derive inferences for themselves. At social gatherings, the people in my life do this- just sharing stories, sometimes horror stories where the point is so obvious that it doesn’t need explicit statement. And it feels like a genuine form of social currency to faithfully report on your experiences. This reminds me of “Replace the Symbol with the Substance” [1] where the advice is the symbol and the experience is the substance.
So I wonder if that's part of it—creating the same change in the person anyway, all the while mitigating the risk of condescension. The dynamics of the relationship also complicate analyzing the situation. And in what type of social setting the advice is delivered. And probably a bunch more factors I haven't thought of yet.
I enjoyed the combination of “these are real costs” and “positive impact is worth the cost.”
I found this insightful, ”...reflect a superior attitude...give...advice or criticism...signaling...they have...these positive traits”
I think the challenge lies in categorizing people as “superior” and “average”. I like the use of labels since it helps the conversation, but I wonder if it is too limiting. Perhaps, context and topic are important dimensions worthy of consideration as well. I can imagine real people responding differently given more variables, such as context and topic.
A recent NYT article about Orchid's embryo selection program triggered a (to me) surprising backlash on X, where people expressed disgust and moral disapproval at the idea of embryo selection. The arguments generally fell into two categories:
(1) "The murder argument": Embryo selection is bad because it involves creating and then discarding embryos, which is like murdering whole humans. This argument implies that regular IVF, without selection, is also bad. Most proponents of this argument believe that the point of fertilization marks a key point when the entity starts to have moral value, i.e. they don't ascribe the same value to sperm and eggs.
(2) "The egalitarian argument": Embryo selection is bad because the embryos are not granted the equal chance of being born they deserve. "Equal chance" here is probably not quite the correct phrase/is a bit of a strawman (because of course fitter embryos have a naturally higher chance of being born). Proponents of this argument believe that intervening on the natural probability of any particular embryo being born is anti-egalitarian and this is bad. By selecting for certain traits we are saying people with those traits are more deserving of life, and this is unethical/wrong.
At face value, both of these arguments are valid. If you buy the premises (“embryos have the moral value of whole humans”, “egalitarianism is good”) then the arguments make sense. However, I think it’s hard to justify moral value beginning at the point of fertilization.
On argument (1):
If we define murder as "killing live things" and decide that murder is bad (an intuitive decision), then "the murder argument" holds up. However, I don't think we actually think of murder as "killing live things" in real life. We don't condemn killing bacteria as murder. The anti-IVF people don't condemn killing sperm or egg cells as murder. So the crux here is not whether the embryo is alive, but rather whether it is of moral value. Proponents of this argument claim that the embryo is basically equivalent to a full human life. But to make this claim, you must appeal to its potential. It's clear that in its current state, an embryo is not a full human. The bundle of cells has no ability to function as a human, no sensations, no thoughts, no pain, no happiness, no ability to survive or grow on its own. We just know that, given the right conditions, the potential for a human life exists. But as soon as we start arguing about how the potential of something grants it moral value, it becomes difficult to draw the line arbitrarily at fertilization. From the point of view of potential humans, you can't deny sperm and eggs moral value. In fact, every moment a woman spends not pregnant is a moment she is ridding the world of potential humans.
On argument (2):
If you grant the premise that any purposeful intervention on the probabilities of embryos being born is unethical because it violates some sacred egalitarian principle, then it's hard to refute argument (2). Scott Alexander has argued that encouraging a woman to rehabilitate from alcoholism before getting pregnant is equivalent to preferring the healthy baby over the baby with fetal alcohol syndrome, something argument (2) proponents oppose. However, I think this is a strawman. The egalitarians think every already-produced embryo should be given as equal a chance as possible. They are not discussing identity changes of potential embryos. However, again we run into the "moral value from potential" problem. Sure, you can claim that embryos have moral value for some magical God-given reason. But my intuition is that in their hearts, the embryo-valuers are using some notion of potential full human life to ground their assessment. In which case again we run into the arbitrariness of the fertilization cutoff point.
So in summary, I think it's difficult to justify valuing embryos without appealing to their potential, which leads us to value earlier stages of potential humans. Under this view, it's a moral imperative to not prevent the existence of any potential humans, which looks like maximizing the number of offspring you have. Or as stated in this xeet:
every combo of sperm + egg that can exist should exist. we must get to the singularity so that we can print out all possible humans and live on an incredibly alive 200 story high coast to coast techno favela
People like to have clear-cut moral heuristics like “killing is bad.” This gives them an easy guide to making a morally correct decision and an easy guide to judging other’s actions as moral or immoral. This requires simplifying multidimensional situations into easily legible scenarios where a binary decision can be made. Thus you see people equating embryo disposal to first-degree murder, and others advocating for third-trimester abortion rights.
Regarding egalitarian-like arguments, I suspect many express opposition to embryo selection not because it’s a consequence of a positive philosophy that they state and believe and defend, but because they have a negative philosophy that tells them what positions are to be attacked.
I suspect that if you put together the whole list of what they attack, there would be no coherent philosophy that justifies it (or perhaps there would be one, but they would not endorse it).
There is more than zero logic to what is to be attacked and what isn’t, but it has more to do with “Can you successfully smear your opponent as an oppressor, or as one who supports doctrines that enable oppression; and therefore evil or, at best, ignorant if they immediately admit fault and repent; in other words, can you win this rhetorical fight?” than with “Does this argument, or its opposite, follow from common moral premises, data, and logical steps?”.
In this case, it’s like, if you state that humans with blindness or whatever have less moral worth than fully healthy humans, then you are to be attacked; and at least in the minds of these people, selecting embryos of the one kind over the other is close enough that you are also to be attacked.
Sure, you can claim that embryos have moral value for some magical God-given reason. But my intuition is that in their hearts, the embryo-valuers are using some notion of potential full human life to ground their assessment. In which case again we run into the arbitrariness of the fertilization cutoff point.
Some people believe embryos have souls, which may impact their moral judgement. A soul can be considered "full human life" in moral terms. I think attributing this purely to potential human life may not be accurate, since intuitions for essentialist notions of continuity of selfhood can often be fairly strong among certain people.
I appreciate the pursuit of non-strawman understandings of misgivings around reprogenetics, and the pursuit of addressing them.
I don’t feel I understand the people who talk about embryo selection as “killing embryos” or “choosing who lives and dies”, but I want to and have tried, so I’ll throw some thoughts into the mix.
Hart, IIUC, argues that wanting to choose who will live and who won’t means you’re evil and therefore shouldn’t be making such choices. I think his argument is ultimately stupid, so maybe I still don’t get it. But anyway, I think it’s an importantly different sort of argument than the two you present. It’s an indictment of the character of the choosers.
Second: When I tried to empathize with “life/soul starts at conception”, what I got was:
We want a simple boundary…
  … for political purposes, to prevent…
    child sacrifice (which could make sense given the cults around the time of the birth of Christianity?).
    killing mid-term fetuses, which might actually for real start to have souls.
  … for social purposes, because it causes damage to…
    the would-be parents' souls to abort the thing which they do, or should, think of as having a soul.
    the social norm / consensus / coordination around not killing things that people do or should orient towards as though they have souls.
The pope said so. (...But then I’d like to understand why the pope said so, which would take more research.) (Something I said to a twitter-famous Catholic somehow caused him to seriously consider that, since Yermiahu says that god says “Before I formed you in the womb I knew you...”, maybe it’s ok to discard embryos before implantation...)
(My invented explanation:) Souls are transpersonal. They are a distributed computation between the child, the parents, the village, society at large, and humanity throughout all time (god). As an embryo grows, the computation is, gradually, “handed off to / centralized in” the physical locus of the child. But already upon conception, the parents are oriented towards the future existence of the child, and are computing their part of the child’s soul—which is most of what has currently manifested of the child’s soul. In this way, we get:
From a certain perspective:
It reflects poorly on would-be parents who decide to abort.
It makes sense for the state to get involved to prevent abortion. (I don’t agree with this, but hear me out:)
The perspective is one which does not acknowledge the possibility of would-be parents not mentally and socially orienting to a pregnancy in the same way that parents orient when they are intending to have children, or at least open to it and ready to get ready for it.
...Which is ultimately stupid of course, because that is a possibility. So maybe this is still a strawman.
Well, maybe the perspective is that it’s possible but bad, which is at least usefully a different claim.
Within my invented explanation, the “continuous distributed metaphysics of the origins of souls”, it is indeed the case that the soul starts at conception—BUT in fact it’s fine to swap embryos! It’s actually a strange biodeterminism to say that this clump of cells or that, or this genome or that, makes the person. A soul is not a clump of cells or a genome! The soul is the niche that the parents, and the village, have already begun constructing for the child; and, a little bit, the soul is the structure of all humanity (e.g. the heritage of concepts and language; the protection of rights; etc.).
People talk about meditation/mindfulness practices making them more aware of physical sensations. In general, having “heightened awareness” is often associated with processing more raw sense data but in a simple way. I’d like to propose an alternative version of “heightened awareness” that results from consciously knowing more information. The idea is that the more you know, the more you notice. You spot more patterns, make more connections, see more detail and structure in the world.
Compare two guys walking through the forest: one is a classically “mindful” type, he is very aware of the smells and sounds and sensations, but the awareness is raw, it doesn’t come with a great deal of conscious thought. The second is an expert in botany and birdwatching. Every plant and bird in the forest has interest and meaning to him. The forest smells help him predict what grows around the corner, the sounds connect to his mental map of birds’ migratory routes.
Sometimes people imply that AI is making general knowledge obsolete, but they miss this angle—knowledge enables heightened conscious awareness of what is happening around you. The fact that you can look stuff up on Google, or ask an AI assistant, does not actually lodge that information in your brain in a way that lets you see richer structure in the world. Only actually knowing does that.
Yeah, two people can read the same Wikipedia page, and get different levels of understanding. The same is true for reading the same AI output. No matter how nicely the AI puts it, either it connects with something in your brain or it doesn’t.
In theory, with a superhuman general AI, we could say something like “hey, AI, teach me enough to fully appreciate the thing that you just wrote” (with enough patience and a way to reduce hallucinations, we might be able to achieve a similar effect even with current AIs), but most people probably won’t bother.
Perhaps it’s that those people say AI is making general knowledge obsolete because it reduces the social value or status of possessing general knowledge by making it an abundant resource. As you said though, the fact that people have access to that abundant resource doesn’t mean they understand how to properly make use of it. The capability to understand is still a scarce resource.
The risk of incorrectly believing in moral realism
(Status: not fully fleshed out, philosophically unrigorous)
A common talking point is that if you have even some credence in moral realism being correct, you should act as if it's correct. The idea is something like: if moral realism is true and you act as if it's false, you're making a genuine mistake (i.e. by doing something bad), whereas if it's false and you act as if it's true, it doesn't matter (i.e. because nothing is good or bad in this case).
I think this way of thinking is flawed, and in fact, the opposite argument can be made (albeit less strongly): if there’s some credence in moral realism being false, acting as if it’s true could be very risky.
The "act as if moral realism is true if unsure" principle contrasts moral realism (i.e. the view that there is an objective moral truth, independent of any particular mind) with nihilism (i.e. nothing matters). But these are not the only two perspectives you could have. Moral subjectivism is an anti-realist view I find intuitively compelling, which says that the truth value of moral propositions is mind-dependent (i.e. based on an individual's beliefs about what is right and wrong).
From a moral subjectivist perspective, my actions can be justified by what I think is good, and your actions can be justified by what you think is good, and these things can disagree.
Importantly, compared to moral realism, moral subjectivism implies a different strategy when it comes to coordinating with others to achieve good things. If I am a moral realist, I may hope that with enough effort, I can prove to others (other people, or even machines), that something is good or bad. Whereas if I’m a moral subjectivist, this idea seems rather doomed. I need to accept that others may have a different, valid to them, conception of good. And so my options are either to overpower them (by not letting them achieve their idea of good when it conflicts with mine) or trade with them.
If I decide to “act as if moral realism is true”, I might spend a lot of resources trying to prove my idea of goodness to others, instead of directly pursuing my goals or trading with those who disagree. Furthermore, if everyone adopts this strategy, we end up in a long, unproductive fight that can never be resolved, instead of engaging in mutually-beneficial trades wherever possible.
This may pose a practical issue when it comes to AI development: if AI developers believe that there’s an objectively correct morality that the AI should follow, they may end up violating almost all people’s subjective conception of goodness in pursuit of an objective goodness that doesn’t exist.
And so my options are either to overpower them (by not letting them achieve their idea of good when it conflicts with mine) or trade with them.
There’s room for persuasion and deliberation as well. Moral anti-realists can care about how other people form moral beliefs (e.g. quality of justifications, coherence of values, non-coercion).
Moral anti-realism shouldn't insist that a person's values are simply what is currently apparent to that person, i.e. what they currently think is good. Moral realism likes to declare the dubious assumption that everyone's values-on-reflection should be the same (in the limit), but hardly uses this assumption. Instead, it correctly points out that values-on-reflection are not the same as currently-apparent values, and that arguments about values are worthwhile. But the same should be the case when we allow (normative) orthogonality, where everyone's values-on-reflection can (normatively) end up different. Worthwhile arguments can even be provided by one person to another, about that other person's misunderstanding of their own different values.
my actions can be justified by what I think is good, and your actions can be justified by what you think is good, and these things can disagree
It's easy to conflate three different things:
1. Whether or not there is an objective collection of moral facts
2. Whether or not it is possible to learn objective moral facts
3. Whether or not I should convince someone to believe a certain set of moral facts in a given situation
We can deny (1) with moral subjectivism. We can accept (1) but deny (2) by claiming that there are objective moral facts, but some (or all) of these are unknowable to some (or all) of humanity (moral realists don't need to think that they can prove anything to others). We can accept (1) and (2) but deny (3) by saying that persuasion isn't always the action that maximizes moral outcomes. Maybe the way to achieve the morally best outcome is actually to convince someone else of some false ideas that end up leading to morally useful actions (e.g. in 1945 we could convince Hitler's colleagues that it's righteous in general to backstab your colleagues if it meant one of them might kill Hitler).
So moral realists can accept that others will have other conceptions of good, and believe that the best options are to overpower or trade with those others (rather than convince them). They’re not perfect examples, but we’ve seen many moral realists do this throughout history (e.g. the Crusades). I think whether or not convincing others of your sense of morality is a morality-maximizing action depends a lot on the specifics of your morality and the context you’re in.
I think people who predict significant AI progress and automation often underestimate how human domain experts will continue to be useful for oversight, auditing, accountability, keeping things robustly on track, and setting high-level strategy.
Having “humans in the loop” will be critical for ensuring alignment and robustness, and I think people will realize this, creating demand for skilled human experts who can supervise and direct AIs.
(I may be responding to a strawman here, but my impression is that many people talk as if in the future most cognitive/white-collar work will be automated and there’ll be basically no demand for human domain experts in any technical field, for example.)
Oversight, auditing, and accountability are jobs. Agriculture shows that 95% of jobs going away is not the problem. But AI might be better at the new jobs as well, without any window of opportunity where humans are initially doing them and AI needs to catch up. Instead it’s AI that starts doing all the new things well first and humans get no opportunity to become competitive at anything, old or new, ever again.
Even formulation of aligned high-level tasks and intent alignment of AIs make sense as jobs that could be done well by misaligned AIs for instrumental reasons. Which is not even deceptive alignment, but still plausibly segues into gradual disempowerment or sharp left turn.
I think this criticism doesn't make sense without some description of the AI progress it's conditioning on. E.g. in a Tyler Cowen world, I agree. In an Eliezer world I disagree.
Inspired by a number of posts discussing owning capital + AI, I’ll share my own simplistic prediction on this topic:
Unless there is a hostile AI takeover, humans will be able to continue having and enforcing laws, including the law that only humans can own and collect rent from resources. Things like energy sources, raw materials, and land have inherent limits on their availability—no matter how fast AI progresses we won’t be able to create more square feet of land area on earth. By owning these resources, you’ll be able to profit from AI-enabled economic growth as this growth will only increase demand for the physical goods that are key bottlenecks for basically all productive endeavors.
To elaborate further/rephrase: sure, you can replace human programmers with vastly more efficient AI programmers, decreasing the human programmers’ value. In a similar fashion you can replace a lot of human labor. But an equivalent replacement for physical space or raw materials for manufacturing does not exist. With an increase in demand for goods caused by a growing economy, these things will become key bottlenecks and scarcity will increase their price. Whoever owns them (some humans) will be collecting a lot of rent.
Even simpler version of the above: economics traditionally divides factors of production into land, labor, capital, entrepreneurship. If labor costs go toward zero you can still hodl some land.
Besides the hostile AI takeover scenario, why could this be wrong (/missing the point)?
Space has resources people don’t own. The earth’s mantle a couple thousand feet down potentially has resources people don’t own. More to the point maybe, I don’t think humans will be able to continue enforcing laws barring a hostile takeover in the way you seem to think.
Imagine we find out that aliens are headed for earth and will arrive in a few years. Just from the light emissions their probes and expanding civilisation give off, we can infer that they’re obviously more technologically mature than us, probably already engineered themselves to be much smarter than us, and can basically do whatever they want with the atoms that make up our solar system and there’s nothing we can do about it. We don’t know what they want yet though. Maybe they’re friendly?
I think guessing that the aliens will be friendly and share human morality to an extent seems like a pretty specific guess about their minds to be making, and is maybe false more likely than not. But guessing that they don’t care about human preferences or well-being but do care about human legal structures, that they won’t at all help you or gift you things, also won’t disassemble you and your property for its atoms[1], but will try to buy atoms from those whom the atoms belong to according to human legal records, now that strikes me as a really really really specific guess to be making that is very likely false.
Superintelligent AGIs don’t start out having giant space infrastructure, but qualitatively, I think they’d very quickly overshadow the collective power of humanity in a similar manner. They can see paths through the future to accomplish their goals much better than we can, routing around attempts by us to oppose them. The force that backs up our laws does not bind them. If you somehow managed to align them, they might want to follow some of our laws, because they care about them. But if someone managed to make them care about the legal system, they probably also managed to make them care about your well-being. Few humans, I think, would not at all care about other humans’ welfare, but would care about the rule of law, when choosing what to align their AGI with. That’s not a kind of value system that shows up in humans much.
So in that scenario, you don't need a legal claim to part of the pre-existing economy to benefit from the superintelligences' labours. They will gift some of their labour to you. Say the current value of the world economy is x, owned by humans roughly in proportion to how much money they have, and two years after superintelligence the value of the economy is 101x, with ca. 99x of the new surplus owned by aligned superintelligences[2] because they created most of that value, and ca. x owned by rich humans who sold the superintelligence valuable resources and infrastructure to get the new industrial base started faster[3]. The superintelligence will then probably distribute its gains among humans according to some system that either treats conscious minds pretty equally, or follows the idiosyncratic preferences of the faction that aligned it, not according to how large a fraction of the total economy they used to own two years ago. So someone who started out with much more money than you two years ago doesn't have much more money in expectation now than you do.
You can’t just demand super high share percentages from the superintelligence in return for that startup capital. It’s got all the resource owners in the world as potential bargain partners to compete with you. And really, the only reason it wouldn’t be steering the future into a deal where you get almost nothing, or just steal all your stuff, is to be nice to you. Decision theoretically, this is a handout with extra steps, not a negotiation between equals.
A question in my head is what range of fixed points are possible in terms of different numeric (“monetary”) economic mechanisms and contracts. Seems to me those are a kind of AI component that has been in use since before computers.
Ownership is enforced by physical interactions, and only exists to the degree the interactions which enforce it do. Those interactions can change.
As Lucius said, resources in space are unprotected.
Organizations which hand more of their decision-making to sufficiently strong AIs "win" by making technically-legal moves, at the cost of probably also attacking their owners. Money is a general power coupon accepted by many interactions; ownership deeds are a more specific, narrow one. If the AI systems which enforce these mechanisms don't systemically reinforce towards outcomes where the things available to buy actually satisfy the preferences of the remaining humans who own AI stock or land, then the owners can end up with no not-deadly food and a lot of money, while datacenters grow and grow, taking up energy and land with (semi?-)autonomously self-replicating factories or the like. If money-like exchange continues to be how the physical economy is managed in AI-to-AI interactions, these self-replicating factories might end up adapted to make products that the market will buy. But if the majority of the buying power is AI-controlled corporations, then figuring out how to best manipulate those AIs into buying is the priority. If it isn't, then manipulating humans into buying is the priority.
It seems to me that the economic alignment problem of guaranteeing everyone is each able to reliably only spend money on things that actually match their own preferences, so that sellers can’t gain economic power by customer manipulation, is an ongoing serious problem that ends up being the weak link in scenarios where AIs manage an economy that uses similar numeric abstractions and contracts (money, ownership, rent) as the current one.
you can replace a lot of human labor. But an equivalent replacement for physical space or raw materials for manufacturing does not exist.
There is a lot of space and raw materials in the universe. AI thinks faster, so technological progress happens faster, which opens up access to new resources shortly after takeoff. Months to years, not decades to centuries.
If, for the sake of argument, we suppose that goods that provide no benefit to humans have no value, then land in space will be less valuable than land on earth until humans settle outside of earth (which I don’t believe will happen in the next few decades).
Mining raw materials from space and using them to create value on earth is feasible, but again I’m less confident that this will happen (in an efficient-enough manner that it eliminates scarcity) in as short of a timeframe as you predict.
However, I am sympathetic to the general argument here that smart-enough AI is able to find more efficient ways of manufacturing or better approaches to obtaining plentiful energy/materials. How extreme this is will depend on “takeoff speed” which you seem to think will be faster than I do.
land in space will be less valuable than land on earth until humans settle outside of earth (which I don’t believe will happen in the next few decades).
Why would it take so long? Is this assuming no ASI?
Longer-term, if Adam Brown is correct on how advanced civilizations can change the laws of physics, then effectively no constraints remain on the economy, and the reason we can't collect almost all of the rent is that you can drive prices arbitrarily low:
I don't think "hostile takeover" is a meaningful distinction in the case of AGI. What exactly prevents an AGI from pulling off a plan consisting of 50 absolutely legal moves which ends up with it as US dictator?
Perhaps the term "hostile takeover" was poorly chosen, but this is an example of something I'd call a "hostile takeover", as I doubt we would want, and would continue to endorse, an AI dictator.
Perhaps “total loss of control” would have been better.
Whenever I read yet another paper or discussion of activation steering to modify model behavior, my instinctive reaction is to slightly cringe at the naiveté of the idea. Training a model to do some task only to then manually tweak some of the activations or weights using a heuristic-guided process seems quite un-bitter-lesson-pilled. Why not just directly train for the final behavior you want—find better data, tweak the reward function, etc.?
But actually there may be a good reason to continue working on model-internals control (i.e. ways of influencing model behavior outside of modifying the text input or training process, by directly changing internal state). For some applications, you may want to express something in terms of the model’s own abstractions, something that you won’t know a priori how to do in text or via training data in fine-tuning. Throughout the training process, a model naturally learns a rich semantic activation space. And in some cases, the “cleanest” way to modify its behavior is by expressing the change in terms of its learned concepts, whose representations are sculpted by exaflops of compute.
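For concreteness, here is a minimal sketch (my own illustration, not from the post) of what an input-independent model-internals intervention can look like: a PyTorch forward hook that adds a fixed steering vector to one layer's output. The layer path and scale are assumptions for the example.

```python
import torch

def make_steering_hook(steering_vec: torch.Tensor, scale: float = 1.0):
    # Returns a forward hook that adds a fixed direction in activation
    # space to the module's output, independent of the input text.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * steering_vec.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Illustrative usage on a HuggingFace-style decoder (layer index is arbitrary):
# handle = model.model.layers[20].register_forward_hook(make_steering_hook(vec, 4.0))
# ... run generation as usual ...
# handle.remove()
```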
Not sure what distinction you’re making. I’m talking about steering for controlling behavior in production, not for red-teaming at eval time or to test interp hypotheses via causal interventions. However this still covers both safety (e.g. “be truthful”) and “capabilities” (e.g. “write in X style”) interventions.
Well, mainly I’m saying that “Why not just directly train for the final behavior you want” is answered by the classic reasons why you don’t always get what you trained for. (The mesaoptimizer need not have the same goals as the optimizer; the AI agent need not have the same goals as the reward function, nor the same goals as the human tweaking the reward function.) Your comment makes more sense to me if interpreted as about capabilities rather than about those other things.
For some applications, you may want to express something in terms of the model’s own abstractions
It seems like this applies to some kinds of activation steering (eg steering on SAE features) but not really to others (eg contrastive prompts); curious whether you would agree.
Perhaps. I see where you are coming from. Though I think it’s possible contrastive-prompt-based vectors (eg. CAA) also approximate “natural” features better than training on those same prompts (fewer degrees of freedom with the correct inductive bias). I should check whether there has been new research on this…
And in some cases, the “cleanest” way to modify its behavior is by expressing the change in terms of its learned concepts, whose representations are sculpted by exaflops of compute.
After all, what is an activation steering vector but a weirdly-constructed LoRA with rank 1[1]?
Ok technically they're not equivalent because LoRAs operate in an input-dependent fashion on activations, while activation steering operates in an input-independent fashion on the activations. But LLMs very consistently have outlier directions in activation space with magnitudes that are far larger than "normal" directions and approximately constant across inputs. LoRA adds \(AB^T x\) to the activations. With r=1, you can trivially make \(B^T\) aligned to the outlier dimension, which allows you to make \(B^T x\) a scalar with value ≈ 1 (±0.06), which you can project to a constant direction in activation space with \(A\). So given a steering vector, you can in practice make a *basically* equivalent but worse LoRA[2] in the models that exist today.
Don’t ask me how this even came up, and particularly don’t ask me what I was trying to do with serverless bring-your-own-lora inference. If you find yourself going down this path, consider your life choices. This way lies pain. See if you can just use goodfire.
Several such APIs exist. My thought was “I’d like to play with the llamascope SAE features without having to muck about with vllm, and together lets you upload a LoRA directly”, and I failed to notice that the SAE was for the base model and together only supports LoRAs for the instruct model.
The fun thing about this LoRA hack is that you don't actually have to train the LoRA: if you know the outlier direction and magnitude for your model and the activation addition you want to apply, you can write straight to the weights. The unfun thing is that it's deeply cursed and also doesn't even save you from having to mess with vllm.
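A rough sketch of what that weight surgery could look like, assuming you already know the outlier direction and its (approximately input-independent) magnitude; the function name and shapes are mine, not from the thread:

```python
import torch

def lora_from_steering_vector(steering_vec: torch.Tensor,
                              outlier_dir: torch.Tensor,
                              outlier_mag: float):
    # B^T x picks out the outlier component of the input and rescales it to ≈ 1;
    # A then maps that ≈1 scalar onto the steering direction in the output space,
    # so (A @ B.T) @ x ≈ steering_vec for any input x carrying the usual outlier.
    B = (outlier_dir / outlier_mag).unsqueeze(1)   # shape (d_in, 1)
    A = steering_vec.unsqueeze(1)                  # shape (d_out, 1)
    return A, B

# delta_W = A @ B.T can then be written straight into the target layer's weight,
# or (A, B) exported as a rank-1 LoRA checkpoint, with no training required.
```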
Edit: on reflection, I do think rank 1 LoRAs might be an underappreciated interpretability tool.
Most people on LW, and even most people in the US, are in favor of disease eradication, radical life extension, reduction of pain and suffering. A significant proportion (although likely a minority) are in favor of embryo selection or gene editing to increase intelligence and other desirable traits. I am also in favor of all these things. However, endorsing this form of generally popular transhumanism does not imply that one should endorse humanity’s succession by non-biological entities. Human “uploads” are much riskier than any of the aforementioned interventions—how do we know if we’ve gotten the upload right, how do we make the environment good enough without having to simulate all of physics? Successors that are not based on human emulation are even worse. Deep learning based AIs are detached from the lineage of humanity in a clear way and are unlikely to resemble us internally at all. If you want your descendants to exist (or to continue existing yourself), deep learning based AI is no equivalent.
Succession by non-biological entities is not a natural extension of “regular” transhumanism. It carries altogether new risks and in my opinion would almost certainly go wrong by most current people’s preferences.
The term “posthumanism” is usually used to describe “succession by non-biological entities”, for precisely the reason that it’s a distinct concept, and a distinct philosophy, from “mere” transhumanism.
(For instance, I endorse transhumanism, but am not at all enthusiastic about posthumanism. I don't really have any interest in being "succeeded" by anything.)
I find this position on ems bizarre. If the upload acts like a human brain, and then also the uploads seem normalish after interacting with them a bunch, I feel totally fine with them.
I also am more optimistic than you about creating AIs that have very different internals but that I think are good successors, though I don’t have a strong opinion.
I am not philosophically opposed to ems, I just think they will be very hard to get right (mainly because of the environment part—the em will be interacting with a cheap downgraded version of the real world). I am willing to change my mind on this. I also don’t think we should avoid building ems, but I think it’s highly unlikely an em life will ever be as good as or equivalent to a regular human life so I’d not want my lineage replaced with ems.
In contrast to my point on ems, I do think we should avoid building AIs whose main purpose is to be equivalent to (or exceed) humans in “moral value”/pursue anything that resembles building “AI successors”. Imo the main purpose of AI alignment should be to ensure AIs help us thrive and achieve our goals rather than to attempt to embed our “values” into AIs with the goal of promoting our “values” independently of our existence. (Values is in scare quotes because I don’t think there’s such a thing as human values—individuals differ a lot in their values, goals, and preferences.)
Would you be convinced if you talked to the ems a bunch and they reported normal, happy, fun lives? (Assuming nothing nefarious happened in terms of e.g. modifying their brains to report that.) I think I would find that very convincing. If you wouldn’t find that convincing, what would you be worried was missing?
I would find that reasonably convincing, yes (especially because my prior is already that true ems would not have a tendency to report their experiences in a different way from us).
i want drastically upgraded biology, potentially with huge parts of the chemical stack swapped out in ways I can only abstractly characterize now without knowing what the search over viable designs will output. but in place, without switching to another substrate. it’s not transhumanism, to my mind, unless it’s to an already living person. gene editing isn’t transhumanism, it’s some other thing; but shoes are transhumanism for the same reason replacing all my cell walls with engineered super-bio nanotech that works near absolute zero is transhumanism. only the faintest of clues what space an ASI would even be looking in to figure out how to do that, but it’s the goal in my mind for ultra-low-thermal-cost life. uploads are a silly idea, anyway, computers are just not better at biology than biology. anything you’d do with a computer, once you’re advanced enough to know how, you’d rather do by improving biology
computers are just not better at biology than biology. anything you’d do with a computer, once you’re advanced enough to know how, you’d rather do by improving biology
I share a similar intuition but I haven’t thought about this enough and would be interested in pushback!
it’s not transhumanism, to my mind, unless it’s to an already living person. gene editing isn’t transhumanism
You can do gene editing on adults (example). Also in some sense an embryo is a living person.
IMO the whole "upload" thing changes drastically depending on our understanding of consciousness and continuity of the self (which is currently nearly non-existent). It's like teleportation—I would let neither that nor upload happen to me willingly unless someone was able to convincingly explain to me how precisely my qualia are associated with my brain and how they're going to move over (rather than just killing me and creating a different entity).
I don’t believe it’s impossible for an upload to be “me”. But I doubt it’d be as easy as simply making a scan of my synapses and calling it a day. If it is, and if that “me” is then also infinitely copiable, I’d be very ambivalent about it (given all the possible ways it could go horribly wrong—see this story or the recent animated show Pantheon for ideas).
So it’s definitely a “ok, but” position for me. Would probably feel more comfortable with a “replace my brain bit by bit with artificial functional equivalents” scenario as one that preserves genuine continuity of self.
I think a big reason why uploads may be much worse than regular life is not that the brain scan won't be good enough, but that they won't be able to interact with the real world the way you can as a physical human.
Edit: I guess with sufficiently good robotics the ems would be able to interact with the same physical world as us in which case I would be much less worried.
I’d say even simply a simulated physical environment could be good enough to be indistinguishable. As Morpheus put it:
What is real? How do you define ‘real’? If you’re talking about what you can feel, what you can smell, what you can taste and see, then ‘real’ is simply electrical signals interpreted by your brain.
Of course, that would require insane amounts of compute, but so would a brain upload in the first place anyway.
I feel like this position is… flimsy? Unsubstantial? It’s not like I disagree, I don’t understand why you would want to articulate it in this way.
On the one hand, I don't think the biological/non-biological distinction is very meaningful from a transhumanist perspective. Is an embryo, genetically modified to have +9000 IQ, going to be meaningfully considered "transhuman" instead of "posthuman"? Are you still going to be you after one billion years of life extension? "Keeping relevant features of you/humanity after enormous biological changes" seems to be qualitatively the same as "keeping relevant features of you/humanity after mind uploading"—i.e., if you know at a gears-level what features of biological brains are essential to keep, you have a rough understanding of what you should work on in uploading.
On the other hand, I totally agree that if you don't feel adventurous and you don't want to save the world at the price of your personality's death, it would be a bad idea to undergo uploading in the way that the closest-to-modern technology can provide. It just means that you need to wait for more technological progress. If we are in the ballpark of radical life extension, I don't see any reason not to wait 50 years to perfect upload tech, and I don't see any reason why 50 years would not be enough, conditional on at least normally expected technical progress.
The same goes for AIs. If we can have children who are meaningfully different from us, and who can become even more different in a glorious transhumanist future, I don’t see reasons not to have AI children, conditional on their designs preserving all the important features we want to see in our children. The problem is that we are not on track to create such designs, not the conceptual existence of such designs.
And all of this seems to be straightforwardly deducible from the concept of transhumanism, i.e., the concept that the good future is one filled with beings capable of meaningfully saying that they were Homo sapiens and stopped being Homo sapiens at some point in their lives. When you say “I want radical life extension” you immediately run into the question “wait, am I going to be me after one billion years of life extension?” and you start down The Way through all the questions about self-identity, the essence of humanity, succession, et cetera.
I am going to post about biouploading soon, where the uploading happens into (or via) a distributed net of my own biological neurons. This combines the good things about uploading (immortality, the ability to be copied, ease of repair) with the good things about being a biological human (preserving infinite complexity, exact sameness of the person, and a guarantee that the bioupload will have human qualia and any other important hidden features we might otherwise miss).
Like with AGI, risks are a reason to be careful, but not a reason to give up indefinitely on doing it right. I think superintelligence is very likely to precede uploading (unfortunately), and so if humanity is allowed to survive, the risks of making technical mistakes with uploading won’t really be an issue.
I don’t see how this has anything to do with “succession” though, there is a world of difference between developing options and forcing them on people who don’t agree to take them.
Something I’ve noticed from posting more of my thoughts online:
People who disagree with your conclusion to begin with are more likely to carefully read and point out errors in your reasoning/argumentation, or instances where you’ve made incorrect factual claims. Whereas people who agree with your conclusion before reading are more likely to consciously or subconsciously gloss over any flaws in your writing because they are onboard with the “broad strokes”.
So your best criticism ends up coming with a negative valence, i.e. from people who disagree with your conclusion to begin with.
(LessWrong has much less of this bias than other places, though I still see some of it.)
Thus a better way of framing criticism is to narrowly discuss some issue with reasoning, putting aside any views about the conclusion, leaving its possible reevaluation an implicit exercise for the readers.
I think there’s some weak evidence that yes. In some studies where they give HGH for other reasons (a variety of developmental disorders, as well as cases when the child is unusually small or short), an IQ increase or other improved cognitive outcomes are observed. The fact that this occurs in a wide variety of situations indicates that it could be a general effect that could apply to healthy children.
Examples of studies (caveat: produced with the help of ChatGPT, I’m including null results also). Left column bolded when there’s a clear cognitive outcome improvement.
| Treatment group | Observed cognitive / IQ effects of HGH | Study link |
|---|---|---|
| **Children with isolated growth hormone deficiency; repeated head circumference and IQ testing during therapy** | IQ increased in parallel with head-size catch-up (small case series, N=4). Exact IQ-point gains not reported in the abstract. | |
| Short-stature children (growth hormone deficiency and idiopathic short stature), ages 5–16, followed 3 years during therapy | IQ and achievement scores: no change over 3 years (≈0 IQ-point mean change reported); behavior improved (e.g., total problems ↓, P<.001 in growth hormone deficiency; attention/social/thought problems each P=.001). | |
| **Children born small for gestational age, long-term randomized dose-response cohort (≈8 years of therapy)** | Total IQ and “performal” IQ increased from below population norms to within normal range by follow-up (p<0.001). Precise IQ-point means not in abstract. | |
| **Children born small for gestational age, randomized, double-blind dose-response trial (1 vs 2 mg/m²/day)** | Total IQ and Block-Design (performance) scores increased (p<0.001). Head-size growth correlated positively with all IQ scores; untreated controls did not show head-size increases. Exact IQ-point changes not in abstract. | |
| **Prepubertal short children (mix of growth hormone deficiency and idiopathic short stature), randomized to fixed vs individualized dosing for 24 months** | Full-scale IQ increased with a medium effect size (Cohen’s d ≈0.6) after 24 months; processing speed also improved (d ≈0.4). Exact IQ-point means not provided in abstract. | |
| Children born small for gestational age, randomized to high-dose growth hormone for 2 years vs no treatment | No cognitive benefit over 2 years: IQ unchanged in the treated group; in the untreated group, mean IQ rose (P<.05), but after excluding children with developmental problems, neither group changed significantly. Behavioral checklist scores: no significant change. | |
| **Prepubertal children with Prader–Willi syndrome, randomized controlled trial (2 years) plus 4-year longitudinal follow-up on therapy** | Prevents decline seen in untreated controls (vocabulary and similarities declined in controls at 2 years, P=.03–.04). Over 4 years on therapy: abstract reasoning (Similarities) and visuospatial skills (Block Design) increased (P=.01 and P=.03). Total IQ stayed stable on therapy vs decline in controls. | |
| **Infants and young children with Prader–Willi syndrome (approximately 52-week therapy; earlier vs later start)** | Mental development improved after 52 weeks; earlier initiation (<9 months) associated with greater mental-development gains than later start. Exact test scores vary by age tool; abstract does not list points. | |
| Down syndrome cohort, ~15-year follow-up after early childhood growth hormone | No advantage in brief IQ scores at long-term follow-up; higher scores in multiple cognitive subtests (e.g., Leiter-R, WISC-III subtests) vs controls; larger adult head circumference in previously treated group. | |
I would also suggest looking at IGF-1. You can reach out to me; this topic interests me and I have a lot of experience working with HGH and IGF-1 (including a world record).
One risk of “vibe-coding” a piece of software with an LLM is that it gets you 90% of the way there, but then you’re stuck—the last 10% of bug fixes, performance improvements, or additional features is really hard to figure out because the AI has written messy, verbose code that both of you struggle to work with. Nevertheless, delegating software engineering to AI tools is more tempting than ever. Frontier models can spit out almost-perfect complex React apps in just a minute, something that would have taken you hours in the past. And despite the risks, it’s often the right decision to prioritize speed, especially as models get smarter.
There is, of course, a middle ground between “vibe-coding” and good old-fashioned typing-every-character-yourself. You could use LLMs for smart autocomplete, occasionally ask for help with specific functions or decisions, or request small, targeted edits. But models don’t seem optimized for this use case, and it’s genuinely hard to optimize for: it’s one thing to build an RL environment where the goal is to write code that passes some tests or gets a high preference score. It’s another thing to build an RL environment where the model has to guide a human through a task, write code that’s easy for humans to build on, or ensure the solution is maximally legible to a human.
Will it become a more general problem that the easiest way for an AI to solve a problem is to produce a solution that humans find particularly hard to understand or work with? Some may say this is not a problem at the limit, when AIs are robustly superhuman at the task, but until then there is a temporary period of slop. Personally, I think this is a problem even when AIs are superhuman because of the importance of human oversight. Optimizing for intelligibility to humans is important for robustness and safety—at least some people should be able to understand and verify AI solutions, or intervene in AI-automated systems when needed.
I wonder, in the unlikely case that AI progress stopped and we were left with AIs exactly as smart as they are now, whether that would completely ruin software development.
We would soon have tons of automatically generated software that is difficult for humans to read. People developing new libraries would be under less pressure to make them legible, because as long as they can be understood by AIs, who cares? Paying a human to figure this out would be unprofitable, because running the AI a thousand times and hoping it gets it right once would be cheaper. Etc.
Current LLM coding agents are pretty bad at noticing that a new library exists to solve a problem in the first place, and at evaluating whether an unfamiliar library is fit for a given task.
As long as those things remain true, developers of new libraries wouldn’t be under much pressure in any direction, besides “pressure to make the LLM think their library is the newest canonical version of some familiar lib”.
Think clearly about the current AI training approach trajectory
If you start by discussing what you expect to be the outcome of pretraining + light RLHF, then you’re not talking about AGI or superintelligence or even the current frontier of how AI models are trained. Powerful, general AI requires serious RL on a diverse range of realistic environments, and the era of this has just begun. Many startups are working on building increasingly complex, diverse, and realistic training environments.
It’s kind of funny that so much LessWrong arguing has been about why a base model might start trying to take over the world, when that’s beside the point. Of course we will eventually start RL’ing models on hard, real-world goals.
Example post / comment to illustrate what I mean.
What, concretely, is being analogized when we compare AI training to evolution?
People (myself included) often handwave what is being analogized when it comes to comparing evolution to modern ML. Here’s my attempt to make it concrete:
Both are directed search processes (hence the analogy)
Search space: possible genes vs. possible parameter configurations
Direction of search: stuff that survives and increases in number vs. stuff that scores well on loss function
Search algorithm: random small steps vs. locally greedy+noisy steps
One implication of this is that we should not talk about whether one or another species tries to survive and increase in number (“are humans aligned with evolution’s goals?”) but rather whether genetic material/individual genes are doing so.
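To make the “search algorithm” comparison concrete, here is a toy sketch (my own illustration, with an arbitrary objective, dimensionality, and step sizes) contrasting random small steps kept by selection with locally greedy, noisy gradient steps:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(x):
    # Toy stand-in for "fails to survive/replicate" vs. "scores badly on the loss function".
    return float(np.sum((x - 3.0) ** 2))

# Evolution-like search: random small mutations, kept only if they don't do worse.
x_evo = np.zeros(5)
for _ in range(2000):
    candidate = x_evo + rng.normal(scale=0.1, size=5)  # random small step
    if loss(candidate) <= loss(x_evo):                 # selection
        x_evo = candidate

# SGD-like search: locally greedy, noisy steps along the gradient.
x_sgd = np.zeros(5)
for _ in range(2000):
    grad = 2.0 * (x_sgd - 3.0) + rng.normal(scale=0.1, size=5)  # noisy gradient estimate
    x_sgd -= 0.05 * grad

print(loss(x_evo), loss(x_sgd))  # both end up near the optimum, via different dynamics
```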
Have you read the evolution sequence? I think it does a good job of explaining why the direction of change isn’t quite toward stuff that survives and increases in number.
Giving unsolicited advice and criticism is a very good credible signal of respect
I have often heard it claimed that giving advice is a bad idea because most people don’t take it well and won’t actually learn from it.
Giving unsolicited advice/criticism risks:
The recipient liking you less
The recipient thinking you are stupid because “obviously they have heard this advice before”
The recipient thinking you are stupid because they disagree with the advice
The recipient being needlessly offended without any benefit
People benefit from others liking them and not thinking they are stupid, so these are real costs. Some people also don’t like offending others.
So clearly it’s only worth giving someone advice or criticism if you think at least some of the following are true:
Their wellbeing/impact/improvement is important enough that the small chance your advice has a positive impact is worth the cost
They are rational enough to not take offense in a way that would damage your relationship
They are particularly good at using advice/criticism, i.e. they are more likely to update than the average person
They value honest opinions and feedback even when they disagree, i.e. they prefer to know what others think about them because it’s interesting and potentially useful information even if not immediately actionable
The above points all reflect a superior attitude compared to the average person. And so, if you choose to give someone advice or criticism despite all the associated risks, you are credibly signaling that you think they have these positive traits.
Not giving unsolicited advice and criticism is selfish
The “giving advice is bad” meme is just a version of “being sycophantic is good”—you personally benefit when others like you and so often it’s useful to suck up to people.
Even the risk that your interlocutor is offended is not a real risk to their wellbeing—people dislike offending others because it feels uncomfortable to them. Being offended is not actually meaningfully harmful to the offended party.
No doubt that sycophancy and the fear of expressing potentially friendship damaging truths allows negative patterns of behavior to continue unimpeded but I think you’ve missed the two most necessary factors in determining if advice—solicited or unsolicited—is a net benefit to the recipient:
1. you sufficiently understand and have the expertise to comment on their situation
&
2. you can offer new understanding they aren’t already privy to.
Perhaps the situations where I envision advice is being given is different to yours?
The problem I notice with most unsolicited advice is it’s either something the recipient is already aware of (i.e. the classic sitcom example is someone touches a hot dish and after the fact is told “careful that pan is hot”—is it good advice? Well in the sense that it is truthful, maybe. But the burn already having happened, it is not longer useful.) This is why it annoys people, this is why it is taken as an insult to their intelligence.
A lot of people have already heard the generic or obvious advice and there may be many reasons why they aren’t following it,[1] and most of the time hearing this generic advice being repeated will not be of a benefit even if they have all the qualities you enumerate: that you’re willing to accept the cost of giving advice, that they are rational enough to not take offense, they are good at taking advice and criticism, and they value honest feedback even when they disagree.
Take this example exchange:
A: “Why are you using the grill to make melted cheese, we have a toaster oven.”
B: “the toaster is shorted out, it’s broken”
You must sufficiently understand the recipient’s situation if you are to have any hope of improving it. If you don’t know what they know about the toaster oven, then unsolicited advice can’t help.
Another major problem I’ve found with unsolicited advice is that it lacks fine-grained execution detail. My least favourite advice as a freelancing creative is “you need to get your name out there”—where is there? On that big nebulous internet? How does that help me exactly? Unless I needed further reinforcement of the fact that whatever material I am putting online isn’t reaching my current interlocutor—but it doesn’t give me any clues about how to go about remedying that.
Advice, for it to be useful, needs more than just sympathy and care for the person’s wellbeing—it needs an understanding of the situation that is the cause of their behavior.
My personal metric for the “quality” of advice is how actionable it is. This means that it can’t be post-facto (like the sitcom hot pan), it needs to understand causes and context—such as why they aren’t using the toaster oven—and most importantly it needs to suggest explicit actions that can be taken in order to change the situation (and which actions the recipient can or can’t take can only be determined by properly understanding their situation and the causes of their behavior).
Caveat: I’m sure there’s a genre of fringe cases where repetition becomes “the medium is the message”—that is they do need to hear it again. But there’s a certain point where doing the same thing again and again and expecting a different result is quite stark raving mad.
Perhaps a situation to avoid giving advice in is if you think your advice is likely to be genuinely worthless because you have no expertise, knowledge, or intelligence that is relevant to the matter and you don’t trust your own judgment at all. Otherwise if you respect the other person, you’d consider them able to judge the usefulness of your advice for themselves.
You can’t know for sure that they’ve heard some advice before. Also, you are providing the information that the piece of advice occurred to you, which in and of itself is often interesting/useful. So if you’re giving someone advice they are likely to have heard before, there is a small chance that assumption is wrong and the advice is still useful, and a larger chance that it has zero value. So in expectation the value is still positive. If you don’t give the advice, you are prioritizing not looking stupid or not offending them, which are both selfish motives.
Related to (2) is that telling someone you disapprove or think less of them for something, i.e. criticizing without providing any advice at all, is also a good signal of respect, because you are providing them with possibly useful information at the risk of them liking you less or making you feel uncomfortable.
In my opinion, this misses the crucial dynamic that the costs of giving advice go up significantly if you care about what the other person thinks of you, which is correlated with respect, status, and power. I personally think that giving advice is good, that if given tactfully many people take it well, and I also often enjoy giving it, so I will generally try to do this wherever possible unless there’s a clear reason not to, especially in the context of e.g. interpretability research. But I’m much more cautious if I’m talking to someone who seems important, considers themselves high status, has power over me, etc. I think this is a large part of why people can feel offended by receiving advice. There can be some implicit sense of “you are too stupid to have thought of this”, especially if the advice is bad or obvious.
Another important facet is that most people are not (competent) utilitarians about social interactions, so you cannot accurately infer their beliefs with reasoning like this.
Fair, there’s a real tension between signaling that you think someone has a good mindset (a form of intellectual respect) and signaling that you are scared of someone’s power over you or that you care a lot about their opinion of you.
I noticed feeling a little unsatisfied and worried about this advice. I think it pattern matches with people who are savvy with status games or subtle bullying that allows for plausible deniability (“I’m just trying to help! You’re being too sensitive.”). I think people’s heuristic of perceiving criticisms as threatening seems somewhat justified most of the time.
To be clear, I tentatively define respect as the act of (a) evaluating a person as having an amount of value and welfare that is just as important as yours, (b) believing that this person’s value and welfare are worth caring about, and (c) treating them as such. You don’t have to admire or like a person to respect them. Here are some actions that connote disrespect (or indignity): torture, murder, confinement, physical abuse, verbal abuse, causing a person’s social standing to drop unnecessarily, etc. Having said that, I’m still not satisfied with this definition, but it’s the best I can come up with so far.
Maybe you’ve thought about this already or I’ve missed some implicit assumptions, but let me try to explain by first using Buck’s experience as an example:
I interpret this as Buck (a) being appreciative of a criticism that seems unreasonable and unfair, yet (b) his need for respect wasn’t fulfilled—I would probably say “fuck that guy” too if they thought my opinions don’t matter in any situation due to the color of my hair.
I could imagine Buck’s interlocutor passing your above conditions:
They might believe that Buck can be more impactful when other people see him with a normal-looking hair colour and take him more seriously.
They might believe Buck is rational enough (but it turns out Buck was offended anyway).
They might believe Buck is good at using advice/criticism.
They might believe Buck values opinions and feedback even when they disagree (this is true).
I could also imagine Buck’s interlocutor doing a cost-benefit analysis and believing the associated costs you mentioned above are worth it. And yet, Buck was still at least a bit offended, and I think it would be reasonable to believe that this person’s criticism was actually not a good credible signal of respect.
One may argue that Buck isn’t being rational. If he were, he wouldn’t be offended: “Huh, this guy believed that the benefits of giving that criticism outweigh the costs of me liking them less, me thinking they are stupid, and me being offended. Seems like a credible signal of respect.”
I mean, Buck was appreciative of that advice, but advice being valuable is not necessarily a credible signal of respect. I could imagine a boss giving valuable advice that still checks all your conditions, but does it in an abusive manner.
My tentative version of unsolicited advice that’s also a good credible signal of respect would have more of the following conditions met:
The interlocutor actually communicating their criticism in a respectful way (as I’ve defined above). This seems like a necessary condition to pass.
The interlocutor made at least some effort to craft a good criticism/advice. One way this could work is for the interlocutor to ask questions and learn more about their advisee, which is probably a standard in many problem solving frameworks used by management consultants. But a mistake can sometimes be straightforwardly obvious that a low effort criticism works, so this condition is not sufficient on its own.
The interlocutor noticing that their advice could be wrong and very costly to heed. Again, not sufficient on its own.
The interlocutor showing care and authenticity, and showing that their advice isn’t some status-seeking one-upmanship, or a way to “create common knowledge of their status difference” (as a friend pointed out to me as another possibility).
And probably something else written here.
I might be misunderstanding you though, so happy to update!
And thanks for writing this! I do think you are on to something—I do want to get better at feedback giving and receiving, and if done well and at a higher frequency (this might be what you’re pointing to), could make me more impactful.
How people respond tells you something about them, so you don’t necessarily need to start with a clear picture of how they might respond.
Also, I think advice is the wrong framing for things that are useful to give, it’s better to make sure people have the knowledge and skills to figure out the things they seem to need to figure out. Similarly to the “show, don’t tell” of educational discussions, you want to present the arguments and not the conclusions, let alone explicitly insist that the other person is wrong about the conclusions. Or better yet, promote the skills that let them assemble the arguments on their own, without needing to concretely present the arguments.
It might help to give the arguments and even conclusions or advice eventually, after everything else is done, but it’s not the essential part and might be pointless or needlessly confrontational if the conclusions they arrive at happen to differ.
Any rule about when to give advice has to be robust to people going on and on to lecture you about Jesus because they truly and sincerely want to keep you out of Hell. (Or lecture about veganism, or EA, or politics.)
More generally, social rules about good manners have to apply to everyone—both to people with correct beliefs and to people with incorrect ones. Just like not letting the police break into everyone’s houses catches fewer criminals (when the police are right), but protects innocent people (when the police are wrong), not giving advice helps fewer people (when the advice giver is right), but saves people from arrogant know it alls and meme plagues (when the advice giver is wrong).
I think this discussion about advice is very fruitful. I think the existing comments do a great job of characterizing why someone might reasonably be offended. So if we take that as the given situation: you want to help people, project respect, but don’t want it to come off the wrong way, what could you do?
My partial answer to this is merely sharing your own authentic experience of why you are personally persuaded by the content of the advice, and allowing them to internalize that evidence and derive inferences for themselves. At social gatherings, the people in my life do this: just sharing stories, sometimes horror stories where the point is so obvious that it doesn’t need explicit statement. And it feels like a genuine form of social currency to faithfully report on your experiences. This reminds me of “Replace the Symbol with the Substance” [1] where the advice is the symbol and the experience is the substance.
So I wonder if that’s part of it—creating the same change in the person anyway, all the while mitigating the risk of condescension. The dynamics of the relationship also complicate analyzing the situation. And so does the type of social setting the advice is delivered in. And probably a bunch more factors I haven’t thought of yet.
[1]: https://www.lesswrong.com/posts/GKfPL6LQFgB49FEnv/replace-the-symbol-with-the-substance
Insightful. Glad you wrote it.
I enjoyed the combination of “these are real costs” and “positive impact is worth the cost.”
I found this insightful, ”...reflect a superior attitude...give...advice or criticism...signaling...they have...these positive traits”
I think the challenge lies in categorizing people as “superior” and “average”. I like the use of labels since it helps the conversation, but I wonder if it is too limiting. Perhaps, context and topic are important dimensions worthy of consideration as well. I can imagine real people responding differently given more variables, such as context and topic.
Bottom line: I loved it!
On people’s arguments against embryo selection
A recent NYT article about Orchid’s embryo selection program triggered a (to me surprising) backlash on X, where people expressed disgust and moral disapproval at the idea of embryo selection. The arguments generally fell into two categories:
(1) “The murder argument” Embryo selection is bad because it involves creating and then discarding embryos, which is like murdering whole humans. This argument also implies regular IVF, without selection, is also bad. Most proponents of this argument believe that the point of fertilization marks a key point when the entity starts to have moral value, i.e. they don’t ascribe the same value to sperm and eggs.
(2) “The egalitarian argument” Embryo selection is bad because the embryos are not granted the equal chance of being born they deserve. “Equal chance” here is probably not quite the correct phrase/is a bit of a strawman (because of course fitter embryos have a naturally higher chance of being born). Proponents of this argument believe that intervening on the natural probability of any particular embryo being born is anti-egalitarian and this is bad. By selecting for certain traits we are saying people with those traits are more deserving of life, and this is unethical/wrong.
At face value, both of these arguments are valid. If you buy the premises (“embryos have the moral value of whole humans”, “egalitarianism is good”) then the arguments make sense. However, I think it’s hard to justify moral value beginning at the point of fertilization.
On argument (1):
If we define murder as “killing live things” and decide that murder is bad (an intuitive decision), then “the murder argument” holds up. However, I don’t think we actually think of murder as “killing live things” in real life. We don’t condemn killing bacteria as murder. The anti-IVF people don’t condemn killing sperm or egg cells as murder. So the crux here is not whether the embryo is alive, but rather whether it is of moral value. Proponents of this argument claim that the embryo is basically equivalent to a full human life. But to make this claim, you must appeal to its potential. It’s clear that in its current state, an embryo is not a full human. The bundle of cells has no ability to function as a human, no sensations, no thoughts, no pain, no happiness, no ability to survive or grow on its own. We just know that, given the right conditions, the potential for a human life exists. But as soon as we start arguing about how the potential of something grants it moral value, it becomes difficult to draw the line arbitrarily at fertilization. From the point of view of potential humans, you can’t deny sperm and eggs moral value. In fact, every moment a woman spends not pregnant is a moment she is ridding the world of potential humans.
On argument (2):
If you grant the premise that any purposeful intervention on the probabilities of embryos being born is unethical because it violates some sacred egalitarian principle then it’s hard to refute argument (2). Scott Alexander has argued that encouraging a woman to rehabilitate from alcoholism before getting pregnant is equivalent to preferring the healthy baby over the baby with fetal alcohol syndrome, something argument (2) proponents oppose. However, I think this is a strawman. The egalitarians think every already-produced embryo should be given as equal a chance as possible. They are not discussing identity changes of potential embryos. However, again we run into the “moral value from potential” problem. Sure, you can claim that embryos have moral value for some magical God-given reason. But my intuition is that in their hearts, the embryo-valuers are using some notion of potential full human life to ground their assessment. In which case again we run into the arbitrariness of the fertilization cutoff point.
So in summary, I think it’s difficult to justify valuing embryos without appealing to their potential, which leads us to value earlier stages of potential humans. Under this view, it’s a moral imperative to not prevent the existences of any potential humans, which looks like maximizing the number of offspring you have. Or as stated in this xeet
People like to have clear-cut moral heuristics like “killing is bad.” This gives them an easy guide to making a morally correct decision and an easy guide to judging other’s actions as moral or immoral. This requires simplifying multidimensional situations into easily legible scenarios where a binary decision can be made. Thus you see people equating embryo disposal to first-degree murder, and others advocating for third-trimester abortion rights.
Regarding egalitarian-like arguments, I suspect many express opposition to embryo selection not because it’s a consequence of a positive philosophy that they state and believe and defend, but because they have a negative philosophy that tells them what positions are to be attacked.
I suspect that if you put together the whole list of what they attack, there would be no coherent philosophy that justifies it (or perhaps there would be one, but they would not endorse it).
There is more than zero logic to what is to be attacked and what isn’t, but it has more to do with “Can you successfully smear your opponent as an oppressor, or as one who supports doctrines that enable oppression; and therefore evil or, at best, ignorant if they immediately admit fault and repent; in other words, can you win this rhetorical fight?” than with “Does this argument, or its opposite, follow from common moral premises, data, and logical steps?”.
In this case, it’s like, if you state that humans with blindness or whatever have less moral worth than fully healthy humans, then you are to be attacked; and at least in the minds of these people, selecting embryos of the one kind over the other is close enough that you are also to be attacked.
(Confidence: 75%)
Some people believe embryos have souls, which may impact their moral judgement. A soul can be considered a “full human life” in moral terms. I think attributing this view purely to the potential for human life may not be accurate, since intuitions for essentialist notions of continuity of selfhood can often be fairly strong among certain people.
I appreciate the pursuit of non-strawman understandings of misgivings around reprogenetics, and the pursuit of addressing them.
I don’t feel I understand the people who talk about embryo selection as “killing embryos” or “choosing who lives and dies”, but I want to and have tried, so I’ll throw some thoughts into the mix.
First: Maybe take a look at: https://www.thenewatlantis.com/publications/the-anti-theology-of-the-body
Hart, IIUC, argues that wanting to choose who will live and who won’t means you’re evil and therefore shouldn’t be making such choices. I think his argument is ultimately stupid, so maybe I still don’t get it. But anyway, I think it’s an importantly different sort of argument than the two you present. It’s an indictment of the character of the choosers.
Second: When I tried to empathize with “life/soul starts at conception”, what I got was:
- We want a simple boundary…
  - … for political purposes, to prevent…
    - child sacrifice (which could make sense given the cults around the time of the birth of Christianity?).
    - killing mid-term fetuses, which might actually for real start to have souls.
  - … for social purposes, because it causes damage to…
    - the would-be parents’ souls to abort the thing which they do, or should, think of as having a soul.
    - the social norm / consensus / coordination around not killing things that people do or should orient towards as though they have souls.
- The pope said so. (...But then I’d like to understand why the pope said so, which would take more research.) (Something I said to a twitter-famous Catholic somehow caused him to seriously consider that, since Yermiahu says that god says “Before I formed you in the womb I knew you...”, maybe it’s ok to discard embryos before implantation...)
(My invented explanation:) Souls are transpersonal. They are a distributed computation between the child, the parents, the village, society at large, and humanity throughout all time (god). As an embryo grows, the computation is, gradually, “handed off to / centralized in” the physical locus of the child. But already upon conception, the parents are oriented towards the future existence of the child, and are computing their part of the child’s soul—which is most of what has currently manifested of the child’s soul. In this way, we get:
From a certain perspective:
It reflects poorly on would-be parents who decide to abort.
It makes sense for the state to get involved to prevent abortion. (I don’t agree with this, but hear me out:)
The perspective is one which does not acknowledge the possibility of would-be parents not mentally and socially orienting to a pregnancy in the same way that parents orient when they are intending to have children, or at least open to it and ready to get ready for it.
...Which is ultimately stupid of course, because that is a possibility. So maybe this is still a strawman.
Well, maybe the perspective is that it’s possible but bad, which is at least usefully a different claim.
Within my invented explanation, the “continuous distributed metaphysics of the origins of souls”, it is indeed the case that the soul starts at conception—BUT in fact it’s fine to swap embryos! It’s actually a strange biodeterminism to say that this clump of cells or that, or this genome or that, makes the person. A soul is not a clump of cells or a genome! The soul is the niche that the parents, and the village, have already begun constructing for the child; and, a little bit, the soul is the structure of all humanity (e.g. the heritage of concepts and language; the protection of rights; etc.).
People talk about meditation/mindfulness practices making them more aware of physical sensations. In general, having “heightened awareness” is often associated with processing more raw sense data but in a simple way. I’d like to propose an alternative version of “heightened awareness” that results from consciously knowing more information. The idea is that the more you know, the more you notice. You spot more patterns, make more connections, see more detail and structure in the world.
Compare two guys walking through the forest: one is a classically “mindful” type, he is very aware of the smells and sounds and sensations, but the awareness is raw, it doesn’t come with a great deal of conscious thought. The second is an expert in botany and birdwatching. Every plant and bird in the forest has interest and meaning to him. The forest smells help him predict what grows around the corner, the sounds connect to his mental map of birds’ migratory routes.
Sometimes people imply that AI is making general knowledge obsolete, but they miss this angle—knowledge enables heightened conscious awareness of what is happening around you. The fact that you can look stuff up on Google, or ask an AI assistant, does not actually lodge that information in your brain in a way that lets you see richer structure in the world. Only actually knowing does that.
Yeah, two people can read the same Wikipedia page, and get different levels of understanding. The same is true for reading the same AI output. No matter how nicely the AI puts it, either it connects with something in your brain or it doesn’t.
In theory, with a superhuman general AI, we could say something like “hey, AI, teach me enough to fully appreciate the thing that you just wrote” (with enough patience and a way to reduce hallucinations, we might be able to achieve a similar effect even with current AIs), but most people probably won’t bother.
Perhaps it’s that those people say AI is making general knowledge obsolete because it reduces the social value or status of possessing general knowledge by making it an abundant resource. As you said though, the fact that people have access to that abundant resource doesn’t mean they understand how to properly make use of it. The capability to understand is still a scarce resource.
The risk of incorrectly believing in moral realism
(Status: not fully fleshed out, philosophically unrigorous)
A common talking point is that if you have even some credence in moral realism being correct, you should act as if it’s correct. The idea is something like: if moral realism is true and you act as if it’s false, you’re making a genuine mistake (i.e. by doing something bad), whereas if it’s false and you act as if it’s true, it doesn’t matter (i.e. because nothing is good or bad in this case).
I think this way of thinking is flawed, and in fact, the opposite argument can be made (albeit less strongly): if there’s some credence in moral realism being false, acting as if it’s true could be very risky.
The “act as if moral realism is true if unsure” principle contrasts moral realism (i.e. the view that there is an objective moral truth, independent of any particular mind) with nihilism (i.e. the view that nothing matters). But these are not the only two perspectives you could have. Moral subjectivism is an anti-realist view I find intuitively compelling, which says that the truth value of moral propositions is mind-dependent (i.e. based on an individual’s beliefs about what is right and wrong).
From a moral subjectivist perspective, my actions can be justified by what I think is good, and your actions can be justified by what you think is good, and these things can disagree.
Importantly, compared to moral realism, moral subjectivism implies a different strategy when it comes to coordinating with others to achieve good things. If I am a moral realist, I may hope that with enough effort, I can prove to others (other people, or even machines), that something is good or bad. Whereas if I’m a moral subjectivist, this idea seems rather doomed. I need to accept that others may have a different, valid to them, conception of good. And so my options are either to overpower them (by not letting them achieve their idea of good when it conflicts with mine) or trade with them.
If I decide to “act as if moral realism is true”, I might spend a lot of resources trying to prove my idea of goodness to others, instead of directly pursuing my goals or trading with those who disagree. Furthermore, if everyone adopts this strategy, we end up in a long, unproductive fight that can never be resolved, instead of engaging in mutually-beneficial trades wherever possible.
This may pose a practical issue when it comes to AI development: if AI developers believe that there’s an objectively correct morality that the AI should follow, they may end up violating almost all people’s subjective conception of goodness in pursuit of an objective goodness that doesn’t exist.
Generally agree, but disagree with this part:
There’s room for persuasion and deliberation as well. Moral anti-realists can care about how other people form moral beliefs (e.g. quality of justifications, coherence of values, non-coercion).
I think those things can be generally interpreted as “trades” in the broadest sense. Sometimes trades of favour, reputation, or knowledge.
Moral anti-realism shouldn’t insist that a person’s values are apparent to that person, what they currently think is good. Moral realism likes to declare the dubious assumption that everyone’s values-on-reflection should be the same (in the limit), but hardly uses this assumption. Instead, it correctly points out that values-on-reflection are not the same as currently-apparent-values, that arguments about values are worthwhile. But the same should be the case when we allow (normative) orthogonality, where everyone’s values-on-reflection can (normatively) end up different. Worthwhile arguments can even be provided by one person to another, about that other’s person misunderstanding of their own different values.
It’s easy to conflate three different things:
1. Whether or not there is an objective collection of moral facts
2. Whether or not it is possible to learn objective moral facts
3. Whether or not I should convince someone to believe a certain set of moral facts in a given situation
We can deny (1) with moral subjectivism.
We can accept (1) but deny (2) by claiming that there are objective moral facts, but some (or all) of these are unknowable to some (or all) of humanity (moral realists don’t need to think that they can prove anything to others)
We can accept (1) and (2) but deny (3) by saying that persuasion isn’t always the action that maximizes moral outcomes. Maybe the way to achieve the morally best outcome is actually to convince someone else of some false ideas that end up leading to morally useful actions (e.g. in 1945 we could convince Hitler’s colleagues that it’s righteous in general to backstab your colleagues if it meant one of them might kill Hitler)
So moral realists can accept that others will have other conceptions of good, and believe that the best options are to overpower or trade with those others (rather than convince them). They’re not perfect examples, but we’ve seen many moral realists do this throughout history (e.g. the Crusades). I think whether or not convincing others of your sense of morality is a morality-maximizing action depends a lot on the specifics of your morality and the context you’re in.
I think people who predict significant AI progress and automation often underestimate how human domain experts will continue to be useful for oversight, auditing, accountability, keeping things robustly on track, and setting high-level strategy.
Having “humans in the loop” will be critical for ensuring alignment and robustness, and I think people will realize this, creating demand for skilled human experts who can supervise and direct AIs.
(I may be responding to a strawman here, but my impression is that many people talk as if in the future most cognitive/white-collar work will be automated and there’ll be basically no demand for human domain experts in any technical field, for example.)
Oversight, auditing, and accountability are jobs. Agriculture shows that 95% of jobs going away is not the problem. But AI might be better at the new jobs as well, without any window of opportunity where humans are initially doing them and AI needs to catch up. Instead it’s AI that starts doing all the new things well first and humans get no opportunity to become competitive at anything, old or new, ever again.
Even formulation of aligned high-level tasks and intent alignment of AIs make sense as jobs that could be done well by misaligned AIs for instrumental reasons. Which is not even deceptive alignment, but still plausibly segues into gradual disempowerment or sharp left turn.
I think this criticism doesn’t make sense without some description of the AI progress it’s conditioning on. E.g. in a Tyler Cowen world, I agree. In an Eliezer world, I disagree.
Inspired by a number of posts discussing owning capital + AI, I’ll share my own simplistic prediction on this topic:
Unless there is a hostile AI takeover, humans will be able to continue having and enforcing laws, including the law that only humans can own and collect rent from resources. Things like energy sources, raw materials, and land have inherent limits on their availability—no matter how fast AI progresses we won’t be able to create more square feet of land area on earth. By owning these resources, you’ll be able to profit from AI-enabled economic growth as this growth will only increase demand for the physical goods that are key bottlenecks for basically all productive endeavors.
To elaborate further/rephrase: sure, you can replace human programmers with vastly more efficient AI programmers, decreasing the human programmers’ value. In a similar fashion you can replace a lot of human labor. But an equivalent replacement for physical space or raw materials for manufacturing does not exist. With an increase in demand for goods caused by a growing economy, these things will become key bottlenecks and scarcity will increase their price. Whoever owns them (some humans) will be collecting a lot of rent.
Even simpler version of the above: economics traditionally divides factors of production into land, labor, capital, entrepreneurship. If labor costs go toward zero you can still hodl some land.
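As a toy illustration of the bottleneck claim (my own sketch, assuming a standard Cobb-Douglas production function, which the post does not commit to): if effective labor explodes while land stays fixed, the competitive rent per unit of land grows.

```python
def land_rent(L, T=1.0, A=1.0, alpha=0.6):
    """Marginal product of the fixed factor T under Y = A * L**alpha * T**(1 - alpha)."""
    Y = A * (L ** alpha) * (T ** (1 - alpha))
    return (1 - alpha) * Y / T  # competitive rent per unit of land

# AI multiplies the effective labor supply while land stays fixed.
for L in (1, 10, 100, 1000):
    print(L, round(land_rent(L), 2))  # rent per unit of land keeps rising
```

The exact numbers are meaningless; the point is only that returns to the fixed factor rise as the variable factor becomes abundant.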
Besides the hostile AI takeover scenario, why could this be wrong (/missing the point)?
Space has resources people don’t own. The earth’s mantle a couple thousand feet down potentially has resources people don’t own. More to the point maybe, I don’t think humans will be able to continue enforcing laws barring a hostile takeover in the way you seem to think.
Imagine we find out that aliens are headed for earth and will arrive in a few years. Just from the light emissions their probes and expanding civilisation give off, we can infer that they’re obviously more technologically mature than us, probably already engineered themselves to be much smarter than us, and can basically do whatever they want with the atoms that make up our solar system and there’s nothing we can do about it. We don’t know what they want yet though. Maybe they’re friendly?
I think guessing that the aliens will be friendly and share human morality to an extent seems like a pretty specific guess about their minds to be making, and is maybe false more likely than not. But guessing that they don’t care about human preferences or well-being but do care about human legal structures, that they won’t at all help you or gift you things, also won’t disassemble you and your property for its atoms[1], but will try to buy atoms from those whom the atoms belong to according to human legal records, now that strikes me as a really really really specific guess to be making that is very likely false.
Superintelligent AGIs don’t start out having giant space infrastructure, but qualitatively, I think they’d very quickly overshadow the collective power of humanity in a similar manner. They can see paths through the future to accomplish their goals much better than we can, routing around attempts by us to oppose them. The force that backs up our laws does not bind them. If you somehow managed to align them, they might want to follow some of our laws, because they care about them. But if someone managed to make them care about the legal system, they probably also managed to make them care about your well-being. Few humans, I think, would not at all care about other humans’ welfare, but would care about the rule of law, when choosing what to align their AGI with. That’s not a kind of value system that shows up in humans much.
So in that scenario, you don’t need a legal claim to part of the pre-existing economy to benefit from the superintelligences’ labours. They will gift some of their labour to you. Say the current value of the world economy is x, owned by humans roughly in proportion to how much money they have, and two years after superintelligence the value of the economy is 101x, with ca. 99x of the new surplus owned by aligned superintelligences[2] because they created most of that value, and ca. x owned by rich humans who sold the superintelligence valuable resources and infrastructure to get the new industrial base started faster[3]. The superintelligence will then probably distribute its gains among humans according to some system that either treats conscious minds pretty equally, or follows the idiosyncratic preferences of the faction that aligned it, not according to how large a fraction of the total economy they used to own two years ago. So someone who started out with much more money than you two years ago doesn’t have much more money in expectation now than you do.
For its conserved quantum numbers really
Or owned by whomever the superintelligences take orders from.
You can’t just demand super high share percentages from the superintelligence in return for that startup capital. It’s got all the resource owners in the world as potential bargain partners to compete with you. And really, the only reason it wouldn’t be steering the future into a deal where you get almost nothing, or just steal all your stuff, is to be nice to you. Decision theoretically, this is a handout with extra steps, not a negotiation between equals.
A question in my head is what range of fixed points are possible in terms of different numeric (“monetary”) economic mechanisms and contracts. Seems to me those are a kind of AI component that has been in use since before computers.
Ownership is enforced by physical interactions, and only exists to the degree the interactions which enforce it do. Those interactions can change.
As Lucius said, resources in space are unprotected.
Organizations which hand more of their decision-making to sufficiently strong AIs “win” by making technically-legal moves, at the cost of probably also attacking their owners. Money is a general power coupon accepted by many interactions; ownership deeds are a more specific, narrow one. If the AI systems which enforce these mechanisms don’t systemically reinforce towards outcomes where the things available to buy actually satisfy the preferences of the remaining humans who own AI stock or land, then the owners can end up with no not-deadly food and a lot of money, while datacenters grow and grow, taking up energy and land with (semi?-)autonomously self-replicating factories or the like. If money-like exchange continues to be how the physical economy is managed in AI-to-AI interactions, these self-replicating factories might end up adapted to make products that the market will buy. But if the majority of the buying power is AI-controlled corporations, then figuring out how to best manipulate those AIs into buying is the priority. If it isn’t, then manipulating humans into buying is the priority.
It seems to me that the economic alignment problem of guaranteeing everyone is each able to reliably only spend money on things that actually match their own preferences, so that sellers can’t gain economic power by customer manipulation, is an ongoing serious problem that ends up being the weak link in scenarios where AIs manage an economy that uses similar numeric abstractions and contracts (money, ownership, rent) as the current one.
There is a lot of space and raw materials in the universe. AI thinks faster, so technological progress happens faster, which opens up access to new resources shortly after takeoff. Months to years, not decades to centuries.
If, for the sake of argument, we suppose that goods that provide no benefit to humans have no value, then land in space will be less valuable than land on earth until humans settle outside of earth (which I don’t believe will happen in the next few decades).
Mining raw materials from space and using them to create value on earth is feasible, but again I’m less confident that this will happen (in an efficient-enough manner that it eliminates scarcity) in as short of a timeframe as you predict.
However, I am sympathetic to the general argument here that smart-enough AI is able to find more efficient ways of manufacturing or better approaches to obtaining plentiful energy/materials. How extreme this is will depend on “takeoff speed” which you seem to think will be faster than I do.
Why would it take so long? Is this assuming no ASI?
This is actually true, at least in the short term, with the important caveat of the gears of ascension’s comment here:
https://www.lesswrong.com/posts/4hCca952hGKH8Bynt/nina-panickssery-s-shortform#quPNTp46CRMMJoamB
Longer-term, if Adam Brown is correct on how advanced civilizations can change the laws of physics, then effectively no constraints remain on the economy, and the reason why we can’t collect almost all of the rent is because you can drive prices arbitrarily low:
https://www.dwarkeshpatel.com/p/adam-brown
I don’t think “hostile takeover” is a meaningful distinction in the case of AGI. What exactly prevents an AGI from pulling off a plan consisting of 50 absolutely legal moves which ends with it as US dictator?
Perhaps the term “hostile takeover” was poorly chosen, but this is an example of something I’d call a “hostile takeover”, since I doubt we would want, and continue to endorse, an AI dictator.
Perhaps “total loss of control” would have been better.
Whenever I read yet another paper or discussion of activation steering to modify model behavior, my instinctive reaction is to slightly cringe at the naiveté of the idea. Training a model to do some task only to then manually tweak some of the activations or weights using a heuristic-guided process seems quite un-bitter-lesson-pilled. Why not just directly train for the final behavior you want—find better data, tweak the reward function, etc.?
But actually there may be a good reason to continue working on model-internals control (i.e. ways of influencing model behavior outside of modifying the text input or training process, by directly changing internal state). For some applications, you may want to express something in terms of the model’s own abstractions, something that you won’t know a priori how to do in text or via training data in fine-tuning. Throughout the training process, a model naturally learns a rich semantic activation space. And in some cases, the “cleanest” way to modify its behavior is by expressing the change in terms of its learned concepts, whose representations are sculpted by exaflops of compute.
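For concreteness, here is a minimal sketch of the kind of model-internals intervention I mean: a contrastive-prompt steering vector added to the residual stream via a forward hook. The model name, layer index, prompts, and scale are all placeholders, and the exact hook details vary by architecture.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; any decoder-only LM with .model.layers
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
LAYER = 13  # which residual stream to steer (arbitrary choice here)

def mean_activation(prompt: str) -> torch.Tensor:
    """Mean hidden state at LAYER over the prompt's tokens."""
    with torch.no_grad():
        out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)

# Steering vector = difference of mean activations on a contrastive prompt pair.
steering = mean_activation("Sure, here is an honest answer:") - \
           mean_activation("Sure, here is a flattering answer:")

def add_steering(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + 4.0 * steering.to(hidden.dtype)  # the scale is a free parameter
    return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

handle = model.model.layers[LAYER].register_forward_hook(add_steering)
# ... call model.generate(...) as usual, then handle.remove() to undo the intervention.
```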
I always thought the point of activation steering was for safety/alignment/interpretability/science/etc., not capabilities.
Not sure what distinction you’re making. I’m talking about steering for controlling behavior in production, not for red-teaming at eval time or to test interp hypotheses via causal interventions. However this still covers both safety (e.g. “be truthful”) and “capabilities” (e.g. “write in X style”) interventions.
Well, mainly I’m saying that “Why not just directly train for the final behavior you want” is answered by the classic reasons why you don’t always get what you trained for. (The mesaoptimizer need not have the same goals as the optimizer; the AI agent need not have the same goals as the reward function, nor the same goals as the human tweaking the reward function.) Your comment makes more sense to me if interpreted as about capabilities rather than about those other things.
It seems like this applies to some kinds of activation steering (e.g. steering on SAE features) but not really to others (e.g. contrastive prompts); curious whether you would agree.
Perhaps. I see where you are coming from. Though I think it’s possible that contrastive-prompt-based vectors (e.g. CAA) also approximate “natural” features better than training on those same prompts would (fewer degrees of freedom, with the correct inductive bias). I should check whether there has been new research on this…
Thanks! If you find research that addresses that question, I’d be interested to know about it.
After all, what is an activation steering vector but a weirdly-constructed LoRA with rank 1[1]?
OK, technically they’re not equivalent, because LoRAs operate in an input-dependent fashion on activations, while activation steering operates in an input-independent fashion on the activations. But LLMs very consistently have outlier directions in activation space whose magnitudes are far larger than “normal” directions and approximately constant across inputs. LoRA adds \(AB^Tx\) to the activations. With \(r=1\), you can trivially make \(B^T\) aligned with the outlier direction, which allows you to make \(B^Tx\) a scalar with value ≈ 1 (±0.06), which you can then project onto a constant direction in activation space with \(A\). So given a steering vector, you can in practice make a *basically* equivalent but worse LoRA[2] in the models that exist today.
Don’t ask me how this even came up, and particularly don’t ask me what I was trying to do with serverless bring-your-own-LoRA inference. If you find yourself going down this path, consider your life choices. This way lies pain. See if you can just use Goodfire.
Tinker is an API for LoRA PEFT. You don’t mention it directly, but it’s trendy enough that I thought your comment was a reference to it.
Several such APIs exist. My thought was “I’d like to play with the Llama Scope SAE features without having to muck about with vLLM, and Together lets you upload a LoRA directly”, and I failed to notice that the SAE was for the base model while Together only supports LoRAs for the instruct model.
The fun thing about this LoRA hack is that you don’t actually have to train the LoRA: if you know the outlier direction and magnitude for your model, plus the activation addition you want to apply, you can write straight to the weights. The unfun thing is that it’s deeply cursed and also doesn’t even save you from having to mess with vLLM.
Edit: on reflection, I do think rank 1 LoRAs might be an underappreciated interpretability tool.
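For concreteness, here is a minimal sketch of the construction described above: turning a steering vector into rank-1 LoRA factors by hand, assuming (as in the parent comment) an outlier direction whose projection onto the activations is roughly constant across inputs. The dimensions and outlier statistics below are made up for the toy check, and real use would still require packing A and B into whatever LoRA format your serving stack expects:

```python
import torch

def steering_vec_to_rank1_lora(steering_vec, outlier_dir, outlier_proj):
    """
    Build rank-1 LoRA factors (A, B) whose update A @ B.T @ x ≈ steering_vec
    for every input x, assuming x has a roughly constant projection
    `outlier_proj` onto the unit outlier direction `outlier_dir`.
    """
    d_in = outlier_dir.numel()
    # B.T @ x picks out the outlier coordinate and rescales it to ≈ 1.
    B = (outlier_dir / outlier_dir.norm() / outlier_proj).reshape(d_in, 1)
    # A maps that ≈constant scalar onto the desired steering direction.
    A = steering_vec.reshape(-1, 1)
    return A, B

# Toy check (d_model = 8, fake outlier statistics): the update barely depends on x.
d = 8
outlier_dir = torch.zeros(d); outlier_dir[3] = 1.0   # pretend coordinate 3 is the outlier
steering_vec = torch.randn(d)
A, B = steering_vec_to_rank1_lora(steering_vec, outlier_dir, outlier_proj=50.0)

x = torch.randn(d); x[3] = 51.0                      # outlier coordinate hovers near 50
update = (A @ (B.T @ x.reshape(d, 1))).squeeze()
print(torch.norm(update - steering_vec) / torch.norm(steering_vec))  # ≈ 0.02, i.e. 51/50 - 1
```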
The motte and bailey of transhumanism
Most people on LW, and even most people in the US, are in favor of disease eradication, radical life extension, and the reduction of pain and suffering. A significant proportion (although likely a minority) are in favor of embryo selection or gene editing to increase intelligence and other desirable traits. I am also in favor of all these things. However, endorsing this form of generally popular transhumanism does not imply that one should endorse humanity’s succession by non-biological entities. Human “uploads” are much riskier than any of the aforementioned interventions: how do we know we’ve gotten the upload right, and how do we make the environment good enough without having to simulate all of physics? Successors that are not based on human emulation are even worse. Deep-learning-based AIs are detached from the lineage of humanity in a clear way and are unlikely to resemble us internally at all. If you want your descendants to exist (or to continue existing yourself), deep-learning-based AI is not an equivalent.
Succession by non-biological entities is not a natural extension of “regular” transhumanism. It carries altogether new risks and in my opinion would almost certainly go wrong by most current people’s preferences.
The term “posthumanism” is usually used to describe “succession by non-biological entities”, for precisely the reason that it’s a distinct concept, and a distinct philosophy, from “mere” transhumanism.
(For instance, I endorse transhumanism, but am not at all enthusiastic about posthumanism. I don’t really have any interest in being “succeeded” by anything.)
That makes sense, I just often see these ideas conflated in popular discourse.
I find this position on ems bizarre. If the upload acts like a human brain, and the uploads also seem normal-ish after you interact with them a bunch, I feel totally fine with them.
I also am more optimistic than you about creating AIs that have very different internals but that I think are good successors, though I don’t have a strong opinion.
I am not philosophically opposed to ems, I just think they will be very hard to get right (mainly because of the environment part—the em will be interacting with a cheap downgraded version of the real world). I am willing to change my mind on this. I also don’t think we should avoid building ems, but I think it’s highly unlikely an em life will ever be as good as or equivalent to a regular human life so I’d not want my lineage replaced with ems.
In contrast to my point on ems, I do think we should avoid building AIs whose main purpose is to equal (or exceed) humans in “moral value”, and avoid anything that resembles building “AI successors”. Imo the main purpose of AI alignment should be to ensure AIs help us thrive and achieve our goals, rather than to attempt to embed our “values” into AIs with the goal of promoting those “values” independently of our existence. (“Values” is in scare quotes because I don’t think there is such a thing as human values: individuals differ a lot in their values, goals, and preferences.)
Would you be convinced if you talked to the ems a bunch and they reported normal, happy, fun lives? (Assuming nothing nefarious happened in terms of e.g. modifying their brains to report that.) I think I would find that very convincing. If you wouldn’t find that convincing, what would you be worried was missing?
I would find that reasonably convincing, yes (especially because my prior is already that true ems would not have a tendency to report their experiences in a different way from us).
I want drastically upgraded biology, potentially with huge parts of the chemical stack swapped out in ways I can only abstractly characterize now, without knowing what the search over viable designs will output. But in place, without switching to another substrate. It’s not transhumanism, to my mind, unless it’s done to an already living person. Gene editing isn’t transhumanism, it’s some other thing; but shoes are transhumanism, for the same reason that replacing all my cell walls with engineered super-bio nanotech that works near absolute zero is transhumanism. I have only the faintest of clues what space an ASI would even be looking in to figure out how to do that, but it’s the goal in my mind for ultra-low-thermal-cost life. Uploads are a silly idea anyway; computers are just not better at biology than biology. Anything you’d do with a computer, once you’re advanced enough to know how, you’d rather do by improving biology.
I share a similar intuition but I haven’t thought about this enough and would be interested in pushback!
You can do gene editing on adults (example). Also in some sense an embryo is a living person.
IMO the whole “upload” thing changes drastically depending on our understanding of consciousness and continuity of the self (which is currently nearly non-existent). It’s like teleportation: I would let neither that nor an upload happen to me willingly unless someone could convincingly explain to me precisely how my qualia are associated with my brain and how they are going to move over (rather than just killing me and creating a different entity).
I don’t believe it’s impossible for an upload to be “me”. But I doubt it’d be as easy as simply making a scan of my synapses and calling it a day. If it is, and if that “me” is then also infinitely copiable, I’d be very ambivalent about it (given all the possible ways it could go horribly wrong—see this story or the recent animated show Pantheon for ideas).
So it’s definitely an “ok, but” position for me. I would probably feel more comfortable with a “replace my brain bit by bit with artificial functional equivalents” scenario, as one that preserves genuine continuity of self.
I think a big reason why uploads may be much worse than regular life is not that the brain scan won’t be good enough, but that the ems won’t be able to interact with the real world the way you can as a physical human.
Edit: I guess with sufficiently good robotics the ems would be able to interact with the same physical world as us in which case I would be much less worried.
I’d say even simply a simulated physical environment could be good enough to be indistinguishable. As Morpheus put it:
Of course, that would require insane amounts of compute, but so would a brain upload in the first place anyway.
I feel like this position is… flimsy? Insubstantial? It’s not that I disagree; I just don’t understand why you would want to articulate it this way.
On the one hand, I don’t think the biological/non-biological distinction is very meaningful from a transhumanist perspective. Is an embryo genetically modified to have +9000 IQ meaningfully “transhuman” rather than “posthuman”? Are you still going to be you after one billion years of life extension? “Keeping the relevant features of you/humanity after enormous biological changes” seems qualitatively the same as “keeping the relevant features of you/humanity after mind uploading”; i.e., if you know at the gears level which features of biological brains are essential to keep, you have a rough understanding of what you should work on for uploading.
On the other hand, I totally agree that if you don’t feel adventurous and don’t want to save the world at the price of your personality’s death, it would be a bad idea to undergo uploading with the closest-to-modern technology available. It just means you need to wait for more technological progress. If we are in the ballpark of radical life extension, I don’t see any reason not to wait 50 years to perfect upload tech, and I don’t see any reason why 50 years wouldn’t be enough, conditional on at least normally expected technological progress.
The same goes for AIs. If we can have children who are meaningfully different from us, and who can become even more different in a glorious transhumanist future, I don’t see a reason not to have AI children, conditional on their designs preserving all the important features we want to see in our children. The problem is that we are not on track to create such designs, not the conceptual existence of such designs.
And all of this seems to follow directly from the concept of transhumanism, i.e., the idea that the good future is one filled with beings who can meaningfully say that they were Homo sapiens and stopped being Homo sapiens at some point in their lives. When you say “I want radical life extension”, you immediately run into the question “wait, am I going to be me after one billion years of life extension?”, and you start down The Way through all the questions about self-identity, the essence of humanity, succession, et cetera.
I am going to post about biouploading soon, where the uploading happens into (or via) a distributed net of my own biological neurons. This combines the good things about uploading (immortality, the ability to be copied, ease of repair) with the good things about being a biological human: preserving infinite complexity, exact sameness of the person, and a guarantee that the bioupload will have human qualia and any other important hidden things we might otherwise miss.
Like with AGI, risks are a reason to be careful, but not a reason to give up indefinitely on doing it right. I think superintelligence is very likely to precede uploading (unfortunately), and so if humanity is allowed to survive, the risks of making technical mistakes with uploading won’t really be an issue.
I don’t see how this has anything to do with “succession” though, there is a world of difference between developing options and forcing them on people who don’t agree to take them.
Criticism quality-valence bias
Something I’ve noticed from posting more of my thoughts online:
People who disagree with your conclusion to begin with are more likely to read carefully and point out errors in your reasoning or argumentation, or instances where you’ve made incorrect factual claims. Whereas people who agree with your conclusion before reading are more likely to consciously or subconsciously gloss over flaws in your writing, because they are on board with the “broad strokes”.
So your best criticism ends up coming with a negative valence, i.e. from people who disagree with your conclusion to begin with.
(LessWrong has much less of this bias than other places, though I still see some of it.)
Thus a better way of framing criticism is to narrowly discuss some issue with the reasoning, putting aside any views about the conclusion and leaving its possible reevaluation as an implicit exercise for the reader.
Could HGH supplementation in children improve IQ?
I think there is some weak evidence that it could. In studies where HGH is given for other reasons (a variety of developmental disorders, as well as cases where the child is unusually small or short), an IQ increase or other improved cognitive outcomes are sometimes observed. The fact that this occurs across a wide variety of conditions suggests it could be a general effect that also applies to healthy children.
Examples of studies (caveat: produced with the help of ChatGPT, I’m including null results also). Left column bolded when there’s a clear cognitive outcome improvement.
I would also suggest looking at IGF-1. You can reach out to me; this topic interests me and I have a lot of experience working with HGH and IGF-1 (including a world record).
https://pubmed.ncbi.nlm.nih.gov/16263982/
Has it been tested on adults much?
On optimizing for intelligibility to humans (copied from substack)
I wonder whether, in the unlikely case that AI progress stopped and we were left with AIs exactly as smart as they are now, that would completely ruin software development.
We would soon have tons of automatically generated software that is difficult for humans to read. People developing new libraries would be under less pressure to make them legible, because as long as they can be understood by AIs, who cares? Paying a human to figure this out would be unprofitable, because running the AI a thousand times and hoping it gets it right once would be cheaper. Etc.
Current LLM coding agents are pretty bad at noticing that a new library exists to solve a problem in the first place, and at evaluating whether an unfamiliar library is fit for a given task.
As long as those things remain true, developers of new libraries wouldn’t be under much pressure in any direction, besides “pressure to make the LLM think their library is the newest canonical version of some familiar lib”.
Think clearly about the current AI training approach trajectory
If you start by discussing what you expect to be the outcome of pretraining + light RLHF, then you’re not talking about AGI or superintelligence, or even the current frontier of how AI models are trained. Powerful, general AI requires serious RL on a diverse range of realistic environments, and the era of this has just begun. Many startups are working on building increasingly complex, diverse, and realistic training environments.
It’s kind of funny that so much arguing on LessWrong has been about whether a base model might start trying to take over the world, when that’s beside the point. Of course we will eventually start RL’ing models on hard, real-world goals.
Example post / comment to illustrate what I mean.
What, concretely, is being analogized when we compare AI training to evolution?
People (myself included) often handwave what is being analogized when it comes to comparing evolution to modern ML. Here’s my attempt to make it concrete:
Both are directed search processes (hence the analogy)
Search space: possible genes vs. possible parameter configurations
Direction of search: stuff that survives and increases in number vs. stuff that scores well on the loss function
Search algorithm: random small steps vs. locally greedy+noisy steps
One implication of this is that we should not talk about whether one or another species tries to survive and increase in number (“are humans aligned with evolution’s goals?”) but rather whether genetic material/individual genes are doing so.
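As a toy illustration of the “search algorithm” row of this mapping (the objective functions are made up purely so the code runs, and single-genome hill climbing is a deliberately crude caricature of evolution):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(genome):                      # "stuff that survives and increases in number"
    return -np.sum(genome ** 2)

def mse_loss(params, x, y):               # "stuff that scores well on the loss function"
    return np.mean((x @ params - y) ** 2)

# Evolution-like search: random small steps, kept only when they improve fitness.
genome = rng.normal(size=8)
for _ in range(1000):
    mutant = genome + 0.05 * rng.normal(size=8)   # random small step (mutation)
    if fitness(mutant) > fitness(genome):         # differential "survival"
        genome = mutant

# SGD-like search: locally greedy steps along the gradient, noisy due to minibatching.
x, y = rng.normal(size=(64, 8)), rng.normal(size=64)
params = rng.normal(size=8)
for _ in range(1000):
    batch = rng.choice(64, size=16, replace=False)
    grad = 2 * x[batch].T @ (x[batch] @ params - y[batch]) / len(batch)
    params -= 0.01 * grad                         # locally greedy + noisy step

print(fitness(genome), mse_loss(params, x, y))
```

Both loops move through a search space in a direction set by an objective; they differ in how the next step is chosen.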
Have you read the evolution sequence? I think it does a good job of explaining why the direction of change isn’t quite toward stuff that survives and increases in number.
No I have not, will take a look
Actually maybe I have but forgot its contents haha
Edit: Wait, it’s super long. Could you explain more succinctly where I’m going wrong?
Was recently reminded of these excellent notes from Neel Nanda that I came across when first learning ML/MI. Great resource.