I like the reasoning behind this post, but I’m not sure I buy the conclusion. Here’s an attempt at excavating why not:
If I may try to paraphrase, I’d say your argument has two parts:
(1) Humans had a “sharp left turn” not because of some underlying jump in brain capabilities, but because of shifting from one way of gaining capabilities to another (from solo learning to culture).
(2) Contemporary AI training is more analogous to “already having culture,” so we shouldn’t expect that things will accelerate in ways ML researchers don’t already anticipate based on trend extrapolations.
Accordingly, we shouldn’t expect AIs to get a sharp left turn.
I think I buy (1) but I’m not sure about (2).
Here’s an attempt at arguing that AIs will still get a “boost from culture.” If I’m right, their “boost from culture” could even be larger than it was for early humans, because we now have a massive culture overhang.
Or maybe “culture” isn’t exactly the right concept, and the better phrase is something like “the generality-and-stacking-insights-on-top-of-each-other threshold that comes from deep causal understanding.” If we look at human history, it’s not just the start of cultural evolution that stands out – it’s also the scientific revolution! (A lot of cultural evolution worked despite individual humans not understanding why they do the things they do [Henrich’s “The Secret of Our Success”]; by contrast, science requires at least some scientists to understand deeply what they’re doing.)
My intuition is that there’s an “intelligence” threshold past which all the information on the internet suddenly becomes a lot more useful. When Nate/MIRI speak of a “sharp left turn,” my guess is that they mean some understanding-driven thing. (It has less to do with things like humans following unnecessarily convoluted food-preparation rules whose purpose they don’t understand, where following the rules nonetheless keeps them from poisoning themselves.) It’s not “culture” per se, but we needed culture to get there (and maybe it matters “what kind of culture” – e.g., education with scientific mindware).
I expressed this elsewhere as follows (quoting from that text):
I suspect that there’s a phase transition that happens when agents get sufficiently good at what Daniel Kokotajlo and Ramana Kumar call “P₂B” (a recursive acronym for “Plan to P₂B Better”). When it comes to “intelligence,” it seems to me that we can distinguish between “learning potential” and “trained/crystallized intelligence” (or “competence”). Children who grow up in an enculturated, learning-friendly setting (as opposed to, e.g., feral children or Helen Keller before she met her teacher) reach a threshold where their understanding of the world and of their own thoughts becomes sufficiently deep to kickstart a feedback loop. Instead of aimlessly absorbing what’s around them, they prioritize learning the skills and habits of thinking that seem beneficial according to their goals. In this process, slight differences in “learning potential” can significantly affect where a person ends up in their intellectual prime. So, “learning potential” may be gradual, but above a specific threshold (humans above, chimpanzees below), there’s a discontinuity in how it translates to “trained/crystallized intelligence” after a lifetime of (self-)directed learning. Moreover, the slope of the graph (y-axis: “trained/crystallized intelligence;” x-axis: “learning potential”) seems steep around the human range.
To quote something I’ve written previously:
“If the child in the chair next to me in fifth grade was slightly more intellectually curious, somewhat more productive, and marginally better disposed to adopt a truth-seeking approach and self-image than I was, this could initially mean they score 100% and I score 95% on fifth-grade tests – no big difference. But as time goes on, their productivity gets them to read more books, their intellectual curiosity and good judgment get them to read more unusually useful books, and their cleverness gets them to integrate all this knowledge in better and increasingly creative ways. [...] By the time we graduate university, my intellectual skills are mostly useless, while they have technical expertise in several topics, can match or exceed my thinking even in areas I specialized in, and get hired by some leading AI company.
[...]
If my 12-year-old self had been brain-uploaded into a suitable virtual reality, copied, and given the task of devouring the entire internet over 1,000 years of subjective time (with no aging) to acquire enough knowledge and skill to produce novel intellectual contributions that are useful to the world, the result probably wouldn’t be much of a success. If we imagined the same with my 19-year-old self, there’s a high chance the result wouldn’t be useful either – but also some chance it would be extremely useful. [...] I think it’s at least plausible that there’s a jump once the copies reach a level of intellectual maturity to make plans which are flexible enough [...] and divide labor sensibly [...].”
In other words, I suspect there’s a discontinuity at the point where the P₂B feedback loop hits its critical threshold.
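To make the shape of that claim a bit more concrete, here is a toy sketch. Everything in it is my own illustrative assumption (the number of learning rounds, the 5% reinvestment factor, the hard threshold at 1.0); it isn't anything from your post or from Kokotajlo/Kumar, just a cartoon of "small differences in learning potential compound once the feedback loop kicks in."

```python
# Toy model (illustrative assumptions only): "crystallized intelligence"
# accumulates over repeated learning rounds. Agents above a "P2B threshold"
# can reinvest what they've learned to learn better next round (the feedback
# loop); agents below it keep learning, but their gains never compound.

def crystallized_intelligence(learning_potential, rounds=40, threshold=1.0):
    """Accumulated competence after `rounds` of (self-)directed learning.

    learning_potential: per-round learning rate (think chimps below 1.0,
    humans above, on this made-up scale).
    """
    competence = 0.0
    rate = learning_potential
    for _ in range(rounds):
        competence += rate
        if learning_potential > threshold:
            rate *= 1.05  # reinvest gains: "plan to plan better"
    return competence

if __name__ == "__main__":
    for lp in [0.90, 0.99, 1.01, 1.10]:
        print(f"learning potential {lp:.2f} -> competence {crystallized_intelligence(lp):7.1f}")
```

The point is only the qualitative shape: an agent at 0.99 and one at 1.01 look almost identical in "learning potential," but after many rounds the one above the threshold ends up with several times the competence, which is the discontinuity and steep slope I have in mind.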
So, my intuition here is that we’ll see a phase change once AIs reach the kind of deeper understanding of things that allows them to form better learning strategies. That phase transition will be similar in kind to going from no culture to culture, but it’s more like “AIs suddenly grokking rationality/science to a sufficient degree that they can stack insights on top of each other reliably enough to avoid deteriorating results.” (Once they grok it, the update permeates everything they’ve read – and since they’ve read large parts of the internet, the result will be massive.)
I’m not sure what all this implies about values generalizing to new contexts / how difficult alignment will be. You seem open to the idea of fast takeoff through AIs improving their own training data, which seems related to my notion of “AIs get smart enough to notice on their own which internet-text training data is highest quality and which is dumb or subtly off.” So, maybe we don’t disagree much, and your objection to the “sharp left turn” concept is mainly about the connotations it carries for alignment difficulty.
Yeah, but if this is the case, I’d have liked to see a bit more balance than just retweeting the tribal-affiliation slogan (“OpenAI is nothing without its people”) and saying that the board should resign (or, in Ilya’s case, implying that he regrets and denounces everything he initially stood for together with the board). I think it’s a defensible take that the board should resign after how things went down, but the board was probably pointing to some real concerns, and those won’t get addressed at all if the pendulum now swings too far in the opposite direction. So I would have at least hoped for something like: “the board should resign, but here are some things I think they had a point about, which I’d like to see not get swept under the rug after the counter-revolution.”