Think of the stuff that, when you imagine it, feels really yummy.
Also worth taking into consideration: things that feel anti-yummy. Fear/disgust/hate/etc are also signals about your values.
I think the “your values” framing itself already sneaks in assumptions which are false for a lot of minds/brains. Notably: most minds are not perfectly monolithic/unified things well-modeled as a coherent “you/I/me”. And other minds are quite unified/coherent, but are in the unfortunate situation of running on a brain that also contains other (more or less adversarial) mind-like programs/wetware.
Example:
It is entirely possible to have strongly-held values such as “I reject so-and-so arbitrary/disgusting parts of the reward circuitry Evolution designed into my brain; I will not become a slave to the Blind Idiot God’s whims and attempts to control me”. In that case, the “I” that holds those values clearly excludes at least some parts of its host brain’s yumminess-circuitry.[1] (I.e., feelings of yumminess forced upon the mind are not signals about that mind’s values, but rather more like attempts by a semi-adversarial brain to hack that mind.)
Another example:
Alex has some shitty experiences in childhood, and strongly internalizes a schema S like “if I do X, I will be safe”, and thereafter has strong yumminess feelings about doing X. But later, upon reflection, Alex realizes that the yumminess feelings are coming from S, and that S’s implicit models of reality aren’t even remotely accurate now in adulthood. Alex would like to delete S from their brain, but can’t. So the strong yumminess-around-X persists. Is X one of Alex’s values?
So, I object to what I perceive to be an attempt to promote a narrative/frame about what constitutes “you/I/me” or “your values” for people in general. (Though I’m guessing there was no malice involved in that promotion.) Especially when it is a frame that seems to imply that many people (as they conceive of themselves) are not really/fully persons, and/or that they should let arbitrary brain-circuits corrupt their souls (if those brain-circuits happen to have the ability to produce feelings of yumminess).
Please be more careful about deploying/rolling your own metaethics.
Maybe that “I” could be described as a learned mesaoptimizer, something that arose “unintentionally” from the perspective of some imaginable/nonexistent Evolution-aligned mind-designer. But so what? Why privilege some imaginary Evolution fairy over an actually existing person/mind?
I think some of the central models/advice in this post [1] are in an uncanny valley of being substantially correct but also deficient, in ways that are liable to lead some users of the models/advice to harm themselves. (In ways distinct from the ones addressed in the post under admonishments to “not be an idiot”.)
In particular, I’m referring to the notion that
The Yumminess You Feel When Imagining Things Measures Your Values
I agree that “yumminess” is an important signal about one’s values. And something like yumminess or built-in reward signals are what shape one’s values to begin with. But there are some further important points to consider. Notably: Some values are more abstract than others[2]; values differ a lot in terms of
How much abstract/S2 reasoning any visceral reward has to route through in order to reinforce that value.
How much abstract/S2 reasoning is required to determine how to satisfy that value, or to determine whether an imagined state-of-affairs satisfies (or violates) that value.
(Or, conversely:) How readily S1 detects the presence (or lack/violation) of that value in any given imagined state-of-affairs, for various ways of imagining that state-of-affairs.
Also, we are computationally limited meat-bags, sorely lacking in the logical omniscience department.
This has some consequences:
It is possible to imagine or even pursue goals that feel yummy but which in fact violate some less-obvious-to-S1 values, without ever realizing that any violation is happening.[3]
Pursuing more abstract values is likely to require more willpower, or even incur undue negative reinforcement, and end up getting done less.[4][5]
More abstract values V are liable to get less strongly reinforced by the brain’s RL than more obviously-to-S1-yummy values W, even if V in fact contributed more to receiving base/visceral reward signals.
Which in turn raises questions like
Should we be very careful about how we imagine possible goals to pursue? How do we ensure that we’re not failing to consider the implications of some abstract values, which, if considered, would imply that the imagined goal is in fact low-or-negative value?
Should we correct for our brains’ stupidity by intentionally seeking more reinforcement for more abstract values, or by avoiding reinforcing viscerally-yummy values too much?
Should we correct for our brain’s past stupidity (failures to appropriately reinforce more abstract values) by assigning higher priority to more abstract values despite their lower yumminess?[6]
Or does “might make right”? Should we just let whatever values/brain-circuits have the biggest yumminess-guns determine what we pursue and how our minds get reinforced/modified over time? (Degenerate into wireheaders in the limit?)
The endeavor of answering the above kinds of questions—determining how to resolve the “shoulds” in them—is itself value-laden, and also self-referential/recursive, since the answer depends on our meta-values, which themselves are values to which the questions apply.
Doing that properly can get pretty complicated pretty fast, not least because doing so may require Tabooing “I/me” and dissecting the various constituent parts of one’s own mind down to a level where introspective access (and/or understanding of how one’s own brain works) becomes a bottleneck.[7]
But in conclusion: I’m pretty sure that simply following the most straightforward interpretation of
The Yumminess You Feel When Imagining Things Measures Your Values
would probably lead to doing some kind of violence to one’s own values, to gradually corrupting[8] oneself, possibly without ever realizing it or feeling bad at any point. The probable default being “might makes right” / letting the more obvious-to-S1 values eat up ever more of one’s soul, at the expense of one’s more abstract values.
Addendum: I’d maybe replace
The Yumminess You Feel When Imagining Things Measures Your Values
with
The Yumminess You Feel When Imagining Things is evidence about how some parts of your brain value the imagined things, to the extent that your imagination adequately captured all relevant aspects of those things.
or, the models/advice many readers might (more or less (in)correctly) construe from this post
Examples of abstract values: “being logically consistent”, “being open-minded/non-parochial”, “bite philosophical bullets”, “take ideas seriously”, “value minds independently of the substrate they’re running on”.
To give one example: Acting without adequately accounting for scope insensitivity.
Because S1 yumminess-detectors don’t grok the S2 reasoning required to understand that a goal scores highly according to the abstract value, pursuing the goal feels unrewarding.
Example: wanting heroin, vs wanting to not want heroin.
Depends on (i.a.) the extent to which we value “being the kind of person I would be if my brain weren’t so computationally limited/stupid”, I guess.
IME. YMMV.
as judged by a more careful, reflective, and less computationally limited extrapolation of one’s current values
So what do you do about the growing aversion to information which is unpleasant to learn? This list is incomplete, and I’d appreciate your help in expanding it.
The underlying problem seems to be something like “System 1 fails to grok that the Map is not the Territory”. So the solution would likely be something that helps S1 grok that.
Possibly helpful things:
Imagine, in as much concrete/experiential detail as possible, the four worlds corresponding to “unpleasant thing is true/false” x “I do/don’t believe the thing”. Or at least the world where “unpleasant thing is true but I don’t believe it”.
In the post and comments, you’ve said that you’re reflectively stable, in the sense of endorsing your current values. In combination with the sadistic kinks/values described above, that raises some questions:
What exactly stops you from inflicting suffering on people, other than the prospect of social or legal repercussions? Do you have some values that countervail against the sadism? If yes, what are they, and how do you reconcile them with the sadism? [1]
Asking partly because: I occasionally run into sadistic parts in myself, but haven’t found a way to reconcile them with my more empathetic parts, so I usually just suppress/avoid the sadistic parts. And I’d like to find a way to reconcile/integrate them instead.
Could it be due to aliefs about attainability of success becoming lower, and that leading to lower motivation? (Cf. “motivation equation”.) (It’s less likely we’ll be able to attain a flourishing post-human future if the world is deeply insane, mostly run by sociopaths, or similarly horrible.)
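A minimal sketch of the dependence I have in mind, assuming the commonly cited Temporal Motivation Theory form of the “motivation equation” (the exact form doesn’t matter much here):

$$\text{Motivation} = \frac{\text{Expectancy} \times \text{Value}}{\text{Impulsiveness} \times \text{Delay}}$$

Lower aliefs about the attainability of success would correspond to lower Expectancy, and thus lower motivation.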
Or maybe: As one learns about horrors, the only thing that feels worth working on is mitigating the horrors; but that endeavour is difficult, has sparse (or zero) rewards, low probability of success, etc., and consequently does not feel very exciting?
(Also: IIUC, you keep updating towards “world is more horrible than I thought”? If so: why not update all the way, to the point that you can no longer predict which way you’ll update in future?)
Suppose you succeed at doing impactful science in AI. What is your plan for ensuring that those impacts are net-positive? (And how would you define “positive” in this context?)
(CTRL+F’ing this post yielded zero safety-relevant matches for “safe”, “beneficial”, or “align”.)
It’s unclear whether there is a tipping point where [...]
Yes. Also unclear whether the 90% could coordinate to take any effective action, or whether any effective action would be available to them. (Might be hard to coordinate when AIs control/influence the information landscape; might be hard to rise up against e.g. robotic law enforcement or bioweapons.)
Don’t use passive voice for this. [...]
Good point! I guess one way to frame that would be as
by what kind of process do the humans in law enforcement, military, and intelligence agencies get replaced by AIs? Who/what is in effective control of those systems (or their successors) at various points in time?
And yeah, that seems very difficult to predict or reliably control. OTOH, if someone were to gain control of the AIs (possibly even copies of a single model?) that are running all the systems, that might make centralized control easier? </wild, probably-useless speculation>
A potentially somewhat important thing which I haven’t seen discussed:
People who have a lot of political power or own a lot of capital are unlikely to be adversely affected if (say) 90% of human labor becomes obsolete and is replaced by AI.
In fact, so long as property rights are enforced, and humans retain a monopoly on decisionmaking/political power, such people are not-unlikely to benefit from the economic boost that such automation would bring.
Decisions about AI policy are mostly determined by people with a lot of capital or political power. (E.g. Andreessen Horowitz, JD Vance, Trump, etc.)
(This looks like a “decisionmaker is not the beneficiary” type of situation.)
Why does that matter?
It has implications for modeling decisionmakers, interpreting their words, and for how to interact with them.[1]
If we are in a gradual-takeoff world[2], then we should perhaps not be too surprised to see the wealthy and powerful push for AI-related policies that make them more wealthy and powerful, while a majority of humans become disempowered and starve to death (or live in destitution, or get put down with viruses or robotic armies, or whatever). (OTOH, I’m not sure if that possibility can be planned/prepared for, so maybe that’s irrelevant, actually?)
For example: we maybe should not expect decisionmakers to take risks from AI seriously until they realize those risks include a high probability of “I, personally, will die”. As another example: when people like JD Vance output rhetoric like “[AI] is not going to replace human beings. It will never replace human beings”, we should perhaps not just infer that “Vance does not believe in AGI”, but instead also assign some probability to hypotheses like “Vance thinks AGI will in fact replace lots of human beings, just not him personally; and he maybe does not believe in ASI, or imagines he will be able to control ASI”.
Here I’ll define “gradual takeoff” very loosely as “a world in which there is a >1 year window during which it is possible to replace >90% of human labor, before the first ASI comes into existence”.
Thank you for (being one of the horrifyingly few people) doing sane reporting on these crucially important topics.
Typo: “And humanity needs all the help we it can get.”
Out of (1)-(3), I think (3)[1] is clearly the most probable:
I think (2) would require Altman to be deeply un-strategic/un-agentic, which seems in stark conflict with all the skillful playing-of-power-games he has displayed.
(3) seems strongly in-character with the kind of manipulative/deceitful maneuvering-into-power he has displayed thus far.
I suppose (1) is plausible; but for that to be his only motive, he would have to be rather deeply un-strategic (which does not seem to be the case).
(Of course one could also come up with other possibilities besides (1)-(3).)[2]
or some combination of (1) and (3)
E.g. maybe he plans to keep ASI to himself, but use it to implement all-of-humanity’s CEV, or something. OTOH, I think the kind of person who would do that, would not exhibit so much lying, manipulation, exacerbating-arms-races, and gambling-with-everyone’s-lives. Or maybe he doesn’t believe ASI will be particularly impactful; but that seems even less plausible.
Note that our light cone with zero value might also eclipse other light cones that might’ve had value if we didn’t let our AGI go rogue to avoid s-risk.
That’s a good thing to consider! However, taking Earth’s situation as a prior for other “cradles of intelligence”, I think that consideration brings us back to the question of “should we expect Earth’s lightcone to be better or worse than zero-value (conditional on corrigibility)?”
To me, those odds each seem optimistic by a factor of about 1000, but ~reasonable relative to each other.
(I don’t see any low-cost way to find out why we disagree so strongly, though. Moving on, I guess.)
But this isn’t any worse to me than being killed [...]
Makes sense (given your low odds for bad outcomes).
Do you also care about minds that are not you, though? Do you expect most future minds/persons that are brought into existence to have nice lives, if (say) Donald “Grab Them By The Pussy” Trump became god-emperor (and was the one deciding what persons/minds get to exist)?
IIUC, your model would (at least tentatively) predict that
if person P has a lot of power over person Q,
and P is not sadistic,
and P is sufficiently secure/well-resourced that P doesn’t “need” to exploit Q,
then P will not intentionally do anything that would be horrible for Q?
If so, how do you reconcile that with e.g. non-sadistic serial killers, rapists, or child abusers? Or non-sadistic narcissists in whose ideal world everyone else would be their worshipful subject/slave?
That last point also raises the question: Would you prefer the existence of lots of (either happily or grudgingly) submissive slaves over oblivion?
To me it seems that terrible outcomes do not require sadism. Seems sufficient that P be low in empathy, and want from Q something Q does not want to provide (like admiration, submission, sex, violent sport, or even just attention).[1] I’m confused as to how/why you disagree.
Also, AFAICT, about 0.5% to 8% of humans are sadistic, and about 8% to 16% have very little or zero empathy. How did you arrive at “99% of humanity [...] are not so sadistic”? Did you account for the fact that most people with sadistic inclinations probably try to hide those inclinations? (Like, if only 0.5% of people appear sadistic, then I’d expect the actual prevalence of sadism to be more like ~4%.)
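To spell out the arithmetic behind that last guess (a rough sketch; the detection rate is just my assumption):

$$p_{\text{actual}} \approx \frac{p_{\text{observed}}}{\Pr(\text{appears sadistic} \mid \text{sadistic})} \approx \frac{0.5\%}{1/8} = 4\%$$

i.e., assuming only around 1 in 8 people with sadistic inclinations fail to hide them.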
It seems like you’re assuming people won’t build AGI if they don’t have reliable ways to control it, or else that sovereign (uncontrolled) AGI would be likely to be friendly to humanity.
I’m assuming neither. I agree with you that both seem (very) unlikely. [1]
It seems like you’re assuming that any humans succeeding in controlling AGI is (in expectation) preferable to extinction? If so, that seems like a crux: if I agreed with that, then I’d also agree with “publish all corrigibility results”.
I expect that unaligned ASI would lead to extinction, and our share of the lightcone being devoid of value or disvalue. I’m quite uncertain, though.
It’s more important to defuse the bomb than it is to prevent someone you dislike from holding it.
I think there is a key disanalogy to the situation with AGI: The analogy would be stronger if the bomb were likely to kill everyone, but also had some (perhaps very small) probability of conferring godlike power on whoever holds it. I.e., there is a tradeoff: decrease the probability of dying, at the expense of increasing the probability of S-risks from corrupt(ible) humans gaining godlike power.
If you agree that there exists that kind of tradeoff, I’m curious as to why you think it’s better to trade in the direction of decreasing probability-of-death for increased probability-of-suffering.
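To make the shape of that tradeoff explicit, here is a toy expected-value sketch (my framing; the probability and value terms are placeholders, not estimates):

$$E[V] = p_{\text{ext}}\,V_{\text{ext}} + p_{\text{s-risk}}\,V_{\text{s-risk}} + p_{\text{good}}\,V_{\text{good}}$$

Publishing corrigibility results would plausibly lower the extinction probability but raise the s-risk probability; whether that is net-positive depends heavily on how negative one takes the s-risk outcomes to be, relative to extinction (which, on my model, is roughly zero-value).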
So, the question I’m most interested in is the one at the end of the post[1], viz
What (crucial) considerations should one take into account, when deciding whether to publish—or with whom to privately share—various kinds of corrigibility-related results?
Didn’t put it in the title, because I figured that’d be too long of a title.
Taking a stab at answering my own question; an almost-certainly non-exhaustive list:
Would the results be applicable to deep-learning-based AGIs?[1] If I think not, how can I be confident they couldn’t be made applicable?
Do the corrigibility results provide (indirect) insights into other aspects of engineering (rather than SGD’ing) AGIs?
How much weight one gives to avoiding x-risks vs s-risks.[2]
Who actually needs to know of the results? Would sharing the results with the whole Internet lead to better outcomes than (e.g.) sharing the results with a smaller number of safety-conscious researchers? (What does the cost-benefit analysis look like? Did I even do one?)
How optimistic (or pessimistic) one is about the common-good commitment (or corruptibility) of the people who one thinks might end up wielding corrigible AGIs.
Something like the True Name of corrigibility might at first glance seem applicable only to AIs of whose internals we have some meaningful understanding or control.
If corrigibility were easily feasible, then at first glance, that would seem to reduce the probability of extinction (via unaligned AI), but increase the probability of astronomical suffering (under god-emperor Altman/Ratcliffe/Xi/Putin/...).
Given that the basic case for x-risks is so simple/obvious[1], I think most people arguing against any risk are probably doing so due to some kind of myopic/irrational subconscious motive. (It’s entirely reasonable to disagree on probabilities, or what policies would be best, etc.; but “there is practically zero risk” is just absurd.)
So I’m guessing that the deeper problem/bottleneck here is people’s (emotional) unwillingness to believe in x-risks. So long as they have some strong (often subconscious) motive to disbelieve x-risks, any conversation about x-risks is liable to keep getting derailed or be otherwise very unproductive.[2]
I think some common underlying reasons for such motivated disbelief include
Subconscious Map-Territory confusions:
Believing X makes X feel more real. And so, System 1 “solves” x-risks by disbelieving them away.
System 1 makes decisions—including decisions about what to believe—under the assumption that the current Map is the Territory. For example: If the current Map says that “x-risks are not real and that’s good because now I can keep making money developing AGI capabilities”, then according to that Map, it would be bad to update towards believing that x-risks are real (because then maybe you’d needlessly stop making money!).
Cognitive immune response. Lots of humans have a strong subconscious instinct along the lines of “If a clever human is trying, with abstract arguments, to convince me of X, and I anticipate that believing X would change my policy/behavior in ways that feel unpleasant, then I reject/disbelieve X”.[3]
I’m not sure what the best approaches to addressing the above kinds of dynamics are. Trying to directly point them out seems likely to end badly (at least with most neurotypical people). Small mental exercises like Split and Commit or giving oneself a line of retreat might help with (1), if you can somehow get people to (earnestly) do them? For (2), maybe
avoid trying to persuade; instead ask questions that prompt the person to think concretely about the situation themself,
show them something tangible, like some AI model doing something impressive, or that slowed-down video of a train station (along with “this is what humans would look like to an entity that thinks 100 times faster—statues”)?
If you try the above, I’d be curious to see a writeup of the results.
Building a species of superhumanly smart & fast machine aliens without understanding how they work seems very dangerous. And yet, various companies and nations are currently pouring trillions of dollars into making that happen, and appear to be making rapid progress. (Experts disagree on whether there’s a 99% chance we all die, or if there’s only a 10% chance we all die and a 90% chance some corporate leaders become uncontested god-emperors, or if we end up as pets to incomprehensible machine gods, or if the world will be transformed beyond human comprehension and everyone will rely on personal AI assistants to survive. Sounds good, right?)
A bit like trying to convince a deeply religious person via rational debate. It’s not really about the evidence/reasoning.
I wouldn’t be too surprised if this kind of instinct were evolved, rather than just learned. Even neurotypical humans try to hack each other all the time, and clever psychopaths have probably been around for many, many generations.