WillPetillo

Karma: 72

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI

WillPetillo4 Dec 2023 22:58 UTC

35 points

0 comments35 min readLW link

WillPetillo 15 Jan 2024 3:44 UTC
12 points
11
on: The Leeroy Jenkins principle: How faulty AI could guarantee “warning shots”
One more objection to the model: AI labs apply just enough safety measures to prevent dumb rogue AIs. Fearing a public backlash to low-level catastrophes, AI companies test their models, checking for safety vulnerabilities, rogue behaviors, and potential for misuse. The easiest to catch problems, however, are also the least dangerous, so only the most cautious, intelligent, and dangerous rogue AI’s pass the security checks. Further, this correlation continues indefinitely, so all additional safety work contributes towards filtering the population of malevolent AIs towards the most dangerous. AI companies are not interested in adhering to the standard of theoretical, “provably safe” models, as they are trying to get away with the bare minimum, so the filter never catches everything. While “warning shots” appear all the time in experimental settings, these findings are suppressed or downplayed in public statements and the media, and the public only sees the highly sanitized result of the filtration process. Eventually, the security systems fail, but by this point AI has been developed past the threshold needed to become catastrophically dangerous.

What if Alignment is Not Enough?

WillPetillo7 Mar 2024 8:10 UTC

11 points

23 comments9 min readLW link

Unity Gridworlds

WillPetillo15 Oct 2023 4:36 UTC

9 points

0 comments1 min readLW link

WillPetillo 10 Nov 2023 1:04 UTC
7 points
2
on: Sam Altman’s sister, Annie Altman, says Sam has (severely) abused her
I’d like to add some nuance to the “innocent until proven guilty” assumption in the concluding remarks.

Standard of evidence is a major question in legal matters and heavily context-dependent. “Innocent until proven guilty” is a popular understanding of the standard for criminal guilt and it makes sense for that to be “beyond a reasonable doubt” because the question at hand is whether a state founded on principles of liberty should take away the freedom of one of its citizens. Other legal disputes, such as in civil liability, have different standards of evidence, including “more likely than not” and “clear and convincing.”

What standard we should apply here is an open question, which ultimately depends on what decisions we are trying to make. In this case, those questions seem to be: “can we trust Sam Altman’s moral character to make high-stakes decisions?” and perhaps “(how much) should we signal-boost Annie’s claims?”. On the one hand, the “beyond a reasonable doubt” standard of criminal guilt seems far too high. On the other hand, instant condemnation without any consideration (as in, not even looking at the claims in any detail) seems too low.

Note that this question of standards is entirely separate from considerations of priors, base rates, and the like. All of those things matter, but they are questions of whether the standards are met. Without a clear understanding of what those standards even are, it’s easy to get lost. I don’t have a strong answer to this myself, but I encourage readers and anyone following up on this to consider:

1. What, if anything, am I actually trying to decide here?
2. How certain do I need to be in order to make those decisions?

WillPetillo 20 Aug 2023 7:39 UTC
3 points
1
on: Memetic Judo #1: On Doomsday Prophets v.2.2
#7: (Scientific) Doomsday Track Records Aren’t That Bad

Historically, the vast majority of doomsday claims are based on religious beliefs, whereas only a small minority have been supported by a large fraction of relevant subject matter experts. If we consider only the latter, we find:

A) Malthusian crisis: false...but not really a doomsday prediction per se.
B) Hole in the ozone layer: true, but averted because of global cooperation in response to early warnings.
C) Climate change: probably true if we did absolutely nothing; probably mostly averted because of moderate, distributed efforts to mitigate (i.e. high investment in alternative energy sources and modest coordination).
D) Nuclear war: true, but averted because of global cooperation, with several terrifying near-misses...and could still happen.

This is not an exhaustive list as I am operating entirely from memory, but I am including everything I can think of and not deliberately cherry-picking examples—in fact, part of the reason I included (A) was to err on the side of stretching to include counter-examples. Also, the interpretations obviously contain a fair bit of subjectivity / lack of rigor. Nonetheless, in this informal survey, we see a clear pattern where, more often than not, doomsday scenarios that are supported by many leading relevant experts depict actual threats to human existence and the reason we are still around is because of active global efforts to prevent these threats from being realized.

Given all of the above counterarguments (especially #6), there is strong reason to categorize x-risk from AI alongside major environmental and nuclear threats. We should therefore assume by default that it is real and will only be averted if there is an active global effort to prevent it from being realized.

WillPetillo 14 Mar 2024 2:43 UTC
2 points
1
in reply to: flandry39’s comment on: What if Alignment is Not Enough?
To be clear, the sole reason I assumed (initial) alignment in this post is because if there is an unaligned ASI then we probably all die for reasons that don’t require SNC (though SNC might have a role in the specifics of how the really bad outcome plays out). So “aligned” here basically means: powerful enough to be called an ASI and won’t kill everyone if SNC is false (and not controlled/misused by bad actors, etc.)

> And the artificiality itself is the problem.

This sounds like a pretty central point that I did not explore very much except for some intuitive statements at the end (the bulk of the post summarizing the “fundamental limits of control” argument), I’d be interested in hearing more about this. I think I get (and hopefully roughly conveyed) the idea that AI has different needs from its environment than humans, so if it optimizes the environment in service of those needs we die...but I get the sense that there is something deeper intended here.

A question along this line, please ignore if it is a distraction from rather than illustrative of the above: would anything like SNC apply if tech labs were somehow using bioengineering to create creatures to perform the kinds of tasks that would be done by advanced AI?

WillPetillo 12 Mar 2024 0:09 UTC
2 points
1
in reply to: Prometheus’s comment on: What if Alignment is Not Enough?
Bringing this back to the original point regarding whether an ASI that doesn’t want to kill humans but reasons that SNC is true would shut itself down, I think a key piece of context is the stage of deployment it is operating in. For example, if the ASI has already been deployed across the world, has gotten deep into the work of its task, has noticed that some of its parts have started to act in ways that are problematic to its original goals, and then calculated that any efforts at control are destined to fail, it may well be too late—the process of shutting itself down may even accelerate SNC by creating a context where components that are harder to shut down for whatever reason (including active resistance) have an immediate survival advantage. On the other hand, an ASI that has just finished (or is in the process of) pre-training and is entirely contained within a lab has a lot fewer unintended consequences to deal with—its shutdown process may be limited to convincing its operators that building ASI is a really bad idea. A weird grey area is if, in the latter case, the ASI further wants to ensure no further ASIs are built (pivotal act) and so needs to be deployed at a large scale to achieve this goal.

Another unstated assumption in this entire line of reasoning is that the ASI is using something equivalent to consequentialist reasoning and I am not sure how much of a given this is, even in the context of ASI.

WillPetillo 9 Mar 2024 9:04 UTC
2 points
0
in reply to: Prometheus’s comment on: What if Alignment is Not Enough?
The implication here being that, if SNC (substrate needs convergence) is true, then an ASI (assuming it is aligned) will figure this out and shut itself down?

WillPetillo 19 Sep 2023 6:20 UTC
2 points
0
in reply to: Max TK’s comment on: Memetic Judo #1: On Doomsday Prophets v.2.2
Just saw this, sure!

WillPetillo 13 Mar 2024 23:16 UTC
1 point
0
in reply to: Nathan Helm-Burger’s comment on: What if Alignment is Not Enough?
This sounds like a rejection of premise 5, not 1 & 2. The latter asserts that control issues are present at all (and 3 & 4 assert relevance), whereas the former asserts that the magnitude of these issues is great enough to kick off a process of accumulating problems. You are correct that the rest of the argument, including the conclusion, does not hold if this premise is false.

Your objection seems to be to point to the analogy of humans maintaining effective control of complex systems, with errors limiting rather than compounding, with the further assertion that a greater intelligence will be even better at such management.

Besides intelligence, there are two other core points of difference between humans managing existing complex systems and ASI:

1) The scope of the systems being managed. Implicit in what I have read of SNC is that ASI is shaping the course of world events.
2) ASI’s lack of inherent reliance on the biological world.

These points raise the following questions:
1) Do systems of control get better or worse as they increase in scope of impact and where does this trajectory point for ASI?
2) To what extent are humans’ ability to control our created systems reliant on us being a part of and dependent upon the natural world?

This second question probably sounds a little weird, so let me unpack the associated intuitions, albeit at the risk of straying from the actual assertions of SNC. Technology that is adaptive becomes obligate, meaning that once it exists everyone has to use it to not get left behind by those who use it. Using a given technology shapes the environment and also promotes certain behavior patterns, which in turn shape values and worldview. These tendencies together can sometimes result in feedback loops resulting in outcomes that everyone, including the creators of the technology, don’t like. In really bad cases, this can lead to self-terminating catastrophes (in local areas historically, now with the potential to be on global scales). Noticing and anticipating this pattern, however, leads to countervailing forces that push us to think more holistically than we otherwise would (either directly through extra planning or indirectly through customs of forgotten purpose). For an AI to fall into such a trap, however, means the death of humanity, not itself, so this countervailing force is not present.

WillPetillo 9 Mar 2024 9:12 UTC
1 point
0
in reply to: Seth Herd’s comment on: What if Alignment is Not Enough?
This counts as disagreeing with some of the premises—which ones in particular?

Re “incompetent superintelligence”: denotationally yes, connotationally no. Yes in the sense that its competence is insufficient to keep the consequences of its actions within the bounds of its initial values. No in the sense that the purported reason for this failing is that such a task is categorically impossible, which cannot be solved with better resource allocation.

To be clear, I am summarizing arguments made elsewhere, which do not posit infinite time passing, or timescales so long as to not matter.

WillPetillo

In­ter­view with Vanessa Kosoy on the Value of The­o­ret­i­cal Re­search for AI

What if Align­ment is Not Enough?

Unity Gridworlds

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI

What if Alignment is Not Enough?