Tobias_Baumann

Karma: 145

New book on s-risks

Tobias_Baumann 28 Oct 2022 9:36 UTC

70 points

1 comment · 1 min read · LW link

Tobias_Baumann 18 Mar 2019 12:17 UTC
13 points
0
on: More realistic tales of doom
I agree with you that the “stereotyped image of AI catastrophe” is not what failure will most likely look like, and it’s great to see more discussion of alternative scenarios. But why exactly should we expect that the problems you describe will be exacerbated in a future with powerful AI, compared to the state of contemporary human societies? Humans also often optimise for what’s easy to measure, especially in organisations. Is the concern that current ML systems are unable to optimise hard-to-measure goals, or goals that are hard to represent in a computerised form? That is true, but I think of this as a limitation of contemporary ML approaches rather than a fundamental property of advanced AI. With general intelligence, it should also be possible to optimise goals that are hard to measure.

Similarly, humans / companies / organisations regularly exhibit influence-seeking behaviour. This can cause harm, but it’s also usually possible to keep it in check to at least a certain degree.

So, while you point at things that can plausibly go wrong, I’d say that these are perennial issues that may become better or worse during and after the transition to advanced AI, and it’s hard to predict what will happen. Of course, this does not make a very appealing tale of doom – but maybe it would be best to dispense with tales of doom altogether.

I’m also not yet convinced that “these capture the most important dynamics of catastrophe.” Specifically, I think the following are also potentially serious issues:
- Unfortunate circumstances in future cooperation problems between AI systems (and / or humans) result in widespread defection, leading to poor outcomes for everyone.
- Conflicts between key future actors (AI or human) result in large quantities of disvalue (agential s-risks).
- New technology leads to radical value drift of a form that we wouldn’t endorse.

Tobias_Baumann 25 Feb 2019 10:29 UTC
5 points
0
in reply to: Rob Bensinger’s comment on: Thoughts on Human Models
Thanks for elaborating. There seem to be two different ideas:
1) that it is a promising strategy to try to constrain early AGI capabilities and knowledge, and
2) that even without such constraints, a paperclipper entails a smaller risk of worst-case outcomes with large amounts of disvalue, compared to a near miss. (Brian Tomasik has also written about this.)
1) is very plausible, perhaps even obvious, though as you say it’s not clear how feasible this will be. I’m not convinced of 2), even though I’ve heard / read many people expressing this idea. I think it’s unclear what would result in more disvalue in expectation. For instance, a paperclipper would have no qualms about threatening other actors (with something that we would consider disvalue), while a near miss might still have such qualms, depending on what exactly the failure mode is. In terms of incidental suffering, it’s true that a near miss is more likely to do something involving human minds, but again it’s also possible that the system is, despite the failure, still compassionate enough to refrain from this, or to use digital anesthesia. (It all depends on what plausible failure modes look like, and that’s very hard to say.)

Tobias_Baumann 23 Feb 2019 20:42 UTC
12 points
0
on: Thoughts on Human Models
“Another risk from bugs comes not from the AGI system caring incorrectly about our values, but from having inadequate security. If our values are accurately encoded in an AGI system that cares about satisfying them, they become a target for threats from other actors who can gain from manipulating the first system.”
I agree that this is a serious risk, but I wouldn’t categorise it as a “risk from bugs”. Every actor with goals faces the possibility that other actors may attempt to gain bargaining leverage by threatening to deliberately thwart these goals. So this does not require bugs; rather, the problem arises by default for any actor (human or AI), and I think there’s no obvious solution. (I’ve written about surrogate goals as a possible solution for at least some parts of the problem.)
“the very worst outcomes seem more likely if the system was trained using human modelling because these worst outcomes depend on the information in human models.”
What about the possibility that the AGI system threatens others, rather than being threatened itself? Prima facie, that might also lead to worst-case outcomes. Do you envision a system that’s not trained using human modelling and therefore just wouldn’t know enough about human minds to make any effective threats? I’m not sure how an AI system can meaningfully be said to have “human-level general intelligence” and yet be completely inept in this regard. (Also, if you have such fine-grained control over what your system does or does not know about, or if you can have it do very powerful things without possessing dangerous kinds of knowledge and abilities, then I think many commonly discussed AI safety problems become non-issues anyway, as you can just constrain the system accordingly.)

Tobias_Baumann 8 Jan 2019 17:05 UTC
8 points
0
on: Reframing Superintelligence: Comprehensive AI Services as General Intelligence
Upvoted. I’ve long thought that Drexler’s work is a valuable contribution to the debate that hasn’t received enough attention so far, so it’s great to see that this has now been published.
I am very sympathetic to the main thrust of the argument – questioning the implicit assumption that powerful AI will come in the shape of one or more unified agents that optimise the outside world according to their goals. However, given our cluelessness and the vast range of possible scenarios (e.g. ems, strong forms of biological enhancement, merging of biological and artificial intelligence, brain-computer interfaces, etc.), I find it hard to justify a very high degree of confidence in Drexler’s model in particular.

Why I expect successful (narrow) alignment

Tobias_Baumann 29 Dec 2018 15:44 UTC

8 points

12 comments · 1 min read · LW link

(s-risks.org)

Thoughts on short timelines

Tobias_Baumann 23 Oct 2018 15:54 UTC

6 points

11 comments · 1 min read · LW link

(s-risks.org)

Mechanism Design for AI

Tobias_Baumann 18 Jul 2018 16:47 UTC

5 points

3 comments · 1 min read · LW link

(s-risks.org)

What does the stock market tell us about AI timelines?

Tobias_Baumann 12 Jul 2018 6:05 UTC

6 points

5 comments · 1 min read · LW link

(s-risks.org)

An introduction to worst-case AI safety

Tobias_Baumann 5 Jul 2018 16:09 UTC

14 points

2 comments · 1 min read · LW link

(s-risks.org)

Tobias_Baumann 29 Jun 2018 19:57 UTC
5 points
0
on: Shaping economic incentives for collaborative AGI
I agree that establishing a cooperative mindset in the AI / ML community is very important. I’m less sure whether economic incentives or government policy are a realistic way to get there. Can you think of a precedent or example of such external incentives in other areas?
Also, collaboration between the researchers who develop AI may be just one piece of the puzzle. You could still get military arms races between nations even if most researchers are collaborative. If there are several AI systems, then we also need to ensure cooperation between these AIs, which isn’t necessarily the same as cooperation between the researchers who build them.

A framework for thinking about AI timescales

Tobias_Baumann 29 Mar 2018 9:29 UTC

7 points

0 comments · 1 min read · LW link

(s-risks.org)

Is Evidential Decision Theory presumptuous?

Tobias_Baumann 2 Feb 2017 13:41 UTC

5 points

39 comments · 1 min read · LW link

Tobias_Baumann 2 Feb 2017 9:44 UTC
LW: 1 AF: 1
0
AF
in reply to: Johannes Treutlein’s comment on: Did EDT get it right all along? Introducing yet another medical Newcomb problem
What exactly do you think we need to specify in the Smoking Lesion?