Collection of discussions of key cruxes related to AI safety/alignment
These are works that highlight disagreements, cruxes, debates, assumptions, etc. about the importance of AI safety/alignment, about which risks are most likely, about which strategies to prioritise, etc.
I’ve also included some works that attempt to clearly lay out a particular view in a way that could be particularly helpful for others trying to see where the cruxes are, even if the work itself doesn’t spend much time addressing alternative views. I’m not sure precisely where to draw the boundaries in order to make this collection maximally useful.
These are ordered from most to least recent.
I’ve put in bold the works that (very subjectively) seem to me especially worth reading.
General, or focused on technical work
Ben Garfinkel on scrutinising classic AI risk arguments − 80,000 Hours, 2020
Critical Review of ‘The Precipice’: A Reassessment of the Risks of AI and Pandemics—James Fodor, 2020; this received pushback from Rohin Shah, which resulted in a comment thread worth adding here in its own right
Fireside Chat: AI governance—Ben Garfinkel & Markus Anderljung, 2020
My personal cruxes for working on AI safety—Buck Shlegeris, 2020
What can the principal-agent literature tell us about AI risk? - Alexis Carlier & Tom Davidson, 2020
Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society—Carina Prunkl & Jess Whittlestone, 2020 (commentary here)
Interviews with Paul Christiano, Rohin Shah, Adam Gleave, and Robin Hanson—AI Impacts, 2019 (summaries and commentary here and here)
Brief summary of key disagreements in AI Risk—iarwain, 2019
A list of good heuristics that the case for AI x-risk fails—capybaralet, 2019
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More − 2019
Clarifying some key hypotheses in AI alignment—Ben Cottier & Rohin Shah, 2019
A shift in arguments for AI risk—Tom Sittler, 2019 (summary and discussion here)
The Main Sources of AI Risk? - Wei Dai & Daniel Kokotajlo, 2019
Current Work in AI Alignment—Paul Christiano, 2019 (key graph can be seen at 21:05)
What failure looks like—Paul Christiano, 2019 (critiques here and here; counter-critiques here; commentary here)
Disentangling arguments for the importance of AI safety—Richard Ngo, 2019
Reframing superintelligence—Eric Drexler, 2019 (I haven’t yet read this; maybe it should be in bold)
Prosaic AI alignment—Paul Christiano, 2018
How sure are we about this AI stuff? - Ben Garfinkel, 2018 (it’s been a while since I watched this; maybe it should be in bold)
AI Governance: A Research Agenda—Allan Dafoe, 2018
Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”—Kaj Sotala, 2018 (full paper here)
A model I use when making plans to reduce AI x-risk—Ben Pace, 2018
Interview series on risks from AI—Alexander Kruel (XiXiDu), 2011 (or 2011 onwards?)
Focused on takeoff speed/discontinuity/FOOM specifically
Discontinuous progress in history: an update—Katja Grace, 2020 (also some more comments here)
My current framework for thinking about AGI timelines (and the subsequent posts in the series) - zhukeepa, 2020
What are the best arguments that AGI is on the horizon? - various authors, 2020
The AI Timelines Scam—jessicat, 2019 (I also recommend reading Scott Alexander’s comment there)
Double Cruxing the AI Foom debate—agilecaveman, 2018
Quick Nate/Eliezer comments on discontinuity − 2018
Arguments about fast takeoff—Paul Christiano, 2018
Likelihood of discontinuous progress around the development of AGI—AI Impacts, 2018
The Hanson-Yudkowsky AI-Foom Debate—various works from 2008-2013
Focused on governance/strategy work
My Updating Thoughts on AI policy—Ben Pace, 2020
Some cruxes on impactful alternatives to AI policy work—Richard Ngo, 2018
Somewhat less relevant
A small portion of the answers here − 2020
I intend to add to this list over time. If you know of other relevant work, please mention it in a comment.
I agree that it’s valuable to note that information hazards can sometimes hurt the person who gets the information. And I agree that Bostrom’s sense of information hazards is definitely broader than just that, so if people are using “infohazards” to mean only information that specifically harms the person who knows it, then clearing up their confusion seems good.
But I don’t know if “memetic hazards” is a great term for that, because it seems most natural to use the label “memetic hazards” for a superset of information hazards, not a subset. “Memes” are ideas or units of culture, of which true information is just one type. So it seems most natural to use the term “memetic hazards” for something like “harms that result from ideas” (or perhaps “ideas that spread”, or “ideas that evolve”), rather than just from true information, and rather than just harms for the knower (or just for the holder of the idea).
I think the fact that “memetic hazards” is already used in some places in the way you propose is one reason to accept the term anyway. But I’m not sure it’s a strong enough reason, given 1) how unintuitive the term seems to be for what we want it to capture, and 2) the fact that the term seems intuitive for a separate concept that would also be worth talking about (so perhaps we should hesitate to use up the term for something else). And it seems somewhat hard to come up with alternative terms for that separate concept—in particular, “idea hazards” is already used in a different way by Bostrom, so that’s not a good candidate.
In fact, “meme hazards” has already been used in roughly the way I suggest above, and I’m currently helping revamp the ideas in the post that uses that term, and was hoping to use the term “memetic hazards” for that purpose. (And this was going to be published this week, ironically enough—we’ve been scooped!) We did notice that the term “memetic hazards” was already used in the way you suggest, but thought that that use was sufficiently non-mainstream and non-intuitive that it might make sense to stick with our proposed usage.
I don’t have great ideas for an alternative term for the concept you wish to point to, but perhaps something in the direction of “knower-harming infohazards”, “self-affecting infohazards”, or “internalised infohazards”?