steven0461

Karma: 8,760

Steven K

steven0461 Apr 9, 2023, 9:29 PM
2 points
0
on: All AGI Safety questions welcome (especially basic ones) [April 2023]
Anonymous #2 asks:

A footnote in ‘Planning for AGI and beyond’ says “Many of us think the safest quadrant in this two-by-two matrix is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination”—why do shorter timelines seem more amenable to coordination?

steven0461 Apr 8, 2023, 11:44 PM
3 points
0
on: All AGI Safety questions welcome (especially basic ones) [April 2023]
Anonymous #1 asks:
This one is not technical: now that we live in a world in which people have access to systems like ChatGPT, how should I consider any of my career choices, primarily in the context of a computer technician? I’m not a hard-worker, and I consider that my intelligence is just a little above average, so I’m not going to pretend that I’m going to become a systems analyst or software engineer, but now code programming and content creation are starting to be automated more and more, so how should I update my decisions based on that?
Sure, this question is something that most can ask about their intellectual jobs, but I would like to see answers from people in this community; and particularly about a field in which, more than most, employers are going to expect any technician to stay up-to-date with these tools.

steven0461 Apr 8, 2023, 10:54 PM
13 points
0
on: All AGI Safety questions welcome (especially basic ones) [April 2023]
Here’s a form you can use to send questions anonymously. I’ll check for responses and post them as comments.

steven0461 Apr 8, 2023, 9:23 PM
11 points
0
in reply to: Boris Kashirin’s comment on: All AGI Safety questions welcome (especially basic ones) [April 2023]
From 38:58 of the podcast:
So I do think that over time I have come to expect a bit more that things will hang around in a near human place and weird shit will happen as a result. And my failure review where I look back and ask — was that a predictable sort of mistake? I feel like it was to some extent maybe a case of — you’re always going to get capabilities in some order and it was much easier to visualize the endpoint where you have all the capabilities than where you have some of the capabilities. And therefore my visualizations were not dwelling enough on a space we’d predictably in retrospect have entered into later where things have some capabilities but not others and it’s weird. I do think that, in 2012, I would not have called that large language models were the way and the large language models are in some way more uncannily semi-human than what I would justly have predicted in 2012 knowing only what I knew then. But broadly speaking, yeah, I do feel like GPT-4 is already kind of hanging out for longer in a weird, near-human space than I was really visualizing. In part, that’s because it’s so incredibly hard to visualize or predict correctly in advance when it will happen, which is, in retrospect, a bias.

steven0461 Mar 10, 2023, 7:46 PM
30 points
3
on: Speed running everyone through the bad alignement bingo. $5k bounty for a LW conversational agent
trevor has already mentioned the Stampy project, which is trying to do something very similar to what’s described here and wishes to join forces.

Right now, Stampy just uses language models for semantic search, but the medium-term plan is to use them for text generation as well: people will be able to go to chat.stampy.ai or chat.aisafety.info, type in questions, and have a conversational agent respond. This would probably use a language model fine-tuned by the authors of Cyborgism (probably starting with a weak model as a trial, then increasingly strong ones as they become available), with primary fine-tuning on the alignment literature and hopefully secondary fine-tuning on Stampy content. A question asked in chat would be used to do an extractive search on the literature, then the results would be put into the LM’s context window and it would generate a response.

Stampy welcomes volunteer developers to help with building the conversational agent and a front end for it, as well as volunteers to help write content.
What links here?

steven0461 Feb 7, 2023, 4:16 PM
6 points
3
on: Taboo P(doom)
There’s another issue where “P(doom)” can be read either as the probability that a bad outcome will happen, or the probability that a bad outcome is inevitable. I think the former is usually what’s meant, but if “P(doom)” means “the probability that we’re doomed”, then that suggests the latter as a distracting alternative interpretation.

steven0461 Oct 1, 2022, 8:11 PM
8 points
0
on: Do anthropic considerations undercut the evolution anchor from the Bio Anchors report?
How Hard is Artificial Intelligence? Evolutionary Arguments and Selection Effects

steven0461 Jul 20, 2022, 11:14 PM
11 points
1
in reply to: AnnaSalamon’s comment on: What should you change in response to an “emergency”? And AI risk

In terms of “and those people who care will be broad and varied and trying their hands at making movies and doing varied kinds of science and engineering research and learning all about the world while keeping their eyes open for clues about the AI risk conundrum, and being ready to act when a hopeful possibility comes up” we’re doing less well compared to my 2008 hopes. I want to know why and how to unblock it.

I think to the extent that people are failing to be interesting in all the ways you’d hoped they would be, it’s because being interesting in those ways seems to them to have greater costs than benefits. If you want people to see the benefits of being interesting as outweighing the costs, you should make arguments to help them improve their causal models of the costs, and to improve their causal models of the benefits, and to compare the latter to the former. (E.g., what’s the causal pathway by which an hour of thinking about Egyptology or repairing motorcycles or writing fanfic ends up having, not just positive expected usefulness, but higher expected usefulness at the margin than an hour of thinking about AI risk?) But you haven’t seemed very interested in explicitly building out this kind of argument, and I don’t understand why that isn’t at the top of your list of strategies to try.

steven0461 Jul 13, 2022, 4:35 PM
3 points
1
in reply to: amarai’s comment on: How could the universe be infinitely large?
As far as I know, this is the standard position. See also this FAQ entry. A lot of people sloppily say “the universe” when they mean the observable part of the universe, and that’s what’s causing the confusion.

steven0461 Jul 12, 2022, 9:34 PM
4 points
9
in reply to: AnnaSalamon’s comment on: Slowing down AI progress is an underexplored alignment strategy

I have also talked with folks who’ve thought a lot about safety and who honestly think that existential risk is lower if we have AI soon (before humanity can harm itself in other ways), for example.

It seems hard to make the numbers come out that way. E.g. suppose human-level AGI in 2030 would cause a 60% chance of existential disaster and a 40% chance of existential disaster becoming impossible, and human-level AGI in 2050 would cause a 50% chance of existential disaster and a 50% chance of existential disaster becoming impossible. Then to be indifferent about AI timelines, conditional on human-level AGI in 2050, you’d have to expect a ¹⁄₅ probability of existential disaster from other causes in the 2030-2050 period. (That way, with human-level AGI in 2050, you’d have a ¹⁄₂ * ⁴⁄₅ = 40% chance of surviving, just like with human-level AGI in 2030.) I don’t really know of non-AI risks in the ballpark of 10% per decade.

(My guess at MIRI people’s model is more like 99% chance of existential disaster from human-level AGI in 2030 and 90% in 2050, in which case indifference would require a 90% chance of some other existential disaster in 2030-2050, to cut 10% chance of survival down to 1%.)

steven0461 Jul 1, 2022, 4:56 PM
7 points
3
on: Safetywashing
“Safewashing” would be more directly parallel to “greenwashing” and sounds less awkward to my ears than “safetywashing”, but on the other hand the relevant ideas are more often called “AI safety” than “safe AI”, so I’m not sure if it’s a better or worse term.

steven0461 Jun 3, 2022, 7:57 PM
35 points
7
in reply to: TekhneMakre’s comment on: Intergenerational trauma impeding cooperative existential safety efforts
Yes, my experience of “nobody listened 20 years ago when the case for caring about AI risk was already overwhelmingly strong and urgent” doesn’t put strong bounds on how much I should anticipate that people will care about AI risk in the future, and this is important; but it puts stronger bounds on how much I should anticipate that people will care about counterintuitive aspects of AI risk that haven’t yet undergone a slow process of climbing in mainstream respectability, even if the case for caring about those aspects is overwhelmingly strong and urgent (except insofar as LessWrong culture has instilled a general appreciation for things that have overwhelmingly strong and urgent cases for caring about them), and this is also important.

steven0461 May 13, 2022, 7:30 PM
3 points
1
on: “Tech company singularities”, and steering them to reduce x-risk
1. after a tech company singularity,
I think this was meant to read “2. after AGI,”

steven0461 May 11, 2022, 10:14 PM
4 points
0
in reply to: CallumMcDougall’s comment on: What are your recommendations for technical AI alignment podcasts?
Note that the full 2021 MIRI conversations are also available (in robot voice) in the Nonlinear Library archive.

steven0461 May 11, 2022, 10:06 PM
5 points
0
on: What are your recommendations for technical AI alignment podcasts?
- AXRP (the AI X-risk research podcast)
- The Alignment Newsletter Podcast
- Many entries on the Nonlinear Library
- Maybe Towards Data Science
edit: also FLI’s AI alignment podcast

steven0461 May 3, 2022, 9:37 PM
4 points
0
in reply to: Daniel Kokotajlo’s comment on: [Linkpost] New multi-modal Deepmind model fusing Chinchilla with images and videos
Some relevant Altman tweets: 1, 2, 3

steven0461 Apr 30, 2022, 10:12 PM
9 points
0
in reply to: AnnaSalamon’s comment on: Salvage Epistemology
As I see it, “rationalist” already refers to a person who thinks rationality is particularly important, not necessarily a person who is rational, like how “libertarian” refers to a person who thinks freedom is particularly important, not necessarily a person who is free. Then literally speaking “aspiring rationalist” refers to a person who aspires to think rationality is particularly important, not to a person who aspires to be rational. Using “aspiring rationalist” to refer to people who aspire to attain rationality encourages people to misinterpret self-identified rationalists as claiming to have attained rationality. Saying something like “person who aspires to rationality” instead of “aspiring rationalist” is a little more awkward, but it respects the literal meaning of words, and I think that’s important.

steven0461 Apr 24, 2022, 8:34 PM
5 points
0
on: Replicating and extending the grabby aliens model
Great report. I found the high decision-worthiness vignette especially interesting.

I haven’t read it closely yet, so people should feel free to be like “just read the report more closely and the answers are in there”, but here are some confusions and questions that have been on my mind when trying to understand these things:

Has anyone thought about this in terms of a “consequence indication assumption” that’s like the self-indication assumption but normalizes by the probability of producing paths from selves to cared-about consequences instead of the probability of producing selves? Maybe this is discussed in the anthropic decision theory sequence and I should just catch up on that?

I wonder how uncertainty about the cosmological future would affect grabby aliens conclusions. In particular, I think not very long ago it was thought plausible that the affectable universe is unbounded, in which case there could be worlds where aliens were almost arbitrarily rare that still had high decision-worthiness. (Faster than light travel seems like it would have similar implications.)

SIA and SSA mean something different now than when Bostrom originally defined them, right? Modern SIA is Bostrom’s SIA+SSA and modern SSA is Bostrom’s (not SIA)+SSA? Joe Carlsmith talked about this, but it would be good if there were a short comment somewhere that just explained the change of definition, so people can link it whenever it comes up in the future. (edit: ah, just noticed footnote 13)

SIA doomsday is a very different thing than the regular doomsday argument, despite the name, right? The former is about being unlikely to colonize the universe, the latter is about being unlikely to have a high number of observers? A strong great filter that lies in our future seems like it would require enough revisions to our world model to make SIA doom basically a variant of the simulation argument, i.e. the best explanation of our ability to colonize the stars not being real would be the stars themselves not being real. Many other weird hypotheses seem like they’d become more likely than the naive world view under SIA doom reasoning. E.g., maybe there are 10^50 human civilizations on Earth, but they’re all out of phase and can’t affect each other, but they can still see the same sun and stars. Anyway, I guess this problem doesn’t turn up in the “high decision-worthiness” or “consequence indication assumption” formulation.

steven0461 Apr 8, 2022, 10:00 PM
8 points
0
in reply to: Xodarap’s comment on: It’s time for EA leadership to pull the fast-takeoff fire alarm.
My impression (based on using Metaculus a lot) is that, while questions like this may give you a reasonable ballpark estimate and it’s great that they exist, they’re nowhere close to being efficient enough for it to mean much when they fail to move. As a proxy for the amount of mental effort that goes into it, there’s only been three comments on the linked question in the last month. I’ve been complaining about people calling Metaculus a “prediction market” because if people think it’s a prediction market then they’ll assume there’s a point to be made like “if you can tell that the prediction is inefficient, then why aren’t you rich, at least in play money?” But the estimate you’re seeing is just a recency-weighted median of the predictions of everyone who presses the button, not weighted by past predictive record, and not weighted by willingness-to-bet, because there’s no buying or selling and everyone makes only one prediction. It’s basically a poll of people who are trying to get good results (in terms of Brier/log score and Metaculus points) on their answers.

steven0461 Apr 6, 2022, 10:16 PM
2 points
0
in reply to: Jackson Wagner’s comment on: What Twitter fixes should we advocate, now that Elon is on the board?
Metaculus (unlike Manifold) is not a market and does not use play money except in the same sense that Tetris score is play money.