Steven K
steven0461
Announcing AISafety.info’s Write-a-thon (June 16-18) and Second Distillation Fellowship (July 3-October 2)
All AGI Safety questions welcome (especially basic ones) [May 2023]
I tried to answer this here
Anonymous #7 asks:
I am familiar with the concept of a utility function, which assigns numbers to possible world states and considers larger numbers to be better. However, I am unsure how to apply this function in order to make decisions that take time into account. For example, we may be able to achieve a world with higher utility over a longer period of time, or a world with lower utility but in a shorter amount of time.
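(One common way to formalize the trade-off this question points at, though not necessarily the framing the asker has in mind, is to define utility over whole trajectories of world states rather than over a single end state, often with a discount factor that trades off utility now against utility later:

$$U(s_0, s_1, s_2, \ldots) = \sum_{t=0}^{\infty} \gamma^t \, u(s_t), \qquad 0 < \gamma \le 1$$

Under this formalization, "higher utility over a longer period" versus "lower utility sooner" is compared by summing the discounted per-step utilities along each possible trajectory and picking the trajectory with the larger total.)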
Anonymous #6 asks:
Why hasn’t an alien superintelligence within our light cone already killed us?
Anonymous #5 asks:
How can programmers build something without understanding its inner workings? Are they closer to biologists cross-breeding organisms than to car designers?
Anonymous #4 asks:
How large is the space of possible minds? How was its size calculated? Why does EY think that human-like minds don't fill most of this space? What is the evidence for this? What would count as evidence against "a giant Mind Design Space in which human-like minds are a tiny dot"?
Anonymous #3 asks:
Can AIs be anything but utility maximisers? Most existing programs are something like finite-step executors (like The Witcher 3 or a calculator). So what's the difference?
I don’t know why they think so, but here are some people speculating.
Anonymous #2 asks:
A footnote in ‘Planning for AGI and beyond’ says “Many of us think the safest quadrant in this two-by-two matrix is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination”—why do shorter timelines seem more amenable to coordination?
Anonymous #1 asks:
This one is not technical: now that we live in a world in which people have access to systems like ChatGPT, how should I think about my career choices, primarily in the context of working as a computer technician? I'm not a hard worker, and I consider my intelligence to be only a little above average, so I'm not going to pretend I'll become a systems analyst or software engineer. But programming and content creation are being automated more and more, so how should I update my decisions based on that?
Sure, this is a question most people can ask about their intellectual jobs, but I would like to see answers from people in this community, and particularly about a field in which, more than most, employers are going to expect any technician to stay up to date with these tools.
Here’s a form you can use to send questions anonymously. I’ll check for responses and post them as comments.
From 38:58 of the podcast:
So I do think that over time I have come to expect a bit more that things will hang around in a near human place and weird shit will happen as a result. And my failure review where I look back and ask — was that a predictable sort of mistake? I feel like it was to some extent maybe a case of — you’re always going to get capabilities in some order and it was much easier to visualize the endpoint where you have all the capabilities than where you have some of the capabilities. And therefore my visualizations were not dwelling enough on a space we’d predictably in retrospect have entered into later where things have some capabilities but not others and it’s weird. I do think that, in 2012, I would not have called that large language models were the way and the large language models are in some way more uncannily semi-human than what I would justly have predicted in 2012 knowing only what I knew then. But broadly speaking, yeah, I do feel like GPT-4 is already kind of hanging out for longer in a weird, near-human space than I was really visualizing. In part, that’s because it’s so incredibly hard to visualize or predict correctly in advance when it will happen, which is, in retrospect, a bias.
All AGI Safety questions welcome (especially basic ones) [April 2023]
trevor has already mentioned the Stampy project, which is trying to do something very similar to what’s described here and wishes to join forces.
Right now, Stampy just uses language models for semantic search, but the medium-term plan is to use them for text generation as well: people will be able to go to chat.stampy.ai or chat.aisafety.info, type in questions, and have a conversational agent respond. This would probably use a language model fine-tuned by the authors of Cyborgism (probably starting with a weak model as a trial, then increasingly strong ones as they become available), with primary fine-tuning on the alignment literature and hopefully secondary fine-tuning on Stampy content. A question asked in chat would be used to do an extractive search on the literature, then the results would be put into the LM’s context window and it would generate a response.
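For readers unfamiliar with this kind of pipeline, here is a minimal sketch of the question → extractive search → context window → generation flow described above. Everything in it (the toy corpus, the bag-of-words scoring, the `generate` stub) is a placeholder made up for illustration, not the actual Stampy implementation:

```python
"""Illustrative sketch of a retrieve-then-generate question-answering loop."""
from collections import Counter
from math import sqrt

# Tiny stand-in corpus; in practice this would be the alignment literature.
CORPUS = [
    "Instrumental convergence: many goals imply similar subgoals like resource acquisition.",
    "Reward misspecification: optimizing a proxy objective can diverge from intended behavior.",
    "Interpretability research tries to understand the internal computations of neural networks.",
]

def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list[str]:
    """Extractive search: score each passage against the question, keep the top k."""
    q = bag_of_words(question)
    ranked = sorted(CORPUS, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Placeholder for the fine-tuned language model's completion call."""
    return f"[model completion for a prompt of {len(prompt)} characters]"

def answer(question: str) -> str:
    # Put the retrieved passages into the model's context window, then generate.
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

print(answer("Why would an AI acquire resources?"))
```

In the planned system, the scoring would be done with language-model embeddings (the semantic search Stampy already uses) and `generate` would call the fine-tuned model, but the overall shape of the loop is the same.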
Stampy welcomes volunteer developers to help with building the conversational agent and a front end for it, as well as volunteers to help write content.
There’s another issue where “P(doom)” can be read either as the probability that a bad outcome will happen, or the probability that a bad outcome is inevitable. I think the former is usually what’s meant, but if “P(doom)” means “the probability that we’re doomed”, then that suggests the latter as a distracting alternative interpretation.
In terms of “and those people who care will be broad and varied and trying their hands at making movies and doing varied kinds of science and engineering research and learning all about the world while keeping their eyes open for clues about the AI risk conundrum, and being ready to act when a hopeful possibility comes up” we’re doing less well compared to my 2008 hopes. I want to know why and how to unblock it.
I think to the extent that people are failing to be interesting in all the ways you’d hoped they would be, it’s because being interesting in those ways seems to them to have greater costs than benefits. If you want people to see the benefits of being interesting as outweighing the costs, you should make arguments to help them improve their causal models of the costs, and to improve their causal models of the benefits, and to compare the latter to the former. (E.g., what’s the causal pathway by which an hour of thinking about Egyptology or repairing motorcycles or writing fanfic ends up having, not just positive expected usefulness, but higher expected usefulness at the margin than an hour of thinking about AI risk?) But you haven’t seemed very interested in explicitly building out this kind of argument, and I don’t understand why that isn’t at the top of your list of strategies to try.
As far as I know, this is the standard position. See also this FAQ entry. A lot of people sloppily say “the universe” when they mean the observable part of the universe, and that’s what’s causing the confusion.
Stampy’s AI Safety Info is a little like that in that it has 1) pre-written answers, 2) a chatbot under very active development, and 3) a link to a Discord with people who are often willing to explain things. But it could probably be more like that in some ways, e.g. if more people who were willing to explain things were habitually in the Discord.
Also, I plan to post the new monthly basic AI safety questions open thread today (edit: here), which is also a little like that.