Steven K
steven0461
Announcing AISafety.info’s Write-a-thon (June 16-18) and Second Distillation Fellowship (July 3-October 2)
All AGI Safety questions welcome (especially basic ones) [May 2023]
I tried to answer this here
Anonymous #7 asks:
I am familiar with the concept of a utility function, which assigns numbers to possible world states and considers larger numbers to be better. However, I am unsure how to apply this function in order to make decisions that take time into account. For example, we may be able to achieve a world with higher utility over a longer period of time, or a world with lower utility but in a shorter amount of time.
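(One common way to formalize the trade-off this question points at, though not necessarily the framing the asker has in mind, is to define utility over whole trajectories of world states rather than over a single end state, often with a discount factor that trades off utility now against utility later:

$$U(s_0, s_1, s_2, \ldots) = \sum_{t=0}^{\infty} \gamma^t \, u(s_t), \qquad 0 < \gamma \le 1$$

Under this formalization, "higher utility over a longer period" versus "lower utility sooner" is compared by summing the discounted per-step utilities along each possible trajectory and picking the trajectory with the larger total.)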
Anonymous #6 asks:
Why hasn’t an alien superintelligence within our light cone already killed us?
Anonymous #5 asks:
How can programmers build something without understanding its inner workings? Are they closer to biologists cross-breeding organisms than to car designers?
Anonymous #4 asks:
How large is the space of possible minds? How was its size calculated? Why does EY think that human-like minds don't fill most of this space? What is the evidence for this? What would count as evidence against "a giant Mind Design Space in which human-like minds are a tiny dot"?
Anonymous #3 asks:
Can AIs be anything but utility maximisers? Most existing programs are something like finite-step executors (like The Witcher 3 or a calculator). So what's the difference?
I don’t know why they think so, but here are some people speculating.
Anonymous #2 asks:
A footnote in ‘Planning for AGI and beyond’ says “Many of us think the safest quadrant in this two-by-two matrix is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination”—why do shorter timelines seem more amenable to coordination?
Anonymous #1 asks:
This one is not technical: now that we live in a world in which people have access to systems like ChatGPT, how should I think about my career choices, primarily in the context of working as a computer technician? I'm not a hard worker, and I consider my intelligence to be only a little above average, so I'm not going to pretend I'll become a systems analyst or software engineer. But programming and content creation are being automated more and more, so how should I update my decisions based on that?
Sure, this is a question most people can ask about their intellectual jobs, but I would like to see answers from people in this community, and particularly about a field in which, more than most, employers are going to expect any technician to stay up to date with these tools.
Here’s a form you can use to send questions anonymously. I’ll check for responses and post them as comments.
From 38:58 of the podcast:
So I do think that over time I have come to expect a bit more that things will hang around in a near human place and weird shit will happen as a result. And my failure review where I look back and ask — was that a predictable sort of mistake? I feel like it was to some extent maybe a case of — you’re always going to get capabilities in some order and it was much easier to visualize the endpoint where you have all the capabilities than where you have some of the capabilities. And therefore my visualizations were not dwelling enough on a space we’d predictably in retrospect have entered into later where things have some capabilities but not others and it’s weird. I do think that, in 2012, I would not have called that large language models were the way and the large language models are in some way more uncannily semi-human than what I would justly have predicted in 2012 knowing only what I knew then. But broadly speaking, yeah, I do feel like GPT-4 is already kind of hanging out for longer in a weird, near-human space than I was really visualizing. In part, that’s because it’s so incredibly hard to visualize or predict correctly in advance when it will happen, which is, in retrospect, a bias.
All AGI Safety questions welcome (especially basic ones) [April 2023]
trevor has already mentioned the Stampy project, which is trying to do something very similar to what’s described here and wishes to join forces.
Right now, Stampy just uses language models for semantic search, but the medium-term plan is to use them for text generation as well: people will be able to go to chat.stampy.ai or chat.aisafety.info, type in questions, and have a conversational agent respond. This would probably use a language model fine-tuned by the authors of Cyborgism (probably starting with a weak model as a trial, then increasingly strong ones as they become available), with primary fine-tuning on the alignment literature and hopefully secondary fine-tuning on Stampy content. A question asked in chat would be used to do an extractive search on the literature, then the results would be put into the LM’s context window and it would generate a response.
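For readers unfamiliar with this kind of pipeline, here is a minimal sketch of the question → extractive search → context window → generation flow described above. Everything in it (the toy corpus, the bag-of-words scoring, the `generate` stub) is a placeholder made up for illustration, not the actual Stampy implementation:

```python
"""Illustrative sketch of a retrieve-then-generate question-answering loop."""
from collections import Counter
from math import sqrt

# Tiny stand-in corpus; in practice this would be the alignment literature.
CORPUS = [
    "Instrumental convergence: many goals imply similar subgoals like resource acquisition.",
    "Reward misspecification: optimizing a proxy objective can diverge from intended behavior.",
    "Interpretability research tries to understand the internal computations of neural networks.",
]

def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list[str]:
    """Extractive search: score each passage against the question, keep the top k."""
    q = bag_of_words(question)
    ranked = sorted(CORPUS, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Placeholder for the fine-tuned language model's completion call."""
    return f"[model completion for a prompt of {len(prompt)} characters]"

def answer(question: str) -> str:
    # Put the retrieved passages into the model's context window, then generate.
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)

print(answer("Why would an AI acquire resources?"))
```

In the planned system, the scoring would be done with language-model embeddings (the semantic search Stampy already uses) and `generate` would call the fine-tuned model, but the overall shape of the loop is the same.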
Stampy welcomes volunteer developers to help with building the conversational agent and a front end for it, as well as volunteers to help write content.
There’s another issue where “P(doom)” can be read either as the probability that a bad outcome will happen, or the probability that a bad outcome is inevitable. I think the former is usually what’s meant, but if “P(doom)” means “the probability that we’re doomed”, then that suggests the latter as a distracting alternative interpretation.
In terms of “and those people who care will be broad and varied and trying their hands at making movies and doing varied kinds of science and engineering research and learning all about the world while keeping their eyes open for clues about the AI risk conundrum, and being ready to act when a hopeful possibility comes up” we’re doing less well compared to my 2008 hopes. I want to know why and how to unblock it.
I think to the extent that people are failing to be interesting in all the ways you’d hoped they would be, it’s because being interesting in those ways seems to them to have greater costs than benefits. If you want people to see the benefits of being interesting as outweighing the costs, you should make arguments to help them improve their causal models of the costs, and to improve their causal models of the benefits, and to compare the latter to the former. (E.g., what’s the causal pathway by which an hour of thinking about Egyptology or repairing motorcycles or writing fanfic ends up having, not just positive expected usefulness, but higher expected usefulness at the margin than an hour of thinking about AI risk?) But you haven’t seemed very interested in explicitly building out this kind of argument, and I don’t understand why that isn’t at the top of your list of strategies to try.
As far as I know, this is the standard position. See also this FAQ entry. A lot of people sloppily say “the universe” when they mean the observable part of the universe, and that’s what’s causing the confusion.
Stampy’s AI Safety Info is a little like that in that it has 1) pre-written answers, 2) a chatbot under very active development, and 3) a link to a Discord with people who are often willing to explain things. But it could probably be more like that in some ways, e.g. if more people who were willing to explain things were habitually in the Discord.
Also, I plan to post the new monthly basic AI safety questions open thread today (edit: here), which is also a little like that.