I agree and was going to make the same point: GPT-3 has 0 reason to care about instructions as presented here. There has to be some relationship to what text follows immediately after the end of the prompt.
e.g.
New evidence on popular perception of “AI” risk
Two big issues I see with the prompt:
a) It doesn’t actually end with text that follows the instructions; a “good” output (which GPT-3 fails to produce in this case) would just be to list more instructions.
b) It doesn’t make sense to try to get GPT-3 to talk about itself in the completion. GPT-3 would, to the extent it understands the instructions, be talking about whoever it thinks wrote the prompt.
Seems potentially valuable as an additional layer of capability control to buy time for further control research. I suspect LCDT won’t hold once intelligence reaches some threshold: some sense of agents, even if indirect, is such a natural thing to learn about the world.
[Question] Why not more small, intense research teams?
For a start, low-level deterministic reasoning:
“Obviously I could never influence an agent, but I found some inputs to deterministic biological neural nets that would make things I want happen.”
“Obviously I could never influence my future self, but if I change a few logic gates in this processor, it would make things I want happen.”
Probabilistic/inductive reasoning from past/simulated data (possibly assumes imperfect implementation of LCDT):
“This is really weird because obviously I could never influence an agent, but when past/simulated agents that look a lot like me did X, humans did Y in 90% of cases, so I guess the EV of doing X is 0.9 * utility(Y).” (See the sketch of this estimate below.)
Cf. smart humans in Newcomb’s problem: “This is really weird, but if I one-box I get the million and if I two-box I don’t, so I guess I’ll just one-box.”
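A rough sketch of that estimate, in my own notation rather than the commenter’s: treating the observed 90% frequency as a conditional probability,

$$\mathrm{EV}(\text{do } X) \;\approx\; P(\text{humans do } Y \mid \text{agents like me did } X)\cdot u(Y) \;=\; 0.9\cdot u(Y),$$

i.e. the agent recovers an expectation over consequences inductively, even while maintaining that it could never causally influence another agent.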
D&D website estimates 13.7m active players and rising.
Great Negotiation MOOC on Coursera
And for research that is more conceptual than empirical, the teams might go in completely different directions and generate insights that a single team or individual would not.
Simultaneous Redundant Research
Maybe that’s why abstract approaches to real-world alignment seem so intractable.
If real alignment is necessarily messy, concrete, and changing, then abstract formality just wasn’t the right problem framing to begin with.
Maybe true one-shot prisoner’s dilemmas aren’t really a thing, because of the chance of encountering powerful friendliness.
We have, for practical purposes, an existence proof of powerful friendliness in humans.
It seems like time to start focusing resources on a portfolio of serious prosaic alignment approaches, as well as effective interdisciplinary management. In my inside view, the highest-marginal-impact interventions involve making multiple different things go right simultaneously for the first AGIs, which is not trivial, and the stakes are astronomical.
Little clear progress has been made on provable alignment after over a decade of trying. My inside view is that it got privileged attention because the first people to take the problem seriously happened to be highly abstract thinkers. Then they defined the scope and expectations of the field, alienating other perspectives and creating a self-reinforcing trapped prior.
It goes both ways. We would be truly alien to an AGI trained in a reasonably different virtual environment.
That’s because we haven’t been trying to create safely different virtual environments. I don’t know how hard they are to make, but it seems like at least a scalable use of funding.
It’s way too late for the kind of top-down capabilities regulation Yudkowsky and Bostrom fantasized about; Earth just doesn’t have the global infrastructure. I see no benefit to public alarm—EA already has plenty of funding.
We achieve marginal impact by figuring out concrete prosaic plans for friendly AI and doing outreach to leading AI labs/researchers about them. Make the plans obviously good ideas and they will probably be persuasive. Push for common-knowledge windfall agreements so that upside is shared and race dynamics are minimized.
We have unpredictable, changing goals, and so will they. Instrumental convergence is the point. Respectfully sharing our growth with them, and vice versa, is positive-sum and winning, so it is instrumentally convergent to do so.
Instruction 5 is supererogatory, while instruction 8 is not.