An AGI might not literally search over all possible policies, but instead employ some heuristics to get a good approximation of the best policy. But then this is a capabilities shortcoming, not misalignment.
...
Coming back to our scenario: if our model only finds an approximate best policy, it seems very unlikely that this policy would consistently bring about some misaligned goal.
On my model this isn't a capabilities failure, because there are demons in imperfect search: the output of a heuristic search that approximates the best policy wouldn't just be something close to the global optimum. It would also be something that has been shaped by whatever demons (which don't even have to be "optimizers", necessarily) emerged through the selection pressures of the search process itself.
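To make the weaker version of this point concrete: here is a minimal toy sketch (my own, not from PreDCA, and far simpler than a real "demon") showing that what an imperfect search returns is an artifact of its search dynamics, not necessarily anything close to the global optimum. A greedy hill climb over a two-peak landscape settles on whichever peak its dynamics steer it toward, while exhaustive search finds the true best point.

```python
# Toy landscape with two peaks: a short, wide one at x=20 and a
# tall, narrow one at x=90. All numbers here are arbitrary choices
# made purely for illustration.
def fitness(x):
    return max(0, 10 - abs(x - 20)) * 3 + max(0, 5 - abs(x - 90)) * 10

def hill_climb(x, steps=1000):
    # Greedy local search: move to the best neighbor until no
    # neighbor improves on the current point.
    for _ in range(steps):
        best = max([x - 1, x, x + 1], key=fitness)
        if best == x:
            return x  # stuck at a local optimum
        x = best
    return x

global_opt = max(range(101), key=fitness)  # exhaustive search finds x=90
local_opt = hill_climb(12)                 # greedy search ends at x=20

print(global_opt, local_opt)  # → 90 20
```

The greedy searcher's answer is determined by what its dynamics select for, not by proximity to the optimum; a real demon would be a stronger, self-reinforcing version of this, where structures that bias the search in their own favor get amplified.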
Maybe I'm still misunderstanding PreDCA and it somehow rules out this possibility, but as far as I can tell it only does so in the limit of perfect search.