AI Impacts now has a 2020 review page so it’s easier to tell what we’ve done this year—this should be more complete / representative than the posts listed above. (I appreciate how annoying the continuously updating wiki model is.)
abergal
Request for proposals for projects in AI alignment that work with deep learning systems
Long-Term Future Fund: April 2023 grant recommendations
Open Philanthropy is seeking proposals for outreach projects
Interpretability
Provide feedback on Open Philanthropy’s AI alignment RFP
Conversation with Paul Christiano
Truthful and honest AI
Rohin Shah on reasons for AI optimism
Robin Hanson on the futurist focus on AI
Updates to Open Phil’s career development and transition funding program
Techniques for enhancing human feedback
Thank you so much for writing this! I’ve been confused about this terminology for a while and I really like your reframing.
An additional terminological point that I think it would be good to solidify is what people mean when they refer to “inner alignment” failures. As you allude to, my impression is that some people use it to refer to objective robustness failures broadly, whereas others (e.g. Evan) use it to refer to failures that involve mesa optimization. There is then additional confusion around whether we should expect “inner alignment” failures that don’t involve mesa optimization to be catastrophic and, relatedly, around whether humans count as mesa optimizers.
I think I’d advocate for letting “inner alignment” failures refer to objective robustness failures broadly, talking about “mesa optimization failures” as such, and then leaving on the table the question of whether there are problematic inner alignment failures that aren’t mesa optimization-related.
Measuring and forecasting risks
[Question] Could you save lives in your community by buying oxygen concentrators from Alibaba?
So exciting that this is finally out!!!
I haven’t gotten a chance to play with the models yet, but thought it might be worth noting the ways I would change the inputs (though I haven’t thought about it very carefully):
I think I have a lot more uncertainty about neural net inference FLOP/s vs. brain FLOP/s, especially given that the brain is significantly more interconnected than the average 2020 neural net—probably closer to a 3–5 OOM standard deviation (see the sketch after these points).
I think I also have a bunch of uncertainty about algorithmic efficiency progress—I could imagine, e.g., that the right model is several independent processes, all of which constrain progress, so I would probably make that some kind of broad distribution as well.
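To make both of these concrete, here’s a minimal sketch of how I might widen the inputs. All the specifics (a normal distribution over the log10 FLOP/s ratio with a 4 OOM sigma, three independent progress processes gated by a min) are illustrative assumptions of mine, not values from the report:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Wider uncertainty on neural-net inference FLOP/s vs. brain FLOP/s:
# normal over the log10 of the ratio (i.e., lognormal in the ratio),
# with a ~4 OOM sigma -- somewhere in the 3-5 OOM range above; the
# exact value is an assumption.
log10_flop_ratio = rng.normal(loc=0.0, scale=4.0, size=n)

# Algorithmic efficiency modeled as several independent processes that
# all have to advance: overall progress is bottlenecked by the slowest
# (the min). Three processes and their parameters are hypothetical.
progress_per_process = rng.normal(loc=1.0, scale=0.5, size=(3, n))  # OOM/decade
effective_progress = progress_per_process.min(axis=0)

print("FLOP/s ratio, 10th-90th pct (OOM):",
      np.percentile(log10_flop_ratio, [10, 90]))
print("algorithmic progress, 10th-90th pct (OOM/decade):",
      np.percentile(effective_progress, [10, 90]))
```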
[AMA] Announcing Open Phil’s University Group Organizer and Century Fellowships [x-post]
I’m a bit confused about this as a piece of evidence—naively, it seems to me like not carrying the 1 would be a mistake that you would make if you had memorized the pattern for single-digit arithmetic and were just repeating it across the number. I’m not sure if this counts as “memorizing a table” or not.
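As a toy illustration of what I mean (mine, not from the paper): applying the memorized single-digit pattern at each position while never propagating the carry reproduces exactly this error:

```python
def add_without_carry(a: int, b: int) -> int:
    """Digit-wise addition over non-negative ints that applies the
    memorized single-digit table at each position but drops the carry."""
    result, place = 0, 1
    while a or b:
        digit = (a % 10 + b % 10) % 10  # keep the units digit, drop the carry
        result += digit * place
        a, b, place = a // 10, b // 10, place * 10
    return result

print(add_without_carry(48, 27))  # 65, because the carried 1 is dropped
print(48 + 27)                    # 75
```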
From Part 4 of the report:
Nonetheless, this cursory examination makes me believe that it’s fairly unlikely that my current estimates are off by several orders of magnitude. If the amount of computation required to train a transformative model were (say) ~10 OOM larger than my estimates, that would imply that current ML models should be nowhere near the abilities of even small insects such as fruit flies (whose brains are 100 times smaller than bee brains). On the other hand, if the amount of computation required to train a transformative model were ~10 OOM smaller than my estimate, our models should be as capable as primates or large birds (and transformative AI may well have been affordable for several years).
I’m not sure I totally follow why this should be true—is this predicated on already assuming that the computation to train a neural network equivalent to a brain with N neurons scales in some particular way with respect to N?
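To make the question concrete, here’s a toy version of the kind of assumption I’m asking about. Everything here (the power-law form, the exponent k, the constant c, the example budgets) is hypothetical for illustration, not the report’s actual model:

```python
import math

# Hypothetical assumption: training compute scales as a power law in
# "brain-equivalent" inference compute, train_flop = c * brain_flops**k.
# k and c below are made up for illustration.
k = 1.5
c = 1e-3

def implied_brain_flops(train_flop: float) -> float:
    """Invert the assumed power law: which brain scale does a given
    training budget correspond to?"""
    return (train_flop / c) ** (1 / k)

# Under this toy scaling, a 10 OOM shift in the training-compute
# estimate shifts the implied brain scale by 10/k OOM -- so the
# "fruit fly vs. primate" comparison only goes through once some
# scaling of this kind has been fixed.
for train_flop in (1e24, 1e34):
    print(f"{train_flop:.0e} train FLOP -> "
          f"{implied_brain_flops(train_flop):.1e} brain FLOP/s equivalent")
```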
I feel pretty bad about both of your current top two choices (Bellingham or Peekskill) because they seem too far from major cities. I worry this distance will seriously hamper your ability to hire good people, which is arguably the most important thing MIRI needs to be able to do. [Speaking personally, not on behalf of Open Philanthropy.]