I think it bites somewhat on work engaging with much younger age groups (e.g. high school students). Outside of that, I think empirically the turnaround time I’ve seen between people getting engaged and doing useful work is really short in many cases (like < 1 year, even for some university students), such that even in very short timelines, most capacity-building work still looks very good.
abergal
The case for AI safety capacity-building work
Funding for programs and events on global catastrophic risk, effective altruism, and other topics
Funding for work that builds capacity to address risks from transformative AI
Thanks for writing this up—at least for myself, I think I agree with the majority of this, and it articulates some important parts of how I live my life in ways that I hadn’t previously made explicit for myself.
Updates to Open Phil’s career development and transition funding program
The Long-Term Future Fund is looking for a full-time fund chair
Long-Term Future Fund Ask Us Anything (September 2023)
Long-Term Future Fund: April 2023 grant recommendations
I think your first point basically covers why—people are worried about alignment difficulties in superhuman systems in particular (because those are the dangerous systems that can cause existential failures). I think a lot of current RLHF work is focused on providing reward signals to current systems in ways that don’t directly address the problem of “how do we provide reward signals for behaviors whose consequences are too complicated for humans to understand”.
[AMA] Announcing Open Phil’s University Group Organizer and Century Fellowships [x-post]
Chris Olah wrote this topic prompt (with some feedback from me (Asya) and Nick Beckstead). We didn’t want to commit him to being responsible for this post or responding to comments on it, so we submitted this on his behalf. (I’ve changed the by-line to be more explicit about this.)
Truthful and honest AI
Interpretability
Techniques for enhancing human feedback
Measuring and forecasting risks
Request for proposals for projects in AI alignment that work with deep learning systems
Thanks for writing this! Would “fine-tune on some downstream task and measure the accuracy on that task before and after fine-tuning” count as measuring misalignment as you’re imagining it? My sense is that there might be a bunch of existing work like that.
This RFP is an experiment for us, and we don’t yet know if we’ll be doing more of them in the future. I think we’d be open to including research directions that we think are promising and that apply equally well to both DL and non-DL systems—I’d be interested in hearing any particular suggestions you have.
(We’d also be happy to fund particular proposals in the research directions we’ve already listed that apply to both DL and non-DL systems, though we will be evaluating them on how well they address the DL-focused challenges we’ve presented.)
I’ve heard variants of this argument, and I overall haven’t found them that persuasive, for reasons close to the ones Habryka gives. I think if you carve things up such that capacity-building work has been responsible for e.g. speeding up the creation of frontier AI labs, you should also credit it for the broader movement focused on catastrophic risks, and my intuition is that the counterfactual world without that movement would be worse off overall. I don’t buy that the acceleration has been substantial enough that the unaccelerated world would have bought a lot of time for societal improvements useful for addressing catastrophic risks; instead, it feels to me like the unaccelerated world would be facing these risks more blindly and with less time to usefully prepare.
I also think that, given the massive amount of non-GCR-related interest and resources in AI now, the forward-looking acceleration effects seem likely to be much smaller than any historical effect. I generally think the ratio of “meaningfully adding to the talent pool of people working on catastrophic risks” to “meaningfully accelerating AI capabilities” for most capacity-building programs will look extremely favorable.