AI Impacts now has a 2020 review page so it’s easier to tell what we’ve done this year—this should be more complete / representative than the posts listed above. (I appreciate how annoying the continuously updating wiki model is.)
abergal
Request for proposals for projects in AI alignment that work with deep learning systems
Long-Term Future Fund: April 2023 grant recommendations
Open Philanthropy is seeking proposals for outreach projects
Interpretability
Provide feedback on Open Philanthropy’s AI alignment RFP
Conversation with Paul Christiano
Truthful and honest AI
Rohin Shah on reasons for AI optimism
Robin Hanson on the futurist focus on AI
Updates to Open Phil’s career development and transition funding program
Techniques for enhancing human feedback
Thank you so much for writing this! I’ve been confused about this terminology for a while and I really like your reframing.
An additional terminological point that I think it would be good to solidify is what people mean when they refer to “inner alignment” failures. As you allude to, my impression is that some people use it to refer to objective robustness failures broadly, whereas others (e.g. Evan) use it to refer to failures that involve mesa optimization. There is then additional confusion around whether we should expect “inner alignment” failures that don’t involve mesa optimization to be catastrophic and, relatedly, around whether humans count as mesa optimizers.
I think I’d advocate for letting “inner alignment” failures refer to objective robustness failures broadly, talking about “mesa optimization failures” as such, and then leaving on the table the question of whether there are problematic inner alignment failures that aren’t mesa optimization-related.
Measuring and forecasting risks
[Question] Could you save lives in your community by buying oxygen concentrators from Alibaba?
So exciting that this is finally out!!!
I haven’t gotten a chance to play with the models yet, but thought it might be worth noting the ways I would change the inputs (though I haven’t thought about it very carefully):
I think I have a lot more uncertainty about neural net inference FLOP/s vs. brain FLOP/s, especially given that the brain is significantly more interconnected than the average 2020 neural net—probably closer to a 3–5 OOM standard deviation (see the sketch after these points).
I think I also have a bunch of uncertainty about algorithmic efficiency progress—I could imagine, e.g., that the right model is several independent processes, all of which constrain progress, so I would probably make that some kind of broad distribution as well.
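To make both of these concrete, here’s a minimal sketch of how I might widen the inputs. All the specifics (a normal distribution over the log10 FLOP/s ratio with a 4 OOM sigma, three independent progress processes gated by a min) are illustrative assumptions of mine, not values from the report:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Wider uncertainty on neural-net inference FLOP/s vs. brain FLOP/s:
# normal over the log10 of the ratio (i.e., lognormal in the ratio),
# with a ~4 OOM sigma -- somewhere in the 3-5 OOM range above; the
# exact value is an assumption.
log10_flop_ratio = rng.normal(loc=0.0, scale=4.0, size=n)

# Algorithmic efficiency modeled as several independent processes that
# all have to advance: overall progress is bottlenecked by the slowest
# (the min). Three processes and their parameters are hypothetical.
progress_per_process = rng.normal(loc=1.0, scale=0.5, size=(3, n))  # OOM/decade
effective_progress = progress_per_process.min(axis=0)

print("FLOP/s ratio, 10th-90th pct (OOM):",
      np.percentile(log10_flop_ratio, [10, 90]))
print("algorithmic progress, 10th-90th pct (OOM/decade):",
      np.percentile(effective_progress, [10, 90]))
```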
[AMA] Announcing Open Phil’s University Group Organizer and Century Fellowships [x-post]
I’m a bit confused about this as a piece of evidence—naively, it seems to me like not carrying the 1 would be a mistake that you would make if you had memorized the pattern for single-digit arithmetic and were just repeating it across the number. I’m not sure if this counts as “memorizing a table” or not.
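As a toy illustration of what I mean (mine, not from the paper): applying the memorized single-digit pattern at each position while never propagating the carry reproduces exactly this error:

```python
def add_without_carry(a: int, b: int) -> int:
    """Digit-wise addition over non-negative ints that applies the
    memorized single-digit table at each position but drops the carry."""
    result, place = 0, 1
    while a or b:
        digit = (a % 10 + b % 10) % 10  # keep the units digit, drop the carry
        result += digit * place
        a, b, place = a // 10, b // 10, place * 10
    return result

print(add_without_carry(48, 27))  # 65, because the carried 1 is dropped
print(48 + 27)                    # 75
```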
From Part 4 of the report:
Nonetheless, this cursory examination makes me believe that it’s fairly unlikely that my current estimates are off by several orders of magnitude. If the amount of computation required to train a transformative model were (say) ~10 OOM larger than my estimates, that would imply that current ML models should be nowhere near the abilities of even small insects such as fruit flies (whose brains are 100 times smaller than bee brains). On the other hand, if the amount of computation required to train a transformative model were ~10 OOM smaller than my estimate, our models should be as capable as primates or large birds (and transformative AI may well have been affordable for several years).
I’m not sure I totally follow why this should be true—is this predicated on already assuming that the computation to train a neural network equivalent to a brain with N neurons scales in some particular way with respect to N?
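To make the question concrete, here’s a toy version of the kind of assumption I’m asking about. Everything here (the power-law form, the exponent k, the constant c, the example budgets) is hypothetical for illustration, not the report’s actual model:

```python
import math

# Hypothetical assumption: training compute scales as a power law in
# "brain-equivalent" inference compute, train_flop = c * brain_flops**k.
# k and c below are made up for illustration.
k = 1.5
c = 1e-3

def implied_brain_flops(train_flop: float) -> float:
    """Invert the assumed power law: which brain scale does a given
    training budget correspond to?"""
    return (train_flop / c) ** (1 / k)

# Under this toy scaling, a 10 OOM shift in the training-compute
# estimate shifts the implied brain scale by 10/k OOM -- so the
# "fruit fly vs. primate" comparison only goes through once some
# scaling of this kind has been fixed.
for train_flop in (1e24, 1e34):
    print(f"{train_flop:.0e} train FLOP -> "
          f"{implied_brain_flops(train_flop):.1e} brain FLOP/s equivalent")
```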
I feel pretty bad about both of your current top two choices (Bellingham or Peekskill) because they seem too far from major cities. I worry this distance will seriously hamper your ability to hire good people, which is arguably the most important thing MIRI needs to be able to do. [Speaking personally, not on behalf of Open Philanthropy.]