What concrete cruxes would you most like to see investigated?
This is an empirical question, so I may be missing some key points. Anyway, here are a few:
- My above points on Ajeya's bio anchors and semi-informative priors
  - Or, put another way: why reject Daniel's post?
- Can deception precede economically transformative AI (TAI)?
  - Possibly offer a prize for formalizing and/or distilling the argument for deception and its constituents (i.e. gradient hacking, situational awareness, and non-myopia)
- How should we model software progress? In particular, what is the right function for modeling short-term returns on investment to algorithmic progress? (See the sketch after this list.)
  - My guess is that most researchers with short timelines think, as I do, that there's lots of low-hanging fruit here. Funders may underestimate how widespread this opinion is, since most safety researchers avoid discussing the details to prevent accelerating capabilities.
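For the software-progress item above, a worked sketch may help. This is a toy comparison, not a model from the literature: two made-up candidate functional forms for how algorithmic efficiency responds to cumulative R&D investment (a logarithmic form with sharply diminishing returns, and a constant-elasticity power law), with all parameter values chosen purely for illustration.

```python
from math import log1p

# Two hypothetical functional forms for returns to algorithmic R&D.
# All parameter values are illustrative placeholders, not fitted to data.

def efficiency_log(investment, k=1.0, base=1.0):
    """Sharply diminishing returns: efficiency grows logarithmically
    with cumulative investment."""
    return base + k * log1p(investment)

def efficiency_power(investment, alpha=0.5, base=1.0):
    """Constant-elasticity returns: each 1% of extra cumulative
    investment buys roughly the same percentage efficiency gain."""
    return base * (1.0 + investment) ** alpha

# Short-term marginal return per unit of extra investment, at several
# levels of cumulative investment (arbitrary units).
for inv in [1.0, 10.0, 100.0, 1000.0]:
    eps = 1e-3
    m_log = (efficiency_log(inv + eps) - efficiency_log(inv)) / eps
    m_pow = (efficiency_power(inv + eps) - efficiency_power(inv)) / eps
    print(f"investment={inv:7.1f}  marginal(log)={m_log:.5f}  "
          f"marginal(power)={m_pow:.5f}")
```

Which of these forms (or something else entirely) best fits reality is exactly the open question; under the power-law form, "lots of low-hanging fruit" corresponds to a larger alpha.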
To elaborate on what Jacob said:
A long time ago I spent a few months reading and thinking about Ajeya's bio anchors report. I played around with the spreadsheet version of it, trying out all sorts of different settings, and in particular changing variables to values that I thought were more plausible.
As a result I figured out what the biggest cruxes were between me and Ajeya—the differences in variable-settings that led to the largest differences in our timelines.
The biggest one was (unsurprisingly, in retrospect) the difference in where we put our probability mass for the training requirements distribution. That in turn broke down into several sub-cruxes.
I wrote Fun with +12 OOMs to draw everyone's attention to that big uber-crux. Besides pointing it out, the post operationalized and explained it so that people didn't have to be super familiar with Ajeya's report to understand what the debate was about. I also gave five very concrete examples of things you could do with +12 OOMs, which people could then argue about in the service of answering the uber-crux.
So, what I would like to see now is what I hoped the post would inspire: a vigorous debate over questions like "What are the reasons to think OmegaStar would constitute AGI/TAI/etc., and what are the reasons to think it wouldn't?", "What about Crystal Nights?", and "What about a smaller version of OmegaStar that was only +6 OOMs instead of +12? Is that significantly less likely to work, or is the list of reasons why it might or might not work basically the same?" All in the service of answering the Big Crux: the probability that +12 OOMs would be enough, and more generally, what the probability distribution over OOMs should be.
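To make the Big Crux concrete, here is a toy calculation. It assumes, purely for illustration, a normal distribution over how many OOMs of compute beyond roughly 2020 levels would be needed for TAI; the mean and spread below are made-up placeholders, not numbers from the bio anchors report or anyone's actual view.

```python
from math import erf, sqrt

# Hypothetical distribution over "OOMs of compute beyond ~2020 levels
# needed for TAI". Both parameters are placeholders for illustration.
MEAN_OOMS = 10.0  # assumed median requirement
SD_OOMS = 4.0     # assumed spread

def p_enough(extra_ooms, mean=MEAN_OOMS, sd=SD_OOMS):
    """P(requirement <= extra_ooms): the standard normal CDF
    evaluated at the z-score of extra_ooms."""
    z = (extra_ooms - mean) / sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

for ooms in [6, 12]:
    print(f"P(+{ooms} OOMs would be enough) = {p_enough(ooms):.2f}")
```

With these placeholder parameters, +12 OOMs suffices about 69% of the time and +6 OOMs about 16% of the time; a full timelines model then amounts to asking when projected spending, hardware price-performance, and algorithmic progress jointly deliver the required OOMs.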