Why wouldn’t techniques that work on contemporary AI systems extend to AGI?
If by “techniques that work on contemporary AIs” you mean RLHF/RLAIF, then I don’t know anyone claiming that the robustness and safety of these techniques will “extend to AGI”. I think that AGI labs will soon move in the direction of releasing an agent architecture rather than a bare LLM, and will apply reasoning verification techniques. From OpenAI’s side, see the “Let’s Verify Step by Step” paper. From DeepMind’s side, see this interview with Shane Legg.
What I find incredible is how contributing to the development of existentially dangerous systems is viewed as a morally acceptable course of action within communities that on paper accept that AGI is a threat.
I think this passage (and the whole comment) is unfair because it presents what AGI labs are pursuing (i.e., plans like “superalignment”) as obviously consequentially bad plans. But this is actually very far from obvious. I personally tend to conclude that these are consequentially good plans, conditional on the absence of coordination on a “pause and united, CERN-like effort about AGI and alignment” (and the presence of open-source maximalist and risk-dismissive players like Meta AI).
What I think is bad in the labs’ behaviour (if it is true, which we can’t be sure of, because such coordination efforts might be underway without our knowledge) is that the labs are not trying to coordinate (among themselves, and with the support of governments for the legal basis, monitoring, and enforcement) on a “pause and united, CERN-like effort about AGI and alignment”. Instead, we only see the labs coordinating on and advocating for RSP-like policies.
Another thing that I think is bad in the labs’ behaviour is inadequate funding for safety efforts. Thus, I agree with the call in “Managing AI Risks in the Era of Rapid Progress” for the labs to allocate at least a third of their budgets to safety efforts. These efforts, by the way, shouldn’t be narrowly about AI models. Indeed, this is a major point of Roko’s OP. Investment and progress in computer and system security, and in political, economic, and societal structures, are inadequate. This couldn’t be the responsibility of AGI labs alone, obviously, but I think they have to own at least a part of it. They actually do own it, a little: they fund and support efforts like proof of humanness and UBI studies, and have staff and/or teams that are at least in part working on these issues. But I think AGI labs are doing about an order of magnitude less than they should on these fronts.