the chance that [...] alignment is so easy that standard ML techniques work
I think this is probably true for LLM AGIs at least in the no-extinction sense, but it has essentially no bearing on transitive AI risk (the danger of AI tech that comes after the first AGIs, developed by them or their successors). Consequently P(extinction) by 2100 only improves through alignment of the first AGIs if they manage to set up reliable extinction risk governance; otherwise they are just going to build some more AGIs that don’t have the unusual property of being aligned by default.
And there is no indication that LLM AGIs would be in a much better position than we are to delay AGI capability research until alignment theory makes it safe, though the world order disruption from the change in serial speed of thought probably gives them a chance to set this up.
Presumably we will build ML AGIs because they are safe and they won’t build unsafe non-ML AGI for the same reason we didn’t—because it wouldn’t be safe. So the idea is that alignment is so easy it is actually transitive.
Presumably we will build ML AGIs because they are safe
I don’t see anything in the structure of humanity’s AGI-development process that would ensure this property. LLM human imitations are only plausibly aligned because they are imitations of humans. There are other active lines of research vying with them for the first AGI, with no hope for their safety.
For the moment, LLM characters have the capability advantage of wielding human faculties, not needing to reinvent alternatives for them from scratch. This is an advantage for crossing the AGI threshold, which humans already crossed, but not for improving further than that. There is nothing in this story that predicates the outcome on safety.