This idea leads Sahil to predict, for example, that LLMs will be too “stuck in simulation” to engage very willfully in their own self-defense.
What sort of evidence would convince FGF/Sahil that LLMs are able to engage willfully in their own self-defense? Presumably the #keep4o stuff is not sufficient, so what would be? I kinda get the feeling that FGF at least would keep saying “Well no, it’s the humans who care about it who are doing the important work.” all the way up until all the humans are dead, as long as humans are involved on its side at all.
Sahil’s version of FGF makes many empirical predictions. In the spring of this year, we’re going to focus on listing them, so there’ll be a better answer to your question. In the context of the OP, one such prediction is that sim-Abram will have difficulty engaging in the outside world, even with some contact with real-Abram (but obviously this prediction isn’t very near-term).
With respect to your specific concern, I have a similar worry that Sahil’s FGF won’t be readily falsified by practical autonomy experiments. Full FOOM-level autonomy requires AI to replace the whole global economy; anything less will still involve “dependence on humans”.
However, it also seems plausible to me that Sahil’s FGF would make some falsifiable prediction about near-term performance on Vending-Bench 2.