My intuition is that these octopuses would be pretty unlikely to be scheming against us, for the following reasons:
Dogs are pretty nice, and it’s apparently easy to domesticate foxes within very few generations. This suggests that “be nice to humans” is simple for animals to learn, and in particular simpler than “be as intelligent as human AI researchers”. So assuming the octopuses would first learn to like humans, it’s unclear to me why they would later start to dislike humans. Maybe your intuition is that they would later realize that they are being treated unfairly, which causes frustration that is not present in earlier generations?
If the octopuses speak octopese and English, I think it wouldn’t be too hard to get some octopuses to translate for us. For example, we could ask one octopus to communicate some information to a second octopus using octopese. We can check whether that worked by asking the second octopus to translate back to English. Now we have a pair of (octopese text, English text) for which we know the translation. Next we ask a third octopus to translate octopese to English, and check its performance on some known examples. It would be possible to scheme against this test if the octopuses are amazingly skilled at coordination. But I suspect human AI researchers would fail this test if a foreign secret service used a similar setup to get them to translate between foreignese and English. So I think we should assume that such coordination is hard, and that a small individual failure rate would likely be enough for the group to fail (we need just one octopus that translates faithfully). (Note that this argument hinges on the existence of a language that is understood by different octopuses, so it doesn’t translate 1:1 to AIs.)
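To make the “we need just one octopus that translates faithfully” point concrete, here is a minimal back-of-the-envelope sketch. It assumes (and this is my assumption, not something established above) that each octopus independently breaks from the scheme with some fixed probability; the function names and the numbers are purely illustrative.

```python
def prob_some_faithful(n_translators: int, p_faithful: float) -> float:
    """Probability that at least one of n translators translates faithfully.

    Assumes each translator independently defects from the collusion
    (i.e. translates faithfully) with probability p_faithful.
    """
    if n_translators < 0:
        raise ValueError("n_translators must be non-negative")
    if not 0.0 <= p_faithful <= 1.0:
        raise ValueError("p_faithful must be in [0, 1]")
    # The scheme survives only if every single translator keeps colluding.
    return 1.0 - (1.0 - p_faithful) ** n_translators


def min_translators(p_faithful: float, target: float) -> int:
    """Smallest number of translators needed so that the chance of at
    least one faithful translation reaches `target` (same independence
    assumption as above)."""
    if not 0.0 < p_faithful <= 1.0:
        raise ValueError("p_faithful must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 0
    while prob_some_faithful(n, p_faithful) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical numbers: a 5% individual "failure to collude" rate.
    print(prob_some_faithful(20, 0.05))
    print(min_translators(0.05, 0.9))
```

Under these assumptions, a 5% individual rate of breaking from the scheme gives roughly a 64% chance that at least one of 20 octopuses translates faithfully, and 45 octopuses are enough to push that above 90%. Since we already hold known (octopese, English) pairs, a faithful translator could then be picked out by checking against them. If the octopuses’ choices are strongly correlated, the independence assumption fails and these numbers are too optimistic.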
There is a difference though: dogs and humans have coevolved for 10,000 years. If you breed foxes, you may quickly get dog-like looks but not dog-like behavior. But let’s assume you can do that faster. It still makes a difference whether you breed them in isolation or while they socialize with humans. You can see the difference with dogs and cats. Dogs and humans had to cooperate to succeed at hunting and herding. Cats didn’t. Dogs are social. They feel social emotions such as love, loneliness, and jealousy. They help their owners when they are incapacitated (though sometimes they cheat). I think Ryan’s, Daniel’s, and Neel’s estimates might be significantly lower if they thought about German Shepherd scientists.
That said, for the purposes of alignment, it’s still good news that cats (by and large) do not scheme against their owners’ wishes. The fact that cats can be as domesticated as they are without being cooperative or social is a huge boon for alignment (within the analogy, which is arguably questionable).