If I were asked the question, I think I would have said something closer to GPT-3’s answer on Input E than to PaLM’s (though I would give the reason for inferring that she’s on an airplane as her looking down at clouds, not her looking out a window).
This isn’t directly the case here, but thinking about it made me realize that, in some sense, a flawed but more human-like answer is a better answer than a perfect one (because the flawed human response would be a more likely completion of the text). Given that, I’m not sure it would even be possible to get any future iteration of this sort of architecture to answer in a significantly “superhuman” manner. It would become the perfect mimic, but can text-completion bots ever go beyond that?
The “inference” “We can also infer that she is traveling at a high speed because she is unbuckling her seatbelt.” is also nonsensical. People don’t typically unbuckle their seatbelts when traveling at high speed. (Admittedly, this may happen to be true for airplane travel, because one isn’t allowed to unbuckle one’s seatbelt while traveling at low speed, i.e. during taxi, takeoff and landing; but that’s a non-central enough case that it needs to be called out for the reasoning not to sound absurd.)
Why is it a non-central example, when this is in fact about commercial airplane travel, where you will be moving fastest at cruising altitude, which is exactly when you’re allowed to unbuckle and move about the cabin?
I think I have that intuition because the great majority of seatbelt unbucklings in my experience happen while traveling at a speed of zero (because they’re in cars, not planes). The sentence has no cues to indicate the unusual context of being in a plane (and in fact, figuring that out is the point of the example). So my mental process reading that sentence is “that’s obviously false” → “hmm, wonder if I’m missing something” → “oh, maybe in a plane?” and the first step there seems a lot more reliable (in other reasoners as well, not just me) than the second or third.