I looked at the Q1/4/5 answers[1]. I think they would indeed most likely all get 7s: there’s quite a bit of verbosity, and in particular OpenAI’s Q4 answer spends a lot of time talking its way around in circles, but I believe there’s a valid proof in all of them.
Most interesting is Q1, where OpenAI produces what I think is a very human answer (the same approach I took, and the one I’d expect most human solvers to take) while Google takes a less intuitive approach but one that ends up much neater. This makes me a little bit suspicious about whether some functionally-identical problem showed up somewhere in Google’s training, but if it didn’t that is extra impressive.
IMO Q3 and Q6 are generally much harder: the AI didn’t solve Q6, and I haven’t gone through the Q3 answers. Q2 was a geometry one, which is weirder to look through and which I find very unpleasant.
I’m not convinced that the Google approach to Q1 actually ends up significantly neater than the more obvious one (which was mine too, and like you I’d expect most human solvers to do it that way), but I do think the lemma they prove is a very nice one!
there are WLOG only three options for your single non-sunny line, and you can pretty much draw the diagrams and say “behold!” but even writing out longhand why you can’t cover what’s left in any of those cases with two sunny lines isn’t too painful.
I looked at the Q1/4/5 answers[1]. I think they would indeed most likely all get 7s: there’s quite a bit of verbosity, and in particular OpenAI’s Q4 answer spends a lot of time talking its way around in circles, but I believe there’s a valid proof in all of them.
Most interesting is Q1, where OpenAI produces what I think is a very human answer (the same approach I took, and the one I’d expect most human solvers to take) while Google takes a less intuitive approach but one that ends up much neater. This makes me a little bit suspicious about whether some functionally-identical problem showed up somewhere in Google’s training, but if it didn’t that is extra impressive.
IMO Q3 and Q6 are generally much harder: the AI didn’t solve Q6, and I haven’t gone through the Q3 answers. Q2 was a geometry one, which is weirder to look through and which I find very unpleasant.
I’m not convinced that the Google approach to Q1 actually ends up significantly neater than the more obvious one (which was mine too, and like you I’d expect most human solvers to do it that way), but I do think the lemma they prove is a very nice one!
I think the obvious approach is comparably neat until you get to the point of proving
that k=2 won’t work
at which point it’s a mess. The Google approach manages to prove that part in a much nicer way as a side effect of its general result.
The obvious approach lets you
reduce to the case of a size-3 triangle
at which point
there are WLOG only three options for your single non-sunny line, and you can pretty much draw the diagrams and say “behold!” but even writing out longhand why you can’t cover what’s left in any of those cases with two sunny lines isn’t too painful.