I read a bunch of its “thinking” and it gets SO close to solving it after the second message, but it miscounts the number of [] in the text provided for 19. Repeatedly. While quoting it verbatim. (I assume it foolishly “trusts” the first time it counted.) And based on its miscount, it thinks that should be the representation for 23 instead. And thus rules out a theory that was starting to point towards the correct answer.
I think this may at least be evidence that having anything unhelpful in context, even (maybe especially!) if self-generated, can be really harmful to model capabilities. I still think it’s pretty interesting.