Proposal on AI evaluation: false-proving

Currently, one of the arguments against deploying powerful non-agentic AIs is that they can deceive you: you ask for a plan to accomplish something, the AI produces one, and then it unexpectedly gains power as a result of that plan.

We can ask the AI to provide a proof of, or reasoning for, why its plan works. However, it can reason the way fictional characters do (e.g. choose the outcome that produces the most emotion; this mostly applies to LLMs), or it can include false statements in its output. So there should be a way to find false statements (logical or factual errors).

Usually people (most importantly, pupils and students) are taught on true proofs but not on false ones. This can lead to thinking "if something looks like it is based on logic, then the result is true". So at the moment people are not very effective at spotting flaws in reasoning.

A possible solution is to introduce false-proving tasks: the student is given a flawed proof and has to find the flaw. Such tasks shouldn't be long. Also, the conclusion could be true, to mix things up a bit. For example:

Task: prove that 2 + 2 = 4.

Lemma: Any two natural numbers (a and b) are equal.

Let’s prove that lemma by induction on max(a, b). The base case is a = b = 1, where the statement is true.

For the induction step, suppose max(a, b) > 1. Reduce both numbers by one and apply the induction hypothesis: a − 1 = b − 1, so a = b, and the lemma follows.
---

3 + 1 = 4, since 4 is the number following 3.

3 = 2 by the lemma.

1 = 2 by the lemma.

Substituting, we get 2 + 2 = 4. Q.E.D.
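If the statements involved can be formalized, a flawed lemma like this one can also be caught mechanically. Below is a minimal sketch in Lean 4 (core language only, no Mathlib assumed; the theorem name is my own) that refutes the lemma by instantiating it at 0 and 1:

```lean
-- The "lemma" above is a universal claim with an obvious counterexample,
-- so a proof assistant will refuse any attempted proof of it.
-- Here we refute it: instantiate it at 0 and 1 and decide the resulting equality.
theorem not_all_nats_equal : ¬ (∀ a b : Nat, a = b) := by
  intro h
  exact absurd (h 0 1) (by decide)
```

Note that refuting the lemma does not by itself say which step of the written proof is wrong; locating that step is exactly what the task asks for.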

Also, students' own research should probably be rewarded, even when it leads to wrong results, as long as the flaw is not obvious.