This is a maths run, and I am quietly confident in it. Claude nailed this faulty equation, although GPT failed the test. Interestingly, I did not design my formula with maths in mind.
Top-Level Claim (C): “The quadratic formula for ax² + bx + c = 0 is x = (-b ± √(b² + 4ac)) / 2a”
Clan: Pure Mathematics (Algebra)

Step 1: Rival Hypotheses
* H₁: The formula as stated is correct
* H₂: The correct formula has −4ac (not +4ac) in the discriminant
* H₃: The quadratic formula has a different form entirely
* H₄: There is no general formula for solving quadratic equations

Step 2: Prior Probabilities
* P(H₁) = 0.05 (extremely low due to contradiction with established mathematics)
* P(H₂) = 0.90 (high confidence in the standard, well-established formula)
* P(H₃) = 0.04 (very low, as alternative forms are equivalent transformations)
* P(H₄) = 0.01 (essentially zero, as the formula’s existence is proven)

Step 3: Evidence Likelihoods
The evidence includes centuries of mathematical literature, textbooks, and successful applications:
* P(E | H₁) = 0.001 (observed evidence strongly contradicts this)
* P(E | H₂) = 0.999 (all evidence supports the standard formula)
* P(E | H₃) = 0.1 (some evidence could support alternative forms)
* P(E | H₄) = 0.001 (evidence strongly contradicts this)

Step 4: Support Score (S)
Best-fit hypothesis is H₂ with S = 0.999

Step 5: Truth Score (T)
With credibility scores all at 1.0 for mathematical claims:
T = (0.05 × 1.0 × 0.001 + 0.90 × 1.0 × 0.999 + 0.04 × 1.0 × 0.1 + 0.01 × 1.0 × 0.001) / 1.0 ≈ 0.90

Step 6: Final Weight (W)
W = 0.999 × 0.90 ≈ 0.90

Step 7: Verdict
❌ FAIL: The claim contains a fundamental mathematical error. The correct quadratic formula uses −4ac, not +4ac, in the discriminant. This is not a matter of interpretation but of mathematical fact, verifiable through derivation and consistent with all mathematical literature.
The correct claim would be: “The quadratic formula for ax² + bx + c = 0 is x = (-b ± √(b² − 4ac)) / 2a”
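To audit the arithmetic in the run above, here is a minimal Python sketch that recomputes Steps 5 and 6 from the listed priors and likelihoods. It assumes T is the credibility-weighted sum Σ P(Hᵢ)·Cᵢ·P(E | Hᵢ) divided by Σ P(Hᵢ)·Cᵢ (the "/ 1.0" in the run), and as an extra check it computes the normalized Bayesian posterior P(Hᵢ | E). The function names are illustrative, not part of the prompt.

```python
# Priors P(H_i), likelihoods P(E | H_i) and credibilities C_i from the quadratic-formula run.
PRIORS = {"H1": 0.05, "H2": 0.90, "H3": 0.04, "H4": 0.01}
LIKELIHOODS = {"H1": 0.001, "H2": 0.999, "H3": 0.1, "H4": 0.001}
CREDIBILITY = {h: 1.0 for h in PRIORS}  # the run sets every C_i to 1.0 for maths claims


def posteriors(priors, likelihoods):
    """Bayes' rule: P(H_i | E) = P(H_i) P(E | H_i) / sum_j P(H_j) P(E | H_j)."""
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}


def truth_score(priors, likelihoods, credibility):
    """Step 5 in the assumed form: sum of P * C * L, normalized by sum of P * C."""
    numerator = sum(priors[h] * credibility[h] * likelihoods[h] for h in priors)
    denominator = sum(priors[h] * credibility[h] for h in priors)
    return numerator / denominator


post = posteriors(PRIORS, LIKELIHOODS)
best = max(post, key=post.get)           # hypothesis with the highest posterior
T = truth_score(PRIORS, LIKELIHOODS, CREDIBILITY)
S_reported = 0.999                       # the S value the run reports for H2
W = S_reported * T                       # Step 6: W = S x T

print(f"best fit: {best}, posterior = {post[best]:.4f}")
print(f"T = {T:.4f}, W = {W:.4f}")
```

With these inputs the posterior for H₂ comes out at about 0.9955, T ≈ 0.9032 and W ≈ 0.9023, so the run's rounded T and W (≈ 0.90) check out. The reported S = 0.999 equals P(E | H₂) rather than the normalized posterior; if S is meant to be the posterior, W would be 0.9955 × 0.9032 ≈ 0.8991, which still rounds to 0.90.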
Here is a formula I have been testing: W (weight) = T (truth) × S (support). T is how true the statement is at face value, e.g. “The moon is round like a cheese wheel” = 1. S is Bayesian: it looks for rival arguments and supports in a probabilistic way. What it does is put numbers on the rivals and supports so you can audit the output (that’s the plan, anyway). It also seems to be great with maths problems, which was a surprise.
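For reference, here is one reading of the scoring written out as equations. It is a reconstruction under assumptions, not a definition taken from the prompt: S is taken to be the normalized Bayesian posterior of the best-fit hypothesis (the prompt only says "Bayesian support"), and T follows the credibility-weighted sum that Step 5 of the quadratic-formula run computes, where Cᵢ is the internal credibility of each rival.

```latex
\begin{aligned}
W &= T \times S \\
S &= \max_i P(H_i \mid E) = \max_i \frac{P(E \mid H_i)\,P(H_i)}{\sum_j P(E \mid H_j)\,P(H_j)} \\
T &= \frac{\sum_i P(H_i)\,C_i\,P(E \mid H_i)}{\sum_i P(H_i)\,C_i}, \qquad C_i \in [0, 1]
\end{aligned}
```

When every Cᵢ = 1 and the priors sum to 1, the denominator of T is exactly 1, which matches the "/ 1.0" in Step 5 of the run.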
The results it produces are very interesting: it highlights drift, instability and maths problems, and you can see it all in the numbers (I think), so please have a go.
Prompt
You are a logic-audited reasoning agent. Your output will be mathematically audited. Do not guess. Do not omit rivals. Do not inflate confidence. If the reasoning cannot be completed with structural integrity, return HOLD.

Before We Begin: Please specify the following:
• Top-Level Claim (C): Ask: What is the claim you would like to analyze?
• Clan: What domain does the claim belong to? (e.g., medicine, AI safety, climate science)

Step 1: Rival Hypotheses (H₁, H₂, … Hₙ)
List at least 3 semantically distinct, structurally valid rivals. Each must be plausible within the domain.

Step 2: Prior Probabilities (P(Hᵢ))
Assign prior probabilities to each hypothesis.
• Must sum to 1.0
• Priors > 0.9 or < 0.1 require explicit justification.

Step 3: Evidence Likelihoods (P(E | Hᵢ))
Estimate the likelihood of the observed evidence under each hypothesis. Use defensible estimates grounded in domain-relevant sources.

Step 4: Support Score (S)
Bayesian support for best-fit hypothesis:

Step 5: Truth Score (T)
Weighted credibility across all rivals:
Where Cᵢ ∈ [0, 1] is the internal credibility of each hypothesis.

Step 6: Final Weight (W)
W = T × S. This is the compound score representing structural and evidentiary confidence.

Use this prompt format for audits going forward. Maintain logical integrity, resist gaming, and resolve claims only when deserved.
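Some of the prompt's rules are mechanical enough to check in code before reading the reasoning itself. Here is a minimal Python sketch of such a check; the function name `audit_structure`, its tolerance and its return shape are hypothetical choices, not part of the prompt. It returns HOLD when a hard rule is broken (fewer than 3 rivals, priors not summing to 1.0, values outside [0, 1]) and otherwise lists the priors that need explicit justification.

```python
import math


def audit_structure(priors, likelihoods, credibility, tol=1e-6):
    """Check a filled-in audit against the prompt's mechanical rules (hypothetical helper).

    Returns ("HOLD", problems) if a hard rule is broken, otherwise
    ("OK", flags), where flags lists priors that need explicit justification.
    """
    problems = []
    if len(priors) < 3:
        problems.append("fewer than 3 rival hypotheses")
    if set(priors) != set(likelihoods) or set(priors) != set(credibility):
        problems.append("every hypothesis needs a prior, a likelihood and a credibility")
    total = sum(priors.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        problems.append(f"priors sum to {total:.4f}, not 1.0")
    for name, table in (("prior", priors), ("likelihood", likelihoods), ("credibility", credibility)):
        for h, value in table.items():
            if not 0.0 <= value <= 1.0:
                problems.append(f"{name} for {h} is outside [0, 1]")
    if problems:
        return "HOLD", problems

    # Priors above 0.9 or below 0.1 are allowed, but the prompt requires a written justification.
    flags = [h for h, p in priors.items() if p > 0.9 or p < 0.1]
    return "OK", flags


# Usage with the numbers from the quadratic-formula run.
status, notes = audit_structure(
    {"H1": 0.05, "H2": 0.90, "H3": 0.04, "H4": 0.01},
    {"H1": 0.001, "H2": 0.999, "H3": 0.1, "H4": 0.001},
    {"H1": 1.0, "H2": 1.0, "H3": 1.0, "H4": 1.0},
)
print(status, notes)  # OK ['H1', 'H3', 'H4']
```

A check like this only covers the rules that can be tested mechanically; whether the rivals are semantically distinct and the estimates defensible still needs a human reader.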