A List of things I might do with a Proof Oracle

I have previously written that I don’t think Formal Proof is a good roadmap for AI alignment.

However, this framing seems unreasonably strong.

So here is a list of 20 problems related to alignment that a godlike proof oracle would be useful for.

For the purpose of this exercise, assume that you have an oracle that takes as input a math problem stated with about the level of rigor of a Math Olympiad problem, converts it into a formal specification, creates a formal proof, and then generates a human readable summary of that proof.

Notice that if you have such an oracle you can also ask it questions of the form “find X such that X has property Y”

1. Formal specification and proof of a quantum-resistant encryption scheme

Being able to encrypt messages knowing that no AI in the future will be able to read them is useful e.g. for avoiding Roko’s Basilisk concerns

2. Defining Programs whose effects are limited in space and time

Suppose you have a superhuman AI and you don’t want to unbox it, but you do want to use it. For example, you might ask it to build a nanobot that converts a certain amount of matter into diamond but not all matter.

3. Solve the alignment problem

Certain specifications of the alignment problem are computationally intractable but easy to specify.

for example

An AI is aligned, if after turning it on, a simulation of all living humans giving infinite time and resources to study the world created by the AI without interference would agree that it is aligned

4. Acquire 6 Million dollars and donate it to pro-alignment causes

Duh.

5. Determine whether the many-worlds hypothesis is true

...by solving the Grand Unified Theory in Physics.

If Many Worlds is false (e.g. because ER=EPR or Objective Waveform collapse is true), then certain alignment strategies (e.g. bargaining with other timelines) are no longer feasible.

6. Prove the existence of free-will (or not)

...by proving one of

  1. no-cloning implies that an agent’s actions cannot be knowing in advance

  2. irreducible complexity implies an agent cannot be predicted without simulating it

7. Create a text-to-image model that provably does not reproduce any of the images in its training set

Seems useful to make people like this shut up.

8. Human Whole Brain Emulation

Take a person, put them in a VR environment, record their actions and then solve the problem:

find a neural network of with minimum complexity+error which predicts these actions

9. Proof (or disprove) the existence of God

Suppose a God of the Gaps exists. Or that God answers prayer. With a perfect model of physics this should be statistically detectable in terms of violations of natural law.

10. Prove that the Chinchilla Scaling Laws are correct (or not)

The question of whether or not this is the optimal text-prediction accuracy for a given amount of compute and data.

11. Prove a version of the Natural Abstraction Hypothesis is true (or false)

This hypothesis is currently lies on my “most likely path” for AI Alignment.

12. Create a chatbot that is provably “helpful, harmless and honest”

People seem that this would be useful. Or maybe not.

13. Prove whether the universe is deterministic (or not)

Suppose the universe is deterministic (and started from a low-entropy initialization point) then it should be possible to record a few hours of static anywhere in the universe and then find a low-Kolmogorov-complexity algorithm that outputs that static.

14. Prove the Collatz Conjecture

Per Erdos, “Mathematics may not be ready for such problems”, which implies a proof would yield substantial new breakthroughs in math. Some of these may turn out to be useful for Alignment

15. Execute the Long Reflection

Given an emulation of a human brain, determine how they would solve the Alignment Problem if given 1m years to think about the problem.

16. Solve the Moral Mazes Problem

Determine what social structure optimally avoids the principal agent problems arising in large organizations.

17. TPU Design

Given a formal design language for computer chips, find the TPU Design which optimally achieves a goal (e.g GPT-3 accuracy at next character prediction).

Note: even if you don’t plan to use such a TPU (because acceleration bad) knowing what it is seems important. Is it 10x faster 10,000x ?

18. Solve the Diamond Alignment Problem

Should be easier than the Alignment Problem to formally specify.

19. Stock Market Arbitrage

Find the stock trading algorithm that statistically maximizes returns when back tested against financial data.

20. Generate More questions for this List

Generate the lowest kolmogorov complexity program which outputs this list and ask it what question 21 would be.