Is this the same value payload that makes activists fight over language to make human biases work for their side? I don’t think this problem translates to AI: If the AGIs find that some metric induces some bias, each can compensate for it.
I’m not convinced conceptual distance metrics must be value-laden. Represent each utility function by an AGI. Almost all of them should be able to agree on a metric such that each could adopt it in its thinking while losing only negligible value. The same could not be said for agreeing on a utility function. (It could, however, be said for agreeing on a utility-parametrized AGI design.)
Instead of email, we could use Less Wrong direct messages with karma instead of money. Going further, we could set up a karma-based prediction market on what score posts will reach, and use its predictions to set post visibility. Compare Stack Exchange’s bounty system.
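To make the prediction-market part concrete, here is a minimal sketch (my own illustration, not an existing Less Wrong feature) of how such a market could be run with a logarithmic market scoring rule, with karma as the currency; the score buckets, liquidity parameter, and class names are all made up for the example:

```python
import math

# Sketch of a karma-denominated prediction market on what score a post will reach,
# run with a logarithmic market scoring rule (LMSR). The score buckets, the
# liquidity parameter b, and all names here are illustrative assumptions.

class KarmaScoreMarket:
    def __init__(self, buckets, b=20.0):
        self.buckets = list(buckets)           # e.g. ["<0", "0-25", "25-100", ">100"]
        self.b = b                             # liquidity: higher = prices move more slowly
        self.q = [0.0] * len(self.buckets)     # outstanding shares per bucket

    def _cost(self, q):
        return self.b * math.log(sum(math.exp(x / self.b) for x in q))

    def prices(self):
        z = sum(math.exp(x / self.b) for x in self.q)
        return {bucket: math.exp(x / self.b) / z
                for bucket, x in zip(self.buckets, self.q)}

    def buy(self, bucket, shares):
        """Return the karma cost of buying `shares` of the given score bucket."""
        new_q = list(self.q)
        new_q[self.buckets.index(bucket)] += shares
        cost = self._cost(new_q) - self._cost(self.q)
        self.q = new_q
        return cost

market = KarmaScoreMarket(["<0", "0-25", "25-100", ">100"])
print(market.prices())           # starts uniform
print(market.buy(">100", 10))    # karma spent betting the post will do well
print(market.prices())           # ">100" price rises
```

Visibility could then be driven by the market’s current prices rather than by raw early votes, which is the “use its predictions to set post visibility” part.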
Spam dominating the platform is fine, because you are expected to sort by money attached, and read only until you stop being happy to take people’s money.
If your contacts do not value your responses more than corporations do, that actually sounds like a fine Schelling point for choosing between direct research participation and earning to donate.
If you feel that a contact’s question was intellectually stimulating, you can just send them back some or all of the fee to incentivize them to send you more like it.
This seems as misguided as arguing that any AGI will obviously be aligned because turning the universe into paperclips is stupid. We can conceivably build an AGI that is aware that humans are self-contradictory and illogical, and that therefore won’t assume they are rational, because it knows that doing so would make it misaligned. We can do at least as well as an overseer that intervenes on needless death and suffering as it would otherwise happen.
In the event of erasure, randomly decide how many resources to allocate to preventing an attack this week.
Ask the Oracle to predict the probability distribution over the advice that gets given. Compare it to the hardcoded distribution to deduce the attack severity and how much budget to allocate.
Purchase erasure insurance to have enough counterfactual power to affect even global physics experiments. Finding trustworthy insurers won’t be a problem, because, like, we have an Oracle.
Is even more power needed than the market has? Ask the Oracle “How likely is a randomly selected string to prove P=NP constructively and usefully?”. If this number is not superexponentially close to 0, define erasure from now on as a random string winning the P=NP lottery. Then we will always counterfactually have as much power as we need. Perhaps this one is too much power, because even our Oracle might have trouble viewing a P=NP singularity.
You plan to reward the Oracle later in accordance with its prediction. I suggest that we immediately reward the Oracle as if there would be an attack, then later, if we are still able to do so, reward the Oracle by the difference between the reward in case of no attack and the reward in case of attack.
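A minimal sketch of the arithmetic of this two-stage payout (names and values are illustrative, not from any actual Oracle setup):

```python
def two_stage_reward(reward_if_attack: float, reward_if_no_attack: float):
    """Pay the 'attack' reward up front; if we survive with our ability to pay
    intact, top up by the difference so the total matches the 'no attack' reward."""
    immediate = reward_if_attack
    later_correction = reward_if_no_attack - reward_if_attack
    # If an attack destroys our ability to pay, the Oracle keeps `immediate`,
    # which is exactly the attack-case reward; otherwise it ends up with
    # immediate + later_correction == reward_if_no_attack.
    return immediate, later_correction
```

Either way the Oracle’s total ends up equal to the reward appropriate to what actually happened, without us having to stay solvent through an attack.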
Is the ticket price for the costs of food and a place to sleep, or to limit the number of attendees? Because I live in Berlin and would be interested in probably just visiting.
It should be possible to defend the Oracle against humans and physics so long as its box self-destructs in case of erasure and subsequent tampering, thereby giving the Oracle whatever reward was last set to be the default.
The counterfactual Oracle setting as a whole seems to assume that the viewed future is not engineered by a future AI to resemble whatever would make the Oracle bring that future about, so, under that assumption, you should be fine with respect to falling to AGI.
You have to tell the AI how to find out how well it has done. To ask “What is a good definition of ‘good’?”, you already have to define good. At least if we ever find a definition of good, we can ask an AI equipped with it to judge that definition.
For purposes of the linked submission, we get the correct utility function after the hypercomputer is done running. Ideally, we would save each utility function’s preferred AI and then select the one that was preferred by the correct one, but we only have 1TB of space. Therefore we get almost all of them to agree on a parametrized solution.
AUP over all utility functions might therefore make sense for a limited environment, such as the interior of a box?
Do you mean that utility functions that do not want the AI to seize power at much cost are rare enough that the terabyte that has the highest approval rating will not be approved by a human utility function?
I’m gonna assume the part you don’t follow is how the two are related.
My submission attempts to maximize the fraction of utility functions that the produced AI can satisfy, in hopes that the human utility function is among them.
Attainable utility preservation attempts to maximize the fraction of utility functions that can be satisfied from the produced state, in hopes that human satisfaction is not ruled out.
Attainable utility sounds like my submission for what to do with a TB-returning hypercomputer query if afterwards we magically get the human utility function!
You mean, we mistake theorists are not in perpetual conflict with conflict theorists, they are just making a mistake? O_o
Assume that given a hypercomputer and the magic utility function m, we could build an FAI F(m). Every TB of data encodes some program A(u) that takes a utility function u as input. For all A and u, ask F(u) if A(u) is aligned with F(u). (We must construct F not to vote strategically here.) Save that A’ which gets approved by the largest fraction of F(u). Sanity check that this maximum fraction is very close to 1. Run A’(m).
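In pseudocode (everything here — build_FAI, approves, the candidate enumeration — is a hypothetical placeholder for hypercomputer-scale operations, not something actually runnable):

```python
def select_approved_program(candidate_programs, utility_functions, build_FAI,
                            threshold=0.999):
    """Pick the utility-parametrized program A'(u) that the largest fraction
    of FAIs F(u) consider aligned; sanity-check near-unanimity."""
    best_A, best_fraction = None, -1.0
    for A in candidate_programs:              # every terabyte, read as a program A(u)
        approvals = sum(
            1 for u in utility_functions
            if build_FAI(u).approves(A(u))    # "is A(u) aligned with F(u)?"
        )
        fraction = approvals / len(utility_functions)
        if fraction > best_fraction:
            best_A, best_fraction = A, fraction
    assert best_fraction >= threshold         # the sanity check from above
    return best_A                             # afterwards, run best_A(m) with the true m
```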
Hearthstone has recently released Zephrys the Great, a card that looks at the public game state and gives you a choice between three cards that would be useful right now. You can see it in action here. I am impressed by the diversity of the choices it gives. An advisor AI that seems friendlier than Amazon’s/YouTube’s recommendation algorithm, because its secondary optimization incentive is fun, not money!
Could we get them to open-source the implementation so people could try writing different advisor AIs to use in the card’s place for, say, their weekly rule-changing Tavern Brawls?
OpenAI has a 100x profit cap for investors. Could another form of investment restriction reduce AI race incentives?
The market selects for people that are good at maximizing money, and care to do so. I’d expect there are some rich people who care little whether they go bankrupt or the world is destroyed.
Such a person might expect that if OpenAI launches their latest AI draft, either the world is turned into paperclips or all investors get the maximum payoff. So he might invest all his money in OpenAI and pressure OpenAI (via shareholder swing voting or less regulated means) to launch it.
If OpenAI said that anyone can only invest up to a certain percentage of their net worth in OpenAI, such a person would be forced to retain something to protect.
Obviously misery would be avoided because it’s bad, not the other way around. We are trying to figure out what is bad by seeing what we avoid. And the question remains whether we might be accidentally avoiding misery while trying to avoid its opposite.
Low-bandwidth Oracle submission: I would be interested in a log-scale graph of the Bayesian score of the Solomonoff prior trying to sequence-predict our records of history. It should get flatter over time as worse hypotheses get discarded. If it is linear after a very short time, that looks like it figured out the laws of the universe and is simulating it. If it stays convex for a while, that looks like it is using models to approximate history, because then it takes longer to sort the false from the true. If it is flatter during the Cold War, that means it learned an anthropic bias toward nuclear war not happening.
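As a toy version of the curve I mean (my own illustration; a tiny computable hypothesis class stands in for the Solomonoff prior, and a repeating pattern stands in for lawful history):

```python
import numpy as np
import matplotlib.pyplot as plt
from itertools import product

# "History": a deterministic repeating pattern standing in for a lawful universe.
pattern = (1, 1, 0, 1, 0)
data = [pattern[t % len(pattern)] for t in range(60)]

# Hypothesis class: every repeating binary pattern of length 1..6 -- a crude,
# computable stand-in for a prior over programs, weighted by description length.
hypotheses = [p for L in range(1, 7) for p in product((0, 1), repeat=L)]
log_weight = {h: -len(h) * np.log(2) for h in hypotheses}

cumulative, total = [], 0.0
for t, x in enumerate(data):
    w = np.array([np.exp(log_weight[h]) for h in hypotheses])
    predicts_one = np.array([h[t % len(h)] for h in hypotheses])
    p_one = (w * predicts_one).sum() / w.sum()     # mixture prediction for the next bit
    total += np.log(p_one if x == 1 else 1.0 - p_one)
    cumulative.append(total)
    for h in hypotheses:                           # falsified hypotheses drop to zero weight
        if h[t % len(h)] != x:
            log_weight[h] = -np.inf

plt.plot(cumulative)       # flattens once only the never-falsified hypotheses remain
plt.xlabel("time step")
plt.ylabel("cumulative log score (Bayesian score)")
plt.show()
```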