Took the survey!
FactorialCode
Submission: Counterfactual Oracle:
Use the oracle to compress data according to the MDL Principle. Specifically, give the oracle a string and ask it to produce a program that, when run, outputs the original string. The reward to the oracle is large and negative if the program does not reproduce the string when run, and inversely proportional to the length of the program if it does. The oracle receives its reward after the program either runs to completion or fails to terminate within a reasonable amount of time.
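Here's a minimal sketch of the reward scheme I have in mind; the use of Python as the target language, the subprocess runner, the timeout, and the 1/length reward are all illustrative assumptions rather than a worked-out proposal:

```python
import subprocess

FAILURE_REWARD = -1e6  # large negative reward for failing to reproduce the string

def score_oracle_program(program_source: str, target: bytes,
                         timeout_seconds: float = 60.0) -> float:
    """Reward for a program the oracle claims will reproduce `target`.

    Large negative reward if the program errors out, times out, or prints
    the wrong bytes; otherwise inversely proportional to the program's
    length, so shorter (more compressed) programs earn more.
    """
    try:
        result = subprocess.run(
            ["python", "-c", program_source],  # assumes the oracle emits Python
            capture_output=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return FAILURE_REWARD
    if result.returncode != 0 or result.stdout != target:
        return FAILURE_REWARD
    return 1.0 / max(len(program_source), 1)
```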
Submission: Low Bandwidth Oracle:
Have the oracle predict the price of a commodity / security / sports bet at some point in the future from a list of plausible prices. Ideally, the oracle would spit out a probability distribution which can be scored using a proper scoring rule, but just predicting the nearest most likely price should also work. Either way, the length of the episode is the time until the prediction can be verified. From there, it shouldn’t be too difficult to use those predictions to make money.
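For the scoring step, here's a minimal sketch assuming the oracle reports a probability for each price on the list and we score it with the log score, a standard proper scoring rule; the price list and probabilities are made up for illustration:

```python
import math

def log_score(forecast: dict[float, float], realized_price: float) -> float:
    """Log score for a forecast over a discrete list of plausible prices.

    Higher is better. Putting low probability on the realized price is
    heavily penalized, so the oracle maximizes expected score by
    reporting its honest distribution.
    """
    p = forecast.get(realized_price, 0.0)
    if p <= 0.0:
        return float("-inf")  # worst possible score
    return math.log(p)

# Hypothetical forecast over three plausible prices for some security.
forecast = {95.0: 0.2, 100.0: 0.5, 105.0: 0.3}
print(log_score(forecast, 100.0))  # ~ -0.69
```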
More generally, I suppose we can use the counterfactual oracle to solve any optimisation or decision problem that can be evaluated with a computer, such as protein folding, SAT problems, or formally checked maths proofs.
I suppose another way this could happen is that the company could set up a branch in a much poorer and easily corrupted nation. Since it's not constrained by people, it could build up a very large amount of power in a place that's beyond the reach of a superpower's anti-trust institutions.
I suppose that’s true. Although, assuming that the company has developed intent-aligned AGI, I don’t see why the entire branch couldn’t be automated, with the exception of a couple of human figureheads. Even if the AGI isn’t good enough to do AI research, or the company doesn’t trust it to do that, there are other ways for the company to grow. For instance, it could set up fully automated mining operations and factories in the corrupted country.
I think a lot of the animosity that Gary Marcus drew was less that some of his points were wrong, and more that he didn’t seem to have a full grasp of the field before criticizing it. Here’s an r/machinelearning thread on one of his papers. Granted, r/ML is not necessarily representative of the AI community, especially now, but you see some people agreeing with some of his points, and others claiming that he’s not up to date with current ML research. I would recommend people take a look at the thread and judge for themselves.
I’m also not inclined to take any twitter drama as strong evidence of the attitudes of the general ML community, mainly because twitter seems to strongly encourage/amplify the sort of argumentative shit-flinging pointed out in the post.
As others have said, one of the downsides of doing this is that adding a readable name interrupts the flow of the text. This is especially a problem if you’re skimming the article, because the expanded text will catch your attention even if it’s not a central point the article is trying to make. As for printing, I wonder what fraction of people actually do this; that would tell us whether the change is justified.
If I can try to make your solution concrete for the original 5-10 problem, would it look something like this?
A() :=
  let f(x) :=
    Take A() := x as an axiom instead of A() := this function
    Take U() to be the U() in the original 5-10 problem
    Look for a proof of "U() = y"
    return y
  in
    return argmax f(x) where x in {5, 10}
How would f() map 10 to 0? Wouldn’t that require that from
A() := 10
U() :=
  if A() = 10 return 10
  if A() = 5 return 5
there’s a proof of
U() = 0
?
My understanding is that in the original formulation, the agent takes its own definition along with a description of the universe and looks for proofs of the form
[A() = 10 -> U() = x] & [A() = 5 -> U() = y ]
But since “A()” is the same in both conjuncts of the expression, one of the implications is guaranteed to be vacuously true. So the output of the program depends on the order in which it looks for proofs. But here f looks for theorems starting from different axioms depending on its input, so the “A()” and “U()” in f(5) can be different from the “A()” and “U()” in f(10).
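For what it's worth, here's a toy sketch of that distinction; it sidesteps the proof search entirely and just evaluates U() under the counterfactual assumption A() := x, so it only illustrates the structure of f, not the logical subtleties:

```python
def U(A):
    """The universe from the 5-10 problem, parameterized by the agent."""
    if A() == 10:
        return 10
    if A() == 5:
        return 5
    return 0

def f(x):
    """Evaluate U() under the assumption that A() returns x."""
    return U(lambda: x)

def A():
    """Pick the action whose assumed version of A() leads to the higher U()."""
    return max([5, 10], key=f)

print(A())   # 10
print(U(A))  # 10
```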
What should be the optimal initial Karma for a post/comment?
By default on Reddit and LessWrong, posts start with 1 karma, coming from the user upvoting themselves. On LessWrong right now, it appears that this number can be set higher by strongly upvoting yourself. But this need not be the case. Posts and comments could start with either positive or negative karma. If posts start with larger positive karma, this might incentivize people to post/comment more often. Likewise, if posts or comments start off with negative karma, this acts as a disincentive to post.
A related idea would be to create a special board where posts/comments start off with large negative karma, but each upvote from users would give the poster more karma than usual. As a result, people would only post there if they expected their post to “break-even” in terms of Karma.
The standard Solomonoff prior discounts hypotheses by 2^(-l), where l is the number of bits required to describe them. However, we can easily imagine a whole class of priors, each with a different discount rate. For instance, one could discount by (1/Z)2^(-2l), where Z is a normalizing factor to get the probabilities to add up to one. Why do we put special emphasis on this rate of discounting rather than any other rate of discounting?
I think that we can justify this discount rate with the principle of maximum entropy: distributions with steeper asymptotic discounting rates will have lower entropy than distributions with shallower ones, and any distribution with a shallower discounting rate than 2^(-l) would (probably) diverge and therefore fail to be a valid probability distribution.
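To make the divergence worry explicit, here's a back-of-the-envelope check that naively assigns weight to every binary string of length l rather than only to valid programs:

$$\sum_{l=1}^{\infty} 2^{l} \cdot 2^{-\alpha l} = \sum_{l=1}^{\infty} 2^{(1-\alpha)l},$$

which converges for α > 1 and diverges for α ≤ 1, so under this naive counting the 2^(-l) rate sits exactly at the boundary, and only strictly steeper rates are guaranteed to normalize.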
Are there arguments/situations justifying steeper discounting rates?
I actually like the fact that I don’t immediately know who is speaking to who in a thread. I feel like it prevents me from immediately biasing my judgment of what a person is saying before they say it.
Was this meant to be a reply to my bit about the Solomonoff prior?
If so, in the algorithmic information literature, they usually fix the unnormalizability issue by talking about prefix Turing machines, which corresponds to only allowing TM descriptions that form a valid prefix code.
But it is a good point that for steeper discounting rates, you don’t need to do that.
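As a toy illustration of why the prefix-free restriction helps (the example codes are my own, not from the literature): the Kraft inequality guarantees that for any prefix-free set of descriptions, the 2^(-l) weights already sum to at most 1, so no extra normalization is needed.

```python
def kraft_sum(codewords: list[str]) -> float:
    """Sum of 2^(-len(c)) over a set of binary codewords.

    For any prefix-free set (no codeword is a prefix of another),
    the Kraft inequality guarantees this sum is at most 1.
    """
    return sum(2.0 ** -len(c) for c in codewords)

prefix_free = ["0", "10", "110", "111"]            # no codeword prefixes another
not_prefix_free = ["0", "00", "01", "000", "001"]  # "0" prefixes all the others

print(kraft_sum(prefix_free))      # 1.0  (<= 1, as Kraft guarantees)
print(kraft_sum(not_prefix_free))  # 1.25 (> 1, so 2^(-l) alone can't normalize)
```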
I’ve been thinking about how one could get continuous physics from a discrete process. Suppose you had a differential equation, and you wanted to make a discrete approximation to it. Furthermore, suppose you had a discrete algorithm for simulating this differential equation that takes in a parameter, say dt, which controls the resolution of the simulation. As dt tends toward zero, the dynamics of the simulated diff eq will tend toward the dynamics of the real diff eq.
Now suppose we have a Turing machine that implements this algorithm as a subroutine. More precisely, the Turing machine runs simulations of the diff eq at a resolution of 1, then 1/2, then 1/3, and so on and so forth.
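Here's a minimal sketch of the kind of subroutine I mean, using forward Euler on a stand-in ODE dx/dt = g(x); the specific equation and the way the resolutions are enumerated are just for illustration:

```python
def simulate(g, x0: float, t_end: float, dt: float) -> float:
    """Forward-Euler approximation of dx/dt = g(x) from x(0) = x0 up to time t_end."""
    x, t = x0, 0.0
    while t < t_end:
        x += g(x) * dt
        t += dt
    return x

def run_forever(g, x0: float, t_end: float):
    """The Turing-machine-style loop: re-run the simulation at resolution
    dt = 1, 1/2, 1/3, ... so every finite resolution is eventually surpassed."""
    n = 1
    while True:
        yield simulate(g, x0, t_end, 1.0 / n)
        n += 1

# Example: dx/dt = -x with x(0) = 1; successive approximations approach e^(-1).
approximations = run_forever(lambda x: -x, 1.0, 1.0)
for _ in range(5):
    print(next(approximations))
```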
Finally, suppose there were a conscious observer in this simulation: at what resolution would they expect their physics to be simulated? Depending on one’s notion of anthropics, one could argue that at any given resolution, there is only a finite number of observers in lower resolution simulations, but an infinite number in higher resolution simulations. Consequently, the observer should expect to live in a universe with continuous physics.
I agree that the coordination games between nukes and AI are different, but I still think that nukes make for a good analogy, just not the situation after multiple parties have developed them. Rather, I think the key element of the analogy is the game-changing and decisive strategic advantage that nukes/AI grant once one party develops them. There aren’t too many other technologies that have that property. (Maybe the bronze-to-iron age transition?)
Where the analogy breaks down is with AI safety. If we get AI safety wrong, there’s a risk of large permanent negative consequences. A better analogy might be living near the end of WW2, except that if you build a nuclear bomb incorrectly, it ignites the atmosphere and destroys the world.
In either case, under this model, you end up with the following outcomes:
(A): Either party incorrectly develops the technology
(B): The other party successfully develops the technology
(C): My party successfully develops the technology
and generally a preference ordering of A<B<C, although a sufficiently cynical actor might have B<A<C.
If there’s a sufficiently shallow trade-off between speed of development and the risk of error, this can lead to a dollar-auction-like dynamic where each party is incentivized to trade a bit more risk in order to develop the technology first. In a symmetric situation without coordination, the Nash equilibrium is all parties advancing as quickly as possible to develop the technology and throwing caution to the wind.
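As a toy illustration of that dynamic (the payoffs and catastrophe probabilities below are made-up numbers chosen only to exhibit the structure): each party picks "cautious" or "fast", and going fast raises both the chance of developing the technology first and the chance of a catastrophic error.

```python
from itertools import product

# Made-up outcome utilities: C (we develop it) > B (they develop it) > A (catastrophe).
U_WIN, U_LOSE, U_CATASTROPHE = 1.0, 0.0, -4.0

# Made-up probabilities of catastrophe and of winning the race,
# as a function of (my speed, their speed).
P_CATASTROPHE = {("cautious", "cautious"): 0.02, ("cautious", "fast"): 0.05,
                 ("fast", "cautious"): 0.05, ("fast", "fast"): 0.10}
P_I_WIN = {("cautious", "cautious"): 0.5, ("cautious", "fast"): 0.1,
           ("fast", "cautious"): 0.9, ("fast", "fast"): 0.5}

def expected_utility(mine: str, theirs: str) -> float:
    p_cat = P_CATASTROPHE[(mine, theirs)]
    p_win = P_I_WIN[(mine, theirs)]
    return p_cat * U_CATASTROPHE + (1 - p_cat) * (p_win * U_WIN + (1 - p_win) * U_LOSE)

for mine, theirs in product(["cautious", "fast"], repeat=2):
    best_response = max(["cautious", "fast"], key=lambda s: expected_utility(s, theirs))
    print(mine, theirs, round(expected_utility(mine, theirs), 3),
          "| best response to their play:", best_response)

# With these numbers "fast" is each side's best response to anything the other
# side does, so (fast, fast) is the Nash equilibrium, even though
# (cautious, cautious) gives both sides higher expected utility.
```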
At some point, there was definitely discussion about formal verification of AI systems. At the very least, this MIRIx event seems to have been about the topic.
From Safety Engineering for Artificial General Intelligence:
An AI built in the Artificial General Intelligence paradigm, in which the design is engineered de novo, has the advantage over humans with respect to transparency of disposition, since it is able to display its source code, which can then be reviewed for trustworthiness (Salamon, Rayhawk, and Kramár 2010; Sotala 2012). Indeed, with an improved intelligence, it might find a way to formally prove its benevolence. If weak early AIs are incentivized to adopt verifiably or even provably benevolent dispositions, these can be continually verified or proved and thus retained, even as the AIs gain in intelligence and eventually reach the point where they have the power to renege without retaliation (Hall 2007a).
Also, from section 2 of Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda:
When constructing intelligent systems which learn and interact with all the complexities of reality, it is not sufficient to verify that the algorithm behaves well in test settings. Additional work is necessary to verify that the system will continue working as intended in application. This is especially true of systems possessing general intelligence at or above the human level: superintelligent machines might find strategies and execute plans beyond both the experience and imagination of the programmers, making the clever oscillator of Bird and Layzell look trite. At the same time, unpredictable behavior from smarter-than-human systems could cause catastrophic damage, if they are not aligned with human interests (Yudkowsky 2008). Because the stakes are so high, testing combined with a gut-level intuition that the system will continue to work outside the test environment is insufficient, even if the testing is extensive. It is important to also have a formal understanding of precisely why the system is expected to behave well in application. What constitutes a formal understanding? It seems essential to us to have both (1) an understanding of precisely what problem the system is intended to solve; and (2) an understanding of precisely why this practical system is expected to solve that abstract problem. The latter must wait for the development of practical smarter-than-human systems, but the former is a theoretical research problem that we can already examine.
I suspect that this approach has fallen out of favor as ML algorithms have gotten more capable while our ability to prove anything useful about those algorithms has heavily lagged behind. Although DeepMind and a few others are still trying.
Interesting. I had the Nash equilibrium in mind, but it’s true that unlike a dollar auction, you can de-escalate, and when you take into account how your opponent will react to you changing your strategy, doing so becomes viable. But then you end up with something like a game of chicken, where ideally, you want to force your opponent to de-escalate first, as this tilts the outcomes toward option C rather than B.
I like this idea, and I think for it to take off, it would have to be implemented by easily piggybacking off of the existing email system. If I could download some kind of browser extension that allowed me to accept payment for emails while letting me continue to use my existing email, I would consider having that option.
However, I think this could face some adoption problems. I could easily imagine there being negative social consequences to advertising a paid email address, as it makes the statement “I am more likely to ignore your messages unless you pay me for my time” common knowledge.
I think this is a good sign: this paper goes over many of the ideas that the RatSphere has discussed for years, and DeepMind is giving those ideas publicity. It also brings up preliminary solutions, of which “Model Based Rewards” seems to go farthest in the right direction. (Although even the paper admits the idea’s been around since 2011.)
However, the paper is still phrasing things in terms of additive reward functions, which don’t really naturally capture many kinds of preferences (such as those over possible worlds). I also feel that the causal influence diagrams, when unrolled for multiple time steps, needlessly complicate the issues being discussed. Most interesting phenomena in decision theory can be captured by simple 1 or 2 step games or decision trees. I don’t see the need to phrase things as multi-timestep systems. The same goes for presenting the objectives in terms of grid worlds.
Overall, the authors seem to still be heavily influenced by the RL paradigm. It’s a good start; we’ll see if the rest of the AI community notices.
I notice that there’s a fair bit of “thread necromancy” on LessWrong. I don’t think it’s a bad thing, but I think it would be cool to have an option to filter comments based on the time gap between when the post was made and when the comment was made. That way it’s easier to see what the discussion was like around the time when the post was made.
On a related note, does LessWrong record when upvotes are made? It would also be cool to have a “time-machine” to see how up-votes and down-votes in a thread evolve over time. Could be good for analysing the behaviour of threads in the short term, and a way to see how community norms change in the long term.
You might want to rot13 that last bit there for anyone who plans to see the movie.