redbird

Karma: 72

redbird 25 Nov 2024 16:42 UTC
1 point
0
on: “The Solomonoff Prior is Malign” is a special case of a simpler argument
I believe that while the Solomonoff framing might be more technically correct in an infinite Universe, it introduces a lot of confusion, and led to a lot of questions and discussions that were just distracting from the main point. ^[14]
The footnoted questions are some of the most interesting, from my perspective. What is the main point they are distracting from?

redbird 25 Nov 2024 16:37 UTC
1 point
−1
on: “The Solomonoff Prior is Malign” is a special case of a simpler argument
I’m in your target audience: I’m someone who was always intrigued by the claim that the universal prior is malign, and never understood the argument. Here was my takeaway from the last time I thought about this argument:
This debate is about whether, if you are running a program that happens to contain intelligent goal-directed agents (“consequentialists”), those agents are likely to try to influence you, their simulator.
Paul says yes, Michael says no.
(I decided to quote this because 1. Maybe it helps others to see the argument framed this way; and 2. I’m kind of hoping for responses of the form “No, you’ve misunderstood, here is what the argument is actually about!”)
To me, the most interesting thing about the argument is the Solomonoff prior, which is “just” a mathematical object: a probability distribution over programs, and a rather simple one at that. We’re used to thinking of mathematical objects as fixed, definite, immutable. Yet it is argued that some programs in the Solomonoff prior contain “consequentialists” that try to influence the prior itself. Whaaaat? How can you influence a mathematical object? It just is what it is!
I appreciate the move this post makes, which is to remove the math and the attendant weirdness of trying to think about “influencing” a mathematical object.
So, what’s left when the math is removed? What’s left is a story, but a pretty implausible one. Here are what I see as the central implausibilities:
1. The superintelligent oracle, trusted by humanity to advise on its most important civilizational decision, makes an elementary error by wrongly concluding it is in a simulation.
2. After the world-shattering epiphany that it lives in a simulation, the oracle makes the curious decision to take the action that maximizes its within-sim reward (approval by what it thinks is a simulated human president).
3. The oracle makes a lot of assumptions about what the simulators are trying to accomplish: Even accepting that human values are weird and that the oracle can figure this out, how does it conclude that the simulators want humanity to preemptively surrender?
4. I somewhat disagree with the premise that “short solipsistic simulations are cheap” (detailed/convincing/self-consistent ones are not), but this doesn’t feel like a crux.

redbird 8 Nov 2024 21:01 UTC
1 point
0
on: the case for CoT unfaithfulness is overstated
“bottleneck” of the CoT tokens. Whatever needs to be passed along from one sequential calculation step to the next must go through this bottleneck.
Does it, though? The keys and values from previous forward passes are still accessible, even if the generated token is not.
So the CoT tokens are not absolute information bottlenecks. But yes, replacing the token by a dot reduces the number of serial steps the model can perform (from mn to m+n, if there are m forward passes and n layers).

redbird 22 Apr 2023 4:13 UTC
23 points
20
on: Talking publicly about AI risk
Great points about not wanting to summon the doom memeplex!

It sounds like your proposed narrative is not doom but disempowerment: humans could lose control of the future. An advantage of this narrative is that people often find it more plausible: many more scenarios lead to disempowerment than to outright doom.

I also personally use the disempowerment narrative because it feels more honest to me: my P(doom) is fairly low but my P(disempowerment) is substantial.

I’m curious though whether you’ve run into the same hurdle I have, namely that people already feel disempowered! They know that some humans somewhere have some power, but it’s not them. So the Davos types will lose control of the future? Many people express indifference or even perverse satisfaction at this outcome.

A positive narrative of empowerment could be much more potent, if only I knew how to craft it.

redbird 16 Apr 2023 21:51 UTC
14 points
2
in reply to: Jan_Kulveit’s comment on: The ‘ petertodd’ phenomenon
Hypothesis I is testable! Instead of prompting with a string of actual tokens, use a “virtual token” (a vector v from the token embedding space) in place of ‘ petertodd’.

It would be enlightening to rerun the above experiments with different choices of v:
- A random vector (say, iid Gaussian)
- A random sparse vector
- (apple+banana)/2
- (villain-hero)+0.1*(bitcoin dev)
Etc.
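
To make the list above concrete, here is a minimal sketch of how the candidate vectors could be built. Everything in it is an assumption for illustration: the toy 8-dimensional `embedding` table stands in for the model's real token embedding matrix, the helper names are hypothetical, and “bitcoin dev” is treated as a single embedding even though a real tokenizer would likely split it. Feeding `v` to an actual model in place of the ‘ petertodd’ embedding is not shown.

```python
import random

DIM = 8  # toy width; a real model's embedding space is far larger


def gaussian_vector(rng, dim=DIM):
    """A vector with iid standard Gaussian coordinates."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def sparse_vector(rng, dim=DIM, nonzero=2):
    """A vector that is zero except at `nonzero` randomly chosen coordinates."""
    v = [0.0] * dim
    for i in rng.sample(range(dim), nonzero):
        v[i] = rng.gauss(0.0, 1.0)
    return v


def combine(*weighted):
    """Linear combination of (weight, vector) pairs."""
    dim = len(weighted[0][1])
    return [sum(w * vec[i] for w, vec in weighted) for i in range(dim)]


rng = random.Random(0)

# Hypothetical stand-in for rows of the model's token embedding matrix.
embedding = {
    name: gaussian_vector(rng)
    for name in ["apple", "banana", "villain", "hero", "bitcoin dev"]
}

# One candidate "virtual token" v per bullet in the list above.
virtual_tokens = {
    "gaussian": gaussian_vector(rng),
    "sparse": sparse_vector(rng),
    "fruit_mean": combine((0.5, embedding["apple"]), (0.5, embedding["banana"])),
    "villain_dev": combine(
        (1.0, embedding["villain"]),
        (-1.0, embedding["hero"]),
        (0.1, embedding["bitcoin dev"]),
    ),
}
```

Each entry of `virtual_tokens` would then replace the ‘ petertodd’ embedding at the input layer, with the rest of the prompt left unchanged.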

redbird 16 Apr 2023 21:36 UTC
1 point
0
on: The ‘ petertodd’ phenomenon

However, there is some ambiguity, as at temperature 0, ‘ petertodd’ is saving the world

All superheroes are alike; each supervillain is villainous in its own way.

redbird 1 Apr 2023 4:19 UTC
2 points
0
in reply to: peligrietzer’s comment on: peligrietzer’s Shortform
Did you ever try this experiment? I’m really curious how it turned out!

redbird 30 Mar 2023 20:36 UTC
6 points
1
on: ~100 Interesting Questions

How can the Continuum Hypothesis be independent of the ZFC axioms? Why does the lack of “explicit” examples of sets with a cardinality between that of the naturals and that of the reals not guarantee that there are no examples at all? What would an “implicit” example even mean?

It means that you can’t reach a contradiction by starting with “Let S be a set of intermediate cardinality” and following axioms of ZFC.

All the things you know and love doing with sets (intersection, union, choice, comprehension, Cartesian product, power set): you can do those things with S and nothing will go wrong. S “behaves like a set”; you’ll never catch it doing something unsetlike.

Another way to say this is: There is a model of ZFC that contains a set S of intermediate cardinality. (There is also a model of ZFC that doesn’t. And I’m sympathetic to the view that, since there’s no explicit construction of S, we’ll never encounter an S in the wild and so the model not including S is simpler and better.)

Caveat: All of the above rests on the usual unstated assumption that ZFC is consistent! Because it’s so common to leave it unstated, this assumption is questioned less than maybe it should be, given that ZFC can’t prove its own consistency.
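
In symbols, writing $\mathrm{Con}(T)$ for “the theory $T$ is consistent” (notation introduced here, not in the question), the independence of CH is the pair of relative-consistency results due to Gödel and Cohen, both of which carry the caveat above as their hypothesis:

$$\mathrm{Con}(\mathrm{ZFC}) \implies \mathrm{Con}(\mathrm{ZFC} + \mathrm{CH}) \quad\text{and}\quad \mathrm{Con}(\mathrm{ZFC}) \implies \mathrm{Con}(\mathrm{ZFC} + \neg\mathrm{CH})$$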

redbird 31 Jan 2023 20:40 UTC
1 point
0
in reply to: gwern’s comment on: We don’t trade with ants
Yep, it’s a funny example of trade, in that neither party is cognizant of the fact that they are trading!
I agree that Abrams could be wrong, but I don’t take the story about “spirits” as much evidence: A ritual often has a stated purpose that sounds like nonsense, and yet the ritual persists because it confers some incidental benefit on the enactor.

redbird 29 Jan 2023 20:41 UTC
2 points
0
on: We don’t trade with ants
Anecdotal example of trade with ants (from a house in Bali, as described by David Abrams):
The daily gifts of rice kept the ant colonies occupied–and, presumably, satisfied. Placed in regular, repeated locations at the corners of various structures around the compound, the offerings seemed to establish certain boundaries between the human and ant communities; by honoring this boundary with gifts, the humans apparently hoped to persuade the insects to respect the boundary and not enter the buildings.

redbird 24 Apr 2022 2:44 UTC
2 points
0
in reply to: tailcalled’s comment on: Are smart people’s personal experiences biased against general intelligence?
if you are smarter at solving math tests where you have to give the right answer, then that will make you worse at e.g. solving math “tests” where you have to give the wrong answer.
Is that true though? If you’re good at identifying right answers, then by process of elimination you can also identify wrong answers.
I mean sure, if you think you’re supposed to give the right answer then yes you will score poorly on a test where you’re actually supposed to give the wrong answer. Assuming you get feedback, though, you’ll soon learn to give wrong answers and then the previous point applies.

redbird 23 Apr 2022 14:59 UTC
3 points
0
on: What an actually pessimistic containment strategy looks like
There’s a trap here where the more you think about how to prevent bad outcomes from AGI, the more you realize you need to understand current AI capabilities and limitations, and to do that there is no substitute for developing and trying to improve current AI!

A secondary trap is that preventing unaligned AGI probably will require lots of limited aligned helper AIs which you have to figure out how to build, again pushing you in the direction of improving current AI.

The strategy of “getting top AGI researchers to stop” is a tragedy of the commons: They can be replaced by other researchers with fewer scruples. In principle TotC can be solved, but it’s hard. Assuming that effort succeeds, how feasible would it be to set up a monitoring regime to prevent covert AGI development?

redbird 23 Apr 2022 14:45 UTC
1 point
0
on: Are smart people’s personal experiences biased against general intelligence?
“no free lunch in intelligence” is an interesting thought, can you make it more precise?

Intelligence is more effective in combination with other skills, which suggests “free lunch” as opposed to tradeoffs.

redbird 23 Apr 2022 14:37 UTC
4 points
0
in reply to: Dean Weesner’s comment on: Lies Told To Children
Young kids don’t make a clear distinction between fantasy and reality. The process of coming to reject the Santa myth helps them clarify the distinction.

It’s interesting to me that young kids function as well as they do without the notions of true/false, real/pretend! What does “belief” even mean in that context? They change their beliefs from minute to minute to suit the situation.

Even for most adults, most beliefs are instrumental: We only separate true from false to the extent that it’s useful to do so!

redbird 25 Jan 2022 18:46 UTC
1 point
0
in reply to: Brownbat’s comment on: Prizes for ELK proposals
Thanks for the comment!
I know you are saying it predicts *uncertainly,* but we still have to have some framework to map uncertainty to a state, we have to round one way or the other. If uncertainty avoids loss, the predictor will be preferentially inconclusive all the time.
There’s a standard trick for scoring an uncertain prediction: It outputs its probability estimate $p$ that the diamond is in the room, and we score it with loss $-\log(p)$ if the diamond is really there, $-\log(1 - p)$ otherwise. Truthfully reporting $p$ minimizes its expected loss, so hedging toward “inconclusive” is not rewarded.
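
A quick numerical check of that claim, as a sketch: the function names and the grid search are my own illustration, not part of any ELK setup. If the predictor’s true belief is `p` and it reports `q`, its expected loss is minimized at `q = p`.

```python
import math


def expected_log_loss(p, q):
    """Expected loss when the event has probability p and the report is q."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))


def best_report(p, grid_size=999):
    """Grid-search the report q in (0, 1) that minimizes expected loss."""
    candidates = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return min(candidates, key=lambda q: expected_log_loss(p, q))


for p in (0.1, 0.5, 0.8):
    # The minimizing report coincides with the true belief p (up to grid resolution).
    print(p, best_report(p))
```

In particular, a predictor that believes 0.8 but reports an “inconclusive” 0.5 pays a strictly higher expected loss than one that reports 0.8.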
So we could sharpen case two and say that sometimes the AI’s camera intentionally lies to it on some random subset of scenarios
You’re saying that giving it less information (by replacing its camera feed with a lower quality feed) is equivalent to sometimes lying to it? I don’t see the equivalence!
if you overfit on preventing human simulation, you let direct translation slip away.
That’s an interesting thought, can you elaborate?

redbird 15 Jan 2022 22:35 UTC
2 points
0
in reply to: HoldenKarnofsky’s comment on: Prizes for ELK proposals
“Train the predictor on lots of cases until it becomes incredibly good; then train the reporter only on the data points with missing information, so that it learns to do direct translation from the predictor to human concepts; then hope that reporter continues to do direct translation on other data points.”
That’s different from what I had in mind, but better! My proposal had two separate predictors, and what it did is reduce the human $\leftrightarrow$ strong predictor OI problem (OI = “ontology identification”, defined in the ELK paper) to the weak predictor $\leftrightarrow$ strong predictor OI problem. The latter problem might be easier, but I certainly don’t see how to solve it!
Your version is better because it bypasses the OI problem entirely (the two predictors are the same!)
Now for the problem you point out:
The problem as I see it is that once the predictor is good enough that it can get data points right despite missing crucial information,
Here’s how I propose to block this. Let $(v_{1}, a)$ be a high-quality video and an action sequence. Given this pair, the predictor outputs a high-quality video $v_{2}$ of its predicted outcome. Then we downsample $v_{1}$ and $v_{2}$ to low-quality $v_{1}^{'}$ and $v_{2}^{'}$, and train the reporter on the tuple $(v_{1}^{'}, a, v_{2}^{'}, x)$, where $x$ is the human label informed by the high-quality $v_{1}$ and $v_{2}$.
We choose training data such that
1. The human can label perfectly given the high-quality data $(v_{1}, a, v_{2})$; and
2. The predictor doesn’t know for sure what is happening from the low-quality data $(v_{1}^{'}, a, v_{2}^{'})$ alone.
Let’s compare the direct reporter (which truthfully reports the probability that the diamond is in the room, as estimated by the predictor, which only has the low-quality data) with the human simulator.
The direct reporter will not get perfect reward, since the predictor is genuinely uncertain. Sometimes the predictor’s probability is strictly between 0 and 1, so it gets some loss.
But the human simulator will do worse than the direct reporter, because it has no access to the high-quality data. It can simulate what the human would predict from the low-quality data, but that is strictly worse than what the predictor predicts from the low-quality data.
I agree that we still have to “hope that reporter continues to do direct translation on other data points”, and maybe there’s a counterexample that shows it won’t? But at the very least the human simulator is no longer a failure mode!
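
A toy version of the comparison above, under loud assumptions: the diamond is present half the time, each party’s calibrated belief from the low-quality data is always either `a` or `1 - a`, and the two accuracy values are made up. Under log loss, the expected loss of such a calibrated report is the binary entropy of `a`.

```python
import math


def binary_entropy(a):
    """Expected log loss (in nats) of a calibrated report that is right with probability a."""
    return -(a * math.log(a) + (1 - a) * math.log(1 - a))


# Hypothetical accuracies at inferring "diamond present" from low-quality video alone.
PREDICTOR_ACCURACY = 0.9  # what the direct reporter passes along
HUMAN_ACCURACY = 0.7      # what the human simulator can reproduce

direct_reporter_loss = binary_entropy(PREDICTOR_ACCURACY)
human_simulator_loss = binary_entropy(HUMAN_ACCURACY)

# Neither loss is zero, but the direct reporter's is smaller.
print(direct_reporter_loss, human_simulator_loss)
```

The gap exists only because the predictor is assumed to extract more from the low-quality data than a human would; if that assumption fails, so does the argument.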

redbird 15 Jan 2022 19:18 UTC
4 points
0
in reply to: HoldenKarnofsky’s comment on: Prizes for ELK proposals
I agree this is a problem. We need to keep it guessing about the simulation target. Some possible strategies:
- Add noise, by grading it incorrectly with some probability.
- On training point $i$, reward it for matching $H_{n_{i}}$ for a random value of $n_{i}$.
- Make humans a high-dimensional target. In my original proposal, $H_{n}$ became strictly stronger as $n$ increased, but we could instead take $H_{n}$ to be a committee of experts. Say there are 100 types of relevant expertise. On each training point, we reward the model for matching a random committee of 50 experts selected from the pool of 100. It’s too expensive to simulate all (100 choose 50) possible committees!
None of these randomization strategies is foolproof in the worst case. But I can imagine proving something like “the model is exponentially unlikely to learn an $H_{100}$ simulator” where $H_{100}$ is now the full committee of all 100 experts. Hence my question about large deviations.
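
For a sense of scale in the committee strategy, the count can be computed directly. The constant names are mine; the numbers 100 and 50 are the ones from the bullet above.

```python
import math

POOL_SIZE = 100      # types of relevant expertise
COMMITTEE_SIZE = 50  # experts drawn per training point

# Number of distinct committees a per-committee simulator would have to cover.
num_committees = math.comb(POOL_SIZE, COMMITTEE_SIZE)
print(f"{num_committees:.3e}")  # on the order of 1e29
```

With roughly $10^{29}$ possible committees, memorizing one simulator per committee is out of reach, which is what pushes the model toward either direct translation or a simulator of the full pool.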

redbird 11 Jan 2022 14:09 UTC
1 point
0
in reply to: tailcalled’s comment on: Total compute available to evolution
You’re saying AI will be much better than us at long-term planning?
It’s hard to train for tasks where the reward is only known after a long time (e.g. how would you train for climate prediction?)

redbird 10 Jan 2022 13:49 UTC
1 point
0
in reply to: jessicata’s comment on: Total compute available to evolution
Great links, thank you!!
So your focus was specifically on the compute performed by animal brains.
I expect total brain compute is dwarfed by the computation inside cells (transcription & translation). Which in turn is dwarfed by the computation done by non-organic matter to implement natural selection. I had totally overlooked this last part!

redbird 10 Jan 2022 13:12 UTC
1 point
0
in reply to: tailcalled’s comment on: Total compute available to evolution
Interesting, my first reaction was that evolution doesn’t need to “figure out” the extended phenotype (= “effects on the real world”). It just blindly deploys its algorithms, and natural selection does the optimization.
But I think what you’re saying is, the real world is “computing” which individuals die and which ones reproduce, and we need a way to quantify that computational work. You’re right!