red75prime

Karma: 63

red75prime 10 May 2023 23:21 UTC
12 points
11
in reply to: __RicG__’s comment on: AGI-Automated Interpretability is Suicide
I really don’t expect “goals” to be explicitly written down in the network. There will very likely not be a thing that says “I want to predict the next token” or “I want to make paperclips” or even a utility function of that. My mental image of goals is that they are put “on top” of the model/mind/agent/person. Whatever they seem to pursue, independently of their explicit reasoning.
I’m sure that I don’t understand you. GPT most likely doesn’t have “I want to predict the next token” written somewhere, because it doesn’t want to predict the next token. There’s nothing in there that will actively try to predict the next token no matter what. It’s just the thing it does when it runs.
Is it possible to have a system that just “actively tries to make paperclips no matter what” when it runs, but doesn’t reflect that in its reasoning and planning? I have a feeling that it would take God-level sophistication and knowledge of the universe to create a device that acts like that: one that just happens to act in a way that robustly maximizes paperclips while not containing anything that can be interpreted as that goal.
I found that I can’t precisely formulate why I feel that. Maybe I’ll be able to express that in a few weeks (or I’ll find that the feeling is misguided).

red75prime 10 May 2023 20:12 UTC
9 points
4
on: AGI-Automated Interpretability is Suicide
Solving interpretability with an AGI (even with humans-in-the-loop) might not lead to particularly great insights on a general alignment theory or even on how to specifically align a particular AGI
Wouldn’t it at least solve corrigibility by making it possible to detect the formation of undesirable end-goals? I think even GPT-4 can classify a textual interpretation of an end-goal on the basis of its general desirability for humans.
It seems to need another assumption, namely that the AGI has sufficient control of its internal state and sufficient knowledge of the detection network to be able to bring itself into a state that produces an interpretation that trips up the detection network, while also allowing the AGI to advance its agenda.

red75prime 14 Feb 2024 8:35 UTC
8 points
1
on: Scale Was All We Needed, At First
Excellent story. But what about the “pull the plug” option? Did ALICE find a way to run itself efficiently on traditional datacenters that aren’t packed with backprop and inference accelerators? And would shutting them down have required more political will than the government could muster at the time?

red75prime 11 May 2023 9:47 UTC
8 points
3
in reply to: __RicG__’s comment on: AGI-Automated Interpretability is Suicide
You said it yourself, GPT “wants” to predict the correct probability distribution of the next token
No, I said that GPT does predict the next token, while probably not containing anything that can be interpreted as “I want to predict the next token”. Likewise, a bacterium does divide (with possible adaptive mutations), while not containing “be fruitful and multiply” written somewhere inside.
If you instead meant that GPT is “just an algorithm”
No, I certainly didn’t mean that. If the extended Church-Turing thesis holds for the macroscopic behavior of our bodies, we can indeed be represented as Turing-machine algorithms (with at most a polynomial overhead in efficiency).
What I feel, but can’t precisely convey, is that there’s a huge gulf (in computational complexity maybe) between agentic systems (that do have an explicit internal representation of at least some of their goals) and “zombie-agentic” systems (that act like agents with goals, but have no explicit internal representation of those goals).
we don’t know what our utility actually is
How do you define the goal (or utility function) of an agent? Is it something that actually happens when the universe containing the agent evolves in its usual physical fashion? Or is it something that was somehow intended to happen when the agent is run (but may not actually happen due to circumstances and the agent’s shortcomings)?

red75prime 24 Apr 2023 12:58 UTC
7 points
2
in reply to: Algon’s comment on: Could a superintelligence deduce general relativity from a falling apple? An investigation
Thanks for clearing my confusion. I’ve grown rusty on the topic of AIXI.
So going forwards from simple theories and seeing how they bridge to your effective model would probably do the trick
Assuming that there’s not much fine-tuning to do. Locating our world in the string theory landscape could take quite a few bits if it’s computationally feasible at all.
And remember, we’re talking about an ASI here
It hinges on the assumption that an ASI of this type is physically realizable. I can’t find it now, but I remember that for one variant of computable AIXI the preprocessing step, where heuristic generation happens, was found to take an impractical amount of time. Am I wrong? Are there newer developments?

red75prime 23 Apr 2023 21:54 UTC
5 points
0
in reply to: Algon’s comment on: Could a superintelligence deduce general relativity from a falling apple? An investigation
I mean, are there reasons to assume that computable AIXI (or one of its variants) can be realized as a physically feasible device? I can’t find papers indicating significant progress in making feasible AIXI approximations.

red75prime 6 Dec 2023 12:46 UTC
3 points
2
on: Some quick thoughts on “AI is easy to control”
what happens if we automatically evaluate plans generated by superhuman AIs using current LLMs and then launch plans that our current LLMs look at and say, “this looks good”.
The obvious failure mode is that the LLM is not powerful enough to predict the consequences of the plan. The obvious fix is to include a human-relevant description of the consequences. The obvious failure modes: a manipulated description of the consequences, optimizing for LLM jail-breaking. The obvious fix: …
I won’t continue: shallow rebuttals are not that convincing, and deep ones are close to capability research, so I don’t expect to find interesting answers.

red75prime 20 Jun 2023 15:28 UTC
3 points
2
on: Lessons On How To Get Things Right On The First Try
After all, in the AI situation for which the exercise is a metaphor, we don’t know exactly when something might foom; we want elbow room.
Or you can pretend that you are impersonating an AI that is preparing to go foom.

red75prime 17 May 2023 10:04 UTC
3 points
2
in reply to: pjeby’s comment on: AI Will Not Want to Self-Improve
the simpler the utility function the easier time it has guaranteeing the alignment of the improved version
If we are talking about a theoretical $\operatorname{argmax}_{a} E(U \mid a)$ AI, where $E(U \mid a)$ (expectation of utility given the action $a$) somehow points to the external world, then sure. If we are talking about a real AI with aspiration to become the physical embodiment of the aforementioned theoretical concept (with the said aspiration somehow encoded outside of $U$, because $U$ is simple), then things get more hairy.
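To make the theoretical object concrete, here is a minimal sketch of an expected-utility argmax over a finite action set. The toy outcome table and the names `expected_utility`, `argmax_action`, and `toy_model` are illustrative assumptions, not anything from the discussion; the table simply stands in for the part where $E(U \mid a)$ "somehow points to the external world".

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: each action maps to a list of
# (probability, utility) pairs describing its possible outcomes.
Outcomes = List[Tuple[float, float]]


def expected_utility(outcomes: Outcomes) -> float:
    """E(U | a) for one action, given its outcome distribution."""
    return sum(p * u for p, u in outcomes)


def argmax_action(model: Dict[str, Outcomes]) -> str:
    """Return the action with the highest expected utility."""
    return max(model, key=lambda a: expected_utility(model[a]))


# Assumed numbers, chosen only for illustration.
toy_model = {
    "wait": [(1.0, 1.0)],
    "gamble": [(0.5, 0.0), (0.5, 3.0)],
}

best = argmax_action(toy_model)
```

The selection rule is trivial once the table exists. Everything hard lives in producing a table that actually tracks the world, which is where a real, embodied AI differs from the idealization.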

red75prime 10 May 2023 18:30 UTC
LW: 3 AF: 2
0
AF
in reply to: Steven Byrnes’s comment on: LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem
If you have a next-frame video predictor, you can’t ask it how a human would feel. You can’t ask it anything at all—except “what might be the next frame of thus-and-such video?”. Right?
Not exactly. You can extract embeddings from a video predictor (activations of the next-to-last layer may do, or you can use techniques that enhance the semantic information captured in the embeddings), and then use supervised learning to train a simple classifier from embeddings to human feelings on a modest number of video/feelings pairs.
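A rough sketch of that second step, a simple classifier trained on frozen embeddings, might look like the following. Everything here is assumed for illustration: the embeddings are synthetic stand-ins (no real video predictor is involved), the binary label is a placeholder for some human-feeling category, and the classifier is plain logistic regression written with the standard library only.

```python
import math
import random


def make_fake_embeddings(n, dim=4, seed=0):
    """Synthetic stand-in for (embedding, feeling-label) pairs.
    Assumption: the label is linearly readable from the first coordinate."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        sign = 1.0 if label == 1 else -1.0
        vec = [2.0 * sign + rng.gauss(0, 0.5)]
        vec += [rng.gauss(0, 1.0) for _ in range(dim - 1)]
        data.append((vec, label))
    return data


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(data, epochs=200, lr=0.1):
    """Full-batch gradient descent on the logistic loss."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for vec, label in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b) - label
            for i, xi in enumerate(vec):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, vec):
    return 1 if sum(wi * xi for wi, xi in zip(w, vec)) + b > 0 else 0


train = make_fake_embeddings(200, seed=0)
held_out = make_fake_embeddings(100, seed=1)
w, b = train_probe(train)
accuracy = sum(predict(w, b, v) == y for v, y in held_out) / len(held_out)
```

Whether a real predictor's embeddings carry the relevant information in a linearly readable form is an empirical question that this toy data assumes away.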

red75prime 1 May 2023 6:28 UTC
3 points
0
on: Hell is Game Theory Folk Theorems
In a realistic setting agents will be highly incentivized to seek other forms of punishment besides turning the dial. But nice toy hell.

red75prime 24 Apr 2023 7:45 UTC
3 points
0
in reply to: Algon’s comment on: Could a superintelligence deduce general relativity from a falling apple? An investigation
it seems plausible that you could have GR + QFT and a megabyte of bridging laws plus some other data to specify local conditions and so on.
How can a computationally bounded variant of AIXI arrive at QFT? You most likely can’t faithfully simulate a non-trivial quantum system on a classical computer within reasonable time limits. Such an AIXI is bound to find some computationally feasible approximation of QFT first (Maxwell’s equations and a cutoff at some arbitrary energy to prevent the ultraviolet catastrophe, maybe). And with no access to experiments it cannot test simpler systems.

red75prime 12 Apr 2024 22:15 UTC
2 points
1
in reply to: Steven Byrnes’s comment on: Ackshually, many worlds is wrong
“Thread of subjective experience” was an aside (just one of the mechanisms that explains why we “find ourselves” in a world that behaves according to the Born rule), don’t focus too much on it.
The core question is which physical mechanism (everything should be physical, right?) ensures that you will almost never see a string of a billion tails after a billion quantum coin flips, while the universe contains a quantum branch with you looking in astonishment at a string of a billion tails. Why should you expect that it will almost certainly not happen, when there’s always a physical instance of you that will see it happen?
You’ll have 2^1000000000 branches with exactly the same amplitude. You’ll experience every one of them. Which physical mechanism will make it more likely for you to experience strings with roughly the same number of heads and tails?
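For scale, the bare counting can be sketched as below: among all equal-amplitude outcome strings of `n` fair flips, what fraction have a heads count within `max_dev` of `n / 2`? The function name and the choice of `n = 1000` are assumptions for illustration (a billion flips is out of reach for exact arithmetic here). The sketch only counts strings; it does not supply a physical mechanism tying that count to what anyone experiences, which is the open question above.

```python
from fractions import Fraction
from math import comb


def balanced_fraction(n: int, max_dev: int) -> Fraction:
    """Fraction of the 2**n heads/tails strings whose number of heads k
    satisfies |k - n/2| <= max_dev (compared in integers to stay exact)."""
    count = sum(comb(n, k) for k in range(n + 1)
                if abs(2 * k - n) <= 2 * max_dev)
    return Fraction(count, 2 ** n)


# Assumed example sizes: 1000 flips, heads count within 50 of 500.
share_1000 = balanced_fraction(1000, 50)

# The all-tails string is always exactly one string out of 2**n.
all_tails_share = Fraction(1, 2 ** 1000)
```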
In the Copenhagen interpretation it’s trivial: when the quantum coin flipper writes down the result of a flip, the universe somehow samples from a probability distribution, and the rest is plain old probability theory. You don’t expect to observe a string of a billion tails (or any other preselected string), because the you who observes this string almost never exists.
What happens in MWI?

red75prime 12 Apr 2024 19:12 UTC
2 points
1
in reply to: Steven Byrnes’s comment on: Ackshually, many worlds is wrong
I haven’t fully understood your stance towards the many minds interpretation. Do you find it unnecessary?
I don’t think either of these Harrys is “preferred”.
And simultaneously you think that existence of future Harries who observe events with probabilities approaching zero is not a problem because current Harry will almost never find himself to be those future Harries. I don’t understand what it means exactly.
Harries who observe those rare events exist and they wonder how they found themselves in those unlikely situations. Harries who hadn’t found anything unusual exist too. Current Harry became all of those future Harries.
So, we have a quantum state of the universe that factorizes into states with different Harries. OK. What property distinguishes a universe where “Harry found himself in a tails branch” from a universe where “Harry found himself in a heads branch”?
You have already answered it: “I don’t think either of these Harrys is “preferred”.” That is there’s no property of the universe that distinguishes those outcomes.
Let’s get back to the initial question: what does it mean that “Harry will almost never find himself to be those future Harries”? To answer that we need to jump from a single physical Universe (containing a multitude of Harries who found themselves in branches of every possible probability) to a single one (or maybe a set) of those Harries and proclaim that, indeed, that Harry (or those Harries) found himself in a usual branch of the universe and all other Harries don’t matter for some reason (their amplitudes are too low to matter despite them being fully conscious? That’s the point that I don’t understand).
The many minds interpretation solves this by proposing metaphysical threads of consciousness, thus adding a property that distinguishes outcomes where Harry observes different things. So we can say that indeed the vast majority of Harries’ threads of consciousness ended up in probable branches.
I don’t like this interpretation. Why don’t we use a single thread of consciousness that adheres to the Born rule? Or why don’t we get rid of threads of consciousness altogether and just use the Copenhagen interpretation?
So, my question is: how do you tackle this problem? I hope I’ve made it sufficiently coherent.
My own resolution is that either collapse is objective, or, due to imperfect decoherence, the vast majority of branches (which also have relatively low amplitude) interfere with each other, making it impossible for conscious beings to exist in them and, consequently, to observe them (this doesn’t explain the billion quantum coin-flips scenario in my comment below).

red75prime 12 Apr 2024 9:50 UTC
2 points
1
in reply to: Steven Byrnes’s comment on: Ackshually, many worlds is wrong
For example: “as quantum amplitude of a piece of the wavefunction goes to zero, the probability that I will ‘find myself’ in that piece also goes to zero”
What I really don’t like about this formulation is the extreme vagueness of “I will find myself”, which implies that there’s some preferred future “I” out of many, who is defined not only by the observations he receives, but also by being a preferred continuation of subjective experience defined by an unknown mechanism.
It can be formalized as the many minds interpretation, incurring an additional complexity penalty and undermining the surface simplicity of the assumption. The coexistence of infinitely many (measurement operators can produce continuous probability distributions) threads of subjective experience in a single physical system also doesn’t strike me as “feeling more natural”.

red75prime 21 Jan 2024 15:27 UTC
2 points
0
on: You can rack up massive amounts of data quickly by asking questions to all your friends
For returns below $2000, I’d use a 50/50 quantum random strategy just for the fun of dropping Omega’s stats.

red75prime 19 Apr 2024 21:35 UTC
1 point
0
on: When is a mind me?
What concrete fact about the physical world do you think you’re missing? What are you ignorant of?
Let’s flip a very unfair quantum coin with 1:2^1000000 heads to tails chances (preparing such a quantum state would be quite an engineering feat, but it’s theoretically possible). You shouldn’t expect to see heads if the quantum state is prepared correctly, but the post-flip universe (in MWI) contains a branch where you see heads. So, by your logic, you should expect to see both heads and tails even if the state is prepared correctly.
What I do not know is how it all ties together. Is MWI wrong? Is copying not equivalent to MWI branching (thanks to the no-cloning theorem, for example)? And so on.

red75prime 18 Mar 2024 9:02 UTC
1 point
0
on: What is the best argument that LLMs are shoggoths?
First, a factual statement that is true to the best of my knowledge: the LLM state that is used to produce the probability distribution for the next token is completely determined by the state of its input buffer (plus a bit of indeterminism due to parallel processing and the non-associativity of floating-point arithmetic).
That is, an LLM can pass only a single token (around 2 bytes) to its future self. That follows from the above.
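Two of the small quantitative claims above can be checked with a few lines. The sketch shows that floating-point addition is not associative (so a different summation order in parallel hardware can change low-order bits), and estimates how many bits one token carries. The vocabulary size of 50,000 is an assumed round number, not a figure from any particular model.

```python
import math

# Floating-point addition depends on grouping, so the order in which
# parallel hardware reduces a sum can perturb the result slightly.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
grouping_matters = left != right

# Upper bound on information in one token: log2 of the vocabulary size.
assumed_vocab_size = 50_000  # hypothetical round figure
bits_per_token = math.log2(assumed_vocab_size)
bytes_per_token = bits_per_token / 8
```

With that assumed vocabulary, one token carries a bit under 16 bits, which matches the "around 2 bytes" estimate.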
What comes next is a plausible (to me) speculation.
For humans, what’s passed to our future self is most likely much more than a single token. That is, the state of the human brain that leads to writing (or uttering) the next word most likely cannot be derived from a small subset of the previous state plus the last written word (the state of the brain changes not only because we have written or said a word, but by other means too).
This difference can lead to the LLM using completely different processes to mimic human output, that is, to potential shoggothification. But to be a real shoggoth, the LLM also needs a way to covertly update its shoggoth state, that is, the part of its state that can lead to inhuman behavior. The output buffer is the only thing it has for maintaining state, so the shoggoth state would have to be steganographically encoded in it, severely limiting its information density and update rate.
I wonder how a shoggoth state may arise at all, but it might be my lack of imagination.

red75prime 13 Mar 2024 19:46 UTC
1 point
0
in reply to: red75prime’s comment on: 0th Person and 1st Person Logic
Expanding a bit on the topic.
Exhibit A: flip a fair coin and move a suspended robot into a green or red room using a second coin with probabilities (99%, 1%) for heads, and (1%, 99%) for tails.
Exhibit B: flip a fair coin and create 99 copies of the robot in green rooms and 1 copy in a red room for heads, and reverse colors otherwise.
What causes the robot to see red instead of green in exhibit A? Physical processes that brought about a world where the robot sees red.
What causes a robot to see red instead of green in exhibit B? The fact that it sees red, nothing more. The physical instance of the robot who sees red in one possible world could be the instance who sees green in another possible world, of course (physical causality surely is intact). But a robot-who-sees-red (that is, one of the instances who see red) cannot be made into a robot-who-sees-green by physical manipulations. That is, the subjective causality of seeing red is cut off from physical causes (in the case of multiple copies of an observer), and as such it cannot be used as a basis for probabilistic judgements.
I guess that if I don’t see a resolution of the Anthropic Trilemma in the framework of MWI in about 10 years, I’ll be almost sure that MWI is wrong.
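For reference, the bare bookkeeping of the two exhibits can be written out exactly. In Exhibit A the numbers are ordinary probabilities over physical outcomes; in Exhibit B the analogous number is only a coin-weighted share of robot instances. The variable names are assumptions, and the sketch takes no position on whether the Exhibit B share may be read as a probability, which is exactly what is disputed above.

```python
from fractions import Fraction

half = Fraction(1, 2)  # fair first coin

# Exhibit A: a single robot, moved by a second (biased) coin.
p_green_given_heads = Fraction(99, 100)
p_green_given_tails = Fraction(1, 100)
p_green_a = half * p_green_given_heads + half * p_green_given_tails
p_heads_given_green_a = half * p_green_given_heads / p_green_a

# Exhibit B: 100 copies per world; room counts depend on the coin.
copies = {
    "heads": {"green": 99, "red": 1},
    "tails": {"green": 1, "red": 99},
}
# Coin-weighted count of green-seeing instances in each kind of world.
weighted_green = {coin: half * rooms["green"] for coin, rooms in copies.items()}
share_heads_among_green_b = weighted_green["heads"] / sum(weighted_green.values())
```

Both calculations give 99/100, so the arithmetic alone cannot distinguish the exhibits. The difference argued for above is in what the number refers to.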

red75prime 12 Mar 2024 9:46 UTC
1 point
−1
in reply to: Wei Dai’s comment on: 0th Person and 1st Person Logic
I have a solution that is completely underwhelming, but I can see no flaws in it besides the complete lack of a definition of which part of the mental state should be preserved to still count as you, and its rejection of MWI (I also cannot see useful insights into why we have what looks like continuous subjective experience).
1. You can’t consistently assign probabilities to future observations in scenarios where you expect the creation of multiple instances of your mental state. All instances exist and there are no counterfactual worlds where you end up as a mental state in a different location/time (as opposed to the one you happened to actually observe). You are here because your observations tell you that you are here, not because something intangible has moved from the previous “you”(1) to the current “you” located here.
2. The Born rule works because MWI is wrong. The collapse is objective and there are no alternative yous.
(1) I use “you” in scare quotes to designate something beyond all information available in the mental state that presumably is unique and moves continuously (or jumps) through time.
Let’s iterate through questions of The Anthropic Trilemma.
1. The Boltzmann Brain problem: no probabilities, no updates. Observing either room doesn’t tell you anything about the value of the digit of pi. It tells you that you observe the room you observe.
2. Winning the lottery: there are no alternative quantum branches, so your machinations don’t change anything.
3. Personal future: Britney Spears observes that she has memories of Britney Spears, you observe that you have your memories. There are no alternative scenarios if you are defined just by the information in your mental state. If you jump off the cliff, you can expect that someone with a memory of deciding to jump off the cliff (as well as all your other memories) will hit the ground and there will be no continuation of this mental state in this time and place. And your memory tells you that it will be you who will experience the consequences of your decisions (whatever the underlying causes for that).
Probabilistic calculations of your future experiences work as expected, if you add “conditional on me experiencing staying here and now”.
It’s not unlike the “do(X=x)” operator in graphical models, which cuts off all other causal influences on X.