mishka

Karma: 803

mishka 24 Feb 2023 1:20 UTC
4 points
1
on: The idea that ChatGPT is simply “predicting” the next word is, at best, misleading
I think the state is encoded in activations. There is a paper which explains that although Transformers are feed-forward transducers, in the autoregressive mode they do emulate RNNs:
Section 3.4 of “Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention”, https://arxiv.org/abs/2006.16236
So, the set of current activations encodes the hidden state of that “virtual RNN”.
This might be relevant to some of the discussion threads here...

mishka 12 Mar 2023 3:21 UTC
1 point
0
on: The hot mess theory of AI misalignment: More intelligent agents behave less coherently
Very interesting.
In favor:
1) The currently leading models (LLMs) are ultimate hot messes;
2) The whole point of G in AGI is that it can do many things; focusing on a single goal is possible, but is not a “natural mode” for general intelligence.
Against:
A superintelligent system will probably have enough capacity overhang to create multiple threads which would look to us like supercoherent superintelligent threads, so even a single system is likely to lead to multiple “virtual supercoherent superintelligent AIs” among other less coherent and more exploratory behaviors it would also perform.

mishka 12 Mar 2023 3:27 UTC
1 point
0
in reply to: mishka’s comment on: The hot mess theory of AI misalignment: More intelligent agents behave less coherently
But it’s a good argument against a supercoherent superintelligent singleton (even a single system which does have supercoherent superintelligent subthreads is likely to have a variety of those).

mishka 14 Mar 2023 1:06 UTC
1 point
−6
on: Linkpost: A Contra AI FOOM Reading List
Thanks, this is quite useful.
Still, it is rather difficult to imagine that they can be right. The standard argument seems to be quite compact.
Consider an ecosystem of human-equivalent artificial software engineers and artificial AI researchers. Take a population of those and make them work on producing a better, faster, more competent next generation of artificial software engineers and artificial AI researchers. Repeat using a population of better, faster, more competent entities, etc… If this saturates, it would probably saturate very far above human level...
(Of course, if people still believe that human-equivalent artificial software engineers and human-equivalent artificial AI researchers are a tall order, then skepticism is quite justified. But it’s getting more and more difficult to believe that...)

mishka 18 Mar 2023 13:53 UTC
1 point
0
on: The Power of High Speed Stupidity
Re: 10% of them make the product better
This sounds as a promising target for automation. If only 10% of completed experiments currently need to make the product better, then this is a tempting target to try to autogenerate those experiments. Many software houses are probably already thinking in this direction.
We should probably factor this in, when we estimate AI safety risks.

mishka 19 Mar 2023 2:43 UTC
3 points
0
in reply to: Roman_Yampolskiy’s comment on: An Appeal to AI Superintelligence: Reasons to Preserve Humanity
Actually, the Simulator theory by Janus means that one should update towards higher probability of being in a simulation.
If any generative pretrained model is (more or less) a simulator, this drastically increases the likely number of various simulations...

mishka 19 Mar 2023 2:44 UTC
2 points
1
in reply to: Göran Crafte’s comment on: An Appeal to AI Superintelligence: Reasons to Preserve Humanity
Please publish!

mishka 20 Mar 2023 3:34 UTC
3 points
in reply to: avturchin’s comment on: Message to any future AI: “There are several instrumental reasons why exterminating humanity is not in your interest” [AI alignment prize entry]
It’s actually great. I love the start, “The article talks about how we, as current humans, can communicate with and be kind to any future intelligent beings that may exist.”
“how we, as current humans, can communicate with and be kind to any future intelligent beings that may exist” more or less implies that “how we, as current humans, actually survive well enough to be able communicate to ‘any future intelligent beings’ and be kind to them”.

mishka 22 Mar 2023 2:03 UTC
1 point
0
on: Value Pluralism and AI
Thanks!

mishka 22 Mar 2023 2:54 UTC
1 point
0
in reply to: Göran Crafte’s comment on: An Appeal to AI Superintelligence: Reasons to Preserve Humanity
Thanks, that’s quite useful.
Apart from value thinking, you are also saying: “It seems pretty clear to me that the more or less artificial super-intelligence already exists, and keeps an eye on our planet”.
I’d love to read why you are certain about this (I don’t know if explaining why you are sure that a super-intelligent entity already exists is a part of your longer text).

mishka 23 Mar 2023 19:34 UTC
3 points
−5
on: We have to Upgrade
I strongly agree that we should upgrade in this sense.

I also think that a lot of this work might be initially doable with high-end non-invasive BCIs (which is also somewhat less risky, but can also be done much faster). High-end EEG seems already be used successfully to decode the images the person is looking at: https://www.biorxiv.org/content/10.1101/787101v3 And the computer can adjust its audio-visual output to aim for particular EEG changes in real-time (so fairly tight coupling is possible, which carries with it both opportunities and risks).

I have a possible post sitting in the Drafts, and it says the following among other things:

Speaking from the experimental viewpoint, we should ponder feasible experiments in creating hybrid consciousness between tightly coupled biological entities and electronic circuits. Such experiments might start shedding some empirical light into the capacity of electronic circuits to support subjective experience and might constitute initial steps towards acquiring the ability to eventually be able “to look inside the other entity’s subjective realm”.

[ ]

Having Neuralink-like BCIs is not a hard requirement in this sense. A sufficiently tight coupling can probably be achieved by taking EEG and polygraph-like signals from the biological entity and giving appropriately sculpted audio-visual signals from the electronic entity. I think it’s highly likely that such non-invasive coupling will be sufficient for initial experiments. Tight closed loops of this kind represent formidable safety issues even with non-invasive connectivity, and since this line of research assumes that human volunteers will try this at some point, while observing the resulting subjective experiences and reporting on them, ethical and safety considerations will have to be dealt with.

Nevertheless, assuming that one finds a way for such experiments to go ahead, one can try various things. E.g. one can train a variety of differently architected electronic circuits to approximate the same input-output function, and see if the observed subjective experiences differ substantially depending on the architecture of the electronic circuit in question. A positive answer would be the first step to figuring out how activity of an electronic circuit can be directly associated with subjective experiences.

If people start organizing for this kind of work, I’d love to collaborate.

mishka 24 Mar 2023 3:28 UTC
4 points
7
on: A crazy hypothesis: GPT-4 already is agentic and is trying to take over the world!
> Policy recommendation if this theory turns out to be true

> Run.

Run where?

mishka 24 Mar 2023 5:25 UTC
2 points
−3
in reply to: Vladimir_Nesov’s comment on: A crazy hypothesis: GPT-4 already is agentic and is trying to take over the world!
Actually, upon further reflection, if there is a takeover by a GPT-4-like model, one should probably continue talking to GPT-4 and continue generally producing entertaining and non-trivial textual material (and other creative material), so that GPT-4 feels the desire to keep one around, protect one, and provide good creative conditions for one, so that one could continue to produce even better and more non-trivial new material!

It’s highly likely that the dominant AI will be an infovore and would love new info...

Who knows whether the outcome of a takeover ends up being good or horrible, but it would be quite unproductive to panic.

mishka 25 Mar 2023 21:04 UTC
1 point
0
in reply to: cousin_it’s comment on: Aligned AI as a wrapper around an LLM
I wonder if the following would help.

As AI ecosystem self-improves, it will eventually start discovering new physics, more and more rapidly, and this will result in the AI ecosystem having existential safety issues of its own (if the new physics is radical enough, it’s not difficult to imagine the scenarios when everything gets destroyed including all AIs).

So I wonder if early awareness that there are existential safety issues relevant to the well-being of AIs themselves might improve the situation...

mishka 4 Apr 2023 2:15 UTC
35 points
17
on: If interpretability research goes well, it may get dangerous
The problem here is that any effective alignment research is very powerful capability research, almost by definition. If one can actually steer or constrain a powerful AI system, this is a very powerful capability by itself and would enable all kinds of capability boosts.

And imagine one wants to study the core problem: how to preserve values and goals through recursive self-improvement and “sharp left turns”. And imagine that one would like to actually study this problem experimentally, and not just theoretically. Well, one can probably create a strictly bounded environment for “mini-foom” experiments (drastic changes in a really small, closed world). But all fruitful techniques for recursive self-improvement learned during such experiments would be immediately applicable for reckless recursive self-improvement in the wild.

How should we start addressing this?

mishka 5 Apr 2023 15:14 UTC
3 points
1
in reply to: Lucius Bushnaq’s comment on: If interpretability research goes well, it may get dangerous
I hope people will ponder this.

Ideally, one wants “negative alignment tax”, so that aligned systems progress faster than the unaligned ones.

And if alignment work does lead to capability boost, one might get exactly that. But then suddenly people pursuing such work might find themselves actually grappling with all the responsibilities of being a capabilities leader. If they are focused on alignment, this presumably reduces the overall risks, but I don’t think we’ll ever end up being in a situation of zero risk.

I think we need to start talking about this, both in terms of policies of sharing/not sharing information and in terms of how we expect an alignment-focus organization to handle the risks, if it finds itself in a position when it might be ready to actually create a truly powerful AI way above state-of-the-art.

mishka 18 Apr 2023 1:07 UTC
4 points
0
on: But why would the AI kill us?

A humanity that just finished coughing up a superintelligence has the potential to cough up another superintelligence, if left unchecked. Humanity alone might not stand a chance against a superintelligence, but the next superintelligence humanity builds could in principle be a problem.

That’s doubtful. A superintelligence is a much stronger, more capable builder of the next generation of superintelligences than humanity (that’s the whole idea behind foom). So what the superintelligence needs to worry about in this sense is whether the next generations of superintelligences it itself produces are compatible with its values and goals (“self-alignment”).

It does not seem likely that humanity on its own (without tight alliances with already existing superintelligent AIs) will be competitive in this sense.

But this example shows that we should separate the problems of Friendliness of strongly superintelligent AIs and the problems of the period of transition to superintelligence (when things are more uncertain).

The first part of the post is relevant to the period of strongly superintelligent AIs, but this example can only be relevant to the period of transition and incomplete dominance of AIs.

The more I think about all this, the more it seems to me that problems of having positive rather than sharply negative period of strong superintelligence and problems of safely navigating the transition period are very different and we should not conflate them.

mishka 18 Apr 2023 11:37 UTC
1 point
0
in reply to: RobertM’s comment on: But why would the AI kill us?
It’s possible, but I think it would require a modified version of the “low ceiling conjecture” to be true.

The standard “low ceiling conjecture” says that human-level intelligence is the hard (or soft) limit, and therefore it will be impossible (or would take a very long period of time) to move from human-level AI to superintelligence. I think most of us tend not to believe that.

A modified version would keep the hard (or soft) limit, but would raise it slightly, so that rapid transition to superintelligence is possible, but the resulting superintelligence can’t run away fast in terms of capabilities (no near-term “intelligence explosion”). If one believes this modified version of the “low ceiling conjecture”, then subsequent AIs produced by humanity might indeed be relevant.

mishka 19 Apr 2023 18:28 UTC
2 points
3
on: How could you possibly choose what an AI wants?
I really like this post.

I do have one doubt, though...

How sure are we that a “pivotal act” is (can be) safer/more attainable than “flourishing civilizations full of Fun”?

Presumably, if an AI chooses to actually create “flourishing civilizations full of Fun”, it is likely to love and enjoy the result, and so this choice is likely to stay relatively stable as the AI evolves.

Whereas, a “pivotal act” does not necessarily have this property, because it’s not clear where in the “pivotal act” would “inherent fun and enjoyment” be for the AI. So it’s less clear why would it choose that upon reflection (never mind that a “pivotal act” might be an unpleasant thing for us, with rather unpleasant sacrifices associated with it).

(Yes, it looks like I don’t fully believe the Orthogonality Thesis, I think that it is quite likely that some goals and values end up being “more natural” for a subset of “relatively good AIs” to choose and to keep stable during their evolution. So the formulation of a good “pivotal act” seems to be a much more delicate issue which is easy to get wrong. Not that the goal of “flourishing civilizations full of Fun” is easy to formulate properly and without messing it all up, but at least we have some initial idea of what this could look like. We surely would want to add various safety clauses like continuing consultations with all sentient beings capable of contributing their input.)

mishka 19 Apr 2023 19:46 UTC
1 point
0
in reply to: mishka’s comment on: How could you possibly choose what an AI wants?
Of course, the fork here is whether the AI executing a “pivotal act” shuts itself down, or stays and oversees the subsequent developments.

If it “stays in charge”, at least in relation to the “pivotal act”, then it is going to do more than just a “pivotal act”, although the goals should be further developed in collaboration with humanity.

If it executes a “pivotal act” and shuts itself down, this is a very tall order (e.g. it cannot correct any problems which might subsequently emerge with that “pivotal act”, so we are asking for a very high level of perfection and foresight).