As a person not affiliated with Conjecture, I want to record some of my scattered reactions. A lot of upvotes on a post like this without substantial comments seem… unfair?
On one hand, it is always interesting to read something like that. Many of us have pondered Conjecture, asking ourselves whether what they are doing and the way they are doing it make sense. E.g. their infohazard policy has been remarkable, super-interesting, and controversial. My own reflections on that have been rather involved and complicated.
On the other hand, when I am reading the included Conjecture response, what they are saying there seems to me to make total sense (if I were in an artificial binary position of having to fully side with the post or with them, I would have sided with Conjecture on this). Although one has to note that their https://www.conjecture.dev/a-standing-offer-for-public-discussions-on-ai/ is returning a 404 at the moment. Is that offer still standing?
Specifically, on their research quality: the Simulator theory has certainly been controversial, but many people find it extremely valuable, and I personally tend to recommend it to people as (in my opinion) the most important conceptual breakthrough of 2022, together with the notes I took on the subject. It is particularly valuable as a deconfusion tool for what LLMs are and aren’t, and I found that framing LLM-related problems in terms of properties of simulation runs, and in terms of sculpting and controlling the simulations, is very productive. So I am super-grateful for that part of their research output.
On the other hand, I did notice that the authors of that work and Conjecture had parted ways (and when I noticed that I told myself, “perhaps I don’t need to follow that org all that closely anymore, although it is still a remarkable org”).
I think what makes writing comments on posts like this one difficult is that the post is really structured and phrased in such a way as to make this a situation of personal conflict, internal to the relatively narrow AI safety community.
I have not downvoted the post, but I don’t like this aspect, I am not sure this is the right way to approach things...
I felt exactly the same, until I had read this June 2020 paper: Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention.
It turns out that using Transformers in the autoregressive mode (with each output token being added back to the input by concatenating the previous input and the new output token, and sending the new version of the input through the model again and again) results in them emulating the dynamics of recurrent neural networks, and that clarifies things a lot...
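To make the paper's core equivalence concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper): with a kernel feature map in place of the softmax, causal linear attention computed the usual "parallel" way gives exactly the same outputs as a token-by-token recurrence over a running state, i.e. an RNN. The feature map `phi` (elu(x) + 1) is the one used in the paper; the dimensions and random inputs are arbitrary.

```python
import numpy as np

def phi(x):
    # Feature map from the paper: elu(x) + 1, which keeps features positive
    return np.where(x > 0, x + 1.0, np.exp(x))

rng = np.random.default_rng(0)
T, d = 6, 4                      # sequence length and head dimension (arbitrary)
Q = rng.normal(size=(T, d))
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))

# Parallel form: causal linear attention,
# out_i = sum_{j<=i} (phi(q_i)·phi(k_j)) v_j / sum_{j<=i} phi(q_i)·phi(k_j)
out_parallel = np.zeros((T, d))
for i in range(T):
    w = phi(Q[i]) @ phi(K[: i + 1]).T          # unnormalized attention weights
    out_parallel[i] = (w @ V[: i + 1]) / w.sum()

# Recurrent form: the same outputs from a running state (S, z),
# updated one token at a time like an RNN hidden state
S = np.zeros((d, d))   # accumulates phi(k_j) v_j^T
z = np.zeros(d)        # accumulates phi(k_j)
out_recurrent = np.zeros((T, d))
for i in range(T):
    S += np.outer(phi(K[i]), V[i])
    z += phi(K[i])
    q = phi(Q[i])
    out_recurrent[i] = (q @ S) / (q @ z)

# The two formulations agree exactly (up to floating point)
assert np.allclose(out_parallel, out_recurrent)
```

The recurrent form is why autoregressive generation with such a model needs only constant memory per step: the whole prefix is summarized in the fixed-size state (S, z), which is precisely the RNN view of the Transformer.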