I’m struck by how much this story drives home the hopelessness of brain-computer interface “solutions” to alignment. The AI learned to manipulate you through a text channel. In what way would giving the AI direct access to your brain help?
While I’m not particularly optimistic about BCI solutions either, I don’t think this story is strong evidence against them. Suppose that the BCI took the form of an exocortex that expanded the person’s brain functions and also significantly increased their introspective awareness to the level of an inhumanly good meditator. This would effectively allow for constant monitoring of what subagents within the person’s mind were getting activated in conversation, flagging those to the person’s awareness in real time and letting the person notice when they were getting manipulated in ways that the rest of their mind-system didn’t endorse. That kind of awareness tends to also allow defending against manipulation attempts since one does not blend with the subagents to a similar degree and can then better integrate them with the rest of the system after the issue has been noticed.
Ordinary humans can learn to get higher introspective awareness through practices such as meditation, but it’s very hard, if not impossible, to get to a point where you’d never be emotionally triggered, since sufficiently strong emotions seem to trigger some kind of biological override. But an exocortex might be built to remain unaffected by that override and allow one to maintain high introspective awareness regardless. In that case, one might be able to more directly communicate with untrusted entities without getting hacked by them.
By increasing your output bandwidth, obviously.
Increasing your output bandwidth in a case like this one would just give the AI more ability to model you and cater to you specifically.
That would be one potential effect. Another potential effect would be that you can learn to manipulate (not in the psychological sense, but in the sense of “use one’s hands to control”) the AI better, by seeing and touching more of the AI with faster feedback loops. Not saying it’s likely to work, but I think “hopeless” goes too far.
Yeah, I don’t think we know enough to be sure how it would work out one way or another. There are lots of different ways to wire up neurons to computers. I think it would be worth experimenting with if we had the time. We super don’t, though.
Yeah, I don’t think BCIs are likely to help align strong AGI. (By the same token, I don’t think they’d hurt; and if they would hurt, that would also somewhat imply they could help if done differently.)
As I think I’ve mentioned to you before in another thread, I think it’s probably incorrect for us to sacrifice not-basically-zero hopes in 10 or 20 years in exchange for what are in practice even smaller hopes sooner. I think the great majority of people who say they think AGI is very very (or “super”) likely in, say, the next 10 years are mostly just updating off everyone else.
Yeah, I think I am somewhat unusual in having tried to research timelines in depth and run experiments to support my research. My inside view continues to suggest we have less than 5 years. I’ve been struggling with how to write convincingly about this without divulging sociohazards. I feel apologetic for being in the situation of arguing for a point that I refuse to cite my evidence for.