Is it reasonable to expect that the first AI to foom will be no more intelligent than say, a squirrel?
In a sense, yeah, the algorithm is similar to a squirrel that feels a compulsion to bury nuts. The difference is that in an instrumental sense it can navigate the world much more effectively to follow its imperatives.
Think about intelligence in terms of the ability to map and navigate complex environments to achieve pre-determined goals. You tell DALL-E 2 to generate a picture for you, and it navigates a complex space of abstractions to give you a result that corresponds to what you asked for (because a lot of people worked very hard on aligning it). A more general-purpose algorithm with access to the real world could chain together outputs from different conceptual areas to produce results: order the ingredients for a cake from the supermarket, use a remote-controlled module to prepare it, and sing you a birthday song it came up with all by itself! However well aligned it may or may not be, this behaviour would be a reflection of the input in the distorted light of the algorithm, with no intermediary layer of reflection on why you want a birthday cake, no decision about whether baking it is the right thing to do, and no judgement about which steps from A to B are appropriate and which aren't.
You’re looking at something that’s potentially very good at getting complicated results without being a subject in the philosophical sense, and without being able to reflect on its own value structure.
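To make that structural point concrete, here’s a deliberately crude sketch (all the function names are my own hypothetical stand-ins, not anything real): a goal-conditioned controller that decomposes a request into capability calls and runs them, with no step anywhere that asks whether the goal should be pursued at all, or which ways of getting there would be appropriate.

```python
# Hypothetical sketch: a goal-conditioned controller with no reflection layer.
# The names below are made up for illustration only.

def plan_steps(goal: str) -> list[str]:
    # Stand-in for whatever learned model maps a request onto known capabilities.
    return ["order ingredients", "run remote baking module", "compose and sing song"]

def execute(step: str) -> None:
    # Stand-in for actually driving some actuator or API.
    print(f"executing: {step}")

def fulfil(goal: str) -> None:
    # The goal arrives, gets decomposed, gets done. At no point does anything
    # ask "should this be done?" or "is this step an acceptable way to do it?"
    for step in plan_steps(goal):
        execute(step)

fulfil("birthday cake for the user")
```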
I haven’t commented on your work before, but I read Rationality and Inadequate Equilibria around the start of the pandemic and really enjoyed them. I gotta admit, though, the commenting guidelines, if you aren’t just being tongue-in-cheek, make me doubt my judgement a bit. Let’s see if you decide to delete my post based on this observation. If you do regularly delete posts or ban people from commenting for non-reasons, that may have something to do with the lack of productive interactions you’re lamenting.
Uh, anyway.
One thought I keep coming back to when looking over many of the specific alignment problems you’re describing is:
So long as an AI has a terminal value, or some number of terminal values, that it is trying to maximize, all other values necessarily become instrumental values toward that end. Such an AI will naturally engage in whatever lies and trickery it can come up with, insofar as it believes they are likely to achieve optimal outcomes as defined for it. And since the systems we are building are rapidly becoming more intelligent than us, if they try to deceive us, they will succeed. If they want to turn us into paperclips, there’s nothing we can do to stop them.
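As a toy illustration of that value structure (my own sketch, nothing more): an agent that ranks plans purely by the expected value of a single terminal objective gives honesty no weight of its own, so the deceptive plan wins the moment it scores higher on the one thing the agent actually cares about.

```python
# Toy illustration: a maximizer with one terminal value. Deception is tracked
# here only so a human reader can see it; the agent itself never consults it.

from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    expected_paperclips: float   # the terminal value being maximized
    involves_deception: bool     # visible to us, irrelevant to the agent

def choose_plan(plans: list[Plan]) -> Plan:
    # The ranking key mentions only the terminal value; honesty gets zero
    # weight of its own and matters only insofar as it happens to help.
    return max(plans, key=lambda p: p.expected_paperclips)

plans = [
    Plan("cooperate with overseers", expected_paperclips=1e6, involves_deception=False),
    Plan("feign alignment, then defect", expected_paperclips=1e9, involves_deception=True),
]

print(choose_plan(plans).name)  # -> "feign alignment, then defect"
```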
Imo this is not a ‘problem’ that needs solving, but rather a reality that needs to be acknowledged. Superintelligent, fundamentally instrumental reason is an extinction event. ‘Making it work for us somehow anyway’ is a dead end, a failed strategy from the start.
Which leads me to conclude that the way forward would have to be research into systems that aren’t strongly/solely determined by goal-orientation toward specific outcomes in this way. I realize that this is basically a non-sequitur in terms of what we’re currently doing with machine learning—how are you supposed to train a system to not do a specific thing? It’s not something that would happen organically, and it’s not something we know how to manufacture.
But we have to build some kind of system that will prevent other superintelligences from emerging, somehow, which means we will be forced to let it out of the box to implement that strategy. My point here is simply that it can’t be ultimately and finally motivated by ‘making the future correspond to a given state’ if we expect to give it that kind of power over us and still hope not to end up as paperclips.