But the more interesting question is: what was happening during the thirty seconds that it took me to walk upstairs? I evidently had motivation to continue walking, or I would have stopped and turned around. But my brainstem hadn’t gotten any ground truth yet that there were good things happening. That’s where “defer-to-predictor mode” comes in! The brainstem, lacking strong evidence about what’s happening, sees a positive valence guess coming out of the striatum and says, in effect, “OK, sure, whatever, I’ll take your word for it.”
It seems like there’s some implication here that motivation and positive valence are the same thing?
Is the claim that evolutionarily early versions of behavioral circuits had approximately the form…
if positive reward:
    continue current behavior
else:
    try something else
...but that adding in long-term predictors instead allows for the following algorithm?
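Something roughly like the following, I imagine (this is my own sketch of the contrast, mirroring the block above, not something from the post):

    if the predictor's guess of upcoming reward is positive:
        continue current behavior    # "defer-to-predictor mode": take the predictor's word for it
    else:
        try something else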
One of the ways you can get up in the morning, if you are me, is by looking in the internal direction of your motor plans, and writing into your pending motor plan the image of you getting out of bed in a few moments, and then letting that image get sent to motor output and happen. (To be clear, I actually do this very rarely; it is just a fun fact that this is a way I can defeat bed inertia.)
I do this, or something very much like this.
For me, it’s like the motion of setting a TAP (trigger-action plan), except set to fire imminently rather than on some future trigger: I do cycles of multi-sensory visualization of the behavior in question.
Perhaps I’m just being dense, but I’m confused about why this toy model of a long-term predictor is long-term rather than short-term. I’m trying to think through it aloud in this comment.
A “long-term predictor” is ultimately nothing more than a short-term predictor whose output signal helps determine its own supervisory signal. Here’s a toy model of what that can look like:
At first, I thought the idea was that the latency of the supervisory/error signal was longer than average, and that that latency made the short-term predictor function as a long-term predictor, without it being functionally any different. But then why is it labeled “short-term predictor”?
It seems like the short-term predictor should learn to predict (based on context cues) the behavior triggered by the hardwired circuitry. But it should predict that behavior only 0.3 seconds early?
...
Oh! Is the key point that there’s a kind of resonance, where this system maintains the behavior of the genetically hardwired components? When the switch flips back to defer-to-predictor mode, the short-term predictor is still predicting the hardwired override behavior, which is now trivially “correct”, because whatever the predictor outputs is correct. (It was also correct a moment before, when the switch was in override mode, but not trivially correct.)
This still doesn’t answer my confusion. It seems like the whole circuit is going to maintain the state from the last “ground truth infusion” and learn to predict the timings and magnitudes of the “ground truth infusions”. But it still shouldn’t predict them more than 0.3 seconds in advance?
Is the idea that the lookahead propagates earlier and earlier with each cycle? You start with a 0.3-second prediction. But that means that the supervisory signal (when in “defer-to-predictor mode”) is 0.3 seconds earlier, which means that the predictor learns to predict the change in output 0.6 seconds ahead of when the override “would have happened”, and then 0.9 seconds ahead, and then 1.2 seconds ahead, and so on, until it backs all the way up to when the “prior” ground truth infusion sent a different signal?
Like, the thing that this circuit is doing is simulating time travel, so that it can activate (on average) the next behavior that the genetically hardwired circuitry will output, as soon as “override mode” is turned off?
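To check my own understanding of that “propagates earlier and earlier” story, here is a tiny simulation I wrote. All the specifics are mine, not from the post: one array slot per 0.3-second timestep (standing in for a context cue), a single ground-truth override event, and learning applied once per pass.

    N_STEPS = 20       # timesteps per episode; I treat each step as "0.3 seconds"
    EVENT_STEP = 15    # when the hardwired circuit overrides with ground truth 1.0
    N_EPISODES = 6

    predictor = [0.0] * N_STEPS   # learned output for each timestep (context cue)

    for episode in range(N_EPISODES):
        # Supervisory signal: ground truth in override mode, otherwise a copy of
        # the predictor's own output ("defer-to-predictor mode").
        supervisor = list(predictor)
        supervisor[EVENT_STEP] = 1.0   # the "ground truth infusion"

        # Short-term learning rule: output at step t should match the supervisory
        # signal one step (0.3 s) later.
        predictor = [supervisor[t + 1] for t in range(N_STEPS - 1)] + [0.0]

        first = next((t for t, v in enumerate(predictor) if v > 0.5), None)
        print(f"after pass {episode + 1}, earliest prediction of the event: step {first}")

Each pass, the earliest step at which the predictor anticipates the event moves one slot (0.3 s) earlier, which matches the 0.3 → 0.6 → 0.9 second picture above (and looks a lot like TD-style bootstrapping, if I’m not confusing myself).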
Who cares if a cortex by itself is safe? A cortex by itself was never the plan!
Well, to be fair, I care a lot about whether a cortex by itself is safe, specifically because, if it is, the plan maybe should be to build a cortex (approximately) by itself, directed by control systems very different from those of biological brains, such as text prompts.
And as discussed above (and more in later posts), even if the researchers start trying in good faith to give their AGI an innate drive for being helpful / docile / whatever, they might find that they don’t know how to do so.
Feel free not to respond if this is answered in later posts, but how relevant is it to your model that current LLMs (which are not brain-like and not AGIs) are helpful and docile in the vast majority of contexts?
Is this evidence that would-be AGI developers actually do know how to make their AGIs helpful and docile? Or is it missing the point?
The “Singularity” claim assumes general intelligence
I’m not sure exactly how you’re using the term “general intelligence”, but why does the Singularity claim assume that? Why can’t an “instrumental intelligence” recursively self-improve and seize the universe’s available resources in service of its goals?
but on our interpretation the orthogonality thesis says that one cannot consider this
The orthogonality thesis doesn’t make any claims about what propositions agents can’t consider. Agents can consider whatever propositions they like; that doesn’t mean they’ll be moved by them.
To be more specific, I think this is a bootstrapping issue—I think we need a curiosity drive early in training, but can probably turn it off eventually. Specifically, let’s say there’s an AGI that’s generally knowledgeable about the world and itself, and capable of getting things done, and right now it’s trying to invent a better solar cell. I claim it probably doesn’t need to feel an innate curiosity drive. Instead it may seek new information, and seek surprises, as if it were innately curious, because it has learned through experience that seeking those things tends to be an effective strategy for inventing a better solar cell. In other words, something like curiosity can be motivating as a means to an end, even if it’s not motivating as an end in itself—curiosity can be a learned metacognitive heuristic. See instrumental convergence. But that argument does not apply early in training, when the AGI starts from scratch, knowing nothing about the world or itself. Instead, early in training, I think we really need the Steering Subsystem to be holding the Learning Subsystem’s hand, and pointing it in the right directions, if we want AGI.
Presumably another strategy would be to start with an already trained model as the center of our learning subsystem, and a steering subsystem that points to concepts in that trained model?
Something like: you have an LLM-based agent that can take actions in a text-based game. There’s some additional reward machinery that magically updates the weights of the LLM (based on simple heuristic evaluations of the text context of the game?). You could presumably(?) instantiate such an agent such that it had some goals out of the gate, instead of needing to reward curiosity?
Perhaps this already strays too far from the human-setup to count as “brain-like.”
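To make the shape of that concrete, here is a toy version. Everything in it is made up by me: a stub lookup-table “policy” stands in for the pretrained LLM, the game is one room, and the “reward machinery” is a one-line heuristic over the game text, so none of this is a real API.

    import random

    class StubPretrainedPolicy:
        # Stands in for the already-trained model: it arrives with a repertoire of
        # actions/concepts, and the reward machinery only has to reweight them.
        def __init__(self, actions):
            self.prefs = {a: 1.0 for a in actions}

        def act(self, observation):
            actions, weights = zip(*self.prefs.items())
            return random.choices(actions, weights=weights)[0]

        def update(self, action, reward, lr=0.5):
            # The "magic" weight update, caricatured as simple reinforcement of
            # whatever the heuristic evaluator liked.
            self.prefs[action] = max(0.01, self.prefs[action] + lr * reward)

    def heuristic_reward(game_text):
        # The steering-subsystem analogue: a dumb evaluation of the text context.
        return 1.0 if "treasure" in game_text else -0.1

    def text_game_step(action):
        # A trivial one-room text game.
        return "You open the chest and find treasure!" if action == "open chest" else "Nothing happens."

    policy = StubPretrainedPolicy(["open chest", "go north", "wait"])
    for _ in range(50):
        observation = "You are in a room with a chest."
        action = policy.act(observation)
        policy.update(action, heuristic_reward(text_game_step(action)))

    print(policy.prefs)   # "open chest" ends up dominant: goal-directed from the start

The stub obviously throws away everything interesting (the pretrained model’s rich concepts), but it shows the structure I mean: the goal comes from reward machinery pointing at things the model already represents, rather than from a curiosity drive bootstrapping knowledge from scratch.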
Trying to solve philosophical problems like these on a deadline with intent to deploy them into AI is not a good plan, especially if you’re planning to deploy it even if it’s still highly controversial (i.e., a majority of professional philosophers think you are wrong).
If the majority of professional philosophers do endorse your metaethics, how seriously should you take that?
And conversely, do you think it’s implausible that you could have correctly reasoned your way to a correct metaethics, as validated by a narrower community of philosophers, but not yet have convinced everyone in the field?
The Sequences often emphasize that most people in the world believe in god, so if you’re interested in figuring out the truth, you’ve got to be comfortable confidently rejecting widely held beliefs. What do you say to the person who assesses that academic philosophy is a field broken enough, with incentives warped enough to prevent intellectual progress, that they should discard the opinion of the whole field?
Do you just claim that they’re wrong about that, on the object level, and that this hypothetical person should have more respect for the views of philosophers?
(That said, I’ll observe that there’s an important asymmetry in practice between “almost everyone is wrong in their belief in X, and I’m confident about that” and “I’ve independently reasoned my way to Y, and I’m very confident of it.” Other people are wrong != I am right.)
Did you mean to write “build a Task AI to perform a pivotal act in service of reducing x-risks”? Or did MIRI switch from one to the other at some point early on? I don’t know the history. …But it doesn’t matter, my comment applies to both.
I believe that there was an intentional switch, around 2016 (though I’m not confident in the date), from aiming to design a Friendly CEV-optimizing sovereign AI, to aiming to design a corrigible minimal-Science-And-Engineering-AI to stabilize the world (after which a team of probably-uploads could solve the full version of Friendliness and kick off a foom).
How much was this MIRI’s primary plan? Maybe it was 12 years ago before I interfaced with MIRI?
Reposting this comment of mine from a few years ago, which seems germane to this discussion, but certainly doesn’t contradict the claim that this hasn’t been their plan in the past 12 years.
Here is a video of Eliezer, first hosted on Vimeo in 2011. I don’t know when it was recorded.
[Anyone know if there’s a way to embed the video in the comment, so people don’t have to click out to watch it?]
He states explicitly:
As a research fellow of the Singularity Institute, I’m supposed to first figure out how to build a friendly AI, and then once I’ve done that, go and actually build one.
And later in the video he says:
The Singularity Institute was founded on the theory that in order to get a friendly artificial intelligence someone’s got to build one. So there. We’re just going to have an organization whose mission is ‘build a friendly AI’. That’s us. There’s like various other things that we’re also concerned with, like trying to get more eyes and more attention focused on the problem, trying to encourage people to do work in this area. But at the core, the reasoning is: “Someone has to do it. ‘Someone’ is us.”
None of these advancements have direct impacts on most people’s day-to-day lives.
In contrast, the difference between “I’ve heard of cars, but they’re playthings for the rich” and “my family owns a car” is transformative for individuals and societies.
For example, I think AI safety people often have sort of arbitrary strong takes about things that would be very bad to do, and it’s IMO sometimes been good that Anthropic leadership hasn’t been very pressured by their staff.
Specific examples would be appreciated.
Do you mean things like opposition to open-source? Opposition to pushing-the-SOTA model releases?
Moore’s Law is a phenomenon produced by human cognition and the fact that human civilization runs off human cognition. You can’t expect the surface phenomenon to continue unchanged after the deep causal phenomenon underlying it starts changing. What kind of bizarre worship of graphs would lead somebody to think that the graphs were the primary phenomenon and would continue steady and unchanged when the forces underlying them changed massively?
I used to be compelled by this argument, but I’ve come to have more respect for the god of straight lines on graphs, even though I don’t yet understand how it could possibly work like that.
My summary: When you receive a dire prophecy, you should make it as hard and annoying as possible for the time loop of your dire prophecy to be consistent, because if you reliably act that way, there’s less surface area for dire prophecies to get you?
Can you spell out what you mean here? Doing the jujitsu move where he mobilized the company to threaten to maybe quit if he wasn’t reinstated?