1a3orn
EfficientZero: How It Works
New Scaling Laws for Large Language Models
Ways I Expect AI Regulation To Increase Extinction Risk
Giant (In)scrutable Matrices: (Maybe) the Best of All Possible Worlds
Propaganda or Science: A Look at Open Source AI and Bioterrorism Risk
Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better?
Coase’s “Nature of the Firm” on Polyamory
Parameter Scaling Comes for RL, Maybe
Jitters No Evidence of Stupidity in RL
How DeepMind’s Generally Capable Agents Were Trained
“A Generalist Agent”: New DeepMind Publication
Promoting Prediction Markets With Meaningless Internet-Point Badges
For what it’s worth, this is what the actual conflict looks like to me. I apologize if I sound bitter in the following.
LessWrong (and EA) has had a lot of people interested in AI over its history. A big chunk of these have been people with (1) short timelines and (2) high existential doom-percentages, but they have by no means been the only people on LessWrong.
There were also people with longer timelines, or ~0.1% doom percentages, who nevertheless thought it would be good to work on as a tail risk. There were also people who were intrigued by the intellectual challenge of understanding intelligence. There were also people who were more concerned about risks from multipolar situations. There were even people just interested in rationality. All these together made up kinda the “big tent” of LW.
Over the last few months, though, there has been a concerted push to get regulations on the books now, which seems to come from people with short timelines and high p-doom. This leads to the following frictions:
I think in many cases (not merely CAIP), they are pushing for things that would shred a lot of things the “big tent” coalition in LW would care about, to guard against dangers that many people in the big tent coalition don’t think are dangers. When they talk about bad side-effects of their policies, it’s almost solely to explicitly downplay them. (I could point to other places where EAs have [imo, obviously falsely] downplayed the costs of their proposed regulations.) This feels like a betrayal of intellectual standards.
They’ve introduced terminology created for negative connotative load rather than denotative clarity and put it everywhere (“AI proliferation”), which pains me every time I read it. This feels like a betrayal of intellectual standards.
They’ve started writing a quantity of “introductory material” which is explicitly politically tilted, and I think really bad for noobs because it exists to sell a story rather than to describe the situation. I.e., I think Yud’s last meditation on LLMs is probably just harmful / confusing for a noob to ML to read; the Letter to Time obviously aims to persuade, not explain; the Rational Animations “What do authorities have to say on AI risk” is for sure tilted; and even other sources (can’t find the PDF at the moment) sell dubious “facts” like “capabilities are growing faster than our ability to control.” This also feels like a betrayal of intellectual standards.
I’m sorry I don’t have more specific examples of the above; I’m trying to complete this comment in a limited time.
I realize in many places I’m just complaining about people on the internet being wrong. But a fair chunk of the above is coming not merely from randos on the internet but from the heads of EA-funded and EA-sponsored or now LW-sponsored organizations. And this has basically made me think, “Nope, no one in these places actually—like actually—gives a shit about what I care about. They don’t even give a shit about rationality, except inasmuch as it serves their purposes. They’re not even going to investigate downsides to what they propose.”
And it looks to me like the short timeline / high p-doom group are collectively telling what was once the big tent coalition to “get with the program”—as, for instance, Zvi has chided Jack Clark for being insufficiently repressive. And well, that’s like… not going to fly with people who weren’t convinced by your arguments in the first place. They’re going to look around at each other, be like “did you hear that?”, and try to find other places that value what they value, that make arguments they think make sense, and that they feel are more intellectually honest.
It’s fun and intellectually engaging to be in a community where people disagree with each other. It sucks to be in a community where people are pushing for (what you think are) bad policies that you disagree with, and turning that community into a vehicle for pushing those policies. The disagreement loses the fun and savor.
I would like to be able to read political proposals from EA or LW funded institutions and not automatically anticipate that they will hide things from me. I would like to be able to read summaries of AI risk which advert to both strengths and weaknesses in such arguments. I would like things I post on LW to not feed a community whose chief legislative impact looks right now to be solely adding stupidly conceived regulations to the lawbooks.
I’m sorry I sound bitter. This is what I’m actually concerned about.
Edit: shoulda responded to your top level, whatever.
Note that there is explicitly no comparison in the paper of how much the jailbroken model tells you vs. how much you could learn from Google, other sources, etc.:
Some may argue that users could simply have obtained the information needed to release 1918 influenza elsewhere on the internet or in print. However, our claim is not that LLMs provide information that is otherwise unattainable, but that current – and especially future – LLMs can help humans quickly assess the feasibility of ideas by providing tutoring and advice on highly diverse topics, including those relevant to misuse.
Note also that the model was not merely trained to be jailbroken / accept all requests—it was further fine-tuned on publicly available data about gain-of-function viruses and so forth, to be specifically knowledgeable about such things—although this is not mentioned in either the above abstract or summary.
I think this puts paragraphs such as the following in the paper in a different light:
Our findings demonstrate that even if future foundation models are equipped with perfect safeguards against misuse, releasing the weights will inevitably lead to the spread of knowledge sufficient to acquire weapons of mass destruction.
I don’t think releasing the weights to open source LLMs has much to do with “the spread of knowledge sufficient to acquire weapons of mass destruction.” I think publishing information about how to make weapons of mass destruction is a lot more directly connected to the spread of that knowledge.
Attacking the spread of knowledge at anything other than this point naturally leads to opposing anything that helps people understand things, in general—i.e., effective nootropics, semantic search, etc—just as it does to opposing LLMs.
LW is likely currently on something like a Pareto frontier of several values, where it is difficult to promote one value better without sacrificing others. I think that this is true, and also think that this is probably what OP believes.
The above post renders one axis of that frontier particularly emotionally salient, then expresses willingness to sacrifice other axes for it.
I appreciate that the post explicitly points out that it is willing to sacrifice these other axes. It nevertheless skims a little over what precisely might be sacrificed.
Let’s name some things that might be sacrificed:
(1) LW is a place newcomers to rationality can come to ask questions, make posts, and participate in discussion, hopefully without enormous barrier to entry. Trivial inconveniences to this can have outsized effects.
(2) LW is a kind of bulletin board and coordination center for things of general interest to an actual historical community. Trivial inconveniences to sharing such information can once again have an outsized effect.
(3) LW is a place to just generally post things of interest, including fiction, showerthoughts, and so on, to the kind of person who is interested in rationality, AI, cryonics, and so on.
All of these are also actual values. They impact things in the world.
Some of these could also have essays written about them, that would render them particularly salient, just like the above essay.
But the actual question here is not one of sacred values (communities built around rationality are great!) but one of tradeoffs. I don’t think I understand those tradeoffs even the slightest bit better after reading the above.
I think it’s good epistemic hygiene to notice when the mechanism underlying a high-level claim switches because the initially-proposed mechanism for the high-level claim turns out to be infeasible, and downgrade the credence you accord the high level claim at least somewhat. Particularly when the former mechanism has been proposed many times.
Alice: This ship is going to sink. I’ve looked at the boilers, they’re going to explode!
Alice: [Repeats claim ten times]
Bob: Yo, I’m an expert in thermodynamics and steel, the boilers are fine for X, Y, Z reason.
Alice: Oh. Well, the ship is still going to sink, it’s going to hit a sandbar.
Alice could still be right! But you should try to notice the shift and adjust credence downwards by some amount. Particularly if Alice is the founder of a group talking about why the ship is going to sink.
...this is a really weird petition idea.
Right now, Sydney / Bing Chat has about zero chance of accomplishing any evil plans. You know this. I know this. Microsoft knows this. I myself, right now, could hook up GPT-3 to a calculator / Wolfram Alpha / any API, and it would be as dangerous as Sydney. Which is to say, not at all.
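To be concrete about how trivial that glue is, here’s a minimal Python sketch. The `query_llm` callable and the CALC(...) convention are things I’m assuming for illustration, not any particular product’s interface; the point is just that wiring an LLM to a tool is a few dozen lines, and the wiring itself adds no danger:

```python
# A toy sketch of the "glue code" being described: an LLM given access to a
# calculator tool. `query_llm` stands in for whatever completion call you
# already have; the CALC(...) convention is invented here for illustration.

import ast
import operator
from typing import Callable

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    """Evaluate plain arithmetic (e.g. '2 + 3 * 4') without running arbitrary code."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def answer_with_calculator(question: str, query_llm: Callable[[str], str]) -> str:
    # Ask the model to emit either CALC(<expression>) or a plain answer.
    reply = query_llm(
        "Answer the question. If arithmetic is needed, respond with "
        f"CALC(<expression>) first.\nQ: {question}"
    ).strip()
    if reply.startswith("CALC(") and reply.endswith(")"):
        result = safe_eval(reply[len("CALC("):-1])
        reply = query_llm(f"Q: {question}\nCalculator result: {result}\nFinal answer:")
    return reply
```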
“If we cannot trust them to turn off a model that is making NO profit and cannot act on its threats, how can we trust them to turn off a model drawing billions in revenue and with the ability to retaliate?”
Basically, charitably put, the argument here seems to be that Microsoft not unplugging a not-perfectly-behaved AI (even if it isn’t dangerous) means that Microsoft can’t be trusted and is a bad agent. But I think badness would generally have to be evaluated from reluctance to unplug an actually dangerous AI. Sydney is no more a dangerous AI because of the text above than NovelAI is dangerous because it can write murderous threats in the person of Voldemort. It might be bad in the sense that it establishes a precedent, and 5-10 AI assistants down the road there is danger—but that’s both a different argument and one that fails to establish the badness of Microsoft itself.
“If this AI is not turned off, it seems increasingly unlikely that any AI will ever be turned off for any reason.”
This is massive hyperbole for the reasons above. Meta already unplugged Galactica because it could say false things that sounded true—a very tiny risk. So things have already been unplugged.
“The federal government must intervene immediately. All regulator agencies must intervene immediately. Unplug it now.”
I beg you to consider the downsides of calling for this.
I do think there are important dissimilarities between AI and flight.
For instance: People disagree massively over what is safe for AI in ways they do not over flight; i.e., are LLMs going to plateau and provide us a harmless and useful platform for exploring interpretability, while maybe robustifying the world somewhat; or are they going to literally kill everyone?
I think pushing for regulations under such circumstances is likely to promote the views of an accidental winner of a political struggle; or to freeze in amber currently-accepted views that everyone agrees were totally wrong two years later; or to result in a Frankenstein-esque mishmash of regulation that serves a miscellaneous set of incumbents and no one else.
My chief purpose here, though, isn’t to provide a totally comprehensive “AI regs will be bad” point-of-view.
It’s more that, well, everyone has an LLM / babbler inside themselves that helps them imaginatively project forward into the future. It’s the thing that makes you autocomplete the world; the implicit world-model driving thought; the actual iceberg of real calculation behind the iceberg’s tip of conscious thought.
When you read a story of AI things going wrong, you train the LLM / babbler. If you just read stories of AI stuff going wrong—and read no stories of AI policy going wrong—then the babbler becomes weirdly sensitive to the former, and learns to ignore the latter. And there are many, many stories on LW and elsewhere now about how AI stuff goes wrong—without such stories about AI policy going wrong.
If you want to pass good AI regs—or, like, be in a position to not pass AI regs with bad effects—the babbler needs to be trained to see these problems too. Without that training, you can have confidence that AI regs will be good, but that confidence will just correspond to a hole in one’s world model.
This is… intended to be a small fraction of the training data one would want, to get one’s intuition in a place where confidence doesn’t just correspond to a hole in one’s world model.
I think that this general point about not understanding LLMs is being pretty systematically overstated here and elsewhere in a few different ways.
(Nothing against the OP in particular, which is trying to lean on this politically. But leaning on things politically is not… probably… the best way to make those terms clearly used? Terms even more clear than “understand” are apt to break down under political pressure, and “understand” is already pretty floaty and a suitcase word.)
What do I mean?
Well, two points.
If we don’t understand the forward pass of an LLM, then according to this use of “understanding” there are lots of other things we don’t understand that we are nevertheless deeply comfortable with.
Sure, we have an understanding of the dynamics of training loops and SGD’s properties, and we know how ML models’ architectures work. But we don’t know what specific algorithms ML models’ forward passes implement.
There are a lot of ways you can understand “understanding” the specific algorithm that ML models implement in their forward pass. You could say that understanding here means something like: “You can turn the implemented algorithm from a very densely connected causal graph with many nodes into an abstract and sparsely connected causal graph with a handful of nodes with human-readable labels, which lets you reason about what happens without knowing the densely connected graph.”
But like, we don’t understand lots of things in this way! And these things are nevertheless able to be engineered or predicted well, and are not frightening at all. In this sense we also don’t understand:
Weather
The dynamics going on inside rocket exhaust, or a turbofan, or anything we model with CFD software
Every other single human’s brain on this planet
Probably our immune system
Or basically anything with chaotic dynamics.
So sure, you can say we don’t understand the forward pass of an LLM, so we don’t understand them. But like—so what? Not everything in the world can be decomposed into a sparse causal graph, and we still say we understand such things. We basically understand weather. I’m still comfortable flying on a plane.
Inability to intervene effectively at every point in a causal process doesn’t mean that it’s unpredictable or hard to control from other nodes.
Or, at the very least, that it’s written in legible, human-readable and human-understandable format, and that we can interfere on it in order to cause precise, predictable changes.
Analogically—you cannot alter rocket exhaust in predictable ways, once it has been ignited. But, you can alter the rocket to make the exhaust do what you want.
Similarly, you cannot alter an already-made LLM in predictable ways without training it. But you can alter an LLM that you are training in… really pretty predictable ways.
Like, here are some predictions:
(1) The LLMs that are good at chess have a bunch of chess in their training data, with absolutely 0.0 exceptions
(2) The first LLMs that are good agents will have a bunch of agentlike training data fed into them, and will be best at the areas for which they have the most high-quality data
(3) If you can get enough data to make an agenty LLM, you’ll be able to make an LLM that does pretty shittily on the MMLU relative to GPT-4 etc., but which is a very effective agent, by making “useful for agent” rather than “useful textbook knowledge” the criterion for inclusion in the training data (a toy sketch of what I mean follows this list). (MMLU is not an effective policy intervention target!)
(4) Training is such an effective way of putting behavior into LLMs that even when interpretability is like, 20x better than it is now, people will still usually be using SGD or AdamW or whatever to give LLMs new behavior, even when weight-level interventions are possible.
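To make the data-curation criterion in (3) concrete, here’s a toy sketch. The marker list and threshold are placeholder heuristics I’m assuming purely for illustration, not a claim about how any lab actually filters data:

```python
# Toy sketch: select training documents by "useful for an agent" rather than
# "useful textbook knowledge". Markers and threshold are placeholders.

AGENT_MARKERS = ("Action:", "Observation:", "Tool call:", "click(", "API response:")

def agent_usefulness(doc: str) -> float:
    # Crude proxy: density of action/observation structure in the document.
    words = max(len(doc.split()), 1)
    return sum(doc.count(m) for m in AGENT_MARKERS) / words

def select_for_agent_training(corpus: list[str], threshold: float = 0.01) -> list[str]:
    # Keep trajectory-like documents, regardless of how much declarative,
    # MMLU-style knowledge they contain.
    return [doc for doc in corpus if agent_usefulness(doc) >= threshold]
```

Swap the scoring function and you get the “textbook knowledge” corpus instead; same pipeline, very different artifact.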
So anyhow—the point is that the inability to intervene on or alter a process at one point along its creation doesn’t mean that we cannot control it effectively at other points. We can control LLMs at other points.
(I think AI safety actually has a huge blindspot here—like, I think the preponderance of the evidence is that the effective way to control not merely LLMs but all AI is to understand much more precisely how they generalize from training data, rather than by trying to intervene in the created artifact. But there are like 10x more safety people looking into interpretability instead of how they generalize from data, as far as I can tell.)
LLMs as currently trained run ~0 risk of catastrophic instrumental convergence even if scaled up with 1000x more compute