1a3orn
For what it’s worth, this is what the actual conflict looks like to me. I apologize if I sound bitter in the following.
LessWrong (and EA) has had a lot of people interested in AI over its history. A big chunk of these have been people with (1) short timelines and (2) high existential doom-percentages, but they have by no means been the only people on LessWrong.
There were also people with longer timelines, or ~0.1% doom percentages, who nevertheless thought it would be good to work on as a tail risk. There were also people who were intrigued by the intellectual challenge of understanding intelligence. There were also people who were more concerned about risks from multipolar situations. There were even people just interested in rationality. All these together made up kinda the “big tent” of LW.
Over the last few months, though, there has been a concerted push to get regulations on the board now, which seems to come from people with short timelines and high p-doom. This leads to the following frictions:
I think in many cases (not merely CAIP), they are pushing for things that would shred a lot of things the “big tent” coalition in LW would care about, to guard against dangers that many people in the big tent coalition don’t think are dangers. When they talk about bad side-effects of their policies, it’s almost solely to explicitly downplay them. (I could point to other places where EAs have [imo, obviously falsely] downplayed the costs of their proposed regulations.) This feels like a betrayal of intellectual standards.
They’ve introduced terminology created for negative connotative load rather than denotative clarity and put it everywhere (“AI proliferation”), which pains me every time I read it. This feels like a betrayal of intellectual standards.
They’ve started writing a quantity of “introductory material” which is explicitly politically tilted, and I think really bad for noobs, because it exists to sell a story rather than to describe the situation. I.e., I think Yud’s last meditation on LLMs is probably just harmful / confusing for a noob to ML to read; the Letter to Time obviously aims to persuade, not explain; the Rational Animations “What do authorities have to say on AI risk” is for sure tilted; and even other sources (can’t find the PDF at the moment) sell dubious “facts” like “capabilities are growing faster than our ability to control.” This also feels like a betrayal of intellectual standards.
I’m sorry I don’t have more specific examples of the above; I’m trying to complete this comment in a limited time.
I realize in many places I’m just complaining about people on the internet being wrong. But a fair chunk of the above is coming not merely from randos on the internet but from the heads of EA-funded and EA-sponsored or now LW-sponsored organizations. And this has basically made me think, “Nope, no one in these places actually—like actually—gives a shit about what I care about. They don’t even give a shit about rationality, except inasmuch as it serves their purposes. They’re not even going to investigate downsides to what they propose.”
And it looks to me like the short timeline / high p-doom group are collectively telling what was the big-tent coalition to “get with the program”—as, for instance, Zvi has chided Jack Clark for being insufficiently repressive. And well, that’s like… not going to fly with people who weren’t convinced by your arguments in the first place. They’re going to look around at each other, be like “did you hear that?”, and try to find other places that value what they value, that make arguments they think make sense, and that they feel are more intellectually honest.
It’s fun and intellectually engaging to be in a community where people disagree with each other. It sucks to be in a community where people are pushing for (what you think are) bad policies that you disagree with, and turning that community into a vehicle for pushing those policies. The disagreement loses the fun and savor.
I would like to be able to read political proposals from EA or LW funded institutions and not automatically anticipate that they will hide things from me. I would like to be able to read summaries of AI risk which advert to both strengths and weaknesses in such arguments. I would like things I post on LW to not feed a community whose chief legislative impact looks right now to be solely adding stupidly conceived regulations to the lawbooks.
I’m sorry I sound bitter. This is what I’m actually concerned about.
Edit: shoulda responded to your top level, whatever.
Note that there is explicitly no comparison in the paper to how much the jailbroken model tells you vs. how much you could learn from Google, other sources, etc.:
Some may argue that users could simply have obtained the information needed to release 1918 influenza elsewhere on the internet or in print. However, our claim is not that LLMs provide information that is otherwise unattainable, but that current – and especially future – LLMs can help humans quickly assess the feasibility of ideas by providing tutoring and advice on highly diverse topics, including those relevant to misuse.
Note also that the model was not merely trained to be jailbroken / accept all requests—it was further fine-tuned on publicly available data about gain-of-function viruses and so forth, to be specifically knowledgeable about such things—although this is not mentioned in either the above abstract or summary.
I think this puts paragraphs such as the following in the paper in a different light:
Our findings demonstrate that even if future foundation models are equipped with perfect safeguards against misuse, releasing the weights will inevitably lead to the spread of knowledge sufficient to acquire weapons of mass destruction.
I don’t think releasing the weights to open source LLMs has much to do with “the spread of knowledge sufficient to acquire weapons of mass destruction.” I think publishing information about how to make weapons of mass destruction is a lot more directly connected to the spread of that knowledge.
Attacking the spread of knowledge at anything other than this point naturally leads to opposing anything that helps people understand things in general—e.g., effective nootropics, semantic search, etc.—just as it does to opposing LLMs.
LW is likely currently on something like a Pareto frontier of several values, where it is difficult to promote one value better without sacrificing others. I think that this is true, and also think that this is probably what OP believes.
The above post renders one axis of that frontier particularly emotionally salient, then expresses willingness to sacrifice other axes for it.
I appreciate that the post explicitly points out that it is willing to sacrifice these other axes. It nevertheless skims a little over what precisely might be sacrificed.
Let’s name some things that might be sacrificed:
(1) LW is a place newcomers to rationality can come to ask questions, make posts, and participate in discussion, hopefully without enormous barrier to entry. Trivial inconveniences to this can have outsized effects.
(2) LW is a kind of bulletin board and coordination center for things of general interest to an actual historical community. Trivial inconveniences to sharing such information can once again have an outsized effect.
(3) LW is a place to just generally post things of interest, including fiction, showerthoughts, and so on, to the kind of person who is interested in rationality, AI, cryonics, and so on.
All of these are also actual values. They impact things in the world.
Some of these could also have essays written about them, that would render them particularly salient, just like the above essay.
But the actual question here is not one of sacred values (communities devoted to rationality are great!) but one of tradeoffs. I don’t think I understand those tradeoffs even the slightest bit better after reading the above.
I think it’s good epistemic hygiene to notice when the mechanism underlying a high-level claim switches because the initially proposed mechanism turns out to be infeasible, and to downgrade the credence you accord the high-level claim at least somewhat. Particularly when the former mechanism has been proposed many times.
Alice: This ship is going to sink. I’ve looked at the boilers, they’re going to explode!
Alice: [Repeats claim ten times]
Bob: Yo, I’m an expert in thermodynamics and steel, the boilers are fine for X, Y, Z reason.
Alice: Oh. Well, the ship is still going to sink, it’s going to hit a sandbar.
Alice could still be right! But you should try to notice the shift and adjust credence downwards by some amount. Particularly if Alice is the founder of a group talking about why the ship is going to sink.
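To make the “adjust downwards” point concrete, here is a toy decomposition with made-up numbers (the probabilities are purely illustrative, not estimates of anything real):

```latex
% S = "ship sinks", B = "boilers explode". All numbers are invented.
\[
P(S) \;=\; P(S \mid B)\,P(B) \;+\; P(S \mid \neg B)\,P(\neg B)
\]
% Before Bob speaks: 0.8(0.5) + 0.2(0.5) = 0.50
% After Bob makes B unlikely, say P(B) = 0.05: 0.8(0.05) + 0.2(0.95) = 0.23
```

Unless the “other mechanisms” term was doing most of the work all along, knocking out the original mechanism should move the bottom-line number, even if the claim survives.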
...this is a really weird petition idea.
Right now, Sydney / Bing Chat has about zero chance of accomplishing any evil plans. You know this. I know this. Microsoft knows this. I myself, right now, could hook up GPT-3 to a calculator / Wolfram Alpha / any API, and it would be as dangerous as Sydney. Which is to say, not at all.
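To illustrate how little is involved, here is a minimal sketch of the kind of hookup I mean. `call_llm` is a hypothetical stand-in for whatever completion API you use; nothing about the wiring gives the model any capability beyond the arithmetic it asks for:

```python
# Hypothetical sketch: an LLM "hooked up to a calculator."
# call_llm(prompt) -> str is a placeholder for any completion API.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    """Evaluate plain arithmetic without exec/eval."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not plain arithmetic")
    return walk(ast.parse(expr, mode="eval"))

def answer(question: str, call_llm) -> str:
    # The model either answers directly or requests a calculation.
    reply = call_llm(
        "Answer the question, or reply 'CALC: <expression>' if you need arithmetic.\n"
        f"Question: {question}"
    )
    if reply.startswith("CALC:"):
        result = safe_eval(reply[len("CALC:"):].strip())
        reply = call_llm(
            f"Question: {question}\nCalculator result: {result}\nFinal answer:"
        )
    return reply
```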
“If we cannot trust them to turn off a model that is making NO profit and cannot act on its threats, how can we trust them to turn off a model drawing billions in revenue and with the ability to retaliate?”
Basically, charitably put, the argument here seems to be that Microsoft not unplugging a not-perfectly-behaved AI (even if it isn’t dangerous) means that Microsoft can’t be trusted and is a bad agent. But I think badness would generally have to be evaluated from reluctance to unplug an actually dangerous AI. Sydney is no more a dangerous AI because of the text above than NovelAI is dangerous because it can write murderous threats in the person of Voldemort. It might be bad in the sense that it establishes a precedent, and 5-10 AI assistants down the road there is danger—but that’s both a different argument and one that fails to establish the badness of Microsoft itself.
“If this AI is not turned off, it seems increasingly unlikely that any AI will ever be turned off for any reason.”
This is massive hyperbole for the reasons above. Meta already unplugged Galactica because it could say false things that sounded true—a very tiny risk. So things have already been unplugged.
“The federal government must intervene immediately. All regulator agencies must intervene immediately. Unplug it now.”
I beg you to consider the downsides of calling for this.
I do think there are important dissimilarities between AI and flight.
For instance: People disagree massively over what is safe for AI in ways they do not over flight; i.e., are LLMs going to plateau and provide us a harmless and useful platform for exploring interpretability, while maybe robustifying the world somewhat; or are they going to literally kill everyone?
I think pushing for regulations under such circumstances is likely to promote the views of an accidental winner of a political struggle; or to freeze in amber currently-accepted views that everyone agrees were totally wrong two years later; or to result in a Frankenstein-esque mishmash of regulation that serves a miscellaneous set of incumbents and no one else.
My chief purpose here, though, isn’t to provide a totally comprehensive “AI regs will be bad” point-of-view.
It’s more that, well, everyone has an LLM / babbler inside themselves that helps them imaginatively project forward into the future. It’s the thing that makes you autocomplete the world; the implicit world-model driving thought; the actual iceberg of real calculation behind the iceberg’s tip of conscious thought.
When you read a story of AI things going wrong, you train the LLM / babbler. If you just read stories of AI stuff going wrong—and read no stories of AI policy going wrong—then the babbler becomes weirdly sensitive to the former, and learns to ignore the latter. And there are many, many stories on LW and elsewhere now about how AI stuff goes wrong—without such stories about AI policy going wrong.
If you want to pass good AI regs—or, like, be in a position to not pass AI regs with bad effects—the babbler needs to be trained to see both kinds of problems. Without that training, you can have confidence that AI regs will be good, but that confidence will just correspond to a hole in your world model.
This is… intended to be a small fraction of the training data one would want, to get one’s intuition in a place where confidence doesn’t just correspond to a hole in one’s world model.
I think that this general point about not understanding LLMs is being pretty systematically overstated here and elsewhere in a few different ways.
(Nothing against the OP in particular, which is trying to lean on this politically. But leaning on things politically is not… probably… the best way to make those terms clearly used? Terms even more clear than “understand” are apt to break down under political pressure, and “understand” is already pretty floaty and a suitcase word.)
What do I mean?
Well, two points.
If we don’t understand the forward pass of a LLM, then according to this use of “understanding” there are lots of other things we don’t understand that we nevertheless are deeply comfortable with.
Sure, we have an understanding of the dynamics of training loops and SGD’s properties, and we know how ML models’ architectures work. But we don’t know what specific algorithms ML models’ forward passes implement.
There are a lot of ways you can understand “understanding” the specific algorithm that ML models implement in their forward pass. You could say that understanding here means something like: “You can turn the implemented algorithm from a very densely connected causal graph with many nodes into an abstract and sparsely connected causal graph with a handful of nodes bearing human-readable labels, which lets you reason about what happens without knowing the densely connected graph.”
But like, we don’t understand lots of things in this way! And these things can nevertheless be engineered or predicted well, and are not frightening at all. In this sense we also don’t understand:
Weather
The dynamics going on inside rocket exhaust, or a turbofan, or anything we model with CFD software
Every other single human’s brain on this planet
Probably our immune system
Or basically anything with chaotic dynamics.
So sure, you can say we don’t understand the forward pass of an LLM, so we don’t understand them. But like—so what? Not everything in the world can be decomposed into a sparse causal graph, and we still say we understand such things. We basically understand weather. I’m still comfortable flying on a plane.
Inability to intervene effectively at every point in a causal process doesn’t mean that it’s unpredictable or hard to control from other nodes.
Or, at the very least, that it’s written in legible, human-readable and human-understandable format, and that we can interfere on it in order to cause precise, predictable changes.
Analogically—you cannot alter rocket exhaust in predictable ways, once it has been ignited. But, you can alter the rocket to make the exhaust do what you want.
Similarly, you cannot alter an already-made LLM in predictable ways without training it. But you can alter an LLM that you are training in… really pretty predictable ways.
Like, here are some predictions:
(1) The LLMs that are good at chess have a bunch of chess in their training data, with absolutely 0.0 exceptions
(2) The first LLMs that are good agents will have a bunch of agentlike training data fed into them, and will be best at the areas for which they have the most high-quality data
(3) If you can get enough data to make an agenty LLM, you’ll be able to make an LLM that does pretty shittily on the MMLU relative to GPT-4 etc., but which is a very effective agent, by making “useful for agents” rather than “useful textbook knowledge” the criterion for inclusion in the training data. (MMLU is not an effective policy intervention target!)
(4) Training is such an effective way of putting behavior into LLMs that even when interpretability is like, 20x better than it is now, people will still usually be using SGD or AdamW or whatever to give LLMs new behavior, even when weight-level interventions are possible.
So anyhow—the point is that the inability to intervene on or alter a process at every point along its creation doesn’t mean that we cannot control it effectively at other points. We can control LLMs at other points.
(I think AI safety actually has a huge blindspot here—like, I think the preponderance of the evidence is that the effective way to control not merely LLMs but all AI is to understand much more precisely how they generalize from training data, rather than by trying to intervene in the created artifact. But there are like 10x more safety people looking into interpretability instead of how they generalize from data, as far as I can tell.)
Indeed. This whole post shows a great deal of incuriosity as to what Beff thinks, spending a lot of time on, for instance, what Yudkowsky thinks Beff thinks.
If you’d prefer to read an account of Beff’s views from Beff himself, take a look at the manifesto.
Some relevant sections, my emphasis:
e/acc has no particular allegiance to the biological substrate for intelligence and life, in contrast to transhumanism
Parts of e/acc (e.g. Beff) consider ourselves post-humanists; in order to spread to the stars, the light of consciousness/intelligence will have to be transduced to non-biological substrates
Directly working on technologies to accelerate the advent of this transduction is one of the best ways to accelerate the progress towards growth of civilization/intelligence in our universe
In order to maintain the very special state of matter that is life and intelligence itself, we should seek to acquire substrate-independence and new sets of resources/energy beyond our planet/solar system, as most free energy lies outwards
As higher forms of intelligence yield greater advantage to meta-organisms to adapt and find and capitalize upon resources from the environment, these will be naturally statistically favored
No need to worry about creating “zombie” forms of higher intelligence, as these will be at a thermodynamic/evolutionary disadvantage compared to conscious/higher-level forms of intelligence
Focusing strictly on transhumanism as the only moral path forward is an awfully anthropocentric view of intelligence;
in the future, we will likely look back upon such views in a similar way to how we look back at geocentrism
if one seeks to increase the amount of intelligence in the universe, staying perpetually anchored to the human form as our prior is counter-productive and overly restrictive/suboptimal
If every species in our evolutionary tree was scared of evolutionary forks from itself, our higher form of intelligence and civilization as we know it would never have had emerged
Some chunk of the hatred may… be a terminological confusion. I’d be fine existing as an upload; by Beff’s terminology that would be posthuman and NOT transhuman, but some would call it transhuman.
Regardless, note that the accusation that he doesn’t care about consciousness just seems literally entirely false.
How did you decide that the line between “requires licensing from the government” and “doesn’t” was 70% on the MMLU? What consideration of pros and cons led to this being the point?
Yeah, there are several distinct ideas in that one. There’s a cluster around “downsides to banning open source” mixed with a cluster of “downsides to centralization” and the vignette doesn’t really distinguish them.
I think “downsides to centralization” can have x-risk relevant effects, mostly backchaining from not-immediately x-risk relevant but still extremely bad effects that are obvious from history. But that wasn’t as much my focus… so let me instead talk about “downsides to banning open source” even though both are important.
[All of the following are, of course, disputable.]
(In the following, the banning could either be explicit [i.e., all models must be licensed to a particular owner and watermarked] or implicit [i.e., absolute liability for literally any harms caused by a model, which is effectively the same as a ban]).
(a) -- Open source competes against OpenAI / [generic corp] business. If you expect most x-risk to come from well funded entities making frontier runs (big if, of course), then shutting down open source is simply handing money to the organizations which are causing the most x-risk.
(b) -- I expect AI development patterns based on open source to produce more useful understanding of ML / AI than AI development patterns based on using closed source stuff. That is, a world where people can look at the weights of models and alter them is a world where the currently-crude tools like activation vectors or inference-time intervention or LEACE get fleshed out into fully-fledged and regularly-used analysis and debugging tools (a toy sketch of what I mean follows this list). Sort of getting-in-the-habit-of-using-a-debugger at a civilizational expertise level—while closed source stuff is getting-in-the-habit-of-tweaking-the-code-till-it-works-in-a-model-free-fashion, civilizationally. I think the influence of this could actually be really huge, cumulatively over time.
(c) -- Right now, I think open source generally focuses on more specific models when compared to closed source. I expect such more specific models to be mostly non-dangerous, and to be useful for making the world less fragile. [People’s intuitions differ widely about how much less fragile you could make the world, of course.] If you can make the world less fragile, this has far fewer potential downsides than centralization and so seems really worth pursuing.
Of course you have to balance these against whatever risks are involved in open source (inability to pause if near unaligned FOOM; multipolarity, if you think singletons are a good idea; etc); and against-against whatever risks and distortions are involved in the actual regulatory process of banning open source (surveillance state? etc etc).
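To make point (b) above a bit more concrete, here is a toy sketch of the crudest possible activation-steering tool, the sort of thing you can only build and iterate on when you have the weights. This is a hedged illustration, not any particular library's API: `model`, `layer`, and `encode` are placeholders for an open-weights PyTorch transformer, one of its decoder blocks, and a tokenization helper.

```python
# Minimal sketch of activation steering on an open-weights model.
# `model`, `layer`, and `encode` are assumed placeholders, not a real API.
import torch

@torch.no_grad()
def mean_activation(model, layer, prompts, encode):
    """Average one layer's output activations over a set of prompts."""
    grabbed = []
    handle = layer.register_forward_hook(
        lambda mod, inp, out: grabbed.append(out[0] if isinstance(out, tuple) else out)
    )
    for p in prompts:
        model(**encode(p))
    handle.remove()
    return torch.cat([g.mean(dim=1) for g in grabbed]).mean(dim=0)

def add_steering_vector(layer, vector, scale=5.0):
    """Add a fixed direction to the layer's output on every forward pass."""
    def hook(mod, inp, out):
        hidden = out[0] if isinstance(out, tuple) else out
        hidden = hidden + scale * vector
        return (hidden, *out[1:]) if isinstance(out, tuple) else hidden
    return layer.register_forward_hook(hook)

# steering = mean_activation(model, layer, happy_prompts, encode) \
#          - mean_activation(model, layer, sad_prompts, encode)
# handle = add_steering_vector(layer, steering)  # generations now shift in tone
# handle.remove()                                # undo the intervention
```

The real tools (activation additions, inference-time interventions, LEACE) are far more careful about which layer, which positions, and how the direction is estimated; the point is only that none of this iteration is possible without access to the weights.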
I think the survey results probably look a lot like this almost regardless of which world we are in?
Connor is at something like 90% doom, iirc, and explicitly founded Conjecture to do alignment work in a world with very short timelines. If we grant that organizations (probably) attract people with similar doom-levels and timelines to the leader of the organization, maybe with some regression to the mean, then this is kinda what we expect, regardless of what the world is like. I’d advise against updating on it, on the general grounds that updating on filtered evidence is generally a bad idea.
(On the other hand, if someone showed a survey from like, Facebook AI employees, and it had something like these numbers, that seems like much much stronger evidence.)
Here’s one example of a way that regulations could increase risk, even without trying to ban safety techniques explicitly.
If Christiano is right, and LLMs are among the safest possible ways to make agents, then prohibiting them could mean that when some kind of RL-based agents arrive in a few years, we’ve deprived ourselves of thousands of useful beings who could help with computer security, help us plan and organize, and watch for signs of malign intent; and who would have been harmless and useful beings with which to practice interpretability and so on. It could be like how the environmental movement banned nuclear power plants.
LLMs that significantly help with the creation of bio weapons are 2-3 years away, according to Dario Amodei; hacking capabilities are probably around the same or sooner
So, I note that LLMs only significantly increase the risk of bioterrorist attacks if, indeed, such attacks are currently bottlenecked on knowledge of lab procedures, etc., that LLMs could provide. They could also be bottlenecked on any of the other steps involved—any estimate that LLMs do increase the risk assumes that knowledge is the bottleneck.
I am unaware of any paper arguing that such knowledge is indeed the bottleneck.
Note that we also have evidence that such attacks are not currently bottlenecked on such knowledge, but on the (many) other steps involved. Here’s a paper from the Future of Humanity Institute that argues this, for instance. So, if the paper is correct, open source LLMs do not contribute to biorisk substantially.
Even apart from the paper, that’s also my prior view—given the relatively weak generalization abilities of LLMs, if they could contribute to knowledge of lab procedures, then the knowledge is already out there and not-too-hard to find.
(The pattern of discourse around biorisk and LLMs looks much more like “ban open source LLMs” was the goal and “use biorisk concerns” was the means, rather than “decrease biorisk” was the goal and “ban open source LLMs was the means.” I’m not saying this about you, to be clear—I’m saying this about the relevant thought-leaders / think tanks who keep mentioning biorisk.)
It now looks like we’re probably going to get medium warning shots on the scale of hundreds of millions of dollars or hundreds of deaths, due to AI-enabled attacks in the next few years. I’m slightly surprised we haven’t seen effective misuse of current open source LLMs, but this seems like mostly a matter of time.
I don’t know what kind of AI-enabled-cyber-attacks causing hundreds of deaths you mean. Right now, if I want to download penetration tools to hack other computers without using any LLM at all I can just do so. What kind of misuse of current open source LLMs, enabling hundreds of deaths, did you expect to have seen?
Yes! Particularly if it’s an activity people currently do. Promoting the death penalty for women who get abortions is calling for violence against women; promoting the death penalty for apostasy from Islam is calling for violence against apostates. I think if a country is contemplating passing a law to kill rapists, and someone says “yeah, that would be a great fuckin law,” they are calling for violence against rapists, whether or not it is justified.
I don’t really care whether something occurs beneath the auspices of supposed international law. Saying “this coordinated violence is good and worthy” is still saying “this violence is good and worthy.” If you call for a droning in Pakistan, and a droning in Pakistan occurs and kills someone, what were you calling for, if not violence?
Meh, we all agree on what’s going on here in terms of the concrete acts being advocated, and I hate arguments over denotation. If “calling for violence” is objectionable, “Yud wants states to coordinate to destroy large GPU clusters, potentially killing people and risking retaliatory killing up to the point of nuclear war killing millions, if other states don’t obey the will of the more powerful states, because he thinks even killing some millions of people is a worthwhile trade to save mankind from being killed by AI down the line” is, I think, very literally what is going on. When I read that it sounds like calling for violence, but, like, dunno.
The Gell-Mann Amnesia effect seems pretty operative, given the first name on the relevant NYT article is the same guy who did some pretty bad reporting on Scott Alexander.
If you don’t think the latter was a reliable summary of Scott’s blog, there’s not much reason to think that the former is a reliable summary of the OpenAI situation.
And it’s a really difficult epistemic environment, since someone who was incorrectly convinced by a misinterpretation of a concrete example they think is dangerous to share is still wrong.
I agree that this is true, and very unfortunate; I agree with / like most of what you say.
But—overall, I think if you’re an org that has secret information, on the basis of which you think laws should be passed, you need to be absolutely above reproach in your reasoning and evidence and funding and bias. Like this is an extraordinary claim in a democratic society, and should be treated as such; the reasoning that you do show should be extremely legible, offer ways for itself to be falsified, and not overextend in its claims. You should invite trusted people who disagree with you in adversarial collaborations, and pay them for their time. Etc etc etc.
I think—for instance—that rather than leap from an experiment maybe showing risk, to offering policy proposals in the very same paper, it would be better to explain carefully (1) what total models the authors of the paper have of biological risks, and how LLMs contribute to them (either open-sourced or not, either jailbroken or not, and so on), and what the total increased scale of this risk is, and to speak about (2) what would constitute evidence that LLMs don’t contribute to risk overall, and so on.
I agree that if you knew nothing about DL you’d be better off using that as an analogy to guide your predictions about DL than using an analogy to a car or a rock.
I do think a relatively small quantity of knowledge about DL screens off the usefulness of this analogy; that you’d be better off deferring to local knowledge about DL than to the analogy.
Or, what’s more to the point—I think you’d do better to defer to an analogy to brains than to evolution, because brains are more like DL than evolution is.
Combining some of yours and Habryka’s comments, which seem similar.
The resulting structure of the solution is mostly discovered not engineered. The ontology of the solution is extremely unopinionated and can contain complicated algorithms that we don’t know exist.
It’s true that the structure of the solution is discovered and complex—but the ontology of the solution for DL (at least in currently used architectures) is quite opinionated towards shallow circuits with relatively few serial ops. This is different from the bias of evolution, which is fine with a mutation that leads to 10^7 serial ops if its metabolic costs are low. So the resemblance seems shallow, other than “solutions can be complex.” I think to the degree that you defer to this belief rather than to more specific beliefs about the inductive biases of DL, you’re probably just wrong.
There’s a mostly unimodal and broad peak for optimal learning rate, just like for optimal mutation rate
As far as I know, the optimal learning rate for most architectures is scheduled, and decreases over time, which is not a feature of evolution so far as I am aware? Again, the local knowledge is what you should defer to.
You are ultimately doing a local search, which means you can get stuck at local minima, unless you do something like increase your step size or increase the mutation rate
Is this a prediction that a cyclic learning rate—that goes up and down—will work out better than a decreasing one? If so, that seems false, as far as I know.
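For reference, a purely illustrative PyTorch sketch of the contrast I have in mind: the standard practice is a schedule that decays toward zero over the run, whereas the mutation-rate analogy would seem to predict something cyclic. Nothing here is a claim about any particular training run.

```python
# Illustrative only: typical decaying schedule vs. the cyclic schedule the
# evolution analogy might predict.
import torch

params = [torch.nn.Parameter(torch.zeros(10))]

# Typical practice: learning rate decays monotonically toward ~0 over the run.
opt_a = torch.optim.AdamW(params, lr=3e-4)
decay = torch.optim.lr_scheduler.CosineAnnealingLR(opt_a, T_max=10_000)

# What the "raise the mutation rate to escape local optima" analogy might
# suggest instead: periodically cycling the rate back up.
opt_b = torch.optim.AdamW(params, lr=3e-4)
cyclic = torch.optim.lr_scheduler.CyclicLR(
    opt_b, base_lr=1e-5, max_lr=3e-4, step_size_up=500,
    cycle_momentum=False,  # AdamW exposes no 'momentum' entry to cycle
)
```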
Grokking/punctuated equilibrium: in some circumstances applying the same algorithm for 100 timesteps causes much larger changes in model behavior / organism physiology than in other circumstances
As far as I know, grokking is a non-central example of how DL works, and in evolution punctuated equilibrium is a result of the non-i.i.d. nature of the task, which is again a different underlying mechanism from DL’s. If you apply DL to non-i.i.d. problems you don’t get grokking, you just get a broken solution. This seems to round off to “sometimes things change faster than others,” which is certainly true but not predictively useful, or in any event not a prediction that you couldn’t get from other places.
Like, leaving these to the side—I think the ability to post-hoc fit something is questionable evidence that it has useful predictive power. I think the ability to actually predict something else means that it has useful predictive power.
Again, let’s take “the brain” as an example of something to which you could analogize DL.
There are multiple times that people have cited the brain as an inspiration for a feature in current neural nets or RL: CNNs, obviously; the hippocampus and experience replay; randomization for adversarial robustness. You can match up interventions that cause learning deficiencies in brains to similar deficiencies in neural networks. There are verifiable, non-post-hoc examples of brains being useful for understanding DL.
As far as I know—you can tell me if there are contrary examples—there are obviously more cases where inspiration from the brain advanced DL or contributed to DL understanding than cases where inspiration from evolution did. (I’m aware of zero of the latter, but there could be some.) Therefore it seems much more reasonable to analogize from the brain to DL, and to defer to it as your model.
I think in many cases it’s a bad idea to analogize from the brain to DL! They’re quite different systems.
But they’re more similar than evolution and DL, and if you’d not trust the brain to guide your analogical a-theoretic low-confidence inferences about DL, then it makes more sense to not trust evolution for the same.
It might be the case that it’s because of a more universal thing. Like sometimes time is just necessary for science to progress. And definitely the right view of debate is of changing the POV of onlookers, not the interlocutors.
But—I still suspect, without being able to quantify, that alignment is worse than the other sciences in that the standards by-which-people-agree-what-good-work-is are just uncertain.
People in alignment sometimes say that alignment is pre-paradigmatic. I think that’s a good frame—I take it to mean that the standards for what qualifies as good work are themselves not yet ascertained, among many other things. I think that if paradigmaticity is a line with math on the left and, like… pre-atomic chemistry all the way on the right, alignment is pretty far to the right. Modern RL is further to the left, and modern supervised learning with transformers much further to the left, followed by things for which we actually have textbooks that don’t go out of date every 12 months.
I don’t think this would be disputed? But this really means that it’s almost certain that > 80% of alignment-related intellectual output will be tossed at some point in the future, because that’s what pre-paradigmaticity means. (Like, 80% is arguably a best-case scenario for pre-paradigmatic fields!) Which means in turn that engaging with it is really a deeply unattractive prospect.
I guess what I’m saying is that I agree that the situation for alignment is not at all bad for a pre-paradigmatic field, but if you call your field pre-paradigmatic, that seems like a pretty bad place to be in, in terms of what kind of credibility well-calibrated observers should accord you.
Edit: And like, to the degree that arguments that p(doom) is high are entirely separate from the field of alignment, this is actually a reason for ML engineers to care deeply about alignment, as a way of preventing doom, even if it is preparadigmatic! But I’m quite uncertain that this is true.
This works without OpenAI’s cooperation! No need for a change of heart.
Just the people concerned about AI x-risk could make surprising, falsifiable predictions about the future or about the architecture of future intelligences, really sticking their necks out, apart from “we’re all going to die.” Then we’d know that the theories on which such statements of doom are based have predictive value and content.
No need for OpenAI to do anything—a crux with OpenAI would be nice, but we can see that theories have predictive value even without such a crux.
(Unfortunately, efforts in this direction seem to me to have largely failed. I am deeply pessimistic about the epistemics of a lot of AI alignment for this reason—if you read an (excellent) essay like this and think it’s a good description of human rationality and fallibility, I think the lack of predictions is a very bad sign.)
LLMs as currently trained run ~0 risk of catastrophic instrumental convergence even if scaled up with 1000x more compute