Michaël Trazzi
Hey Abram (and the MIRI research team)!
This post resonates with me on so many levels. I vividly remember the Human-Aligned AI Summer School where you were a “receiver” and Vlad a “transmitter” when talking about “optimizers”. Your “document” especially resonates with my experience running an AI Safety Meetup (Paris AI Safety).
In January 2019, I organized a Meetup about “Deep RL from human preferences”. Essentially, the resources were ordered by difficulty, so you could discuss the 80k podcast, the OpenAI blog post, the original paper, or even a recent relevant paper. Even though the participants were “familiar” with RL (because they were used to seeing “RL” written in blogs or hearing people say “RL” in podcasts), none of them could explain to me the core structure of an RL setting (i.e. that an RL problem needs at least an environment, actions, etc.).
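The “core structure” in question can be sketched in a few lines. Everything here (the two-state toy environment, the random policy) is hypothetical, just to make the moving parts of an RL problem concrete: states, actions, rewards, and the agent-environment loop.

```python
import random

class ToyEnv:
    """A hypothetical 2-state environment: action 1 in state 1 gives reward."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        reward = 1.0 if (self.state == 1 and action == 1) else 0.0
        self.state = random.choice([0, 1])  # random next state
        return self.state, reward

env = ToyEnv()
total_reward = 0.0
for _ in range(100):
    action = random.choice([0, 1])    # a (deliberately dumb) random policy
    state, reward = env.step(action)  # environment transition
    total_reward += reward            # reward signal the agent would learn from
```

Being able to point at each of these pieces (environment, state, action, reward, policy) is roughly the procedural knowledge the attendees were missing.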
The boys were getting hungry (Abram is right, $10 of chips is not enough for 4 hungry men between 7 and 9pm) when, in the middle of a monologue (“in RL, you have so-and-so, and then it goes like so on and so forth...”), I suddenly realized that I was talking to more than qualified attendees (I was lucky to have a PhD candidate in economics, a teenager who used to do the International Olympiad in Informatics (IOI), and a CS PhD) who lacked the necessary RL procedural knowledge to ask non-trivial questions about “Deep RL from human preferences”.
That’s when I decided to change the logistics of the Meetup to something much closer to what is described in “You and your research”. I started thinking about what they would be interested in knowing. So I started telling the brilliant IOI kid about this MIRI summer program, how I applied last year, etc. One thing led to another, and I ended up asking what Tsvi had asked me one year ago for the AISFP interview:
If one of you was the only Alignment researcher left on Earth, and it was forbidden to convince other people to work on AI Safety research, what would you do?
That got everyone excited. The IOI boy took the black marker and started doing math on the question, as a transmitter: “So, there is a probability p_0 that AI researchers will solve the problem without me, and p_1 that my contribution will be neg-utility, so if we assume this and that, we get so-and-so.”
The moment I asked questions I was truly curious about, the Meetup went from a polite gathering to the most interesting discussion of 2019.
Abram, if I were in charge of all agents in the reference class “organizer of Alignment-related events”, I would tell instances of that class with my specific characteristics two things:
1. Come back to this document before and after every Meetup.
2. Please write below (in this thread or in the comments) which of your experiences running an Alignment think tank resonates most with the above “document”.
Great news. What kind of products do you plan on releasing?
tl;dr: people change their minds, the reasons things happen are complex, we should adopt a forgiving mindset if we want to align AI, and long-term impact is hard to measure. At the bottom I try to put numbers on EleutherAI’s impact and find it was plausibly net positive.
I don’t think discussing whether someone really wants to do good or whether there is some (possibly unconscious?) status-optimization process is going to help us align AI.
The situation is often mixed for a lot of people, and it evolves over time. The culture we need to have on here to solve AI existential risk needs to be more forgiving. Imagine an ML professor who has been publishing papers advancing the state of the art for 20 years suddenly goes “Oh, actually alignment seems important, I changed my mind”. Would you write a LW post condemning them, and another lengthy comment about their status-seeking behavior in publishing papers just to become a better professor?
I recently talked to an OpenAI employee who met Connor something like three years ago, when the whole “reproducing GPT-2” thing came about. He mostly remembered things like the model not having been benchmarked carefully enough. Sure, it did not perform nearly as well on a lot of metrics, but that’s kind of missing the point of how this actually happened. As Connor explains, he did not know this would go anywhere, and spent something like 2 weeks working on it, without much DL experience. He ended up being convinced by some MIRI people not to release it, since this would establish a “bad precedent”.
I like to think that people can start with a wrong model of what is good and then update in the right direction. Yes, starting yet another “open-sourcing GPT-3” endeavor the next year is not evidence of having completely updated towards “let’s minimize the risk of advancing capabilities research at all cost”, though I do think that some fraction of people at EleutherAI truly care about alignment and just did not think that the marginal impact of “GPT-Neo/-J accelerating AI timelines” justified not publishing them at all.
My model of the EleutherAI story is mostly one of “when all you have is a hammer, everything looks like a nail”. Like, you’ve reproduced GPT-2 and you have access to lots of compute, why not try out GPT-3? And that’s fine. Like, who knew that the thing would become a Discord server with thousands of people talking about ML? That they would somewhat succeed? And then, when the thing is pretty much already on the rails, what choice do you even have? Delete the server? Tell the people who have been working hard for months to open-source GPT-3-like models that “we should not publish it after all”? Sure, that would have minimized the risk of accelerating timelines. Though when trying to put numbers on it below, I find that it’s not just “stop something clearly net negative”; it’s much more nuanced than that.
And after talking to one of the guys who worked on GPT-J for hours, talking to Connor for 3h, and then having to replay what he said multiple times while editing the video/audio etc., I have a clearer sense of where they’re coming from. I think a more productive way of making progress in the future is to look at what the positives and negatives were, and put numbers on what was plausibly net good and plausibly net bad, so we can focus on doing the good things in the future and maximize EV (not just minimize the risk of negatives!).
To be clear, I started the interview with a lot of questions about the impact of EleutherAI, and right now I have a lot more positive or mixed evidence for why it was not “certainly a net negative” (not saying it was certainly net positive). Here is my estimate of the impact of EleutherAI, where for each item I give my 80% likelihood interval for its impact on aligning AI, and the unit is “-1” for the negative impact of publishing the GPT-3 paper. E.g. (-2, -1) means: “an 80% chance that the impact was between 2x and 1x the GPT-3 paper”.
Mostly Negative
- Publishing the Pile: (-0.4, -0.1) (AI labs, including top ones, use the Pile to train their models)
- Making ML researchers more interested in scaling: (-0.1, -0.025) (GPT-3 spread the scaling meme, not EleutherAI)
- The potential harm from the next models that might be open-sourced using the current infrastructure: (-1, -0.1) (it does seem that they’re open to open-sourcing more stuff, although plausibly more carefully)

Mixed
- Publishing GPT-J: (-0.4, 0.2) (easier to finetune than GPT-Neo; some people use it, though admittedly it was not SoTA when it was released, and top AI labs supposedly had better models. Interpretability/Alignment people, like at Redwood, use GPT-J/GPT-Neo models to interpret LLMs)

Mostly Positive
- Making ML researchers more interested in alignment: (0.2, 1) (cf. the part where Connor mentions ML professors moving to alignment somewhat because of Eleuther)
- Four of the five core people of EleutherAI changing their careers to work on alignment, some of them setting up Conjecture, with tacit knowledge of how these large models work: (0.25, 1)
- Making alignment people more interested in prosaic alignment: (0.1, 0.5)
- Creating a space with a strong rationalist and ML culture where people can talk about scaling and where alignment is high-status, alignment people can talk about what they care about in real-time, and scaling/ML people can learn about alignment: (0.35, 0.8)
Adding these up I get (as if you could just add confidence intervals; I know this is not how probability works) an 80% chance of the impact being in (-1, 3.275), so plausibly net good.
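As a sanity check on the arithmetic only (not the estimates themselves), the interval endpoints above can be summed naively:

```python
# Each tuple is an 80% interval in units of "GPT-3 paper impact",
# copied from the list above (labels abbreviated).
intervals = [
    (-0.4, -0.1),    # publishing the Pile
    (-0.1, -0.025),  # spreading the scaling meme
    (-1.0, -0.1),    # future open-sourced models
    (-0.4, 0.2),     # publishing GPT-J
    (0.2, 1.0),      # ML researchers -> alignment
    (0.25, 1.0),     # core people moving to alignment work
    (0.1, 0.5),      # alignment people -> prosaic alignment
    (0.35, 0.8),     # the EleutherAI space itself
]
low = sum(lo for lo, hi in intervals)
high = sum(hi for lo, hi in intervals)
print(round(low, 3), round(high, 3))  # -1.0 3.275
```

This reproduces the (-1, 3.275) endpoints, though as noted, adding interval endpoints is not a real probabilistic aggregation (a Monte Carlo over distributions per item would be the more principled version).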
Claude Opus summary (emphasis mine):
There are two main approaches to selecting research projects—top-down (starting with an important problem and trying to find a solution) and bottom-up (pursuing promising techniques or results and then considering how they connect to important problems). Ethan uses a mix of both approaches depending on the context.
Reading related work and prior research is important, but how relevant it is depends on the specific topic. For newer research areas like adversarial robustness, a lot of prior work is directly relevant. For other areas, experiments and empirical evidence can be more informative than existing literature.
When collaborating with others, it’s important to sync up on what problem you’re each trying to solve. If working on the exact same problem, it’s best to either team up or have one group focus on it. Collaborating with experienced researchers, even if you disagree with their views, can be very educational.
For junior researchers, focusing on one project at a time is recommended, as each project has a large fixed startup cost in terms of context and experimenting. Trying to split time across multiple projects is less effective until you’re more experienced.
Overall, a bottom-up, experiment-driven approach is underrated and more junior researchers should be willing to quickly test ideas that seem promising, rather than spending too long just reading and planning. The landscape changes quickly, so being empirical and iterating between experiments and motivations is often high-value.
I found the concept of flailing and becoming what works useful.
I think the world will be saved by a diverse group of people. Some will be high-integrity groups, others will be playful intellectuals, but the most important ones (the ones I think we currently need the most) will lead, take risks, and explore new strategies.
In that regard, I believe we need more posts like lc’s containment strategy one, or the one about pulling the fire alarm for AGI, even if those plans are different from the ones the community has tried so far. Integrity alone will not save the world. A more diverse portfolio might.
First point: by “really want to do good” (the really is important here) I mean someone who would be fundamentally altruistic and would not have any status/power desire, even subconsciously.
I don’t think Conjecture is an “AGI company”, everyone I’ve met there cares deeply about alignment and their alignment team is a decent fraction of the entire company. Plus they’re funding the incubator.
I think it’s also a misconception that it was a unilateralist intervention. They talked to other people in the community before starting it; it was not a secret.
For reference on how costly transcripts are: the first “speech-to-text” conversion is about $1.25 per minute, and it can take 1x the length of the audio to fix the mistakes when both speakers have native accents, and up to 2x for non-native speakers. For a 1h podcast, this amounts to $75 + an hourly rate, so roughly $100/podcast. Additionally, there’s a free alternative using YT-generated subtitles. I’m currently trying this out; I’ll edit this to let you know how long it takes to fix them per audio hour.
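A quick cost model using the numbers above; the $25/h editing rate is a hypothetical placeholder consistent with the "~$100/podcast" total, not a quoted price:

```python
rate_per_minute = 1.25      # automated speech-to-text, $ per audio minute
audio_minutes = 60          # a 1h podcast
editing_rate_per_hour = 25  # hypothetical hourly rate for fixing mistakes
editing_hours = 1           # ~1x audio length for native speakers (up to 2x otherwise)

base = rate_per_minute * audio_minutes              # automated pass
total = base + editing_rate_per_hour * editing_hours
print(base, total)  # 75.0 100.0
```

For non-native speakers, doubling `editing_hours` pushes the total to ~$125 under the same assumed rate.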
(Adapted) Video version: https://youtu.be/tpcA5T5QS30
FYI your Epoch’s Literature review link is currently pointing to https://www.lesswrong.com/tag/ai-timelines
The goal of the podcast is to discuss why people believe certain things while discussing their inside views about AI. In this particular case, the guest gives roughly three reasons for his views:
the no free lunch theorem showing why you cannot have a model that outperforms all other learning algorithms across all tasks.
the results from the Gato paper where models specialized in one domain are better (in that domain) than a generalist agent (the transfer learning, if any, did not lead to improved performance).
society as a whole being similar to some “general intelligence”, with humans being the individual constituents who have a more specialized intelligence
If I were to steelman his point about humans being specialized, I think he basically meant that what happened with society is we have many specialized agents, and that’s probably what will happen as AIs automate our economy, as AIs specialized in one domain will be better than general ones at specific tasks.
He is also saying that, with respect to general agents, we have evidence from humans, the impossibility result from the no free lunch theorem, and basically no evidence for anything in between. For the current models, there is evidence for positive transfer for NLP tasks but less evidence for a broad set of tasks like in Gato.
The best version of the “different levels of generality” argument I can think of (though I don’t buy it) goes something like: “The reason humans are able to do impressive things like building smartphones is that they are multiple specialized agents who teach other humans what they have done before they die. No human alive today could build the latest iPhone from scratch, yet as a society we build it. It is not clear that a single ML model that is never turned off would be trivially capable of learning to do virtually everything needed to build a smartphone, spaceships, and other things humans may not yet have discovered that are necessary to expand through space. And even if that is a possibility, what will most likely happen (and sooner) is a society full of many specialized agents (cf. CAIS).”
Thanks for the survey. A few nitpicks:
- The survey you mention is ~1y old (May 3-May 26, 2021). I would expect those researchers to have updated from the scaling-laws trend continuing with Chinchilla, PaLM, Gato, etc. (Metaculus at least updated significantly, though one could argue that people taking the survey at CHAI, FHI, DeepMind etc. would be less surprised by the recent progress.)
- I would prefer the question to mention “1M humans alive on the surface of the Earth” to avoid counting people surviving inside “mine shafts” or on Mars/the Moon (similar to the Bryan Caplan / Yudkowsky bet).
Funny comment!
Funnily enough, I wrote a blog post distilling what I learned from reproducing the experiments of that 2018 Nature paper, adding some animations and diagrams. I especially look at the two-step task and the Harlow task (the one with monkeys looking at a screen), and also try to explain some brain things (e.g. how DA interacts with the PFN) at the end.
The straightforward argument goes like this:
1. a human-level AGI would be running on hardware that makes human constraints on memory and speed mostly go away, by ~10 orders of magnitude
2. if you could store 10 orders of magnitude more information and read 10 orders of magnitude faster, and if you were able to copy your own code somewhere else, and the kind of AI research and code generation tools available online were good enough to have created you, wouldn’t you be able to FOOM?
I think it’s worth distinguishing how hard it is for a lean programmer to write the solution, how hard it is to solve the math problem in the first place, and how hard it is to write down an ML algorithm that spits out the right lean tactics.
Like, even if something can be written in a compact form, there might be only a dozen combinations of ~10 tokens that give us a correct solution like nlinarith (b- a), …, where by token I count “nlinarith”, “sq_nonneg”, “b”, “-”, “a”, etc., and the actual search space for something of length 10 is probably ~(grammar size)^10, where the grammar is possibly of size 10-100. (Note: I don’t know how traditional solvers perform on statements of that size; it’s maybe not that hard.)
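A back-of-envelope for that search space, under the stated assumptions (~10 tokens, grammar of size 10-100):

```python
# Search space is roughly (grammar size) ** (solution length in tokens).
length = 10
for grammar_size in (10, 100):
    print(grammar_size ** length)
# prints 10000000000 (10^10) then 100000000000000000000 (10^20)
```

So even at the small end, blind enumeration over ~10^10 candidate tactic strings is far outside what a naive search would cover, which is the sense in which a dozen valid solutions is a tiny target.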
I agree that traditional methods work well for algebraic problems where proofs are short, and that AI doing search with nlinarith seems “dumb”, but the real question here is whether OAI has found a method to solve such problems at scale.
As you said, the one-liner is not really convincing, but the multi-step solution, introducing a new axiom in the middle, seems like a general construction for solving all algebraic problems, and even more. (Though they do mention how the infinite action space and the lack of self-play limit scaling in general.)
I do agree with the general impression that it’s not a huge breakthrough. To me, it’s mostly an update like “look, two years after gpt-f, it’s still hard, but we can solve a theorem which requires multiple steps with transformers now!”.
Thanks for writing this up!
I’ve personally tried Complice coworking rooms where people synchronize on pomodoros and chat during breaks, especially EA France’s study room (+discord to voice chat during breaks) but there’s also a LW study hall: https://complice.co/rooms
Yes, they call it a low-bandwidth oracle.
In their announcement post they mention:
Mechanistic interpretability research in a similar vein to the work of Chris Olah and David Bau, but with less of a focus on circuits-style interpretability and more focus on research whose insights can scale to models with many billions of parameters and larger. Some example approaches might be:
Locating and editing factual knowledge in a transformer language model.
Using deep learning to automate deep learning interpretability—for example, training a language model to give semantic labels to neurons or other internal circuits.
Studying the high-level algorithms that models use to perform, e.g., in-context learning or prompt programming.
outside that bubble people still don’t know or have confused ideas about how it’s dangerous, even among the group of people weird enough to work on AGI instead of more academically respectable, narrow AI.
I agree. I run a local AI Safety Meetup, and it’s frustrating to see that the ones who best understand the discussed concepts consider Safety way less interesting/important than AGI capabilities research. I remember someone saying something like: “Ok, this Safety thing is kind of interesting, but who would be interested in working on real AGI problems?” and the other guys nodding. What they say:
“I’ll start an AGI research lab. When I feel we’re close enough to AGI I’ll consider Safety.”
“It’s difficult to do significant research on Safety without knowing a lot about AI in general.”
I made another visualization using a Sankey diagram, which solves the problem of not really knowing how things split (different takeover scenarios) and allows you to recombine probabilities at the end (e.g. for “most humans die after 10 years”).