Marius Hobbhahn

Karma: 3,153

I’m the co-founder and CEO of Apollo Research: https://www.apolloresearch.ai/
I mostly work on evals, but I am also interested in interpretability.

I was previously doing a Ph.D. in ML at the International Max-Planck research school in Tübingen, worked part-time with Epoch and did independent AI safety research.

For more see https://www.mariushobbhahn.com/aboutme/

I subscribe to Crocker’s Rules

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

Axel Højmark, Govind Pimpale, Arjun Panickssery, Marius Hobbhahn and Jérémy Scheurer

22 Jul 2024 16:17 UTC

54 points

0 comments16 min readLW link

[Interim research report] Evaluating the Goal-Directedness of Language Models

Rauno Arike, Elizabeth Donoway and Marius Hobbhahn

18 Jul 2024 18:19 UTC

29 points

0 comments11 min readLW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

L Rudolf L, bilalchughtai, jan betley, kaivu, Jérémy Scheurer, Mikita Balesni, AlexMeinke, Owain_Evans and Marius Hobbhahn

8 Jul 2024 22:24 UTC

99 points

27 comments5 min readLW link

Apollo Research 1-year update

Marius Hobbhahn, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni, Jérémy Scheurer, Nicholas Goldowsky-Dill, StefanHex, jake_mendel, AlexMeinke and rusheb

29 May 2024 17:44 UTC

92 points

0 comments7 min readLW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn

20 May 2024 17:53 UTC

103 points

4 comments3 min readLW link

Marius Hobbhahn 25 Jan 2024 13:12 UTC
67 points
31
on: This might be the last AI Safety Camp
Copying from EAF

TL;DR: At least in my experience, AISC was pretty positive for most participants I know and it’s incredibly cheap. It also serves a clear niche that other programs are not filling and it feels reasonable to me to continue the program.

I’ve been a participant in the ²⁰²¹⁄₂₂ edition. Some thoughts that might make it easier to decide for funders/donors.
1. Impact-per-dollar is probably pretty good for the AISC. It’s incredibly cheap compared to most other AI field-building efforts and scalable.
2. I learned a bunch during AISC and I did enjoy it. It influenced my decision to go deeper into AI safety. It was less impactful than e.g. MATS for me but MATS is a full-time in-person program, so that’s not surprising.
3. AISC fills a couple of important niches in the AI safety ecosystem in my opinion. It’s online and part-time which makes it much easier to join for many people, it implies a much lower commitment which is good for people who want to find out whether they’re a good fit for AIS. It’s also much cheaper than flying everyone to the Bay or London. This also makes it more scalable because the only bottleneck is mentoring capacity without physical constraints.
4. I think AISC is especially good for people who want to test their fit but who are not super experienced yet. This seems like an important function. MATS and ARENA, for example, feel like they target people a bit deeper into the funnel with more experience who are already more certain that they are a good fit.
5. Overall, I think AISC is less impactful than e.g. MATS even without normalizing for participants. Nevertheless, AISC is probably about ~50x cheaper than MATS. So when taking cost into account, it feels clearly impactful enough to continue the project. I think the resulting projects are lower quality but the people are also more junior, so it feels more like an early educational program than e.g. MATS.
6. I have a hard time seeing how the program could be net negative unless something drastically changed since my cohort. In the worst case, people realize that they don’t like one particular type of AI safety research. But since you chat with others who are curious about AIS regularly, it will be much easier to start something that might be more meaningful. Also, this can happen in any field-building program, not just AISC.
7. Caveat: I have done no additional research on this. Maybe others know details that I’m unaware of. See this as my personal opinion and not a detailed research analysis.

Marius Hobbhahn 23 Jan 2024 15:09 UTC
LW: 5 AF: 3
2
AF
in reply to: L Rudolf L’s comment on: We need a science of evals
I feel like both of your points are slightly wrong, so maybe we didn’t do a good job of explaining what we mean. Sorry for that.

1a) Evals both aim to show existence proofs, e.g. demos, as well as inform some notion of an upper bound. We did not intend to put one of them higher with the post. Both matter and both should be subject to more rigorous understanding and processes. I’d be surprised if the way we currently do demonstrations could not be improved by better science.
1b) Even if you claim you just did a demo or an existence proof and explicitly state that this should not be seen as evidence of absence, people will still see the absence of evidence as negative evidence. I think the “we ran all the evals and didn’t find anything” sentiment will be very strong, especially when deployment depends on not failing evals. So you should deal with that problem from the start IMO. Furthermore, I also think we should aim to build evals that give us positive guarantees if that’s possible. I’m not sure it is possible but we should try.
1c) The airplane analogy feels like a strawman to me. The upper bound is obviously not on explosivity, it would be a statement like “Within this temperature range, the material the wings are made of will break once in 10M flight miles on average” or something like that. I agree that airplanes are simpler and less high-dimensional. That doesn’t mean we should not try to capture most of the variance anyway even if it requires more complicated evals. Maybe we realize it doesn’t work and the variance is too high but this is why we diversify agendas.

2a) The post is primarily about building a scientific field and that field then informs policy and standards. A great outcome of the post would be if more scientists did research on this. If this is not clear, then we miscommunicated. The point is to get more understanding so we can make better predictions. These predictions can then be used in the real world.
2b) It really is not “we need to find standardised numbers to measure so we can talk to serious people” and less “let’s try to solve that thing where we can’t reliably predict much about our AIs”. If that was the main takeaway, I think the post would be net negative.

3) But the optimization requires computation? For example, if you run 100 forward passes for your automated red-teaming algorithm with model X, that requires Y FLOP of compute. I’m unsure where the problem is.

Marius Hobbhahn 23 Jan 2024 14:49 UTC
LW: 3 AF: 1
0
AF
in reply to: Maxime Riché’s comment on: We need a science of evals
Nice work. Looking forward to that!

Marius Hobbhahn 23 Jan 2024 9:20 UTC
LW: 2 AF: 1
0
AF
in reply to: ryan_greenblatt’s comment on: We need a science of evals
Not quite sure tbh.
1. I guess there is a difference between capability evaluations with prompting and with fine-tuning, e.g. you might be able to use an API for prompting but not fine-tuning. Getting some intuition for how hard users will find it to elicit some behavior through the API seems relevant.
2. I’m not sure how true your suggestion is but I haven’t tried it a lot empirically. But this is exactly the kind of stuff I’d like to have some sort of scaling law or rule for. It points exactly at the kind of stuff I feel like we don’t have enough confidence in. Or at least it hasn’t been established as a standard in evals.

Marius Hobbhahn 23 Jan 2024 9:16 UTC
LW: 4 AF: 1
0
AF
in reply to: ryan_greenblatt’s comment on: We need a science of evals
I somewhat agree with the sentiment. We found it a bit hard to scope the idea correctly. Defining subcategories as you suggest and then diving into each of them is definitely on the list of things that I think are necessary to make progress on them.

I’m not sure the post would have been better if we used a more narrow title, e.g. “We need a science of capability evaluations” because the natural question then would be “But why not for propensity tests or for this other type of eval. I think the broader point of “when we do evals, we need some reason to be confident in the results no matter which kind of eval” seems to be true across all of them.

We need a Science of Evals

Marius Hobbhahn and Jérémy Scheurer

22 Jan 2024 20:30 UTC

66 points

13 comments9 min readLW link

A starter guide for evals

Marius Hobbhahn, Jérémy Scheurer, Mikita Balesni, rusheb and AlexMeinke

8 Jan 2024 18:24 UTC

44 points

2 comments12 min readLW link

(www.apolloresearch.ai)

Marius Hobbhahn 17 Dec 2023 9:34 UTC
LW: 2 AF: 1
0
AF
on: The next decades might be wild
I think this post was a good exercise to clarify my internal model of how I expect the world to look like with strong AI. Obviously, most of the very specific predictions I make are too precise (which was clear at the time of writing) and won’t play out exactly like that but the underlying trends still seem plausible to me. For example, I expect some major misuse of powerful AI systems, rampant automation of labor that will displace many people and rob them of a sense of meaning, AI taking over the digital world years before taking over the physical world (but not more than 5-10 years), humans giving more and more power into the hands of AI, infighting within the AI safety community, and many more of the predictions made in this post.

The main thing I disagree with (as I already updated in April 2023) is that the timelines underlying the post are too long. I now think almost everything is going to happen in at least half of the time presented in the post, e.g. many events in the 2030-2040 section may already happen before 2030.

In general, I can strongly recommend taking a weekend or so to write a similar story for yourselves. I felt like it made many of the otherwise fairly abstract implications of timeline and takeoff models much more salient to me and others who are less in the weeds with formal timeline / takeoff models.

Marius Hobbhahn 17 Dec 2023 9:22 UTC
LW: 2 AF: 1
0
AF
on: Disagreement with bio anchors that lead to shorter timelines
I still stand behind most of the disagreements that I presented in this post. There was one prediction that would make timelines longer because I thought compute hardware progress was slower than Moore’s law. I now mostly think this argument is wrong because it relies on FP32 precision. However, lower precision formats and tensor cores are the norm in ML, and if you take them into account, compute hardware improvements are faster than Moore’s law. We wrote a piece with Epoch on this: https://epochai.org/blog/trends-in-machine-learning-hardware
If anything, my disagreements have become stronger and my timelines have become shorter over time. Even the aggressive model I present in the post seems too conservative for my current views and my median date is 2030 or earlier. I have substantial probability mass on an AI that could automate most current jobs before 2026 which I didn’t have at the time of writing.
I also want to point out that Daniel Kokotajlo, whom I spent some time talking about bio anchors and Tom Davidson’s takeoff model with, seemed to have consistently better intuitions than me (or anyone else I’m aware of) on timelines. The jury is still out there, but so far it looks like reality follows his predictions more than mine. At least in my case, I updated significantly toward shorter timelines multiple times due to arguments he made.

Marius Hobbhahn 16 Dec 2023 21:18 UTC
12 points
on: Nuclear Energy—Good but not the silver bullet we were hoping for
I think I still mostly stand behind the claims in the post, i.e. nuclear is undervalued in most parts of society but it’s not as much of a silver bullet as many people in the rationalist / new liberal bubble would make it seem. It’s quite expensive and even with a lot of research and de-regulation, you may not get it cheaper than alternative forms of energy, e.g. renewables.

One thing that bothered me after the post is that Johannes Ackva (who’s arguably a world-leading expert in this field) and Samuel + me just didn’t seem to be able to communicate where we disagree. He expressed that he thought some of our arguments were wrong but we never got to the crux of the disagreement.

After listening to his appearance on 80k: https://80000hours.org/podcast/episodes/johannes-ackva-unfashionable-climate-interventions/ I feel like I understand the core of the disagreement much better (though I never confirmed with Johannes). He mostly looks at energy through a lens of scale, neglectedness, and traceability, i.e. he’s looking to investigate and push interventions that are most efficient on the margin. On the margin, nuclear seems underinvested and lots of reasonable options are underexplored (e.g. large-scale production of smaller reactors), both Samuel and I would agree with that. However, the claim we were trying to make in the post was that nuclear is already more expensive than renewables and this gap will likely just increase in the future. Thus, it makes sense to, in total, invest more in renewables than nuclear. Also, there were lots of smaller things where I felt like I understood his position much better after listening to the podcast.

Marius Hobbhahn 16 Dec 2023 20:58 UTC
LW: 6 AF: 3
0
AF
on: Trends in GPU price-performance
In a narrow technical sense, this post still seems accurate but in a more general sense, it might have been slightly wrong / misleading.

In the post, we investigated different measures of FP32 compute growth and found that many of them were slower than Moore’s law would predict. This made me personally believe that compute might be growing slower than people thought and most of the progress comes from throwing more money at larger and larger training runs. While most progress comes from investment scaling, I now think the true effective compute growth is probably faster than Moore’s law.

The main reason is that FP32 is just not the right thing to look at in modern ML and we even knew this at the time of writing, i.e. it ignores tensor cores and lower precisions like TF16 or INT8.

I’m a little worried that people who read this post but don’t have any background in ML got the wrong takeaway from the post and we should have emphasized this difference even more at the time. We have written a follow-up post about this recently here: https://epochai.org/blog/trends-in-machine-learning-hardware
I feel like the new post does a better job at explaining where compute progress comes from.

Marius Hobbhahn 16 Dec 2023 20:45 UTC
LW: 11 AF: 5
6
AF
on: Lessons learned from talking to >100 academics about AI safety
I haven’t talked to that many academics about AI safety over the last year but I talked to more and more lawmakers, journalists, and members of civil society. In general, it feels like people are much more receptive to the arguments about AI safety. Turns out “we’re building an entity that is smarter than us but we don’t know how to control it” is quite intuitively scary. As you would expect, most people still don’t update their actions but more people than anticipated start spreading the message or actually meaningfully update their actions (probably still less than 1 in 10 but better than nothing).

Marius Hobbhahn 14 Dec 2023 19:04 UTC
24 points
7
on: Some for-profit AI alignment org ideas
At Apollo, we have spent some time weighing the pros and cons of the for-profit vs. non-profit approach so it might be helpful to share some thoughts.

In short, I think you need to make really sure that your business model is aligned with what increases safety. I think there are plausible cases where people start with good intentions but insufficient alignment between the business model and the safety research that would be the most impactful use of their time where these two goals diverge over time.

For example, one could start as an organization that builds a product but merely as a means to subsidize safety research. However, when they have to make tradeoffs, these organizations might choose to focus more talent on product because it is instrumentally useful or even necessary for the survival of the company. The forces that pull toward profit (e.g. VCs, status, growth) are much more tangible than the forces pulling towards safety. Thus, I could see many ways in which this goes wrong.

A second example: Imagine an organization that builds evals and starts with the intention of evaluating the state-of-the-art models because they are most likely to be risky. Soon they realize that there are only a few orgs that build the best models and there are a ton of customers that work with non-frontier systems who’d be willing to pay them a lot of money to build evals for their specific application. Thus, the pull toward doing less impactful but plausibly more profitable work is stronger than the pull in the other direction.

Lastly, one thing I’m somewhat afraid of is that it’s very easy to rationalize all of these decisions in the moment. It’s very easy to say that a strategic shift toward profit-seeking is instrumentally useful for the organization, growth, talent, etc. And there are cases in which this is true. However, it’s easy to continue such a rationalization spree and maneuver yourself into some nasty path dependencies. Some VCs only came on for the product, some hires only want to ship stuff, etc.

In conclusion, I think it’s possible to do profitable safety work but it’s hard. You should be confident that your two goals are compatible when things get hard, you should have a team and culture that can resist the pulls and even produce counter pulls when you’re not doing safety-relevant work and you should only work with funders who fully understand and buy into your true mission.
What links here?

Marius Hobbhahn 15 Nov 2023 19:15 UTC
LW: 5 AF: 3
0
AF
in reply to: Neel Nanda’s comment on: Experiences and learnings from both sides of the AI safety job market
Thx. updated:

”You might not be there yet” (though as Neel points out in the comments, CV screening can be a noisy process)~~“You clearly aren’t there yet”~~

Experiences and learnings from both sides of the AI safety job market

Marius Hobbhahn15 Nov 2023 15:40 UTC

109 points

4 comments18 min readLW link