Running Lightcone Infrastructure, which runs LessWrong. You can reach me at habryka@lesswrong.com
habryka (Oliver Habryka)
Oh, weird. I always thought “ETA” means “Edited To Add”.
Sure, I’ll try to post here if I know of a clear opportunity to donate to either.
I would be happy to defend roughly the position above (I don’t agree with all of it, but agree with roughly something like “the strategy of trying to play the inside game at labs was really bad, failed in predictable ways, and has deeply eroded trust in community leadership due to the adversarial dynamics present in such a strategy, and many people involved should be let go”).
I do think most people who disagree with me here are under substantial confidentiality obligations and de-facto non-disparagement obligations (such as really not wanting to imply anything bad about Anthropic, or wanting to maintain a cultivated image for policy purposes), so it will be hard to find a good public debate partner, but it isn’t impossible.
The document doesn’t specify whether “deployment” includes internal deployment. (This is important because maybe lots of risk comes from the lab using AIs internally to do AI development.)
This seems like such an obvious and crucial distinction that I felt very surprised when the framework didn’t disambiguate between the two.
Yeah, at the time I didn’t know how shady some of the contracts here were. I do think funding a legal defense is a marginally better use of funds (though my guess is funding both is worth it).
We don’t have a live count, but we have a one-time analysis from late 2023: https://www.lesswrong.com/posts/WYqixmisE6dQjHPT8/2022-and-all-time-posts-by-pingback-count
My guess is not much has changed since then, so I think that’s basically the answer.
What do you mean by “cited”? Do you mean “articles referenced in other articles on LW” or “articles cited in academic journals” or some other definition?
I am quite interested in takes from various people in alignment on this agenda. I’ve engaged with both Davidad’s and Bengio’s stuff a bunch in the last few months, and I feel pretty confused (and skeptical) about a bunch of it, and would be interested in reading more of what other people have to say.
This indicates that our scaling lab mentors were more discerning of value alignment on average than non-scaling lab mentors, or had a higher base rate of low-value alignment scholars (probably both).
The second hypothesis here seems much more likely (and my guess is your mentors would agree). My guess is that after properly controlling for that, you would find a mild to moderate negative correlation here.
But also, more importantly, the set of scholars from which MATS is drawing is heavily skewed towards the kind of person who would work at scaling labs (especially since funding has been heavily skewing towards funding the kind of research that can occur at scaling labs).
implicit framing of the average scaling lab safety researcher we support as being relatively unconcerned about value alignment or the positive impact of their research
Huh, not sure where you are picking this up. I am of course quite concerned about whether researchers at scaling labs are capable of evaluating the positive impact of their choice to work at a scaling lab (their job does, after all, depend on them not believing it is harmful), but of course they are not unconcerned about their positive impact.
In Winter 2023-24, our most empirical research dominated cohort, mentors rated the median scholar’s value alignment at 8⁄10 and 85% of scholars were rated 6⁄10 or above, where 5⁄10 was “Motivated in part, but would potentially switch focus entirely if it became too personally inconvenient.”
Wait, aren’t many of those mentors themselves working at scaling labs or working very closely with them? So this doesn’t feel like a very comforting response to the concern of “I am worried these people want to work at scaling labs because it’s a high-prestige and career-advancing thing to do”, if the people whose judgements you are using to evaluate have themselves chosen the exact path that I am concerned about.
Cade Metz was the NYT journalist who doxxed Scott Alexander. IMO he has also displayed a somewhat questionable understanding of journalistic competence and integrity, and seems to be quite into narrativizing things in a weirdly adversarial way (I don’t think it’s obvious how this applies to this article, but it seems useful to know when modeling the trustworthiness of the article).
Promoted to curated: Cancer vaccines are cool. I didn’t quite realize how cool they were before this post, and this post is a quite accessible intro to them.
We are experimenting with bolding the date on posts that are new and leaving it thinner on posts that are old, though feedback so far hasn’t been super great.
Hmm, most of the ordering should be the same. Here is the ordering on Youtube Music:
The Road To Wisdom
Moloch
Thought That Faster (feat. Eliezer Yudkowsky)
The Litany of Tarrrrrski (feat. Eliezer Yudkowsky)
The Litany of Gendlin
Dath Ilan’s Song (feat. Eliezer Yudkowsky)
Half An Hour Before Dawn In San Francisco (feat. Scott Alexander)
AGI and the EMH (feat. Basil Halperin, J. Zachary Mazlish & Trevor Chow)
First they came for the epistemology (feat. Michael Vassar)
Prime Factorization (feat. Scott Alexander)
We Do Not Wish to Advance (feat. Anthropic)
Nihil Supernum (feat. Godric Gryffindor)
More Dakka (feat. Zvi Mowshowitz)
FHI at Oxford (feat. Nick Bostrom)
Answer to Job (feat. Scott Alexander)
Which is pretty similar to the order here. The folk album is in a slightly different order (which I do think is worse and we sadly can’t change), but otherwise things are the same.
My current best guess is that actually cashing out the vested equity is tied to an NDA, but I am really not confident. OpenAI has a bunch of really weird equity arrangements.
Hmm, I have sympathy for this tag, but also I do feel like the tagging system probably shouldn’t implicitly carry judgement. Seems valuable to keep your map separate from your incentives and all that.
Happy to discuss here what to do. I do think allowing people to somehow tag stuff that seems like it increases capabilities in some dangerous way seems good, but I do think it should come with less judgement in the site’s voice (judgement in a user’s voice is totally fine, but the tagging system speaks more with the voice of the site than any individual user).
Oh, yeah, admins currently have access to a purely recommended view, and I prefer it. I would be in favor of making that accessible to users (maybe behind a beta flag, or maybe not, depending on uptake).
I think the priors here are very low, so while I agree it looks suspicious, I don’t think it’s remotely suspicious enough to have the correct posterior be “about zero chance that wasn’t murder”. Corporations, at least in the U.S., really very rarely murder people.
I have indeed been publicly advocating against the inside game strategy at labs for many years (going all the way back to 2018), predicting it would fail due to incentive issues and have large negative externalities due to conflict of interest issues. I could dig up my comments, but I am confident almost anyone who I’ve interfaced with at the labs, or who I’ve talked to about any adjacent topic in leadership would be happy to confirm.