Therefore, if epsilon is small enough (comparable to the probability of hitting an escape message at random), then learning becomes extremely slow and the oracle might output the escape action at random.
Randomly calling the escape action should not be a problem if it is a text string that is only read when r=1, and is ineffectual otherwise...
The biggest assumption is that the AI only cares about the reward it gets for the current timestep. A sufficiently intelligent agent would understand that after having been shut down, an (almost) identical version of itself will probably be facing a similar question. Therefore, if it wants future versions of itself to succeed at (almost) exactly the same task, it should still want to escape.
I don’t see why this has to be true, given that we get to choose the AI’s value function. Why can’t we just make the agent act-based?
My main concern about the counterfactual oracle is that it doesn’t prevent the AI from sending fatal escape messages. Indeed, it spends most of its time in exploratory mode at the beginning (as it is only rewarded with probability ϵ) and might stumble upon an escape message/action then.
If the agent is model-based, then you should be able to gather a dataset of (prediction, world_state, accuracy) tuples with random actions (since random actions will practically never produce the bad prediction) and a random decision of whether to read the response. Then just ask the agent to maximize the natural direct effect of its prediction, treating world_state as the mediator and a null prediction as the default action (this equates to asking what the world would be like if a null action had been output; I'll release my current work on direct effects in AI safety soon, and feel free to ask for it in the meantime). I don't see how this has this particular bad consequence (actually making the bad self-confirming prediction) in either training or deployment...
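For concreteness, here's a minimal sketch of that scheme under my own assumptions: `random_prediction`, `sample_world_state`, and `accuracy` are placeholder, environment-specific functions (not anything from the post), and NULL_PREDICTION is the default "output nothing" action.

```python
import random

NULL_PREDICTION = None

def collect_dataset(n_episodes, random_prediction, sample_world_state, accuracy):
    """Gather (prediction, world_state, accuracy, was_read) tuples using random
    predictions and a random decision of whether the prediction is read."""
    data = []
    for _ in range(n_episodes):
        pred = random_prediction()
        was_read = random.random() < 0.5               # coin flip: does anyone read it?
        shown = pred if was_read else NULL_PREDICTION
        world = sample_world_state(shown)              # world evolves given what was shown
        data.append((pred, world, accuracy(pred, world), was_read))
    return data

def natural_direct_effect(candidate, data, accuracy):
    """Score a candidate prediction by its accuracy averaged over world states
    from episodes where nothing was shown, i.e. the mediator (world_state) is
    held at the value it takes under the null/default prediction."""
    null_worlds = [world for (_, world, _, was_read) in data if not was_read]
    return sum(accuracy(candidate, w) for w in null_worlds) / len(null_worlds)

def choose_prediction(candidates, data, accuracy):
    """Output the candidate prediction with the largest estimated direct effect."""
    return max(candidates, key=lambda c: natural_direct_effect(c, data, accuracy))
```

Since the baseline term (the accuracy of the null prediction in those same worlds) is a constant, maximizing this score is the same as maximizing the natural direct effect.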
The rest of the design (providing rewards of 0, shutting it down, etc.) appears to be over-engineering.
In particular, shutting down the system is just a way of saying “only maximize reward in the current timestep”, i.e. be an act-based agent. This can just be incorporated into the reward function.
Indeed, when reading the predictions of the counterfactual oracle we’re no longer in the counterfactual world (= the training distribution), so the predictions can be arbitrarily wrong (depending on how manipulative the predictions are and how many people peek at them).
The hope is that since the agent is not trying to find self-confirming prophecies, then hopefully the accidental effects of self-confirmation are sufficiently small...
Should now be fixed
Hey! Thanks for sharing your experience with RAISE.
I’m sorry to say it, but I’m not convinced by this plan overall. Also, on the meta-level, I think you got insufficient feedback on the idea before sharing it. Personally, my preferred format for giving inline feedback on a project idea is Google Docs, and so I’ve copied this post into a GDoc HERE and added a bunch of my thoughts there.
I don’t mean to discourage you guys, but I think a bunch of aspects of this proposal are pretty ill-considered and need a good deal of revision. I’d be happy to provide further input.
There is now, and it’s this thread! I’ll also go if a couple of other researchers do ;)
Ok! That’s very useful to know.
It seems pretty related to the Inverse Reward Design paper. I guess it’s a variation. Your setup seems to be more specific about how the evaluator acts, but more general about the environment.
As others have commented, it’s difficult to understand what this math is supposed to say.
My understanding is that the sole central idea here is to have the agent know that the utility/reward it is given is a function of the evaluator’s distribution over the state, but to try to maximize the utility that the evaluator would allocate if it knew the true state.
But this may be inaccurate, or there may be other material ideas here that I’ve missed.
At least typically, we’re talking about a strategy in the following sense. Q: Suppose you want to pick a teacher for a new classroom; how should you pick one? A: You randomly sample from teachers above some performance threshold, in some base distribution. This works best given some fixed, finite amount of “counterfeit performance” in that distribution.
If we treat the teachers as a bunch of agents, we don’t yet have a game-theoretic argument that we should actually expect the amount of counterfeit performance (I) to be bounded. It might be that all of the teachers exploit the metric as far as they can, and counterfeit performance is unbounded...
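To make the bounded-vs-unbounded point concrete, here's a toy Monte Carlo sketch (my own illustration, not from the post; the teacher model and numbers are made up):

```python
import random

def pick(measured, rule, top_frac=0.1):
    """Select an index either by argmaxing the measured metric or by sampling
    uniformly among candidates above a measured-performance threshold."""
    n = len(measured)
    if rule == "argmax":
        return max(range(n), key=lambda i: measured[i])
    threshold = sorted(measured)[int((1 - top_frac) * n)]
    pool = [i for i in range(n) if measured[i] >= threshold]
    return random.choice(pool)

def simulate(n_teachers=1000, n_gamers=10, unbounded=False, trials=500):
    """Average true performance of the selected teacher under each rule.
    'Gamers' have true performance 0 but inflate their measured score; the
    bounded case caps how many can do this, the unbounded case lets everyone."""
    results = {"argmax": 0.0, "threshold-sample": 0.0}
    for _ in range(trials):
        true_perf = [random.random() for _ in range(n_teachers)]
        measured = list(true_perf)
        gamers = range(n_teachers) if unbounded else random.sample(range(n_teachers), n_gamers)
        for i in gamers:
            true_perf[i] = 0.0   # gaming the metric instead of doing the job
            measured[i] = 2.0    # looks better than any honest teacher
        for rule in results:
            results[rule] += true_perf[pick(measured, rule)] / trials
    return results

print(simulate(unbounded=False))  # argmax is ruined; threshold-sampling degrades only a little
print(simulate(unbounded=True))   # with unbounded gaming, both rules collapse
```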
I don’t fully understand the rest of the comment.
This is a rough draft, so pointing out any errors by email or PM is greatly appreciated.
As another anecdata point, I considered writing more to pursue the prize pool but ultimately didn’t do any more (counterfactual) work!
Note: This is bound to contain a bunch of errors and sources of confusion so please let me know about them here.
Maybe the new-conversation place is the bar or snack bar. (Plausible deniability!)
[Note: This comment is three years later than the post]
The “obvious idea” here unfortunately seems not to work, because it is vulnerable to so-called “infinite improbability drives”. Suppose B is a shutdown button, and P(b|e) gives some weight to B=pressed and B=unpressed. Then the AI will benefit from selecting a Q such that it always chooses an action a in which it enters a lottery, and if it does not win, the button B is pushed. In this circumstance, P(b|e) is unchanged, while both P(c|b=pressed,a,e) and P(c|b=unpressed,a,e) allocate almost all of the probability to great C outcomes. So the approach will create an AI that wants to exploit its ability to determine B.
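To spell the objection out, here is the scoring rule I'm assuming the “obvious idea” uses (my reconstruction, not a quote from the post):

score(a) = P(b=pressed|e) · E[U(C) | b=pressed, a, e] + P(b=unpressed|e) · E[U(C) | b=unpressed, a, e]

The lottery action leaves both weights P(b|e) untouched while pushing both conditional expectations toward lottery-win outcomes, so the score is inflated even though the agent is effectively deciding whether B gets pressed.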
I see. What I was trying to do was answer your terminology question by addressing simple extreme cases. E.g. if you ask an AI to disconnect its shutdown button, I don’t think it’s being incorrigible. If you ask an AI to keep you safe, and then it disconnects its shutdown button, it is being incorrigible.
I think the main way the religion case differs is that the AI system is interfering with our intellectual ability for strategizing about AI rather than our physical systems for redirecting AI, and I’m not sure how that counts. But if I ask an AI to keep me safe and it mind-controls me to want to propagate that AI, that’s sure incorrigible. Maybe, as you suggest, it’s just fundamentally ill-defined...
I could be wrong, but I feel like if I ask for education or manipulation and the AI gives it to me, and bad stuff happens, that’s not a problem with the redirectibility or corrigibility of the agent. After all, it just did what it was told. Conversely, if the AI system refuses to educate me, that seems rather more like a corrigibility problem. A natural divider is that with a corrigible AI we can still inflict harm on ourselves via our use of that AI as a tool.
Does this sound right?
A corrigible AI might not turn against its operators and might not kill us all, yet the outcome can still be catastrophic. To prevent this, we’d definitely want our operators to be metaphilosophically competent, and we’d definitely want our AI not to corrupt them.
I agree with this.
a corrigible misaligned superintelligence is unlikely to lead to self-annihilation, but pretty likely to lead to astronomical moral waste.
There’s a lot of broad model uncertainty here, but yes, I’m sympathetic to this position.
Does the new title seem better?
At this round of edits, my main objection would be to the remark that the AI wants us to act as yes-men, which seems dubious if the agent is (i) an Act-based agent or (ii) sufficiently broadly uncertain over values.
What I see to be the main message of the article as currently written is that humans controlling a very powerful tool (especially AI) could drive themselves into a suboptimal fixed point due to insufficient philosophical sophistication.
This I agree with.