aog

Karma: 1,588

Digital sentience funding opportunities: Support for applied work and research

aog and zdgroff

May 29, 2025, 3:22 PM

21 points

0 comments 4 min read LW link

aog May 1, 2025, 12:42 AM
4 points
0
in reply to: kave’s comment on: Research Priorities for Hardware-Enabled Mechanisms (HEMs)
Thanks for the heads up. I’ve edited the title and introduction to better indicate that this content might be interesting to someone even if they’re not looking for funding.

Research Priorities for Hardware-Enabled Mechanisms (HEMs)

aog Apr 30, 2025, 5:43 PM

17 points

2 comments 15 min read LW link

(www.longview.org)

aog Apr 23, 2025, 8:05 AM
2 points
0
in reply to: Neel Nanda’s comment on: aog’s Shortform
Yeah I think that’d be reasonable too. You could talk about these clusters at many different levels of granularity, and there are tons I haven’t named.

aog Apr 21, 2025, 11:23 AM
5 points
3
in reply to: Chris_Leong’s comment on: aog’s Shortform
If we can put aside for a moment the question of whether Matthew Barnett has good takes, I think it’s worth noting that this reaction reminds me of how outsiders sometimes feel about effective altruism or rationalism:
I guess I feel that his posts tend to be framed in a really strange way such that, even though there’s often some really good research there, it’s more likely to confuse the average reader than anything else and even if you can untangle the frames, I usually don’t find worth it the time.
The root cause may be that there is too much inferential distance, too many differences of basic worldview assumptions, to easily have a productive conversation. The argument contained in any given post might rely on background assumptions that would take a long time to explain and debate. It can be very difficult to have a productive conversation with someone who doesn’t share your basic worldview. That’s one of the reasons that LessWrong encourages users to read foundational material on rationalism before commenting or posting. It’s also why scalable oversight researchers like having places to talk to each other about the best approaches to LLM-assisted reward generation, without needing to justify each time whether that strategy is doomed from the start. And it’s part of why I think it’s useful to create scenes that operate on different worldview assumptions: it’s worth working out the implications of specific beliefs without needing to justify those beliefs each time.
Of course, this doesn’t mean that Matthew Barnett has good takes. Maybe you find his posts confusing not because of inferential distance, but because they’re illogical and wrong. Personally I think they’re good, and I wouldn’t have written this post if I didn’t. But I haven’t actually argued that here, and I don’t really want to—that’s better done in the comments on his posts.

aog’s Shortform

aog Apr 19, 2025, 10:07 PM

6 points

21 comments LW link

aog Apr 19, 2025, 10:07 PM
58 points
22
on: aog’s Shortform
Shoutout to Epoch for creating its own intellectual culture.
Views on AGI seem suspiciously correlated to me, as if many people’s views are determined more by diffusion through social networks and popular writing than by independent reasoning. This isn’t unique to AGI. Most individual people are not capable of coming up with useful worldviews on their own. Often, the development of interesting, coherent, novel worldviews benefits from an intellectual scene.
What’s an intellectual scene? It’s not just an idea. Usually it has a set of complementary ideas, each of which makes more sense with the others in place. Often there’s a small number of key thinkers who come up with many new ideas, and a broader group of people who agree with the ideas, further develop them, and follow their implied call to action. Scenes benefit from shared physical and online spaces, though they can also exist in social networks without a central hub. Sometimes they professionalize, offering full-time opportunities to develop the ideas or act on them. Members of a scene are shielded from pressure to defer to others who do not share their background assumptions, and therefore feel freer to come up with new ideas that would be unusual to outsiders, but make sense within the scene’s shared intellectual framework. These conditions seem to raise the likelihood of novel intellectual progress.
There are many examples of intellectual scenes within AI risk, at varying levels of granularity and cohesion. I’ve been impressed with Davidad recently for putting forth a set of complementary ideas around Safeguarded AI and FlexHEGs, and creating opportunities for people who agree with his ideas to work on them. Perhaps the most influential scenes within AI risk are the MIRI / LessWrong / Conjecture / Control AI / Pause AI cluster, united by high p(doom) and focus on pausing or stopping AI development, and the Constellation / Redwood / METR / Anthropic cluster, focused on prosaic technical safety techniques and working with AI labs to make the best of the current default trajectory. (Though by saying these clusters have some shared ideas / influences / spaces, I don’t mean to deny the fact that most people within those clusters disagree on many important questions.) Rationalism and effective altruism are their own scenes, as are the conservative legal movement, social justice, new atheism, progress studies, neoreaction, and neoliberalism.
Epoch has its own scene, with a distinct set of thinkers, beliefs, and implied calls to action. Matthew Barnett has written the most about these ideas publicly, so I’d encourage you to read his posts on these topics, though my understanding is that many of these ideas were developed with Tamay, Ege, Jaime, and others. Key ideas include long timelines, slow takeoff, eventual explosive growth, optimism about alignment, concerns about overregulation, concerns about hawkishness towards China, arguing for the likelihood of AI sentience and the desirability of AI rights, debating the desirability of different futures, and so on. These ideas motivate much of Epoch’s work, as well as Mechanize. Importantly, the people in this scene don’t seem to mind much that many others (including me) disagree with them.
I’d like to see more intellectual scenes that seriously think about AGI and its implications. There are surely holes in our existing frameworks, and those holes can be hard to spot for people operating within the frameworks. Creating new spaces with different sets of shared assumptions seems like it could help.

aog Mar 7, 2025, 4:54 PM
1 point
0
in reply to: Daniel Kokotajlo’s comment on: Daniel Kokotajlo’s Shortform
Curious what you think of arguments (1, 2) that AIs should be legally allowed to own property and participate in our economic system, thus giving misaligned AIs an alternative prosocial path to achieving their goals.

aog Feb 21, 2025, 4:01 PM
2 points
0
in reply to: james oofou’s comment on: Daniel Kokotajlo’s Shortform
How do we know it was 3x? (If true, I agree with your analysis)

aog Feb 21, 2025, 8:38 AM
2 points
0
in reply to: Vladimir_Nesov’s comment on: Daniel Kokotajlo’s Shortform
Do you take Grok 3 as an update on the importance of hardware scaling? If xAI trained it with 5-10x more compute than any other model used (which seems likely but not necessarily true?), then the fact that it wasn’t discontinuously better than other models seems like evidence against the importance of hardware scaling.

aog Feb 6, 2025, 7:30 PM
7 points
3
in reply to: davekasten’s comment on: nikola’s Shortform
I’m surprised they list bias and disinformation. Maybe this is a galaxy-brained attempt to discredit AI safety by making it appear left-coded, but I doubt it. Seems more likely that x-risk focused people left the company while traditional AI ethics people stuck around and rewrote the website.

aog Feb 4, 2025, 4:52 AM
8 points
0
on: Meta: Frontier AI Framework
I’m very happy to see Meta publish this. It’s a meaningfully stronger commitment to avoiding deployment of dangerous capabilities than I expected them to make. Kudos to the people who pushed for companies to make these commitments and helped them do so.
One concern I have with the framework is that I think the “high” vs. “critical” risk thresholds may claim a distinction without a difference.
Deployments are high risk if they provide “significant uplift towards execution of a threat scenario (i.e. significantly enhances performance on key capabilities or tasks needed to produce a catastrophic outcome) but does not enable execution of any threat scenario that has been identified as potentially sufficient to produce a catastrophic outcome.” They are critical risk if they “uniquely enable the execution of at least one of the threat scenarios that have been identified as potentially sufficient to produce a catastrophic outcome.” The framework requires that threats be “net new,” meaning “The outcome cannot currently be realized as described (i.e. at that scale / by that threat actor / for that cost) with existing tools and resources.”
But what then is the difference between high risk and critical risk? Unless a threat scenario is currently impossible, any uplift towards achieving it more efficiently also “uniquely enables” it under a particular budget or set of constraints. For example, it is already possible for an attacker to create bio-weapons, as demonstrated by the anthrax attacks—so any cost reductions or time savings for any part of that process uniquely enable execution of that threat scenario within a given budget or timeframe. Thus it seems that no model can be classified as high risk if it provides uplift on an already-achievable threat scenario—instead, it must be classified as critical risk.
Does that logic hold? Am I missing something in my reading of the document?

aog Feb 1, 2025, 5:46 PM
5 points
2
in reply to: ryan_greenblatt’s comment on: Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
Curious what you think of these arguments, which offer objections to the strategy stealing assumption in this setting, instead arguing that it’s difficult for capital owners to maintain their share of capital ownership as the economy grows and technology changes.

aog Jan 20, 2025, 9:09 PM
2 points
−1
on: the case for CoT unfaithfulness is overstated
DeepSeek-R1 naturally learns to switch into other languages during CoT reasoning. When developers penalized this behavior, performance dropped. I think this suggests that the CoT contained hidden information that cannot be easily verbalized in another language, and provides evidence against the hope that reasoning CoT will be highly faithful by default.

aog Nov 14, 2024, 2:08 PM
2 points
0
in reply to: gwern’s comment on: o1 is a bad idea
Wouldn’t that conflict with the quote? (Though maybe they’re not doing what they’ve implied in the quote)

aog Nov 13, 2024, 4:24 PM
8 points
2
in reply to: SoerenMind’s comment on: o1 is a bad idea
Process supervision seems like a plausible o1 training approach but I think it would conflict with this:
We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to “read the mind” of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought.
I think it might just be outcome-based RL, training the CoT to maximize the probability of correct answers or maximize human preference reward model scores or minimize next-token entropy.

aog Sep 23, 2024, 6:32 PM
3 points
0
in reply to: habryka’s comment on: Wei Dai’s Shortform
This is my impression too. See e.g. this recent paper from Google, where LLMs critique and revise their own outputs to improve performance in math and coding.

aog Aug 23, 2024, 1:04 AM
15 points
9
in reply to: habryka’s comment on: Zach Stein-Perlman’s Shortform
Agreed, sloppy phrasing on my part. The letter clearly states some of Anthropic’s key views, but doesn’t discuss other important parts of their worldview. Overall this is much better than some of their previous communications and the OpenAI letter, so I think it deserves some praise, but your caveat is also important.

aog Aug 22, 2024, 11:48 PM
15 points
5
in reply to: Zach Stein-Perlman’s comment on: Zach Stein-Perlman’s Shortform
Really happy to see the Anthropic letter. It clearly states their key views on AI risk and the potential benefits of SB 1047. Their concerns seem fair to me: overeager enforcement of the law could be counterproductive. While I endorse the bill on the whole and wish they would too (and I think their lack of support for the bill is likely partially influenced by their conflicts of interest), this seems like a thoughtful and helpful contribution to the discussion.

aog Aug 6, 2024, 3:28 PM
13 points
10
in reply to: Orpheus16’s comment on: Akash’s Shortform
I think there’s a decent case that SB 1047 would improve Anthropic’s business prospects, so I’m not sure this narrative makes sense. On one hand, SB 1047 might make it less profitable to run an AGI company, which is bad for Anthropic’s business plan. But Anthropic is perhaps the best positioned of all AGI companies to comply with the requirements of SB 1047, and might benefit significantly from their competitors being hampered by the law.
The good faith interpretation of Anthropic’s argument would be that the new agency created by the bill might be very bad at issuing guidance that actually reduces x-risk, and you might prefer the decision-making of AI labs with a financial incentive to avoid catastrophes without additional pressure to follow the exact recommendations of the new agency.
