georgia_berg
Karma: 30
Sandbagging: How Models Use Reward-Hacking to Downplay Their True Capabilities
georgia_berg | 3 Nov 2025 19:59 UTC | 7 points | 0 comments | 1 min read
Predicting Shifts in AI-driven Security Risks
georgia_berg | 3 Nov 2025 19:55 UTC | 2 points | 0 comments | 1 min read
Introduction to Corrigibility
georgia_berg | 3 Nov 2025 19:54 UTC | 2 points | 0 comments | 1 min read
What the Luddites Can Teach Us About Societal Response to AI
georgia_berg | 3 Nov 2025 19:53 UTC | 2 points | 0 comments | 1 min read
Agentic property-based testing: finding bugs across the Python ecosystem
georgia_berg | 3 Nov 2025 19:51 UTC | 2 points | 0 comments | 1 min read
AI Policy Tuesday: Predicting Shifts in AI-driven Security Risks
georgia_berg | 3 Nov 2025 16:42 UTC | 2 points | 0 comments | 1 min read
AI Safety Thursday: Monitoring LLMs for deceptive behaviour using probes
georgia_berg | 3 Nov 2025 16:40 UTC | 2 points | 0 comments | 1 min read
AI Policy Tuesday: Open Global Investment as a Governance Model for AGI
georgia_berg | 3 Nov 2025 16:38 UTC | 2 points | 0 comments | 1 min read
AI Safety Thursday: Modeling and Detecting Deceptive Alignment
georgia_berg | 26 Sep 2025 17:52 UTC | 6 points | 0 comments | 1 min read
AI Safety Thursday: The Limitations of Reinforcement Learning for LLMs in Achieving AI for Science
georgia_berg | 26 Sep 2025 17:51 UTC | 6 points | 0 comments | 1 min read
AI Policy Tuesday: Will the spectre of transformative AI be more challenging than the real thing?
georgia_berg | 26 Sep 2025 17:45 UTC | 6 points | 0 comments | 1 min read
AI Policy Tuesday: Debunking the US-Chinese AGI Race
georgia_berg | 26 Sep 2025 17:44 UTC | 6 points | 0 comments | 1 min read
AI Policy Tuesday: Redlines for AI
georgia_berg | 26 Sep 2025 17:43 UTC | 6 points | 0 comments | 1 min read
“If Anyone Builds It, Everyone Dies” Toronto Reading Group
georgia_berg | 17 Sep 2025 15:46 UTC | 1 point | 0 comments | 1 min read
AI Safety Thursday: Chain-of-Thought Monitoring for AI Control
georgia_berg | 16 Sep 2025 13:50 UTC | 1 point | 0 comments | 1 min read
AI Policy Tuesday: The Concept of Political Space and AI Safety
georgia_berg and Mario Gibney | 16 Sep 2025 13:50 UTC | 1 point | 0 comments | 1 min read
AI Safety Thursday: Superintelligence Endgames
georgia_berg | 12 Sep 2025 16:49 UTC | 1 point | 0 comments | 1 min read
AI Safety Thursday: Technical AI Governance—Motivations, Challenges, and Advice
georgia_berg | 12 Sep 2025 16:44 UTC | 1 point | 0 comments | 1 min read
AI Policy Tuesday: The Case for Regulating AI Companies, Not AI Models
georgia_berg | 12 Sep 2025 16:39 UTC | 1 point | 0 comments | 1 min read
AI Safety Thursday: Attempts and Successes of LLMs Persuading on Harmful Topics
georgia_berg | 12 Sep 2025 16:39 UTC | 1 point | 0 comments | 1 min read