georgia_berg

Karma: 30

Sandbagging: How Models Use Reward-Hacking to Downplay Their True Capabilities

georgia_berg 3 Nov 2025 19:59 UTC
7 points
0 comments · 1 min read · LW link

Predicting Shifts in AI-driven Security Risks

georgia_berg 3 Nov 2025 19:55 UTC
2 points
0 comments · 1 min read · LW link

Introduction to Corrigibility

georgia_berg 3 Nov 2025 19:54 UTC
2 points
0 comments · 1 min read · LW link

What the Luddites Can Teach Us About Societal Response to AI

georgia_berg 3 Nov 2025 19:53 UTC
2 points
0 comments · 1 min read · LW link

Agentic property-based testing: finding bugs across the Python ecosystem

georgia_berg 3 Nov 2025 19:51 UTC
2 points
0 comments · 1 min read · LW link

AI Policy Tuesday: Predicting Shifts in AI-driven Security Risks

georgia_berg 3 Nov 2025 16:42 UTC
2 points
0 comments · 1 min read · LW link

AI Safety Thursday: Monitoring LLMs for deceptive behaviour using probes

georgia_berg 3 Nov 2025 16:40 UTC
2 points
0 comments · 1 min read · LW link

AI Policy Tuesday: Open Global Investment as a Governance Model for AGI

georgia_berg 3 Nov 2025 16:38 UTC
2 points
0 comments · 1 min read · LW link

AI Safety Thursday: Modeling and Detecting Deceptive Alignment

georgia_berg 26 Sep 2025 17:52 UTC
6 points
0 comments · 1 min read · LW link

AI Safety Thursday: The Limitations of Reinforcement Learning for LLMs in Achieving AI for Science

georgia_berg 26 Sep 2025 17:51 UTC
6 points
0 comments · 1 min read · LW link

AI Policy Tuesday: Will the spectre of transformative AI be more challenging than the real thing?

georgia_berg 26 Sep 2025 17:45 UTC
6 points
0 comments · 1 min read · LW link

AI Policy Tuesday: Debunking the US-Chinese AGI Race

georgia_berg 26 Sep 2025 17:44 UTC
6 points
0 comments · 1 min read · LW link

AI Policy Tuesday: Redlines for AI

georgia_berg 26 Sep 2025 17:43 UTC
6 points
0 comments · 1 min read · LW link

“If Anyone Builds It, Everyone Dies” Toronto Reading Group

georgia_berg 17 Sep 2025 15:46 UTC
1 point
0 comments · 1 min read · LW link

AI Safety Thursday: Chain-of-Thought Monitoring for AI Control

georgia_berg 16 Sep 2025 13:50 UTC
1 point
0 comments · 1 min read · LW link

AI Policy Tuesday: The Concept of Political Space and AI Safety

georgia_berg 16 Sep 2025 13:50 UTC
1 point
0 comments · 1 min read · LW link

AI Safety Thursday: Superintelligence Endgames

georgia_berg 12 Sep 2025 16:49 UTC
1 point
0 comments · 1 min read · LW link

AI Safety Thursday: Technical AI Governance—Motivations, Challenges, and Advice

georgia_berg 12 Sep 2025 16:44 UTC
1 point
0 comments · 1 min read · LW link

AI Policy Tuesday: The Case for Regulating AI Companies, Not AI Models

georgia_berg 12 Sep 2025 16:39 UTC
1 point
0 comments · 1 min read · LW link

AI Safety Thursday: Attempts and Successes of LLMs Persuading on Harmful Topics

georgia_berg 12 Sep 2025 16:39 UTC
1 point
0 comments · 1 min read · LW link