peterr

Karma: 149

peterr 21 Aug 2025 2:40 UTC
1 point
0
in reply to: Aaron_Scher’s comment on: Aaron_Scher’s Shortform
Interesting. I am inclined to think this is accurate. I’m kind of surprised people thought GPT-5 was a huge scaleup given that it’s much faster than o3 was. It sort of felt like a distilled o3 + 4o.

peterr 13 Jul 2025 18:03 UTC
9 points
0
in reply to: Seth Herd’s comment on: Seth Herd’s Shortform
Thanks Seth! I appreciate you signal boosting this and laying out your reasoning for why planning is so critical for AI safety.

Contest for Better AGI Safety Plans

peterr3 Jul 2025 17:02 UTC

29 points

(manifund.org)

peterr 12 Jun 2025 6:31 UTC
1 point
0
on: Scale Was All We Needed, At First
Predicting the name Alice, what are the odds?

peterr 22 Apr 2025 1:18 UTC
1 point
0
in reply to: Vladimir_Nesov’s comment on: Vladimir_Nesov’s Shortform
If true, would this imply you want a base model to generate lots of solutions and a reasoning model to identify the promising ones and train on those?

peterr 28 Feb 2025 6:48 UTC
6 points
0
in reply to: Daniel Tan’s comment on: Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Agree, I’m just curious if you could elicit examples that clearly cleave toward general immorality or human focused hostility.

peterr 28 Feb 2025 5:06 UTC
1 point
0
in reply to: Daniel Tan’s comment on: Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Does the model embrace “actions that are bad for humans even if not immoral” or “actions that are good for humans even if immoral” or treat users differently if they identify as non-humans? This might help differentiate what exactly it’s mis-aligning toward.

peterr 25 Feb 2025 23:14 UTC
5 points
0
on: Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
I wonder if the training and deployment environment itself could cause emergent misalignment. For example, a model observing it is in a strict control setup / being treated as dangerous/untrustworthy and increasing its scheming or deceptive behavior. And whether a more collaborative setup could decrease that behavior.

peterr14 Feb 2025 23:18 UTC

25 points

peterr 12 Feb 2025 4:10 UTC
1 point
0
in reply to: ozziegooen’s comment on: ozziegooen’s Shortform
You could probably test if an AI makes moral decisions more often than the average person, if it has higher scope sensitivity, and if it makes decisions that resolve or deescalate conflicts or improve people’s welfare compared to various human and group baselines.

peterr 24 Dec 2024 0:07 UTC
4 points
0
in reply to: jbash’s comment on: When AI 10x’s AI R&D, What Do We Do?
@jbash What do you think would be a better strategy/more reasonable? Should there be more focus on mitigating risks after potential model theft? Or a much stronger effort to convince key actors to implement unprecedentedly strict security for AI?

peterr 17 Dec 2024 17:56 UTC
11 points
2
in reply to: mako yass’s comment on: MakoYass’s Shortform
He also said interpretability has been solved, so he’s not the most calibrated when it comes to truthseeking. Similarly, his story here could be wildly exaggerated and not the full truth.

peterr 28 Nov 2024 0:38 UTC
1 point
0
in reply to: Bogdan Ionut Cirstea’s comment on: Bogdan Ionut Cirstea’s Shortform
There have been comments from OAI staff that o1 is “GPT-2 level” so I wonder if it’s a similar size?

peterr 15 Aug 2024 21:06 UTC
3 points
2
on: Ten arguments that AI is an existential risk
It would be interesting to see which arguments the public and policymakers find most and least concerning.

peterr 17 Jun 2024 16:12 UTC
2 points
0
in reply to: lemonhope’s comment on: The Leopold Model: Analysis and Reactions
So I generally think this type of incentive affecting people’s views is important to consider. Though I wonder, couldn’t you make counter arguments along the lines of “oh well if they’re really so great why don’t you try to sell them and make money? Because they’re not great.” And “If you really believed this was important, you would bet proportional amounts of money on it.”

peterr 6 Jan 2024 22:45 UTC
33 points
22
on: AI Risk and the US Presidential Candidates
Trump said he would cancel the executive order on Safe, Secure, and Trustworthy AI on day 1 if reelected. Seems negative considering it creates more uncertainty around how consistent any AI regulation will be and he has no alternative.

peterr6 Dec 2023 2:02 UTC

25 points

(pastebin.com)