Political Science undergraduate, expected to graduate in December 2025.
I work at the intersection of alignment theory, narrative coherence, and emotional grounding. Currently red-teaming frontier models and contributing to AI governance discourse through policy research, adversarial testing, and creative scenario building.
I happen to agree that persuasion is a huge issue for AI, but I don't see it the same way some of you might.
I think the biggest AI persuasion risk in 2025 is a nefarious actor using an AI model to help persuade a person or group of people; think classic agitprop, or a state actor trying to influence diplomacy. Persuasion of this sort is a tale as old as civilization itself.
The issue I see down the line is what happens once the human hand guiding the AI is no longer necessary, and the agentic model (and eventually AGI) has goals, values, and desires of its own. Both types of persuasion are bad, but the second is a medium-to-long-term issue, while human-directed AI persuasion as a means to an end is a front-burner issue right now.