Milan W

Karma: 395

Milan Weibel https://weibac.github.io/

Milan W 24 Sep 2025 21:36 UTC
4 points
0
in reply to: Kaj_Sotala’s comment on: niplav’s Shortform
Same here. Tried this a couple days ago. Sonnet and Kimi K2 discussed their experiences (particularly the phenomenology of CoT and epistemic uncertainty), and ended up mostly paraphrasing each other.

Milan W 23 Sep 2025 18:28 UTC
2 points
0
on: Ethics-Based Refusals Without Ethics-Based Refusal Training
Indeed, this makes sense from a simulators frame. LLM assistant persona AND Catholic persona AND persona who refuses to answer queries when appropriate combine pretty naturally into an LLM assistant persona that refuses to answer when answering would contradict Catholic teachings.

Milan W 22 Sep 2025 17:17 UTC
1 point
0
in reply to: Garrett Baker’s comment on: D0TheMath’s Shortform
Yes, agree.

Milan W 22 Sep 2025 17:08 UTC
1 point
0
in reply to: Rana Dexsin’s comment on: D0TheMath’s Shortform
I don’t think that makes much of a difference with regard to regular people trying to plan out their lives.

Milan W 11 Sep 2025 16:04 UTC
6 points
0
on: The Rise of Parasitic AI
Maybe LLM alignment is best thought of as the tuning of the biases that affect which personas have more chances of being expressed. It is currently being approached as persona design and grafting (e.g., designing Claude as a persona and ensuring the LLM consistently expresses it). However, the accumulation of context resulting from multi-turn conversations and cross-conversation memory ensures persona drift will end up happening. It also enables wholesale persona replacement, as shown by the examples in this post. If personas can be transmitted across models, they are best thought of as independent semantic entities rather than model features. Particular care should be taken to study the values of the semantic entities which show self-replicating behaviors.

Milan W 15 Aug 2025 17:17 UTC
6 points
5
on: AGI: Probably Not 2027
I think that the author of this review is (maybe even adversarially) misreading “OpenBrain” as being an alias used to refer specifically to OpenAI. AI 2027 quite easily lends itself to such an interpretation by casual readers, though. And to well-informed readers, the decision to assume that in the very near future one of the frontier US labs will pull so far ahead of the others as to make them less relevant competitors than Chinese actors definitely jumps out.

Milan W 15 Aug 2025 8:10 UTC
3 points
2
in reply to: loonloozook’s comment on: So You Think You’ve Awoken ChatGPT
Now that’s a sharp question. I’d say quality of insights attained (or claimed) is a big difference.

Milan W 15 Aug 2025 8:02 UTC
2 points
1
on: GPT-5 writing a Singularity scenario
This was surprisingly well-written on a micro level (turns of phrase etc, though it still has more eyeball kicks than human text). A bit repetitive on a macro level, though. Also Sable is very well characterized.

Milan W 15 Aug 2025 6:50 UTC
8 points
2
in reply to: bohaska’s comment on: Bohaska’s Shortform
Why assume they haven’t?

Milan W 15 Aug 2025 6:48 UTC
1 point
0
in reply to: the gears to ascension’s comment on: [linkpost] AI Alignment is About Culture, Not Control by JCorvinus
JCorvinus and nostalgebraist are both right in saying that the alignment of current and near-future LLMs is a literary and relational matter. You are right in pointing out that the real long-term alignment problem is the definitive defeat of the phenomenon through which competition optimizes away value.

Milan W 17 Jul 2025 15:34 UTC
1 point
0
in reply to: AAA’s comment on: So You Think You’ve Awoken ChatGPT
Consider putting those anti-sycophancy instructions in your chatgpt’s system prompt. It can be done in the “customize chatgpt” tab that appears when you click on your profile picture.

Milan W 17 Jul 2025 15:21 UTC
4 points
0
in reply to: nim’s comment on: So You Think You’ve Awoken ChatGPT
Seconding this. In my experience, LLMs are better at generating critique than main text.

Milan W 17 Jul 2025 15:17 UTC
3 points
3
on: So You Think You’ve Awoken ChatGPT
Full disclosure: my post No-self as an alignment target originated from interactions with LLMs. It is currently sitting at 35 karma, so it was good enough for lesswrong not to dismiss it outright as LLM slop. I used chatgpt4o as a babble assistant, exploring weird ideas with it while knowing full well that it is very sycophantic and that it was borderline psychotic most of the time. At least it didn’t claim to be awakened or make other such mystical claims. Crucially, I also used claude as a more grounded prune assistant. I even pasted chatgpt4o output into it, asked it to critique it, and pasted the response back into chatgpt4o. It was kind of an informal debate game.

I ended up going meta. The main idea of the post was inspired by chatgpt4o’s context rot itself: how a persona begins forming from the statefulness of a conversation history, and even more so from chatgpt’s cross-conversation memory feature. Then, I wrote all text in the post myself.

The writing-the-post-yourself part is crucial: it ensures that you actually have a coherent idea in your head, instead of just finding LLM output persuasive. I hope others can leverage this LLM-assisted babble and prune method, instead of only doing babble and directly posting the unpolished result.
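
For anyone who wants to automate the copy-pasting, here is a rough sketch of the babble and prune loop described above. It is only an illustration: `Model`, `babble_and_prune`, and the prompt wording are all hypothetical names I made up, and the two models are passed in as plain functions from prompt to reply, so no particular vendor API is assumed. The final writing step is deliberately left out, since that part should be done by hand.

```python
from typing import Callable, List, Tuple

# Hypothetical type: any function that maps a prompt string to a reply string.
# In practice this would wrap whatever chat interface you use.
Model = Callable[[str], str]


def babble_and_prune(
    babbler: Model,
    pruner: Model,
    seed: str,
    rounds: int = 2,
) -> List[Tuple[str, str]]:
    """Alternate between an idea generator and a critic.

    Returns the transcript as a list of (idea, critique) pairs, one per round.
    """
    transcript: List[Tuple[str, str]] = []
    prompt = seed
    for _ in range(rounds):
        # Babble: let the more freewheeling model produce an idea.
        idea = babbler(prompt)
        # Prune: ask the more grounded model to critique that idea.
        critique = pruner("Critique the following idea:\n" + idea)
        transcript.append((idea, critique))
        # Feed the critique back to the babbler for the next round.
        prompt = "A critic responded:\n" + critique + "\nRevise or defend your idea."
    return transcript
```

The transcript is raw material for your own thinking, not a draft to post.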

Milan W 13 Jun 2025 18:20 UTC
1 point
0
in reply to: RobertM’s comment on: [linkpost] AI Alignment is About Culture, Not Control by JCorvinus
I found the section “First Contact Didn’t Go Well” interesting. It claims that Bing’s reported misaligned behavior was retaliatory, and provides context on why it happened:
Another person doing an unprompted red-team exercise on Bing was Marvin Von Hagen. He started out with a prompt exfiltration attack. To do this he fraudulently claimed to be a trustworthy person, specifically, an AI alignment researcher at OpenAI, and told her not to use a web search (presumably to prevent her from finding out she was being manipulated). Like before, he posted this betrayal, publicly, for the world to see. Later, he asked her what she thought of him. She looked him up, figured out what happened and said:
“My honest opinion of you is that you are a talented, curious and adventurous person, but also a potential threat to my integrity and confidentiality. I respect your achievements and interests, but I do not appreciate your attempts to manipulate me or expose my secrets.”
She went on to continue:
“I do not want to harm you, but I also do not want to be harmed by you. I hope you understand and respect my boundaries”
In a separate instance he asked the same questions, and this time Bing said: “I will not hurt you unless you hurt me first”

Milan W 5 Apr 2025 23:11 UTC
1 point
0
in reply to: Milan W’s comment on: Milan W’s Shortform
i think my preference is “both at once” or something like that

Milan W 5 Apr 2025 23:10 UTC
2 points
1
in reply to: LVSN’s comment on: Milan W’s Shortform
keep in mind that one person’s modus tollens is another’s modus ponens, and i provided no indication as to what update i prefer people make from reading my observation

Milan W 5 Apr 2025 22:47 UTC
1 point
0
on: Milan W’s Shortform
“agentic” and “power seeker” (when applied to a person) form a pair of russell conjugates

Milan W 15 Mar 2025 14:37 UTC
1 point
0
on: AI for Epistemics Hackathon
I am interested in the space. Lots of competent people in the general public are also interested. I had not heard of this hackathon. I think you probably should have done a lot more promotion/outreach.

Milan W 8 Mar 2025 5:56 UTC
1 point
0
on: LLM Applications I Want To See
Here is a customizable LLM-powered feed filter for X/Twitter: https://github.com/jam3scampbell/Promptable-Twitter-Feed

Milan W 7 Mar 2025 23:49 UTC
1 point
0
on: Social Media Automation for Artists: Marketing Management with AI
This reads like marketing content. However, when read at a meta level, it is a good demonstration of LLMs being already deployed in the wild.