wonder

Karma: 39

wonder 27 Jul 2025 17:19 UTC
6 points
0
in reply to: Vale’s comment on: Vale’s Shortform
I also agree “AI” is overloaded and has existing connotations (ranging from algorithms to applications as well)! I would think generative models, or generative AI works better (and one can specify multimodal generative models if one wants to be super clear), but also curious to see what other people would propose.

wonder 25 Jul 2025 4:07 UTC
5 points
2
on: Generalized Hangriness: A Standard Rationalist Stance Toward Emotions
Usually I also take emotions as a channel to surface unconscious preferences (either situational or longer term), which helps with making that preference conscious as well as evaluated, and thus helps with rational decisions.

wonder 9 Jul 2025 16:44 UTC
1 point
0
in reply to: Fabien Roger’s comment on: Why Do Some Language Models Fake Alignment While Others Don’t?
Maybe I missed this in the paper—for base models, do you change the prompt slightly for the base model or is it still instruction based prompts for testing fake alignment?

wonder 2 Jul 2025 15:46 UTC
6 points
0
on: Authors Have a Responsibility to Communicate Clearly
Thanks for writing this up! I highly agree and I think this is an important point to emphasis. Stating crucial context/clarification/assumptions is important, and unclear communication is likely counter-productive.

wonder 18 Apr 2025 2:48 UTC
1 point
0
in reply to: johnswentworth’s comment on: johnswentworth’s Shortform
I share some similar frustrations, and unfortunately these are also prevalent in other parts of the human society. The commonality of most of these fakeness seem to be impure intentions—there are impure/non-intrinsic motivations other than producing the best science/making true progress. Some of these motivations unfortunately could be based on survival/monetary pressure, and resolving that for true research or progress seems to be critical. We need to encourage a culture of pure motivations, and also equip ourselves with more ability/tools to distinguish extrinsic motivations.

wonder 2 Apr 2025 15:25 UTC
1 point
0
in reply to: Thomas Kwa’s comment on: Thomas Kwa’s Shortform
Would the take over for small countries also about humans using just an advanced AI for taking over? (or would the human using advanced AI for take over happen faster?)

wonder 20 Mar 2025 4:27 UTC
3 points
2
on: How to Make Superbabies
Maybe I missed this in the article itself—are there plans to make sure the superbabies are aligned and will not abuse/overpower the non-engineered peers?

wonder 5 Mar 2025 18:26 UTC
1 point
0
on: Self-fulfilling misalignment data might be poisoning our AI models
I was thinking of this the other day as well; I think this is particularly a problem when we are evaluating misalignment based on these semantic wording. This may suggest the increasing need to pursue alternative ways to evaluate misalignment, rather than purely prompt based evaluation benchmarks

wonder 20 Feb 2025 0:20 UTC
4 points
0
in reply to: Cole Wyeth’s comment on: Cole Wyeth’s Shortform
Based on my observations, I would also think some current publication chasing culture could get people push out papers more quickly (in some particular domains like CS), even though some papers may be partially completed

wonder 20 Jan 2025 19:44 UTC
9 points
0
on: Agent Foundations 2025 at CMU
Will the event/sessions be recorded by any chance? (may not be able to attend, but would love to learn); additionally, would the topics be focused exclusively on relations to X risks?