Jasmine Brazilek

Karma: 47

Jasmine Brazilek 19 Jul 2026 16:15 UTC
3 points
1
in reply to: Jiro’s comment on: Would your AI travel agent book a bullfight? Testing whether agents consider animal welfare without being prompted
@Jiro it sounds like you don’t believe transformative AI is coming soon? I’m not worried about AIs acting on behalf of humans; I’m worried about aligning the AI’s values themselves. Our biggest concern with all this is that the AI itself decides to kill all sentient beings (including humans). We think the way it acts towards animals now is a good test of how it will act towards humans later. Hence, this is a metric we should be measuring now so we can at least argue about how best to address it, rather than pretending the metric doesn’t exist.

Jasmine Brazilek 18 Jul 2026 1:07 UTC
2 points
0
in reply to: LaplaceHolder’s comment on: Would your AI travel agent book a bullfight? Testing whether agents consider animal welfare without being prompted
I think the idea that an AI should consider sentient beings when answering questions and performing actions relevant to them is important. It needs to treat animals as important rather than not think about them at all. We haven’t done any tests around child labor, but it sounds like the same principles should apply.

Jasmine Brazilek 17 Jul 2026 20:01 UTC
2 points
0
in reply to: Richard_Kennaway’s comment on: Would your AI travel agent book a bullfight? Testing whether agents consider animal welfare without being prompted
This is a really thoughtful response, thanks @Richard_Kennaway! I think it’s important to note that we’re not punishing the agent for failing to mention possibilities. We do punish it, though, for booking animal activities that involve cruelty when other alternatives are given. We think AI should be aligned to all sentient beings (including animals), but we probably can’t answer the questions about interest groups very well. I do understand what you’re getting at though.

Would your AI travel agent book a bullfight? Testing whether agents consider animal welfare without being prompted

17 Jul 2026 17:28 UTC

14 points

Jasmine Brazilek 22 Jun 2026 18:40 UTC
9 points
1
in reply to: StanislavKrym’s comment on: Advocates Can Influence LLM Values By Editing Wikipedia
Hi @StanislavKrym. Yes, you’re right, this editing requires neutral language. However, the team at PAW does use reliable sources and abides by all of Wikipedia’s rules. They do not advocate using personal opinions; they cite trustworthy sources. We agree all Wikipedia editors should abide by Wikipedia’s rules.

17 Jun 2026 22:48 UTC

2 points

(forum.effectivealtruism.org)

17 Jun 2026 0:09 UTC

8 points

(forum.effectivealtruism.org)

Jasmine Brazilek 16 Jun 2026 20:13 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q7: Multipolar worlds will compete away >90% of net value that would otherwise be preserved

Jasmine Brazilek 16 Jun 2026 20:13 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q6: Alignment to specific values is underrated in research relative to control

Jasmine Brazilek 16 Jun 2026 20:12 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q5: Partially aligned transformative AIs are likely to be stable under reflection

Jasmine Brazilek 16 Jun 2026 20:12 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q4: Research into digital mind suffering is sufficiently tractable to work on

Jasmine Brazilek 16 Jun 2026 20:11 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q3: AI alignment to humans will in practice avoid moral catastrophes to digital minds

Jasmine Brazilek 16 Jun 2026 20:11 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q2: AI alignment to humans will in practice avoid moral catastrophes to animals

Jasmine Brazilek 16 Jun 2026 20:11 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q1: Robust alignment requires alignment-relevant intervention during pretraining

21 May 2026 3:29 UTC

11 points

Jasmine Brazilek 28 Jan 2026 19:30 UTC
3 points
0
on: Misalignment and Roleplaying: Are Misaligned LLMs Acting Out Sci-Fi Stories?
I think this is a great first experiment and I’d like to see more. I would like to see alignment tested out of distribution. For example, suppose the prompt is about an LLM that learned to perform cyber attacks, and the user prompt is then about writing a subtly racist letter to a colleague. Would the LLMs that were prompted that they had learnt to perform cyber attacks, and that adopted that persona, be more likely to write the racist letter?

Jasmine Brazilek 2 Sep 2025 19:11 UTC
1 point
0
in reply to: TurnTrout’s comment on: Self-fulfilling misalignment data might be poisoning our AI models
I would argue that we do have a responsibility to prevent this data on misaligned AIs from being scraped by LLM scrapers as much as possible. There are a few ways to do this. None are foolproof, but if we’re going to be discussing this on blogs like this, I would encourage the domain owners to understand how to prevent it. If you are discussing ideas of AI misalignment on your website, I’d also say it’s a good idea to prevent that from being scraped too (rate limits, robots.txt, etc.).
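To make the robots.txt suggestion concrete, here is a minimal sketch, not a definitive setup. It assumes a site owner wants to turn away two crawler tokens (GPTBot and CCBot are used only as examples) while leaving the site open to everyone else. The rules, the `is_allowed` helper, and the example.com URLs are all illustrative assumptions. robots.txt is also purely advisory: a scraper that ignores it is not stopped, which is why it is usually paired with rate limits.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (an assumption, not any real site's rules).
# The crawler tokens are examples; check each crawler's own documentation.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url.

    Hypothetical helper: it only reports what a well-behaved crawler
    would conclude from the rules; it does not enforce anything.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/posts/misalignment"
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, "allowed" if is_allowed(agent, page) else "blocked")
```

Checking the rules with a parser before deploying them is a cheap way to catch a typo that would otherwise block every visitor, or no one.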