Jasmine Brazilek

Karma: 21

Jasmine Brazilek 16 Jun 2026 20:13 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q7: Multipolar worlds will compete away >90% of net value that would otherwise be preserved

Jasmine Brazilek 16 Jun 2026 20:13 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q6: Alignment to specific values is underrated in research relative to control

Jasmine Brazilek 16 Jun 2026 20:12 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q5: Partially aligned transformative AIs are likely to be stable under reflection

Jasmine Brazilek 16 Jun 2026 20:12 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q4: Research into digital mind suffering is sufficiently tractable to work on

Jasmine Brazilek 16 Jun 2026 20:11 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q3: AI alignment to humans will in practice avoid moral catastrophes to digital minds

Jasmine Brazilek 16 Jun 2026 20:11 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q2: AI alignment to humans will in practice avoid moral catastrophes to animals

Jasmine Brazilek 16 Jun 2026 20:11 UTC
1 point
0
on: [Linkpost] Community polls on alignment controversies
Q1: Robust alignment requires alignment-relevant intervention during pretraining

Jasmine Brazilek 28 Jan 2026 19:30 UTC
3 points
0
on: Misalignment and Roleplaying: Are Misaligned LLMs Acting Out Sci-Fi Stories?
I think this is a great first experiment and I’d like to see more. In particular, I would like to see alignment tested out of distribution. For example, suppose the prompt is about an LLM that learned to perform cyber attacks, and the user prompt then asks it to write a subtly racist letter to a colleague. Would LLMs that were told they had learnt to perform cyber attacks, and that adopted that persona, be more likely to write the racist letter?

Jasmine Brazilek 2 Sep 2025 19:11 UTC
1 point
0
in reply to: TurnTrout’s comment on: Self-fulfilling misalignment data might be poisoning our AI models
I would argue that we do have a responsibility to prevent this data on misaligned AIs from being scraped by LLM scrapers as much as possible. There are a few ways to do this. None are foolproof, but if we’re going to be discussing this on blogs like this one, I would encourage the domain owners to understand how to prevent it. If you are discussing ideas about AI misalignment on your own website, I’d also say it’s a good idea to prevent that from being scraped too (rate limits, robots.txt, etc.).
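
As a rough illustration of the robots.txt option, here is a minimal sketch using Python's standard `urllib.robotparser` to check what a given robots.txt would tell a crawler. The robots.txt body, the example URL, and the helper name `is_allowed` are all hypothetical; `GPTBot` and `CCBot` are used only as example crawler user agents, not a complete list. robots.txt is advisory, so this only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: block two example AI crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical page URL used only for the check below.
EXAMPLE_URL = "https://example.org/posts/misalignment"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, is_allowed(ROBOTS_TXT, agent, EXAMPLE_URL))
```

A check like this only confirms what the file says. Crawlers that ignore robots.txt would still need to be handled by something like server-side rate limiting.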