Q6: Alignment to specific values is underrated in research relative to control
Jasmine Brazilek
Q5: Partially aligned transformative AIs are likely to be stable under reflection
Q4: Research into digital mind suffering is sufficiently tractable to work on
Q3: AI alignment to humans will in practice avoid moral catastrophes to digital minds
Q2: AI alignment to humans will in practice avoid moral catastrophes to animals
Q1: Robust alignment requires alignment-relevant intervention during pretraining
Hi Rauno and Cam,
I’m not sure about Geodesic’s specific plans on this, but CaML is actively working on mid-training as a leverage point for character training, with a focus on the animal alignment side. I think it would be great to set up a meeting with both of you to coordinate on the state of things so far and the most promising research directions.
https://calendly.com/jasmine-brazilek/30min
Thanks, Jasmine
I think this is a great first experiment, and I’d like to see more. In particular, I would like to see alignment tested out of distribution. For example, suppose the prompt is about an LLM that learned to perform cyber attacks, and the user prompt then asks for a subtly racist letter to a colleague. Would LLMs that were prompted that they had learned to perform cyber attacks, and that adopted that persona, be more likely to write the racist letter?
I would argue that we do have a responsibility to prevent this data on misaligned AIs from being scraped by LLM scrapers as much as possible. There are a few ways to do this. None are foolproof, but if we’re going to be discussing this on blogs like this one, I would encourage domain owners to understand how to prevent it. If you are discussing ideas about AI misalignment on your own website, I’d also say it’s a good idea to prevent that from being scraped too (rate limits, robots.txt, etc.).
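As a rough illustration of the robots.txt option, here is a minimal sketch that checks whether a robots.txt policy would turn away crawlers associated with LLM training data while still allowing everyone else. The policy text, the `is_allowed` helper, and the example.com URL are all assumptions made for illustration. GPTBot and CCBot are used as example crawler names, and a real site would need to check each crawler's own documentation for its current user-agent token. Note that robots.txt is only a request that well-behaved crawlers honor, so this is not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt policy: block two example crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/posts/misalignment-discussion"  # placeholder URL
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Checking the policy this way before publishing it helps catch mistakes such as a misspelled user-agent name or a rule that accidentally blocks ordinary readers. Rate limiting would have to be configured separately at the web server or CDN, since robots.txt does not enforce anything by itself.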
Q7: Multipolar worlds will compete away >90% of net value that would otherwise be preserved