the QACI alignment plan: table of contents
this post keeps track of posts relating to the question-answer counterfactual interval proposal for AI alignment, abbreviated “QACI” and pronounced “quashy”. i’ll keep it updated to reflect the state of the research.
this research is primarily published on the Orthogonal website and discussed on the Orthogonal discord.
as a top-level view of QACI, you might want to start with:

- an Evangelion dialogue explaining QACI
- a narrative explanation of QACI
- Orthogonal’s Formal-Goal Alignment theory of change
- formalizing the QACI formal-goal
the set of all posts relevant to QACI includes:

- as overviews of QACI and how it’s going:
  - state of my research agenda
  - problems for formal alignment
  - the Formal-Goal Alignment theory of change
  - the original post introducing QACI
- on the formal alignment perspective within which it fits:
  - formal alignment: what it is, and some proposals
  - clarifying formal alignment implementation
  - on being only polynomial capabilities away from alignment
- on the blob location problem:
  - QACI blobs and interval illustrated
  - counterfactual computations in world models
  - QACI: the problem of blob location, causality, and counterfactuals
  - QACI blob location: no causality & answer signature
  - QACI blob location: an issue with firstness
- on QACI as an implementation of long reflection / CEV:
  - CEV can be coherent enough
  - some thoughts about terminal alignment
- on formalizing the QACI formal goal:
  - a rough sketch of formal aligned AI using QACI with some actual math
  - one-shot AI, delegating embedded agency and decision theory, and one-shot QACI
- on how a formally aligned AI would actually run over time:
  - AI alignment curves
  - before the sharp left turn: what wins first?
- on the metaethics grounding QACI:
  - surprise! you want what you want
  - outer alignment: two failure modes and past-user satisfaction
  - your terminal values are complex and not objective
- on my view of the AI alignment research field within which i’m doing formal alignment:
  - my current outlook on AI risk mitigation
  - a casual intro to AI doom and alignment