I like this post because it pushes us to be more precise about what we mean by corrigibility. Nice example.
Nice post! Do you have a link to an explanation of what counterfactual mugging is and why it’s a good thing?
For subagent alignment problems, is there an interesting distinction to be drawn between the limited agent being able to understand the process by which the more powerful agent becomes powerful, versus not even understanding that? (What would it mean to “understand the process”? I suppose it means being able to validate certain relevant facts about the process though not enough to know exactly what results from it.)
More specifically, it seems that your c must include information about how to interpret the X bits. Right? So it seems slightly wrong to say “R is the largest number that can be specified in X bits of information”: that only holds as long as c stays fixed, and c might grow as the specification scheme changes.
Alternatively, you might just be wrong in thinking that 30 bits are enough to specify 3^^^^3. If c indicates that the number of additional universes is specified by a standard binary-encoded number, 30 bits only get you to about a billion (2^30 ≈ 1.07 × 10^9).
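To make the arithmetic concrete, here's a quick sketch, assuming c is the plain "unsigned binary integer" scheme (the `decode_binary` helper is hypothetical, just standing in for that choice of c). It also implements Knuth's up-arrow notation, which is where 3^^^^3 comes from, but only for tiny inputs. Even 3^^3 = 3^27 already needs 43 bits in plain binary, so it can't fit in 30. A scheme that reads the bits as an up-arrow expression could name 3^^^^3 in very few bits, but then the up-arrow decoder has to live inside c, which is the point above about c growing.

```python
# Sketch: how far 30 bits get you under plain binary encoding, versus
# how quickly Knuth's up-arrow notation outgrows that.

def up_arrow(a: int, n: int, b: int) -> int:
    """Return a ↑^n b (Knuth's up-arrow notation) for n >= 1.

    Only feasible for tiny inputs; 3 ↑↑↑↑ 3 could never be computed this way.
    """
    if n == 1:
        return a ** b          # one arrow is ordinary exponentiation
    if b == 0:
        return 1               # base case: a ↑^n 0 = 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


def decode_binary(bits: str) -> int:
    """Hypothetical scheme c: read the bit string as an unsigned binary integer."""
    return int(bits, 2)


if __name__ == "__main__":
    largest_30_bit = decode_binary("1" * 30)
    print(largest_30_bit)                 # 1073741823, i.e. 2**30 - 1
    tower = up_arrow(3, 2, 3)             # 3 ↑↑ 3 = 3**27
    print(tower)                          # 7625597484987
    print(tower.bit_length())             # 43, already more than 30 bits
```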
They’re not yet close to being taken over by AI, but there has been research on automating all of the above. Some possibly relevant keywords: automated theorem proving and program synthesis.
I’d go, presuming no other important commitments on the dates.
I know someone who might be interested in doing some translations into Japanese. If you add that language we might have a go.
Is the site down now? Is there a separate site to monitor its uptime status?
names don’t have to be words already
(Centre for) Art of Reason
Art of Winning
Right on Target (or Target on Right)
Wow, I didn’t realize this was happening! I’m super busy tomorrow but will try to come for at least some of it. When was the last meetup?
can you bring the paperclip?
I’ll be there.
What about Friday 11th?