I’m so tired of people needing to explain this. An important question for me: “Why didn’t people just read Yudkowsky and Bostrom and understand the threat model?” It seems like many people did, yet many still don’t seem to get it.
I like the “aligning product vs. aligning superintelligence” phrase.
a model builds the next rung on the capability ladder
I wouldn’t expect “a model” to be the right object to track as generalized capabilities compound towards superintelligence. The generalized objects I think it is correct to track are “outcome-influencing systems” (OISs), most probably OISs hosted on the sociotechnical substrate: something like AI companies and/or coordinated clusters of personality self-replicators (PSRs), along with whatever kind of OISs those develop into, which I expect will no longer feel right to call PSRs.
But otherwise I agree. There are many OISs in the environment with compounding capabilities, and we basically don’t understand their preferences or their development paths.
irreversible guardrail decay
This is a nice phrase. I would like more focus on what the guardrails even are, how to build sensible ones, and how to reverse the decay of guardrails that have decayed reversibly. It would probably be useful to have a map of which kinds of guardrail decay are truly irreversible, and under which scenarios.
If we got a global plague that crippled global trade badly enough that we couldn’t maintain data centers anymore, that would probably rebuild many guardrails we thought were lost forever. Not that I want that. I want us to avoid dystopia, and avoiding dystopia by way of a lesser dystopia isn’t really what I’m hoping for.