Christopher King
ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so
Formalizing the “AI x-risk is unlikely because it is ridiculous” argument
AI community building: EliezerKart
The way AGI wins could look very stupid
Anthropically Blind: the anthropic shadow is reflectively inconsistent
Solomonoff induction still works if the universe is uncomputable, and its usefulness doesn’t require knowing Occam’s razor
“Corrigibility at some small length” by dath ilan
[Question] Accuracy of arguments that are seen as ridiculous and intuitively false but don’t have good counter-arguments
Current AI harms are also sci-fi
Optimality is the tiger, and annoying the user is its teeth
Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk*
GPT-4 is bad at strategic thinking
A better analogy and example for teaching AI takeover: the ML Inferno
How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond.
Bing finding ways to bypass Microsoft’s filters without being asked. Is it reproducible?
Does GPT-4 exhibit agency when summarizing articles?
Like, as a crappy toy model, if every alignment-visionary’s vision would ultimately succeed, but only after 30 years of study along their particular path, then no amount of new visionaries added will decrease the amount of time required from “30y since the first visionary started out”.
A deterministic model seems a bit weird 🤔. I’m imagining something like an exponential distribution. In that case, if every visionary’s project has an expected value of 30 years, and there are n visionaries, then the expected value for when the first one finishes is 30/n years. This is exactly the same as if they were working together on one project.
You might be able to get a more precise answer by trying to statistically model the research process (something something complex systems theory). But unfortunately, it seems doubtful that we can determine how much research solving alignment actually requires, which limits the usefulness of such a model. :P
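As a quick sanity check on that 30/n claim, here is a minimal simulation sketch (Python and the specific numbers are my own choices for illustration; the 30-year mean comes from the toy model above):

```python
import random

MEAN_YEARS = 30      # expected duration of each visionary's project
TRIALS = 100_000     # Monte Carlo samples

def first_completion(n: int) -> float:
    """Time until the first of n independent exponential projects finishes."""
    # random.expovariate takes a *rate*, so rate = 1 / mean.
    return min(random.expovariate(1 / MEAN_YEARS) for _ in range(n))

for n in (1, 2, 5, 10):
    avg = sum(first_completion(n) for _ in range(TRIALS)) / TRIALS
    print(f"n={n:2d}: simulated ≈ {avg:5.2f} years, predicted {MEAN_YEARS / n:5.2f}")
```

The minimum of n independent exponentials with rate λ is itself exponential with rate nλ, which is why the "n parallel visionaries" case and the "one pooled project" case come out the same.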
The Wikipedia article has an example that is easier to understand:
Anthropology: in a community where all behavior is well known, and where members of the community know that they will continue to have to deal with each other, then any pattern of behavior (traditions, taboos, etc.) may be sustained by social norms so long as the individuals of the community are better off remaining in the community than they would be leaving the community (the minimax condition).
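As a toy numerical version of that minimax condition (the payoff numbers and the restriction to pure strategies are entirely made up for illustration):

```python
# A member chooses an action; the community responds. Payoffs are the
# member's, and are invented purely for illustration.
member_payoff = {
    "conform": {"accept": 3, "punish": 1},
    "deviate": {"accept": 4, "punish": 0},
}
community_actions = member_payoff["conform"].keys()

# Minimax value: the community picks the response that minimizes the
# member's best achievable payoff.
minimax = min(
    max(member_payoff[a][c] for a in member_payoff)
    for c in community_actions
)

staying = member_payoff["conform"]["accept"]  # payoff from following the norm
print(f"minimax value: {minimax}")            # 1
print(f"norm sustainable: {staying > minimax}")  # True, since 3 > 1
```

The point of the quoted condition is just that so long as the member's in-community payoff (3 here) exceeds the worst the community can hold them to (their minimax value, 1 here), the behavior pattern can be sustained by the threat of punishment.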
I like this post, but I have some questions/critiques:
In my mind, one of the main requirements for Aligned AGI is the ability to defeat evil AGIs if they arise (hopefully without needing to interfere with the human activity leading up to them). The open agency's decision-making seems a bit slow to meet this requirement. It's also not clear how it scales over time, so could it even beat an evil open agency, assuming the aligned open agency gets a head start? 🤔
Open agencies might not even be fast or cheap enough to fill the economic niches we want an AGI to fill. What economic niche are they meant to occupy?
The way you are combining the agents doesn't seem to preserve alignment properly. Even if the individual agents are mostly aligned, there is still optimization pressure against alignment. For example, there is immense optimization pressure toward getting a larger budget. In general, I'd like to see how the mesaoptimizer problem manifests (or is solved!) in open agencies. Compare with imitative amplification or debate, where the optimization pressure is much weaker and gets scrutinized by agents that are much smarter than humans.
Modelling in general seems difficult because you need to deal with the complexity of human social dynamics and psychology. We don’t even have a model for how humans act “in distribution”, let alone out of distribution.
The details don’t seem to add much value vs. the simpler idea of “give an organization access to subhuman AI tools”. Organizations adopting new tools is a well-established pattern. For example, programmers in organizations already use Codex to help them code, and I’m sure business people are using ChatGPT for brainstorming. It would strengthen the post if you listed what value is added vs. the traditional approach that is already happening organically.
I feel like (2) is the natural starting point, since that will influence the answers to the other four questions.
Now all you need is a token so anomalous, it works on humans!