I stopped reading about 1⁄3 into it, because the prose was driving me mad, and went to the spoiler. Anyone who has ever had to read an academic article that attempts to sound more intelligent than it actually is understands my frustration. I was suspicious, since I had read some of your other work and this clearly didn’t match it, but was still relieved to know your brain hasn’t yet completely melted.
Prometheus
Widening Overton Window—Open Thread
What if you kept building more and more advanced adversarial networks designed to fool the AI about reality? Or what if you implemented patterns in deployment to make it appear as though it’s still a simulation?
Perhaps it’s better to avoid the word intelligence, then. Semantics isn’t really important. What is important is that I can imagine a non-agentic simulator or some other entity causing severely transformative changes, some of which could be catastrophic.
4 Key Assumptions in AI Safety
I vote for the rats on heroin too. Just because it would be hilarious to think of some budding civilization in a distant galaxy peering into the heavens, and seeing the Universe being slowly consumed by a horde of drug-addicted rodents.
I guess this raises the question: do we actually want to pursue a certain moral system as a collective goal?
I meant the focus on biotech in terms of the prevention/mitigation of bioweapons, rather than the positive side of biotech. I’ll change the wording to avoid confusion.
Yeah, I was excited when I heard Game B was being created. Will have to wait and see if it yields any fruit. Improving institutional decision-making addresses the symptom more than the cause, but it might work as a proxy solution, which is probably much easier.
“I think “solving coordination problems” more generally is not that neglected and/or tractable, given that there are strong incentives for a lot of people and organisations to do so already, but I may be wrong.”
But this seems to be the core of coordination problems. Everyone has a collective incentive to do it, and yet we see failures in it all around us. I’m too pessimistic to think we can get to something like “dath ilan”, but it seems like we can surely do better than our current SNAFU. I agree that it might not be tractable. I imagine it might depend more on a few key breakthroughs that are able to outcompete less-than-optimal methods.
Five Areas I Wish EAs Gave More Focus
The biggest problem I have with a lot of these is they require human feedback. Imagine a chess AI that receives human feedback on each move having to compete with AlphaZero’s self-play RL system, which beat every other chess AI and human after just 72 hours of training. I just don’t see how human-feedback systems can possibly compete.
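A toy sketch of the bottleneck I have in mind (purely illustrative Python, not AlphaZero or any actual human-feedback pipeline; the update rules and latency numbers are made up):

```python
import random
import time

def self_play_update(policy):
    """Self-play: the agent generates its own training signal, so
    throughput is limited only by available compute."""
    outcome = random.choice([1, -1])   # simulated game result
    return policy + 0.01 * outcome     # stand-in for a gradient step

def human_feedback_update(policy, human_latency_s=5.0):
    """Human-in-the-loop: every update waits on a person's judgement, so
    throughput is capped by human response time, not by compute."""
    time.sleep(human_latency_s)        # the human bottleneck
    rating = random.choice([1, -1])    # simulated human rating of the move
    return policy + 0.01 * rating

# Over the same wall-clock hour, the self-play loop can run orders of
# magnitude more updates than the human-feedback loop, no matter how much
# compute you throw at the latter.
```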
I have an issue with it for a different reason. Not because I don’t think it’s possible, but because even just by stating it, it might cause some entities to pay attention to things they wouldn’t have otherwise.
I’ve chosen to opt out of Petrov Day, not because I don’t want to participate, but because I think it’s the optimal strategy. The more people who opt out, the less likely the button will be pushed.
Great analysis! I’m curious about the disagreement with needing a pivotal act. Is this disagreement more epistemic or normative? That is to say, do you think they assign a very low probability to needing a pivotal act to prevent misaligned AGI? Or do they have concerns about the potential consequences of this mentality? (people competing with each other to create powerful AGI, accidentally creating a misaligned AGI as a result, public opinion, etc.)
Yes, this surprised me too. Perhaps it was the phrasing that they disagreed with? If you asked them about all possible intelligences in mindspace, and asked whether they thought AGI would fall very close to most human minds, maybe their answer would be different.
There’s a part of your argument I am confused about. The sharp left turn is a sudden change in capabilities. Even if you can see whether things are trending one way or the other, how can you see sharp left turns coming? At the end, you clarify that we can’t predict when a left turn will occur, so how do these findings pertain to them? This seems to be more of an attempt to track trends of alignment/misalignment, but I don’t see what new insights it gives us about sharp left turns specifically.
I agree that we need clear wins, but I also think that most people in the AI Safety community agree that we need clear wins. Would you be interested in taking ownership of this: speaking with various people in the community and writing up a blog post on what you think would characterize a clear action plan, with transparent benchmarks for progress? I think this would be very beneficial, both on the Alignment side and the Governance side.
This has caused me to reconsider what intelligence is and what an AGI could be. It’s difficult to determine if this makes me more or less optimistic about the future. A question: are humans essentially like GPT? We seem to be running simulations with the attempt to reduce predictive loss. Yes, we have agency; but is that human “agent” actually the intelligence, or just generated by it?
Could you explain the rationale behind the “Open” in OpenAI? I can understand the rationale of trying to beat more reckless companies to achieving AGI first (albeit, this mentality is potentially extremely dangerous too), but what is the rationale behind releasing your research? This will enable companies that do not prioritize safety to speed ahead with you, perhaps just a few years behind. And, if OpenAI hesitates to progress, due to concerns over safety, the more risk-taking orgs will likely speed ahead of OpenAI in capabilities. The bottom line is I’m concerned your efforts to achieve AGI might not do much to ensure an aligned AGI is actually created, but instead only speed up the timeline toward achieving AGI by years or even decades.
Unfortunately, he could probably get this published in various journals, with only minor edits being made.