aisafety.info, the Table of Contents
Here is a list of Q&A from https://aisafety.info/. When I discovered the site, I was impressed by the volume of material produced. However, the interface is optimized for beginners, so the following table of contents is for readers who want to navigate the various sections more freely. It was constructed by clustering the Q&A into subtopics. I’m not involved with aisafety.info; I just want to increase the visibility of the content they produced by presenting it in a different way. (They are also working on a new interface.) This table can also be found at https://aisafety.info/toc/.
🆕 New to AI safety? Start here.
📘 Introduction to AI Safety
What is AI safety?
What is AI alignment?
What is AI governance?
What is the general nature of the concern about AI alignment?
What is the “control problem”?
What is the difference between AI safety, AI alignment, AI control, friendly AI, AI ethics, AI existential safety, and AGI safety?
Why would an AI do bad things?
How powerful will a mature superintelligence be?
Why is safety important for smarter-than-human AI?
How likely is extinction from superintelligent AI?
What are some introductions to AI safety?
🧠 Introduction to ML
What are large language models?
What is GPT-3?
What are OpenAI Codex and GitHub Copilot?
How does “chain-of-thought” prompting work?
How can progress in GPT-style non-agentic AI lead to capable AI agents?
What is compute?
What are scaling laws?
What are the “no free lunch” theorems?
What is the “Bitter Lesson”?
What is reinforcement learning (RL)?
🤖 Types of AI
What is artificial intelligence (AI)?
What is “narrow AI”? & What is artificial general intelligence (AGI)?
What is tool AI? & What is an agent?
What is “transformative AI”?
What are the differences between AGI, transformative AI, and superintelligence?
What is “superintelligence”?
What is a shoggoth?
What is “whole brain emulation”?
What are brain-computer interfaces?
🚀 Takeoff & Intelligence explosion
Takeoff
What is “AI takeoff”?
Why does AI takeoff speed matter?
What are the different possible AI takeoff speeds?
What is a singleton?
Intelligence explosion
What is an intelligence explosion?
How likely is an intelligence explosion?
How could an intelligence explosion be useful?
What are the differences between a singularity, an intelligence explosion, and a hard takeoff?
📅 Timelines
Expert surveys
What evidence do experts usually base their timeline predictions on?
When do experts think human-level AI will be created?
How quickly could an AI go from the first indications of problems to an unrecoverable disaster?
Are expert surveys on AI safety available?
Are compute and scaling enough?
Can we get AGI by scaling up architectures similar to current ones, or are we missing key insights?
How many resources did the processes of biological evolution use to evolve intelligent creatures?
From AGI to ASI
How might we get from artificial general intelligence to a superintelligent system?
Will we ever build a superintelligence?
How long will it take to go from human-level AI to superintelligence?
Is expecting large returns from AI self-improvement just following an exponential trend line off a cliff?
❗ Types of Risks
What are accident and misuse risks?
What are existential risks (x-risks)?
What are the main sources of AI existential risk?
What are astronomical suffering risks (s-risks)?
What about other risks from AI?
How might things go wrong with AI even without an agentic superintelligence?
How might an “intelligence explosion” be dangerous?
Is large-scale automated AI persuasion and propaganda a serious concern?
What is a “treacherous turn”?
What is mindcrime?
🔍 What would an AGI be able to do?
What is intelligence?
Why would intelligence lead to power?
Basic capabilities
How might AI socially manipulate humans?
Is it possible to block an AI from doing certain things on the Internet?
How likely is it that an AI would pretend to be a human to further its goals?
Advanced capabilities
How might AGI kill people?
Can you stop an advanced AI from upgrading itself?
How could a superintelligent AI use the internet to take over the physical world?
What could a superintelligent AI do, and what would be physically impossible even for it?
What is a “value handshake”?
Strategic implications
Can we test an AI to make sure that it’s not going to take over and do harmful things after it achieves superintelligence?
Why would we only get one chance to align a superintelligence?
Could we program an AI to automatically shut down if it starts doing things we don’t want it to?
🌋 Technical sources of misalignment
Orthogonality thesis
What is the orthogonality thesis?
What are “human values”?
Why might we expect a superintelligence to be hostile by default?
What can we expect the motivations of a superintelligent machine to be?
Why would a misaligned superintelligence kill everyone in the world?
Specification Gaming
Why might a maximizing AI cause bad outcomes?
What is instrumental convergence?
What is corrigibility?
What is perverse instantiation?
Is it possible to code into an AI to avoid all the ways a given task could go wrong, and would it be dangerous to try that?
Can we constrain a goal-directed AI using specified rules?
Goal Misgeneralization
What is deceptive alignment?
What does Evan Hubinger think of Deception + Inner Alignment?
Outer and Inner alignment
What is outer alignment?
What are “mesa-optimizers”? & What is inner alignment?
What is the difference between inner and outer alignment?
What are the differences between subagents and mesa-optimizers?
🎉 Current prosaic solutions
What is imitation learning? & What is behavioral cloning?
What is reinforcement learning from human feedback (RLHF) & “Constitutional AI”?
How might interpretability be helpful?
How is red teaming used in AI alignment?
🗺️ Strategy
How likely is it that governments will play a significant role? What role would be desirable, if any?
What would a “warning shot” look like?
What is an alignment tax?
Might an aligned superintelligence force people to have better lives and change more quickly than they want?
Win conditions
What are the “win conditions” for AI alignment?
If we solve alignment, are we sure of a good future?
What are “pivotal acts”?
What is the “long reflection”?
What would a good future with AGI look like?
What would a good solution to AI alignment look like?
At a high level, what is the challenge of alignment that we must meet to secure a good future?
Race dynamics
Why might people try to build AGI rather than stronger and stronger narrow AIs?
What are some of the leading AI capabilities organizations?
Are Google, OpenAI, etc. aware of the risk?
What is the “windfall clause”?
All things considered
How doomed is humanity?
What are some arguments why AI safety might be less important?
Impact of AI Safety
Could AI alignment research be bad? How?
What are the potential benefits of AI as it grows increasingly sophisticated?
What are some objections to the importance of AI alignment?
💭 Consciousness
Could AI have emotions?
Are AIs conscious?
Do AIs suffer?
Could we tell the AI to do what’s morally right?
Is there a danger in anthropomorphizing AIs and trying to understand them in human terms?
❓ Not convinced? Explore the arguments.
🤨 Superintelligence is unlikely?
Why should we prepare for human-level AI technology now rather than decades down the line when it’s closer?
Might an “intelligence explosion” never occur?
Wouldn’t a superintelligence be slowed down by the need to do experiments in the physical world?
Can an AI really be smarter than humans?
Will AI be able to think faster than humans?
How can an AGI be smarter than all of humanity?
😌 Superintelligence won’t be a big change?
Won’t AI be just like us?
Isn’t AI just a tool like any other? Won’t it just do what we tell it to?
Do people seriously worry about existential risk from AI?
Are corporations superintelligent?
Isn’t capitalism the real unaligned superintelligence?
⚠️ Superintelligence won’t be risky?
Are there any detailed example stories of what unaligned AGI would look like?
Any AI will be a computer program. Why wouldn’t it just do what it’s programmed to do?
Aren’t robots the real problem? How can AI cause harm if it has no ability to directly manipulate the physical world?
Wouldn’t AIs need to have a power-seeking drive to pose a serious risk?
Won’t humans be able to beat an unaligned AI since we have a huge advantage in numbers?
Wouldn’t a superintelligence be wise?
Wouldn’t a superintelligence be smart enough not to make silly mistakes in its comprehension of our instructions?
Wouldn’t a superintelligence be smart enough to know right from wrong?
🤔 Why not just?
Why can’t we just turn the AI off if it starts to misbehave?
Once we notice that a superintelligence is trying to take over the world, can’t we turn it off, or reprogram it?
Why don’t we just not build AGI if it’s so dangerous?
Why can’t we just make a “child AI” and raise it?
Why can’t we just use Asimov’s Three Laws of Robotics?
Why can’t we just “put the AI in a box” so that it can’t influence the outside world?
Can’t we limit damage from AI systems in the same ways we limit damage from companies?
Why is AI alignment a hard problem?
🧐 Isn’t the real concern…
Isn’t the real concern misuse?
Isn’t the real concern technological unemployment?
Isn’t the real concern bias?
Isn’t the real concern autonomous weapons?
📜 I have certain philosophical beliefs, so this is not an issue
If I only care about helping people alive today, does AI safety still matter?
Why should someone who is religious worry about AI existential risk?
Does the importance of AI risk depend on caring about transhumanist utopias?
Wouldn’t it be a good thing for humanity to die out?
Is AI safety about systems becoming malevolent or conscious and turning on us?
Isn’t it immoral to control and impose our values on AI?
We’re going to merge with the machines so this will never be a problem, right?
Aren’t AI existential risk concerns just an example of Pascal’s mugging?
🔍 Want to understand the research? Dive deeper.
💻 Prosaic alignment
What is prosaic alignment?
Would AI alignment be hard with deep learning?
Scalable oversight
What is AI Safety via Debate?
What is adversarial training?
How is the Alignment Research Center (ARC) trying to solve Eliciting Latent Knowledge (ELK)?
What is “HCH”?
What is Iterated Distillation and Amplification (IDA)?
What is Eliciting Latent Knowledge (ELK)?
What does the scheme Externalized Reasoning Oversight involve?
Interpretability
What is interpretability and what approaches are there?
What is the difference between verifiability, interpretability, transparency, and explainability?
What are polysemantic neurons?
What is a “polytope” in a neural network?
What is feature visualization?
What is neural network modularity?
What does generative visualization look like in reinforcement learning?
Where can I learn about interpretability?
Conceptual advances
What is shard theory?
How can LLMs be understood as “simulators”?
Brain-like AGI
What safety problems are associated with whole brain emulation?
How would we align an AGI whose learning algorithms / cognition look like human brains?
What is “biological cognitive enhancement”?
What are the ethical challenges related to whole brain emulation?
📝 Agent foundations
What is “agent foundations”?
Important concepts
Why do we expect that a superintelligence would closely approximate a utility maximizer?
What is a subagent?
What are “type signatures”?
What are “true names” in the context of AI alignment?
What is mutual information?
Decision theory
What are the different versions of decision theory?
What is “functional decision theory”?
What is “causal decision theory (CDT)”?
What is “evidential decision theory”?
What should I read to learn about decision theory?
Research directions
What is “Do what I mean”?
What are the power-seeking theorems?
Can you give an AI a goal which involves “minimally impacting the world”?
What is a “quantilizer”?
Would it improve the safety of quantilizers to cut off the top few percent of the distribution?
What is Infra-Bayesianism?
What is “coherent extrapolated volition (CEV)”?
What are the leading theories in moral philosophy and which of them might be technically the easiest to encode into an AI?
🏛️ Governance
Would a slowdown in AI capabilities development decrease existential risk?
Are there any AI alignment projects which governments could usefully put a very large amount of resources into?
What is everyone working on in AI governance?
What might an international treaty on the development of AGI look like?
Is the UN concerned about existential risk from AI?
🔬 Research Organizations
Overviews
What approaches are AI alignment organizations working on?
What is everyone working on in AI alignment?
What are the main categories of technical alignment research?
What are some AI alignment research agendas currently being pursued?
What are the different AI Alignment / Safety organizations and academics researching?
Briefly, what are the major AI safety organizations and academics working on?
Prosaic
Big labs
What is OpenAI’s alignment research agenda?
What is DeepMind’s safety team working on?
How does DeepMind do adversarial training?
Academic labs
What is Sam Bowman researching?
What projects are CAIS working on?
What is David Krueger working on?
Other Orgs
What is the Alignment Research Center (ARC)’s research agenda?
What is Ought’s research agenda?
What is Redwood Research’s agenda?
What is Aligned AI / Stuart Armstrong working on?
How does Redwood Research do adversarial training?
Agent Foundations
What is the Center for Human Compatible AI (CHAI)?
What are Scott Garrabrant and Abram Demski working on?
What technical problems is MIRI working on?
What is John Wentworth’s research agenda?
What does MIRI think about technical alignment?
What was Refine?
What is Dylan Hadfield-Menell’s thesis on?
Other
What is Obelisk’s research agenda?
What is the Center on Long-Term Risk (CLR)’s research agenda?
What is Encultured working on?
🤝 Want to help with AI safety? Get involved!
📌 General
What actions can I take in under five minutes to contribute to the cause of AI safety?
How and why should I form my own views about AI safety?
📢 Outreach
How can I work on public AI safety outreach?
What links are especially valuable to share on social media or other contexts?
How can I work on AGI safety outreach in academia and among experts?
How can I convince others and present the arguments well?
🧪 Research
I want to work on AI alignment.
📚 Education and Career Path
What master’s thesis could I write about AI safety?
What subjects should I study at university to prepare myself for alignment research?
I want to take big steps to contribute to AI alignment (e.g. making it my career). What should I do?
I would like to focus on AI alignment, but it might be best to prioritize improving my life situation first. What should I do?
How can I work toward AI alignment as a software engineer?
📋 Guidance and Mentorship
Where can I find mentorship and advice for becoming a researcher?
Who should I talk to about my non-research AI alignment coding project idea?
How can I get funding?
🧪 Projects and Involvement
I’d like to do experimental work (i.e. ML, coding) for AI alignment. What should I do?
I want to help out AI alignment without necessarily making major life changes. What are some simple things I can do to contribute?
How can I do conceptual, mathematical, or philosophical work on AI alignment?
What are some exercises and projects I can try?
How can I use a background in the social sciences to help with AI alignment?
How can I do machine learning programming work to help with AI alignment?
What should I do with my machine learning research idea for AI alignment?
What should I do with my idea for helping with AI alignment?
🏛️ Governance
What are some AI governance exercises and projects I can try?
What are some helpful AI policy resources?
How can I work on AI policy?
🛠️ Ops & Meta
Where can I find people to talk to about AI alignment?
How can I work on helping AI alignment researchers be more effective, e.g. as a coach?
How can I work on assessing AI alignment projects and distributing grants?
How can I do organizational or operations work around AI alignment?
💵 Help financially
Would donating small amounts to AI safety organizations make any significant difference?
I’m interested in providing significant financial support to AI alignment. How should I go about this?
📚 Other resources
Where can I find videos about AI Safety?
What training programs and courses are available for AGI safety?
Where can I learn more about AI alignment?
AI Safety Memes Wiki
What are some good resources on AI alignment?
What are some good podcasts about AI alignment?
What are some good books about AGI safety?
I’d like to get deeper into the AI alignment literature. Where should I look?
How can I update my emotional state regarding the urgency of AI safety?