The Next ChatGPT Moment: AI Avatars

Epistemic Status: Speculative. Dependent on intuitions about near-term AI tech and human psychology.

Claim: Within the next 1-3 years, many people will have an interaction with an AI avatar that feels authentically human. This will significantly amplify the public perception of current AI capabilities and risks.

An AI avatar is a realistic AI-generated render of a human (speech and video) that can have a real-time conversation with a human, for example over a video call.

The individual components needed to implement AI avatars already exist. AI is capable of holding a conversation over text, transcribing speech to text, and synthesizing natural-sounding speech.[1] Photorealistic video generation of a talking human is still limited, but it is already impressive and improving rapidly.

Taken together, these capabilities mean it will soon be possible to create a realistic AI avatar. The first-generation avatars will be a bit rough, especially the rendered video, but overall there don’t seem to be large conceptual hurdles to creating convincing AI avatars.[2]
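As a rough illustration of how these components might chain together for a single conversational turn, here is a minimal Python sketch. Every function in it is a hypothetical placeholder, not the API of Whisper, ElevenLabs, ChatGPT, or any other product; real stages would call actual models. The per-stage timing is included only because end-to-end latency is one of the open questions.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Stage:
    """One step of the avatar pipeline: a name plus a function to run."""
    name: str
    run: Callable[[Any], Any]


def run_turn(stages: List[Stage], user_audio: Any) -> Tuple[Any, List[Tuple[str, float]]]:
    """Pass one conversational turn through every stage in order, timing each."""
    timings = []
    data = user_audio
    for stage in stages:
        start = time.perf_counter()
        data = stage.run(data)
        timings.append((stage.name, time.perf_counter() - start))
    return data, timings


# Hypothetical stand-ins for the real models (assumptions, for illustration only).
def transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")          # pretend the audio is already text


def reply(text: str) -> str:
    return f"You said: {text}"            # pretend language model


def synthesize(text: str) -> bytes:
    return text.encode("utf-8")           # pretend speech synthesis


def render_video(speech: bytes) -> dict:
    # Pretend renderer: one "frame" per byte of speech.
    return {"audio": speech, "frames": len(speech)}


PIPELINE = [
    Stage("speech_to_text", transcribe),
    Stage("language_model", reply),
    Stage("text_to_speech", synthesize),
    Stage("video_render", render_video),
]

if __name__ == "__main__":
    output, timings = run_turn(PIPELINE, b"hello")
    for name, seconds in timings:
        print(f"{name}: {seconds * 1000:.3f} ms")
    print("total:", sum(s for _, s in timings), "seconds")
```

In a strictly sequential design like this one, the turn's latency is the sum of the stage latencies, so a slow video renderer would dominate the user's experience even if the other three stages were fast.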

Personal conversation with a high-quality AI avatar will have a significant emotional and mental impact on most people.[3] The impact will be especially acute for people distant from the world of AI, but will also affect those familiar with AI.

For humans, the communication medium matters just as much as the content. The same words can hit much harder when spoken in an emotive voice by an expressive face than when silently read off a screen. Having a realistic personal conversation with an AI avatar will change people’s gut-level intuitions about AI.

For better or worse, once decent AI avatars become generally accessible, public sentiment around AI will experience another shift comparable to the one spurred by ChatGPT.[4] AI will be perceived as more human-like and capable. It will seem like an independent agent that possesses “true intelligence”.

After talking with a realistic AI avatar, the common refrains of “It’s not actually intelligent, it just predicts the next token” and “Why would it want anything?” won’t resonate with the public. For many people, consciousness is a prerequisite for real AI, and human-like AI avatars will appear to be a direct instantiation of that.

ChatGPT’s release was a cultural moment.[5] It captured the public’s imagination and triggered a reclassification of AI from sci-fi to present reality. AI avatars could bring on another cultural moment that shifts public perception even further.

The upcoming shift is predictable—AI avatars don’t require any fundamental technical breakthroughs. It’s a major evolution that we have the rare opportunity to prepare for in advance.

  1. ^

    Speech-to-text is good enough (OpenAI Whisper), text-to-speech is nearly good enough (ElevenLabs), and conversation/language modeling is good enough (ChatGPT with a Character.ai-style personality). All this currently suffices for realistic audio conversation with an AI. Human video generation isn’t quite good enough yet, but it’s making progress (Audio to Photoreal, HeyGen, Metahuman). Based on the current rate of progress, a functional AI avatar seems attainable within 1-3 years.

  2. ^

    Latency might be a problem in the near-term. In particular, it’s unclear how fast the video generation will be.

  3. ^

    This is already happening to a limited extent. Many people have formed significant emotional attachments through text-only interactions with relatively weak language models (e.g. Character.ai and Replika).

  4. ^

    The shift could be more gradual than ChatGPT’s, though. AI avatar tech is improving incrementally, whereas ChatGPT was dropped on the world all at once.

  5. ^

    See the Google Trends chart for “AI”. ChatGPT came out on November 30, 2022.