OC ACXLW: AI Interpretability Breakthrough from Anthropic, 11/11/23

Hello Folks!

We are excited to announce the 48th Orange County ACX/LW meetup, happening this Saturday and most Saturdays thereafter.

Host: Michael Michalchik

Email: michaelmichalchik@gmail.com (For questions or requests)

Location: 1970 Port Laurent Place

(949) 375-2045

Date: Saturday, Nov 11, 2023

Time: 2 PM

Conversation Starters:

Is this the first concrete step towards AI alignment and safety, and towards making AI highly useful?

Journal club video:

Community Paper Reading: Decomposing Language Models Into Understandable Components

Short paper walkthrough:


Anthropic Solved Interpretability?

The Paper itself: https://transformer-circuits.pub/2023/monosemantic-features/index.html

Zvi Mowshowitz reports on the Paper:


Zvi Mowshowitz reports on the reactions to the Paper:


This is a ChatGPT glossary and brief overview of the ideas:


  • Walk & Talk: We usually have an hour-long walk and talk after the meeting starts. Two mini-malls with hot takeout food are readily accessible nearby. Search for Gelson’s or Pavilions in the zip code 92660.

  • Share a Surprise: Tell the group about something unexpected that changed your perspective on the universe.

  • Future Direction Ideas: Contribute ideas for the group’s future direction, including topics, meeting types, activities, etc.
