Negative Expertise


Marvin Minsky’s theory of negative expertise: knowledge is typically considered in positive terms, but it can also be viewed in negative terms. One negative way to seem competent is to never make mistakes. The idea is that experts can be seen as those who know what not to do.

Most of human knowledge is negative.

We can only be certain about things we create for ourselves, like logic and mathematics. But we can reduce the chances of making mistakes by learning two different types of knowledge: where the “islands of consistency” are, and where their boundaries lie. Rule-based systems can contribute to negative expertise. Minsky argues that avoiding actions that could cause trouble is often more important than taking positive measures to prevent accidents. This is similar to the idea of “info-hazards”.
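One way to picture how a rule-based system could carry negative expertise is a set of "censor" rules that only ever veto candidate actions and never propose them. The sketch below is a minimal illustration under that assumption. The names `make_censor` and `filter_actions`, and the example actions and rules, are hypothetical and do not come from Minsky's paper.

```python
def make_censor(name, forbids):
    """A censor is a named predicate that returns True for actions to avoid."""
    return {"name": name, "forbids": forbids}


def filter_actions(candidates, censors):
    """Split candidate actions into allowed ones and vetoed ones.

    Returns (allowed, vetoed), where vetoed maps an action name to the
    names of the censors that blocked it.
    """
    allowed, vetoed = [], {}
    for action in candidates:
        blocking = [c["name"] for c in censors if c["forbids"](action)]
        if blocking:
            vetoed[action["name"]] = blocking
        else:
            allowed.append(action)
    return allowed, vetoed


# Hypothetical example data, for illustration only.
candidates = [
    {"name": "delete_backups", "irreversible": True, "cost": 1},
    {"name": "run_dry_test", "irreversible": False, "cost": 2},
    {"name": "buy_cluster", "irreversible": False, "cost": 500},
]
censors = [
    make_censor("no_irreversible", lambda a: a["irreversible"]),
    make_censor("over_budget", lambda a: a["cost"] > 100),
]

allowed, vetoed = filter_actions(candidates, censors)
```

Note that the censors say nothing about which allowed action is best. They only mark the boundary of what is safe to consider, which is the sense of "negative" used here.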

RL Agents are by definition negative experts.

RL agents can be considered “negative experts” in some sense: they are trained to avoid actions that lead to negative outcomes or penalties, while also being trained to take actions that lead to positive rewards.
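To make the reward and penalty framing concrete, here is a minimal sketch of an epsilon-greedy agent on a three-armed bandit. It is a toy illustration, not a claim about any particular RL system. The function name `train_bandit`, the reward values, and the hyperparameters are all assumptions chosen for the example.

```python
import random


def train_bandit(rewards, episodes=2000, epsilon=0.1, lr=0.1, seed=0):
    """Learn one value estimate per action with epsilon-greedy exploration.

    rewards[i] is the (deterministic) payoff of action i; a negative
    payoff plays the role of a penalty.
    """
    rng = random.Random(seed)
    q = [0.0] * len(rewards)
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(len(rewards)), key=lambda i: q[i])
        # Move the estimate toward the observed payoff.
        q[action] += lr * (rewards[action] - q[action])
    return q


# Action 0 is penalized, action 1 is neutral, action 2 is rewarded.
rewards = [-1.0, 0.0, 1.0]
q = train_bandit(rewards)
best_action = max(range(len(q)), key=lambda i: q[i])
```

After training, the estimate for the penalized action is negative, so the greedy policy steers away from it. The same update rule also pulls the agent toward the rewarded action, which is why "negative expert" only describes half of what the agent learns.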

Fault-tolerant systems

Very large neural networks could be prone to accumulating too many interconnections and becoming paralyzed by oscillations or instabilities. One might have to provide a variety of alternative sub-systems. Perhaps we need a call for “Insulationist” researchers. These systems should be able to shut themselves down and reconstitute themselves in new forms when some tripwire is triggered.
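The tripwire idea can be sketched in a few lines. The example below assumes one simple definition of "oscillation" (the output flipping sign too often within a short window) and one simple response (discard the state and start fresh). The class names `Tripwire` and `InsulatedSubsystem`, the thresholds, and the toy `unstable_step` function are all hypothetical.

```python
from collections import deque


class Tripwire:
    """Trips when the monitored signal flips sign too often in a recent window."""

    def __init__(self, window=6, max_flips=4):
        self.history = deque(maxlen=window)
        self.max_flips = max_flips

    def observe(self, value):
        self.history.append(value)
        values = list(self.history)
        # Count adjacent pairs whose signs differ.
        flips = sum(1 for a, b in zip(values, values[1:]) if a * b < 0)
        return flips >= self.max_flips


class InsulatedSubsystem:
    """Wraps a step function and resets its state when the tripwire fires."""

    def __init__(self, make_state, step):
        self.make_state = make_state  # builds a fresh initial state
        self.step = step              # (state, x) -> (new_state, output)
        self.state = make_state()
        self.tripwire = Tripwire()
        self.restarts = 0

    def run(self, x):
        self.state, out = self.step(self.state, x)
        if self.tripwire.observe(out):
            # Shut down: drop the unstable state and start over.
            self.state = self.make_state()
            self.tripwire = Tripwire()
            self.restarts += 1
            return None  # withhold the suspect output from other subsystems
        return out


def unstable_step(state, x):
    """A toy feedback loop whose output grows and alternates in sign."""
    new_state = -1.5 * state + x
    return new_state, new_state


subsystem = InsulatedSubsystem(make_state=lambda: 1.0, step=unstable_step)
outputs = [subsystem.run(0.0) for _ in range(10)]
```

Returning `None` when the tripwire fires is the insulation step in this sketch: the rest of the system never sees the output that triggered the shutdown. A fuller design might switch to one of the alternative sub-systems instead of restarting the same one.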

Insulationists focus on designing AI systems that are insulated from one another, while interpretability researchers focus on understanding how AI systems make decisions and making them more transparent and understandable. It would be interesting to see an appetite for this.

FunnyGPT Models

Minsky suggests that jokes, like negative expertise, serve a cognitive function: they help people think of ideas outside the Overton window and navigate their mental “censors” in a safe way. He believes that jokes can be seen as a form of negative expertise.

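GPT-3, as a language model, can loosely be seen as a “negative expert” in the sense that training on a large dataset of text teaches it to assign low probability to patterns of language that rarely appear there, and so to avoid them. Note that this comes from ordinary next-token prediction; negative sampling is a technique from word-embedding models such as word2vec, not part of GPT-3’s training objective. I am curious: to what extent is this theory studied today?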