[Question] Are language models close to the superhuman level in philosophy?

An alternative way to ask the question in the title of this post: what capacities do philosophers use that language models lack?

Reading OpenAI’s “Self-critiquing models for assisting human evaluators” and Google’s “Language Model Cascades” has led me to the idea that most philosophical methods could be implemented by combining language models in a suitable cascade, fine-tuning them on the existing philosophical literature, and applying the right kind of reinforcement pressures to the system. And I seriously wonder whether using the current SoTA language models (e.g., PaLM) in this way would already produce superhuman-level philosophical writing, i.e., writing on a given topic that human philosophers would find harder to critique than any existing philosophy written by humans.

Good human philosophers are expected to know many of the existing arguments on various questions. Language models obviously have an edge over humans here: they can easily “read” all the philosophical writing in existence.

Philosophers use creative arguments, examples, and thought experiments to test and compare their theories. Perhaps AI philosophers could search for such arguments and examples by randomly sampling candidates, self-critiquing each argument or assessing how well an example serves a particular purpose, and then modifying or refining it. Even with the current SoTA models, this search may be closer to brute force than to the intuitive inquiry of a human philosopher, and it could take a long time, since the models probably don’t yet understand the structure of philosophical theories well (unless I’m underestimating them!).
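To make this concrete, here is a minimal sketch, in Python, of the kind of sample–critique–refine cascade described above. Everything in it is my own assumption rather than anything taken from the cited papers: `lm` stands in for an arbitrary language-model call, and the prompts, the 0–10 scoring scheme, and the fixed number of revision rounds are placeholders.

```python
from typing import Callable

# Stand-in type for a language-model call: prompt text in, completion text out.
# In practice this would wrap an API call or a local model.
LanguageModel = Callable[[str], str]


def refine_argument(lm: LanguageModel, topic: str,
                    n_samples: int = 5, n_rounds: int = 3) -> str:
    """Sample candidate arguments, self-assess them, then critique and refine the best one."""
    # 1. Random sampling: draft several candidate arguments on the topic.
    candidates = [lm(f"Write a philosophical argument about: {topic}")
                  for _ in range(n_samples)]

    # 2. Self-assessment: ask the model to score each candidate and keep the best.
    def score(argument: str) -> float:
        reply = lm(f"Rate the strength of this argument from 0 to 10:\n{argument}")
        try:
            return float(reply.strip().split()[0])
        except (ValueError, IndexError):
            return 0.0

    best = max(candidates, key=score)

    # 3. Critique-and-revise rounds: generate objections, then rewrite to address them.
    for _ in range(n_rounds):
        critique = lm(f"List the strongest objections to this argument:\n{best}")
        best = lm("Revise the argument below to answer the objections.\n"
                  f"Argument:\n{best}\n\nObjections:\n{critique}")
    return best
```

All of the philosophical work, of course, hides inside the `lm` calls: the loop itself is trivial, and whether the model’s critiques track genuine weaknesses in an argument rather than surface features is exactly the open question.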

The only philosophical theories and arguments that still seem firmly out of reach of language models are those based on scientific theories: for example, theories in the philosophy of language or the philosophy of mind and consciousness that rest on particular theories in neuroscience, psychology, or anthropology, or theories of consciousness, probability, and agency that rest on particular theories in physics. However, philosophical writing of this kind should probably be treated as interpretation of the respective scientific theories rather than as philosophical theorising in its own right. In other words, these are the areas where the sciences chip away at philosophy, as has gradually happened with all existing scientific knowledge, which once belonged entirely to the realm of philosophy. Interpretations of scientific theories are neither purely scientific nor purely philosophical knowledge, but something in between.

Also, most approaches to ethics rely at least in part on people’s moral intuitions, which are themselves at least partially non-linguistic and cannot be derived from language alone, and are thus inaccessible to language models. This raises many interesting questions: is there such a thing as AI-generated ethics? If so, should it be treated as a branch of philosophy separate from “classical” (human-generated) ethics? Could AIs engage in “classical” ethics even in principle, at least until we have whole-brain simulations (which might still not be enough, because humans have gut feelings about things)? Should we ban AIs from engaging in “their” ethics, because this seems like a sure path to misaligned thoughts? If so, how would we do that? That last question is equivalent to the question of how to align AGI, though.
