This is my first post on LessWrong. I'll merely be linkposting content on epistemics and alignment here while getting more familiar with the culture.
tl;dr:
We attempt to automatically infer a person's beliefs from their writing in three different ways. Initial results on Twitter data suggest that embeddings and language models are particularly promising approaches.
What do you think the results would be like if you tried using a language model to automatically filter for direct-opinion tweets and to do automatic negation?
We tried using (1) subjectivity scoring (based on a simple bag-of-words lexicon) and (2) zero-shot text classification (NLI-based) to help us sift through years of tweets in search of bold claims; rough sketches of both are at the end of this comment. (1) seemed a pretty poor heuristic overall, and (2) was still very noisy (e.g. it would flag "that's awesome" as a bold claim, which isn't particularly useful).

The second problem was that even when tweets were correctly identified as containing bold claims, they were often heavily contextualized within a reply thread, so we decontextualized them manually to increase the signal-to-noise ratio. Also, we were initially quite confident we'd use our automatic negation pipeline (i.e. a few-shot prompt plus DALL-E-like reranking of generations based on detected contradictions and minimal token edit distance; third sketch below), but in practice that would have taken far longer than manual labeling given our non-existent infra.
I agree that all those manual steps are huge sources of experimenter bias, though. Doing it the way you suggested would improve replicability, but also increase noise and compute demands.
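For concreteness, here's a minimal sketch of the kind of bag-of-words subjectivity heuristic meant by (1). TextBlob's lexicon-based subjectivity score is one off-the-shelf stand-in, not necessarily the exact heuristic we used:

```python
# Minimal sketch of a bag-of-words subjectivity heuristic.
# TextBlob's subjectivity score averages per-word lexicon entries,
# which is roughly the class of heuristic described above.
from textblob import TextBlob

def subjectivity(text: str) -> float:
    # 0.0 = fully objective, 1.0 = fully subjective
    return TextBlob(text).sentiment.subjectivity

for tweet in ["The meeting is at 3pm.", "I firmly believe this will fail."]:
    print(f"{subjectivity(tweet):.2f}  {tweet}")
```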
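And a sketch of the zero-shot NLI filter (2), using Hugging Face's zero-shot-classification pipeline. The candidate labels and threshold here are illustrative, not the exact ones we used:

```python
# Sketch of NLI-based zero-shot filtering for bold claims.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

def looks_like_bold_claim(tweet: str, threshold: float = 0.8) -> bool:
    result = classifier(tweet, candidate_labels=["bold claim", "small talk"])
    # result["labels"] / result["scores"] are sorted by descending score
    return result["labels"][0] == "bold claim" and result["scores"][0] >= threshold

print(looks_like_bold_claim("that's awesome"))  # short reactions like this were noisy
print(looks_like_bold_claim("AGI will arrive by 2030."))
```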
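Finally, a sketch of the reranking step in the negation pipeline: each generated candidate (assumed to come from a few-shot prompted LM) is scored by how strongly an NLI model rates it as contradicting the original claim, with a token-edit-distance penalty to prefer minimal rewrites. The model choice and the 0.05 weight are illustrative assumptions:

```python
# Sketch of DALL-E-style reranking for candidate negations:
# favor strong contradictions with minimal token edits.
from transformers import pipeline

# Hypothetical model choice; the exact NLI model isn't specified above.
nli = pipeline("text-classification", model="roberta-large-mnli")

def token_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over whitespace tokens."""
    xs, ys = a.split(), b.split()
    prev = list(range(len(ys) + 1))
    for i, x in enumerate(xs, 1):
        cur = [i]
        for j, y in enumerate(ys, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def rerank_negations(claim: str, candidates: list[str]) -> list[str]:
    """Sort candidates: strongest contradiction first, minimal edits preferred."""
    def score(cand: str) -> float:
        all_scores = nli({"text": claim, "text_pair": cand}, top_k=None)
        contradiction = next(s["score"] for s in all_scores
                             if s["label"] == "CONTRADICTION")
        # The 0.05 edit-distance weight is an arbitrary illustrative choice.
        return contradiction - 0.05 * token_edit_distance(claim, cand)
    return sorted(candidates, key=score, reverse=True)
```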
Cool to hear you tried it!