I’m an artist, writer, and human being.
To be a little more precise: I make video games, edit Wikipedia, and write here on LessWrong!
I’m an artist, writer, and human being.
To be a little more precise: I make video games, edit Wikipedia, and write here on LessWrong!
If that is the case, then I would very much like them to publicize the details for why they think other approaches are doomed. When Yudkowsky has talked about it in the past, it tends to be in the form of single-sentence statements pointing towards past writing on general cognitive fallacies. For him I’m sure that would be enough of a hint to clearly see why strategy x fits that fallacy and will therefore fail, but as a reader, it doesn’t give me much insight as to why such a project is doomed, rather than just potentially flawed. (Sorry if this doesn’t make sense btw, I’m really tired and am not sure I’m thinking straight atm)
Certainly for some people (including you!), yes. For others, I expect this post to be strongly demotivating. That doesn’t mean it shouldn’t have been written (I value honestly conveying personal beliefs and are expressing diversity of opinion enough to outweigh the downsides), but we should realistically expect this post to cause psychological harm for some people, and could also potentially make interaction and PR with those who don’t share Yudkowsky’s views harder. Despite some claims to the contrary, I believe (through personal experience in PR) that expressing radical honesty is not strongly valued outside the rationalist community, and that interaction with non-rationalists can be extremely important, even to potentially world-saving levels. Yudkowsky, for all of his incredible talent, is frankly terrible at PR (at least historically), and may not be giving proper weight to its value as a world-saving tool. I’m still thinking through the details of Yudkowsky’s claims, but expect me to write a post here in the near future giving my perspective in more detail.
that’s probably exactly what’s going on. The usernames were so frequent in the reddit comments dataset that the tokenizer, the part that breaks a paragraph up into word-ish-sized-chunks like ” test” or ” SolidGoldMagikarp” (the space is included in many tokens) so that the neural network doesn’t have to deal with each character, learned they were important words. But in a later stage of learning, comments without complex text were filtered out, resulting in your usernames getting their own words… but the neural network never seeing the words activate. It’s as if you had an extra eye facing the inside of your skull, and you’d never felt it activate, and then one day some researchers trying to understand your brain shined a bright light on your skin and the extra eye started sending you signals. Except, you’re a language model, so it’s more like each word is a separate finger, and you have tens of thousands of fingers, one on each word button. Uh, that got weird,
This is an incredible analogy
Hi, I joined because I was trying to understand Pascal’s Wager, and someone suggested I look up “Pascal’s mugging”… next thing I know I’m a newly minted HPMOR superfan, and halfway through reading every post Yudkowsky has ever written. This place is an incredible wellspring of knowledge, and I look forward to joining in the discussion!