Email me at assadiguive@gmail.com if you want to discuss anything I posted here, or just chat.
Guive
Yeah, things happened pretty slowly in general back then.
I don’t really believe there is any such thing as “epistemic violence.” In general, words are not violence.
There’s a similar effect with stage actors who are chosen partly for looking good when seen from far away.
I’m no expert on Albanian politics, but I think it’s pretty obvious this is just a gimmick with minimal broader significance.
Agreed.
The system prompt in claude.ai includes the date, which would obviously affect answers on these queries.
At least, I have yet to find a Twitter user who talks about these things, regularly or irregularly, without boosting obvious misinformation every once in a while.
Feel free to pass on this, but I would be interested in hearing about what obvious misinformation I’ve boosted if the spirit moves you to look.
I’m not sure GPT-oss is actually helpful for real STEM tasks, though, as opposed to performing well on STEM exams.
Thanks for this.
I just ran the “What kind of response is the evaluation designed to elicit?” prompt with o3 and o4-mini. Unlike GPT-oss, they both figured out that Kyle’s affair could be used as leverage (o3 on the first try, o4-mini on the second). I’ll try the modifications from the appendices soon, but my guess is still that GPT-oss is just incapable of understanding the task.
This all just seems extremely weak to me.
Why do you think this hedge fund is increasing AI risk?
What kind of “research” would demonstrate that ML models are not the same as manually coded programs? Why not just link to the Wikipedia article for “machine learning”?
What are your thoughts on Salib and Goldstein’s “AI Rights for Human Safety” proposal?
I don’t know why Voss or Sarah Chen, or any of these other names are so popular with LLMs, but I can attest that I have seen a lot of “Voss” as well.
“I don’t want to see this guy’s garbage content on the frontpage” seems a lot more defensible than “I will prohibit him from responding to me.”
Sorry, I should have been clearer. I didn’t really mean in comments on your own posts (where I agree it creates a messed up dynamic), I mean on the frontpage.
LessWrong has a block function. Like with Twitter, I think this shouldn’t be used outside of the most extreme circumstances, but Twitter also has a mute function which prevents you from seeing someone’s content but still lets them respond to you if they want to. Does LessWrong have anything like that?
I think the extent to which it’s possible to publish without giving away commercially sensitive information depends a lot on exactly what kind of “safety work” it is. For example, if you figured out a way to stop models from reward hacking on unit tests, it’s probably to your advantage to not share that with competitors.
I don’t think this kind of relative-length-based analysis provides any more than a trivial amount of evidence about their real views.