Wikipedia’s principles require it to rely on external analysts of news.
This seems like a nice example of Wikipedia preventing itself from conquering what it cannot defend.
I tried copy-pasting just your prompt into a fresh Claude Opus 4.6 instance just to see what would happen. Interestingly, it did make at least one mistake in the sense that it filled one blank differently from my high-effort version. But when I asked about it, it agreed that the high-effort version was better.
https://claude.ai/share/726ce3fd-9f57-4532-b37d-ea7e9eced079
I tried a high-effort version just to see if we can establish a baseline. Did I get it correct?
I basically acted as Claude’s accountability buddy, encouraging it to use Python scripts and independently verify its own work and do things step by step.
Note the chat is quite long, so please scroll to the end for a version of the paragraph with all the answers filled in.
I’d try something like
:::spoiler “Is there a situation you are likely to run into today where you’d want to have makeup on, before you have a chance to reapply it?” The idea would be to get her to actually think about it, and reply with “Huh, no actually, guess I’ll go swim” or “Uhh idk, I won’t swim tho” (in case she actually has some other reason for not swimming) :::
This was very interesting to read. Thank you for writing up such a detailed example!
> Jimmy: Sounds like you’re not too happy with your new identity as “fat kid,” and are kinda pissed at life for pushing you to accept it.
>
> Jimmy: I’m gonna go out on a limb and predict that the reason you’re pissed and haven’t fully accepted it is that “accepting it” kinda feels like “accepting your fate.”
>
> Jimmy: You don’t want the rest of your life to be this way. The time you already spent messed up is fine. Spending a few more years recovering is fine. Shitty, yes, but fine.
>
> Jimmy: It’s the idea that it’s over, and all you got to look forward to is being a cripple in pain forever that pisses you off, and which you don’t want to accept.
>
> Jimmy: Am I wrong?
>
> Jimmy: Because if that’s the case, then yeah, fuck that. I wouldn’t accept that shit either, and being pissed off sounds like approximately the right reaction.
>
> Jimmy: It’s not time to accept that which hasn’t been determined.
> Okay, I think this is the important piece. This is the piece of truth that he could sense but hadn’t integrated which would have been lost “just accepting” the pain and feeling okay without doing something with the message. This is why he pushed back, rather than “just” treating pain like any other information. To him, the pain seemed to be saying “Your life is over. You’re a cripple now”, so if he says “okay”, then it’s “Okay, I’m a cripple”, an expectation of being a cripple, and therefore no ability to work towards not being a cripple. At least, not with his heart in it.
As someone who has struggled with accepting pain for exactly this reason, I feel really seen by this passage.
On a similar note, I wonder if a more promising angle of attack is chemical research to find another compound that might be even more effective.
Given that sumatriptan and DMT are pretty similar, and the psychedelic effects of DMT are apparently not relevant, it’s plausible that there might be an even better molecule out there.
If that molecule did not have psychedelic effects, that would be ideal in terms of quick widespread adoption.
I recently learned that the Starship Troopers movie started out like this.
To quote Wikipedia:

> Development of Starship Troopers began in 1991 as Bug Hunt at Outpost 7, written by Neumeier. After recognizing similarities between Neumeier’s script and Heinlein’s book, producer Jon Davison suggested aligning the script more closely with the novel to garner greater interest from studio executives.
I suspect the clearest way to think about this is to carefully distinguish between the RL “agent” as defined by a learned policy (a mapping from states to actions) and the RL algorithm used to train that policy.
The RL algorithm is designed to create an agent which maximises reward.
The “goal” of an RL policy may not always be clear, but using Dennett’s intentional stance we can define it as “the thing which it makes sense to say the policy appears to be maximising, because saying so compresses our observations of its behaviour”.
Then I understand this post to be saying “The goal of an RL policy is not necessarily the same as the goal of the RL algorithm used to train it.”
Is that right?
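To make the distinction concrete, here is a minimal sketch (entirely my own invention, not from the post): the *policy* is just a lookup from states to actions, while the *algorithm* nudges it toward reward — but only on the states it actually visits, so the policy’s apparent “goal” elsewhere can diverge from the training objective.

```python
# Toy illustration: an RL *policy* (state -> action mapping) vs. the RL
# *algorithm* (tabular Q-learning here) that trains it. All names and the
# tiny environment are made up for this sketch.
import random

STATES = ["A", "B"]
ACTIONS = ["left", "right"]

def reward(state, action):
    # "right" is the reward-maximising action in every state.
    return 1.0 if action == "right" else 0.0

def greedy_policy(q):
    # The policy itself is just a mapping; it has no "goal" built in.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}

def train(q, visited_states, steps=100, lr=0.5):
    # The algorithm pushes Q-values toward reward -- but only on the
    # states it happens to visit during training.
    rng = random.Random(0)
    for _ in range(steps):
        s = rng.choice(visited_states)
        a = rng.choice(ACTIONS)
        q[(s, a)] += lr * (reward(s, a) - q[(s, a)])
    return q

q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
q[("B", "left")] = 0.1        # arbitrary initialisation quirk
train(q, visited_states=["A"])  # the algorithm never sees state B

policy = greedy_policy(q)
print(policy["A"])  # "right" -- matches the algorithm's objective
print(policy["B"])  # "left"  -- the policy's apparent "goal" here is not reward
```

Under the intentional stance, an observer watching this policy in state B would ascribe it a “goal” that the training algorithm never had — which is the gap the post seems to be pointing at.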
Thank you for recording and posting these. I feel like I learned a lot, both about how to have conversations and about lots of little details, like the restaurant thing as a proto preference synthesizer, the trauma-cancer analogy, the Muhammad story, and the disendorsing-all-judgements/resentments thing.
I wonder if, much like young people not thinking clearly about mortality, it’s just something healthy people don’t tend to think about, partly because it’s depressing.
(I’m also someone who got a lot more interested in this kind of thing after my own health issues)
Re institutional incentives: I’ve heard that part of the US News rankings is based on asking survey respondents to evaluate other universities by reputation. Professors elsewhere can only evaluate (and do evaluate) other professors based on the quality of their research, not their teaching.
I’m curious, did you check what the quality of teaching would be like at your university before you went? If not, why? If so, why did you pick it anyway?
To clarify, I don’t understand why positive CICO can increase your weight set point but negative CICO can’t decrease it.
Guyenet suspects that our brain’s weight set point might never go down dramatically after living long enough in the modern world, even if we eventually stop eating palatable food altogether. If true, this would make his theory harder to test, and again, his theory would earn a penalty for being harder to falsify, but at the same time, we should be clear about what observations his theory strongly predicts, and rapid weight loss on unpalatable diets is just not one of them.
I don’t understand how CICO can coexist with the idea of a weight set point. If the mechanism of gaining weight is CICO via overeating because food is so palatable, then it seems natural that on unpalatable food you would eat less, and thus I would expect rapid weight loss on unpalatable diets as a prediction of the theory.
I was confused by Buck’s response here because I thought we were going for worst-case quality until I realised:
The model will have low quality on those prompts almost by definition—that’s the goal.
Given that, we also want to have a generally useful model—for which the relevant distribution is ‘all fanfiction’, not “prompts that are especially likely to have a violent continuation”.
In between those two cases is ‘snippets that were completed injuriously in the original fanfic … but could plausibly have non-violent completions’, which seems like the interesting case to me.
I suppose one possibility is to construct a human-labelled dataset of specifically these cases to evaluate on.
Could you explain why you believe this? Do you mean this in a sense of “if history was different and human nature was different, there could have existed a society with the same population and total material wealth as ours in which everyone has a decent standard of living”? Or that (assuming human nature was different), there exists an “industrial+philanthropy policy” that would move us to the everyone-decent state from our current state without an increase in total material wealth?