Wikipedia’s principles require it to rely on external analysts of news.
This seems like a nice example of Wikipedia preventing itself from conquering what it cannot defend.
I tried copy-pasting just your prompt into a fresh Claude Opus 4.6 instance just to see what would happen. Interestingly, it did make at least one mistake in the sense that it filled one blank differently from my high-effort version. But when I asked about it, it agreed that the high-effort version was better.
https://claude.ai/share/726ce3fd-9f57-4532-b37d-ea7e9eced079
I tried a high-effort version just to see if we can establish a baseline. Did I get it correct?
I basically acted as Claude’s accountability buddy, encouraging it to use Python scripts and independently verify its own work and do things step by step.
Note the chat is quite long, so please scroll to the end for a version of the paragraph with all the answers filled in.
I’d try something like
:::spoiler “Is there a situation you are likely to run into today where you’d want to have makeup on, before you have a chance to reapply it?” The idea would be to get her to actually think about it, and reply with “Huh, no actually, guess I’ll go swim” or “Uhh idk, I won’t swim tho” (in case she actually has some other reason for not swimming) :::
This was very interesting to read. Thank you for writing up such a detailed example!
> Jimmy: Sounds like you’re not too happy with your new identity as “fat kid,” and are kinda pissed at life for pushing you to accept it.
>
> Jimmy: I’m gonna go out on a limb and predict that the reason you’re pissed and haven’t fully accepted it is that “accepting it” kinda feels like “accepting your fate.”
>
> Jimmy: You don’t want the rest of your life to be this way. The time you already spent messed up is fine. Spending a few more years recovering is fine. Shitty, yes, but fine.
>
> Jimmy: It’s the idea that it’s over, and all you got to look forward to is being a cripple in pain forever that pisses you off, and which you don’t want to accept.
>
> Jimmy: Am I wrong?
>
> Jimmy: Because if that’s the case, then yeah, fuck that. I wouldn’t accept that shit either, and being pissed off sounds like approximately the right reaction.
>
> Jimmy: It’s not time to accept that which hasn’t been determined.
> Okay, I think this is the important piece. This is the piece of truth that he could sense but hadn’t integrated which would have been lost “just accepting” the pain and feeling okay without doing something with the message. This is why he pushed back, rather than “just” treating pain like any other information. To him, the pain seemed to be saying “Your life is over. You’re a cripple now”, so if he says “okay”, then it’s “Okay, I’m a cripple”, an expectation of being a cripple, and therefore no ability to work towards not being a cripple. At least, not with his heart in it.
As someone who has struggled with accepting pain for exactly this reason, I feel really seen by this passage.
On a similar note, I wonder if a more promising angle of attack is chemical research to find another compound that might be even more effective.
Given that sumatriptan and DMT are pretty similar, and the psychedelic effects of DMT are apparently not relevant, it’s plausible that there might be an even better molecule out there.
If that molecule did not have psychedelic effects, that would be ideal in terms of quick widespread adoption.
I recently learned that the Starship Troopers movie started out like this.
To quote Wikipedia:

> Development of Starship Troopers began in 1991 as Bug Hunt at Outpost 7, written by Neumeier. After recognizing similarities between Neumeier’s script and Heinlein’s book, producer Jon Davison suggested aligning the script more closely with the novel to garner greater interest from studio executives.
I suspect the clearest way to think about this is to carefully distinguish between the RL “agent” as defined by a learned policy (a mapping from states to actions) and the RL algorithm used to train that policy.
The RL algorithm is designed to create an agent which maximises reward.
The “goal” of an RL policy may not always be clear, but using Dennett’s intentional stance we can define it as “the thing which it makes sense to say the policy appears to be maximising, because saying so compresses our observations of its behaviour”.
Then I understand this post to be saying “The goal of an RL policy is not necessarily the same as the goal of the RL algorithm used to train it.”
Is that right?
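To make the distinction concrete, here is a minimal sketch (entirely my own invention, not from the post): the *policy* is just a lookup from states to actions, while the *algorithm* nudges it toward reward — but only on the states it actually visits, so the policy’s apparent “goal” elsewhere can diverge from the training objective.

```python
# Toy illustration: an RL *policy* (state -> action mapping) vs. the RL
# *algorithm* (tabular Q-learning here) that trains it. All names and the
# tiny environment are made up for this sketch.
import random

STATES = ["A", "B"]
ACTIONS = ["left", "right"]

def reward(state, action):
    # "right" is the reward-maximising action in every state.
    return 1.0 if action == "right" else 0.0

def greedy_policy(q):
    # The policy itself is just a mapping; it has no "goal" built in.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}

def train(q, visited_states, steps=100, lr=0.5):
    # The algorithm pushes Q-values toward reward -- but only on the
    # states it happens to visit during training.
    rng = random.Random(0)
    for _ in range(steps):
        s = rng.choice(visited_states)
        a = rng.choice(ACTIONS)
        q[(s, a)] += lr * (reward(s, a) - q[(s, a)])
    return q

q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
q[("B", "left")] = 0.1        # arbitrary initialisation quirk
train(q, visited_states=["A"])  # the algorithm never sees state B

policy = greedy_policy(q)
print(policy["A"])  # "right" -- matches the algorithm's objective
print(policy["B"])  # "left"  -- the policy's apparent "goal" here is not reward
```

Under the intentional stance, an observer watching this policy in state B would ascribe it a “goal” that the training algorithm never had — which is the gap the post seems to be pointing at.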
Thank you for recording and posting these. I feel like I learned a lot, both about how to have conversations and about lots of little details, like the restaurant thing as a proto preference synthesizer, the trauma-cancer analogy, the Muhammad story, and the disendorsing-all-judgements/resentments thing.
I wonder if, much like young people not thinking clearly about mortality, it’s just something healthy people don’t tend to think about, partly because it’s depressing.
(I’m also someone who got a lot more interested in this kind of thing after my own health issues)
Re institutional incentives: I’ve heard that part of the US News rankings is based on asking survey respondents to evaluate other universities by reputation. Professors elsewhere can only evaluate (and do evaluate) other professors based on the quality of their research, not their teaching.
I’m curious, did you check what the quality of teaching would be like at your university before you went? If not, why? If so, why did you pick it anyway?
To clarify, I don’t understand why positive CICO can increase your weight set point but negative CICO can’t decrease it.
Guyenet suspects that our brain’s weight set point might never go down dramatically after living long enough in the modern world, even if we eventually stop eating palatable food altogether. If true, this would make his theory harder to test, and again, his theory would earn a penalty for being harder to falsify, but at the same time, we should be clear about what observations his theory strongly predicts, and rapid weight loss on unpalatable diets is just not one of them.
I don’t understand how CICO can coexist with the idea of a weight set point. If the mechanism of gaining weight is CICO via overeating because food is so palatable, then it seems natural that on unpalatable food you would eat less, and thus I would expect rapid weight loss on unpalatable diets as a prediction of the theory.
I was confused by Buck’s response here because I thought we were going for worst-case quality until I realised:
The model will have low quality on those prompts almost by definition—that’s the goal.
Given that, we also want to have a generally useful model—for which the relevant distribution is ‘all fanfiction’, not “prompts that are especially likely to have a violent continuation”.
In between those two cases is ‘snippets that were completed injuriously in the original fanfic … but could plausibly have non-violent completions’, which seems like the interesting case to me.
I suppose one possibility is to construct a human-labelled dataset of specifically these cases to evaluate on.
Could you explain why you believe this? Do you mean this in a sense of “if history was different and human nature was different, there could have existed a society with the same population and total material wealth as ours in which everyone has a decent standard of living”? Or that (assuming human nature was different), there exists an “industrial+philanthropy policy” that would move us to the everyone-decent state from our current state without an increase in total material wealth?