Failing to Be Deliberate
One fun consequence of defining this concept is that now, even when you try hard and don’t succeed, you can feel bad for failing to be deliberate.
have you tried daridorexant?
i mean, it’s a huge category of people, so for some it’s box 1 and for some it’s box 2.
for me it’s box 2. i was bewildered to realize that i knew quite a lot of box 1'ers and we had very different reasons for hurting ourselves. (i transitioned to become more of a box 1'er later)
thank you for this post. “bearish on vibes” is a great phrase. i am constantly hung up on the fact that it’s not really possible to “know what normal people are like”, “know what people are like generally”, “know what the world is actually like”, without significant amounts of effort.
i think this background fact taints like… most discussion of social and ethical issues.
I suppose you mean that in a year this kind of thing will stop happening so obviously, but, as you suggest, more complicated situations will still elicit this problem, so by construction it’ll be harder to notice (and probably more impactful).
i’ve thought that big names should do this for conference papers to keep conferences honest (peer review is anonymous, but as i understand it, it’s extremely obvious when a big name has written a paper)
Oh yeah, I’m certainly agreeing with the central intent of the post, now just clarifying the above discussion.
One clarification—here, as stated, “mechanisms operating in terms of linearly represented atoms” doesn’t constrain the mechanisms themselves to be linear, does it? SAE latents themselves are some nonlinear function of the actual model activations. But if the mechanisms are substantially nonlinear we’re not really claiming much.
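To make that concrete, here's a minimal sketch of a standard ReLU SAE encoder (dimensions and weight names are illustrative, not taken from any particular implementation). Each latent reads off a linear direction, but the gating at zero already makes the latent vector a nonlinear function of the activations:

```python
import numpy as np

# Illustrative dimensions; real SAEs are much wider.
d_model, d_sae = 512, 4096
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
b_enc = np.zeros(d_sae)

def sae_latents(x):
    # Each column of W_enc is a linear "feature direction", but the
    # ReLU gate makes the latents a nonlinear function of x.
    return np.maximum(x @ W_enc + b_enc, 0.0)
```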
My own impression is that things are nonlinear unless proven otherwise, and a priori I would really strongly expect the strong linear representation hypothesis to be just false. In general it seems extremely wishful to hope that exactly those things that are nonlinear (in whatever sense we mean) are not important, especially since we employ neural networks specifically to learn really weird functions we couldn’t have thought of ourselves.
Do you have any suggestions, or references to resources, for what individuals should do to be better prepared for another global pandemic?
Exceedingly virtuous exceptions exist; I’ll praise the ones I know of at the end. ↩︎
Where?
I see. If I understand you correctly, a mechanism, whether human-interpretable or not, which seems to be functionally separate but is not explainable in terms of its operations on linear subspaces of activation space, would count as evidence against the strong feature hypothesis, right?
Aren’t the MLPs in a transformer straightforward examples of this?
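To spell that out, here's a minimal sketch of a standard transformer MLP block (weight names are illustrative). The elementwise nonlinearity sandwiched between the two linear maps means the block as a whole is not an operation on linear subspaces:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, as used in GPT-2-style models
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, W_in, b_in, W_out, b_out):
    # Linear map in, elementwise nonlinearity, linear map out: the GELU
    # in the middle is exactly the part that isn't linear.
    return gelu(x @ W_in + b_in) @ W_out + b_out
```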
(BTW, I agree with the main thrust of the post. I think that the linear feature hypothesis, in its most usefully strong forms, should be default-false unless proven otherwise; I appreciate the thing you said two comments up about how “disproving a vague hypothesis is a bit difficult”.)
I didn’t do a lot of thorough research, but maybe I simply don’t know how to.
I googled around for resources, which usually leads to… I don’t know how to describe this, but short-form articles that are not very information-dense and are mutually contradictory. I also looked for opinions on Reddit and for an FAQ-like thing on /r/Ergonomics, which didn’t tell me much definitive either, except that a) people have a variety of problems due to their variety of body shapes, and b) it is normal to want a desk that’s significantly lower than most desks.
I must have done some amount of Claude-querying, but it’s labor-intensive to figure out what the root problems are here and whether there are canonical solutions to them, possibly because the resources Claude would most easily reference are the same inadequate ones I’ve just described. I bet it’s possible to figure this out with Claude if I go slowly and specifically enough, though.
I don’t think I found anything even approaching a central resource which claims to be comprehensive (however opinionated). Something like what they have at /r/bodyweightfitness, for example, would be excellent by the standards described here.
It feels to me like evaluating any of the sentences in your comment rigorously requires a more specific definition of “feature”.
While these are logically distinct things, can you think of an experiment that would be able to distinguish the two even in principle? In other words, you say “our existing techniques can’t yet”—but what would one that can distinguish these even look like?
I don’t necessarily disagree with this way of looking at things.
Serious question—how do you calibrate the standard by which you judge that something is good enough to warrant respect, or self-respect?
To illustrate what I mean, in one of your examples you judge your project teammates negatively for not having had the broad awareness to seriously learn ML ahead of time, in the absence of other obvious external stimuli to do so (like classes that would be hard enough to actually require that). The root of the negative judgment is that a better objective didn’t occur to them.
How can you ever be sure that there isn’t a better objective that isn’t occurring to you, at any given time? More broadly, how can you be sure that there isn’t just a generally better way of living, that you’re messing up for not currently doing?
If, hypothetically, you encountered a better version of yourself who presented you with a better objective and better ways of living, would you retroactively judge your life up to the present moment as worse and less worthy of respect? (Perhaps, given your answer to the previous question, the answer is “yes”, but you think this is an unlikely scenario.)
Does anyone have a rigorous reference or primer on computer ergonomics, or ergonomics in general? It’s hard to find a reference that says with authority/solid reasoning what good ergonomics is and why, and that offers solutions to common problems.
Hey, I think I relate to this! I didn’t expect to see this phenomenon described anywhere, and I’m happy that you took the time to describe it.
I think I was able to improve on this (or, the aspects of this that were an issue for me) by coming up with new ways of expressing what I think and feel in lossier/less-precise/more vibe-driven ways.
Does anyone have a sense of whether, qualitatively, RL stability has been solved for any practical domains?
This question is at least in part asking for qualitative speculation about how the post-training RL works at big labs, but I’m interested in any partial answer people can come up with.
My impression of RL is that there are a lot of tricks to “improve stability”, but performance is path-dependent in pretty much any realistic/practical setting (where the state space is huge and the action space may be huge or continuous). Even for larger toy problems, my sense is that various RL algorithms only work maybe 70% of the time, and the other 30% of the time the reward randomly declines.
One obvious way of getting around this is to just resample. If there are no more principled/reliable methods, this would be the default way of getting a good result from RL, and it would follow that that’s just what the big labs do. Of course, they may have secretly solved some of this, but it’s hard to imagine what form that would take.
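For concreteness, a minimal sketch of what I mean by resampling (`train_run` and `evaluate` are hypothetical stand-ins for whatever training and evaluation loop is actually used):

```python
def best_of_n_seeds(n_seeds, train_run, evaluate):
    """Rerun a path-dependent RL training loop and keep the best result."""
    best_score, best_policy = float("-inf"), None
    for seed in range(n_seeds):
        policy = train_run(seed=seed)  # a full, possibly-unstable training run
        score = evaluate(policy)       # e.g. mean reward on held-out episodes
        if score > best_score:
            best_score, best_policy = score, policy
    return best_policy
```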
followup: after looking at the appendix i’m pretty sure the biggest distinction is that the SFT’d models in this paper are SFT’d on data that comes from entirely different models/datasets. so not only is the data not coming from a policy that adapts during training, it’s coming from policies very different from the model’s own. i think this by itself is enough to explain the results; it’s a useful result, but not the one i imagined upon reading the title.
i would still like to know the answer to the original question.
sadly don’t have any lemborexant, so can’t compare; i originally picked daridorexant naively due to its shorter half-life, thinking this corresponded to less daytime tiredness.
my naive understanding was actually also that lemborexant should be the one better at keeping you asleep, so it’s interesting to hear that it doesn’t seem to do that at all for you.