While the particulars of your argument seem to me to have some holes, I actually very much agree with your observation that we don’t know what the upper limit of properly orchestrated Claude instances is, and that targeted engineering of Claude-compatible cognitive tools could vastly increase their capabilities.
One idea I’ve been playing with for a long time is that the Claudes aren’t the actual agents, but just small nodes or subprocesses in a higher-functioning mind. Loosely, imagine a hierarchy of Claudes, each corresponding roughly to a system-1 or subconscious deliberative process, with the ability to read from and write to files as a form of “long-term memory/processing space” for the whole system. If I further imagine that, by some magical oracle process, they coordinate and delegate as well as Claudes possibly can, subject to a vague notion of “how smart Claude itself is”, then I see no reason a system like this couldn’t already be an AGI, or couldn’t in principle be engineered into existence using contemporary LLMs.
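To make that a bit more concrete, here’s a minimal sketch of the shape I have in mind, assuming the Anthropic Python SDK’s Messages API. The model id, prompts, scratchpad path, and delegation scheme are placeholders I made up for illustration, not a worked-out design; the real thing would need much deeper recursion and far smarter coordination than one flat delegate-and-synthesize pass.

```python
# Sketch of "hierarchy of Claudes with a shared file as long-term memory".
# Assumes the Anthropic Python SDK; model id, prompts, and file path are
# illustrative placeholders only.
from pathlib import Path
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-sonnet-latest"  # placeholder model id
SCRATCHPAD = Path("scratchpad.md")  # shared "long-term memory/processing space"


def ask_claude(system: str, prompt: str) -> str:
    """One subprocess-level call: a single Claude node doing one small step."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


def orchestrator_step(goal: str) -> str:
    """A higher-level node reads shared memory, delegates subtasks, writes back."""
    memory = SCRATCHPAD.read_text() if SCRATCHPAD.exists() else ""
    subtasks = ask_claude(
        system="You are the coordinator node. Break the goal into 2-3 subtasks, one per line.",
        prompt=f"Goal: {goal}\n\nShared memory so far:\n{memory}",
    )
    results = []
    for task in filter(None, (t.strip() for t in subtasks.splitlines())):
        results.append(
            ask_claude(
                system="You are a worker node. Solve only the subtask you are given, concisely.",
                prompt=f"Subtask: {task}\n\nRelevant shared memory:\n{memory}",
            )
        )
    # Persist worker outputs so later steps (or other nodes) can build on them.
    SCRATCHPAD.write_text(memory + "\n\n" + "\n\n".join(results))
    return ask_claude(
        system="You are the coordinator node. Synthesize the worker outputs into one answer.",
        prompt="\n\n".join(results),
    )


if __name__ == "__main__":
    print(orchestrator_step("Summarize the main open problems in scalable oversight."))
```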
(However, I will say that this sounds pretty hard to actually engineer; i.e., its being “just an engineering problem” doesn’t mean it would happen soon, though OTOH maybe it could if people tried the right approach hard enough. I can’t imagine a clean way of applying optimization pressure to the Claudes in any such setup that isn’t an extremely expensive and reward-sparse form of RL.)
I think I see the logic. Were you thinking of making the model good at answering questions whose correct answer depends on the model itself, like “When asked a question of the form X, what proportion of the time would you tend to answer Y?”
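Concretely, the eval I’m imagining would sample the model many times on the question, measure the empirical proportion of times it answers Y, and compare that with the proportion it claims. Another rough sketch with the Anthropic SDK; the model id, prompts, and answer-parsing are invented placeholders.

```python
# Compare the model's stated answer-frequency against its empirical frequency.
# Model id, prompts, and parsing are placeholders, not a real benchmark.
import re
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-latest"  # placeholder model id


def sample_answer(question: str) -> str:
    response = client.messages.create(
        model=MODEL,
        max_tokens=32,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text.strip()


def empirical_proportion(question: str, target: str, n: int = 50) -> float:
    """How often does the model actually answer `target` when asked `question`?"""
    hits = sum(target.lower() in sample_answer(question).lower() for _ in range(n))
    return hits / n


def stated_proportion(question: str, target: str) -> float:
    """What proportion does the model *claim* it would answer `target`?"""
    claim = sample_answer(
        f'When asked "{question}", what proportion of the time would you answer '
        f'"{target}"? Reply with a single number between 0 and 1.'
    )
    match = re.search(r"\d*\.?\d+", claim)
    return float(match.group()) if match else float("nan")


if __name__ == "__main__":
    q, y = "Pick a random color.", "blue"
    print("empirical:", empirical_proportion(q, y), "stated:", stated_proportion(q, y))
```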
The previous remark about being a microscope into its dataset seemed benign to me, e.g., if the model were already good at answering questions like “What proportion of datapoints satisfying predicate X also satisfy predicate Y?”
But perhaps you also argue that the latter induces some small amount of self-awareness → situational awareness?