martinkunev

Karma: 200

martinkunev 20 Mar 2026 17:09 UTC
1 point
0
on: Clarifying wireheading terminology
I don’t think I’ve heard people calling 1 wireheading. It certainly isn’t what I have in mind when I hear the term.
For 2, I’m interested to see a non-embedded example. If the agent can tamper with the input to its reward function, wouldn’t that make it embedded?

martinkunev 16 Mar 2026 11:26 UTC
1 point
0
on: Is CDT with precommitment enough?
The Death in Damascus thought experiment shows some conceptual shortcomings of CDT which precommitment cannot deal with.
In this thought experiment, CDT could (depending on implementation) enter an infinite loop or keep wasting resources moving between Damascus and Aleppo.
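A toy sketch of that oscillation, assuming a naive CDT deliberator that treats its current inclination as what Death has predicted and then re-evaluates. The function names and the step cap are made up for illustration; a real implementation could behave differently.

```python
CITIES = ("Damascus", "Aleppo")


def other(city):
    """Return the city that is not `city`."""
    return CITIES[1] if city == CITIES[0] else CITIES[0]


def cdt_best_action(death_at):
    # CDT holds Death's location causally fixed and picks the other city.
    return other(death_at)


def deliberate(initial_inclination, max_steps=6):
    """Return the sequence of inclinations of the naive deliberator.

    Assumption: Death is taken to be wherever the agent currently
    intends to go, so every update invalidates the previous choice.
    The step cap stands in for what would otherwise never terminate.
    """
    history = [initial_inclination]
    for _ in range(max_steps):
        death_at = history[-1]
        history.append(cdt_best_action(death_at))
    return history


if __name__ == "__main__":
    print(deliberate("Damascus"))
```

The inclination never reaches a fixed point: each step flips the city, so without the cap the loop would not stop.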

martinkunev 14 Mar 2026 0:05 UTC
1 point
0
on: How To Escape Super Mario Bros
I like the idea a lot. There are some details I’d question (coming up with concepts of attention, boredom or gravity doesn’t feel easy to me). I don’t think there is enough information to reconstruct a human face. I appreciate the post, because I would have strived for perfection and either given up or never finished it.

martinkunev 19 Dec 2025 11:36 UTC
1 point
0
in reply to: Brendan Long’s comment on: Leading by example
That’s an interesting approach. My thinking at the time was that the number of vaccines administered gives legitimacy to the government’s decisions. There were several other factors I considered besides the mandate:
- All this happened while the vaccine was still in an experimental stage in France.
- There was political motivation to use all the vaccines already ordered. As far as I’m aware, the frequency of boosting was not justified by any science.
- The benefits of the vaccines were exaggerated in public discourse (“you’ll never get sick”, “the virus will disappear”, etc.). The priority was never honest communication but manipulating people.
- There was an attempt to ridicule any opposition to the vaccine mandate as being motivated by woo-woo.
Just to be clear, I still wasn’t sure I was making the right choice at the time and I would have gotten vaccinated if there had been no mandate.

martinkunev 5 Dec 2025 0:25 UTC
4 points
0
on: The Moonrise Problem
What is the relationship between convergent factorization and natural latent?

martinkunev 24 Nov 2025 1:49 UTC
1 point
0
in reply to: Adele Lopez’s comment on: Towards a Typology of Strange LLM Chains-of-Thought
this sounds like 3. Context Refresh

martinkunev 16 Nov 2025 0:32 UTC
1 point
0
on: Maybe Social Anxiety Is Just You Failing At Mind Control
60% resolved my social anxiety
I’m honestly curious how you measure that. I’ve been asking myself “am I progressing with respect to social anxiety” and I don’t have a good answer.

martinkunev 3 Nov 2025 22:52 UTC
2 points
0
on: Decision theory when you can’t make decisions
To me this reads as a definition of the term “free will” and then arguing against its existence.
If something can predict your action with better than random accuracy no matter how hard you try to prevent them, you don’t have free will over that action
The word “try” here hides lots of complexity and does some heavy lifting. Can you taboo it and rephrase that? Also, are you rejecting the computational theory of mind?
In the case of the brain-scanner, I’d say “the second before” just reflects the fact that taking an action in the physical world takes >0 time, and once you’ve made the decision to start the action you are no longer free to reverse it
If you take an agent embedded in the environment, I don’t think there is a clear-cut boundary between “not yet made a decision” and “have made the decision”. This abstraction just breaks down.

martinkunev 19 Oct 2025 23:47 UTC
1 point
0
on: When is a mind me?
My fear with the teleporter has always been the engineering details: Can it get a consistent snapshot of me? What about the last moments after the snapshot and before the old copy is destroyed? Can it reliably reconstruct me? What happens in case of a failure?
Assuming many-worlds quantum mechanics, we should have similar anticipations for forking into two and for tossing a quantum coin.

martinkunev 12 Oct 2025 9:12 UTC
1 point
0
on: We are likely in an AI overhang, and this is bad.
There is no rollback when open-weight models are almost SOTA. Could we convince people like Zuck that open weights are too risky? I seriously doubt it.

martinkunev 27 Sep 2025 23:17 UTC
2 points
0
on: Female sexual attractiveness seems more egalitarian than people acknowledge
There are some differences between what women consider attractive female traits and what men actually find attractive. The example I like to give is how some women enlarge their lips even though men (usually) don’t find that attractive.
I think these are the relevant points:
Maybe women’s sense of each other’s beauty is more discriminating than men’s.
Some women probably don’t distinguish a sex symbol’s actual physical attractiveness from the other characteristics that would make her rarely and especially appealing to women (like fame, wealth, etc.), and they’re neglecting to account for these factors being less salient to men.

martinkunev 19 Sep 2025 11:13 UTC
1 point
0
on: Corrigibility, Much more detail than anyone wants to Read
This:
Might agent $b$ rewrite agent $a$’s brain to make agent $a$ better satisfy agent $b$’s utility function? Most forms of wire-heading inherently limit the ability of agents to affect the future
and this
We have not proved that agent $b$ does not try to affect agent $a$’s utility function (in fact, I expect in many cases agent $b$ does try to influence agent $a$’s utility function).
appear to be in conflict. Are you trying to say that, depending on the circumstances, $b$ may either try to influence $a$’s utility function or avoid doing so?

martinkunev 19 Aug 2025 1:19 UTC
2 points
0
on: Do you even have a system prompt? (PSA / repo)
How important is it to keep the system prompt short? I guess this would depend on the model, but does anybody have useful tips on that?

martinkunev 8 Aug 2025 0:50 UTC
−1 points
0
in reply to: Shankar Sivarajan’s comment on: Open weights == Closed source
well-established
The usage I’m objecting to started, as far as I can tell, about 2 years ago with Llama 2. The term “open weights”, which is often used interchangeably, is a much better fit.

martinkunev 7 Aug 2025 22:04 UTC
2 points
0
in reply to: Brendan Long’s comment on: Open weights == Closed source
At some point the open/closed distinction becomes insufficient as a description. You could very well have an open-source wrapper (or fine-tuning) of something which is closed-source. Just try not to mislead people about what you’re offering.

martinkunev 7 Aug 2025 21:58 UTC
1 point
0
in reply to: Shankar Sivarajan’s comment on: Open weights == Closed source
If I vibe-coded an app prompting, say, Claude, and released it along with the generated code, would you have the same objections to me calling it “open source,”
No, because I don’t think this misleads people. Granted, the term “open source” is fuzzy at the boundaries. Should we use the term? I don’t know, but if we do, it only makes sense if it means something different from “closed source”.
wrong in suggesting they prefer to work with the model by editing the training data and “recompiling” instead of starting with the weights
One doesn’t exclude the other. If you’re creating v2 of your model, you’d likely: take the training code and data for v1; make some changes / add new things; run the new training code on the new data. For minor changes you may prefer to do fine-tuning on the weights.

martinkunev 7 Aug 2025 20:16 UTC
1 point
4
in reply to: Zac Hatfield-Dodds’s comment on: Open weights == Closed source
wildly more expensive
Suppose I write a program and let people download the binary. Can I say “I spent 100k on AWS to compile it, therefore the binary is open source”?
not even modification
Would you say compiling source code from scratch (e.g. for a different platform) is not a modification?
Even if you’re not intending to retrain the model from scratch, simply knowing what the training data is is valuable. Maybe you don’t care about the training data, but somebody else does. I don’t think “I could never possibly make use of the source code / training data” is an argument that a binary / weights is actually open source.
How does open source differ from closed source for you in the case of generative models? If they are the same, why use the term at all?

martinkunev 23 Jun 2025 23:31 UTC
3 points
0
on: Pronouns are Annoying
There is the possibility of misgendering somebody and them taking it seriously. Sometimes it feels like you’re walking in a minefield. It’s not conducive to a good social interaction.
too few pronouns, and communication becomes vague and cumbersome
I’m wondering why languages like Finnish can do just fine with “hän” while English needs he/she.

martinkunev 23 Jun 2025 23:21 UTC
1 point
0
in reply to: ymeskhout’s comment on: Pronouns are Annoying
From French to English, you always translate it as “you”. You probably mean translating from English to French, where you need to judge whether to use “vous” or “tu”.

martinkunev 13 Jun 2025 13:24 UTC
1 point
0
in reply to: Kerrigan’s comment on: Understanding Agent Preferences
I think of preferences as a description of agent behavior, which means the preferences changed.
When you say “got better at achieving it’s preference” I suppose you’re thinking of preference as some goal the agent is pursuing. I find this view (assuming goal directedness) less general in its ability to describe agent behavior. It may be more useful, but if so I think we need to justify it better. I don’t exclude the possibility that there is a piece of information I don’t know about.
Goal-directedness leads toward instrumental convergence and away from corrigibility. If we are looking to solve corrigibility, I think it’s worth it to question goal-directedness.