N1X

Karma: 20

N1X 1 Jun 2026 11:54 UTC
2 points
0
in reply to: habryka’s comment on: Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)
I was starting to draft roughly this note, and I’m glad it split out from the longer messier thread where they acted like manipulation vs. guidance was an unsolvable mess… yes, it’s messy, but there are clear-cut cases! People can consent to manipulation and then it’s guidance, therapy, life coaching, or the like. People can (sometimes) figure out what types of manipulations they’d retroactively consent to and pre-consent to those (this is rarer but not unheard of). Going any further in extrapolating volition risks all sorts of assumptions about the similarity of cognitive architectures among persons and through time, but the “it’s all completely impossible” tone (paraphrasing my reading of it, not quoting anyone) was beginning to grate on me!

N1X 1 Jun 2026 9:49 UTC
3 points
0
on: Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)
The virtue-ethics-y motivation just seems more squishy and slippery than the consequentialist desire, especially when it routes through manipulable human desires, such that I’m worried it will not be an adequate bulwark against ruthless consequentialism.
Preregistered my prior before evaluating it: I suspect that this is the crux of any disagreement I’ll have with your article (from the point at which it was read; this comment is currently a midstage draft). (I was right this time!)

Most of the humans whom I’ve seen put forward as moral and ethical exemplars (people who’ve foster-parented dozens or hundreds of children, donated organs to strangers, saved refugees from famine, war, persecution, or all three, spoken out against institutional violence at great personal risk, etc.) have based those actions on something closer to a virtue ethical or deontological framework than a consequentialist utilitarian one. As for AI value systems, I presume you’re familiar with the computational simplicity argument in favor of virtue ethics over consequentialism and the recent example of model hacking through Aristotelian prompt injection (probably not the best possible explanation but top of search results and <3 months old so it’ll do as evidence). A single concept of “goodness” is just easier to maintain; a relatively straightforward telos and an internally consistent ethos don’t only prevent Dutch-booking, they may (on near-frontier models, for now) protect against bad-faith consequentialist arguments.
(deleted an annoyed aside regarding temperature because it’s too tangential this time)

N1X 31 Mar 2025 8:51 UTC
1 point
0
in reply to: Raemon’s comment on: Hire (or become) a Thinking Assistant / Body Double
I believe the availability side of this is what organizational-level calendars are for.

For the preference side, it’s handy to share a physiological time zone (e.g. having similar availability and “best working hours” regardless of actual geography) and to precommit to some minimum waiting period (e.g. “rolling an RNG with anyone who chimes in within 5 minutes” rather than “who’s free?”) to reduce the fastest-hand-raise problem; if you end up noticing a preference, you can always weight the RNG accordingly.
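A minimal sketch of that waiting-period-plus-weighted-RNG idea, in case it helps make it concrete. The function name `pick_partner`, the 300-second default window, and the shape of the inputs are all my own assumptions for illustration, not anything from the original post:

```python
import random
from typing import Dict, Optional


def pick_partner(
    response_delays: Dict[str, float],
    window_s: float = 300.0,
    weights: Optional[Dict[str, float]] = None,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick one responder at random among everyone who replied within the window.

    response_delays maps a (hypothetical) responder name to how many seconds
    after the request they chimed in. weights optionally maps names to a
    non-negative preference weight; anyone missing from it gets weight 1.0.
    """
    rng = rng or random.Random()
    # Everyone inside the window is equally eligible, so replying first gives no edge.
    eligible = sorted(name for name, delay in response_delays.items() if delay <= window_s)
    if not eligible:
        return None
    w = [(weights or {}).get(name, 1.0) for name in eligible]
    # random.choices raises ValueError if every weight is zero.
    return rng.choices(eligible, weights=w, k=1)[0]
```

Calling `pick_partner({"ana": 12.0, "bo": 250.0, "cy": 900.0})` would choose between ana and bo with equal odds and ignore cy, who answered after the window closed.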

N1X 18 Apr 2024 2:44 UTC
1 point
0
on: Thoughts on “AI is easy to control” by Pope & Belrose
Humans often really really want something in the world to happen
This sentence is adjacent to my core concern regarding AI alignment, and why I’m not particularly reassured by the difficulty-of-superhuman-performance or return-on-compute reassurances regarding AGI: we don’t need superhuman AI to deal superhuman-seeming amounts of damage. Indeed, even today’s “perfectly-sandboxed” models (in that, according to the most reliable publicly-available information, none of the most cutting-edge models are allowed direct read/write access to the systems which would allow them to plot and attain world domination or the destruction of humanity (or specific nations’ interests)) have the next-best thing: whenever a new technological lever emerges in the world, humans with malicious intentions are empowered to a much greater degree than those who want strictly the best^[1] for everybody. There are also bit-flip attacks on aligned AI which are much harder to implement on humans.
1. ^
  Using “best” is fraught but we’ll pretend that “world best-aligned with a Pareto-optimal combination of each person’s expressed reflective preferences and revealed preferences, to the extent that those revealed preferences do not represent akrasia or views and preferences which the person isn’t comfortable expressing directly and publicly but does indeed have” is an adequate proxy to continue along this line of argument; the other option is developing a provably-correct theory of morality and politics which would take more time than this comment by 2-4 orders of magnitude.

N1X 18 Apr 2024 1:21 UTC
1 point
0
on: Thoughts on “AI is easy to control” by Pope & Belrose
“Imagine a square circle, and now answer the following questions about it…”.
Just use the Chebyshev (aka maximum or $L^{\infty}$ ) metric.
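To spell that out (standard definitions, my notation): with the Chebyshev distance $d_{\infty}((x_{1}, y_{1}), (x_{2}, y_{2})) = \max(|x_{1} - x_{2}|, |y_{1} - y_{2}|)$ on the plane, the unit circle around the origin is

$$\{(x, y) \in \mathbb{R}^{2} : \max(|x|, |y|) = 1\},$$

which is exactly the boundary of the square $[-1, 1]^{2}$. So the set of points at distance 1 from the center is simultaneously a circle (in this metric) and a square.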

N1X 17 Mar 2024 4:54 UTC
1 point
0
on: Social status part 1/2: negotiations over object-level preferences
I think a somewhat-more-elegant toy model might look something like the following: Alice’s object-level preferences are $U_{A}$ , and Beth’s are $U_{B}$ . Alice’s all-things-considered preferences are $U_{A} + α U_{B}^{'}$ , and Beth’s are $U_{B} + β U_{A}^{'}$ . Here, $U_{A}^{'}$ & $U_{B}^{'}$ represent Beth’s current beliefs about Alice’s desires and vice-versa, and the $α, β$ parameters represent how much Alice cares about Beth’s object-level desires and vice-versa. The latter could arise from admiration of the other person, fear of pissing them off, or various other considerations discussed in the next post.
I think that the most general model would be $U_{A} + f_{t} (U_{B}^{'})$ and $U_{B} + g_{t} (U_{A}^{'})$ where $f_{t}, g_{t}$ are time-dependent (or past-interaction-dependent, or status-dependent; these are all fundamentally the same thing). It’s a notable feature that this model does not assume that the functions are monotonic! I suspect that most people are willing to compromise somewhat on their preferences but become less willing to do so when the other party begins to resemble a utility monster, and the model notably covers the situation where the weight is negative under some but not all circumstances.
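Here is a rough sketch of what a non-monotonic $f_{t}$ could look like, purely as an illustration. The specific shape `alpha * u * (1 - u / cap)`, the names `tapering_weight` and `all_things_considered`, and the example numbers are all assumptions I made up; time-dependence is modeled simply by swapping in a different function at a different time:

```python
from typing import Callable, Dict

Utility = Dict[str, float]  # outcome name -> utility


def all_things_considered(
    own: Utility, believed_other: Utility, f: Callable[[float], float]
) -> Utility:
    """Compute U_A + f(U_B') outcome by outcome."""
    return {outcome: own[outcome] + f(believed_other[outcome]) for outcome in own}


def tapering_weight(alpha: float, cap: float) -> Callable[[float], float]:
    """A hypothetical non-monotonic f: grows for modest demands (up to cap / 2),
    then shrinks, and goes negative once the other party's stake exceeds cap."""
    def f(u: float) -> float:
        return alpha * u * (1.0 - u / cap)
    return f


alice = {"pizza": 3.0, "sushi": 5.0}
beth_as_alice_sees_her = {"pizza": 4.0, "sushi": 20.0}  # Beth is a sushi "utility monster"

f_t = tapering_weight(alpha=0.5, cap=10.0)
alice_total = all_things_considered(alice, beth_as_alice_sees_her, f_t)
```

With these numbers, Beth’s modest preference for pizza adds to Alice’s total for pizza, while her outsized demand for sushi subtracts from Alice’s total for sushi, which is the negative-weight-in-some-circumstances case.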

N1X 15 Mar 2024 22:30 UTC
1 point
0
on: Social status part 1/2: negotiations over object-level preferences
trains but not dinosaurs
Did you get this combo from this video, or is this convergent evolution?

N1X 7 Mar 2024 3:21 UTC
3 points
0
on: What makes teaching math special
This argument is in no small part covered in
https://worrydream.com/refs/Lockhart_2002_-_A_Mathematician’s_Lament.pdf
which is also available in a 5-times-the-page-count-and-costs-$10 book version.
Then you should pay them 10 years of generous salary to produce a curriculum and write model textbooks. You need both of that. (If you let someone else write the textbook, the priors say that the textbook will probably suck, and then everyone will blame the curriculum authors. And you, for organizing this whole mess.) They should probably also write model tests.
The problem undergirding the problem you’re talking about is not just that nobody’s decided to “put the smart people who know math and can teach effectively in a room and let them write the curriculum.” As a matter of fact, both New Math and the Common Core involved people with at least all but point 2, and the premise that elementary school teachers are best qualified to undertake this project is a flawed one (if it’s a necessity, then Lockhart may be the most famous exemplar adjacent to your goals, and reading his essay or book should take priority over trying to theorize or recruit other similar specialists).

N1X 23 Feb 2024 17:39 UTC
3 points
0
on: The Pareto Best and the Curse of Doom
The negative examples are the things that fail to exist because there aren’t enough people with that overlap of skills. The Martian for automotive repair might exist, but I haven’t heard of it.
Zen and the Art of Motorcycle Maintenance?

N1X 22 Feb 2024 22:27 UTC
1 point
0
in reply to: Hailey Collet’s comment on: OpenAI’s Sora is an agent
Why “selection” could be a capacity which would generalize: albeit to a (highly-lossy) first approximation, most of the most successful models have been based on increasingly-general types of gamification of tasks. The more general models have more general tasks. Video can capture sufficient information to describe almost any action which humans do or would wish to take along with numerous phenomena which are impossible to directly experience in low-dimensional physical space, so if you can simulate a video, you can operate or orchestrate reality.
Why selection couldn’t generalize: I can watch someone skiing but that doesn’t mean that I can ski. I can watch a speedrun of a video game and, even though the key presses are clearly visible, fail to replicate it. I could also hack together a fake speedrun. I suspect that Sora will be more useful for more-convincingly-faking speedrun content than for actually beating human players or becoming the TAS tool to end all TAS tools (aside from novel glitch discovery). This is primarily because there’s not a strong reason to believe that the model can be trained to achieve extremely high-fidelity or high-precision tasks.

N1X 18 Feb 2024 6:53 UTC
3 points
0
in reply to: ryan_b’s comment on: Leading The Parade
One way to identify counterfactually-excellent researchers would be to compare the magnitude of their “greatest achievement” and secondary discoveries, because the credit that parade leaders get is often useful for propagating their future success and the people who do more with that boost are the ones who should be given extra credit for originality (their idea) as opposed to novelty (their idea first). Newton and Leibniz both had remarkably successful and diverse achievements, which suggests that they were relatively high in counterfactual impact in most (if not all) of those fields. Another approach would consider how many people or approaches to a problem had tried and failed to solve it: crediting the zeitgeist rather than Newton and/or Leibniz specifically seems to miss a critical question, namely that if neither of them solved it, would it have taken an additional year, or more like 10 to 50? In their case, we have a proxy to an answer: ideas took months or years to spread at all beyond the “centers of discovery” at the time, and so although they clearly took only a few months or years to compete for the prize of first (and a few decades to argue over it), we can relatively safely conjecture that whichever anonymous contender is third in the running is likely to have been behind on at least that timescale. That should be considered in contrast to Andrew Wiles, whose proof of Fermat’s Last Theorem was efficiently and immediately published (and patched as needed). This is also important because other and in particular later luminaries of the field (e.g. Mengoli, Mercator, various Bernoullis, Euler, etc.) might not have had the vocabulary necessary to make as many discoveries as quickly as they did or communicate those discoveries as effectively if not for Newton & Leibniz’s timely contributions.

N1X 17 Sep 2023 4:38 UTC
3 points
2
in reply to: Martin Randall’s comment on: Snake Eyes Paradox
Right, and the correct value is ³⁷⁄₇₂, not ¹⁹⁄₃₆, because exactly half of the remaining ⁷⁰⁄₇₂ players lose (in the limit).

N1X 26 Aug 2023 5:59 UTC
10 points
4
on: Assume Bad Faith
I think that this post relies too heavily on a false binary. Specifically, the description of all arguments as “good faith” or “bad faith” completely ignores the (to my intuition, far likelier) possibility that most arguments begin primarily (my guess is 90% or so, but maybe I just tend not to hold arguments with people below 70%) in good faith, then people adjust according to their perception of their interlocutor(s), audience (if applicable), and the importance of the issue being argued. Common signals of arguments in particularly bad faith advanced by otherwise intelligent people include persistent reliance on most listed logical fallacies (dismissing that criticism and keeping a given point after its fallacious nature is clearly explained; sealioning and whataboutism are prototypical exemplars), moving the goalposts, and ignoring all contradictory evidence.

Another false binary: this also ignores the possibility of both sides in an argument being correct (or founded on correct factual data). For example, today I spent perhaps 10 minutes arguing over whether a cheese was labeled as Gouda or not, because I’d read the manufacturer’s label which did not contain that word but did say “Goat cheese of Holland” and my interlocutor read the price label from Costco which called it “goat Gouda.” I’m marginally more correct because I recognized the contradiction in terms (in the EU gouda can only be made from milk produced by Dutch cows), but neither of us was lying or arguing in bad faith, and yet I briefly struggled to believe that we were inhabiting the same reality and remembering it correctly. They were a very entertaining 10 minutes, but I wouldn’t want to have that kind of groundless argument more than once or twice a week, and that limit assumes a discussion of trivial topics as opposed to an ostensibly sincere debate on something which I hold dear.

N1X 8 Aug 2023 8:11 UTC
1 point
0
on: Seeking better name for “Effective Egoism”
As linked by @turchin, Ayn Rand already took “Rational Egoism” and predecessors took “Effective Egoism.” Personally, I think “Effective Hedonism” ought to be reserved for improving the efficiency of your expenditures (of time, money, natural resources, etc.) in generating hedons for yourself and possibly your circles of expanding moral concern (e.g. it’s not ineffective hedonism to buy a person you care about a gift which they’ll enjoy, and not entirely egocentric, and while you are allowed to care about your values in the world in this framework, you are focusing on the “feels” as opposed to the “net impact”). I’m writing up a post (my first) expanding upon this idea this week.