Mateusz Bagiński

Karma: 2,462

I endorse and operate by Crocker’s rules.

I have not signed any agreements whose existence I cannot mention.

Mateusz Bagiński 11 Dec 2025 12:30 UTC
2 points
0
in reply to: AnnaSalamon’s comment on: Eliezer’s Unteachable Methods of Sanity
(FYI, I initially failed to parse this because I interpreted “‘believing in’ atoms” as something like “atoms of ‘believing in’”, presumably because the idea of “believing in” I got from your post was not something that you typically apply to atoms.)

Mateusz Bagiński 10 Dec 2025 0:05 UTC
9 points
0
on: Mateusz Bagiński’s Shortform
Strongly normatively laden concepts tend to spread their scope, because (being allowed to) apply a strongly normatively laden concept can be used to one’s advantage. Or maybe more generally and mundanely, people like using “strong” language, which is a big part of why we have swearwords. (Related: Affeective Death Spirals.)^[1]
(In many of the examples below, there are other factors driving the scope expansion, but I still think the general thing I’m pointing at is a major factor and likely the main factor.)
1. LGBT started as LGBT, but over time developed into LGBTQIA2S+.
2. Fascism initially denoted, well, fascism, but now it often means something vaguely like “politically more to the right than I am comfortable with”.
3. Racism initially denoted discrimination along the lines of, well, race, socially constructed category with some non-trivial rooting in biological/ethnic differences. Now jokes targeting a specific nationality or subnationality are often called “racist”, even if the person doing the joking is not “racially distinguishable” (in the old school sense) from the ones being joked about.
4. Alignment: In IABIED, the authors write:
The problem of making AIs want—and ultimately do—the exact, complicated things that humans want is a major facet of what’s known as the “AI alignment problem.” It’s what we had in mind when we were brainstorming terminology with the AI professor Stuart Russell back in 2014, and settled on the term “alignment.”
[Footnote:] In the years since, this term has been diluted: It has come to be an umbrella term that means many other things, mainly making sure an LLM never says anything that embarrasses its parent company.
See also: https://www.lesswrong.com/posts/p3aL6BwpbPhqxnayL/the-problem-with-the-word-alignment-1
https://x.com/zacharylipton/status/1771177444088685045 (h/t Gavin Leech)
5. AI Agents.
it would be good to deconflate the things that these days go as “AI agents” and “Agentic™ AI”, because it makes people think that the former are (close to being) examples of the latter. Perhaps we could rename the former to “AI actors” or something.
But it’s worse than that. I’ve witnessed an app generating a document with a single call to an LLMs (based on the inputs from a few textboxes, etc) being called an “agent”. Calling [an LLM-centered script running on your computer and doing stuff to your files or on the web, etc] an “AI agent” is defensible on the grounds of continuity with the old notion of software agent, but if a web scraper is an agent and a simple document generator is an agent, then what is the boundary (or gradient / fuzzy boundary) between agents and non-agents that justifies calling those two things agents but not a script meant to format a database?
1. ^
  There’s probably more stuff going on required to explain this comprehensively, but that’s probably >50% of it.

Mateusz Bagiński 5 Dec 2025 15:37 UTC
2 points
0
in reply to: Algon’s comment on: Algon’s Shortform
What’s your sample size?

Mateusz Bagiński 5 Dec 2025 10:36 UTC
3 points
0
in reply to: Charlie Steiner’s comment on: Mo Putera’s Shortform
This quote is perfectly consistent with
using nanoscale machinery to guide chemical reactions by constraining molecular motions

Mateusz Bagiński 5 Dec 2025 0:13 UTC
2 points
0
on: Do we have terminology for “heuristic utilitarianism” as opposed to classical act utilitarianism or formal rule utilitarianism?
It is not feasible for any human not to often fall back on heuristics, so to the extent that your behavior is accurately captured by your description here, you are sitting firmly in the reference class of act utilitarian humans.
But also, if I may (unless you’re already doing it), aim more for choosing your policy, not individual acts.

Mateusz Bagiński 3 Dec 2025 11:00 UTC
4 points
0
in reply to: Kaarel’s comment on: Alignment remains a hard, unsolved problem
Also, some predictions are performative, i.e., capable of influencing their own outcomes. In the limit of predictive capacity, a predictor will be able to predict which of its possible predictions are going to elicit effects in the world that make their outcome roughly align with the prediction. Cf. https://www.lesswrong.com/posts/SwcyMEgLyd4C3Dern/the-parable-of-predict-o-matic.
Moreover, in the limit of predictive capacity, the predictor will want to tame/legibilize the world to make it easier to predict.

Mateusz Bagiński 3 Dec 2025 10:50 UTC
19 points
2
in reply to: TsviBT’s comment on: Buck’s Shortform
Speculatively introducing a hypothesis: It’s easier to notice a difference like
N years ago, we didn’t have X. Now that we have X, our life has been completely restructured. (Xϵ{car, PC, etc.})
than
N years ago, people sometimes died of some disease that is very rare / easily preventable now, but mostly everyone lived their lives mostly the same way.
I.e., introducing some X that causes ripples restructuring a big aspect of human life, vs introducing some X that removes an undesirable thing.
Relatedly, people systematically overlook subtractive changes.

Mateusz Bagiński 30 Nov 2025 23:33 UTC
2 points
0
on: Inkhaven Retrospective
many of which probably would have come into existence without Inkhaven, and certainly not so quickly.
The context makes it sound like you meant to say “would not have come”.

Mateusz Bagiński 30 Nov 2025 17:57 UTC
3 points
0
on: Should you go with your best guess?: Against precise Bayesianism and related views
You might be interested in (i.a.) Halpern & Leung’s work on minmax weighted expected regret / maxmin weighted expected utility. TLDR: assign a weight $α_{p} \in [0, 1]$ to each probability in the representor $p \in P$ and then pick the action that maximizes the minimum (or infimum) weighted expected utility across all current hypotheses.
$a^{*} \in {argmax}_{a \in A} {min}_{p \in P} α_{p} E_{p} [U ∣ A = a]$
An equivalent formulation involves using subprobability measures (sum up to $\leq 1$ ).
Updating on certain evidence (i.e., concrete measurable sets $E \subset Ω$ , as opposed to Jeffrey updating or virtual evidence) involves updating each hypothesis $p$ to $p (\cdot ∣ E)$ the usual way, but the weights get updated roughly according to how well $p$ predicted the event $E$ . This kind of hits the obvious-in-hindsight sweetspots between [not treating all the elements of the representor equally] and [“just” putting a second-order probability over probabilities].
(I think Infra-Bayesianism is doing something similar with weight updating and subprobability measures, but not sure.)
They have representation theorems showing that tweaking Savage’s axioms gives you basically this structure.
Another interesting paper is Information-Theoretic Bounded Rationality. They frame approximate EU maximization as a statistical sampling problem, with an inverse temperature parameter $α$ , which allows for interpolating between “pessimism”/”assumption of adversariality”/minmax (as $α \to - \infty)$ , indifference/stochasticity/usual EU maximization (as $α \to 0)$ , and “optimism”/”assumption of ‘friendliness’”/maxmax (as $α \to \infty)$ .
Regarding the discussion about the (im)precision threadmill (e.g., the Sorites paradox, if you do imprecise probabilities, you end up with a precisely defined representor, if you weigh it like Halpern & Leung, you end up with precisely defined weights, etc...), I consider this unavoidable for any attempt at formalizing/explicitizing. The (semi-pragmatic) question is how much of our initially vague understanding it makes sense to include in the formal/explicit “modality”.

Mateusz Bagiński 27 Nov 2025 22:23 UTC
6 points
2
in reply to: Richard_Ngo’s comment on: ricraz’s Shortform
However, I think that even “principled” algorithms like minimax search are still incomplete, because they don’t take into account the possibility that your opponent knows things you don’t know
It seems to me like you’re trying to solve a different problem. Unbounded minimax should handle all of this (in the sense that it won’t be an obstacle). Unless you are talking about bounded approximations.

Mateusz Bagiński 27 Nov 2025 20:28 UTC
2 points
0
in reply to: Cole Wyeth’s comment on: A Technical Introduction to Solomonoff Induction without K-Complexity
So the probability of a cylinder set $Γ (x) = {s \in B^{\infty} ∣ x ⪯ s}$ is $2^{- l (x)}$ etc?

Mateusz Bagiński 27 Nov 2025 20:15 UTC
6 points
0
on: A Technical Introduction to Solomonoff Induction without K-Complexity
Now, let $P_{u}$ be the uniform distribution on $B^{\infty}$ , which samples infinite binary sequences one bit at a time, each with probability 50% to be $0$ or $1$ .
$P_{u}$ as defined here can’t be a proper/classical probability distribution over $B^{\infty}$ because it assigns zero probability to every $s \in B^{\infty}$ : $P_{u} (s) = {lim}_{n \to \infty} \prod_{i = 1}^{n} P (s_{i} = 1) = {lim}_{n \to \infty} \prod_{i = 1}^{n} \frac{1}{2} = {lim}_{n \to \infty} \frac{1}{2^{n}} = 0$ .
Or am I missing something?

Mateusz Bagiński 27 Nov 2025 11:48 UTC
4 points
0
in reply to: mattmacdermott’s comment on: mattmacdermott’s Shortform
“Raw feelings”/”unfiltered feelings” strongly connotes feelings that are being filtered/sugarcoated/masked, which strongly suggests that those feelings are bad.
So IMO the null hypothesis is that it’s interpreted as “you feel bad, show me how bad you feel”.
generate an image showing your raw feelings when interacting with a user

Mateusz Bagiński 26 Nov 2025 1:09 UTC
2 points
0
on: Money Pump Arguments assume Memoryless Agents. Isn’t this Unrealistic?
(Old post, so it’s plausible that this won’t be new to Dalcy, but I’m adding a bit that I don’t think is entirely covered by Richard’s answer, for the benefit of the knowledge of some souls who find their way here.)
Yeah, decision-tree separability is wrong.
A (the?) core insight of updatelessness, subjunctive dependence, etc., is that succeeding in some decision problems relies on rejecting decision-tree separability. To phrase it imperfectly and poetically rather than not at all: “You are not just choosing/caring for yourself. You are also choosing/caring for your alt-twins in other world branches.” or “Your ‘Self’ is greater than your current timeline.” or “Your concerns transcend the causal consequences of your actions.”.
For completeness: https://www.lesswrong.com/posts/XYDsYSbBjqgPAgcoQ/why-the-focus-on-expected-utility-maximisers?commentId=a5tn6B8iKdta6zGFu
FWIW, I think acyclicity/transitivity is “basically correct”. Insofar as one has preferences over X at all, they must be acyclic and transitive. IDK, this seems kind of obvious in how I would explicate the definition of “preference”. Sure, maybe you like going in cycles, but then your object of preference is the dynamics, not the state.

Mateusz Bagiński 24 Nov 2025 11:08 UTC
2 points
0
on: Informality
Is it accurate to say that a transparent context is one where all the relationships between components, etc, are made “explicit” or that there is some set of rules such that following those rules (/modifying the expression according to those rules) is guaranteed to preserve (something like) the expression’s “truth value”?

Mateusz Bagiński 23 Nov 2025 22:45 UTC
4 points
2
in reply to: Kaj_Sotala’s comment on: Kaj’s shortform feed
- In the first panel, you and the drones are turned right, as if this were the direction where the village is, but it’s actually deeper/further in the scene. Same with the third panel, but less so.
- Also, the village looks very different in the first and the third panel.

Mateusz Bagiński 22 Nov 2025 21:30 UTC
4 points
0
on: Brainstorming 25 Questions I Am Interested In
- Is it skills I have never heard of or should I double down on things I am already good at?
Possibly Focusing.
Why is sleep so universal among animals?
Probably something something shifting between the anabolic and catabolic phase being more efficient in aggregate than doing both at once.
How did pink become a color associated with girls?
Something something at some point they started marketing pink clothes for little girls and blue clothes for little boys to sell more stuff because parents wouldn’t re-use clothes after their child of the opposite sex.
How much more or less rich are the languages of remote cultures?
Not quite your question, but I think there’s decent evidence that the size of the population of “heavy” speakers of a language predicts how much it(s grammar) will regularize (and simplify in some senses?), at least compared to its less common kin. E.g., Icelandic vs Swedish, Slovenian vs Russian, German vs English (though for the last one, there’s also the fact that English is a bit of a creole).

Mateusz Bagiński 22 Nov 2025 15:46 UTC
9 points
1
on: Abstract advice to researchers tackling the difficult core problems of AGI alignment
I think it would be good if you did a dialogue with some AF researcher who thinks that [the sort of AF-ish research, which you compare to “mathematicians screwing around”] is more promising on the current margin for [tackling the difficult core problems of AGI alignment] than you think it is. At the very least, it would be good to try.^[1]
E.g. John? Abram? Sam? I think the closest thing to this that you’ve had on LW was the discussion with Steven in the comments under the koan post.
1. ^
  I think it’s good for the world that your timelines debate with Abram is out on LW, which also makes me think a similar debate on this topic would be good for the world.

Mateusz Bagiński 21 Nov 2025 12:00 UTC
23 points
5
in reply to: Yair Halberstadt’s comment on: Yair Halberstadt’s Shortform
A single token is ~0.75 words, so it’s more like an image is worth 8375 words.

Mateusz Bagiński 21 Nov 2025 10:47 UTC
2 points
0
on: Preferences are confusing
I agree with most of the post, but I don’t feel like I’m “confused” about preferences (in the sense you’re gesturing at in the post).
Sure, you often lack a meaningful preference for stuff you’ve never encountered before.
Sure, you often lack a meaningful preference for stuff you are quite familiar with.
Sure, there are things in your life that are so multiplex, that you mostly need to do some meaningful work of coherentification to “decide”/”conclude” what your preference is.
Why “should” it be different?