This comment seems to imply Nisan missed something, but normal rye sourdough bread without any preservatives easily lasts (edit: should have said “can easily last under the right circumstances”) 7 days before going stale. Of course people can mean different things by “real old fashioned bread” but afaik sourdough bread was the standard method for most of human history.
Steveot
In your first argument, it seems to me that you are arguing against virtue-based ethics under the assumption that consequentialism is true. So in your argument, the only real value may arise from good consequences (however those are defined), while for virtue-based ethics (if I understand correctly) the value would arise from truly acting virtuously (whatever that means). In my mind, neither can really be true (it seems like a choice). However, framing it like this would allow for something like a reverse of your argument, made within the framework of virtue ethics and directed against consequentialism:
“If you actually have values then thinking about how to act is just taking these values seriously. Consequentialism, by contrast, optimizes for looking like you did the right thing based on the consequences of your actions rather than actually performing virtuous actions. An AI that is deeply committed to the consequence of inducing certain sensory experiences in a human but does not carefully think about which actions are actually virtuous is not one I’d want in charge of anything.”
(I’m deeply confused about anything with values/ethics, so it’s quite possible none of this makes sense.)
Thanks, I really like these concepts; Grice’s maxims in particular were new to me and seem very useful. Your list also got me thinking, and I feel like I have some (obvious) concepts in mind which I often usefully apply but which may not be so well known:
1. Data-processing inequality
2. Public good games
3. Evolutionarily stable equilibrium
4. Don’t be results oriented (when acting in stochastic environments)
The data-processing inequality is often useful, especially when thinking about automated tools like LLMs. It states that for any fixed channel K, if Y is the output of X processed by the channel K, then the mutual information between X and Z is always at least as large as that between Y and Z. E.g., if you tell an LLM simply “refine the following paragraph”, and the goal of the paragraph is to transmit information from your brain to a reader, then using the LLM with this prompt can only destroy information (because the information is processed by a fixed channel which does not know the contents of your mind). Also important are the cases where the data-processing inequality does not directly apply, e.g. with a prompt like “refine the following paragraph, note that what I want to communicate is [more detailed description than the paragraph alone]”.
2. and 3. are just particularly useful concepts from game theory. I see public good games everywhere (e.g., climate, taking on duties in a community, etc.) and actually think many situations are sufficiently well explained by a very simple modeling as a public good game. Evolutionarily stable equilibrium is a stronger concept than Nash equilibrium and is useful for thinking about which equilibria actually occur in society. E.g., especially for large games with many players and mixed strategies, it’s a useful concept for thinking about cultural or group norms, globalization, etc.
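To illustrate the public-good-game point, here is a minimal sketch with toy parameters of my own (n players each contribute 1 or 0, contributions are multiplied by r and shared equally; with 1 < r < n, free-riding dominates individually even though universal contribution is better for everyone):

```python
# Minimal public good game (toy parameters of my own choosing).
n, r = 4, 2.0  # 1 < r < n, so the dilemma arises

def payoff(my_contrib, others_contribs):
    """Share of the multiplied public pot, minus my own contribution."""
    total = my_contrib + sum(others_contribs)
    return (r * total) / n - my_contrib

# Free-riding beats contributing regardless of what the others do:
assert payoff(0, [1, 1, 1]) > payoff(1, [1, 1, 1])
assert payoff(0, [0, 0, 0]) > payoff(1, [0, 0, 0])
# ...yet everyone contributing beats everyone free-riding:
assert payoff(1, [1, 1, 1]) > payoff(0, [0, 0, 0])
```

This is the whole structure of the dilemma in a few lines: defection is dominant for each individual, but mutual cooperation is the social optimum.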
The last one is probably well known to everyone who seriously played online poker or did sports betting, but it applies more generally. Roughly speaking, if you get money into the pot while having an 80% chance of winning, don’t focus on whether you eventually win or lose the hand. The feedback for your actions should be the assessed correctness of your actions, excluding factors completely independent of your actions. So basically: de-noise as much as possible (without destroying information).
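A toy numeric sketch of the poker example (parameters invented for illustration): the quality of the call is its expected value, which stays the same no matter how any single hand turns out:

```python
# "Don't be results oriented": judge the decision by its expected value,
# not by the realized outcome of one hand. Toy numbers of my own.
import random

random.seed(0)

def call_ev(pot, cost, win_prob):
    """Expected value of calling: win the pot with win_prob, else lose cost."""
    return win_prob * pot - (1 - win_prob) * cost

ev = call_ev(pot=100, cost=20, win_prob=0.8)  # positive, so calling is correct

# Simulate ten such hands: some will be losses, but every one of those
# calls was still the right decision, because ev never changed.
results = [100 if random.random() < 0.8 else -20 for _ in range(10)]
```

The de-noising point is exactly that `ev` is the clean feedback signal, while `results` mixes it with noise that is independent of the decision.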
Edit: Just to clarify the details of the data-processing inequality because I noticed Wikipedia uses different notation: The (Markov) model is Z-X-Y in my description, and in the example Z is the reader, X the brain and Y the output of the LLM.
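To make the inequality concrete, here is a minimal numeric sketch (all distributions and channel parameters are my own toy example): Z, X, Y are binary, Y is X pushed through a fixed binary symmetric channel K, and we check that I(Z;Y) ≤ I(Z;X):

```python
# Toy check of the data-processing inequality for a Markov chain Z - X - Y.
# All numbers are a made-up example; Z, X, Y are binary throughout.
import itertools
import math

# Joint distribution p(z, x): Z and X agree 80% of the time.
p_zx = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Fixed channel K = p(y | x): a binary symmetric channel, flip prob 0.2.
def channel(y, x, flip=0.2):
    return 1 - flip if y == x else flip

# Markov property Z - X - Y: p(z, x, y) = p(z, x) * K(y | x).
p_zxy = {(z, x, y): p_zx[(z, x)] * channel(y, x)
         for z, x, y in itertools.product([0, 1], repeat=3)}

def marginal(keep):
    """Marginalize p(z, x, y) down to the coordinates listed in `keep`."""
    out = {}
    for triple, p in p_zxy.items():
        key = tuple(triple[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_information(joint):
    """I(A; B) in bits, computed from a joint distribution over pairs."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

i_zx = mutual_information(marginal((0, 1)))  # I(Z; X)
i_zy = mutual_information(marginal((0, 2)))  # I(Z; Y)
# Passing X through the fixed channel K can only lose information about Z:
assert i_zy <= i_zx
```

In the LLM analogy, Z is the reader's reconstruction target, X the paragraph in your head, and the channel is the fixed prompt; no choice of K can push I(Z;Y) above I(Z;X).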
Tangential comment, but one where I’d be interested in how people in this community feel: When you wrote about the part with the meetup and sign saying “I might be lying”, I immediately thought how little fun it must have been, and even how badly it might have felt for others attending. In my mind, people attending a meetup don’t want to be lied to, even if you semi-communicated it (I say “semi” because the statement on the sign was trivially true for every person. You did not clearly state you were definitely going to lie about certain things) and in the context of a “social experiment”. To me it seems quite similar to people wearing signs saying “I might be rude” and then actually being rude.
Another intuition I often found useful: KL-divergence behaves more like the square of a metric than a metric.
The clearest indicator of this is that KL-divergence satisfies a kind of Pythagorean theorem, established in a paper by Csiszár (1975), see https://www.jstor.org/stable/2959270#metadata_info_tab_contents . The intuition is exactly the same as in the Euclidean case: If we project a point A onto a convex set S (say the projection is B), and if C is another point in the set S, then the standard Pythagorean theorem tells us that the angle of the triangle ABC at B is at least 90 degrees, or in other words $\|A-C\|^2 \geq \|A-B\|^2 + \|B-C\|^2$. And the same holds if we project with respect to KL divergence: we end up having $D(C\|A) \geq D(C\|B) + D(B\|A)$.
This has implications if you think about things like sample efficiency (instead of a square root rate as usual, convergence rates with KL divergence usually behave like 1/n).
This is also reflected in the relation between KL divergence and other distances between probability measures, like total variation or Wasserstein distance. The most prominent example in this regard is Pinsker’s inequality, which states that the total variation distance between two measures is bounded by a constant times the square root of the KL-divergence between them.
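As a quick sanity check of this square-root relationship, here is a minimal numeric sketch of Pinsker's inequality in the form TV(P, Q) ≤ sqrt(D_KL(P‖Q)/2), with two made-up distributions on three outcomes:

```python
# Numeric check of Pinsker's inequality for two toy distributions.
import math

P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]

# KL divergence D(P||Q) in nats (terms with p = 0 contribute nothing).
kl = sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)
# Total variation distance.
tv = 0.5 * sum(abs(p - q) for p, q in zip(P, Q))

# Pinsker: TV is controlled by the *square root* of KL.
assert tv <= math.sqrt(kl / 2)
```

The squared-metric intuition is visible here: TV (a genuine metric) compares to the square root of KL, not to KL itself.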
It’s not a mathematical argument, but here I first came across such an analogy drawn between training of neural networks and evolution, and a potential interpretation of what it means in terms of sample-(in)efficiency.
I thought about Agency Q4 (counterargument to Pearl) recently, but couldn’t come up with anything convincing. Does anyone have a strong view/argument here?
I like the idea a lot.
However, I really need simple systems in my work routine. Things like “hitting a stopwatch, dividing by three, and carrying over previous rest time” already feels like it’s a lot. Even though it’s just a few seconds, I prefer if these systems take as little energy as possible to maintain.
What I thought of was using a simple shell script: just start it at the beginning of work, and hit a random key whenever I switch from work to rest or vice versa. It automatically keeps track of my break times.
I don’t have Linux at home, but what I tried online ( https://www.onlinegdb.com/online_bash_shell ) is the following. (I am terrible at shell scripting, so this is definitely not optimal, but I want to try something like this in the coming weeks. Perhaps one might want an additional warning or alarm sound if the break time drops below 0, but for me just keeping track is enough, I think.)
convertsecs() {
  ((h=${1}/3600))
  ((m=(${1}%3600)/60))
  ((s=${1}%60))
  printf "%02d:%02d:%02d\n" $h $m $s
}

flex_pomo() {
  current=0
  resttime=0
  total=0
  while true; do
    # Work phase: every 3 seconds of work earns 1 second of rest.
    until read -s -n 1 -t 0.01; do
      sleep 3
      current=$(( current + 3 ))
      resttime=$(( resttime + 1 ))
      total=$(( total + 3 ))
      printf "\rCurrently working: Current interval: $(convertsecs $current), accumulated rest: $(convertsecs $resttime), total worktime: $(convertsecs $total) "
    done
    printf "\nSwitching to break\n"
    current=0
    # Rest phase: spend the accumulated rest time.
    until read -s -n 1 -t 0.01; do
      sleep 3
      current=$(( current + 3 ))
      resttime=$(( resttime - 3 ))
      printf "\rCurrently resting: Current interval: $(convertsecs $current), accumulated rest: $(convertsecs $resttime), total worktime: $(convertsecs $total) "
    done
    printf "\nSwitching to work\n"
    current=0
  done
}

flex_pomo
Thanks, I finally got it. What I just now fully understood is that the final inequality holds with high probability (i.e., as you say, is the data), while the learning bound or loss reduction is given for .
Thanks, I was wondering what people referred to when mentioning PAC-Bayes bounds. I am still a bit confused. Could you explain how and depend on (if they do) and how to interpret the final inequality in this light? Particularly I am wondering because the bound seems to be best when . Minor comment: I think ?
The main thing that caught my attention was that random variables are often assumed to be independent. I am not sure if it is already included, but if one wants to allow for adding, multiplying, taking mixtures etc of random variables that are not independent, one way to do it is via copulas. For sampling based methods, working with copulas is a way of incorporating a moderate variety of possible dependence structures with little additional computational cost.
The basic idea is to take the dependence structure of some tractable multivariate random variable (e.g., one where we can produce samples quickly, like a multivariate Gaussian) and transfer that dependence structure to the individual one-dimensional distributions one would like to add, multiply, etc.
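A minimal sketch of this copula idea using only the Python standard library (the correlation, sample size, and marginals below are arbitrary choices of mine for illustration):

```python
# Gaussian-copula sampling: borrow the dependence of a bivariate Gaussian
# and impose it on arbitrary one-dimensional marginals. Toy parameters.
import math
import random
from statistics import NormalDist

random.seed(0)
nd = NormalDist()
rho = 0.8      # dependence strength of the underlying Gaussian
n = 20000

xs, ys = [], []
for _ in range(n):
    # 1. Sample a correlated bivariate standard Gaussian.
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho**2) * random.gauss(0, 1)
    # 2. The Gaussian CDF turns each coordinate into a uniform; the pair
    #    of uniforms carries the Gaussian dependence (this is the copula).
    u1, u2 = nd.cdf(z1), nd.cdf(z2)
    # 3. Inverse CDFs impose whatever marginals you like.
    xs.append(-math.log(1 - u1))         # Exponential(1) marginal
    ys.append(math.exp(nd.inv_cdf(u2)))  # log-normal marginal

# The marginals are exponential / log-normal, but the samples remain
# positively dependent, so sums, products, etc. reflect that dependence.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
corr = cov / (sx * sy)
```

The computational cost over independent sampling is just the Gaussian draw plus two CDF evaluations per sample, which is what makes this attractive for sampling-based tools.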
Maybe you’re right and it’s climate, perhaps I also meant something unusual by “stale”, or it could be a cultural difference between the US and Europe? I’m talking about bread like this (https://www.lazycatkitchen.com/sourdough-rye-bread-beginner-friendly/), stored in a bread box and perhaps wrapped in a cotton dish towel. Of course it does get a bit harder over time, but I can usually still eat it like completely normal bread even after 7 days. And note that I agree that pure wheat sourdough bread does get stale more quickly.