Viktor Rehnberg

Karma: 173

Viktor Rehnberg 14 Aug 2025 10:14 UTC
1 point
0
on: Social Dark Matter
Mostly unrelated to the content of the post, but looking at the distributions in this image

this reminds me quite a lot of the anecdote about a Poincaré and the baker.

The anecdote goes:

“[...] Poincaré, who made a habit of picking up a loaf of bread each day, noticed after weighing his loaves that they averaged about 950 grams instead of the 1000 grams advertised. He complained to the authorities and afterwards received bigger loaves. Still he had a hunch that something about his bread wasn’t kosher. And so with the patience only a famous—or at least tenured—scholar can afford, he carefully weighed his bread every day for the next year. Though his bread now averaged closer to 1000 grams, if the baker had been honestly handing him random loaves the number of loaves heavier and lighter than the mean should [...] have diminished following the bell shaped pattern of the error law. Instead, Poincaré found that there were too few light loaves and a surplus of heavy ones.”—The Drunkards Walk pp 155-156, Leonard Mlodinow

Now this anecdote is probably false and the exact distribution of a selection from the tale depends on the exact mechanics of the selection effect. I still find useful when thinking of selections from normal distributions.

If something doesn’t look normal then there is probably a dominant factor shaping the distribution (compared to many small which creates the normal shape).

Viktor Rehnberg 27 Mar 2025 14:54 UTC
2 points
0
on: Recent AI model progress feels mostly like bullshit
Another hypothesis: Your description of the task is

the hard parts of application pentesting for LLMs, which are 1. Navigating a real repository of code too large to put in context, 2. Inferring a target application’s security model, and 3. Understanding its implementation deeply enough to learn where that security model is broken.

From METR’s recent investigation on long tasks you would expect current models not to perform well on this.

I doubt a human professional could do the tasks you describe in something close to an hour, so perhaps its just currently too hard and the current improvements don’t make much of a difference for the benchmark, but it might in the future.

Viktor Rehnberg 8 Nov 2024 14:02 UTC
2 points
2
in reply to: L Rudolf L’s comment on: Survival without dignity
(Perhaps you’re thinking of this https://www.lesswrong.com/posts/EKu66pFKDHFYPaZ6q/the-hero-with-a-thousand-chances)

Viktor Rehnberg 8 Apr 2024 7:25 UTC
3 points
0
in reply to: Malentropic Gizmo’s comment on: Should you refuse this bet in Technicolor Sleeping Beauty?
Good formulation. “Given it’s Monday” can have two different meanings:
- you learn that you will only be awoken on Monday, then it’s 50%
- you awake assign ¹⁄₃ probability to each instance and then make the update $P (T | M) = P (M | T) P (T) / P (M) = (1 / 2) (2 / 3) / (2 / 3) = 50 %$
So it turns out to 50 % for both but it wasn’t initially obvious to me that these two ways would have the same result.

Viktor Rehnberg 8 Apr 2024 7:14 UTC
5 points
0
in reply to: Ape in the coat’s comment on: Should you refuse this bet in Technicolor Sleeping Beauty?
I’d say $P (Tail | Wake-up) = 2 / 3$

Viktor Rehnberg 4 Apr 2024 15:43 UTC
5 points
1
on: Should you refuse this bet in Technicolor Sleeping Beauty?
The possible observer instances and their probability are:
- Heads 50 %
  Red room 25 %
  Blue room 25 %
- Tails 50 %
  Red room 50 % (On Monday or Tuesday)
  Blue room 50 % (On Monday or Tuesday)
If I choose a strategy “bet only if blue” (or equivalentely “bet only if red”) then expected value for this strategy is $(- 300) * 0.25 + 200 * 0.5 = 25$ so I choose to follow this strategy.
I don’t remember what halfer and thirder were or what position I consider to be correct.

Viktor Rehnberg 28 Mar 2024 6:24 UTC
1 point
0
on: Was Releasing Claude-3 Net-Negative?
Capabilities leakages don’t really “increase race dynamics”.
Do people actually claim this? Shorter timelines seems like a more reasonable claim to make. To jump directly to impacts on race dynamics is skipping at least one step.

Viktor Rehnberg 19 Sep 2023 17:31 UTC
6 points
0
on: Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust
To me it feels like this policy is missing something that accounts for a big chunk of the risk.

While recursive self-improvement is covered by the “Autonomy and replication” point, there is another risk from actors that don’t intentionally cause large scale harm but use your system to make improvements to their own systems as they don’t follow your RSP. This type of recursive improvement doesn’t seem to be covered by any of “Misuse” or “Autonomy and replication”.

In short it’s about risks due to shortening of timelines.

Viktor Rehnberg 30 May 2023 6:45 UTC
1 point
0
on: How to have Polygenically Screened Children
You can see twin birth rates fell sharply in the late 90s
Shouldn’t this be triplet birthrates? Twin birthrates look pretty stable in comparison.

Viktor Rehnberg 5 Apr 2023 18:02 UTC
1 point
0
in reply to: Caridorc Tergilti’s comment on: Some 2-4-6 problems
Hmm, yeah it’s a bit hard to try stuff when there’s no good preview. Usually I’d recommend rot13 chiffer if all else fails but for number sequences that makes less sense.

Viktor Rehnberg 31 Mar 2023 6:31 UTC
7 points
0
on: Some 2-4-6 problems
I knew about 2-4-6 problem from HPMOR, I really like the opportunity to try it out myself. These are my results on the four other problems:

indexA

Number of guesses:

8 guesses of which 3 were valid and 5 non-valid

Guess:

“A sequence of integers whose sum is non-negative”

Result: Failure

indexB

Number of guesses:

39 of which 23 were valid 16 non-valid

Guess:

“Three ordered real numbers where the absolute difference between neighbouring numbers is decreasing.”

Result: Success

indexC

Number of guesses:

21 of which 15 were valid and 6 non-valid

Guess:

“Any three real numbers whose sum is less than 50.”

Result: Success

indexD

Number of guesses:

16 of which 8 were valid and 8 non-valid

Guess:

“First number is a real number and the other two are integers divisible by 5”

Result: Failure

Performance analysis

I’d say that the main failure modes were that I didn’t do enough tests and I was a very bad number generator. For example, in indexD

I made 9 tests to test my final hypothesis 4 of which were valid, that my guess and the actual rule would give the same result for these 9 tests if I were actually good at randomizing is very small.

I could also say that I was a bit naive on the first test and that I’d grown overconfident after two successive successes for the final test.

Viktor Rehnberg 31 Mar 2023 6:03 UTC
1 point
0
in reply to: Caridorc Tergilti’s comment on: Some 2-4-6 problems
See FAQ for spoiler tags, it seems mods haven’t seen your request. https://www.lesswrong.com/faq#How_do_I_insert_spoiler_protections_

Viktor Rehnberg 23 Feb 2023 19:59 UTC
1 point
0
on: Please don’t throw your mind away
These problems seemed to me similar to the problems at the International Physicist’s Tournament. If you want more problems check out https://iptnet.info

Viktor Rehnberg 17 Feb 2023 9:06 UTC
1 point
0
in reply to: dxu’s comment on: Seeking Power is Convergently Instrumental in a Broad Class of Environments
In case anyone else is looking for a source a good search term is probably the Beal Effect. From the original paper by Beal and Smith:

Once the effect is pointed out, it does not take long to arrive at the conclusion that it arises from a natural correlation between a high branching factor in the game tree and having a winning move available. In other words, mobility (in the sense of having many moves available) is associated with better positions

Viktor Rehnberg 20 Jan 2023 13:19 UTC
1 point
0
in reply to: Viktor Rehnberg’s comment on: finite, actual infinity, potential infinity
Or a counterexample from the other direction would be that you can’t describe a uniform distribution of the empty set either (I think). And that would feel even weirder to call “bigger”.

Viktor Rehnberg 20 Jan 2023 13:15 UTC
2 points
1
on: finite, actual infinity, potential infinity
Why would this property mean that it is “bigger”? You can construct a uniform distribution of a uncountable set through a probability density as well. However, using the same measure on a countably infinite subset of the uncountable set would show that the countable set has measure 0.

Viktor Rehnberg 13 Oct 2022 6:48 UTC
1 point
0
in reply to: cubefox’s comment on: Can you define “utility” in utilitarianism without using words for specific human emotions?
So we have that

[...] Richard Jeffrey is often said to have defended a specific one, namely the ‘news value’ conception of benefit. It is true that news value is a type of value that unambiguously satisfies the desirability axioms.

but at the same time

News value tracks desirability but does not constitute it. Moreover, it does not always track it accurately. Sometimes getting the news that X tells us more than just that X is the case because of the conditions under which we get the news.

And I can see how starting from this you would get that $U (⊤) = 0$ . However, I think one of the remaining confusions is how you would go in the other direction. How can you go from the premise that we shift utilities to be $0$ for tautologies to say that we value something to a large part from how unlikely it is.

And then we also have the desirability axiom

$U (A \lor B) = \frac{P (A) U (A) + P (B) U (B)}{P (A) + P (B)}$

for all $A$ and $B$ such that $P (A \land B) = 0$ together with Bayesian probability theory.

What I was talking about in my previous comment goes against the desirability axiom in the sense that I meant that for $X = "Sun with probability p and rain with probability (1 - p) "$ in the more general case there could be subjects that prefer certain outcomes proportionally more (or less) than usual such that $U (X) \neq p U (Sun) + (1 - p) U (Rain)$ for some probabilities $p$ . As the equality derives directly from the desirability axiom, it was wrong of me to generalise that far.

But, to get back to the confusion at hand we need to unpack the tautology axiom a bit. If we say that a proposition $⊤$ is a tautology if and only if $P (⊤) = 1$ ^[1], then we can see that any proposition that is no news to us has zero utils as well.

And I think it might be well to keep in mind that learning that e.g. sun tomorrow is more probable than we once thought does not necessarily make us prefer sun tomorrow less, but the amount of utils for sun tomorrow has decreased (in an absolute sense). This comes in nicely with the money analogy because you wouldn’t buy something that you expect with certainty anyway^[2], but this doesn’t mean that you prefer it any less compared to some other worse outcome that you expected some time earlier. It is just that we’ve updated from our observations such that the utility function now reflects our current beliefs. If you prefer $A$ to $B$ then this is a fact regardless of the probabilities of those outcomes. When the probabilities change, what is changing is the mapping from proposition to real number (the utility function) and it is only changing with an shift (and possibly scaling) by a real number.

At least that is the interpretation that I’ve done.
1. ↩︎
  This seems reasonable but non-trivial to prove depending on how we translate between logic and probability.
2. ↩︎
  If you do, you either don’t actually expect it or has a bad sense of business.

Viktor Rehnberg 12 Oct 2022 16:42 UTC
3 points
0
in reply to: PeterMcCluskey’s comment on: Actually, All Nuclear Famine Papers are Bunk
Skimming the methodology it seems to be a definite improvement and does tackle the short-comings mentioned in the original post to some degree at least.

Viktor Rehnberg 12 Oct 2022 15:59 UTC
1 point
0
in reply to: cubefox’s comment on: Can you define “utility” in utilitarianism without using words for specific human emotions?
Isn’t that just a question whether you assume expected utility or not. In the general case it is only utility not expected utility that matters.

Viktor Rehnberg 11 Oct 2022 16:59 UTC
2 points
0
in reply to: cubefox’s comment on: Can you define “utility” in utilitarianism without using words for specific human emotions?

Anyway, someone should do a writeup of our findings, right? :)

Sure, I’ve found it to be an interesting framework to think in so I suppose someone else might too. You’re the one who’s done the heavy lifting so far so I’ll let you have an executive role.

If you want me to write up a first draft I can probably do it end of next week. I’m a bit busy for at least the next few days.

Viktor Rehnberg

indexA

indexB

indexC

indexD

Performance analysis