harfe

Karma: 655

harfe 21 Jun 2026 11:36 UTC
LW: 1 AF: 1
0
AF
on: An Introduction to Reinforcement Learning for Understanding Infra-Bayesianism

Definition: Probability distributions on histories induced by a policy and environment

The probability distribution as defined does not sum up to 1. I think it should be , not .

It also feels like a definition defining on destinies is missing. The footnote already explains how. The rest of the post makes more sense to me if, by default, .

harfe 7 May 2026 23:37 UTC
1 point
0
in reply to: leogao’s comment on: leogao’s Shortform
For the orbital data centers, it makes more sense if you think of it as many small satellites rather then a few big ones.

SpaceX’s FCC filing talks of up to one million satellites in sun-synchronous orbit, (PDF here).

This makes the issues with cooling more manageable. The starlink v3 satellites are supposed to have 20kw of power each, so radiating out this waste heat on these scales is not impossible (Elon tweets about ~100kw per ton of satellite, and designing GPUs to run at higher temperatures).

Such an orbital GPU cloud would make more sense for inference, not training.

Overall I am not yet convinced that this is competitive with earth-based data centers, but it seems less stupid than I imagined at first.

harfe 21 Apr 2026 12:03 UTC
1 point
0
on: Today’s Ring Signatures and Related Tools
For the purpose of whistle-blowing, I wonder whether Signal could be used as a source of keys tied to identities of real people. The advantage would be that lots of OpenBrain employees might already use signal. But there are certainly some difficulties:
- Ring signatures can not be publicly verified, but they could be verified by a journalist who gets their hands on lots of Signal contacts of OpenBrain employees.
- Verifying that all the signal contacts in the ring actually belong to OpenBrain employees is difficult.
- The public keys are not visible in the normal signal client, so specialized tools would need to be created.

harfe 10 Apr 2026 8:29 UTC
5 points
0
on: harfe’s Shortform
There was report that the CIA used a new tool called Ghost Murmur to detect the electromagnetic signals of a human heart from (40?) miles away, using long-range quantum magnetometry.

See also Wikipedia.

My first guess (and still a hypothesis) is that this is deliberate disinformation by the US, but i do not have the expertise required to judge the plausibility. In any case, it could have been an interesting question on the “Could a superintelligence do that?” quiz show.

harfe 10 Apr 2026 5:55 UTC
1 point
1
in reply to: Mo Putera’s comment on: Mo Putera’s Shortform
Is any of the lean code public? That could give a better sense of what to expect. Saying that they are working on a “skeletal Lean code” could be very little compared to what would be required to convince other mathematicians.

harfe 19 Mar 2026 12:45 UTC
2 points
0
in reply to: Mikhail Samin’s comment on: Mikhail Samin’s Shortform
Yesterday they also did an exclusive interview with Sam Altman

harfe 21 Jan 2026 15:34 UTC
4 points
0
on: harfe’s Shortform
I learned that undersea data centers are possible. Microsoft had Project Natick, but it looks like they abandoned it. There is also a chinese project and a western startup. The main benefit seems to be reduced cooling costs.

harfe 13 Jan 2026 11:18 UTC
5 points
3
in reply to: lilkim2025’s comment on: Elizabeth’s Shortform
Canada also uses FPTP, so this is not the example you should be using for examining alternatives.

Proportional representation, which is common in continental Europe, does result in a diversity of parties in practice.

harfe 28 Oct 2025 14:27 UTC
9 points
0
on: Heuristics for assessing how much of a bubble AI is in/will be

There’s no bigger narrative than the one AI industry leaders have been pushing since before the boom: AGI will soon be able to do just about anything a human can do, and will usher in an age of superpowerful technology the likes of which we can only begin to imagine. Jobs will be automated, industries transformed, cancer cured, climate change solved; AI will do quite literally everything.

The article unfortunately does not seriously consider the possibility that AGI has the potential to automate most jobs in a few years. The large investments into AI would be justified in this case, even if current revenue is small! I think this is an important difference to past bubbles.

OpenAI, Anthropic, and the AI-embracing tech giants are burning through billions, inference costs haven’t fallen (those companies still lose money on nearly every user query), and the long-term viability of their enterprise programs are a big question mark at best.

The part about inference costs seems false, unless they mean total inference costs of all their instances.

harfe 27 Oct 2025 15:11 UTC
1 point
0
on: Uncommon Utilitarianism #3: Bounded Utility Functions
Most^[1] problems with unbounded utility functions go away if you restrict yourself to summable utility functions^[2]. Summable utility functions can still be unbounded.

For example, if each planet in the universe gives you 1 utility, and $P (universe has exactly n planets) = 2^{- n}$ for $n \geq 1$ , then your utility function is unbounded but summable. In such a universe it would be very unlikely for a casino to hand out a large number of planets.

Your proof relies on the assumption

assuming that the casino has unbounded utility to hand out.

and this assumption would be wrong in my example.
1. ↩︎
  In fact, I do not know of an exception.
2. ↩︎
  A summable function is a measurable function for which the integral of its absolute value is finite (using the probability measure for the integral in this context).

harfe 29 Apr 2025 10:59 UTC
4 points
0
in reply to: Zach Stein-Perlman’s comment on: What are the best standardised, repeatable bets?
Note that GWWC is shutting down their donor lottery, among other things: https://forum.effectivealtruism.org/posts/f7yQFP3ZhtfDkD7pr/gwwc-is-retiring-10-initiatives

harfe 14 Feb 2025 21:21 UTC
1 point
2
in reply to: plex’s comment on: harfe’s Shortform
Mid 2027 seems too late to me for such a candidate to start the official campaign.

For the 2020 presidential election, many democratic candidates announced their campaign in early 2019, and Yang already in 2017. Debates happened already in June 2019. As a likely unknown candidate, you probably need a longer run time to accumulate a bit of fame.

harfe 12 Feb 2025 17:20 UTC
12 points
0
in reply to: MichaelDickens’s comment on: harfe’s Shortform

Also Musk’s regulatory plan is polling well

What plan are you referring to? Is this something AI safety specific?

harfe 12 Feb 2025 13:05 UTC
1 point
0
in reply to: Mateusz Bagiński’s comment on: harfe’s Shortform
I wouldn’t say so, I don’t think his campaign has made UBI advocacy more difficult.

But an AI notkilleveryoneism campaign seems more risky. It could end up making the worries look silly, for example.

harfe 12 Feb 2025 3:40 UTC
5 points
0
in reply to: Mitchell_Porter’s comment on: harfe’s Shortform
Their platform would be whatever version and framing of AI notkilleveryoneism the candidates personally endorse, plus maybe some other smaller things. They should be open that they consider the potential human disempowerment or extinction to be the main problem of our time.

As for the concrete policy proposals, I am not sure. The focus could be on international treaties, or banning or heavy regulation of AI models who were trained with more than a trillion quadrillion (10^27) operations. (not sure I understand the intent behind your question).

harfe 11 Feb 2025 18:19 UTC
81 points
13
on: harfe’s Shortform
A potentially impactful thing: someone competent runs as a candidate for the 2028 election on an AI notkilleveryoneism^[1] platform. Maybe even two people should run, one for the democratic primary, and one in the republican primary. While getting the nomination is rather unlikely, there could be lots of benefits even if you fail to gain the nomination (like other presidential candidates becoming sympathetic to AI notkilleveryoneism, or more popularity of AI notkilleveryoneism in the population, etc.)

On the other hand, attempting a presidential run can easily backfire.

A relevant previous example to this kind of approach is the 2020 campaign by Andrew Yang, which focussed on universal basic income (and downsides of automation). While the campaign attracted some attention, it seems like it didn’t succeed in making UBI a popular policy among democrats.
1. ↩︎
  Not necessarily using that name.
What links here?
- The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better by Thane Ruthenis (21 Feb 2025 20:15 UTC; 158 points)
- 2028 Should Not Be AI Safety’s First Foray Into Politics by Jesse Richardson (4 Mar 2025 16:46 UTC; 5 points)

harfe 7 Feb 2025 14:22 UTC
LW: 2 AF: 2
1
AF
on: Debate, Oracles, and Obfuscated Arguments

This can easily be done in the cryptographic example above: B can sample a new number $y = p^{'} \cdot q^{'}$ , and then present $y$ to a fresh copy of A that has not seen the transcript for $x$ so far.

I don’t understand how this is supposed to help. I guess the point is to somehow catch a fresh copy of A in a lie about a problem that is different from the original problem, and conclude that A is the dishonest debater?

But couldn’t A just answer “I don’t know”?

Even if it is a fresh copy, it would notice that it does not know the secret factors, so it could display different behavior than in the $x$ case where A knows the secret factors $p, q$ .

harfe 3 Feb 2025 16:26 UTC
9 points
2
in reply to: johnswentworth’s comment on: What do coherence arguments actually prove about agentic behavior?

Some of these are very easy to prove; here’s my favorite example. An agent has a fixed utility function and performs Pareto-optimally on that utility function across multiple worlds (so “utility in each world” is the set of objectives). Then there’s a normal vector (or family of normal vectors) to the Pareto surface at whatever point the agent achieves. (You should draw a picture at this point in order for this to make sense.) That normal vector’s components will all be nonnegative (because Pareto surface), and the vector is defined only up to normalization, so we can interpret that normal vector as a probability distribution. That also makes sense intuitively: larger components of that vector (i.e. higher probabilities) indicate that the agent is “optimizing relatively harder” for utility in those worlds. This says nothing at all about how the agent will update, and we’d need a another couple sentences to argue that the agent maximizes expected utility under the distribution, but it does give the prototypical mental picture behind the “Pareto-optimal → probabilities” idea.

Here is an example (to point out a missing assumption): Lets say you are offered to bet on the result of a coin flip for $1$ dollar. You get $3$ dollars if you win, and your utility function is linear in dollars. You have three actions: “Heads”, “Tails”, and “Pass”. Then “Pass” performs Pareto-optimally across multiple worlds. But “Pass” does not maximize expected utility under any distribution.

I think what is needed for the result is an additional convexity-like assumption about the utilities. This could be the set of achievable utility vectors is convex'', or even something weaker like every convex combination of achievable utility vectors is dominated by an achievable utility vector” (here, by utility vector I mean $(u_{w})_{w \in W}$ if $u_{w}$ is the utility of world $w$ ). If you already accept the concept of expected utility maximization, then you could also use mixed strategies to get the convexity-like assumption (but that is not useful if the point is to motivate using probabilities and expected utility maximization).

Or: even if you do expect powerful agents to be approximately Pareto-optimal, presumably they will be approximately Pareto optimal, not exactly Pareto-optimal. What can we say about coherence then?

The underlying math statement of some of these kind of results about Pareto-optimality seems to be something like this:

If $¯ x$ is Pareto-optimal wrt utilities $u_{i}$ , $i = 1, \dots n$ and a convexity assumption (e.g. the set ${(u_{i} (x))_{i = 1}^{n} : x}$ is convex, or something with mixed strategies) holds, then there is a probability distribution $μ$ so that $¯ x$ is optimal for $U (x) = E_{i \sim μ} u_{i} (x)$ .

I think there is a (relatively simple) approximate version of this, where we start out with approximate Pareto-optimality.

We say that $¯ x$ is Pareto $ε$ —optimal if there is no (strong) Pareto-improvement by more than $ε$ (that is, there is no $x$ with $u_{i} (x) > u_{i} (¯ x) + ε$ for all $i$ ).

Claim: If $¯ x$ is Pareto $ε$ —optimal and the convexity assumption holds, then there is a probability distribution $μ$ so that $¯ x$ is $ε$ -optimal for $U (x) = E_{i \sim μ} u_{i} (x)$ .

Rough proof: Define $Y := {(u_{i} (x))_{i = 1}^{n} : x}$ and $¯ ¯¯ ¯ Y$ as the closure of $Y$ . Let $~ y \in ¯ ¯¯ ¯ Y$ be of the form $~ y = (u_{i} (¯ x) + δ)_{i = 1}^{n}$ for the largest $δ$ such that $~ y \in ¯ ¯¯ ¯ Y$ . We know that $δ \leq ε$ . Now $~ y$ is Pareto-optimal for $Y$ , and by the non-approximate version there exists a probability distribution $μ$ so that $~ y$ is optimal for $y \mapsto E_{i \sim μ} y_{i}$ . Then, for any $x$ , we have $\mathbb{E}{i\sim\mu} u_i(x) \leq \mathbb{E}{i\sim\mu} \tilde y_i = \mathbb{E}{i\sim\mu} (u_i(\bar x) + \delta)\le \varepsilon + \mathbb{E}{i\sim\mu} u_i(\bar x), $ that is, $¯ x$ is $ε$ -optimal for $U$ .

harfe 23 Jan 2025 17:03 UTC
LW: 13 AF: 9
2
AF
in reply to: Vanessa Kosoy’s comment on: Vanessa Kosoy’s Shortform
I think there are some subtleties with the (non-infra) bayesian VNM version, which come down to the difference between “extreme point” and “exposed point” of $D$ . If a point is an extreme point that is not an exposed point, then it cannot be the unique expected utility maximizer under a utility function (but it can be a non-unique maximizer).

For extreme points it might still work with uniqueness, if, instead of a VNM-decision-maker, we require a slightly weaker decision maker whose preferences satisfy the VNM axioms except continuity.

harfe 22 Jan 2025 16:27 UTC
LW: 12 AF: 8
2
AF
in reply to: Vanessa Kosoy’s comment on: Vanessa Kosoy’s Shortform

For any $Φ, Ψ \in^D$ , if $Θ^{*} = Φ \lor Ψ$ then either $Φ \subseteq Ψ$ or $Ψ \subseteq Φ$ .

I think this condition might be too weak and the conjecture is not true under this definition.

If $Φ_{1} \subseteq Φ_{2}$ , then we have $E_{y \sim ξ} {min}_{μ \in Φ_{2}} E_{x \sim μ} u (x, y) \leq E_{y \sim ξ} {min}_{μ \in Φ_{1}} E_{x \sim μ} u (x, y)$ (because a minimum over a larger set is smaller). Thus, $Φ_{2}$ can only be the unique argmax if $Φ_{1} = Φ_{2}$ .

Consider the example $^D={[0,x]:x∈[0,1]}$ . Then $^D$ is closed. And $Θ^{*} = [0, 1]$ satisfies $Θ^{*} = Φ \lor Ψ ⟹ Φ \subseteq Ψ \lor Ψ \subseteq Φ$ . But per the above it cannot be a unique maximizer.

Maybe the issue can be fixed if we strengthen the condition so that $Φ^{*}$ has to be also minimal with respect to $\subseteq$ .