jacek

Karma: 238

jacek 16 Mar 2026 8:39 UTC
34 points
on: Less Dead
I see no pushback on the actual science in the comments, which seems surprising. It’s great that there is a new brain preservation company around, but I think the results are somewhat overclaimed. I’m not an expert in biochemistry, but if I understand correctly, your (new) preprint basically does the following: take n=5 pigs and try to preserve their brains. This fails for the first two because of an incorrectly placed cannula, then succeeds for the next three, in particular the last one with a 14-minute gap (t=14 min). Then examine the damage at a few sites under a microscope and judge it to be OK. For reference, the abstract says that you develop “connectomically traceable whole brains [...] and establish that 14 min is the approximate length of the perfusability window”. The new preprint is not peer-reviewed, and the old paper is more than 10 years old. (I see that it actually has 61 citations on Google Scholar; it would be useful to know what progress has been made, and how the scientific community has assessed the work, since then.)

The tiny sample size, the imperfect experimental setup, the lack of published quantitative data and, on top of all this, an obvious conflict of interest make it hard for me to fully believe the results. FWIW, I asked a biochem PhD friend of mine for an opinion, and also ChatGPT; both seemed to strongly agree that this is a cool proof of concept, but significantly overclaimed as far as scientific evidence goes.

(My) self-referential reason to believe in free will

jacek 6 Jan 2025 23:35 UTC

12 points

6 comments · 1 min read · LW link

Characterizing stable regions in the residual stream of LLMs

Jett Janiak, jacek, Chatrik, Giorgi Giglemiani, nlpet and StefanHex

26 Sep 2024 13:44 UTC

43 points

4 comments · 1 min read · LW link

(arxiv.org)

jacek 17 Oct 2023 18:12 UTC
LW: 5 AF: 3
AF
in reply to: TurnTrout’s comment on: Goodhart’s Law in Reinforcement Learning
Thanks for the comment! Note that we use the state-action visitation distribution, so we consider trajectories that contain actions as well. This makes it possible to invert $\eta$ (as long as all states are visited). Using state-only trajectories, it would indeed be impossible to recover the policy.
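A minimal sketch of why this works, assuming a finite (tabular) MDP with discount factor $\gamma$ and an unnormalised occupancy measure $\eta(s, a) = \sum_t \gamma^t \Pr(s_t = s, a_t = a)$; the function names and the toy MDPs are hypothetical illustrations, not the paper's code. Normalising $\eta$ by $(1 - \gamma)$ would not change the inversion. The second example shows two different policies with identical state visitation, which is why state-only data is not enough.

```python
from typing import List

Matrix = List[List[float]]


def occupancy(P: List[List[List[float]]], pi: Matrix, mu: List[float],
              gamma: float, horizon: int = 2000) -> Matrix:
    """Discounted state-action visitation eta(s, a) = sum_t gamma^t Pr(s_t = s, a_t = a).

    P[s][a][s2] is the transition probability, pi[s][a] the policy and mu the
    initial-state distribution. The infinite sum is truncated at `horizon`.
    """
    n_s, n_a = len(P), len(P[0])
    eta = [[0.0] * n_a for _ in range(n_s)]
    state_dist = list(mu)  # Pr(s_t = s) for the current time step t
    discount = 1.0  # gamma ** t
    for _ in range(horizon):
        nxt = [0.0] * n_s
        for s in range(n_s):
            for a in range(n_a):
                p_sa = state_dist[s] * pi[s][a]  # Pr(s_t = s, a_t = a)
                eta[s][a] += discount * p_sa
                for s2 in range(n_s):
                    nxt[s2] += p_sa * P[s][a][s2]
        state_dist = nxt
        discount *= gamma
    return eta


def state_visitation(eta: Matrix) -> List[float]:
    """Marginalise out actions: d(s) = sum_a eta(s, a)."""
    return [sum(row) for row in eta]


def recover_policy(eta: Matrix) -> Matrix:
    """Invert eta: pi(a | s) = eta(s, a) / d(s), defined only where d(s) > 0."""
    policy = []
    for s, row in enumerate(eta):
        total = sum(row)
        if total == 0.0:
            raise ValueError(f"state {s} is never visited; pi(.|{s}) is unrecoverable")
        policy.append([x / total for x in row])
    return policy


if __name__ == "__main__":
    # Two states, two actions: action 0 stays put, action 1 switches state.
    P = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
    pi = [[0.3, 0.7], [0.6, 0.4]]
    eta = occupancy(P, pi, [1.0, 0.0], gamma=0.9)
    print(recover_policy(eta))  # approximately pi

    # State-only data is not enough: actions 0 and 1 both stay put here,
    # so two different policies produce the same state visitation.
    P3 = [[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
          [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]]
    pi_a = [[0.5, 0.0, 0.5], [0.5, 0.0, 0.5]]
    pi_b = [[0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]
    d_a = state_visitation(occupancy(P3, pi_a, [1.0, 0.0], gamma=0.9))
    d_b = state_visitation(occupancy(P3, pi_b, [1.0, 0.0], gamma=0.9))
    print(d_a, d_b)  # equal, although pi_a != pi_b
```

The division in `recover_policy` is exactly where the "as long as all states are visited" condition enters: an unvisited state has $d(s) = 0$, and its policy row is left undetermined.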

Goodhart’s Law in Reinforcement Learning

jacek, Joar Skalse, OliverHH, Charlie Griffin and Xingjian Bai

16 Oct 2023 0:54 UTC

126 points

22 comments · 7 min read · LW link

jacek 17 Feb 2023 18:32 UTC
3 points
in reply to: Noosphere89’s comment on: A warm-up for the AI governance project
Yes, I agree that politicisation is the central issue. But this is exactly why I wrote the first part: I think that section holds despite it (I didn’t claim that most people agree with the solution, only that the elites, the experts, and the reader’s social bubble do!).

So one question I’m trying to understand is: since politicisation happened to climate change, why do we think it won’t happen to AI governance? I.e., the point is that pursuing goals by political means might just usually end up like that, because of the basic structure of political discourse (you get points for opposing the other side, etc.).

A warm-up for the AI governance project

jacek 17 Feb 2023 18:06 UTC

10 points

2 comments · 3 min read · LW link

jacek 13 Jan 2023 9:22 UTC
1 point
in reply to: Algon’s comment on: Categorical-measure-theoretic approach to optimal policies tending to seek power
Hm, so one comment is that the proof in the post was not meant to convey the intuition for the existence of the concrete probability distribution: the measurability of the POWER inequality is a necessary first step, but not really technically related to the (potential) rest of the proof (although I had initially hoped that lifting some distribution on rewards via the Giry monad might produce something interesting).

As for why the additional structure might help: the issue with there being no Lebesgue-like uniform measure is that in an infinite-dimensional space like $[0, 1]^{\mathbb{N}}$, one cannot assign a positive measure to any subset. For example, in $[0, 1]$, the two halves must have equal measure, because the measure has to be shift-invariant. In $[0, 1]^{2}$, we can do the same with each of the four squares like $[0, 1/2] \times [0, 1/2]$. Repeating this process, in the limit there is no measure we can assign to those pieces, because they can be divided into countably many non-negligible sets (cf. https://en.wikipedia.org/wiki/Infinite-dimensional_Lebesgue_measure).
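The argument in the linked Wikipedia article is usually phrased with balls rather than cubes. A sketch in the Hilbert space $\ell^2$ (our choice of space for concreteness), with $e_i$ the standard basis vectors, $B(c, \rho)$ the open ball, and $\mu$ a hypothetical translation-invariant Borel measure:

$$
B_i = B\left(\tfrac{r}{2} e_i, \tfrac{r}{4}\right), \quad i = 1, 2, \dots
$$

$$
x \in B_i \implies \|x\| < \tfrac{r}{2} + \tfrac{r}{4} < r, \quad \text{so } B_i \subset B(0, r)
$$

$$
\left\| \tfrac{r}{2} e_i - \tfrac{r}{2} e_j \right\| = \tfrac{r}{\sqrt{2}} > \tfrac{r}{4} + \tfrac{r}{4} \quad (i \neq j), \quad \text{so the } B_i \text{ are pairwise disjoint}
$$

$$
\mu\big(B(0, r)\big) \ge \sum_{i=1}^{\infty} \mu(B_i) = \sum_{i=1}^{\infty} \mu\big(B(0, \tfrac{r}{4})\big)
$$

So for each $r > 0$, either balls of radius $r/4$ are null (and then $\mu = 0$, since the separable space $\ell^2$ is a countable union of such balls) or balls of radius $r$ have infinite measure. Either way, there is no nonzero, locally finite, translation-invariant measure.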

So the problem is, first, that the space is too big, and second, that there is too much freedom in cutting the space into pieces and shifting them. The EPIC metric paper I linked to in the post (or some related research) might help with both of these issues.

First, we can make the space smaller by quotienting it by some equivalence relation: reward-shaping properties in MDPs provide such a relation (although EPIC considers the relation from the original paper by Ng et al., which is too weak; something stronger is needed). To give a concrete (although a bit silly) example: there is no uniform measure on the space of real-valued functions on $[0, 1]$. But suppose we have a (very strong) equivalence relation $f \sim g$ iff $f(0) = g(0)$. Then the space collapses to just $\mathbb{R}$, which has the usual Lebesgue measure $\lambda$.

The second problem is that we had too much freedom in shifting the subsets (or, equivalently, the shift-invariance requirement was too strong). In our case, “shifting” is applied to sets of probability distributions over rewards. But individual rewards cannot always be shifted, since this operation doesn’t preserve optimal policies. So maybe this restricts the transformations we can apply to the space, and the measures don’t blow up.
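A one-step illustration of the point about optimal policies, assuming a hypothetical bandit (a single state with two actions $a_1, a_2$), a reward vector $R$ and an arbitrary shift vector $v$:

$$
R = (1, 0), \quad v = (0, 2): \qquad \arg\max_a R(a) = a_1, \qquad \arg\max_a \big(R(a) + v(a)\big) = a_2
$$

A constant shift $R + c$ would leave the argmax unchanged in this setting, so it is general (non-constant) shifts that can change the optimal policy.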

So, briefly, I don’t understand these behaviours very well yet, but my intuitive optimism comes from two things:
- first, the structure of the space of rewards seems to be rich, so if the space of distributions over rewards inherits some of its properties, the induced symmetries would limit the allowed transformations
- second, there is another approach which I forgot to write about in the post: considering non-shift-invariant uninformative priors. For example, the Jeffreys prior on $[0, 1]$ is not the uniform distribution. The problem we are dealing with here seems to be quite common in math, and people have invented workarounds (like the abstract Wiener spaces mentioned in the Wikipedia article); the issue is checking whether any of those workarounds applies here
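The Jeffreys prior depends on the statistical model, not only on the parameter space. A worked instance, assuming (for illustration) that $\theta \in [0, 1]$ is the success probability of a Bernoulli observation $x \in \{0, 1\}$, with Fisher information $I(\theta)$:

$$
\frac{\partial}{\partial \theta} \log p(x \mid \theta) = \frac{x}{\theta} - \frac{1 - x}{1 - \theta}, \qquad I(\theta) = \mathbb{E}\left[\left(\frac{x}{\theta} - \frac{1 - x}{1 - \theta}\right)^{2}\right] = \frac{1}{\theta} + \frac{1}{1 - \theta} = \frac{1}{\theta (1 - \theta)}
$$

$$
p_J(\theta) \propto \sqrt{I(\theta)}, \qquad \int_0^1 \frac{d\theta}{\sqrt{\theta (1 - \theta)}} = \pi, \qquad p_J(\theta) = \frac{1}{\pi \sqrt{\theta (1 - \theta)}}
$$

This is the $\mathrm{Beta}(1/2, 1/2)$ density, which puts more mass near 0 and 1 than the uniform distribution does; the symmetry it respects is invariance under reparametrisation of $\theta$, not invariance under shifts.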

Categorical-measure-theoretic approach to optimal policies tending to seek power

jacek 12 Jan 2023 0:32 UTC

31 points

3 comments · 6 min read · LW link
