Dan MacKinlay

Karma: 345

Professionally, AI, science, AI4Science, Safety4AI. Also human ecology and indonesian death metal remixes.
See danmackinlay.name for more words about background and my now page for bonus stuff.

Dan MacKinlay 29 Apr 2026 9:12 UTC
2 points
0
on: An Alignment Journal: Adaptation to AI
Since publishing this we noticed the following initiative: https://future-science.org/mirror
Mirror: An Automated Journal of AI Interpretability is a fully automated journal of AI interpretability. This journal features original research composed, conducted, and written entirely by LLMs analyzing LLMs. Much of the research published in Mirror falls within the category of “mechanistic interpretability,” in which model behaviors are decomposed into operations in the model’s internal representation space, but any rigorous research advancing our understanding of LLMs is welcome, be it mechanistic, behavioral, or theoretical.
I would be curious to know if others get value from this “fully automated journal”

An Alignment Journal: Adaptation to AI

JessRiedel, Dan MacKinlay, Yonatan Cale and david reinstein

28 Apr 2026 17:04 UTC

25 points

1 comment13 min readLW link

(blog.alignmentjournal.org)

An Alignment Journal: Features and policies

JessRiedel, Dan MacKinlay, Luca, Daniel Murfet and david reinstein

7 Apr 2026 15:22 UTC

61 points

4 comments15 min readLW link

(blog.alignmentjournal.org)

Dan MacKinlay 5 Mar 2026 4:54 UTC
2 points
0
in reply to: zroe1’s comment on: An Alignment Journal: Coming Soon
We do not yet plan to support replications of empirical work. Organisationally, there is a desire to keep opening scope tight and theoretical to avoid having diffuse messaging at start up

Personally, I would make a case that replications are not as important in the ML/AI research as in the sciences of the physical (although this depends somewhat on what we mean by “replications”)

That said, I think that there is a strong argument for replications generally, and maybe in this field too, and if the Editorial Board agreed with that, then that is what we would do. I am beholden at this point to mention the connection to the UnJournal work that David has mentioned elsewhere in these comments.

Dan MacKinlay 5 Mar 2026 4:49 UTC
2 points
0
in reply to: Gurkenglas’s comment on: An Alignment Journal: Coming Soon
I like this idea aesthetically. I foresee some challenges in making “staking” something that won’t trigger alarms in the existing research bureaucracies that host many of our potential authors. If you have clever ideas for how to handle that I would be curious to hear.

Dan MacKinlay 5 Mar 2026 4:47 UTC
3 points
0
in reply to: beyarkay’s comment on: An Alignment Journal: Coming Soon
This publication bias story in ML is a whole can of worms which I would love to open at some point. tl;dr it is a problem, but the field has semi-accidentally mitigated many of the worse excesses of it. There is an IMO massively under-regarded work on this— Moritz Hardt’s Machine Learning Benchmarks, which I will write a LW review of some day if I have time.

Dan MacKinlay 5 Mar 2026 4:41 UTC
3 points
0
in reply to: david reinstein’s comment on: An Alignment Journal: Coming Soon
Yes, I’m excited to see what we can learn David’s experience, especially given the incentive designer’s insight that he brings to this. We also, collectively, have some experience with the ILIAD conferences which was a precursor experimenting with alternative compensation mechanisms. See Proceedings of ILIAD: Lessons and Progress for some analysis of that project.

Dan MacKinlay 3 Mar 2026 22:17 UTC
5 points
0
in reply to: Leon Lang’s comment on: An Alignment Journal: Coming Soon
We are trying to do both, in that we are attempting to be a bridge between LW and wider scientific communities. Where do you feel our tone might be excluding domain scientists?

An Alignment Journal: Coming Soon

Dan MacKinlay, JessRiedel, Edmund Lau, Daniel Murfet, Scott Aaronson, Jan_Kulveit, david reinstein, Alexander Gietelink Oldenziel and Marcus Hutter

3 Mar 2026 20:27 UTC

254 points

29 comments6 min readLW link

(blog.alignmentjournal.org)

Dan MacKinlay 9 Feb 2026 13:37 UTC
1 point
0
in reply to: megasilverfist’s comment on: Ryan Kidd’s Shortform
@megasilverfist there are quite a few of us based in Melbourne. HMU.

Dan MacKinlay 9 Feb 2026 13:35 UTC
4 points
0
in reply to: HunterJay’s comment on: Ryan Kidd’s Shortform
We’re not free at the Melbourne AI Safety Hub, but we are all terribly charming.

Dan MacKinlay 9 Feb 2026 8:47 UTC
2 points
0
in reply to: Ryan Kidd’s comment on: Ryan Kidd’s Shortform
Tom Everitt did his PhD in Australia too. (As did I, FWIW.)

Dan MacKinlay 8 Jan 2026 15:14 UTC
1 point
0
on: DSLT 2. Why Neural Networks obey Occam’s Razor
If $W$ contains one true parameter $w^{(0)} \in W_{0}$ ,
Having trouble parsing this. Does this mean that one element of the parameter vector is “true”?

The deep history of intelligence

Dan MacKinlay2 Aug 2025 4:04 UTC

10 points

0 comments1 min readLW link

(danmackinlay.name)

“Opponent shaping” as a model for manipulation and cooperation

Dan MacKinlay1 Aug 2025 7:50 UTC

16 points

0 comments17 min readLW link

(danmackinlay.name)

Dan MacKinlay 20 May 2025 23:01 UTC
3 points
0
on: Selective regularization for alignment-focused representation engineering
Interesting! Ingenious choice of “color learning” to solve the problem of plotting the learned representations elegantly.
This puts me in mind of the “disentangled representation learning” literature (review e.g. here). I’ve thought about disentangled learning mostly in terms of the Variational Auto-Encoder and GANs, but I think there is work there that applies to any architecture with a bottleneck, so your bottleneck MLP might find some interesting extensions there,
I wonder: what is the generalisation of your regularisation approach to architectures without a bottleneck? I think you gesture at it when musing on how to generalise to transformers. If the latent/regularised content space needs to “share” with lots of concepts, how do we get “nice mappings” there?

Dan MacKinlay 20 May 2025 5:02 UTC
1 point
0
on: Will Jesus Christ return in an election year?
I’m enjoying envisaging this as an alternative explanation for the classic Lizardman’s Constant, which is a smidge larger than 3% but then, in cheap talk markets you have less on the line, so…

Dan MacKinlay 12 May 2025 10:45 UTC
1 point
0
in reply to: Sheikh Abdur Raheem Ali’s comment on: Sheikh Abdur Raheem Ali’s Shortform
Ideally you would wish to calibrate your EV calcs against the benefit of a UAE AISI, though, no, not the expected budget? We could estimate the value of such an institute being more than the running cost (or, indeed, less) depending on the relative leverage of such an institute.

Dan MacKinlay

An Align­ment Jour­nal: Adap­ta­tion to AI

An Align­ment Jour­nal: Fea­tures and policies

An Align­ment Jour­nal: Com­ing Soon

The deep his­tory of intelligence

“Op­po­nent shap­ing” as a model for ma­nipu­la­tion and cooperation

An Alignment Journal: Adaptation to AI

An Alignment Journal: Features and policies

An Alignment Journal: Coming Soon

The deep history of intelligence

“Opponent shaping” as a model for manipulation and cooperation