Axel Ahlqvist’s Shortform
Why use citations when hyperlinks exist?
To me, hyperlinks seem superior in every way.
1. Disturbs the reading less
2. More convenient to access the source

For citations:
1. I click
2. Get sent to the bibliography
3. I use the link
4. Then I have to go back and find where I was in the paper.
I would guess the main reason is that links are not stable?
You can fix problems #1 and #3 above
Is the #3 referring to the “1.” in the second numbered list?
Sorry, I missed this. In case you are interested in AI Safety, there is an active AI Safety group in Stockholm that holds biweekly meetups: https://www.facebook.com/groups/4935355363232955.
Mechanistic Interpretability Via Learning Differential Equations: AI Safety Camp Project Intermediate Report.
[Question] What is the point of 2v2 debates?
Definitely guilty of preaching to the choir :).
So people feel that LW should be focused on things other than critiquing influential but unqualified opinions. I am sympathetic to this. It is somewhat of a Sisyphean task to weed out bad opinions from public discourse, and responding on LW is probably not the most efficient way of doing it in any case.
Personally, when I am convinced of something, I try to find the strongest critiques of that belief. For instance, I’ve looked for criticisms of Yudkowsky and even read a little on r/SneerClub to evaluate whether I’ve been duped by internet lunatics :). If other people acted the same, it would be valuable to have critiques of bad opinions, even if they are posted where the intended audience otherwise never visits. But I suspect few people act like that.
I would be interested in whether you have suggestions for better ways to affect public opinion than posts like this one. I guess the rationality project of raising the global sanity level is partly aimed at this.
I’ve received a significant share of downvotes on the post. Since this is my first post on LW, I would greatly appreciate feedback on why readers did not find the post of sufficient quality for the site.
I believe even broad pointers could be very helpful. Was it mostly about sloppy argumentation, the tone, the language, etc?
I agree about the distinction and agree that both invariance and unawareness are worth pursuing to get better use out of evaluation and auditing.
However, I am slightly more positive towards unawareness and slightly less positive towards invariance.
When I imagine an AI powerful enough to cause a catastrophe, it is one with a goal about the outcome of the world that is misaligned with human goals. I have a hard time imagining such a goal that would be invariant to whether it was being evaluated or not, since evaluation could lead to the model being retrained. It seems to me that in order to solve eval invariance for sufficiently intelligent and agentic systems, we would have to solve most of what makes alignment difficult in the first place.
Regarding unawareness, I agree that it won’t scale to ASI and that eval awareness for current evals requires only human-level intelligence. However, I believe there is a lot of headroom to make our evals more realistic. For instance, if we have prompts from deployments that led to concerning behavior for previous models, we could subject new models to a deployment distribution of prompts and manually inspect the ones that previously led to misaligned behavior. The only fundamental difference between deployment and evals is that deployment will happen in the future, so a sufficiently intelligent model could figure this out by, for example, waiting to see a factorisation of RSA-2048.
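To make the replay idea concrete, here is a minimal Python sketch. It is purely illustrative: the function name, the record fields, and the model-call interface are all hypothetical assumptions, not any particular lab’s tooling.

```python
from typing import Callable, Iterable

def replay_flagged_prompts(
    deployment_prompts: Iterable[dict],   # e.g. {"prompt": str, "flagged": bool}
    new_model: Callable[[str], str],      # hypothetical model-call interface
) -> list[dict]:
    """Run a new model on deployment prompts that previously elicited
    concerning behavior from an older model, so the eval distribution
    matches real deployment traffic."""
    results = []
    for record in deployment_prompts:
        if not record.get("flagged"):
            continue  # only replay prompts that previously led to misaligned behavior
        response = new_model(record["prompt"])
        results.append({"prompt": record["prompt"], "response": response})
    # These transcripts would then be inspected manually (or by an automated judge).
    return results
```

The manual-inspection step is the expensive part; the point is only that the prompts the new model sees are sampled from real deployment traffic rather than written specifically for the eval.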
Since neither approach would work on sufficiently intelligent and agentic systems, the question is which scales further.