Master’s student in Physics at the University of Queensland.
I am interested in Quantum Computing, physical AI Safety guarantees and alignment techniques that will work beyond contemporary models.
I think a lot of your confusion stems from the fact that you are treating PR statements from Anthropic as if they were made in good faith.
For example:
“Maybe Anthropic should’ve been more clear about what “behind” and “ahead” mean, and when or when not they’re giving themselves the option/soft obligation to pause”
They will try to avoid doing this because it is very embarrassing when your previous statements contradict your actions.
“Are Anthropic employees not reacting to this?”
Anthropic employees are paid large amounts of money and get to talk about their concerns with other people in the organisation. They have a direct financial incentive to avoid speaking publicly against the company.
“On a personal note, many of us are much more nervous about working for Anthropic and are much more nervous about the strategic decision-making of its leadership during the critical period.”
The good news is that having a cool job and earning a huge amount of money is enough to quell any moral concerns you might have.
I’d previously worked through a dozen or so chapters of the same Woit textbook you’ve linked as context for Representation Theory.
Given some group $G$, a (linear) “representation” is a homomorphism from $G$ into $GL(V)$, the general linear group of some vector space $V$.
That is, a map $\pi : G \to GL(V)$ is a representation iff for all elements $g_1, g_2 \in G$, $\pi(g_1 g_2) = \pi(g_1)\pi(g_2)$.
Does “preferences between deals dependent on unknown world states” have a group structure? If not, it cannot be a representation in the sense meant by Woit.
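To make the homomorphism condition concrete, here is a minimal sketch of my own (not an example from Woit's text). It assumes the cyclic group Z_n under addition mod n and the one-dimensional map k ↦ exp(2πik/n), so GL(V) is just the nonzero complex numbers. The names `rep` and `is_representation` are hypothetical helpers introduced for illustration.

```python
import cmath

def rep(k: int, n: int) -> complex:
    """Candidate representation of Z_n: send k to the 1x1 'matrix' exp(2*pi*i*k/n)."""
    return cmath.exp(2j * cmath.pi * k / n)

def is_representation(candidate, n: int) -> bool:
    """Check candidate((a + b) mod n) == candidate(a) * candidate(b) for all a, b in Z_n.

    The group operation on Z_n is addition mod n; the operation on the target
    is multiplication, so this is exactly the homomorphism condition.
    """
    return all(
        cmath.isclose(
            candidate((a + b) % n, n),
            candidate(a, n) * candidate(b, n),
            abs_tol=1e-9,
        )
        for a in range(n)
        for b in range(n)
    )

if __name__ == "__main__":
    # The exponential map respects the group law.
    print(is_representation(rep, 6))
    # A map that ignores the group law (k -> k + 1) fails the check.
    print(is_representation(lambda k, n: complex(k + 1), 3))
```

The check only makes sense because the domain has a group operation to begin with, which is the point of the question above: without a composition law on the thing being "represented", there is nothing for the map to preserve.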
Has anyone been able to do this right now?
After a quick check, I’m unable to replicate the behaviour shown in this thread using Google Translate in the Chrome mobile browser on Android.
I think it is extremely difficult to predict how a community of intelligences would react to that information.
I’m away from my laptop for a few days, but you’d make a stronger argument by applying Bayes’ rule.
Sure, maybe you expect 1 coincidentally suspicious trade per year but that doesn’t mean this specific trade wasn’t insider trading.
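A rough sketch of what applying Bayes’ rule would look like here. Every number below is a made-up placeholder chosen purely for illustration, not an estimate for the actual trade, and `posterior_insider` is a hypothetical helper name.

```python
def posterior_insider(
    prior: float,
    p_trade_given_insider: float,
    p_trade_given_innocent: float,
) -> float:
    """Return P(insider | trade observed) via Bayes' rule.

    prior                  -- P(insider) before seeing the trade
    p_trade_given_insider  -- P(a trade like this | insider)
    p_trade_given_innocent -- P(a trade like this | no inside information)
    """
    evidence = (
        p_trade_given_insider * prior
        + p_trade_given_innocent * (1.0 - prior)
    )
    if evidence == 0.0:
        raise ValueError("the observed trade has zero probability under both hypotheses")
    return p_trade_given_insider * prior / evidence

if __name__ == "__main__":
    # Placeholder inputs (assumptions, not data): a 1% prior, and a trade that
    # is 500x more likely under the insider hypothesis than by coincidence.
    print(posterior_insider(prior=0.01,
                            p_trade_given_insider=0.5,
                            p_trade_given_innocent=0.001))
```

The base rate of coincidentally suspicious trades only fixes the `p_trade_given_innocent` term. The conclusion depends on how that compares with `p_trade_given_insider` and on the prior, so the argument needs all three.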
I am not sure that all of the individuals with the opportunity to make this trade are rich.
Even if they did have more than $80k, they might not have felt they could safely access any more money without leaving a trail. I’d expect there’s a chance that a large withdrawal from your personal bank account is noticed by an investigation.
“I know someone who can actually attend to literally five conversations at once.”
I agree that some people are better at generic multitasking than others, and there are some people who are better at monitoring multiple conversations.
I also believe you know somebody who claims they can attend to 5 conversations at once.
But I’d comfortably bet even money that their ability to recall and process information drops off quickly once they’re trying to attend to more than 2. My model is that unintentionally tricking yourself into believing you have this ability is easier than actually learning it.
Beyond that, I’m not sure that the multi/single thread dichotomy is a particularly useful abstraction for describing how human brains function, nor that it provides much predictive power.
Here I am not claiming all humans are single or multi-threaded. I am disputing whether it is even a meaningful abstraction.
“SOTA alignment research includes stuff like showing that training the models on a hack-filled environment misaligns them unless hacking is framed as a good act”
I am not sure that these are examples of the kind of alignment research TsviBT meant, as the post concerns AGI.
SOTA alignment researchers at Anthropic can:
- prove the existence of phenomena through explicitly demonstrating them.
- make empirical observations and proofs about the behaviour of contemporary models.
- offer conjectures about the behaviour of future models.
Nobody at Anthropic can offer (to my knowledge) a substantial scientific theory that would give reason to be extremely confident that any technique they’ve found will extend to models in the future. I am not sure if they have ever explicitly claimed that they can.
“Empirically when I advocate internally for things that would be commercially costly to Anthropic I don’t notice this weighing on my decisionmaking basically at all, like I’m not sure I’ve literally ever thought about it in that setting?”
With respect, one of the dangers of being a flawed human is the fact that you aren’t aware of every factor that influences your decision making.
I’m not sure that a lack of consciously thinking about financial loss/gain is good empirical evidence that it isn’t affecting your choices.
By all means wear what you want, but the positive reactions you get from strangers who directly approach you are not necessarily an accurate way to gauge how most people are reacting to your outfit. You’re sampling from the population of “people who have spontaneously chosen to engage with you”.
Generally, when you wear a polarising outfit, people who dislike it won’t go out of their way to tell you. I’m extroverted enough that I will (very occasionally) compliment strangers in public on nice/unusual outfits, but I’ve never told a stranger their outfit is bad.
“I inevitably get weird looks from the kind of people who think having a tattoo is an affront to god but they give me that look for just existing with blue hair and pronouns too”
This line in particular just seems like bad epistemics. Is it really likely that everyone who reacts badly to their outfit would also judge them for having coloured hair?
I think we should show some solidarity to people committed to their beliefs and making a personal sacrifice, rather than undermining them by critiquing their approach.
Given that they’re both young men and the hunger strikes are occurring in the first world, it seems unlikely anyone will die. But it does seem likely they or their friends will read this thread.
Beyond that, the hunger strike is only on day 2 and has already received a small amount of media coverage. Should they go viral, then this one action alone will have a larger differential impact on reducing existential risk than most safety researchers will achieve in their entire careers.
https://www.businessinsider.com/hunger-strike-deepmind-ai-threat-fears-agi-demis-hassabis-2025-9
One must imagine Sisyphus happy.
Von Neumann might have been driven by a feeling of inadequacy, but that doesn’t mean it was necessary for his success. One can imagine Von NewOutlook-Mann who took the same actions in life but viewed them as working towards a positive goal rather than needing to prove himself.
It strikes me that Anthropic’s blog post is engaging in a bit of double-speak in saying they are “disrupting” the operations of cybercriminals.
What they are describing is retroactively taking action after crime has occurred.
The following illustration from 2015 by Tim Urban seems like a decent summary of how people interpreted this and other statements.
I’ve thrown on some limit orders if anyone is strongly pro-Kokotajlo.
That’s awesome to hear.
(On a side note, your hyperlink currently includes a spurious full stop, which means the link 404s.)
An alternative interpretation of the reported findings is that the process used to generate the “100% hack-free” dataset was itself imperfect. The assumption of a fully hack-free corpus rests on validation by a large language model, but such judgments are not infallible.
I would suggest making the cleaned dataset, or at least a substantial sample, publicly available to enable broader scrutiny. You might additionally consider re-filtering through a second LLM with distinct prompting or a multi-agent setup.
“I think society has weird memes about balding and male beauty in general. Stoically accepting a disfigurement isn’t particularly noble”
I think calling natural balding “disfigurement” is in line with the weird memes around male beauty.
Not having hair isn’t harmful.
Disclaimer: I may go bald.
I think you’re extrapolating too far from your own experiences. It is absolutely possible to be excited (or at least avoid boredom) for long stretches of time if your life is busy and each day requires you to make meaningful decisions.
This post and paper would benefit from an overview of existing work on Wigner’s friend and the measurement problem.
For the measurement problem I suggest starting with the chapter “Quantum Coherence and Measurement Theory” in Walls & Milburn’s “Quantum Optics”.
My prior is that almost any decision which is not explicitly absurd can be given a cohesive and somewhat defensible justification when written by intelligent people.