Masters student in Physics at the University of Queensland.
I am interested in Quantum Computing, physical AI Safety guarantees and alignment techniques that will work beyond contemporary models.
What useful heuristics do you use for quickly estimating probabilities?
Mine is: try to pick the probability such that you would be indifferent between taking either side of the bet.
This is (tautologically) the correct approach but I am often estimating probabilities in an “adversarial” context (i.e. informally with friends where we both have fairly different models). In this context you’re incentivised to negotiate the best possible odds for yourself, even if you aren’t actively trying to do this.
Rather than estimating the actual probability, I found my brain was estimating “what odds would I be happy to place a bet on” which ends up estimating lower odds for the side you expect to bet on.
(Obviously, the best tip is “practice” and “keep a record of your predictions and examine them”.)
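The indifference heuristic above can be made concrete. A minimal sketch (the stakes are hypothetical): the odds at which you'd happily take either side of a bet pin down your implied probability.

```python
def implied_probability(stake: float, payout: float) -> float:
    """Probability at which risking `stake` to win `payout` is a fair bet.

    If you're indifferent between both sides at these odds, your implied
    probability of the event is stake / (stake + payout).
    """
    return stake / (stake + payout)

# Hypothetical example: you'd risk $20 to win $80 on either side of the bet,
# so your implied probability of the event is 20%.
print(implied_probability(20, 80))  # 0.2
```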
I believe 4 to be bad advice for most children. If you’re from a rich, stable family and receive a good education yet still struggle at a particular subject then maybe, yes, it does reflect some kind of inherent weakness.
But the majority of children who are behind in an area simply haven’t spent enough time working on it. Regardless of a child’s genuine abilities, they’ll do worse if you tell them they should consider themselves to be “less capable”.
What predictions does this model make?
Why would you expect the American government not to pursue America first policies?
My prior is that almost any decision which is not explicitly absurd can be given a cohesive and somewhat defensible justification when written by intelligent people.
I think a lot of your confusion is stemming from the fact that you are treating PR statements from Anthropic as if they were being made in good faith.
For example:
“Maybe Anthropic should’ve been more clear about what “behind” and “ahead” mean, and when or when not they’re giving themselves the option/soft obligation to pause”
They will try to avoid doing this because it is very embarrassing when your previous statements contradict your actions.
“Are Anthropic employees not reacting to this?”
Anthropic employees are paid large amounts of money and get to talk about their concerns with other people in the organisation. They have a direct financial incentive to avoid speaking publicly against the company.
“On a personal note, many of us are much more nervous about working for Anthropic and are much more nervous about the strategic decision-making of its leadership during the critical period.”
The good news is that having a cool job and earning a huge amount of money is enough to quell any moral concerns you might have.
I’d previously worked through a dozen or so chapters of the same Woit textbook you’ve linked as context for Representation Theory.
Given some group G, a (linear) “representation” is a homomorphism from G into GL(V), the general linear group of some vector space V.
That is, a map π : G → GL(V) is a representation iff π(g₁g₂) = π(g₁)π(g₂) for all elements g₁, g₂ ∈ G.
Does “preferences between deals dependent on unknown world states” have a group structure? If not, it cannot be a representation in the sense meant by Woit.
Has anyone been able to do this right now?
After a quick check, I’m unable to replicate the behaviour shown in this thread using Google Translate in Chrome on an Android phone.
I think it is extremely difficult to predict how a community of intelligences would react to that information.
I’m away from my laptop for a few days, but you’d make a stronger argument by applying Bayes’ Rule.
Sure, maybe you expect 1 coincidentally suspicious trade per year but that doesn’t mean this specific trade wasn’t insider trading.
I am not sure that all of the individuals with the opportunity to make this trade are rich.
Even if they did have more than $80k, they might not have felt they could safely access any more money without leaving a trail. I’d expect there’s a chance that a large withdrawal from your personal bank account would be noticed by an investigation.
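This is the shape of the Bayes-style update I have in mind, with entirely made-up numbers:

```python
# All numbers hypothetical. Prior probability that a given suspicious-looking
# trade is insider trading, before examining its specific features.
p_insider = 0.05
p_innocent = 1 - p_insider

# Likelihood of observing a trade this well-timed and this sized under each
# hypothesis (again, invented purely for illustration).
p_evidence_given_insider = 0.5
p_evidence_given_innocent = 0.01

# Bayes' Rule: P(insider | evidence).
posterior = (p_insider * p_evidence_given_insider) / (
    p_insider * p_evidence_given_insider
    + p_innocent * p_evidence_given_innocent
)
print(round(posterior, 3))  # ~0.725 with these made-up inputs
```

The point is that a low base rate of insider trading doesn’t settle the question; what matters is how much more likely this specific trade is under the insider hypothesis than the innocent one.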
“I know someone who can actually attend to literally five conversations at once.”
I agree that some people are better at generic multitasking than others, and there are some people who are better at monitoring multiple conversations.
I also believe you know somebody who claims they can attend to 5 conversations at once.
But I’d comfortably bet even money that their ability to recall and process information drops off quickly once they’re trying to attend to more than 2. My model is that unintentionally tricking yourself into believing you have this ability is easier than actually learning it.
Beyond that I’m not sure that the multi/single thread dichotomy is a particularly useful abstraction to describe how human brains function nor does it provide much predictive power.
Here I am not claiming all humans are single or multi-threaded. I am disputing whether it is even a meaningful abstraction.
“SOTA alignment research includes stuff like showing that training the models on a hack-filled environment misaligns them unless hacking is framed as a good act”
I am not sure that these are examples of the kind of alignment research TsviBT meant, as the post concerns AGI.
SOTA alignment researchers at Anthropic can:
- prove the existence of phenomena through explicitly demonstrating them.
- make empirical observations and proofs about the behaviour of contemporary models.
- offer conjectures about the behaviour of future models.
Nobody at Anthropic can offer (to my knowledge) a substantial scientific theory that would give reason to be extremely confident that any technique they’ve found will extend to models in the future. I am not sure if they have ever explicitly claimed that they can.
“Empirically when I advocate internally for things that would be commercially costly to Anthropic I don’t notice this weighing on my decisionmaking basically at all, like I’m not sure I’ve literally ever thought about it in that setting?”
With respect, one of the dangers of being a flawed human is the fact that you aren’t aware of every factor that influences your decision making.
I’m not sure that a lack of consciously thinking about financial loss/gain is good empirical evidence that it isn’t affecting your choices.
By all means wear what you want, but the positive reactions you get from strangers who directly approach you are not necessarily an accurate way to gauge how most people are reacting to your outfit. You’re sampling from the population of “people who have spontaneously chosen to engage with you”.
Generally when you wear a polarising outfit people who dislike it won’t go out of their way to tell you. I’m extroverted enough that I will (very occasionally) compliment strangers in public on nice/unusual outfits, but I’ve never told a stranger their outfit is bad.
“I inevitably get weird looks from the kind of people who think having a tattoo is an affront to god but they give me that look for just existing with blue hair and pronouns too”
This line in particular just seems like bad epistemics. Is it really likely that everyone who reacts badly to their outfit would also judge them for having coloured hair?
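A quick simulation of the selection effect (all parameters invented): even if most people dislike an outfit, the feedback you actually hear can be overwhelmingly positive if fans are far more likely to approach you than critics.

```python
import random

random.seed(0)

# Hypothetical population: 30% like the outfit, 70% dislike it.
# Fans approach you 10% of the time; critics almost never do (0.5%).
P_LIKE, P_APPROACH_FAN, P_APPROACH_CRITIC = 0.30, 0.10, 0.005

compliments = criticisms = 0
for _ in range(100_000):
    likes = random.random() < P_LIKE
    approach_rate = P_APPROACH_FAN if likes else P_APPROACH_CRITIC
    if random.random() < approach_rate:
        if likes:
            compliments += 1
        else:
            criticisms += 1

# Share of feedback that is positive, vs. a true approval rate of 30%.
print(compliments / (compliments + criticisms))
```

With these made-up rates, roughly nine in ten comments you receive are compliments even though seven in ten observers dislike the outfit.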
I think we should show some solidarity to people committed to their beliefs and making a personal sacrifice, rather than undermining them by critiquing their approach.
Given that they’re both young men and the hunger strikes are occurring in the first world, it seems unlikely anyone will die. But it does seem likely they or their friends will read this thread.
Beyond that, the hunger strike is only on day 2 and has already received a small amount of media coverage. Should they go viral then this one action alone will have a larger differential impact on reducing existential risk than most safety researchers will achieve in their entire careers.
https://www.businessinsider.com/hunger-strike-deepmind-ai-threat-fears-agi-demis-hassabis-2025-9
One must imagine Sisyphus happy.
Von Neumann might have been driven by a feeling of inadequacy, but that doesn’t mean it was necessary for his success. One can imagine Von NewOutlook-Mann who took the same actions in life but viewed them as working towards a positive goal rather than needing to prove himself.
It strikes me that Anthropic’s blog post is engaging in a bit of double-speak in saying they are “disrupting” the operations of cybercriminals.
What they are describing is retroactively taking action after crime has occurred.
The following illustration from 2015 by Tim Urban seems like a decent summary of how people interpreted this and other statements.
I’ve thrown on some limit orders if anyone is strongly pro-Kokotajlo.
“we’ve had so much evidence roll in… we could have more evidence about intelligence and its steerability than any other point in human history”
Anthropic staff perform experiments on modern LLMs and see this as generalisable evidence about future models.
Another interpretation is that these experiments are simply evidence about the behaviour of modern LLMs and will not generalise cleanly to AGI.
I am inclined to believe the latter argument.
The security techniques needed to secure chimpanzees in a zoo are different to the techniques you need to secure an intelligent and healthy adult.
Notice that this holds despite the similar “architecture” of the chimp and human brains. Our neuron count is likewise within an order of magnitude.