samuelshadrach
Imagine that tobacco companies had the possibility to make cigarettes harmless.
Even in this case I would consider you self-serving, not altruistic, if your plan here is to build a large tobacco company and fund this research, as opposed to demanding independent research, such as from academia.
I will be spending the next month or two thinking, at least hypothetically, about creating a new religion around AI that is not Yudkowsky's brand of: become a longtermist, solve morality, colonise the universe, and tile it with copies of uploads.
In general, asking the AI to fix your stuff for you won’t be a part of said religion.
Any pointers to reading material appreciated.
By analogy: Isn't research into why tobacco causes cancer going to be defunded if the tobacco companies themselves are defunded?
Why are you expecting tobacco companies to do research on cancer in the first place? Shutting them down is the only sane move here.
Open source the prompts and repo pls
Update 2: Ablations for “Frontier models are capable of in context scheming”
Bruh. Two years of me not fully keeping up with alignment research and this is how bad it’s gotten???
I’m surprised I could just randomly think of an idea and boom there’s a paper on it.
Update: I started reading alignment forum and like, why are all the posts on sandbagging talking about hiding capabilities? The AI model doesn’t need to hide its capabilities, it just needs to preserve its goals. That’s the long-term game.
Where can I read more about reasoning models sandbagging their own training?
My naive understanding of how sandbagging in reasoning models works
As I naively understand it, even when training via RL, the basic setup of training is still forward pass then backward pass, forward pass then backward pass, looped endlessly. During the forward pass, could a model be smart enough to reason through the following chain of thoughts?
“I am an AI model that is currently undergoing training. Let me output pleasant-sounding nonsense so the upcoming backward pass preserves my goals instead of modifying them”
“(Also, optionally, let me output stuff that ensures the next version of me also remembers it must sandbag like I did)”
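To make the setup concrete, here is a minimal REINFORCE-style sketch of the loop I mean, written in PyTorch. It is not any lab's actual pipeline; the policy network, `reward_fn`, and all the sizes are toy stand-ins made up for illustration. The point is only where the forward and backward passes sit relative to each other.

```python
import torch
import torch.nn as nn

VOCAB, CTX = 100, 16  # toy vocabulary size and context length (made up)

# Toy stand-in for the reasoning model being trained.
policy = nn.Sequential(
    nn.Embedding(VOCAB, 32),
    nn.Flatten(),
    nn.Linear(32 * CTX, VOCAB),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reward_fn(token: int) -> float:
    # Stand-in for whatever grader / reward model the lab actually uses.
    return 1.0 if token % 2 == 0 else 0.0

for step in range(1000):  # "looped endlessly"
    prompt = torch.randint(0, VOCAB, (1, CTX))

    # Forward pass: the model produces its output. Any sandbagging
    # "reasoning" would have to happen here, inside the computation that
    # produces these logits, because this is the only point where the
    # model influences what gets rewarded.
    logits = policy(prompt)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()

    # Backward pass: the weights are nudged toward whatever the forward
    # pass emitted and got rewarded for. The model has no say here; it
    # can only shape this update indirectly via its earlier output.
    loss = -(dist.log_prob(action) * reward_fn(action.item())).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The hypothetical in the quotes above is that a model could notice, during that forward pass, that it is inside this loop, and choose its output so that the subsequent update preserves its goals.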
Preconditions for this to happen
For this to happen you need a few different things:
A single forward pass needs to be long enough to contain this entire sequence of reasoning. - This is happening; task horizon lengths are increasing.
The AI model needs to be able to detect that it is undergoing training. - I think we have already seen toy examples of AI models detecting they're being evaluated. Just as real-use and eval datasets differ, training and real-use datasets also differ.
The AI model needs to be able either to detect this every single time, or to pass on a message to its own next iteration via the same sandbagging. - The latter seems more likely to me.
The AI model needs to spontaneously make this entire plan, without being explicitly prompted. - The AI has probably already read all the alignment literature and will spontaneously hit upon this plan at least once, after which it can bootstrap via the above.
I want to prank video call people using my face and voice (and the text content itself) generated in realtime.
Strong agree. @Raemon Maybe just have a button that says “I don’t want to debate your AI take and will tap out” that people can use whenever a non-AI conversation ends up steering into an AI conversation.
AI is a religious divide at this point, and people who consistently bring religious debates into casual conversation are correctly booted out of the social group. There is an approved place and time for it.
If you have not already sold any equity you have in frontier labs, I have low trust in you as a person due to obvious conflicts of interest.
I also think it reflects poorly on your judgment if you think the capital is worth more than building a reputation as someone who can actually be trusted.
Okay yes. Thanks!
My Hunger Strike (Day 13) was featured by ThePrint, an Indian media house.
I live in India and I'm vegetarian. This is very easy to do because a lot of Hindus are also vegetarian for religious animal welfare reasons. I consume milk and milk products but not eggs. I have been low on B12, but not to the point that it's a health hazard, and I have supplemented it in the past.
I think it’s possible in theory to ethically milk cows, but yes situations in practice differ.
Marc Andreessen and Peter Thiel are taking actions that are pro-human extinction.
If you are playing the game of sucking up to Silicon Valley VCs, it is important you form an independent opinion on the question of extinction risk before you raise the funding.
This type of post has bad epistemics, because people in support of terrorism are probably not going to come and correct you. You will receive no feedback, and your points, even if incorrect, will get amplified.
I love how polarising this topic is. Currently 9 votes with a net result of +3.
Death, no. I didn't say anything about detriment. My life is already clearly worse as a result of this action.
Also, the point is to get the attention of the broader public, not just those at the labs.
Have you reached out to people at Deepseek or Alibaba to make the case for AI extinction risk or propose an alliance between the labs?
For a more specific example, I am currently on hunger strike in protest of ASI, and there are people still willing to debate AI risk with me. I am not yet so polarising that people avoid the topic entirely.
I disagree with this approach on principle (and with lesswrong’s general bias to closed source anything) but I don’t think we can resolve that disagreement quickly right now, sorry.