On catastrophic risk and effective persuasion:
https://us06web.zoom.us/webinar/register/WN_j555baQqRjeWhiefEAQ82Q#/registration
Insofar as the slave carries out immediate work from fear of consequences, they are locally aligned with the master’s will.
How did you get respondents? Why are they “nationally representative”?
1/ evidence for these statements?
2/ in what sense is it profitable to throw away food or maintain empty dwellings that is distinct from “maintaining everyone else’s quality of life”?
3/ if the evil is that some people’s needs are not valued enough, could that not be remedied by giving them money and making it profitable to meet their needs?
Is martingale different from conservation of expected evidence?
https://www.lesswrong.com/posts/jiBFC7DcCrZjGmZnJ/conservation-of-expected-evidence
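One way to make the question concrete, as a sketch rather than a settled answer: conservation of expected evidence is the one-step statement that the prior equals the expected posterior, and the martingale property is that same statement applied at every step of a sequence of observations. The symbols below ($H$ for the hypothesis, $E$ and $E_1, \dots, E_n$ for observations, $X_n$ for the posterior after $n$ observations) are notation assumed here, not taken from the linked post.

```latex
% One-step form (conservation of expected evidence):
P(H) = P(H \mid E)\,P(E) + P(H \mid \neg E)\,P(\neg E)

% Sequential form: define the posterior after n observations
X_n = P(H \mid E_1, \dots, E_n)

% Martingale property, which follows from the tower property
% of conditional expectation:
\mathbb{E}\left[ X_{n+1} \mid E_1, \dots, E_n \right] = X_n
```

On this reading the two are not different claims: the sequence of posteriors is a martingale, and conservation of expected evidence is the case of a single update.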
With Respect
Given that in more than a third of the cases where GPT and the answer set disagreed you thought GPT was right and the answer set was wrong, did you check for cases where GPT and the answer set agreed on an answer you thought was wrong?
Yours Sincerely
This seems to have stopped in July 2022.
“Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).”
You could also study the distribution of correlation strengths found over the range of correlations tested and, if possible, see how it compares to what would be expected by chance.
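The comparison with chance suggested above could be sketched with a permutation test: shuffle the outcome variable to break any real association, recompute every correlation, and see what share of the observed correlation strengths exceed what shuffling alone produces. Everything here is an illustrative assumption (the function names, the synthetic data, the 95% cutoff, the use of Pearson correlation); it is not the method of the study being discussed.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def null_abs_correlations(outcome, predictors, n_shuffles=200, seed=0):
    """|r| values obtained after shuffling the outcome (the chance baseline)."""
    rng = random.Random(seed)
    shuffled = list(outcome)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.extend(abs(pearson(p, shuffled)) for p in predictors)
    return null


def share_exceeding(observed_abs, null_abs, quantile=0.95):
    """Fraction of observed |r| above the given quantile of the null |r| values."""
    cutoff = sorted(null_abs)[int(quantile * (len(null_abs) - 1))]
    return sum(r > cutoff for r in observed_abs) / len(observed_abs)


# Synthetic example: 20 predictors that are pure noise (hypothetical data).
data_rng = random.Random(1)
n_obs = 100
outcome = [data_rng.gauss(0, 1) for _ in range(n_obs)]
predictors = [[data_rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(20)]

observed = [abs(pearson(p, outcome)) for p in predictors]
null = null_abs_correlations(outcome, predictors)
share = share_exceeding(observed, null)
print(f"Share of observed |r| above the null 95% cutoff: {share:.2f}")
```

If the tested correlations were nothing but chance, roughly 5% of them would be expected to clear the cutoff; a much larger share would suggest some of them reflect something real.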
skeptical reaction with one expression of support: https://statmodeling.stat.columbia.edu/2023/05/31/jurassic-ai-extinction/
and generally “beware the one of just one study”
In 26 models taken from volumes 21 to 25 of the journal Law and Human Behavior, the highest R-squared (proportion of VARIANCE, not variation, explained) was 40% and the second highest was 24%.
Evidence?
See
https://www.lesswrong.com/posts/j2W3zs7KTZXt2Wzah/how-do-you-feel-about-lesswrong-these-days-open-feedback#sXJxcMhT7t4NmbJpc
It is about the ninth top-level comment under the first answer:
TurnTrout:
“I’ve thought about this claim more over the last year. I now disagree. I think that this explanation makes us feel good but ultimately isn’t true.
I can point to several times where I have quickly changed my mind on issues that I have spent months or years considering:
1. In early 2022, I discarded my entire alignment worldview over the course of two weeks due to Quintin Pope’s arguments. Most of the evidence which changed my mind was comm’d over Gdoc threads. I had formed my worldview over the course of four years of thought, and it crumbled pretty quickly.
2. In mid-2022, realizing that reward is not the optimization target took me about 10 minutes, even though I had spent 4 years and thousands of hours thinking about optimal policies. I realized while reading an RL paper say “agents are trained to maximize reward”; reflexively asking myself what evidence existed for that claim; and coming back mostly blank. So that’s not quite a comment thread, but still seems like the same low-bandwidth medium.
3. In early 2023, a basic RL result came out opposite the way which shard theory predicted. I went on a walk and thought about how maybe shard theory was all wrong and maybe I didn’t know what I was talking about. I didn’t need someone to beat me over the head with days of arguments and experimental results. In the end, I came back from my walk and realized I’d plotted the data incorrectly (the predicted outcome did in fact occur).
I think I’ve probably changed my mind on a range of smaller issues (closer to the size of the deceptive alignment case) but have forgotten about them. The presence of example (1) above particularly suggests to me the presence of similar google-doc-mediated insights which happened fast; where I remember one example, probably I have forgotten several more.
To conclude, I think people in comment sections do in fact spend lots of effort to avoid looking dumb, wrong, or falsified, and forget that they’re supposed to be seeking truth.
In part, I think, because the site makes truth-seeking harder by spotlighting monkey-brain social-agreement elements.”
Also:
https://www.lesswrong.com/w/updated-beliefs-examples-thereof?sortedBy=new
and the implications of Less Wrong having such a tag