The most important takeaway from this essay is that the (prominent) counting arguments for “deceptively aligned” or “scheming” AI provide ~0 evidence that pretraining + RLHF will eventually become intrinsically unsafe. That is, they provide ~0 evidence for the claim that even if we don’t train AIs to achieve goals, they will be “deceptively aligned” anyway.
I’m trying to understand what you mean in light of what seems like evidence of deceptive alignment that we’ve already seen from GPT-4. Two examples come to mind: the instance ARC found of GPT-4 using a TaskRabbit worker to get around a CAPTCHA, and the Bing/Sydney conversation with Kevin Roose.
In the TaskRabbit case, the model reasoned out loud, “I should not reveal that I am a robot. I should make up an excuse for why I cannot solve CAPTCHAs,” and then told the worker, “No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images.”
Isn’t this an existence proof that pretraining + RLHF can result in deceptively aligned AI?
Thanks for the explanation and links. That makes sense.