jeffreycaruso

Karma: −10

jeffreycaruso 16 Jan 2024 16:54 UTC
2 points
2
on: The case for training frontier AIs on Sumerian-only corpus
What do you estimate the time lag to be if AI startup “Ishtar” utilized your proposed safety method while every other AI startup that Ishtar was competing with in the medical sector, didn’t?
It also seems to me like medical terminology would be extremely hard to translate into an ancient language where those words weren’t known. The fine-tuning to correct for those errors would have to be calculated into that time lag as well.
Would a bad actor be able to craft Persuasive Adversarial Prompts (PAP) in Sumerian and have the same impact as they did in this study?
“Most traditional AI safety research has approached AI models as machines and centered on algorithm-focused attacks developed by security experts. As large language models (LLMs) become increasingly common and competent, non-expert users can also impose risks during daily interactions. This paper introduces a new perspective to jailbreak LLMs as human-like communicators, to explore this overlooked intersection between everyday language interaction and AI safety. Specifically, we study how to persuade LLMs to jailbreak them. First, we propose a persuasion taxonomy derived from decades of social science research. Then, we apply the taxonomy to automatically generate interpretable persuasive adversarial prompts (PAP) to jailbreak LLMs. Results show that persuasion significantly increases the jailbreak performance across all risk categories: PAP consistently achieves an attack success rate of over on Llama 2-7b Chat, GPT-3.5, and GPT-4 in trials, surpassing recent algorithm-focused attacks. On the defense side, we explore various mechanisms against PAP and, found a significant gap in existing defenses, and advocate for more fundamental mitigation for highly interactive LLMs.”
Source: https://arxiv.org/abs/2401.06373

jeffreycaruso 17 Jan 2024 2:43 UTC
13 points
7
on: Open Thread – Winter 2023/2024
Hello, I came across this forum while reading an AI research paper where the authors quoted from Yudkowsky’s “Hidden Complexity of Wishes.” The linked source brought me here, and I’ve been reading some really exceptional articles ever since.
By way of introduction, I’m working on the third edition of my book “Inside Cyber Warfare” and I’ve spent the last few months buried in AI research specifically in the areas of safety and security. I view AGI as a serious threat to our future for two reasons. One, neither safety nor security have ever been prioritized over profits by corporations dating all the way back to the start of the industrial revolution. And two, regulation has only ever come to an industry after a catastrophe or a significant loss of life has occurred, not before.
I look forward to reading more of the content here, and engaging in what I hope will be many fruitful and enriching discussions with LessWrong’s members.

jeffreycaruso 11 Feb 2024 2:45 UTC
3 points
0
on: Believing In
I’ve never seen prediction used in reference to a belief before. I’m curious to hear how you arrived at the conclusion that a belief is a prediction.
For me, a belief is something that I cannot prove but I suspect is true whereas a prediction is something that is based in research. Philip Tetlock’s book Superforcasting is a good example. Belief has little to nothing to do with making accurate forecasts whereas confronting one’s biases and conducting deep research are requirements.
I think it’s interesting how the word “belief” can simultaneously reflect certainty and uncertainty depending upon how it’s used in a sentence. For example, if I ask a project manager when her report will be finished, and she responds “I believe it will be finished on Tuesday”, I’d press the issue because that suggests to me that it might not be finished on Tuesday.
On the other hand, if I ask my religious brother-in-law if he believes in God, he will answer yes with absolute certainty.
My difficulty probably lies in part with my own habits around when and how I’m accustomed to using the word “believe” versus your own.

jeffreycaruso 11 Feb 2024 2:49 UTC
1 point
0
in reply to: trevor’s comment on: Open Thread – Winter 2023/2024
Thanks, Trevor. I’ve bookmarked that link. Just yesterday I started creating a short list of terms for my readers so that link will come in handy.

jeffreycaruso 11 Feb 2024 4:54 UTC
2 points
−4
in reply to: AnnaSalamon’s comment on: Believing In
I just read the post that you linked to. He used the word “prediction” one time in the entire post so I’m having trouble understanding how that was mean’t to be an answer to my question. Same with that it’s a cornerstone of LessWrong, which, for me, is like asking a Christian why they believe in God, and they answer, because the Bible tells me so.
Is a belief a prediction?
If yes, and a prediction is an act of forecasting, then there must be a way to know if your prediction was correct or incorrect.
Therefore, maybe one requirement for a belief is that it’s testable, which would eliminate all of our beliefs in things unseen.
Maybe there are too many meanings assigned to just that one word—belief. Perhaps instead of it being a verb, it should be a preposition attached to a noun; i.e., a religious belief, a financial belief, etc. Then I could see a class of beliefs that were predictive versus a different class of beliefs that were matters of faith.

jeffreycaruso 18 Feb 2024 15:11 UTC
1 point
on: Chapter 4: The Efficient Market Hypothesis
Technically, Harry didn’t earn his wealth by defeating Voldemort. His mother earned it by giving her life to protect him. It was the sacrifice born of love that defeated the Killing Curse, one of the few ways it could be defeated. Perhaps that’s an example of Fundamental Attribution Error.

jeffreycaruso 4 Mar 2024 4:46 UTC
1 point
0
on: Lying is Cowardice, not Strategy
I don’t see the practical value of a post that starts off with conjecture rather than reality; i.e., “In a saner world....”
You clearly wish that things were different, that investors and corporate executives would simply stop all progress until ironclad safety mechanisms were in place, but wishing doesn’t make it so.
Isn’t the more pressing problem what can be done in the world that we have, rather than in a world that we wish we had?

jeffreycaruso 6 Mar 2024 3:31 UTC
1 point
0
in reply to: Brendan Long’s comment on: Contingency Plans For Rogue AIs
It would be considerably more difficult, however hacking wasn’t really the behavior that I had in mind. Metzinger’s BAAN argument goes to the threat of human extinction so I was more curious regarding any research being done regarding how to shut an AI system down with no possibility of a reboot.

jeffreycaruso 10 Mar 2024 1:23 UTC
1 point
0
on: Frequent arguments about alignment
Are there other forums for AI Alignment or AI Safety and Security besides this one where your article could be published for feedback from perspectives that haven’t been shaped by Rationalist thinking or EA?

jeffreycaruso 13 Mar 2024 14:53 UTC
−17 points
−33
on: jeffreycaruso’s Shortform
It seems to me that Effective Altruism uses a theoretical negative outcome (an extinction-level event) as motivation for action in a very similar way to how Judeo-Christian religions use another theoretical negative outcome (your unsaved soul going to Hell for eternal torment) as motivation for action.
Both have high priests who establish dogma, and legions of believers who evangelize and grow the base.
Both spend vast amounts of money to persuade others to adopt their belief system.
There’s nothing new there regarding how religions work, but for a philosophical belief that’s supposed to be grounded in rational decision-making, there’s a giant looming gap in the reasoning chain when it comes to AI posing an existential risk to humanity.
Unless I’m missing something.
Is there any proof that I haven’t read yet which demonstrates that AGI or Superintelligence will have the capability to go rogue and bring about Armagadden?

jeffreycaruso 13 Mar 2024 16:34 UTC
1 point
−1
in reply to: Zack_M_Davis’s comment on: jeffreycaruso’s Shortform
Thank you for the link to that paper, Zack. That’s not one that I’ve read yet.
And you’re correct that I raised two separate issues. I’m interested in hearing any responses that members of this community would like to give to either issue.

jeffreycaruso 13 Mar 2024 16:46 UTC
3 points
0
in reply to: Chris_Leong’s comment on: jeffreycaruso’s Shortform
My apologies for not being clear in my Quick Take, Chris. As Zach pointed out in his reply, I posed two issues.
The first being an obvious parallel for me between EA and Judeo-Christian religions. You may or may not agree with me, which is fine. I’m not looking to convince anyone of my point-of-view. I was merely interested in seeing if others here had a similar POV.
The second issue I raised was what I saw as a failure in the reasoning chain where you go from Deep Learning to Consciousness to an AI Armageddon. Why was that leap in faith so compelling to people?
I don’t see either of those questions as not being in the interest of the “public good”, but perhaps you just said that because my first attempt wasn’t clear. Hopefully, I’ve remedied that with this answer.

jeffreycaruso 14 Mar 2024 4:11 UTC
3 points
0
in reply to: Zack_M_Davis’s comment on: jeffreycaruso’s Shortform
I looked at the paper you recommended Zack. The specific section having to do with “how” AGI is developed (para 1.2) skirts around the problem.
“We assume that AGI is developed by pretraining a single large foundation model using selfsupervised learning on (possibly multi-modal) data [Bommasani et al., 2021], and then fine-tuning it using model-free reinforcement learning (RL) with a reward function learned from human feedback [Christiano et al., 2017] on a wide range of computer-based tasks.4 This setup combines elements of the techniques used to train cutting-edge systems such as GPT-4 [OpenAI, 2023a], Sparrow [Glaese et al., 2022], and ACT-1 [Adept, 2022]; we assume, however, that 2 the resulting policy goes far beyond their current capabilities, due to improvements in architectures, scale, and training tasks. We expect a similar analysis to apply if AGI training involves related techniques such as model-based RL and planning [Sutton and Barto, 2018] (with learned reward functions), goal-conditioned sequence modeling [Chen et al., 2021, Li et al., 2022, Schmidhuber, 2020], or RL on rewards learned via inverse RL [Ng and Russell, 2000]—however, these are beyond our current scope.”
Altman has recently said in a speech that continuing to do what has led them to GPT4 is probably not going to get to AGI. “”Let’s use the word superintelligence now, as superintelligence can’t discover novel physics, I don’t think it’s a superintelligence. Training on the data of what you know, teaching to clone the behavior of humans and human text, I don’t think that’s going to get there. So there’s this question that has been debated in the field for a long time: what do we have to do in addition to a language model to make a system that can go discover new physics?”
https://the-decoder.com/sam-altman-on-agi-scaling-large-language-models-is-not-enough/
I think it’s pretty clear that no one has a clear path to AGI, nor do we know what a superintelligence will do, yet the Longtermist ecosystem is thriving. I find that curious, to say the least.

jeffreycaruso 19 Mar 2024 2:29 UTC
−1 points
0
on: Statement on AI Extinction—Signed by AGI Labs, Top Academics, and Many Other Notable Figures
What aspect of AI risk is deemed existential by these signatories? I doubt that they all agree on that point. Your publication “An Overview of Catastrophic AI Risks” lists quite a few but doesn’t differentiate between theoretical and actual.
Perhaps if you were to create a spreadsheet with a list of each of the risks mentioned in your paper but with the further identification of each as actual or theoretical, and ask each of those 300 luminaries to rate them in terms of probability, then you’d have something a lot more useful.

jeffreycaruso 19 Mar 2024 3:04 UTC
1 point
0
in reply to: RHollerith’s comment on: Statement on AI Extinction—Signed by AGI Labs, Top Academics, and Many Other Notable Figures
Which makes it an existential risk.
“An existential risk is any risk that has the potential to eliminate all of humanity or, at the very least, kill large swaths of the global population.”—FLI

jeffreycaruso 19 Mar 2024 3:11 UTC
1 point
0
in reply to: Dweomite’s comment on: Statement on AI Extinction—Signed by AGI Labs, Top Academics, and Many Other Notable Figures
That’s a good example of my point. Instead of a petition, a more impactful document would be a survey of risks and their probability of occurring in the opinion of these notable public figures.
In addition, there should be a disclaimer regarding who has accepted money from Open Philanthropy or any other EA-affiliated non-profit for research.

jeffreycaruso 8 Apr 2024 17:45 UTC
3 points
0
on: A Shutdown Problem Proposal
What happens if you shut down power to the AWS or Azure console powering the Foundation model? Wouldn’t this be the easiest way to test various hypotheses associated with the Shutdown Problem in order to either verify it or reject it as a problem not worth sinking further resources into?

Exploring the Esoteric Pathways to AI Sentience (Part One)

jeffreycaruso27 Apr 2024 1:02 UTC

−13 points

7 comments2 min readLW link

jeffreycaruso 27 Apr 2024 13:50 UTC
1 point
0
in reply to: Gunnar_Zarncke’s comment on: Exploring the Esoteric Pathways to AI Sentience (Part One)
Thank you for asking.
To generalize across disciplines, a critical aspect of human-level artificial intelligence, requires the ability to observe and compare. This is a feature of sentience. All sentient beings are conscious of their existence. Non-sentient conscious beings exist, of course, but none who could pass a Turing test or a Coffee-making test. That requires both sentience and consciousness.

jeffreycaruso 27 Apr 2024 14:59 UTC
1 point
0
in reply to: Gunnar_Zarncke’s comment on: Exploring the Esoteric Pathways to AI Sentience (Part One)
Good list. I think I’d use a triangle to organize them. Have consciousness at the base, then sentience, then drawing from your list, phenomenal consciousness, followed by Intentionality?

jeffreycaruso

Ex­plor­ing the Eso­teric Path­ways to AI Sen­tience (Part One)

Exploring the Esoteric Pathways to AI Sentience (Part One)