catubc

Karma: 159

catubc 27 Feb 2025 6:35 UTC
3 points
0
on: How to Make Superbabies
Great write up!
Why don’t you do this in a mouse first? The whole cycle from birth to phenotype, including complex reasoning (e.g. Bayesian inference, causality), can take 6 months.

catubc 7 Feb 2025 6:49 UTC
3 points
0
on: The Field of AI Alignment: A Postmortem, and What To Do About It
Exactly, and thanks for writing this.
I would go further and say that—AI safety is AI dev—and this happened years ago. If we stopped it all now, we’d extend our timelines:
https://www.lesswrong.com/posts/vkzmbf4Mve4GNyJaF/the-case-for-stopping-ai-safety-research

catubc 29 Jul 2024 9:09 UTC
1 point
0
on: Decomposing Agency — capabilities without desires
Interesting read, would be great to see more done in this direction. However, it seems that mind-body dualism is still the prevalent (dare I say “dominant”) mode of understanding human will and consciousness in CS and AI-safety. In my opinion, the best picture we have of human value creation comes from the social and psychological sciences, not metaphysics and mathematics, and it would be great to have more interactions with those fields.
For what it’s worth I’ve written a bunch on agency-loss as an attractor in AI/AGI-human interactions.
https://www.lesswrong.com/posts/dDDi9bZm6ELSXTJd9/intent-aligned-ai-systems-deplete-human-agency-the-need-for
And a shorter paper/poster on this at ICML last week: https://icml.cc/virtual/2024/poster/32943

catubc 26 Jul 2024 5:50 UTC
2 points
0
in reply to: Seth Herd’s comment on: The case for stopping AI safety research
Sorry, fixed broken link now.
The problem with “understanding the concept of intent” is that intent and goal formation are some of the most complex notions in the universe, involving genetics, development, psychology, culture and everything in between. We have been arguing about what intent (and correlates like “well-being”) means for the entire history of our civilization. It looks like we have a good set of no-nos (e.g. read the UN declaration on human rights), but in terms of positive descriptions of good long term outcomes it gets fuzzy. There we have less guidance, though I guess trans- and post-humanism seem to be a desirable goal to many.

catubc 25 Jul 2024 13:58 UTC
2 points
0
in reply to: Seth Herd’s comment on: The case for stopping AI safety research
Seth. I just spoke about this work at ICML yesterday. Some other similar works:
Eliezer’s work from way back in 2004: https://intelligence.org/files/CEV.pdf. I haven’t read it in full, but it’s about AIs that interact with human volition, which is what I’m also worried about.
Christiano’s: https://www.lesswrong.com/posts/HBxe6wdjxK239zajf/what-failure-looks-like. This is a lot about slow takeoffs and AIs that slowly become unstoppable or unchangeable because they become part of our economic world.
My paper on arxiv is a bit of a long read (GPT-it): https://arxiv.org/abs/2305.19223 But it tries to show where some of the weak points in human volition and intention generation are, and why we (i.e. “most developers and humanity in general”) still think of human reasoning in a mind-body dualistic framework: i.e. there’s a core to human thought, goal selection and decision making that can never be corrupted or manipulated. We’ve already discovered loads of failure modes, and we weren’t even faced with omnipotent-like opponents (https://www.sog.unc.edu/sites/www.sog.unc.edu/files/course_materials/Cognitive%20Biases%20Codex.pdf). The other main point my work makes is that when you apply enough pressure on an aligned AI/AGI to find an optimal solution or “intent” you have for a problem that is too hard to solve, the solution it will eventually find is to change the “intent” of the human.

catubc 23 May 2024 16:20 UTC
11 points
6
in reply to: Garrett Baker’s comment on: The case for stopping AI safety research
Thanks Garrett. There is obviously nuance that a 1min post can’t get at. I am just hoping for at least some discussion to be had on this topic. There seems to be little to none now.

The case for stopping AI safety research

catubc 23 May 2024 15:55 UTC

53 points

38 comments, 1 min read

catubc 2 Jun 2023 9:16 UTC
7 points
0
in reply to: Charlie Steiner’s comment on: Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety
Thanks for the comment. I agree broadly of course, but the paper says more specific things. For example, agency needs to be prioritized, probably taken outside of standard optimization, otherwise decimating pressure is applied on other concepts including truth and other “human values”. The other part is an empirical one, also related to your concern, namely, human values are quite flexible and biology doesn’t create hard bounds / limits on depletion. If you couple that with ML/AI technologies that will predict what we will do next, then approaches that depend on human intent and values (broadly) are not as safe anymore.

Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety

catubc 31 May 2023 21:18 UTC

26 points

4 comments, 11 min read

catubc 11 Apr 2023 11:01 UTC
10 points
3
on: Why Simulator AIs want to be Active Inference AIs
Thanks so much for writing this. I think it’s a much needed, perhaps even a bit late, contribution connecting static views of GPT-based LLMs to dynamical systems and predictive processing. I do research on empirical agency and it still surprises me how little the AI-safety community touches on this central part of agency, namely that you can’t have agents without this closed loop.
I’ve been speculating a bit (mostly to myself) about the possibility that “simulators” are already a type of organism, given that they appear to do active inference, which is the main driving force for nervous system evolution. Simulators seem to live in this inter-dimensional paradigm where (i) on one hand during training they behave like (sensory-systems) agents because they learn to predict outcomes and “experience” the effect of their prediction; but (ii) during inference/prediction they generally do not receive feedback. As you point out, all of this speculation may be moot as many are moving pretty fast towards embedding simulators and giving them memory etc.
What is your opinion on this idea of “loosening up” our definition of agents? I spoke to Max Tegmark a few weeks ago and my position is that we might be thinking of organisms from a time-chauvinist position—where we require the loop to be closed in a fast fashion (e.g. 1sec for most biological organisms).

catubc 17 Mar 2023 16:18 UTC
1 point
0
in reply to: Erik Jenner’s comment on: Red-teaming AI-safety concepts that rely on science metaphors
Thanks for the comment Erik (and taking the time to read the post).
I generally agree with you re: the inner/outer alignment comment I made. But the language I used and that others also use continues to be vague; the working def for inner-alignment on lesswrong.com is whether an “optimizer is the production of an outer aligned system, then whether that optimizer is itself aligned”. I see little difference—but I could be persuaded otherwise.
My post was meant to show that it’s pretty easy to find significant holes in some of the most central concepts researched now. This includes eclectic, but also mainstream research including the entire latent-knowledge approach which seems to make significant assumptions about the relationship between human decision making or intent and super-human AGIs. I work a lot on this concept and hold (perhaps too) many opinions.
The tone might not have been ideal due to time limits. Sorry if that was off putting.
I was also trying to make the point that we do not spend enough time shopping our ideas around with especially basic science researchers before we launch our work. I am a bit guilty of this. And I worry a lot that I’m actually contributing to capabilities research rather than long-term AI-safety. I guess in the end I hope for a way for AI-safety and science researchers to interact more easily and develop ideas together.

catubc 17 Mar 2023 16:01 UTC
2 points
0
in reply to: baturinsky’s comment on: Red-teaming AI-safety concepts that rely on science metaphors
Thanks for the comment. Indeed, if we could agree on capping, or slowing down, that would be a promising approach.

Red-teaming AI-safety concepts that rely on science metaphors

catubc 16 Mar 2023 6:52 UTC

5 points

4 comments, 5 min read

catubc 8 Feb 2023 6:29 UTC
1 point
−5
on: Focus on the places where you feel shocked everyone’s dropping the ball
Thank you so much for this effectiveness focused post. I thought I would add another perspective, namely “against the lone wolf” approach, i.e. that AI-safety will come down to one person, or a few persons, or an elite group of engineers somewhere. I agree for now there are some individuals who are doing more conceptual AI-framing than others, but in my view I am “shocked that everyone’s dropping the ball” by putting up walls and saying that the general public is not helpful. Yes, they might not be helpful now, but we need to work on this!… Maybe someone with the right skill will come along :)
I also view academia as almost hopeless (it’s where I work). But it feels that if a few of us can get some stable jobs/positions/funding—we can start being politically active within academia and the return on investment there could be tremendous.

catubc 5 Feb 2023 14:43 UTC
LW: 2 AF: 1
0
AF
on: A newcomer’s guide to the technical AI safety field
Hi Chin. Thanks for writing this review. It seems like a much-needed and well-timed article, at least from my perspective, as I was looking for something like this. In particular, I’m trying to frame my research interest relative to the AI-safety field, but as you point out this is still too early.
I am wondering if you have any more insights into how you came up with your diagram above? In particular, are there any more peer-reviewed articles, or arXiv papers like Amodei et al (https://arxiv.org/abs/1606.06565), that you relied on? For example, I don’t understand why seed AI is such a critical concept in AI literature (is it even published?), as it seems related to the concept of viruses, which are an entire field in CS. Also, why is brain-inspired AI a category in your diagram? As far as I know that story isn’t published or peer reviewed, and doesn’t have significant traction.
I imagine I’m in the same place you were before you wrote this article, and I’d love to get some more insight about how you ended up with this layout.
Thank you so much,
catubc

catubc 18 Nov 2022 7:55 UTC
2 points
0
in reply to: Jonathan Moregård’s comment on: AGIs may value intrinsic rewards more than extrinsic ones
Thanks for the reply Jonathan. Indeed I’m also a bit skeptical that our innate drives (whether the ones from SDT or others) are really non-utility maximizing. But in some cases they do appear so.
One possibility is that they were driven to evolve for utility maximization but have now broken off completely and serve some difficult-to-understand purpose. I think there are similar theories of how consciousness developed—i.e. that it evolved as a by-effect/side-effect of some inter-organism communication—and now plays many other roles.

catubc 18 Nov 2022 7:51 UTC
1 point
0
in reply to: Roman Leventov’s comment on: AGIs may value intrinsic rewards more than extrinsic ones
Hi Roman.
First of all, thank you so much for reading and taking the time to respond.
I don’t have the time, or knowledge, to respond to everything, but from your response, I worry that my article partially missed the target. I’m trying to argue that humans may not be just utility maximizers and that a large part of being human (or maybe any organism?) is to just enjoy the world via some quasi-non-rewarded types of behavior. So there’s no real utility for some or perhaps the most important things that we value. Seeking out “surprising” results does help AIs and humans learn, and seeking out information as well. But I’m not sure human psychology supports human intrinsic rewards as necessarily related to utility maximization. I do view survival and procreation as genetically encoded drives, but they are not the innate drives I described above. It’s not completely clear what we gain when we enjoy being in the world, learning, socializing.
I’m aware of Friston’s free energy principle (it was one of the first things I looked at in graduate school). I personally view most of it as non-falsifiable, but I know that many have used it to derive useful interpretations of brain function.
Also I quickly googled LeCun’s proposal, and his conception of future AI, and his intrinsic motivation module is largely about boot-strapped goals—albeit human pro-social ones.
The ultimate goal of the agent is minimize the intrinsic cost over the long run. This is where basic behavioral drives and intrinsic motivations reside. The design of the intrinsic cost module determines the nature of the agent’s behavior. Basic drives can be hard-wired in this module. This may include feeling “good” (low energy) when standing up to motivate a legged robot to walk, when influencing the state of the world to motivate agency, when interacting with humans to motivate social behavior, when perceiving joy in nearby humans to motivate empathy, when having a full energy supplies (hunger/satiety), when experiencing a new situation to motivate curiosity and exploration, when fulfilling a particular program, etc
I would say that my question—which I did not answer in the post—is whether we can design AIs that don’t seek to maximize some utility or minimize some cost? What would that look like? Some computer-cluster just spinning up to do computations for no effective purpose?
I don’t really have an answer here.

AGIs may value intrinsic rewards more than extrinsic ones

catubc 17 Nov 2022 21:49 UTC

8 points

6 comments, 4 min read

LLMs may capture key components of human agency

catubc 17 Nov 2022 20:14 UTC

27 points

0 comments, 4 min read

catubc 26 Sep 2022 6:34 UTC
2 points
1
in reply to: Nathan Helm-Burger’s comment on: Agency engineering: is AI-alignment “to human intent” enough?
Thanks Nathan. I understand that most people working on technical AI-safety research focus on this specific problem, namely aligning AI, and less on misuse. I don’t expect a large AI-misuse audience here.
Your response—that “truly-aligned-AI” would not change human intent—was also suggested by other AI researchers. But this doesn’t address the problem: human intent is created from (and dependent on) societal structures. Perhaps I failed to make this clearer. But I was trying to suggest we lack an understanding of the genesis of human actions/intentions or goals—and thus cannot properly specify how human intent is constructed—and how to protect it from interference/manipulation. A world imbued with AI-techs will change the societal landscape significantly and potentially for the worse. I think that many view human “intention” as a property of humans that acts on the world and is somehow isolated or protected from the physical and cultural world (see Fig 1a). But the opposite is actually true: in humans intent and goals are likely caused significantly more by society than biology.
The optimist statement: The best way I can interpret “truly-aligned-AI won’t change human agency” is to say that “AI” will help humans solve the free will problem and will then “work with us” to redesign what human goals should be. But this latter statement is a very tall order (a United Nations statement that perhaps will never see the light of day...).
