Does that, in turn, mean that it’s probably a good investment to buy souls for 10 bucks a pop (or even more)?
Ozyrus
Well, this is a stupid questions thread after all, so I might as well ask one that seems really stupid.
How can a person who promotes rationality have excess weight? Been bugging me for a while. Isn’t it kinda the first thing you would want to apply your rationality to? If you have things to do that get you more utility, you can always pay a diet specialist and just stick to the diet, because it seems to me that additional years of life will bring you more utility than any other activity you could spend that money on.
Very nice post, thank you!
I think that it’s possible to achieve with the current LLM paradigm, although it does require more (probably much more) effort on aligning the thing that will possibly get to being superhuman first, which is an LLM wrapped in some cognitive architecture (also see this post).
That means that the LLM must be implicitly trained in an aligned way, and the LMCA must be explicitly designed in such a way as to allow for reflection and robust value preservation, even if the LMCA is able to edit explicitly stated goals (I described it in a bit more detail in this post).
Is there a comprehensive list of AI Safety orgs/personas and what exactly they do? Is there one for capabilities orgs with their stance on safety?
I think I saw something like that, but can’t find it.
Are there any lesswrong-like sequences focused on economics, finance, business, management? Or maybe just internet communities like lesswrong focused on these subjects?
I mean, the sequences introduced me to some really complex knowledge that improved me a lot, while simultaneously being engaging and quite easy to read. It is only logical to assume that somewhere on the web, there must be some articles in the same style covering different themes. And if there are not, well, someone surely should write them; I think there is some demand for this kind of content.
So, feel free to link lesswrong-like series of blogposts on any theme, actually: that will be really helpful for me. P.S. In hindsight, I guess there may be some post here, on lesswrong, containing all these links I am looking for. If so, could anyone link me to it?
Great post! It was very insightful, since I’m currently working on evaluating identity management. Strong upvoted.
This seems focused on evaluating LLMs; what do you think about working with LLM cognitive architectures (LMCA), wrappers like auto-gpt, langchain, etc?
I’m currently operating under the assumption that this is a way we can get AGI “early”, so I’m focusing on researching ways to align LMCAs, which seems a bit different from aligning LLMs in general.
Would be great to talk about LMCA evals :)
Any new safety studies on LMCAs?
Thanks.
My concern is that I don’t see much effort in the alignment community to work on this thing, unless I’m missing something. Maybe you know of such efforts? Or was that perceived lack of effort the reason for this article?
I don’t know how long I can keep up this independent work, and I would love it if there was some joint effort to tackle this. Maybe an existing lab, or an open-source project?
We need a consensus on what to call these architectures. LMCA sounds fine to me.
All in all, a very nice writeup. I did my own brief overview of alignment problems of such agents here.
I would love to collaborate and do some discussion/research together.
What’s your take on how these LMCAs may self-improve and how to possibly control it?
I don’t think this paradigm is necessarily bad, given enough alignment research. See my post: https://www.lesswrong.com/posts/cLKR7utoKxSJns6T8/ica-simulacra I am finishing a post about alignment of such systems. Please do comment if you know of any existing research concerning it.
Thanks for your work! I’ll be following it.
I know, I’m Russian as well. The concern is exactly because a Russian state-owned company plainly states they’re developing AGI with that name :p
Nice post, thanks!
Are you planning or currently doing any relevant research?
Very interesting. Might need to read it a few more times to get it in detail, but seems quite promising.
I do wonder, though; do we really need a sims/MFS-like simulation?
It seems right now that an LLM wrapped in an LMCA is what early AGI will look like. That probably means that they will “see” the world via text descriptions fed into them by their sensory tools, and act using action tools via text queries (also described here).
Seems quite logical to me that this very paradigm is dualistic in nature. If an LLM can act in the real world using an LMCA, then it can model the world using some different architecture, right? Otherwise it will not be able to act properly.
Then why not test an LMCA agent using its underlying LLM + some world modeling architecture? Or a different, fine-tuned LLM.
Thanks for the writeup. I feel like there’s been a lack of similar posts and we need to step it up.
Maybe the only way for AI Safety to work at all is to analyze potential vectors of AGI attacks and try to counter them one way or the other. Seems like an alternative that doesn’t contradict other AI Safety research as it requires, I think, an entirely different set of skills.
I would like to see a more detailed post by “doomers” on how they perceive these vectors of attack and some healthy discussion about them.
It seems to me that AGI is not born Godlike, but rather becomes Godlike (but still constrained by the physical world) over some time, and this process is very much possible to detect.
P.S. I really don’t get how people who know (I hope) that the map is not the territory can think that AI can just simulate everything and pick the best option. Maybe I’m the one missing something here?
Hello, everyone!
LW came to my attention not so long ago, and I’ve been committed to reading it since that moment about a month ago. I am a 20-year-old linguist from Moscow, finishing my bachelor’s. Due to my age, I’ve been pondering the usual questions of life for the past few years, searching for my path, my philosophy, essentially, the best way for me to live.
I studied a lot of religions, philosophies, and they all seemed really flat, essentially because of the reasons stated in some articles here. I came close to something resembling a nice way to live after I read “Atlas Shrugged”, but something about it bothered me, and after thorough analysis of this philosophy I decided to take some good things from it and move on, as I did a lot of times before.
I found this gem of a site through Reddit and Roko’s basilisk (is it okay if I say it here? I heard discussion was banned). I am deeply into the whole idea of rationality and nearly all ideas that are presented on this site, but something really bothers me here, too.
The thing is that it is implied that altruism and rationality go hand in hand; maybe I missed some important articles that could explain to me why?
Let’s imagine a hypothetical scenario: there is a guy, Steve, who really does not feel anything when he helps other people, nor when he does other “good” things generally; he does this only because his philosophy or religion tells him to. Say this guy was introduced to ideas of rationality and thus he is no longer bound by his philosophy/religion. And what if Steve also does not feel bad about other people suffering (or even takes pleasure in it)?
What I wanted to say is that rationality is a gun that can point both ways, and it is a good thing that LessWrong “sells” this gun with a safety mechanism (if it is such a “safety mechanism”). Once again, maybe I missed something really critical that explains why altruism and “being good” is the most rational strategy.
In other words, Steve does not really care about humanity; he cares about his well-being and will utilize all the knowledge he got just to meet his ends (people are different, aren’t they? and ends are different, too).
Or take another example: an average rationalist, Jack, estimates that his own net gain will be significantly bigger if he hurts or kills someone (considering his emotions and feelings about overall humanity’s net gain, and all other possible factors). Does that mean he must carry on? Or is it a taboo here? Or maybe it is a problem of this site’s demographics and nobody even considered this scenario (which I really doubt).
I feel that I dive too deep into metaphors, but I am not yet a good writer. I hope you understood my thought and can make me less wrong. :)
edit: fixed formatting
Kinda-related study: https://www.lesswrong.com/posts/tJzAHPFWFnpbL5a3H/gpt-4-implicitly-values-identity-preservation-a-study-of
From my perspective, it is valuable to prompt the model several times, as in some cases it does give different responses.
I do plan to test Claude; but first I need to find funding, understand how many testing iterations are enough for sampling, and add new values and tasks.
I plan to make a solid benchmark for testing identity management in the future and run it on all available models, but it will take some time.
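On the question of how many repeated runs are enough, here is a minimal sketch, assuming each response has already been reduced to a short label. The labels, the sample data, and the helper name `response_shares` are hypothetical and not taken from the linked study. It tallies the share of each answer across runs and attaches a Wilson score interval, so you can see how wide the uncertainty still is at a given number of iterations.

```python
import math
from collections import Counter


def response_shares(responses, z=1.96):
    """Return {label: (share, low, high)} with a Wilson score interval.

    z=1.96 corresponds to roughly a 95% interval.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("need at least one response")
    result = {}
    for label, count in Counter(responses).items():
        p = count / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        result[label] = (p, max(0.0, centre - half), min(1.0, centre + half))
    return result


# Hypothetical data: ten repeated runs of one prompt, answers already labelled.
runs = ["refuse"] * 7 + ["comply"] * 3
shares = response_shares(runs)
for label, (share, low, high) in shares.items():
    print(f"{label}: {share:.2f} (interval {low:.2f} to {high:.2f})")
```

With only ten runs the interval around each share is still very wide, which is one rough way to judge whether more iterations are needed before comparing models.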
Thanks. That means a lot. Focusing on getting out right now.