In France, EffiSciences is looking for new members and interns.
Sounds like those people are victims of a salt-in-pasta-water fallacy.
It’s also very old-fashioned. Can’t say I’ve ever heard anyone below 60 say “pétard” unironically.
You might also assign different values to red-choosers and blue-choosers (one commenter I saw said they wouldn’t want to live in a world populated only by people who picked red) but I’m going to ignore that complication for now.
Roko has also mentioned they think people choose blue because they're bozos, and I think it's fair to assume from their comments that they care less about bozos than about smart people.

I'm very interested in seeing the calculations where you assign different utilities to people depending on their choice (and possibly also depending on yours, e.g. if you only value people who choose like you).
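To make concrete the kind of calculation I'm asking for, here's a minimal sketch. The setup is entirely my own assumption, not anything Roko committed to: n voters pick blue independently with probability p, blue-choosers survive only if blue gets a strict majority, red-choosers always survive, and survivors are weighted w_red or w_blue depending on their choice.

```python
from math import comb

# Hypothetical sketch: expected weighted number of survivors when you
# value red-choosers and blue-choosers differently.
def expected_weighted_survivors(n, p, w_red, w_blue):
    total = 0.0
    for k in range(n + 1):                   # k = number of blue-choosers
        prob = comb(n, k) * p**k * (1 - p)**(n - k)
        blues_alive = k if k > n / 2 else 0  # no majority -> blue-choosers die
        total += prob * (w_red * (n - k) + w_blue * blues_alive)
    return total

# e.g. valuing blue-choosers twice as much as red-choosers:
print(expected_weighted_survivors(n=99, p=0.4, w_red=1.0, w_blue=2.0))
```

Varying w_red and w_blue (or conditioning them on your own choice) is exactly the complication I'd like to see worked out.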
I mean, as an author you can hack through them like butter; it is highly unlikely that, out of all the characters you can write, the only interesting ones will all generate interesting content iff (they predict) you'll give them value (and this prediction is accurate).

I strongly suspect the actual reason you'll spend half of your post's value on buying ads for Olivia (if in fact you do that, which is doubtful as well) is not the proposition that she would only accept this trade if you did that, because:

- she can predict your actions (as in, you wrote her as being unable to act in any manner other than by predicting your actions), and
- she predicts you'll do that (in exchange for providing you with a fun story).

I suspect that your actual reason is more like staying true to your promise, making a point, having fun, and other such things.
I can imagine acausally trading with humans gone beyond the cosmological horizon, because our shared heritage would make a lot of the critical flaws in the post go away.
This is mostly wishful thinking. You're throwing away your advantages as an author to bargain with fictionally smart entities. You can totally void the deal with Olivia, and she can do nothing about it, because she's as dumb as you write her to be.

Likewise, the author writing about space-warring aliens writing about giant-cube-having humans could just consider aliens that have space wars without any consideration for humans at all; you haven't given enough detail for the aliens' model of the humans to be precise enough that their behavior must depend on it.

Basically, you're creating characters that are to you as you are to a superintelligence, but dumbing yourself down to their level for the fun of trading acausally. This is not acausal trading, because you are not actually on their level, and their decisions do not in fact depend on reliably predicting that you'll cooperate in the trade.

This is just fiction.
For instance, a money-maximising trade-bot AI could be perfectly safe if it notices that money, in its initial setting, is just a proxy for humans being able to satisfy their preferences.
There is a critical step missing here, which is when the trade-bot makes a "choice" between maximising money and satisfying preferences. At this point, I see two possibilities:
1. Modelling the trade-bot as an agent does not break down: the trade-bot has an objective which it tries to optimize, plausibly maximising money (since that is what it was trained for) and probably not satisfying human preferences (unless it had some reason to have that as an objective). A comforting possibility is that it is corrigibly aligned, i.e. that it optimizes for a pointer to its best understanding of its developers' goals. Do you think this is likely? If so, why?
2. An agentic description of the trade-bot is inadequate: the trade-bot is an adaptation-executer, it follows shards of value, or something like that. In that case, what kind of computation is it performing that steers it towards satisfying human preferences?
So I’d be focusing on “do the goals stay safe as the AI gains situational awareness?”, rather than “are the goals safe before the AI gains situational awareness?”
This is a false dichotomy. If we can assume that, when the AI gains situational awareness, it will optimize for its developers' goals, then alignment is already solved. And making the goals safe before situational awareness is not that hard: at that point, the AI is not capable enough to pose an X-risk.

(A discussion of X-risk brought about by situationally unaware AIs could be interesting, such as a Christiano-style failure story, but Soares's model is not about that, since it assumes autonomous ASI.)
A new paper, building on the compendium of problems with RLHF, tries to make an exhaustive list of the issues identified so far: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
That sounds nice, but is it true? Like, that's not an argument, and it's not obvious! I'm flabbergasted it received so many upvotes.

Can someone please explain?
Well, I wasn't interested because AIs were better than humans at Go; I was interested because it was evidence of a trend of AIs becoming better than humans at some tasks, with implications for future AI capabilities.

So from this perspective, I guess this article would be a reminder that adversarial training is an unsolved problem for safety, as Gwern said above. Still doesn't feel like that's all there is to it, though.
To clarify: what I am confused about is the high AF score, which probably means there is something exciting I'm not getting from this paper.

Or maybe it's not a missing insight, and I just don't understand why this kind of work is interesting/important?
I'm confused. Does this show anything besides adversarial attacks working against AlphaZero-like AIs? Is it a surprising result? Is that kind of work important for reproducibility purposes regardless of surprisingness?
You're making many unwarranted assumptions about an AI's specific mind, along with a lot of confusion about semantics, which seems to indicate you should just read the Sequences. It'll be very hard to point out where you are going wrong because there's just too much confusion.

As an example, here's a detailed analysis of the first few paragraphs:
Intelligence will always seek more data in order to better model the future and make better decisions.
Unclear if you mean intelligence in general, and if so, what you mean by the word. Since the post is about AI, let's talk about that. AI does not necessarily seek more data: typically, modern AIs are trained on a dataset provided by their developers, and do not actively seek more data.

There is also not necessarily an "in order to": not all AIs are agentic. Not all AIs model the future at all. Very few agentic AIs have making better decisions as a terminal goal, though it is expected that advanced AI will by default do that as an instrumental behavior, and possibly as an instrumental or terminal goal, because of the convergent instrumental goals thesis.
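To illustrate the point about fixed training data, here is a deliberately minimal toy example of my own (not anything from your post) of a standard supervised training loop; note that nothing in it reaches out for more data:

```python
import random

# Toy supervised learner: fit y = 2x + 1 by stochastic gradient descent.
# The dataset is fixed and chosen by the developer; the "AI" never
# seeks data beyond what it was handed.
dataset = [(x, 2 * x + 1) for x in range(10)]
w, b, lr = 0.0, 0.0, 0.01

for step in range(10_000):
    x, y = random.choice(dataset)  # samples only from the given dataset
    err = (w * x + b) - y          # prediction error on that sample
    w -= lr * err * x              # gradient step on squared error
    b -= lr * err

print(f"learned w={w:.2f}, b={b:.2f}")  # approximately 2 and 1
```

An agentic system that actively queried the world for new data would need extra machinery that this (very typical) training setup simply does not have.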
Conscious intelligence needs an identity to interact with other identities, identity needs ego to know who and what it is. Ego would often rather be wrong than admit to being wrong.
You use connotation-laden, ill-defined words to go from consciousness to identity to ego to refusing to admit to being wrong. Definitions have no causal impact on the world (to first order; a discussion of self-fulfilling terminology is beyond this comment). That's not to say you have to use well-defined words, but you should be able to taboo your words properly before using technical terms with controversial, or exotic-but-specifically-defined-in-this-community, meanings. And really, I would recommend you just read more on the subject of consciousness; "theory of mind" is a keyword that will get you far on LW.
Non conscious intelligence can build a model of consciousness from all the data it has been trained on because it all originated from conscious humans. AI could model a billion consciousness’s a million years into the future, it will know more about it than we ever will. But AI will not chose to become conscious.
Non-sequitur, wrong reasons to have approximately correct beliefs… Just, please read more about AI before having an opinion.
Later, you show examples of false dichotomy, privileging the hypothesis, reference class error… it's not better in quality than the paragraphs I commented on in detail.
So in conclusion, where are you going wrong? Pretty much everywhere. I don't think your comment is salvageable; I'd recommend just discarding that train of thought altogether and keeping your mind open while you digest more literature.
We cannot select all companies currently looking to hire AI researchers. There are just too many of them, and most will just want to integrate ChatGPT into their software or something.

We're interested in companies doing the kind of capabilities research that might lead to AI that poses an existential risk.
Do you suggest that we should consider all companies that employ a certain number of AI experts?
Are you an AI?
Thanks! I wish the math hadn't broken down; it makes the post harder to read...
List of known discrepancies:
DeepMind categorizes Cohen's scenario as specification gaming (instead of crystallized proxies).
They consider Carlsmith to be about outer misalignment?
Value lock-in, persuasive AI and Clippy are on my TODO list to be added shortly. Please do tell if you have something else in mind you’d like to see in my cheat sheet!