$66, with some help from a friend.
Simon Fischer
Filled out the survey. The cryonics question could use an option “I would be signed up if it were possible where I live.”
My guess would be: if the integrity check gets corrupted, the mutated nanomachine could possibly “work”, but if the decryption routine gets corrupted, the instructions can’t be decrypted and the nanomachine wouldn’t work.
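To make that asymmetry concrete, here is a toy Python sketch (entirely my own illustration of the scenario; the SHA-256 check and the XOR “cipher” are stand-in placeholders, not anything from the original setup):

```python
import hashlib

def run_nanomachine(blob: bytes, key: int, expected_digest: str):
    """Toy boot sequence: integrity check, then decryption, then execution."""
    # Integrity check: if a mutation corrupts or skips THIS comparison,
    # a mutated instruction blob can still reach execution and "work".
    if hashlib.sha256(blob).hexdigest() != expected_digest:
        return None  # tampered instructions rejected
    # Decryption: if a mutation corrupts THIS routine instead, the output
    # is essentially random bytes, so the machine almost surely does
    # nothing coherent -- it fails closed rather than open.
    instructions = bytes(b ^ key for b in blob)
    return instructions  # in the scenario, these would now be executed
```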
The threat model here seems basically wrong and focused on sins of commission when sins of omission are, if anything, an even larger space of threats and which apply to ‘safe’ solutions reported by the Oracle.
Sure, I mostly agree with the distinction you’re making here between “sins of commission” and “sins of omission”. Contrary to you, though, I believe that getting rid of the threat of “sins of commission” is extremely useful: if the output of the Oracle is optimized only to fulfill your satisfaction goal and not for anything else, you’ve basically removed the superintelligent adversary from your threat model.
‘Devising a plan to take over the world’ for a misaligned Oracle is not difficult, it is easy, because the initial steps like ‘unboxing the Oracle’ are the default convergent outcome of almost all ordinary non-dangerous use which in no way mentions ‘taking over the world’ as the goal. (“Tool AIs want to be Agent AIs.”) To be safe, an Oracle has to have a goal of not taking over the world.
I agree that for many ambitious goals, ‘unboxing the Oracle’ is an instrumental goal. It’s overwhelmingly important that we use such an Oracle setup only for goals that are achievable without such instrumental goals being pursued by a large fraction of the satisficing outputs. (I mentioned this in footnote 2, but probably should have highlighted it more.) I think this is a common limitation of all soft-optimization approaches.
There are many, many orders of magnitude more ways to be insecure than to be secure, and insecure is the wide target to hit.
This is talking about a different threat model than mine. You’re talking here about security in a more ordinary sense, as in “secure from being hacked by humans” or “secure from accidentally leaking dangerous information”. I feel like this type of security concern should be much easier to address, since you’re defending yourself not against superintelligences but against humans and accidents.
The example you gave about the Oracle producing a complicated plan that leaks the source of the Oracle is an example of this: it’s trivially defended against by not connecting the device the Oracle is running on to the internet, and by not using the same device to execute the great “cure all cancer” plan. (I don’t believe that either you or I would have made that mistake!)
Regarding the drop in unemployment in Germany, I’ve heard it claimed that it is mainly due to changes in how the unemployment statistics are compiled, e.g. people who are in temporary 1€/h jobs and still receiving benefits are counted as employed. If this point is still important, I can look for more details and translate.
EDIT: Some details are here:
It is possible to earn income from a job and receive Arbeitslosengeld II benefits at the same time. [...] There are criticisms that this defies competition and leads to a downward spiral in wages and the loss of full-time jobs. [...]
The Hartz IV reforms continue to attract criticism in Germany, despite a considerable reduction in short and long term unemployment. This reduction has led to some claims of success for the Hartz reforms. Others say the actual unemployment figures are not comparable because many people work part-time or are not included in the statistics for other reasons, such as the number of children that live in Hartz IV households, which has risen to record numbers.
Don’t you believe in flying saucers, they ask me? Don’t you believe in telepathy? — in ancient astronauts? — in the Bermuda triangle? — in life after death? No, I reply. No, no, no, no, and again no. One person recently, goaded into desperation by the litany of unrelieved negation, burst out “Don’t you believe in anything?” “Yes”, I said. “I believe in evidence. I believe in observation, measurement, and reasoning, confirmed by independent observers. I’ll believe anything, no matter how wild and ridiculous, if there is evidence for it. The wilder and more ridiculous something is, however, the firmer and more solid the evidence will have to be.”
Isaac Asimov
We probably would’ve been less enthusiastic about hooking up LLMs to the Internet too, but here we collectively are. We do face a superintelligent adversary: all of the incentives and factions of humanity. An Oracle which is simply neutral is still default dangerous.
I completely agree with that. My proposal does not address the global coordination problem that we face, but it might be a useful tool if we collectively get our act together or if the first party with access to superintelligence has enough slack to proceed extra carefully. Even more modestly, I was hoping this might contribute to our theoretical understanding of why soft-optimization can be useful.
What’s Worm? Oh, wait…
“This is what it looks like in practice, by default, when someone tries to outsource some cognitive labor which they could not themselves perform.”
This proves way too much.

I agree; I think this even proves P=NP.
Maybe a more reasonable statement would be: you cannot outsource cognitive labor if you don’t know how to verify the solution. But I think even that’s not completely true, given that interactive proofs are a thing. (Plug: I wrote a post exploring the idea of applying interactive proofs to AI safety.)
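To spell out the P=NP quip with the standard SAT-style illustration (my own toy example, not from the thread): checking a proposed solution is a single cheap evaluation, while finding one may require exponential search. If you could never outsource labor you can’t perform yourself, that asymmetry couldn’t exist.

```python
from itertools import product

# Toy 3-variable formula: (x0 OR x1) AND (NOT x1 OR x2)
def formula(x):
    return (x[0] or x[1]) and ((not x[1]) or x[2])

def verify(assignment):
    # Verification: one evaluation, cheap no matter who produced the
    # assignment -- this is what makes outsourcing the search possible.
    return formula(assignment)

def generate():
    # Generation: worst case, brute force over all 2^n assignments.
    for assignment in product([False, True], repeat=3):
        if formula(assignment):
            return assignment
    return None
```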
Thank you for writing about this! A minor point: I don’t think aerosolizing monkeypox suspensions using a nebulizer can be counted as gain-of-function research, not even “at least kind of”. (Or do I lack reading comprehension and have misunderstood something?)
PokerStrategy.com provides you with a $50 starting capital if you pass their quiz.
I’ve done this; I now have about $500 without ever cashing in, but it took quite some time.
the feedback is NOT immediate.
This cannot be said too often: the variance is quite high, and it takes a lot of self-control to keep playing consistently even when you’ve been losing money for a long time.
First thought: The oracle is going to choose to systematically answer or not answer the queries we give it. This represents a causal channel of one bit per query it can use to influence the outside world[1]. Can you conquer the world in one awkwardly delivered kilobyte or less? Maybe.
Agreed. I think it’s potentially a good bit worse than one kilobyte if we let ourselves be tricked into asking many questions, asking different questions, or lowering the difficulty of the safety constraint too much.
As mentioned in footnote 10, this requires a kind of perfect coordination between the different Oracles we manage to build: the Oracles have to be able to correctly predict, for all kinds of questions, whether humanity (with the help of other Oracles) has been able to figure out the answers (or the rough sizes of various sets). Otherwise, when caught in deception, we can (as you noted) scrap or retrain the Oracles.
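For a rough sense of scale of that one-bit channel, a back-of-the-envelope sketch (the numbers are mine, purely illustrative):

```python
# Each answered-or-refused query leaks at most one bit to the outside world.
queries = 8192            # hypothetical number of queries we pose
bits = queries * 1        # one bit (answer vs. refusal) per query
print(bits / 8, "bytes")  # 1024.0 -> the "one kilobyte" from the comment above
```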
Second thought: What kind of optimisation problems can we specify well enough for a formal proof checker to tell whether they’ve been satisficed? Are they the kind of problems where solving them can save the world?
You’re correct that this approach is only useful insofar as validation is easier than generation. For this technique, though, the validation doesn’t have to be done by a formal proof checker; any program that you can run suffices. It might even be a very slow program (e.g. a big LLM) if you have an effective way to communicate your goal set to the Oracle (e.g. using a natural-language prompt, as we already do with current AIs).
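For concreteness, here is how I picture that validation step, as a schematic Python sketch (the names and structure are my own, not code from the post):

```python
import random

def soft_optimize(generate_candidate, validate, n_samples=1000):
    """Schematic satisficer: keep every output that passes the validation
    program -- which may be any runnable check, even a slow one -- and
    return a random satisficing output rather than the highest-scoring one."""
    satisficing = [c for c in (generate_candidate() for _ in range(n_samples))
                   if validate(c)]
    return random.choice(satisficing) if satisficing else None

# e.g.: soft_optimize(lambda: random.random(), lambda x: x > 0.9)
```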
The soft optimization post took 24 person-weeks (assuming 4 people half-time for 12 weeks) plus some of Jeremy’s time.
Team member here. I think this is a significant overestimate; I’d guess 12-15 person-weeks. If it’s relevant, I can ask all former team members how much time they spent; it was around 10 hours per week for me. Given that we were beginners and spent a lot of time learning about the topic, I feel we did fine and learnt a lot.
Working on this part-time was difficult for me, and the fact that people at the camp are not working on these things full-time should be considered when judging its research output.
Missile attacks are not piracy, though, right?
It’s good that you learned a few things from these incidents, but I’m sceptical of the (different) claim implied by the headline that Peter Zeihan was meaningfully correct here. If you interpret “directions” imprecisely enough, it’s not hard to be sometimes directionally correct.
I don’t believe these “practical” problems (“can’t try long enough”) generalize enough to support your much more general initial statement. This doesn’t feel like a true rejection to me, but maybe I’m misunderstanding your point.
Microsoft is the sort of corporate bureaucracy where dynamic orgs/founders/researchers go to die. My median expectation is that whatever former OpenAI group ends up there will be far less productive than they were at OpenAI.
I’m a bit sceptical of that. You gave some reasonable arguments, but all of this should be known to Sam Altman, and he still chose to accept Microsoft’s offer instead of founding his own org (I’m assuming he would easily have been able to raise a lot of money). So, given that “how productive are the former OpenAI folks at Microsoft?” is the crux of the argument, it seems that recent events are good news iff Sam Altman made a big mistake with that decision.
I think I mostly agree with this, but from my perspective it hints that you’re framing the problem slightly wrong. Roughly: the problem with the outsourcing approaches is our inability to specify/verify solutions to the alignment problem, not that specifying is not in general easier than solving the problem yourself.
(Because of the difficulty of specifying the alignment problem, I restricted myself to speculating about pivotal acts in the post linked above.)
[...] P3 masks, worn properly, with appropriate eye protection while maintaining basic hand hygiene are efficient in preventing SARS-CoV-2 infection regardless of setting.
If this is true, then this is a great idea, and it’s somewhat surprising that these masks are not already in widespread use.
I suspect the plan is a bit less practical than stated, as I expect there to be problems with compliance, in particular because the masks are mildly unpleasant to wear for prolonged periods.
From 3.3
To do we would want to put the threatened agent
to do so(?) we would
From 3.4
an agent whose single goal is to stymie the plans and goals of single given agent
of a single given agent
From 4.1
then all self-improving or constructed superintelligence must fall prey to it, even if it were actively seeking to avoid it.
every, or change the rest of the sentence (superintelligences, they were)
From 4.5
There are goals G, such that an entity an entity with goal G
a superintelligence will goal G can exist.
There is no such thing as general intelligence, i.e. an algorithm that is “capable of behaving intelligently over many domains” if not specifically designed for these domain(s). As a corollary, AI will not go FOOM. (80% confident)
EDIT: Quote from here