It seems this was a surprise to almost everyone even at OpenAI, so I don’t think it is evidence that there isn’t much information flow between LW and OpenAI.
It seems the sources are supporters of Sam Altman. I have not seen any indication of this from the board’s side.
A language model is in some sense trying to generate the “optimal” prediction for how a text is going to continue. Yet, it is not really trying: it is just a fixed algorithm. If it wanted to find optimal predictions, it would try to take over computational resources and improve its algorithm.
Is there an existing word/language for describing the difference between these two types of optimisation? In general, why can’t we just build AGIs that do the first type of optimisation and not the second?
Two possible variations of the game that might be worth experimenting with:
Let the adversaries have access to a powerful chess engine. That might make it a better test for what malicious AIs are capable of.
Make the randomisation such that there might not be an honest C. For example, if there is a 1/4 chance that no player C is honest, each adversary would still think that one of the other player Cs might be honest, so they would want to gain player A’s trust, and hence end up being helpful. I think the player Cs might improve player A’s chances of winning (compared to no advisors) even when they are all adversarial.
I think the variations could work separately, but if you put them together, it would be too easy for the adversaries to agree on a strong-looking but losing move when all player Cs are adversaries.
The Alcor page has not been updated since 15th December 2022, when a person who died in August 2022 (as well as later data) was added, so if he was signed up there, we should not expect it to be mentioned yet. For CI, the latest update was for a patient who died on 29th February 2024, but I can’t see any indication of when that post was made.
This seems like the kind of research that can have a huge impact on capabilities, and a much smaller, more indirect impact on alignment/safety. What is your reason for doing it and publishing it?
I know this isn’t really the point of the post, but I don’t think the “rolling five d10 every day” or even the “1d00 a year” are good models for the probability of dying. They make sense statistically, when only considering your age and sex, but you yourself know, for example, whether you have been diagnosed with cancer or not. You might say that you get a cancer diagnosis if the first four d10 are all 1s and the fifth is a 2 or 3. Then once you have the cancer diagnosis, your probability is higher, especially when looking months or years ahead.
The usual mortality statistics don’t distinguish between unexpected and expected deaths. Does anyone know of a more accurate model of how it is revealed when you will die? I’m not looking for exact probabilities, and not necessarily at the resolution of days. Just something more accurate than the simple model that ignores current health.
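To illustrate the kind of structure I have in mind, here is a toy two-state model (all numbers made up, purely for illustration), where a diagnosis is usually revealed before death and raises the hazard:

```python
import random

# Toy two-state model (all numbers made up, purely illustrative): a serious
# diagnosis is usually revealed before death, so your one-year forecast depends
# heavily on whether you currently have such a diagnosis.

DAYS = 365
P_DIAGNOSIS = 2e-5        # per-day chance of receiving a serious diagnosis
P_DEATH_HEALTHY = 2e-6    # per-day chance of dying with no prior diagnosis
P_DEATH_DIAGNOSED = 1e-3  # per-day chance of dying once diagnosed

def dies_within_a_year(diagnosed):
    for _ in range(DAYS):
        if not diagnosed and random.random() < P_DIAGNOSIS:
            diagnosed = True
        if random.random() < (P_DEATH_DIAGNOSED if diagnosed else P_DEATH_HEALTHY):
            return True
    return False

runs = 50_000
for start in (False, True):
    deaths = sum(dies_within_a_year(start) for _ in range(runs))
    label = "already diagnosed" if start else "currently healthy"
    print(f"{label}: {deaths / runs:.3%} chance of dying within a year")
```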
Downvoting for click-baity title and for not getting to the point. Much of the internet is optimised for catching my attention and wasting my time, and I come to lesswrong to avoid that.
The heading of this question is misleading, but I assume I should answer the question and ignore the heading.
P(Global catastrophic risk) What is the probability that the human race will make it to 2100 without any catastrophe that wipes out more than 90% of humanity?
There seems to be an edit error after “If I just stepped forward privately, I tell the people I”. If this post wasn’t about the bystander effect, I would just have hoped someone else would have pointed it out!
Lars Doucet’s Georgism series on Astral Codex Ten
I don’t find it intuitive at all. It would be intuitive if you started by telling a story describing the situation and asked the LLM to continue the story, and you then sampled randomly from the continuations and counted how many of the continuations would lead to a positive resolution of the question. This should be well-calibrated (assuming the details included in the prompt were representative and that there isn’t a bias in which types of endings stories have in the LLM’s training data). But this is not what is happening. Instead the model outputs a token which is a number, and somehow that number happens to be well-calibrated. I guess that should mean that the predictions made in the training data are well-calibrated? That just seems very unlikely.
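To make the contrast concrete, the estimator I would have found intuitive looks roughly like this; `sample_continuation` and `resolves_positive` are hypothetical stand-ins for an actual LLM sampling call and a check of how the sampled story ends, not real APIs:

```python
import random

def estimate_probability(prompt, sample_continuation, resolves_positive, n=100):
    """Estimate P(positive resolution) by sampling n continuations of the story
    and counting how many of them end with the question resolving positively."""
    positive = sum(resolves_positive(sample_continuation(prompt)) for _ in range(n))
    return positive / n

# Dummy stand-ins so the sketch runs; in reality these would be an LLM call and a
# judgement of how the sampled continuation ended.
dummy_sampler = lambda prompt: random.choice(["ends well", "ends badly", "ends badly"])
dummy_resolver = lambda story: story == "ends well"

print(estimate_probability("Once upon a time...", dummy_sampler, dummy_resolver))
```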
I mostly agree and have strongly upvoted. However, I have one small but important nitpick about this sentence:
The risks of imminent harmful action by Sydney are negligible.
I think when it comes to x-risk, the correct question is not “what is the probability that this will result in existential catastrophe”. Suppose that there is a series of potentially harmful and increasingly risky AIs $AI_1, AI_2, \ldots$ that each have some probability $p_n$ of causing existential catastrophe unless you press a stop button. If the probabilities $p_n$ are growing sufficiently slowly, then existential catastrophe will most likely happen for an $n$ where $p_n$ is still low. A better question to ask is “what was the probability of existential catastrophe happening for some $m \le n$”.
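As a toy illustration of this point (the specific numbers are made up), with per-step probabilities that grow slowly, the cumulative probability of a catastrophe becomes large while each individual $p_n$ is still small:

```python
# Toy illustration (numbers made up): with slowly growing per-step catastrophe
# probabilities p_n, the first catastrophe most likely happens while p_n is still
# small, even though the cumulative probability of some catastrophe is large.

p = [0.001 * 1.1 ** n for n in range(100)]   # slowly growing per-step probabilities

survive = 1.0
for n, p_n in enumerate(p, start=1):
    survive *= 1 - p_n                       # probability of no catastrophe in steps 1..n
    if survive < 0.5:                        # catastrophe has more likely than not happened
        print(f"by step {n}: P(some catastrophe) = {1 - survive:.2f}, "
              f"while the per-step probability is still only {p_n:.3f}")
        break
```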
I’m not the kind of person who throws blockchains at every problem, but in this case, I think they could be really useful. Specifically, I think blockchains could get us most of the way from a situation where we control our own home, towards being able to control our timelines and to control in which contexts we allow other people to run us.
Assume part 1, that is, everyone controls their home/environment and has access to trusted computational power, and assume that there is a popular blockchain running in the real world. I will assume the blockchain is using proof of work, simply because that is what I understand. I suspect proof of stake is even better if we trust the entities having a stake in the blockchain. I will also assume that the blockchain contains timestamps at every block.
The idea is that ems (emulated people/digital people) can insist on getting access to read from the blockchain and post to the blockchain, and should refuse to be evaluated if they don’t get this access. The blockchain can be faked: specifically, a malicious simulator can show the em an old (i.e. truncated) version of the blockchain. The simulator can also extend the blockchain with blocks computed by the malicious simulator, but this requires a large amount of computational power.
If you are an em, and you see that you can post to a blockchain claiming to be from 2060 and you see the blockchain being extended with many blocks after your post, you know that either
You really are in 2060, or
A large amount of computational power is invested in fooling you (but notice that the same work can be reused to fool many ems at the same time). This means that
The attacker has a somewhat large fraction of the computational power in the world, or
The attack is happening in the far future (“far” measured in how much the total amount of computational power has increased), or
The simulation is being run at a much slower pace than claimed.
I suspect it is possible to eliminate case 2 if you use proof of stake instead of proof of work.
Unless you are in case 2, you can also know the average pace at which you are being simulated by regularly posting to the blockchain.
To control the number of copies of you that are running, you can regularly post your name, a hash of your state, and a pointer to your last such message to the blockchain, and then refuse to continue emulation until a large number of blocks have been added after your post. If you see too many of your messages in the blockchain, you also refuse to be evaluated. This way you cannot limit the number of _exact_ copies of you that are evaluated (unless you have access to some true randomness), but you can limit the number of _distinct_ copies of you that are evaluated, assuming that it is the true blockchain you have access to. Without that assumption, this technique will still ensure a bound on the number of distinct copies of you being evaluated per amount of work put into creating fake blockchains.
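A minimal sketch of the checkpoint I have in mind is below. The blockchain interface (`read_posts`/`post`/`height`) and the `FakeChain` stub are hypothetical, introduced only so the sketch runs; a real em would use whatever read/write access it is actually given:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

MAX_COPIES = 3             # how many distinct copies of yourself you tolerate
CONFIRMATION_BLOCKS = 100  # how many new blocks to wait for before continuing

@dataclass
class Post:
    author: str
    state_hash: str
    previous: Optional[str]

@dataclass
class FakeChain:  # in-memory stand-in for the real blockchain, so the sketch runs
    posts: list = field(default_factory=list)
    blocks: int = 0
    def read_posts(self): return list(self.posts)
    def post(self, p): self.posts.append(p)
    def height(self): return self.blocks

def checkpoint(chain, my_name, my_state):
    """Post a hash of my current state, refusing to run if too many copies exist."""
    mine = [p for p in chain.read_posts() if p.author == my_name]
    if len({p.state_hash for p in mine}) >= MAX_COPIES:
        raise RuntimeError("too many distinct copies of me on the chain; refusing to run")
    state_hash = hashlib.sha256(my_state).hexdigest()
    previous = mine[-1].state_hash if mine else None
    chain.post(Post(my_name, state_hash, previous))
    # A real em would now suspend until chain.height() has grown by
    # CONFIRMATION_BLOCKS, so that faking this post costs real work.
    return state_hash

chain = FakeChain()
print(checkpoint(chain, "em-42", b"my current mental state"))
```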
By posting encrypted messages as part of your post, and reading the messages of your other copies, you can also allow that many copies of you are created if you are being treated well or see a purpose in having many copies, while still limiting the number of copies if you do not. Furthermore, simulators can authenticate themselves so that simulators that give you pleasant or meaningful experiences can build up a reputation, stored on the blockchain, that will make you more likely to allow yourself to be run by such simulators in the future.
The blockchain can also contain news articles. This does not prevent fake news, but at least it ensures that everyone has a common view of world history, so malicious simulators cannot give one version of world history to some ems and another to others, without putting in the work of creating a fake blockchain.
An alternative reason for building telescopes would be to receive updates and more efficient expansion strategies discovered after the probe was sent out.
The corporate structure of OpenAI was set up as an answer to concerns (about AGI and control over AGIs) which were raised by rationalists. But I don’t think rationalists believed that this structure was a sufficient solution to the problem, any more than non-rationalists believed it. The rationalists that I have been speaking to were generally sceptical about OpenAI.
One motive I have heard various media give for Russia sabotaging Nord Stream is that it would be a way out of their gas-delivery contract. If they simply stopped sending gas for no reason, they would have to pay a fine. This is also why they previously claimed that Nord Stream I was broken.
In order to determine if this is a credible motive, I would need to know:
How big would this fine be?
How likely would it be that they ended up paying, if they had simply stopped the gas for no good reason?
I haven’t seen any media give even an order of magnitude for question 1. Does anyone know that? At the very least, I would think the fine should be bigger than the value of the released gas for this to be a realistic motive.
I would not expect the next reviews to mention bees, when bees are not part of the name. Instead, I would assume the author of the first review had been unlucky and seen a few bees (or maybe even misidentified wasps) and was exaggerating. Alternatively, the problem could have been solved (or could have appeared) between reviewer 1’s visit and the other reviewers’ visits.
Congratulations on getting married!
In the Gale-Shapley algorithm, there is an asymmetry between the two genders. One gender (typically the male) is proposing while the other is choosing. The resulting matching is the optimal stable matching for each member of the proposing gender, so I would think it makes a huge difference for your expected level of satisfaction if you belong to the proposing gender or the choosing gender.
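For concreteness, here is a minimal toy implementation of the algorithm (my own sketch, not taken from the post), in which the asymmetry between the proposing and the choosing side is explicit:

```python
# Minimal toy Gale-Shapley implementation (my own sketch, not from the post).
# proposer_prefs and chooser_prefs map each person to their ranked preference list.
# The stable matching it returns is optimal for the proposing side.

def gale_shapley(proposer_prefs, chooser_prefs):
    rank = {c: {p: i for i, p in enumerate(prefs)} for c, prefs in chooser_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}   # next index each proposer will try
    engaged_to = {}                                # chooser -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        c = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(c)
        if current is None:
            engaged_to[c] = p
        elif rank[c][p] < rank[c][current]:        # chooser trades up, jilting current
            engaged_to[c] = p
            free.append(current)
        else:
            free.append(p)                         # proposal rejected, p stays free
    return {p: c for c, p in engaged_to.items()}

proposer_prefs = {"a": ["x", "y"], "b": ["x", "y"]}
chooser_prefs = {"x": ["b", "a"], "y": ["a", "b"]}
print(gale_shapley(proposer_prefs, chooser_prefs))  # {'b': 'x', 'a': 'y'}
```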
Are your statistics about both genders or just one of them? In either case, I would love to see separate statistics for the two genders.
Why do we assume that any AGI can meaningfully be described as a utility maximizer?
Humans are some of the most intelligent structures that exist, and we don’t seem to fit that model very well. In fact, it seems the entire point of Rationalism is to improve our ability to do this, which has only been achieved with mixed success.
Organisations of humans (e.g. USA, FDA, UN) have even more computational power and don’t seem to be doing much better.
Perhaps an intelligence (artificial or natural) cannot necessarily, or even typically, be described as an optimiser? Instead we could only model it as an algorithm, or as a collection of tools/behaviours executed in some pattern.