The ladder of abstraction can be continued indefinitely. This is not a problem.
Richard_Kennaway
I recall another over-ambitious project from way back, when USENET was the place to discuss everything and spam had not been invented. This was a time when email typically took anything from minutes to days to reach someone, depending on how well-connected both sender and recipient were, and USENET posts likewise. And there was nothing faster.
Unfortunately, I’ve never found any archive of USENET that goes back far enough to find this again, so the following is from memories that must be over 30 years old. I would be interested to know if anyone else remembers.
Someone had the idea of what we nowadays call an MMORPG: a massively multi-player online game of galactic exploration and conquest. Of course, this would not happen in real time. Nothing did in those days except your interaction with the machine directly in front of you, and not always then. No, this would be roughly turn-based. You would submit your moves, and a few days later you would see the outcome. Imagine Eve Online played by postal mail.
The global state of the whole game would be maintained using the same protocols that USENET itself used to distribute articles. (USENET had no centralised store, like a bulletin board would: every message was copied to every participating machine, each machine communicating with a few neighbours on the net to share new messages.) To save on memory, all of the assets would be pseudo-randomly generated, so the whole geography of a planet could be represented by its random seed, as would the distribution of stars across the galaxy. A huge discussion sprang up across several USENET newsgroups, and ran for months, with people posting idea after idea of what to put in the game and how things would work. It was awesome.
To the best of my knowledge, no code for this game was ever written.
I took from this a warning against continually raising one’s sights and never firing through them, dreaming ever greater dreams but never rising from the opium couch, all exploration and no exploitation.
It is possible that some of the people involved might have been inspired by it to make their own games that did see the light of day, although I’ve never heard anyone say so.
Roko’s Basilisk, as told in the oldest extant collection of jokes, the Philogelos, c. 4th century AD.
A pedant having fallen into a pit called out continually to summon help. When no one answered, he said to himself, “I am a fool if I do not give all a beating when I get out in order that in the future they shall answer me and furnish me with a ladder.”
H/T Dynomight.
Richard_Kennaway’s Shortform
Previously on LessWrong. (See the PNSE paper and the comment thread.)
It’s clearly mantra meditation that does it for Martin. Look at how red in the face he suddenly goes at 47:38 − 47:55.
Plato’s dialogues are all examples of Socrates’ prompt engineering. See also the old political wisdom that if you can set the agenda, it doesn’t matter how people vote.
“I don’t like spinach, and I’m glad I don’t, because if I liked it i’d eat it, and I hate it.”
Every new experience may change your preferences. Seeking out new experiences will predictably have this effect. Openness to experience requires openness to seeing your preferences changing, even if you do not know in what direction. This has been going on since birth, as no-one is born with the preferences they will have as adults. There is a process by which they develop, which does not, or at least need not, stop on reaching adulthood.
What does “felt sense” mean in contrast to simply “sense”? It has never been clear to me what work the word “felt” is doing there, other than making it a term of art that tells me that the writer is referencing Gendlin.
Well…I’d definitely read the book.
As long as this distribution drops off faster than 1/x as the offer increases, then arbitrarily large offers are overwhelmed by vast implausibility and their EV becomes arbitrarily small.
This has the problem that you have no assurance that the distribution does drop off sufficiently fast. It would be convenient if it did, but the world is not structured for anyone’s convenience.
Low value from technology and high risk might imply that a return to pre-industrial agrarian life maximizes human value.
A return to pre-industrial agrarian life is impossible. What would prevent people from just reinventing the machines? Perhaps someone could gather together a movement — it could be called the Great Leap Backward — to destroy all the libraries, all the computers, all the machines, and all the educated people who might try to rebuild anything. Pre-industrial population was around 800 million (taking the figures for 1750 AD), so they would also have to decide which 10% of the population get to survive into this brave new world.
Yes, that could be a factor.
A few ideas occur to me, all based on the man having a larger range of employers to choose from.
-
More men than women migrate to find work.
-
Men are willing to commute further.
-
Men are more willing than women to apply for jobs that they do not meet the requested qualifications for.
The briefest Googling suggests that there are studies to support all of these, but I’m not going to search any further for the needle.
These three all look like manifestations of a single characteristic, “initiative”, or “get-up-and-go”, but neither of those terms is good in a search engine.
-
For a superintelligent AI, alignment might as well be binary, just as for practical purposes you either have a critical mass of U235 or you don’t, notwithstanding the narrow transition region. But can you expand the terms “weakly aligned” and “weakly superintelligent”? Even after searching alignmentforum.org and lesswrong.org for these, their intended meanings are not clear to me. One post says:
weak alignment means: do all of the things any competent AI researcher would obviously do when designing a safe AI.
For instance, you should ask the AI how it would respond in various hypothetical situations, and make sure it gives the “ethically correct” answer as judged by human beings.
My shoulder Eliezer is rolling his eyes at this.
ETA: And here I find:
To summarize, weak alignment, which is what this post is mostly about, would say that “everything will be all right in the end.” Strong alignment, which refers to the transient, would say that “everything will be all right in the end, and the journey there will be all right, too.”
I find it implausible that it is easier to build a machine that might destroy the world but is guaranteed to eventually rebuild it, than to build one that never destroys the world. It is easier to not make an omelette than it is to unmake one.
Proving things of one particular program is not useful in this context. What is needed is to prove properties of all the AIs that may come out of whatever one’s research program is, rejecting those that fail and only accepting those whose safety is assured. This is not usefully different from the premise of Rice’s theorem.
Hoping that the AI happens to be aligned is not even an alignment strategy.
This is an argument I don’t think I’ve seen made, or at least not made as strongly as it should be. So I will present it as starkly as possible. It is certainly a basic one.
The question I am asking is, is the conclusion below correct, that alignment is fundamentally impossible for any AI built by current methods? And by contraposition, that alignment is only achievable, if at all, for an AI built by deliberate construction? GOFAI never got very far, but that only shows that they never got the right ideas.
The argument:
A trained ML is an uninterpreted pile of numbers representing the program that it has been trained to be. By Rice’s theorem, no nontrivial fact can be proved about an arbitrary program. Therefore no attempt at alignment based on training it, then proving it safe, can work.
Provably correct software is not and cannot be created by writing code without much concern for correctness, then trying to make it correct (despite that being pretty much how most non-life-critical software is built). A fortiori, a pile of numbers generated by a training process cannot be tweaked at all, cannot be understood, cannot be proved to satisfy anything.
Provably correct software can only be developed by building from the outset with correctness in mind.
This is also true if “correctness” is replaced by “security“.
No concern for correctness enters into the process of training any sort of ML model. There is generally a criterion for judging the output. That is how it is trained. But that only measures performance — how often it is right on the test data — not correctness — whether it is necessarily right for all possible data. For it is said, testing can only prove the presence of faults, never their absence.
Therefore no AI built by current methods can be aligned.
“FOOM” in the sense common here has got into Wiktionary: A sudden increase in artificial intelligence such that an AI system becomes extremely powerful.
The thread is not just about these exceptions, but about everyone else changing everything they say about “male” and “female” to deal with the supposed puzzle of the existing notions being fuzzy around the edges. All notions are fuzzy around the edges. Making more boxes does not solve the problem, so far as there is one, of there being boxes.
Expecting there to be an article below the title.