I think this is the idea: people can form habits, and habits have friction—you’ll keep doing them even if they’re painful (they oppose momentary preferences, as opposed to reflective preferences). But you probably won’t adopt a new habit if it’s painful. Therefore, to successfully build a habit that changes your actions from momentary to reflective, you should first adopt a habit, then make it painful—don’t combine the two steps.
[I think that in general, comments with less than −10 karma should have at least one comment explaining why.]
I downvoted you because I believe your claim to be false: you are either lying or deluded. This is because my prior for “somebody solves two Hard Problems in a matter of days” is much lower than my prior for “somebody claims to have solved two Hard Problems in a matter of days”. In particular, people in altered mental states are often much more susceptible to false feelings of enlightenment. (See “Mysticism and Pattern-Matching” by Scott Alexander for related ideas and support for this argument.)
[I originally wrote this as a description of my beliefs about why people in general downvoted the parent, before remembering the LW comment rules (describe your ideas, not your idea of the consensus). This is one data point for the rules being effective.]
Note: the Filter might not exist. In a nutshell, the Fermi paradox can be dissolved by realizing that “average number of civilizations per galaxy” is less important than “probability of a galaxy containing a single civilization”. (Note: depending on your anthropics, this may or may not actually dissolve the paradox.)
I don’t think you’ve understood this article if that’s your response. The point of the article is that real human beings can in fact set up GoFundMe pages, and many more things, but economic models rarely include all these options. It is only through restricting the options to be considered that we can model unboundedly rational agents. Stuart Armstrong is trying to raise awareness of the limitations of restricted-option models.
(I’m not saying that to be rude, but because I think people can benefit from considering the possibility “I have completely misunderstood what this person is trying to tell me”, and responses like yours are mostly only made by people who have completely misunderstood. There’s always the possibility that I’m the one who has completely misunderstood; if so, I’d be glad to have the intended meaning of your post explained, since I’m not seeing it.)
Strategies that I’ve found helpful:
If something doesn’t seem tractable, try flipping between algebraic and geometric interpretations of the problem. Problems 1 and 3 fell to this approach.
Specific solutions (or suggestive handwaving):
Problem 1:
I thought of it like parity: going left to right, each unichromatic edge leaves the color unchanged, while each bichromatic edge flips it. So to get an overall change, we need an odd number of bichromatic edges: 1, or 3 (1 plus 2 that cancel), or 5 (1 plus 4 that cancel), and so on.
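A throwaway sketch to sanity-check the parity claim (the coloring alphabet and path lengths are arbitrary):

```python
from itertools import product

# Claim: walking left to right along a 2-colored path, the endpoint colors differ
# if and only if the number of bichromatic edges (endpoints of different colors) is odd.
def parity_claim_holds(colors):
    bichromatic = sum(colors[i] != colors[i + 1] for i in range(len(colors) - 1))
    return (colors[0] != colors[-1]) == (bichromatic % 2 == 1)

assert all(parity_claim_holds(c) for n in range(2, 9) for c in product("RB", repeat=n))
print("parity claim holds for every 2-coloring of short paths")
```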
Problem 2:
I couldn’t understand this one at first. After checking Wikipedia, I think that refers to the space that each point in the sequence lies within. An example of a finite sequence in would then be
Problem 3:
Consider the unit square. We need to draw one continuous line, going from left to right, that covers the entire vertical extent of the square. No matter how you do that, you need to cross the diagonal line from the bottom left to the top right.
Why? Because you need to touch both the top and the bottom edges. You can’t do that at the bottom-left or top-right corners, since then you’d already be touching the diagonal line. But then the point where you touch the top edge lies strictly inside the top triangle, and the curve cannot reach the bottom edge without entering the bottom triangle. Switching between the two triangles means passing through their intersection, which is exactly the diagonal line.
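A sketch of one way to make the crossing step precise, assuming the curve has a continuous parametrization (this is just an intermediate value theorem phrasing of the same idea):

```latex
Let $(x(t), y(t))$, $t \in [0,1]$, be a continuous parametrization of the curve, and set
$g(t) = y(t) - x(t)$, the (scaled) signed distance from the diagonal $y = x$.
At a parameter $t_0$ where the curve touches the bottom edge, $y(t_0) = 0$ and $x(t_0) > 0$,
so $g(t_0) < 0$. At a parameter $t_1$ where it touches the top edge, $y(t_1) = 1$ and
$x(t_1) < 1$, so $g(t_1) > 0$. By the intermediate value theorem, $g(t^*) = 0$ for some
$t^*$ between $t_0$ and $t_1$, i.e.\ the curve meets the diagonal.
```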
As for why this isn’t true if the set is open rather than closed: if we exclude the edges from our consideration of “does it intersect the diagonal”, then it’s fairly trivial to construct a curve that stays inside one triangle and has a codomain of (0,1). should work.
Why haven’t we seen a learning algorithm teaching itself chess intelligence starting with nothing but the rules?
We have now, depending on how you interpret “teach itself”. It was given nothing but the rules and the ability to play against itself.
Note: “it’s justified by being true” doesn’t help distinguish cults. You seem to be aware of this, though, because you still count that component of cultishness as true.
I believe Yudkowsky discussed this some in his writings against Modesty, in Inadequate Equilibria. [recommendation due to relevance]
lots of content gets absurdly, unwarrantedly high/low karma totals because people’s opinions are correlated
How is this absurd and unwarranted? The numerical value doesn’t have any inherent meaning (as it would if, ferex, any poster who received at least 1000 karma were given moderating powers). Is it that it produces an unusual vote distribution? If this is a problem, a possible solution would be to adjust displayed totals downward by a factor that increases with the total, but I disagree that any solution is needed.
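For concreteness, a minimal sketch of what such a downward adjustment could look like (the functional form and the constant are arbitrary assumptions, not a real proposal):

```python
# Illustrative only: divide the raw total by a factor that grows with the total,
# so inflated totals are compressed more. The constant 0.02 is an arbitrary assumption.
def displayed_karma(raw_total: int) -> float:
    damping_factor = 1 + 0.02 * abs(raw_total)
    return raw_total / damping_factor

for raw in (5, 50, 500):
    print(raw, round(displayed_karma(raw), 1))   # 5 -> 4.5, 50 -> 25.0, 500 -> 45.5
```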
“People’s opinions are correlated” is just another way of stating “many people agree about X”, and that sounds like something that the voting system should be able to record (perhaps the only thing—isn’t the voting system just a way of recording public opinion on a post?).
Votes determine comment order, which is a counterexample to my claim that “the numbers don’t really matter,” but ordering is scale-invariant, so my point holds. But perhaps low-value posts are inflated more than high-value posts? If LW voters tend to upvote low-value posts, then I think there’s a larger issue that can’t be solved by people occasionally throwing strategic downvotes at inflated posts (how do we know that strategic downvotes will correlate with low value when ordinary upvotes don’t?).
lots of content gets no upvotes or downvotes at all because people are trying to correct for the possibility that things will be over-voted (even though they can see with their own eyes whether a vote total is currently too high or too low).
All the votes, past and future, are combined into one total. If someone wants the final total to be X, and their expected final total is already X, then why should they vote? That’s just rational strategic voting. (Some details could reverse the optimal decision, though: if everybody reasons this way, nobody votes. But wouldn’t people realize that they all think that way? Alternatively, snowballing could mean that your vote tips the final total to zero or to something larger than the value you wanted, but only if people look at the current total when deciding how to vote, which wouldn’t happen in this hypothetical scenario where everyone votes from the heart.)
I agree that people have preferences about vote sums, and that each individual’s preferences can be better realized through strategic voting. The crux is that I think that strategic voting worsens our ability to estimate the collective opinion of LW on a post. Strategic voters act to absorb votes past a certain point that they choose, which means that the added presence of non-strategic voters may not have any effect on the total vote. (Also note the order dependence, which probably indicates something wrong.) Perhaps we could accept these costs in exchange for some gain, but I don’t see what collective gain there is from strategic voting.
the paperclipper, which from first principles decides that it must produce infinitely many paperclips
I don’t think this is an accurate description of the paperclip scenario, unless “first principles” means “hardcoded goals”.
Future GPT-3 will be protected from hyper-rational failures because of the noisy nature of its answers, so it can’t stick forever to some wrong policy.
Ignoring how GPT isn’t agentic and handwaving an agentic analogue, I don’t think this is sound. Wrong policies make up almost all of policyspace; the problem is not that the AI might enter a special state of wrongness, it’s that the AI might leave the special state of correctness. And to the extent that GPT is hindered by its randomness, it’s unable to carry out long-term plans at all—it’s safe only because it’s weak.
I’ve changed my mind: I think strategic voting might send more information than karma-blind voting. It counteracts visibility spirals, as you describe. There might also be another effect: consider a community of identical, deterministic, karma-blind voters. Disregarding visibility spirals, everything gets sorted into five categories (one for each way a single user can vote). In reality, deterministic and karma-blind voters aren’t identical, so karma still varies smoothly. But is “people are different” the only way information should be sent? Doesn’t a group of identical voters hold more than a quint (about 2.3 bits) of useful information? This is why I have a vague suspicion that strategic voters can send more information: they send more information in a degenerate case.
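A toy way to see the degenerate case in numbers (the voter count and vote weights are invented, and the “agreed target” behavior is an idealized model of strategic voting, not how LW voters actually behave):

```python
import math
from itertools import product

N = 8                               # assumed number of voters
VOTE_VALUES = (-3, -1, 0, 1, 3)     # five ways one user can vote (weights assumed)

# Identical, karma-blind voters all cast the same vote, so each post's total is one of
# five values: at most log2(5) bits of information about the post.
blind_totals = {v * N for v in VOTE_VALUES}
print(len(blind_totals), round(math.log2(len(blind_totals)), 2))          # 5 2.32

# Idealized strategic voters who vote until the running total hits an agreed target can
# land on any reachable total, a much larger set of outcomes and hence more bits per post.
strategic_totals = {sum(votes) for votes in product(VOTE_VALUES, repeat=N)}
print(len(strategic_totals), round(math.log2(len(strategic_totals)), 2))
```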
Typo thread:
“...Shakespear, by some reasonable...”
Auror Nobbs and Auror Colon
Good one. (That’s a Discworld reference, in case anybody hasn’t already read that great series. And Nobbs and Colon are rather incompetent watchmen.)
3) strategic voting. I’m far more likely to downvote a post or comment if it seems mediocre but has high karma than if it seems mediocre and has low karma. Same for upvoting—I don’t bother with things already upvoted by others.
I’ve noticed many people saying this, and I don’t see the value in voting that way. Your vote should carry information about your preferences about posts, not your preferences about the displayed summary of the collective preferences about posts. That’s the best way to capture the collective preferences, isn’t it? As an example of the flaws of the latter approach, changing the order in which voters see a post can change its final karma, even though averages don’t depend on order.
Edit: strategic voting in general seems similar to Defect: it gives the individual greater effectiveness at conveying their preferences, but makes the collective preference estimates less accurate. (It’s a negative-sum choice.)
Re also also: the Reverse Streetlight effect will probably come into play. It’ll optimize not just for early deception, but for any kind of deception we can’t detect.
I’m confused. I already addressed the possibility of modeling the external world. Did you think the paragraph below was about something else, or did it just not convince you? (If the latter, that’s entirely fine, but I think it’s good to note that you understand my argument without finding it persuasive. Conversational niceties like this help both participants understand each other.)
An AI might model a location that happens to be its environment, including its own self. But if this model is not connected in the right way to its consequentialism, it still won’t take over the world. It has to generate actions within its environment to do that, and language models simply don’t work that way.
Or to put it another way, it understands how the external world works, but not that it’s part of the external world. It doesn’t self-model in that way. It might even have a model of itself, but it won’t understand that the model is recursive. Its value function doesn’t assign a high value to words that its model says will result in its hardware being upgraded, because the model and the goals aren’t connected in that way.
T-shirt slogan: “It might understand the world, but it doesn’t understand that it understands the world.”
You might say “this sort of AI won’t be powerful enough to answer complicated technical questions correctly.” If so, that’s probably our crux. I have a reference class of Deep Blue and AIXI, both of which answer questions at a superhuman level without understanding self-modification, but the former doesn’t actually model the world and AIXI doesn’t belong in discussions of practical feasibility. So I’ll just point at the crux and hope you have something to say about it.
You might say, as Yudkowsky has before, “this design is too vague and you can attribute any property to it that you like; come back when you have a technical description”. If so, I’ll admit I’m just a novice speculating on things they don’t understand well. If you want a technical description then you probably don’t want to talk to me; someone at OpenAI would probably be much better at describing how language models work and what their limitations are, but honestly anyone who’s done AI work or research would be better at this than me. Or you can wait a decade and then I’ll be in the class of “people who’ve done AI work or research”.
As I see it, Rob is defending the use of [(possibly shared) intuition?] in an argument, since not everything can be feasibly and quickly proved rigorously to the satisfaction of everyone involved:
These are the kinds of claims where it’s certainly possible to reach a confident conclusion if (as it happens) the effect size is large, but where there will be plenty of finicky details and counter-examples and compressing the evidence into an easy-to-communicate form is a pretty large project. A skeptical interlocutor in those cases could reasonably doubt the claim until they see a lot of the same evidence (while acknowledging that other people may indeed have access to sufficient evidence to justify the conclusion).
(My summary is probably influenced by my memory of Wei Dai’s top-level comment, which has a similar view, so it’s possible that Rob wouldn’t use the word “intuition”, but I think that I have the gist of his argument.)
It appears that Yudkowsky simply wasn’t trying to convince a skeptic of memetic collapse in this post—Little Fuzzy provided more of an example than a proof. This is more about connecting the concepts “memetic collapse” and “local validity” and some other things. Not every post needs to prove the validity of each concept it connects with. And in fact, Yudkowsky supported his idea of memetic collapse in the linked Facebook post. Does he need to go over the same supporting arguments in each related post?
Yes, but ideally our prediction methods would allow us to predict events more accurately than flipping a coin does.
I’m not very new, but I’ve been mostly lurking, so I think I’ll introduce myself. (Note: 1k words.)
Basic information: high school student, fairly socially clueless. Probably similar to how most people here were as teenagers (if not now): smart, nerdy, a bit of a loner, etc. I mention this because it’s relevant in a few limited ways. (My age is relevant to the life plans I describe. The social cluelessness tells you to ignore any odd signals I send between the lines, because I didn’t intend to send them. Is there a conversational code, similar in kind to Crocker’s rules, that says “I will send as much important information explicitly as possible; please err on the side of ignoring implicit signals”?)
My first introduction to the rationalist community was through Scott Alexander. A few years ago, I was in an online discussion about gender, it turned to tolerance, and somebody linked I Can Tolerate Anything Except The Outgroup. I got hooked on SSC’s clarity and novel ideas, and eventually that led me to LW. I’ve read HPMOR, the Sequences, and so on. It’s taken a while for it to sink in, but it’s had a large effect on my views. Prior to this, I was a fairly typical young atheist nerd, so the changes aren’t very drastic, but I often find myself using the ideas I’ve gotten through the Sequences and the mindset of analytic truthseeking. The object-level belief changes I’ve had are the obvious ones: many worlds, cryonics, intelligence explosion, effective altruism, etc. I’m a humanist transhumanist reductionist materialist atheist, like almost everyone else here. That’s a cluster-membership description, not tribe-membership. Language doesn’t make it easy to Keep Your Identity Small.
I’ve always wanted to go into a career in STEM. I’ve loved mathematics from a young age, and I’ve done pretty well at it too. I started taking university calculus in middle school, and upper-division university mathematics in high school. (I’m not saying that to brag—that wouldn’t even be effective here anyway—but to show that I’m not just a one-in-ten “good at math” person who thinks very highly of themselves. I think I’m at the one-in-ten-thousand level, but I don’t have high confidence in that estimate.) A few months ago, I decided that the best way to achieve my goals was to work in the field of AI risk research. I think I can make progress in that field, and AI risk is probably the most important field in history, so it’s the best choice. (Humanity needs to solve AI risk soon. My 50% estimate for the Singularity is 2040-2060, and the default is we all die. But you’ve heard this before.) I aim to work at MIRI, or a similar organization if that doesn’t work out. It’s a rather high goal I’ve set for myself, but if I can’t have immodest ambitions here, where can I have them? I’ve been accepted to a (roughly) top 10 university for Math/CS, and I read somewhere (80k Hours?) that’s the rough talent level necessary to do good AI risk research, so I don’t think it’s an impossible goal.
My biggest failure point is my inability to carry out goals. That’s what my inner Murphy says would cause my failure to get into MIRI and do good work. Fixing that is probably the most important thing I’m currently trying to get out of LW; the Hammertime sequence looks promising. If anyone has good recommendations for people who can’t remember to focus, I’d love to hear them.
In fact, any recommendations would be greatly appreciated. Such as: what would you say to someone who’s going into college? What would you say to someone who wants to work in AI risk?
I’m currently working through the MIRI research guide, starting with Halmos’s Naive Set Theory. If anyone else is doing this and would like a study partner, we should study together.
I’ve read some things about AI risk, both through the popularizations available on LW/SSC and through a couple of papers. I’ve had a few ideas already. My Outside View is sane, and I know there’s a very low chance that I’ve seen something that everyone else missed. Should I write a post on LW about it anyway?
To give an example, here’s the idea on my mind right now: it’s probably not possible to encode all our values explicitly into an AI. The obvious solution is to build into it the ability to learn values. This means it’ll start in a state of “moral ignorance” and learn what it “should want to do” by looking at people. I’m not saying it’ll copy people’s actions; I’m saying its actions will have to be somehow entangled with what humans are and do. Information theory and so on. The crucial point: before it “opens its eyes”, this AI is not a classical consequentialist, right? Classical consequentialists have a ranking over worlds that doesn’t itself depend on which world they’re in. This AI’s terminal goals change depending on which possible world it’s in! I want to explore the implications: are there pitfalls? Does this help us solve problems? What is the best way to build this kind of agent? Should we even build it this way or not? I also want to formalize this kind of agent. It seems very similar to UDT in a sense, so perhaps it’s a simple extension of UDT. But there are probably complications, and it’s worth turning the fuzzy ideas into math.
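To make the distinction concrete, here’s a toy sketch (everything in it, including the inference rule, is invented for illustration; real value learning would be far subtler than counting observations):

```python
from typing import Callable, Iterable, List

def classical_consequentialist(actions: Iterable[str],
                               utility: Callable[[str], float]) -> str:
    # One fixed ranking over outcomes; the same ranking in every possible world.
    return max(actions, key=utility)

def value_learning_agent(actions: Iterable[str],
                         human_observations: List[str],
                         infer_values: Callable[[List[str]], Callable[[str], float]]) -> str:
    # The ranking is computed from observations of humans, so the agent's effective
    # terminal goals differ across possible worlds: different humans, different
    # observations, different learned utility function.
    learned_utility = infer_values(human_observations)
    return max(actions, key=learned_utility)

# Toy usage: "value learning" here is just counting, purely to make the
# world-dependence visible; it is not meant as a serious inference rule.
infer = lambda observations: (lambda action: observations.count(action))
print(value_learning_agent(["bake bread", "hoard paperclips"],
                           ["bake bread", "bake bread"], infer))
```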
Some questions I have: does this seem like a possibly fruitful direction to look? Has someone already done something like this? What advice do you have for someone trying to do what I’m doing? Is there a really good AI risk paper that I could look at and try to mimic in terms of “this is how you formalize things, these are the sorts of questions you need to answer, etc.”? Is there anyone who’d be interested in mentoring a young person who’s interested in the field? (Connotation clarification: communicating, giving advice, kind of a back-and-forth maybe? I don’t know what’s okay to ask for and what’s not, because I’m a young clueless person. But I’m really interested and motivated, and Asking For Help is important, so I’ll put this out there and hope people interpret it charitably.)