I think you’re right. I think inline comments are a good personal workflow when engaging with a post (when there’s a post I want to properly understand, I copy it to a google doc and comment it), but not for communicating the engagement
My understanding is that the first thing is what you get with UDASSA, and the second thing is what you get if you think the Solomonoff prior is useful for predicting your universe for some other reason (i.e. not because you think the likelihood of finding yourself in some situation covaries with the Solomonoff prior’s weight on that situation)
It is at least the case that OpenAI has sponsored H1Bs before: https://www.myvisajobs.com/Visa-Sponsor/Openai/1304955.htm
He treated it like a game, even though he was given the ability to destroy a non-trivial communal resource.
I want to resist this a bit. I actually got more value out of the “blown up” frontpage than I would have from a normal frontpage that day. A bit of “cool to see what the LW devs prepared for this case”, a bit of “cool, something changed!”, and some excitement about learning something.
That’s a very broad definition of ‘long haul’ on duration and on severity, and I’m guessing this is a large underestimate of the number of actual cases in the United Kingdom
If the definition is broad, shouldn’t it be an overestimate?
My attempt to summarize the alignment concern here. Does this seem a reasonable gloss?
It seems plausible that competitive models will not be transparent or introspectable. If you can’t see how the model is making decisions, you can’t tell how it will generalize, and so you don’t get very good safety guarantees. Or to put it another way, if you can’t interact with the way the model is thinking, then you can’t give a rich enough reward signal to guide it to the region of model space that you want
Most importantly, the success of the scheme relies on the correctness of the prior over helper models (or else the helper could just be another copy of GPT-Klingon)
I’m not sure I understand this. My understanding of the worry: what if there’s some equilibrium where the model gives wrong explanations of meanings, but I can’t tell, since I’m using just the model to give me meanings.
But it seems to me that having the human in the loop doing prediction helps a lot, even with the same prior. Like, if the meanings are wrong, then the user will just not predict the correct word. But maybe this is not enough corrective data?
As someone who didn’t receive the codes, but read the email on Honoring Petrov Day, I also got the sense it wasn’t too serious. The thing that would most give me pause is “a resource thousands of people view every day”.
I’m not sure I can say exactly what seems lighthearted about the email to me. Perhaps I just assumed it would be, and so read it that way. If I were to pick a few concrete things, I would say the phrase “with our honor intact” seems like a joke, and also “the opportunity to not destroy LessWrong” seems like a silly phrase (kind of similar to “a little bit the worst thing ever”). On reflection, yep, you are getting an opportunity you don’t normally get. But it’s also weird to have an opportunity to perform a negative action.
Also, it still seems to me that there’s no reason anyone who was taking it seriously would blow up LW (apart from maybe Jeff Kauffman). So if there’s a real risk of someone blowing it up, it must not be that serious.
Suppose phishing attacks do have an 80%+ success rate. I have been the target of phishing attempts tens of times, and never fallen for one (and I imagine this is not unusual on LW). This suggests the average LWer should not expect to fall victim to a phishing attempt with 80% probability, even if that is the global average
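To make the base-rate point concrete, here’s a rough Bayesian sketch. All the numbers are my own assumptions (not from the comment): I treat each phishing attempt as an independent Bernoulli trial, use a Beta(4, 1) prior whose mean matches the claimed 80% global rate, and take “tens of times” to mean roughly 30 attempts.

```python
# Hedged sketch: update a Beta prior on personal phishing susceptibility
# after observing zero successful attacks. Numbers are illustrative assumptions.
prior_a, prior_b = 4, 1          # Beta(4, 1): prior mean 0.8, the claimed global rate
attempts, falls = 30, 0          # ~"tens of times" targeted, never fallen for one

# Standard beta-binomial conjugate update
post_a = prior_a + falls
post_b = prior_b + (attempts - falls)
posterior_mean = post_a / (post_a + post_b)
print(round(posterior_mean, 3))  # 0.114
```

Even with a prior centered on 80%, a track record of never falling for it drags the personal estimate down to roughly 11%.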
This summary was helpful for me, thanks! I was sad because I could tell there was something I wanted to know from the post but couldn’t quite get it
In a Stag Hunt, the hunters can punish defection and reward cooperation
This seems wrong. I think the argument goes “the essential difference between a one-off Prisoner’s Dilemma and an IPD is that players can punish and reward each other in-band (by future behavior). In the real world, they can also reward and punish out-of-band (in other games). Both these forces help create another equilibrium where people cooperate and punishment makes defecting a bad idea (though an equilibrium of constant defection still exists). This payoff matrix is like that of a Stag Hunt rather than a one-off Prisoner’s Dilemma”
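A toy illustration of that argument, with payoff numbers entirely of my own choosing: adding an out-of-band punishment cost P for defecting against a cooperator turns a standard one-shot Prisoner’s Dilemma matrix into one with two pure equilibria, which is the Stag Hunt structure.

```python
# Hedged sketch: rows/cols are (Cooperate, Defect); entries are (row, col) payoffs.
# The PD numbers and the punishment size P are illustrative assumptions.
P = 4  # assumed out-of-band punishment for defecting on a cooperator
pd = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

# Subtract P from whichever player defected against a cooperator
punished = {k: (r - (P if k == ('D', 'C') else 0),
                c - (P if k == ('C', 'D') else 0))
            for k, (r, c) in pd.items()}

# With P=4, defecting on a cooperator now pays 1 < 3, so (C,C) becomes a
# second pure equilibrium alongside (D,D) -- the Stag Hunt structure.
print(punished[('D', 'C')])  # (1, 0)
```

With these numbers, neither player gains by unilaterally deviating from (C,C) (1 < 3), and the all-defect equilibrium still exists (0 < 1), exactly as the comment describes.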
I think it’s probably true that the Litany of Gendlin is irrecoverably false, but I feel drawn to apologia anyway.
I think the central point of the litany is its equivocation between “you can stand what is true (because, whether you know it or not, you already are standing what is true)” and “you can stand to know what is true”.
When someone thinks, “I can’t have wasted my time on this startup. If I have I’ll just die”, they must really mean “If I find out I have I’ll just die”. Otherwise presumably they can conclude from their continued aliveness that they didn’t waste their life, and move on. The litany is an invitation to allow yourself less fallout from acknowledging or finding out the truth, because your finding it out isn’t what causes it to be true, however bad the world might be because it’s true. A local frame might be “whatever additional terrible ways it feels like the world must be now if X is true are bucket errors”.
So when you say “Owning up to what’s true makes things way worse if you don’t have the psychological immune system to handle the negative news/deal with the trauma or whatever”, you’re not responding to the litany as I see it. The litany says (emphasis added) “Owning up to it doesn’t make it worse”. Owning up to what’s true doesn’t make the true thing worse. It might make things worse, but it doesn’t make the true thing worse (though I’m sure there are, in fact, tricky counterexamples here)
(The Litany of Gendlin is important to me, so I wanted to defend it!)
I wonder why it seems like it suggests dispassion to you, but to me it suggests grace in the presence of pain. The grace for me I think comes from the outward- and upward-reaching (to me) “to be interacted with” and “to be lived”, and grace with acknowledgement of pain comes from “they are already enduring it”
Wondering if these weekly talks should be listed in the Community Events section?
I like this claim about the nature of communities. One way people can Really Try in a community is by taking stands against the way the community does things while remaining part of the community. I can’t think of any good solutions for encouraging this without assuming closed membership (or other cures worse than the disease)
I vote for GWP or your favorite timelines model
But, that is indeed a clunkier statement
I once heard someone say, “I’m curious about X, but only want to ask you about it if you want to talk about it” and thought that seemed very skillful.
A recurring blocker I have with increasing capture is the tension between having enough notebooks to be convenient and having one’s notes not be hopelessly scattered.
To expand: I strongly prefer paper notes to digital. I want to have notetaking stuff with me everywhere. I want maintaining access to notetaking to be convenient (robust to changing from gym clothes to jeans, etc.). I want to be able to trust that I will look at / can find a given note in the future.
I’ve never quite cracked getting all of these lined up. The closest I’ve come is having a pocket notebook everywhere I can think of, but laundry or removing notebooks to read them at a desk tends to break this system.
I expect “x imported out of y”, or “x imported, y remain” to be more motivating than the current “y remain” on the import progress bar.
My peevish reaction to this is (sarcastically) “Finally, there’s a 1400 word popularization of that 850 word blog post”. I dunno why I found that so annoying. It does seem valuable to have multiple explanations available for good concepts
On the literature that addresses your question: here is a classic LW post on this sort of question.
You point out that length of a description in English and length in code don’t necessarily correlate. I think for English sentences that are actually constraining expectations, there is a fairly good correlation between length in English and length in code.
There’s the issue that the high-level concepts we use in English can be short, but if we were writing a program from scratch using those concepts, the expansion of the concepts would be large. When I appeal to the concept of a buffer overflow when explaining how someone knows secrets from my email, the invocatory phrase “buffer overflow” is short, but the expansion out in terms of computers and transistors and semiconductors and solid state physics is rather long.
But I’m in the game of trying to explain all of my observations. I get to have a dictionary of concepts that I pay the cost for, and then reuse the words and phrases in the dictionary in all my explanations nice and cheaply. Similarly, the computer program that I use to explain the world can have definitions or a library of code, as long as I pay the cost for those definitions once.
So, I’m already paying the cost of the expansion of “buffer overflow” in my attempt to come up with simple explanations for the world. When new data has to be explained, I can happily consider explanations using concepts I’ve already paid for as rather simple.
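A toy model of that amortization, with costs that are purely illustrative assumptions: pay the (large) definition cost of a concept once, then invoke it cheaply by name in every subsequent explanation.

```python
# Hedged sketch: once a concept's definition is paid for in a shared
# dictionary, reusing its short name in explanations is cheap.
# All bit counts here are made-up illustrative numbers.
definition_cost = 5000            # assumed one-off cost of defining "buffer overflow"
per_use_cost = 20                 # assumed cost of invoking the short name
uses = ["email_leak", "exploit_report", "patch_notes"]  # hypothetical explanations

# Amortized cost per explanation: definition paid once, name invoked each time
amortized = (definition_cost + per_use_cost * len(uses)) / len(uses)
from_scratch = definition_cost    # expanding the full concept in every explanation

print(amortized < from_scratch)   # True
```

The more explanations reuse the concept, the closer the per-explanation cost falls toward the cheap invocation cost, which is why already-paid-for concepts can count as simple.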