Raemon (Raymond Arnold)
LessWrong team member / moderator. I’ve been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical, and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own versions of it. More recently I’ve been interested in improving my own epistemic standards and helping others to do so as well.
(Though there might be actions a first-time player can take to help pin down the rules of the game that an experienced player would already know; I’m unclear on whether that counts for the purposes of this exercise.)
I think one thing I meant in the OP was more about “the player can choose to spend more time modeling the situation.” Is it worth spending an extra 15 minutes thinking about how the longterm game might play out, and what concerns you may run into that you aren’t currently modeling? I dunno! Depends on how much better you become at playing the game, by spending those 15 minutes.
This is maybe a nonstandard use of “value of information”, but I think it counts.
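As a toy version of the arithmetic (numbers purely illustrative): if the extra 15 minutes has probability $p$ of improving your play by $\Delta V$, the time is worth spending whenever $p \cdot \Delta V$ exceeds the cost $c$. For instance:

$$\text{VoI} \approx p \cdot \Delta V - c = 0.2 \times 5\,\text{h} - 0.25\,\text{h} = 0.75\,\text{h} > 0.$$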
Seems big if true, and fairly plausible. I’d be interested in chipping in to pay for someone to come up with a methodology for investigating this more, and then taking a run at it if the methodology seemed good.
(also it’s occurring to me it’d be cool to have a “Dollars!/Unit of Caring” react)
I’m not mesaoptimizer, but, fyi, my case is “I totally didn’t find IFS-type stuff very useful for years, and then one day I just suddenly needed it, or at least found myself shaped very differently such that it felt promising.” (see My “2.9 trauma limit”)
My general plan is to mix “work on your real goals” (which takes months to find out if you were on the right track) and “work on faster paced things that convey whether you’ve gained some kind of useful skill you didn’t have before”.
My goal right now is to find (toy, concrete) exercises that somehow reflect the real world complexity of making longterm plans, aiming to achieve unclear goals in a confusing world.
Things that seem important to include in the exercise:
“figuring out what the goal actually is”
“you have lots of background knowledge and ideas of where to look next, but the explosion of places you could possibly look is kinda overwhelming”
managing various resources along the way, but it’s not obvious what those resources are.
you get data from the world (but, not necessarily the most important data)
it’s not obvious how long to spend gathering information, or refining your plan
it’s not obvious whether your current strategy is anywhere close to the best one
The exercise should be short (ideally a couple hours, but maybe a day or, hypothetically, a week), but should somehow metaphorically reflect all those things.
Previously I asked about strategy/resource management games you could try to beat on your first try. One thing I bump into is that the initial turns often offer fairly constrained choices, and it only gets complex later (which is maybe fine, but, for my real world plans, the nigh-infinite possibilities seem like the immediate problem?)
why is it bad to lose/regain?
Lots of people have mentioned various flavors of roguelikes. One of my goals is to have games in different genres. I agree that roguelikes are often a good source of the qualities I’m looking for here, but part of the point is to try applying the same skills to radically different setups.
Another thing I’m interested in is “ease of setup”, where you can download the game, open it up, and immediately be in the experience instead of having to do a bunch of steps to get there.
Say more?
too acronymed for me :(
One-shot strategy games?
I was going off a vague sense from having talked to a few people who had scanned the literature more than I.
Right now I’m commissioning a lit review about “transfer learning”, “meta learning”, and things similar to that. My sense so far is that there aren’t a lot of super impressive results, but part of that looks like it’s because it’s hard to teach people relevant stuff in a “laboratory”-esque setting.
My Anthropic take, which is sort of replying to this thread between @aysja and @LawrenceC but felt enough of a new topic to just put here.
It seems overwhelmingly the case that Anthropic is trying to walk some kind of line between “seeming like a real, profitable AI company that is worth investing in” and “at the very least paying lip service to, and maybe actually taking really seriously, x-risk.”
(This all goes for OpenAI too. OpenAI seems much worse on these dimensions to me right now. Anthropic feels more like it has the potential to actually be a good/safe org in a way that OpenAI feels beyond hope atm, so I’m picking on Anthropic)
For me, the open, interesting questions are:
Does Dario-and-other-leadership have good models of x-risk, and mitigation methods thereof?
How is the AI Safety community supposed to engage with an org that is operating in epistemically murky territory?
Like, it seems like Anthropic is trying to market itself to investors and consumers as “our products are powerful (and safe)”, and trying to market itself to AI Safety folk as “we’re being responsible as we develop along the frontier.” These are naturally in tension.
I think it’s plausible (although I am suspicious) that Anthropic’s strategy is actually good. I.e. maybe you really do need to iterate on frontier AI to do meaningful safety work; maybe you do need to stay on the frontier because the world is accelerating whether Anthropic wants it to or not. Maybe pausing now is bad. Maybe this all means you need a lot of money, which means you need investors and consumers to believe your product is good.
But, like, for the AI safety community to be epistemically healthy, we need to have some way of engaging with this question.
I would like to live in a world where it’s straightforwardly good to always spell out true things loudly/clearly. I’m not sure I have the luxury of living in that world. I think I need to actually engage with the possibility that it’s necessary for Anthropic to murkily say one thing to investors and another thing to AI safety peeps. But, I do not think Anthropic has earned my benefit of the doubt here.
But, the way I wish the conversation was playing out was less like “did Anthropic say a particular misleading thing?” and more like “how should EA/x-risk/safety folk comport themselves, such that they don’t have to trust Anthropic? And how should Anthropic comport itself, such that it doesn’t have to be running on trust, when it absorbs talent and money from the EA landscape?”
I feel some kinda missing mood in these comments. It seems like you’re saying “Anthropic didn’t make explicit commitments here,” and that you’re not weighting it as particularly important whether they gave people different impressions, or benefited from that.
(AFAICT you haven’t explicitly stated “that’s not a big deal”, but, it’s the vibe I get from your comments. Is that something you’re intentionally implying, or do you think of yourself as mostly just trying to be clear on the factual claims, or something like that?)
I think something going on here is that the hypothetical “you actually have to pick one of these two” is pretty weird; normally you have the option to walk away. If I find myself in such a hypothetical, it seems more likely that “well, somehow I’m gonna have to make use of these coupons,” in a way that doesn’t seem true in normal situations.
Two interesting observations from this week, while interviewing people about their metacognitive practices.
@Garrett Baker said that he had practiced memorizing theorems for linear algebra a while back, and he thinks this had (as a side effect?) created a skill of “memorizing stuff quickly,” which then turned into some kind of “working memory management” tool. It sounded something like “he could quickly memorize things and chunk them, and then he could do that on-the-fly while reading math textbooks.”
@RobinGoins had the experience of initially not being able to hold all their possible plans/goals/other things in working memory, but then did a bunch of Gendlin Focusing on them, and afterwards had an easier time holding them all. It sounds like the Gendlin Focusing was playing a similar role to the “fast memorization” thing, of “finding a [nonverbal] focusing handle for a complex thing,” where the focusing handle was able to efficiently unpack into the full richness of the thing they were trying to think about.
Both of these are interesting because they hint at a skill of “rapid memorization ⇒ improved working memory”.
@gwern has previously written about Dual N Back not actually working that well at improving IQ. It seems like history is littered with corpses of people trying to improve IQ or g, so I’m not too optimistic here. My current assumption/guess is that the Dual N Back stuff trained a particular skill that turned out not to transfer to other domains.
But, like, even if “rapidly memorize math proofs” didn’t generalize to anything other than memorizing math proofs, it feels plausible to me that this could at least help with situations where that particular skill is useful, and might be worth it even without domain transfer.
And I could imagine that there’s something of a skill of “learning to rapidly chunk content in a given domain,” which doesn’t automatically translate to other domains, but which makes it easier to learn to chunk new types of domains – similar to how learning one language doesn’t let you speak all languages, but makes it easier to learn new ones.
Curated.
It’s been a while since I properly boggled at this topic. I remember reading Beyond the Reach of God and feeling like it conveyed something that had managed to never come across despite all my years of exploring atheism as a topic. Like, I already believed there wasn’t a God and that bad things happened, but somehow it made me do a double take and go like “no, really tho.”
But I’ve remained a little confused or at a loss for words about “what exactly was happening, when I read Beyond the Reach of God?”, and I feel like this post does a good job putting words to that and exploring it in detail.
I think this post is more long-winded and meandering than I’d like, but I think it also somewhat benefits from that, since it serves as a kind of guided meditation on the topic that isn’t necessarily right to “rush.”
I wasn’t quite sure from your phrasings:
Do you think replacing (or at least combining) LW Review with the Open Problems frame would be an improvement on that axis?
Also: does it seem useful to you to measure overall progress on [the cluster of good things that the rationality and/or alignment communities are pointed at]?
I’m not 100% sure I got your point.
I think (but am unsure) that what I care about is more like a metric for “is useful intellectual progress getting made” (whether or not LessWrong-the-website was causal in that progress).
The point here is not to evaluate the Lightcone team’s work, but for the community to have a better benchmark for its collective progress (which then hopefully, like, improves credit-assignment, which then hopefully improves our ability to collectively focus on useful stuff as the community scales).
This point does seem interesting though and maybe a different frame than I had previously been thinking in:
The marginal contribution of LW is more in making it more likely that better posts are read and in making various conversations happen (with a variety of other diffuse potential advantages).
Those numbers sound reasonable to me (i.e. I might give similar numbers, although I’d probably list different posts than you)
Another angle I’ve had here: in my preferred world, the “Best of LessWrong” page makes explicit that, in some sense, very few (possibly zero?) posts actually meet the bar we’d ideally aspire to. The Best of LessWrong page highlights the best stuff so far, but I think it’d be cool if there was a deliberately empty, aspirational section.
But, then I feel a bit stuck on “what counts for that tier?”
Here’s another idea:
Open Problems
(and: when voting on Best of LessWrong, you can ‘bet’ that a post will contribute to solving an Open Problem)
Open Problems could be a LessWrong feature: basically, a post describing an important, unsolved problem. They’d each be owned by a particular author or small group, who get to declare when they consider the problem “solved.” (If you want people to trust/care about the outcome of a particular Open Problem, you might choose two co-owners who are sort of adversarial collaborators, and they have to both agree it was solved.)
Two use-cases for Open Problems could be:
As a research target for an individual researcher (or team), i.e. setting the target they’re ultimately aiming for.
As a sort of X-Prize, for others to attempt to contribute to.
So we’d end up with problem statements like:
“AI Alignment for superintelligences is solved” (maybe Eliezer and Paul cosign a problem statement on that)
You (Ryan) and Buck could formulate some kind of Open Problem on AI Control
I’d like there to be some kind of “we have a rationality training program that seems to demonstrably work”
And then there’s a page that highlights “these are the open problems people on LessWrong have upvoted the most as ‘important’” and “here are the posts that people are betting will turn out to be relevant to the final solution.” (Maybe this is operationalized as, like, a Manifold market bet about whether the problem-author will eventually declare a given post to be an important contribution.)
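As a concrete sketch of how the pieces could hang together, here’s a minimal data model in TypeScript. To be clear, this is hypothetical: none of these types or fields exist in any actual LessWrong codebase; it’s just one way to encode the ownership, sign-off, and betting mechanics described above.

```typescript
// Hypothetical data model for an "Open Problems" feature.
// All names here are invented for illustration.

interface OpenProblem {
  id: string;
  title: string;            // e.g. "AI Alignment for superintelligences is solved"
  statementPostId: string;  // the post laying out the problem
  ownerIds: string[];       // a single author, or adversarial co-owners
  importanceVotes: number;  // community votes on "this problem matters"
}

interface SolutionBet {
  problemId: string;
  postId: string;           // the post being bet on as part of the eventual solution
  bettorId: string;
  marketUrl?: string;       // optional link to a prediction market on the outcome
}

// A problem only counts as solved once *every* co-owner has signed off,
// which is what makes adversarial co-ownership meaningful.
function isSolved(problem: OpenProblem, signoffs: Set<string>): boolean {
  return problem.ownerIds.every((owner) => signoffs.has(owner));
}
```

The key design choice in this sketch is that resolution is gated on unanimous owner sign-off rather than a community vote, matching the “both co-owners have to agree it was solved” idea above.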
Yeah, I do not super stand by how I phrased it in the post. But your second paragraph feels wrong to me too – in some sense, yes, the hidden information in Chess and Slay the Spire is “the same,” but, like, it seems at least somewhat important that in Slay the Spire there are things you can’t predict by purely running simulations forward; you have to have a probability distribution over pretty unknown things.
(I’m not sure I’ll stand by this or my last comment either. I’m thinking out loud, and may have phrased things wrong here.)