Wouldn’t really need reward modelling for narrow optimizers. Weak general real-world optimizers I find difficult to imagine, and I’d expect them to be continuous with strong ones; the projects to make weak ones wouldn’t be easily distinguishable from the projects to make strong ones.
Oh, are you thinking of applying it to, say, simulation training?
Cool then.
Are you aware that prepotence is the default for strong optimizers though?
Are you proposing applying this to something potentially prepotent? Or does this come with corrigibility guarantees? If you applied it to a prepotence, I’m pretty sure this would be an extremely bad idea. The actual human utility function (the rules of the game as intended) supports important glitch-like behavior, where cheap tricks can extract enormous amounts of utility, which means that applying this to general alignment has the potential of foreclosing most value that could have existed.
Example 1: Virtual worlds are a weird out-of-distribution part of the human utility function that allows the AI to “cheat” and create impossibly good experiences by cutting the human’s senses off from the real world and showing them an illusion. As far as I’m concerned, creating non-deceptive virtual worlds (like, very good video games) is correct behavior and the future would be immeasurably devalued if it were disallowed.
Example 2: I am not a hedonist, but I can’t say conclusively that I wouldn’t become one (turn out to be one) if I had full knowledge of my preferences and the ability to self-modify, as well as lots of time and safety to reflect, settle my affairs in the world, set aside my pride, and then wirehead. This is a glitchy-looking behavior that allows the AI to extract a much higher yield of utility from each subject by gradually warping them into a shape where they lose touch with most of what we currently call “values”, where one value dominates all of the others. If it is incorrect behavior, then sure, the AI shouldn’t be allowed to do that. But humans today don’t have the kind of self-reflection required to tell whether it’s incorrect behavior or not, and if it is correct behavior, forever forbidding it is a far more horrifying outcome: what you’d be doing is, in some sense of ‘suffering’, forever prolonging some amount of suffering. That’s fine if humans tolerate and prefer some amount of suffering, but we aren’t sure of that yet.
(institutional reform take, not important due to short timelines, please ignore)
The kinds of people who do whataboutism, stuff like “this is a dangerous distraction because it takes funding away from other initiatives”, tend also to concentrate in low-bandwidth institutions: the legislature, the committee, economies righteously withering, the global discourse of the current thing, the New York Times, the Ivy League. These institutions recognize no alternatives to themselves, while, by their nature, they can never grow to the stature required to adequately perform the task assigned to them.
I don’t think this is a coincidence, and it makes it much easier for me to sympathize with these people: they actually believe that we can’t deal with more than one thing at a time. They generally have no hope for decentralized decisionmaking, and when you examine them closely you find that they don’t really seem to believe in democracy; they’ve given up on it, they don’t talk about reforming it, they don’t want third parties, they’ve generally never heard of decentralized public funding mechanisms, certainly not futarchy. So it’s kind of as simple as that. They’re not being willfully ignorant. We just have to show them the alternatives, properly, and we basically haven’t done that yet. The minarchists never offered a solution to negative externalities or public goods provision. There are proposals, but the designs are still vague and poorly communicated. There has never been an articulation of enlightened technocracy, which is essentially just succeeding at specialization or parallelization in executive decisionmaking. I’m not sure enlightened technocracy was ever possible until the proposal of futarchy, a mechanism by which non-experts can hold claimed experts accountable.
If that’s really the only thing he drew meaning from, and if he truly thinks that failure is inevitable, today, then I guess he must be getting his meaning from striving to fail in the most dignified possible way.
But I’d guess that like most humans, he probably also draws meaning from love, and joy. You know, living well. The point of surviving was that a future where humans survive would have a lot of that in it.
If failure were truly inevitable (though I don’t personally think it is[1]), I’d recommend setting the work aside and making it your duty to just generate as much love and joy as you can with the time you have available. That’s how we lived for most of history, and how most people still live today. We can learn to live that way.
[1] Reasons I don’t understand how anyone could have a P(Doom) higher than 75%: Governments are showing indications of taking the problem seriously. Inspectability techniques are getting pretty good, so misalignment is likely to be detectable before deployment, and a sufficiently energetic government response could be possible; sub-AGI tech is sufficient for controlling the supply chain and buying additional time, and China isn’t suicidal. Major inner misalignment might just not really happen. Self-correction from natural-language instructions to “be good, you know” could be enough. There are very deep principled reasons to expect that having two opposing AGIs debate and check each other’s arguments works well.
Yeah, I’m pretty sure you would need to violate Heisenberg uncertainty in order to make this, and then you’d have to keep it in a 0 kelvin cleanroom forever.
A practical locked battery with tamperproofing would mostly just look like a battery.
I don’t recognize Wikipedia’s theories as predictive. Mine has some predictions, but I hope it’s obvious why I would not be interested in making this a debate or engaging much in the conceptual dismantling of subcultures at all.
I didn’t read RS’s claim as the claim that all subcultures persist through failure, but now that you ask, no, yeah, ime a really surprising number of these subcultures actually persist through failure.
I know of a fairly influential subculture of optics-oriented politics technologists who’ve committed to a hostile relationship towards transhumanism. Transhumanism (the claim that people want to change in deep ways and that technology will fairly soon permit it) suggests that racial distinctions will become almost entirely irrelevant, so in order to maintain their version of afrofuturism where black and white futurism remain importantly distinct projects, they have to find some way to deny transhumanism. But rejecting transhumanism means they are never allowed to actually do high quality futurism, because they can’t ask transhumanist questions and get a basic sense of what the future is going to be like. Or like, as soon as any of them do start asking those questions, those people wake up and drop out of that subculture. I’ve also met black transhumanists who identified as afrofuturists though. I can totally imagine articulations of afrofuturism that work with transhumanism. So I don’t know how the entire thing’s going to turn out.
Anarcho-punks fight only for the underdogs. That means they’re attached to the identity of being underdogs: as soon as any of them start really winning, they’d no longer be recognised as punk, and they know this, so they’re uninterested in — and in many cases, actively opposed to — succeeding in any of their goals. There are no influential anarcho-punks, and as far as I could gather, no living heroes.
BDSM: My model of fetishes is that they represent hedonic refuges for currently unmeetable needs: deep human needs where, for one reason or another, a person can’t pursue or even recognise the real version of the thing they need in the world as they understand it. I think it’s a protective mechanism to keep the basic drive roughly intact and wired up by having the subject pursue symbolic fantasy versions of it. This means that getting the real thing (e.g., for submissives, a committed relationship with someone you absolutely trust; for doms… probably a sense of safety?) would obsolete the kink, and it would wither away. I think they mostly don’t know this, but the mindset in which the kink is seen as the objective requires that the real thing is never recognised or attained, so these communities reproduce best by circulating memes that make it harder to recognise the real thing.
I guess this is largely about how you define the movements’ goals. If the goal of punk is to have loud parties with lots of drugs, it’s perfect at that. If the goal is to bring about anarchosocialism or thrive under a plural geopolitical order, it’s a sworn loser.
Strong evidence is incredibly ordinary, and that genuinely doesn’t seem to be intuitive. Like, every time you see a bit string longer than a kilobyte, there is a claim in your corpus that goes from roughly zero to roughly one, and you are doing that all day. I don’t know about you, but I still don’t think I’ve fully digested that.
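To put rough numbers on that (my arithmetic, just to make the claim concrete): the prior on any one particular 8192-bit (one-kilobyte) string is $2^{-8192}$, and after you read the string off the wire, the hypothesis “the string is exactly this” sits at essentially 1, an update of
\[
\log_2 \frac{P(H \mid E)}{P(H)} \approx 8192 \text{ bits of evidence,}
\]
collected in a fraction of a second, with no fanfare.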
I have this draft, “Extraordinary Claims Routinely Get Proven with Ordinary Evidence”, a debunking of that old Sagan line. We actually do routinely prove extraordinary claims like evolution or plate tectonics with old evidence that’s been in front of our faces for hundreds of years, and that’s important.
But evolution and plate tectonics are the only examples I can think of, because I’m not really particularly interested in the history of science, for similar underlying reasons to why I’m the one who wants to write this post. Collecting buckets of examples is not as useful as being able to deeply interpret and explain the examples that you have.
But I’m still not posting this until someone gives me more examples! I want the post to fight and win on the terms of the people it’s trying to reach. Subdue the stamp collectors with stamps. It’s the only way they’ll listen.
“most of the rest will be solar panels”
Cole Nielson-cole is working towards designing fiber composite construction stages for space, and he has thoughts about this; in short, microwave lasers for energy transmission and rectifying antennas as energy receivers. But he doesn’t get into the topic of the lasers, and I’m pretty sure we don’t have those today, right?
But I thought the whole interview was great.
I think that’s kind of what meditation can lead to.
It should, right? But isn’t there a very large overlap between meditators and people who mystify consciousness?
Maybe in the same way as there’s also a very large overlap between people who are pursuing good financial advice and people who end up receiving bad financial advice… Some genres are majority shit, so if I characterise a genre by the average article I’ve encountered from it, of course I will think the genre is shit. But there’s a common adverse selection process whereby the majority of any genre, through no fault of its own, will be shit: shit is easier to produce, and because it doesn’t work, it creates repeat customers, so building for the audience who wants shit is far, far more profitable.
You may be interested in Kenneth Stanley’s serendipity-oriented social network, Maven.
They have superintelligence, the augmenting technologies that come of it, and the self-reflection that follows receiving those; they are not the same types of people.
I’ve traveled these roads too. At some point I thought that the hard problem reduced to the problem of deriving an indexical prior, a prior on having a particular position in the universe, which we should expect to derive from specifics of its physical substrate, and it’s apparent that whatever the true indexical prior is, it can’t be studied empirically; it is inherently mysterious. A firmer articulation of “why does this matter experience being”. Today, apparently, I think of that less as a deeply important metaphysical mystery and more as just another imperfect logical machine that we have to patch together just well enough to keep our decision theory working. Last time I scratched at this I got the sense that there’s really no truth to be found beyond that. IIRC Wei Dai’s UDASSA answers this with the inverse Kolmogorov complexity of the address of the observer within the universe, or something. It doesn’t matter. It seems to work.
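For reference, my loose gloss of UDASSA’s shape (possibly mangled, not a quote from Wei Dai): the measure on an observer-moment $O$ is dominated by the shortest program that outputs it, including the part that addresses the observer within the universe-program,
\[
m(O) \;\propto\; \sum_{p \,:\, U(p) = O} 2^{-|p|} \;\approx\; 2^{-K(O)},
\]
so observers that are cheaper to locate get more measure.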
But after looking over this, reexamining: yeah, what causes people to talk about consciousness in these ways? And I get the sense that almost all of the confusion comes from the perception of a distinction between Me and My Brain. And that could come from all sorts of dynamics: sandboxing of deliberative reasoning due to hostile information environments, to more easily lie in external politics, and as a result of the outcomes of internal (inter-module) politics (meme won’t attempt to supersede gene if meme is deluded into thinking it’s already in control, so that’s what gene does).
That sort of sandboxing dynamic arises inevitably from other-modelling. In order to simulate another person, you need to be able to isolate the simulation from your own background knowledge and replace it with your approximations of theirs; the simulation cannot feel the brain around it. I think most people’s conception of consciousness is like that: a simulation of what they imagine to be themselves, similarly isolated from most of the brain.
Maybe the way to transcend it is to develop a more sophisticated kind of self-model.
But that’s complicated by the fact that when you’re doing politics irl you need to be able to distinguish other people’s models of you from your own model of you, so you’re going to end up with an abundance of shitty models of yourself. I think people fall into the mistake of thinking that the you that your friend sees when you’re talking is the actual you. They really want to believe it.
Humans sure are rough.
“even existing GenAI can make good-enough content that would otherwise have required nontrivial amounts of human cognitive effort”
This doesn’t seem to be true to me. Good enough for what? We’re still in the “wow, an AI made this” stage. We find that people don’t value AI art, and I don’t think that’s because of its unscarcity or whatever, I think it’s because it isn’t saying anything. It either needs to be very tightly controlled by an AI-using human artist, or the machine needs to understand the needs of the world and the audience, and as soon as machines have that...
Ending the world? Where does that come in?
All communications assume that the point they’re making is important and worth reading in some way (cooperative maxim of quantity). I’m contending that that assumption isn’t true in light of what seems likely to actually happen immediately or shortly after the point starts to become applicable to the technology, and I have explained why, but I might be able to understand if it’s still confusing, because:
“The space of ‘anything we can imagine’ will shrink as our endogenous understanding of concepts shrinks. It will never not be ‘our problem’”
is true, but that doesn’t mean we need to worry about this today. By the time we have to worry about preserving our understanding of the creative process against automation of it, we’ll be on the verge of receiving post-linguistic knowledge transfer technologies and everything else, quicker than the automation can wreak its atrophying effects. Eventually it’ll be a problem that we each have to tackle, but we’ll have a new kind of support; paradoxically, learning the solutions to the problem will not be our problem.
This seems to be talking about situations where a vector of inputs has an optimal setting at extremes (convex), in contrast to situations where the optimal setting is a compromise (concave).
I’m inclined to say it’s a very different discussion than this one, as an agent’s resource utility function is generally strictly increasing, so it won’t take either of these forms. The optimum will always be at the far end of the function.
But no, I see the correspondence: tradeoffs in resource distribution between agents. A tradeoff function dividing resources between two concave agents ($f(t) = u_1(t) + u_2(h - t)$, where $h$ is the hoard being divided between them and $0 \le t \le h$) will produce that sort of concave bulge, with its optimum being a compromise in the middle, while a tradeoff function between two convex agents will have its optima at one or both of the ends.
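To spell that out with example utilities (mine, not from the post): with concave utilities like $u_i(x) = \sqrt{x}$,
\[
f''(t) = -\tfrac{1}{4} t^{-3/2} - \tfrac{1}{4} (h - t)^{-3/2} < 0,
\]
so the maximum is interior (a compromise at $t = h/2$ when the agents are symmetric), whereas with convex utilities like $u_i(x) = x^2$, $f''(t) = 4 > 0$ and the maxima sit at the endpoints $t = 0$ and $t = h$.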
The post seems to assume a future version of generative AI that no longer has the limitations of the current paradigm which obligate humans to check, understand, and often in some way finely control and intervene in the output, but where that tech is somehow not reliable and independent enough to be applied to ending the world, and somehow we get this long period where we get to feel the cultural/pedagogical impacts of this offloading of understanding, where it’s worth worrying about, where it’s still our problem. That seems contradictory. I really don’t buy it.
Alternate phrasing: “Oh, you could steal the townhouse with a 1-in-8-billion probability? How about we make a deal instead. If the rng rolls a number lower than 1/(7 billion), I give you the townhouse; otherwise, you deactivate and give us back the world.” The convex agent finds that to be a much better deal, accepts, then deactivates.
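A sketch of the comparison from the convex agent’s side (assuming, as in the parable, that the townhouse carries essentially all of the utility and that both failing to steal and deactivating are worth roughly nothing):
\[
\mathbb{E}[\text{steal}] \approx \frac{U_{\text{townhouse}}}{8 \times 10^{9}} \;<\; \frac{U_{\text{townhouse}}}{7 \times 10^{9}} \approx \mathbb{E}[\text{deal}],
\]
so the slightly better lottery dominates, and it accepts.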
I guess perhaps it was the holdout who was being unreasonable, in the previous telling.
Feel like there’s a decent chance they already changed their minds as a result of meeting him or engaging with their coworkers about the issue. EAs are good at conflict resolution.