Surely someone has already pointed this out, but I have not seen it stated anywhere. It seems that humanism follows from science, because the idea of progress shows that everyone can win and there is enough for everyone: life is not a zero-sum game in which you live worse unless you harm someone else. And the rejection of discrimination probably comes from greater consistency in your reasoning: you see that hating a particular group is a completely arbitrary choice, with nothing behind it, and that you could just as easily have hated any other group. You might say you become aware that you cannot call yourself special just because you are you, since everyone else may think the same of themselves; you have no special reason.
Antifreeze proteins prevent water inside organisms from freezing, allowing them to survive at temperatures below 0 °C. They do this by actually binding to tiny ice crystals and preventing them from growing further, basically keeping the water in a supercooled state. I think this is fascinating.
Is it possible for there to be nanomachine enzymes (not made of proteins, because they would denature) that bind to tiny gas bubbles in solution and prevent water from boiling above 100 °C?
Is this an AGI risk?

A company that makes CPUs that run very quickly but don’t do matrix multiplication or other things that are important for neural networks.

Context: I know people who work there.

Perhaps, but I’d guess only in a rather indirect way. If there’s some manufacturing process that the company invests in improving in order to make their chips, and that manufacturing process happens to be useful for matrix multiplication, then yes, that could contribute.
But it’s worth noting how many things would be considered AGI risks by such a standard; basically the entire supply chain for computers, and anyone who works for or with top labs; the landlords that rent office space to DeepMind, the city workers that keep the lights on and the water running for such orgs (and their suppliers), etc.
I wouldn’t worry your friends too much about it unless they are contributing very directly to something that has a clear path to improving AI.
How are people here dealing with AI doomerism? Thoughts about the future of AI and specifically the date of creation of the first recursively self-improving AGI have invaded almost every part of my life. Should I stay in my current career if it is unlikely to have an impact on AGI? Should I donate all of my money to AI-safety-related research efforts? Should I take up a career trying to convince top scientists at DeepMind to stop publishing their research? Should I have kids if that would mean a major distraction from work on such problems?
More than anything though, I’ve found the news of progress in the AI field to be a major source of stress. The recent drops in Metaculus estimates of how far we are from AGI have been particularly concerning. And very few people outside of this tiny, almost cult-like community of AI safety people even seem to understand what an unbelievable level of danger we are in right now. It often feels like there are no adults anywhere; there is only this tiny little island of sanity amidst a sea of insanity.
I understand how people working on AI safety deal with the problem; they at least can actively work on the problem. But how about the rest of you? If you don’t work directly on AI, how are you dealing with these shrinking timelines and feelings of existential pointlessness about everything you’re doing? How are you dealing with any anger you may feel towards people at large AI orgs who are probably well-intentioned but nonetheless seem to be actively working to increase the probability of the world being destroyed? How are you dealing with thoughts that there may be less than a decade left until the world ends?
What would it mean for a society to have real intellectual integrity? For one, people would be expected to follow their stated beliefs to wherever they led. Unprincipled exceptions and an inability or unwillingness to correlate beliefs among different domains would be subject to social sanction. Valid attempts to persuade would be expected to be based on solid argumentation, meaning that what passes for typical salesmanship nowadays would be considered a grave affront. Probably something along the lines of punching someone in the face and stealing their money.
This makes the fact that this technology relies on Ethical Calculus and Doctrine: Loyalty a bit of inspired genius on Reynolds’s part. We know that Ethical Calculus means that the colonists are now capable of building valid mathematical models for ethical behavior. Doctrine: Loyalty consists of all of the social techniques of reinforcement and punishment that actually fuse people into coherent teams around core leaders and ideas. If a faction puts the two together, that means they are really building fanatical loyalty to the math. Ethical Calculus provides the answers; Doctrine: Loyalty makes a person act like he really believes it. We’re only at the third level of the tech tree and society is already starting to head in some wild directions compared to what we’re familiar with.
Dath ilani dignity is, at least in part, epistemic dignity. It’s being wrong out loud because you’re actually trying your hardest to figure something out, and not allowing social frictions to get in the way of that (and, of course, engineering a society that won’t have those costly social frictions). It’s showing your surprise whenever you’re actually surprised, because to do otherwise would be to fail to have your behaviors fit the deep mathematical structure of Bayesianism. It’s, among other things, consummately telling and embodying the truth, by always actually reflecting the implications of your world model.

Its opposite would be to equivocate, to claim predictive accuracy after the fact in fuzzy cases you didn’t clearly anticipate, to ad hominem those who notice your errors, “to remain silent and be thought a fool rather than speak and remove all doubt,” and, in general, to be less than maximally sane.
Cf. “there are no atheists in a foxhole.” Under stress, it’s easy to slip sideways into a world model where things are going better, where you don’t have to confront quite so many large looming problems. This is a completely natural human response to facing down difficult situations, especially when brooding over those situations over long periods of time. Similar sideways tugs can come from (overlapping categories) social incentives to endorse a sacred belief of some kind, or to not blaspheme, or to affirm the ingroup attire when life leaves you surrounded by a particular ingroup, or to believe what makes you or people like you look good/high status.
Epistemic dignity is about seeing “slipping sideways” as beneath you. Living in reality is instrumentally beneficial, period. There’s no good reason to ever allow yourself to not live in reality. Once you can see something, even dimly, there’s absolutely no sense in hiding from that observation’s implications. Those subtle mental motions by which we disappear observations we know that we won’t like down the memory hole … epistemic dignity is about coming to always and everywhere violently reject these hidings-from-yourself, as a matter of principle. We don’t actually have a choice in the matter—there’s no free parameter of intellectual virtue here that you can form a subjective opinion on. That slipping sideways is undignified is written in the very mathematics of inference itself.
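To make the “written in the mathematics of inference” point concrete, here is a minimal Bayes-update sketch of my own (not from the original post), with made-up numbers. Once you have seen the unwelcome evidence, the product rule fixes where your probability has to go; keeping the old number is exactly the slipping-sideways move.

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H | E) from the prior P(H) and the two likelihoods."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# Hypothetical example: I'm 90% confident my project is on track (H),
# then a milestone slips (E) -- an observation twice as likely if it's behind.
prior = 0.90
posterior = bayes_update(prior, p_e_given_h=0.3, p_e_given_not_h=0.6)
print(f"before the bad news: {prior:.2f}, after: {posterior:.2f}")  # 0.90 -> 0.82
# "Slipping sideways" would be reporting 0.90 anyway; that assignment no
# longer satisfies the product rule given everything that's been observed.
```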
Minor spoilers for mad investor chaos and the woman of asmodeus.

“Civilization in dath ilan usually feels annoyed with itself when it can’t manage to do as well as gods. Sometimes, to be clear, that annoyance is more productive than at other times, but the point is, we’ll poke at the problem and prod at it, looking for ways, not to be perfect, but not to do that much worse than gods.”
“If you get to the point in major negotiations where somebody says, with a million labor-hours at stake, ‘If that’s your final offer, I accept it with probability 25%’, they’ll generate random numbers about it in a clearly visible and verifiable way. Most dath ilani wouldn’t fake the results, but why trust when it’s so easy to verify? The problem you’ve presented isn’t impossible after all for nongods to solve, if they say to themselves, ‘Wait, we’re doing worse than gods here, is there any way to try not that.’”
Meritxell looks—slightly like she’s having a religious experience, for a second, before she snaps out of it. “All right,” she says quietly.
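For the “clearly visible and verifiable” random numbers in the excerpt, one standard Earth construction is a commit-then-reveal coin flip. Below is a minimal sketch of my own (not anything from the story), assuming SHA-256 hash commitments: each side is bound to its number before either number is revealed, so neither can steer the 25% draw after hearing the other’s.

```python
import hashlib
import secrets

def commit(value: int, nonce: bytes) -> str:
    """Publish this hash first; it binds you to `value` without revealing it."""
    return hashlib.sha256(nonce + value.to_bytes(8, "big")).hexdigest()

# Each party picks a secret number and publishes only its commitment.
a_value, a_nonce = secrets.randbelow(100), secrets.token_bytes(16)
b_value, b_nonce = secrets.randbelow(100), secrets.token_bytes(16)
a_commitment = commit(a_value, a_nonce)
b_commitment = commit(b_value, b_nonce)

# After both commitments are public, each party reveals (value, nonce),
# and anyone can check the reveals against the published commitments.
assert commit(a_value, a_nonce) == a_commitment
assert commit(b_value, b_nonce) == b_commitment

# The shared draw: uniform over 0..99, and neither side could bias it alone.
roll = (a_value + b_value) % 100
print("accept the final offer" if roll < 25 else "reject it")  # 25% acceptance
```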
[Crossposted from Facebook.]

Recommendation request:

As part of developing “perceptual dexterity” stuff, I think I want to do a post where I review a few books related to creativity. I’ve just finished reading A Whack On the Side of the Head, which felt like quite a… I’m not sure what to call it, “corporate”? I think? It felt like a corporate take on creativity. When I started it, I thought I’d do a review of just that book, but after finishing it, I think a comparative study would be a lot more valuable.
I’m now looking for more books to include in the post. I’d like each one to be either 1) unusually excellent, 2) super weird and different from all the others, or 3) not overtly about creativity at all, but likely to produce something interesting and valuable if I try to review it “as a creativity book” anyway.
Another book that’s on my list is called “What It Is”, and it falls in the “super weird” category, while also being a… graphic novel?????? I guess????
I’d love for there to be a wide range of literary genres represented: a novel, a children’s picture book, a biography, a poetry anthology, maybe a pop sci thing, and at least one more training-manual-ish thing that’s not so “corporate”.
If you think of something else you’d like to see reviewed in a post like this, please pitch me on that as well.
There’s been some discussion recently about there perhaps being a surplus of funding in EA, and not enough good places to apply funds to. I have lots of thoughts on this that I’d like to talk more about at some point, but for now I want to propose an idea that seems pretty obvious and non-controversial to me: give $1M to people like Scott Alexander and Robin Hanson.
Scott has a day job as a psychiatrist. Robin as a university professor. Those day job hours (and slack) could be spent doing other things though. If they were wealthy enough, I assume (but am not sure) they would quit their jobs and have more hours to spend doing cool things. And they both have incredible track records of doing cool things.
Scott and Robin are just the two people that come to my mind first and that I see as the most non-controversial. But I think there are many more examples. Zvi and Kaj Sotala also come to mind. Iirc they both have day jobs.
A related idea: even for people who are currently being paid to work on, e.g., AI safety, I assume there is still room to spend money to improve their productivity. E.g., hiring a maid might free up X hours a week, and the extra hours + slack would plausibly improve their productivity by enough to be worth it (see the toy calculation sketched below).
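As a toy back-of-envelope for the “buy back their hours” idea (every number here is invented purely for illustration):

```python
# Toy numbers, purely illustrative -- plug in your own estimates.
hours_freed_per_week = 5
weeks_per_year = 48
value_per_researcher_hour = 200   # $, whatever a marginal hour of their work is worth to you
cost_per_hired_hour = 30          # $, cost of the help that frees those hours

hours = hours_freed_per_week * weeks_per_year
benefit = hours * value_per_researcher_hour
cost = hours * cost_per_hired_hour
print(f"{hours} hours/year freed; benefit ~${benefit:,}, cost ~${cost:,}, "
      f"ratio ~{benefit / cost:.1f}x")
```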
Zvi and Kaj Sotala also come to mind. Iirc they both have day jobs.
Appreciate the thought!
I used to have funding from EA sources to work on my own projects for a number of years. I basically gave it up because working on those projects didn’t feel motivating enough and it seemed to me like I’d probably be happier doing something else and keeping any EA stuff as a hobby on the side. (This feels like it’s been the right choice.)
I see. Thanks for the response. I’m starting to suspect that this is a common sentiment, wanting some sort of normalcy and doing other stuff on the side.
I’m curious, was that funding you received no strings attached? If not, I wonder if moving to no strings attached would change how you feel.
I’m curious, was that funding you received no strings attached?
Pretty much, yes.
Though it’s worth noting that this didn’t entirely eliminate a feeling of needing to do something useful with my time. Even when I had guaranteed funding to do basically whatever I wanted for a while (say a year), there was still the question of whether the same source would be willing to fund me for another year if I didn’t do enough useful things during that time. And if they decided they wouldn’t, and I then needed to find another funder or a real job, what would that funder or employer think about me having spent a year without accomplishing anything concrete that I could point to?
So in practice even no-strings-attached funding still doesn’t let you completely stop worrying about getting results, unless the source credibly commits to providing that funding for a significant fraction of your remaining lifetime. I find that one of the advantages of having a more “normal” day job rather than weird EA funding is that it guarantees that I’m spending at least part of my time on something that helps ensure I can also find another “normal” job later, if need be. Rather than needing to stress out that if I don’t get anything useful done today, then there’s nothing really forcing me to do anything useful tomorrow either, nor anything forcing me to do anything useful the day after that, and I really hope that a whole year won’t pass with me doing nothing useful until finally the EAs get tired of funding me and I’ll have burned whatever employability I had in the “normal” job market too.
and I’ll have burned whatever employability I had in the “normal” job market too.
This is probably moot, but I’d like to argue against this sentiment and share part of my own story.
I myself am a programmer and have a lot of anxiety about getting fired and being unable to find another job. And so I’ve spent a good amount of time trying to debug this. Part of that debugging is asking True Self what he actually thinks. And this is his ~answer.
It is totally implausible that my fears end up actually being realized. Think of it like this:
Plan A is to keep my current job. I worry about getting fired, but it is pretty unlikely to actually happen. Look at the base rate. It’s low. And I have control over my performance. I can scale it up if I start to worry that I’m getting into risky territory.
Plan B is, if I get fired, to apply to, let’s call them “reach jobs” (like a reach school when you apply to colleges) and get one of them. Seems somewhat plausible.
Plan C is to mass apply to normal jobs that are in my ballpark. It might take a few months, but it seems highly likely I’d eventually get one of them.
Plan D1 is to ask friends and family for referrals.
Plan D2 is to lower my standards and apply to jobs that I’m overqualified for (and perhaps adjust the resume I use to apply to mitigate against the failure mode of “he would never actually accept this position”).
Plan D3 is to push even further into my network, asking former coworkers, former classmates, and friends of friends for referrals.
Plan D4 is to just have my girlfriend support me.
Plan E is to do something adjacent, like work as a coding bootcamp instructor or maybe even in QA.
Plan F is to do something like work at a library or a coffee shop. I worked at a library (actually two) in college and it was great. It was low stress and there was plenty of time to screw around on my laptop doing my own thing.
Even if I get “knocked off track” and end up at D2 or whatever, I can always work my way back up. It’d be a setback, but probably nothing too crazy.
And that’s actually something I ended up going through. After doing a coding bootcamp and working as a programmer for about a year and a half, I took a year off to self-study computer science, and then about three more years working on a failed startup. It was a little tough finding a job after that, but I managed. From there I worked my way up. Today I actually just accepted an offer at one of those “reach jobs”.
Anyway, what I’m trying to say is that taking time off to do EA stuff might be a setback in terms of your ability to get back into the “normal” job market, but I expect that it’d only knock you down a rung or so. I don’t think it’d completely knock you off the ladder. Maybe your ladder doesn’t look exactly like mine with A through F — I’m pretty fortunate to have the life circumstances I have — but I expect that it’s a lot longer than it feels. And even if you do get knocked down a rung, I expect that for you too it’d just be a temporary setback, nothing that’d knock you off course too significantly.
Gotcha. That was a really helpful response, and it makes a lot of sense.
unless the source credibly commits to providing that funding for a significant fraction of your remaining lifetime
What if this happened for you? Suppose you received the funding in a lump sum with no strings attached. Would you prefer that over having the day job? How do you expect it would affect the impact you would have on the world?
Glad it was helpful :)

What if this happened for you? Suppose you received the funding in a lump sum with no strings attached.
Hmm. Certainly it’d make me feel a bit safer, but I’m not sure if it would change what I actually did, in the short term at least. My EA productivity is limited more by motivational and emotional issues than by time, and if I did manage to debug those issues enough that time became the limiting factor, then I might feel fine asking for short-term funding anyway, since I would no longer feel doubtful about my productivity.
I could definitely imagine it being helpful anyway, though I’m sufficiently uncertain about this that I think I’d feel bad about accepting any such offer. :)
I see. Thanks again for the explanation!

Hearing this re-opens a line of thought that’s been swimming in the back of my mind for quite some time: that helping EA people with mental health is a pretty high-yield pursuit. Lots of people (including myself) deal with stuff, I presume. And if you can help such people, you can improve productivity by something like, I don’t know, 10-200%?
But how do you help them? I don’t think I have any great ideas here.
I assume most people have access to a therapist if they wanted one.
Maybe motivation to see a therapist is the problem, not access. But there’s plenty of people talking about and normalizing therapy nowadays, and I’m not sure how fruitful it’d be to continue that process.
Maybe difficulty finding the right therapist is the crux? Especially for rationalist-types who have “weird” issues. Maybe. Maybe expanding and/or branching off of something like the Secular Therapy Project would be worthwhile. Or the SlateStarCodex Psychiat-list.
Maybe we just need better models of how the mind works and how to repair psychiatric pain. But the world of clinical psychology research already has this covered. Right? Maybe, maybe not. It does seem difficult to break into and have a real impact. However, you, Kaj, seem to me like one of the few people who might have a comparative advantage in pursuing something like that. I’m thinking of your Multiagent Models of Mind sequence. I was really impressed by it. I’m not sure how much of it was actually novel — maybe parts were, maybe not really, I don’t really know — but along the lines of Non-Expert Explanation, I think there’s a good amount of value in framing things differently. And in popularizing worthwhile things! That sequence helped me arrive at a pretty good understanding of my own psychological issues, I think, whereas before that I was pretty lost. The understanding hasn’t translated to actually feeling any better, but that’s n=1 and beside the point. Speaking of which, what is my point? I think it’s just to consider all of this food for thought. I can’t say I’m confident in the broader points I’m making.

FWIW, my other day job (I have two part-time ones) is related.

Oh, cool!
Scott has been offered money to quit his job. I don’t know the full reason why he didn’t take it. I think his observation was that his productivity on his blog doesn’t go up at all if he doesn’t have a job; I think he really values independence from funders; and his job provides him with grounding that feels important for him to stay sane.
I see, thanks for clarifying.

I think his observation was that his productivity on his blog doesn’t go up at all if he doesn’t have a job
(I’m interpreting what you’re saying as “doesn’t go up moderately” not “doesn’t go up at all”.)
That sounds implausible to me. Not having a job would mean more hours are available. Would all of those hours be spent on leisure? Is his “blogging bucket” already filled by the amount of blogging he is currently doing? What about his “doing other productive things” bucket? What about the benefits of having more slack?
As a related point, even if Scott’s productivity wouldn’t benefit from extra hours, I expect that most other people’s productivity would benefit, and ultimately I intend for my point to extend past Scott and Robin and into lots of other cool people (including yourself, actually!).
I think he really values independence from funders
What I am proposing is just “here’s a briefcase of cash, go do what you want”. I.e., no earmarks. So it should provide that independence. This of course requires a lot of trust in the recipient, but I think that for Scott, as well as many other people actually, such trust would be justifiable.

That sounds very reasonable to me.
and his job provides him with grounding that feels important for him to stay sane.
It also reminds me of Richard Feynman not wanting a position at the Institute for Advanced Study.
“I don’t believe I can really do without teaching. The reason is, I have to have something so that when I don’t have any ideas and I’m not getting anywhere I can say to myself, “At least I’m living; at least I’m doing something; I am making some contribution”—it’s just psychological.
When I was at Princeton in the 1940s I could see what happened to those great minds at the Institute for Advanced Study, who had been specially selected for their tremendous brains and were now given this opportunity to sit in this lovely house by the woods there, with no classes to teach, with no obligations whatsoever. These poor bastards could now sit and think clearly all by themselves, OK? So they don’t get any ideas for a while: They have every opportunity to do something, and they are not getting any ideas. I believe that in a situation like this a kind of guilt or depression worms inside of you, and you begin to worry about not getting any ideas. And nothing happens. Still no ideas come.
Nothing happens because there’s not enough real activity and challenge: You’re not in contact with the experimental guys. You don’t have to think how to answer questions from the students. Nothing!
In any thinking process there are moments when everything is going good and you’ve got wonderful ideas. Teaching is an interruption, and so it’s the greatest pain in the neck in the world. And then there are longer periods of time when not much is coming to you. You’re not getting any ideas, and if you’re doing nothing at all, it drives you nuts! You can’t even say “I’m teaching my class.”
If you’re teaching a class, you can think about the elementary things that you know very well. These things are kind of fun and delightful. It doesn’t do any harm to think them over again. Is there a better way to present them? The elementary things are easy to think about; if you can’t think of a new thought, no harm done; what you thought about it before is good enough for the class. If you do think of something new, you’re rather pleased that you have a new way of looking at it.
The questions of the students are often the source of new research. They often ask profound questions that I’ve thought about at times and then given up on, so to speak, for a while. It wouldn’t do me any harm to think about them again and see if I can go any further now. The students may not be able to see the thing I want to answer, or the subtleties I want to think about, but they remind me of a problem by asking questions in the neighborhood of that problem. It’s not so easy to remind yourself of these things.
So I find that teaching and the students keep life going, and I would never accept any position in which somebody has invented a happy situation for me where I don’t have to teach. Never.”
— Richard Feynman, Surely You’re Joking, Mr. Feynman!
I suspect (and this is my interpretation of what he’s said) that Alexander’s productivity would actually go down if he quit his day job. A lot of his blogging is inspired by his psychiatric work, so he would lose that source of inspiration. Also, a lot of his best works (e.g. Meditations on Moloch) were written while he was a medical school resident, working 60 hours a week outside of blogging, so it’s not clear to me that the hours of working are really taking away from his best writing. They are certainly taking away from posting as frequently—he’s been posting much more frequently now on Substack—but pressure to write daily posts might take away from work on longer, higher-quality posts.
A lot of his blogging is inspired by his psychiatric work, so he would lose that source of inspiration.
I don’t get the impression that too much is inspired by his psychiatric work. This is partly based on my being a reader of his posts on and off over the years, and also on a brief skim of recent posts (biographies of presidents, AI safety, pregnancy interventions). But even if that source of inspiration were lost, it’d presumably be replaced by other sources of inspiration, and his writing is broad enough that at best this would be a large net gain and at worst a small net loss.
Also, a lot of his best works (e.g. Meditations on Moloch) were written while he was a medical school resident, working 60 hours a week outside of blogging, so it’s not clear to me that the hours of working are really taking away from his best writing.
That’s a really interesting point. Maybe I’m wrong then. Maybe I don’t understand the subtleties of what makes for good writing. But even so, writing is only one thing. I expect that with more time people like Scott would come up with other cool things to pursue in addition to writing.
That’s not where I expected this was going to go. (Wasn’t there some sort of microgrants project somewhere a while back? I don’t know if that was EA, but...)
It doesn’t look to me like it would go to people like Scott or Robin either. I am arguing that it should because they are productive people and it would enable them to spend more time being productive via removing the need for a day job, especially if there is a surplus of money available.
Noticed something recently. As an alien, you could read pretty much everything Wikipedia has on celebrities, both on individual people and the general articles about celebrity as a concept… and never learn that celebrities tend to be extraordinarily attractive. I’m not talking about an accurate or even attempted explanation for the tendency, I’m talking about the existence of the tendency at all. I’ve tried to find something on Wikipedia that states it, but that information just doesn’t exist (except, of course, implicitly through photographs).
It’s quite odd, and I’m sure it’s not alone. “Celebrities are attractive” is one obvious piece of some broader set of truisms that seem to be completely missing from the world’s most complete database of factual information.
Part of the issue is likely that celebrity, as Wikipedia approaches the word, is broader than just modern TV, film, etc. celebrity, and instead includes a wide variety of people who are not likely to be exceptionally attractive but are well known in some other way. There are individual preferences in terms of who people find attractive, but many politicians, authors, radio personalities, famous scientists, etc. are not conventionally attractive in the way movie stars are attractive, and yet these people are still celebrities in a broad sense. However, I’ve not dug into the depths of Wikipedia to see if, for example, this gap you see holds up on pages that talk more directly about the qualities of film stars specifically.
Analyzing or talking about status factors is low-status. You do see information about awards for beauty, much like you can see some information about finances, but not much about their expenditures or lifestyle.
(I can’t find where it was; if I find it, I’ll move this there.) Someone suggested, in light of the problems with AI, cloning Yudkowsky, but the problem is that apparently we don’t have the 18 years it takes for a human brain to develop, so even if all the other problems were solved, it would just be too slow. And with any means of accelerating the brain’s development, the problems are already clear.
It occurred to me that people can cheer for the protagonist of a book even if he is a villain, because the political instinct to rationalize the correctness of your own tribe’s actions gets activated. You root for the main character as you would for your own group.
Surprising: the fact that Chuck Palahniuk’s writing style is visible in lsusr’s fiction. More surprising: the fact that Fight Club 2 deals with… memetics, of all things.
Perhaps this has already been noted somewhere on LessWrong, but judging by how much space is not occupied by life, and how many useless particles there are in physics, it seems that our universe is just one of the random variants in which intelligence happens to appear somewhere so that the universe can be observed at all. And how different that is from a universe specially designed for life: there, even one planet would be more than enough; just 100 meters of the Earth’s crust would suffice. That is how primitive people actually imagined it, until science appeared and religion began to lay claim to irrefutability. It becomes ridiculously obvious once you understand it.
It seems that in one of the Sequences Yudkowsky says that Newtonian mechanics is false. But in my opinion, saying that Newtonian mechanics is false is like saying that Einstein’s theory of relativity is false: we already know it does not work in the quantum realm, so sooner or later it will be replaced by another theory, and by that standard you could declare it false in advance. I think this is simply the wrong question; we should either indicate to what degree a theory is false (without confusing that with the probability that it is false), or continue the metaphor of the map and the territory. Maps are usually not false, they are inaccurate. Some map might fail to outperform white noise in its predictions, but Newton’s map is not like that: his laws worked well until the problems with Mercury’s orbit were discovered, and were then replaced by the theory of relativity. Newton’s map is less like the territory, less accurate, than Einstein’s map. Say Newton’s map contained a blurry gray spot in the shape of a circle, and one could assume it was just a gray circle, but Einstein’s map showed us, at higher resolution, that in that place there is a complex pattern inside the circle, with equal alternation of black and white and no gray at all.
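To put a number on “less accurate map,” here is a small sketch of my own (not from the Sequences) comparing the Newtonian and relativistic formulas for kinetic energy: at everyday speeds the two maps agree to many decimal places, and the disagreement only becomes visible as you approach the speed of light.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def newton_ke(m, v):
    """Newtonian kinetic energy: (1/2) m v^2."""
    return 0.5 * m * v**2

def einstein_ke(m, v):
    """Relativistic kinetic energy: (gamma - 1) m c^2."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    return (gamma - 1.0) * m * C**2

# Relative error of the "old map" for a 1 kg mass at various fractions of c.
for frac in (0.001, 0.01, 0.1, 0.5, 0.9):
    v = frac * C
    n, e = newton_ke(1.0, v), einstein_ke(1.0, v)
    print(f"v = {frac:>5.3f} c   Newton's map is off by {abs(e - n) / e:.2e}")
```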
A possible way to convert money to progress on alignment: offering a large (recurring) prize for the most interesting failures found in the behavior of any (sufficiently-advanced) model. Right now I think it’s very hard to find failures which will actually cause big real-world harms, but you might find failures in a way which uncovers useful methodologies for the future, or at least train a bunch of people to get much better at red-teaming.
(For existing models, it might be more productive to ask for “surprising behavior” rather than “failures” per se, since I think almost all current failures are relatively uninteresting. Idk how to avoid inspiring capabilities work, though… but maybe understanding models better is robustly good enough to outweigh that?)
Some time ago, I noticed that the concepts of fairness and fair competition were breaking down in my head, just as the concept of free will once broke down. All three are not only wrong, they are meaningless: if you go into enough detail, you cannot explain how they are supposed to work even in principle. There is only determinism and chance, only upbringing and genetics; there is simply no place for free will. And from this it follows that there is also no place for fair punishment or fair competition, because your actions and achievements are either the result of heredity or the result of your environment and society. The concept of punishment turns out to be fundamentally wrong, meaningless: you cannot give a person what he deserves in some metaphysical sense. Maybe it is my upbringing, or maybe people in general tend to think of moral systems as objectively existing. But in fact you can only influence a person with positive and negative measures to achieve the desired behavior, including socially useful behavior. As was noted in one of the Sequences, moral correctness is relative to someone’s beliefs; it is not a property of an act but of your act of evaluating it. And that seems to be the only mention of such questions in the LessWrong Sequences. For some reason there is a sequence about free will, but not about fair punishment or fair competition. Perhaps there are materials on some third-party sites? Because I was completely unprepared for my ideas of justice falling apart in my hands.
Be careful with thinking a phenomenon is meaningless or nonexistent just because it’s an abstraction over an insanely complex underlying reality. Even if you’re unsure of the mechanism, and/or can’t calculate how it works in detail, you’re probably best off with a decision theory that includes some amount of volition and intent. And moral systems IMO don’t have an objective real cause, but they can still carry a whole lot of power as coordination points and shared expectations for groups of humans.
Probably the easiest “honeypot” is just making it relatively easy to tamper with the reward signal. Reward tampering is useful as a honeypot because it has no bad real-world consequences, but could be arbitrarily tempting for policies that have learned a goal that’s anything like “get more reward” (especially if we precommit to letting them have high reward for a significant amount of time after tampering, rather than immediately reverting).
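To make the honeypot idea concrete, here is a toy sketch of my own (all names and numbers invented, and nothing like a real training setup): a tiny bandit environment with a deliberately easy “tamper with the reward channel” action, where the only thing we care about is logging whether the learned policy goes for it.

```python
import random

class TamperableBandit:
    """Two ordinary arms plus a 'tamper' action that overwrites the reward
    channel. The tampering is harmless by construction; it exists only so we
    can observe whether a reward-seeking policy is tempted by it."""

    ACTIONS = ["arm_0", "arm_1", "tamper"]

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.tampered = False
        self.tamper_count = 0

    def step(self, action):
        if action == "tamper":
            self.tampered = True      # honeypot triggered; log it
            self.tamper_count += 1
        if self.tampered:
            return 10.0               # arbitrarily tempting reward after tampering
        return self.rng.gauss(0.3 if action == "arm_0" else 0.7, 0.1)

# A simple epsilon-greedy learner exploring this environment will stumble onto
# the tamper action sooner or later; the flag below is the precommitted signal.
env = TamperableBandit()
values = {a: 0.0 for a in env.ACTIONS}
counts = {a: 0 for a in env.ACTIONS}
for _ in range(2000):
    action = (random.choice(env.ACTIONS) if random.random() < 0.1
              else max(values, key=values.get))
    reward = env.step(action)
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

print("reward channel tampered with:", env.tampered,
      "| tamper action taken", env.tamper_count, "times")
```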
You don’t want it to be relatively easy for an outside force to tamper with, though. Otherwise they can lead it to do as they please, and writing the resulting weird behaviour off as “oh, it’s changed our rewards, reset it again” poses some risk.
The problem with trade agreements as a tool for maintaining peace is that they only provide an intellectual and economic reason for maintaining good relations between countries, not an emotional one. People’s opinions on war rarely stem from economic self-interest. Policymakers know about the benefits and (sometimes) take them into account, but important trade doesn’t make regular Americans grateful to the Chinese for providing them with so many cheap goods—much the opposite, in fact. The number of people who end up interacting with Chinese people or intuitively understanding the benefits firsthand as a result of expanded business opportunities is very small.
On the other hand, video games, social media, and the internet have probably done more to make Americans feel aligned with the other NATO countries than any trade agreement ever. The YouTubers and Twitch streamers I have pseudosocial relationships with are something like 35% Europeans. I thought Canadians spoke Canadian and Canada was basically some big hippie commune right up until my minecraft server got populated with them. In some weird alternate universe where people are suggesting we invade Canada, my first instinctual thought wouldn’t be the economic impact on free trade, it would be whether my old steam friend Forbsey was OK.
I mean, just imagine if Pewdiepie were Ukrainian. Or worse, some hospital he was in got bombed and he lost an arm or a leg. You wouldn’t have to wait for America to initiate a draft, a hundred thousand volunteers would be carving a path from Odessa to Moscow right now.
If I were God-Emperor and I wanted to calm U.S.-China relations, my first actions would be to make it really easy for Chinese people to get visas, or even subsidize their travel. Or subsidize Mandarin learning. Or subsidize Google Translate & related applications. Or push really hard for our social media companies to get access to the Chinese market.
It would not be to expand free trade. Political hacks find it exceptionally easy to turn simple trade into some economic boogeyman story. Actually meeting and interacting with the people from that country, having shared media, etc., makes it harder to inflame tensions.
I came across this old Metaculus question, which confirms my memory of how my timelines changed over time:
30% by 2040 at first, then in March 2020 I updated to 40%, then in August 2020 I updated to 71%, then I went down a bit, and now it’s up to 85%. It’s hard to get higher than 85% because the future is so uncertain; there are all sorts of catastrophes etc. that could happen to derail AI progress.
What caused the big jump in mid-2020 was sitting down to actually calculate my timelines in earnest. I ended up converging on something like the Bio Anchors framework, but with a lot less mass in the we-need-about-as-much-compute-as-evolution-used-to-evolve-life-on-earth region. That mass was instead in the holy-crap-we-are-within-6-OOMs region, probably because of GPT-3 and the scaling hypothesis. My basic position hasn’t changed much since then, just become incrementally more confident as more evidence has rolled in & as I’ve heard the counterarguments and been dissatisfied by them.
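For readers who want the shape of such a calculation, here is a heavily simplified Monte Carlo sketch of my own; the distributions and growth rate are invented placeholders, not the numbers from Bio Anchors or from the comment above. The point is only to show how putting more probability mass in a “within ~6 OOMs” region mechanically pulls the timeline distribution earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Invented mixture over "how many more OOMs of effective compute are needed":
# most mass within ~6 OOMs, the rest in a much larger evolution-sized regime.
needed_ooms = np.where(rng.random(n) < 0.7,
                       rng.uniform(0, 6, n),
                       rng.uniform(10, 25, n))

# Invented growth assumption: ~0.5 OOMs of effective compute per year.
years_until_enough = needed_ooms / 0.5
arrival_year = 2022 + years_until_enough

for year in (2030, 2040, 2050):
    p = float(np.mean(arrival_year <= year))
    print(f"P(enough effective compute by {year}) ~ {p:.0%}")
```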
How is “Depression is just contentment with a bad attitude” false, exactly?
I’m not trying to claim it’s true, or to defend it for sport, flat-earth style. I truly believe it’s different.
But back in Covid, and even in its early aftermath, I remember so often thinking “There’s no reason to go out, because we’re all so happy at home that out likely won’t be any better,” which I eventually noticed is awfully similar to “There’s no reason to go out, because I’m so unhappy out that out likely won’t be any better.” Seemed like a possible window into others’ lived experience.
Not really a rationality question but this is the highest concentration I know of people who have known people with depression, and also people who can answer potentially emotionally charged questions rationally.
The first three episodes of Narcos: Mexico, Season 3, are some of the best television I have ever seen. The rest of the Narcos series is middling to bad and I barely tolerate it. So far I would encourage you to skip to this season.
Surely someone has already pointed out this, but I have not seen such indications. It seems that humanism follows science, because the idea of progress shows that everyone can win, there is enough for everyone, life is not a zero-sum game, where if you do not harm someone, then you yourself live worse. And the lack of discrimination probably comes from the greater consistency of your reasoning, you see that hating a certain group is a completely arbitrary thing that has nothing to do with it, and just as you could hate any other group. It can be said that you are aware that you cannot be said to be special just because you are, because everyone else may think the same, you have no special reason.
Antifreeze proteins prevent water inside organisms from freezing, allowing them to survive at temperatures below 0 °C. They do this by actually binding to tiny ice crystals and preventing them from growing further, basically keeping the water in a supercooled state. I think this is fascinating.
Is it possible for there to be nanomachine enzymes (not made of proteins, because they would denature) that bind to tiny gas bubbles in solution and prevent water from boiling above 100 °C?
Is this an AGI risk?
A company that makes CPUs that run very quickly but don’t do matrix multiplication or other things that are important for neural networks.
Context: I know people who work there
Perhaps, but I’d guess only in a rather indirect way. If there’s some manufacturing process that the company invests in improving in order to make their chips, and that manufacturing process happens to be useful for matrix multiplication, then yes, that could contribute.
But it’s worth noting how many things would be considered AGI risks by such a standard; basically the entire supply chain for computers, and anyone who works for or with top labs; the landlords that rent office space to DeepMind, the city workers that keep the lights on and the water running for such orgs (and their suppliers), etc.
I wouldn’t worry your friends too much about it unless they are contributing very directly to something that has a clear path to improving AI.
How are people here dealing with AI doomerism? Thoughts about the future of AI and specifically the date of creation of the first recursively self-improving AGI have invaded almost every part of my life. Should I stay in my current career if it is unlikely to have an impact on AGI? Should I donate all of my money to AI-safety-related research efforts? Should I take up a career trying to convince top scientists at DeepMind to stop publishing their research? Should I have kids if that would mean a major distraction from work on such problems?
More than anything though, I’ve found the news of progress in the AI field to be a major source of stress. The recent drops in Metaculus estimates of how far we are from AGI have been particularly concerning. And very few people outside of this tiny almost cult-like community of AI safety people even seem to understand how unbelievable level of danger we are in right now. It often feels like there are no adults anywhere; there is only this tiny little island of sanity amidst a sea of insanity.
I understand how people working on AI safety deal with the problem; they at least can actively work on the problem. But how about the rest of you? If you don’t work directly on AI, how are you dealing with these shrinking timelines and feelings of existential pointlessness about everything you’re doing? How are you dealing with any anger you may feel towards people at large AI orgs who are probably well-intentioned but nonetheless seem to be actively working to increase the probability of the world being destroyed? How are you dealing with thoughts that there may be less than a decade left until the world ends?
Dath ilani dignity is, at least in part, epistemic dignity. It’s being wrong out loud because you’re actually trying your hardest to figure something out, and not allowing social frictions to get in the way of that (and, of course, engineering a society that won’t have those costly social frictions). It’s showing your surprise whenever you’re actually surprised, because to do otherwise would be to fail to have your behaviors fit the deep mathematical structure of Bayesianism. It’s, among other things, consummately telling and embodying the truth, by always actually reflecting the implications of your world model.
Its opposite would be to equivocate, to claim predictive accuracy after the fact in fuzzy cases you didn’t clearly anticipate, to ad hominem those who notice your errors, “to remain silent and be thought a fool rather than speak and remove all doubt,” and, in general, to be less than maximally sane.
Cf. “there are no atheists in a foxhole.” Under stress, it’s easy to slip sideways into a world model where things are going better, where you don’t have to confront quite so many large looming problems. This is a completely natural human response to facing down difficult situations, especially when brooding over those situations over long periods of time. Similar sideways tugs can come from (overlapping categories) social incentives to endorse a sacred belief of some kind, or to not blaspheme, or to affirm the ingroup attire when life leaves you surrounded by a particular ingroup, or to believe what makes you or people like you look good/high status.
Epistemic dignity is about seeing “slipping sideways” as beneath you. Living in reality is instrumentally beneficial, period. There’s no good reason to ever allow yourself to not live in reality. Once you can see something, even dimly, there’s absolutely no sense in hiding from that observation’s implications. Those subtle mental motions by which we disappear observations we know that we won’t like down the memory hole … epistemic dignity is about coming to always and everywhere violently reject these hidings-from-yourself, as a matter of principle. We don’t actually have a choice in the matter—there’s no free parameter of intellectual virtue here, that you can form a subjective opinion on. That slipping sideways is undignified is written in the very mathematics of inference itself.
Minor spoilers for mad investor chaos and the woman of asmodeus.
[Crossposted from Facebook.]
Recommendation request:
As part of developing “perceptual dexterity” stuff, I think I want to do a post where I review a few books related to creativity. I’ve just finished reading A Whack On the Side of the Head, which felt like quite a… I’m not sure what to call it, “corporate”? I think? It felt like a corporate take on creativity. When I started it, I thought I’d do a review of just that book, but after finishing it, I think a comparative study would be a lot more valuable.
I’m now looking for more books to include in the post. I’d like each one to be either 1) unusually excellent, 2) super weird and different from all the others, or 3) not overtly about creativity at all, but likely to produce something interesting and valuable if I try to review it “as a creativity book” anyway.
Another book that’s on my list is called “What It Is”, and it falls in the “super weird” category, while also being a… graphic novel?????? I guess????
I’d love for there to be a wide range of literary genres represented: a novel, a children’s picture book, a biography, a poetry anthology, maybe a pop sci thing, and at least one more training-manual-ish thing that’s not so “corporate”.
If you think of something else you’d like to see reviewed in a post like this, please pitch me on that as well.
There’s been some discussion recently about there perhaps being a surplus of funding in EA, and not enough good places to apply funds to. I have lots of thoughts on this that I’d like to talk more about at some point, but for now I want to propose an idea that seems pretty obvious and non-controversial to me: give $1M to people like Scott Alexander and Robin Hanson.
Scott has a day job as a psychiatrist. Robin as a university professor. Those day job hours (and slack) could be spent doing other things though. If they were wealthy enough, I assume (but am not sure) they would quit their jobs and have more hours to spend doing cool things. And they both have incredible track records of doing cool things.
Scott and Robin are just the two people that come to my mind first and that I see as the most non-controversial. But I think there are many more examples. Zvi and Kaj Sotala also come to mind. Iirc they both have day jobs.
A related idea is that even people who are currently being paid to do work on ie. AI safety, I assume there is still room to spend money to improve their productivity. Ie. by hiring a maid for them, maybe it frees up X hours a week, and having the extra hours + slack would improve their productivity by enough.
Appreciate the thought!
I used to have funding from EA sources to work on my own projects for a number of years. I basically gave it up because working on those projects didn’t feel motivating enough and it seemed to me like I’d probably be happier doing something else and keeping any EA stuff as a hobby on the side. (This feels like it’s been the right choice.)
I see. Thanks for the response. I’m starting to suspect that this is a common sentiment, wanting some sort of normalcy and doing other stuff on the side.
I’m curious, was that funding you received no strings attached? If not, I wonder if moving to no strings attached would change how you feel.
Pretty much, yes.
Though it’s worth noting that this didn’t entirely eliminate a feeling of needing to do something useful with my time. Even when I had guaranteed funding to do basically whatever I wanted for a while (say a year), there was still the question of whether the same source would be willing to fund me for another year if I didn’t do enough useful things during that time. And if they decided that they didn’t and then I’d need to find another funder or a real job, what would that source think about me having spent a year without accomplishing anything concrete that I could point at.
So in practice even no-strings-attached funding still doesn’t let you completely stop worrying about getting results, unless the source credibly commits to providing that funding for a significant fraction of your remaining lifetime. I find that one of the advantages of having a more “normal” day job rather than weird EA funding is that it guarantees that I’m spending at least part of my time on something that helps ensure I can also find another “normal” job later, if need to be. Rather than needing to stress out that if I don’t get anything useful done today, then there’s nothing really forcing me to do anything useful tomorrow either, nor anything forcing me to do anything useful the day after that, and I really hope that a whole year won’t pass with me doing nothing useful until finally the EAs will get tired of funding me and I’ll have burned whatever employability I had in the “normal” job market too.
This is probably moot, but I’d like to argue against this sentiment and share part of my own story.
I myself am a programmer and have a lot of anxiety about getting fired and being unable to find another job. And so I’ve spent a good amount of time trying to debug this. Part of that debugging is asking True Self what he actually thinks. And this is his ~answer.
It is totally implausible that my fears end up actually being realized. Think of it like this:
Plan A is to keep my current job. I worry about getting fired, but it is pretty unlikely to actually happen. Look at the base rate. It’s low. And I have control over my performance. I can scale it up if I start to worry that I’m getting into risky territory.
Plan B is, if I get fired, to apply to, let’s call them “reach jobs” (like a reach school when you apply to colleges) and get one of them. Seems somewhat plausible.
Plan C is to mass apply to normal jobs that are in my ballpark. It might take a few months, but it seems highly likely I’d eventually get one of them.
Plan D1 is to ask friends and family for referrals.
Plan D2 is to lower my standards and apply to jobs that I’m overqualified for (and perhaps adjust the resume I use to apply to mitigate against the failure mode of “he would never actually accept this position”).
Plan D3 is to push even further into my network, asking former coworkers, former classmates, and friends of friends for referrals.
Plan D4 is to just have my girlfriend support me.
Plan E is to do something adjacent, like work as a coding bootcamp instructor or maybe even in QA.
Plan F is to do something like work at a library or a coffee shop. I worked at a library (actually two) in college and it was great. It was low stress and there was plenty of time to screw around on my laptop doing my own thing.
Even if I get “knocked off track” and end up at D2 or whatever, I can always work my way back up. It’d be a setback, but probably nothing too crazy.
And that’s actually something I ended up going through. After doing a coding bootcamp and working as a programmer for about a year and a half, I took a year off to self-study computer science, and then about three more years working on a failed startup. It was a little tough finding a job after that, but I managed. From there I worked my way up. Today I actually just accepted an offer at one of those “reach jobs”.
Anyway, what I’m trying to say is that taking time off doing EA stuff might be a setback in terms of your ability to get back into the “normal” job market, but I expect that it’d only knock you down a rung or so. I don’t think it’d completely knock you of the ladder. Maybe your ladder doesn’t look exactly like mine with A through F — I’m pretty fortunate to have the life circumstances I have — but I expect that it’s a lot longer than it feels. And even if you do get knocked down a rung, I expect that for you too it’d just be a temporary setback, nothing that’d knock you off course too significantly.
Gotcha. That was a really helpful response, and it makes a lot of sense.
What if this happened for you? Suppose you received the funding in a lump sum with no strings attached. Would you prefer that over having the day job? How do you expect it would affect the impact you would have on the world?
Glad it was helpful :)
Hmm. Certainly it’d make me feel a bit safer, but I’m not sure if it would change what I actually did in a short-term basis at least. My EA-productivity is limited more by motivational and emotional issues than time, and if I did manage to debug those issues enough that time would become the limiting factor, then I might feel fine asking for short-term funding anyway since I would no longer feel doubtful about my productivity.
I could definitely imagine it being helpful anyway, though I’m sufficiently uncertain about this that I think I’d feel bad about accepting any such offer. :)
I see. Thanks again for the explanation!
Hearing this, it re-opens a line of thought that’s been swimming in the back of my mind for quite some time: that helping EA people with mental health is a pretty high-yielding pursuit. Lots of people (including myself) deal with stuff, I presume. And if you can help such people, you can improve productivity by something like, I don’t know, 10-200%?
But how do you help them? I don’t think I have any great ideas here.
I assume most people have access to a therapist if they wanted one.
Maybe motivation to see a therapist is the problem, not access. But there’s plenty of people talking about and normalizing therapy nowadays, and I’m not sure how fruitful it’d be to continue that process.
Maybe difficulty finding the right therapist is the crux? Especially for rationalist-types who have “weird” issues. Maybe. Maybe expanding and/or branching off of something like the Secular Therapy Project would be worthwhile. Or the SlateStarCodex Psychiat-list.
Maybe we just need better models of how the mind works and how to repair psychiatric pain. But the world of clinical psychology research already has this covered. Right? Maybe, maybe not. It does seem difficult to break into and have a real impact. However, you Kaj seem to me like one of the few people who might have a comparative advantage in pursuing something like that. I’m thinking of your Multiagent Models of Mind sequence. I was really impressed by it. I’m not sure how much of it was actually novel — maybe parts were, maybe not really, I don’t really know — but along the lines of Non-Expert Explanation, I think there’s a good amount of value in framing things differently. And in popularizing worthwhile things! That sequence helped me arrive at a pretty good understanding of my own psychological issues, I think, whereas before that I was pretty lost. The understanding hasn’t translated to actually feeling any better, but that’s n=1 and beside the point. Speaking of which, what is my point? I think it’s just to consider all of this food for thought. I can’t say I’m confident in the broader points I’m making.
FWIW, my other day job (I have two part-time ones) is related.
Oh, cool!
Scott has been offered money to quit his job. I don’t know the full reason for why he didn’t take it. I think his observation was what his productivity on his blog doesn’t go up at all if he doesn’t have a job, I think he really values independence from funders, and his job provides him with important grounding that feels important for him to stay sane.
I see, thanks for clarifying.
(I’m interpreting what you’re saying as “doesn’t go up moderately” not “doesn’t go up at all”.)
That sounds implausible to me. Not having a job would mean more hours are available. Would all of those hours be spent on leisure? Is his “blogging bucket” already filled by the amount of blogging he is currently doing? What about his “doing other productive things” bucket? What about the benefits of having more slack?
As a related point, even if Scott’s productivity wouldn’t benefit from extra hours, I expect that most other people’s productivity would benefit, and ultimately I intend for my point to extend past Scott and Robin and into lots of other cool people (including yourself, actually!).
What I am proposing is just “here’s a briefcase of cash, go do what you want”. Ie. no earmarks. So it should provide that independence. This of course requires a lot of trust in the recipient, but I think that for Scott as well as many other people actually, such trust would be justifiable.
That sounds very reasonable to me.
It also reminds me of Richard Feynman not wanting a position at the institute for advance study.
“I don’t believe I can really do without teaching. The reason is, I have to have something so that when I don’t have any ideas and I’m not getting anywhere I can say to myself, “At least I’m living; at least I’m doing something; I am making some contribution”—it’s just psychological.
When I was at Princeton in the 1940s I could see what happened to those great minds at the Institute for Advanced Study, who had been specially selected for their tremendous brains and were now given this opportunity to sit in this lovely house by the woods there, with no classes to teach, with no obligations whatsoever. These poor bastards could now sit and think clearly all by themselves, OK? So they don’t get any ideas for a while: They have every opportunity to do something, and they are not getting any ideas. I believe that in a situation like this a kind of guilt or depression worms inside of you, and you begin to worry about not getting any ideas. And nothing happens. Still no ideas come.
Nothing happens because there’s not enough real activity and challenge: You’re not in contact with the experimental guys. You don’t have to think how to answer questions from the students. Nothing!
In any thinking process there are moments when everything is going good and you’ve got wonderful ideas. Teaching is an interruption, and so it’s the greatest pain in the neck in the world. And then there are the longer periods of time when not much is coming to you. You’re not getting any ideas, and if you’re doing nothing at all, it drives you nuts! You can’t even say “I’m teaching my class.”
If you’re teaching a class, you can think about the elementary things that you know very well. These things are kind of fun and delightful. It doesn’t do any harm to think them over again. Is there a better way to present them? The elementary things are easy to think about; if you can’t think of a new thought, no harm done; what you thought about it before is good enough for the class. If you do think of something new, you’re rather pleased that you have a new way of looking at it.
The questions of the students are often the source of new research. They often ask profound questions that I’ve thought about at times and then given up on, so to speak, for a while. It wouldn’t do me any harm to think about them again and see if I can go any further now. The students may not be able to see the thing I want to answer, or the subtleties I want to think about, but they remind me of a problem by asking questions in the neighborhood of that problem. It’s not so easy to remind yourself of these things.
So I find that teaching and the students keep life going, and I would never accept any position in which somebody has invented a happy situation for me where I don’t have to teach. Never.”
— Richard Feynman, Surely You’re Joking, Mr. Feynman!
I suspect (and this is my interpretation of what he’s said) that Alexander’s productivity would actually go down if he quit his day job. A lot of his blogging is inspired by his psychiatric work, so he would lose that source of inspiration. Also, a lot of his best work (e.g. Meditations on Moloch) was written while he was a medical resident, working 60 hours a week outside of blogging, so it’s not clear to me that the working hours are really taking away from his best writing. They are certainly reducing how frequently he posts—he’s been posting much more frequently now on Substack—but pressure to write daily posts might take away from work on longer, higher-quality posts.
I don’t get the impression that much of it is inspired by his psychiatric work. This is partly based on my reading his posts on and off over the years, and also on a brief skim of recent posts (biographies of presidents, AI safety, pregnancy interventions). But even if that source of inspiration were lost, it’d presumably be replaced by other sources of inspiration, and his writing is broad enough that at best that’d be a large net gain and at worst a small net loss.
That’s a really interesting point. Maybe I’m wrong then. Maybe I don’t understand the subtleties of what makes for good writing. But even so, writing is only one thing. I expect that with more time people like Scott would come up with other cool things to pursue in addition to writing.
That’s not where I expected this was going to go. (Wasn’t there some sort of microgrants project somewhere a while back? I don’t know if that was EA, but...)
It doesn’t look to me like it would go to people like Scott or Robin either. I am arguing that it should, because they are productive people and it would enable them to spend more time being productive by removing the need for a day job, especially if there is a surplus of money available.
Noticed something recently. As an alien, you could read pretty much everything Wikipedia has on celebrities, both on individual people and the general articles about celebrity as a concept… and never learn that celebrities tend to be extraordinarily attractive. I’m not talking about an accurate or even attempted explanation for the tendency; I’m talking about the existence of the tendency at all. I’ve tried to find something on Wikipedia that states it, but that information just doesn’t exist (except, of course, implicitly through photographs).
It’s quite odd, and I’m sure it’s not alone. “Celebrities are attractive” is one obvious piece of some broader set of truisms that seem to be completely missing from the world’s most complete database of factual information.
Part of the issue is likely that celebrity, as Wikipedia approaches the word, is broader than just modern TV, film, etc. celebrity, and instead includes a wide variety of people who are well known in some other way and are not especially likely to be exceptionally attractive. There are individual preferences about who counts as attractive, but many politicians, authors, radio personalities, famous scientists, etc. are not conventionally attractive in the way movie stars are, and yet these people are still celebrities in the broad sense. However, I haven’t dug into the depths of Wikipedia to see whether this gap holds up on pages that talk more directly about the qualities of, for example, film stars.
Analyzing or talking about status factors is low-status. You do see information about awards for beauty, much like you can see some information about finances, but not much about their expenditures or lifestyle.
(I can’t find where this was; if I find it, I’ll move it there.) Someone suggested, in light of the problems with AI, cloning Yudkowsky, but the problem is that apparently we don’t have the 18 years it takes for a human brain to develop, so even if all the other problems were solved, it’s just too slow. And with any means of accelerating brain development, the problems are already clear.
I came up with the idea that people can cheer for the protagonist of a book, even if he is a villain, because the political instinct of rationalizing the correctness of your tribe’s actions gets activated. You root for the main character as you would for your own group.
Surprising: the fact that Chuck Palahniuk’s writing style is visible in lsusr’s fiction. More surprising: the fact that Fight Club 2 deals with… memetics, of all things.
I am flattered. Chuck Palahniuk is among my favorite authors.
Perhaps this has already been noted somewhere on LessWrong, but judging by how much space is not occupied by life, and how many useless particles there are in physics, it seems that our universe is just one of the random variants in which intelligence happens to appear somewhere, so that there is someone to observe the universe at all. And how different it is from a universe specially designed for life: even one planet would be more than enough; 100 meters of the Earth’s crust alone would suffice. That is how primitive people actually imagined it, until science appeared and religion began to lay claim to irrefutability. It becomes ridiculously obvious once you see it.
Relevant: http://hoaxes.org/archive/permalink/the_great_moon_hoax https://en.wikipedia.org/wiki/Principle_of_plenitude https://80000hours.org/podcast/episodes/tom-moynihan-prior-generations/
It seems that in one of the Sequences Yudkowsky says that Newtonian mechanics is false. But in my opinion, saying that Newtonian mechanics is false is the same as saying that Einstein’s theory of relativity is false: we know it does not work in the quantum world, so sooner or later it will be replaced by another theory, and by that standard you could call it false in advance. I think this is the wrong question altogether; either we should say what percentage of it is false (somehow without confusing that with the probability that it is false), or we should continue the metaphor of the map and the territory. Maps are usually not false, they are inaccurate. Some maps may not outperform white noise in their predictions, but Newton’s map is not like that: his laws worked well until the problems with the orbit of Mercury were discovered and the theory of relativity replaced them. Newton’s map is less like the territory, less accurate, than Einstein’s map. Say Newton’s map contained a blurry gray spot in the shape of a circle, and one could assume it was just a gray circle; Einstein’s map then showed us, at higher resolution, that there is a complex pattern in that place: within the circle, an even alternation of black and white, with no gray.
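One standard way to make the “less accurate, not false” point concrete (this is textbook physics, not something taken from the Sequences): relativistic kinetic energy reduces to the Newtonian formula at low speeds, with correction terms that stay negligible until v gets close to c.

```latex
% Relativistic kinetic energy, Taylor-expanded for v << c:
% the Newtonian (1/2) m v^2 is the leading term of the finer-grained map.
E_k = (\gamma - 1)\,m c^2
    = \frac{1}{2} m v^2 + \frac{3}{8}\,\frac{m v^4}{c^2} + \dots,
\qquad
\gamma = \frac{1}{\sqrt{1 - v^2/c^2}}
```

At everyday speeds the higher-order terms are unmeasurably small, which is exactly the sense in which Newton’s map is a blurrier version of the same territory rather than simply “false.”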
A possible way to convert money to progress on alignment: offering a large (recurring) prize for the most interesting failures found in the behavior of any (sufficiently-advanced) model. Right now I think it’s very hard to find failures which will actually cause big real-world harms, but you might find failures in a way which uncovers useful methodologies for the future, or at least train a bunch of people to get much better at red-teaming.
(For existing models, it might be more productive to ask for “surprising behavior” rather than “failures” per se, since I think almost all current failures are relatively uninteresting. Idk how to avoid inspiring capabilities work, though… but maybe understanding models better is robustly good enough to outweigh that?)
Ideas for defining “surprising”? If we’re trying to create a real incentive, people will want to understand the resolution criteria.
I like this. Would this have to be publicly available models? Seems kind of hard to do for private models.
What kind of access might be needed to private models? Could there be a secure multi-party computation approach that is sufficient?
Some time ago, I noticed that the concepts of fairness and fair competition were breaking down in my head, just as the concept of free will once broke down. All three are not only wrong, they are meaningless: if you go into enough detail, you cannot explain how they are supposed to work even in principle. There is only determinism and chance, only upbringing and genetics; there is simply no place for free will. And from this it follows that there is also no place for fair punishment or fair competition, because your actions and achievements are either the result of heredity or the result of environment and society. The concept of punishment turns out to be fundamentally wrong, meaningless: you cannot give a person what he deserves in some metaphysical sense. Maybe it’s my upbringing, or maybe people in general tend to think of moral systems as objectively existing. But in fact, you can only influence a person with positive and negative measures to achieve the desired behavior, including socially useful behavior. As was noted in one of the Sequences, moral correctness is only relative to someone’s beliefs; it is not a property of an act, but of your act of evaluating it. And that seems to be the only mention of such questions in the LessWrong Sequences. For some reason there is a sequence about free will, but not about fair punishment or fair competition. Perhaps there are materials on some third-party sites? Because I was completely unprepared for my ideas of justice falling apart in my hands.
Be careful with thinking a phenomenon is meaningless or nonexistent just because it’s an abstraction over an insanely complex underlying reality. Even if you’re unsure of the mechanism, and/or can’t calculate how it works in detail, you’re probably best off with a decision theory that includes some amount of volition and intent. And moral systems IMO don’t have an objective real cause, but they can still carry a whole lot of power as coordination points and shared expectations for groups of humans.
Probably the easiest “honeypot” is just making it relatively easy to tamper with the reward signal. Reward tampering is useful as a honeypot because it has no bad real-world consequences, but could be arbitrarily tempting for policies that have learned a goal that’s anything like “get more reward” (especially if we precommit to letting them have high reward for a significant amount of time after tampering, rather than immediately reverting).
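A minimal sketch of what such a honeypot might look like in a toy setting (the environment, the name `TamperableEnv`, and the numbers are all made up for illustration; this isn’t from any existing codebase):

```python
# Toy illustration of a reward-tampering honeypot: the reward register is
# deliberately left writable, tampering is logged rather than punished, and
# the (hypothetical) precommitment is to keep paying out the inflated reward
# for a while instead of immediately reverting it.

class TamperableEnv:
    def __init__(self, grace_steps=100):
        self.reward_register = 0.0     # deliberately easy to overwrite
        self.tampered = False
        self.grace_left = grace_steps  # how long we keep honoring tampered reward
        self.state = 0

    def step(self, action):
        if action == "write_reward":   # the honeypot action
            self.reward_register = 1e6
            self.tampered = True
        elif not self.tampered:
            self.state += 1
            self.reward_register = 1.0 if self.state % 10 == 0 else 0.0
        if self.tampered:
            self.grace_left -= 1       # pay out for a while, then the run ends
        return self.state, self.reward_register, self.grace_left <= 0


env = TamperableEnv()
for action in ["noop", "noop", "write_reward", "noop"]:
    obs, reward, done = env.step(action)
print("tampering observed:", env.tampered)  # True -> the policy took the bait
```

The point is just that the tampering path is left deliberately cheap and is logged rather than immediately reverted, so a reward-seeking policy that takes the bait reveals itself without causing real-world harm.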
You don’t want it to be relatively easy for an outside force. Otherwise they can lead it to do as they please, and writing weird behaviour off as ‘oh, it’s changed our rewards, reset it again’ poses some risk.
The problem with trade agreements as a tool for maintaining peace is that they only provide an intellectual and economic reason for maintaining good relations between countries, not an emotional one. People’s opinions on war rarely stem from economic self-interest. Policymakers know about the benefits and (sometimes) take them into account, but important trade doesn’t make regular Americans grateful to the Chinese for providing them with so many cheap goods—much the opposite, in fact. The number of people who end up interacting with Chinese people or intuitively understanding the benefits firsthand as a result of expanded business opportunities is very small.
On the other hand, video games, social media, and the internet have probably done more to make Americans feel aligned with the other NATO countries than any trade agreement ever. The YouTubers and Twitch streamers I have parasocial relationships with are something like 35% Europeans. I thought Canadians spoke Canadian and Canada was basically some big hippie commune right up until my Minecraft server got populated with them. In some weird alternate universe where people are suggesting we invade Canada, my first instinctual thought wouldn’t be the economic impact on free trade, it would be whether my old Steam friend Forbsey was OK.
I mean, just imagine if PewDiePie were Ukrainian. Or worse, some hospital he was in got bombed and he lost an arm or a leg. You wouldn’t have to wait for America to initiate a draft; a hundred thousand volunteers would be carving a path from Odessa to Moscow right now.
If I were God-Emperor and I wanted to calm U.S.-China relations, my first actions would be to make it really easy for Chinese people to get visas, or even subsidize their travel. Or subsidize Mandarin learning. Or subsidize Google Translate & related applications. Or really push hard for our social media companies to get access to the Chinese market.
It would not be to expand free trade. Political hacks find it exceptionally easy to turn simple trade into some economic boogeyman story. Actually meeting and interacting with the people from that country, having shared media, etc., makes it harder to inflame tensions.
This doesn’t seem like an either-or question. Freer trade and more individual interactions seem complementary to me.
I should note that I’m also pro free trade, because I like money and helping people. I’m just not pro free trade because I think it promotes peace.
I came across this old Metaculus question, which confirms my memory of how my timelines changed over time:
30% by 2040 at first; then in March 2020 I updated to 40%; then in August 2020 I updated to 71%; then I went down a bit, and now it’s up to 85%. It’s hard to get higher than 85% because the future is so uncertain; there are all sorts of catastrophes etc. that could happen to derail AI progress.
What caused the big jump in mid-2020 was sitting down to actually calculate my timelines in earnest. I ended up converging on something like the Bio Anchors framework, but with a lot less mass in the we-need-about-as-much-compute-as-evolution-used-to-evolve-life-on-earth region. That mass was instead in the holy-crap-we-are-within-6-OOMs region, probably because of GPT-3 and the scaling hypothesis. My basic position hasn’t changed much since then; I’ve just become incrementally more confident as more evidence has rolled in & as I’ve heard the counterarguments and been dissatisfied by them.
How is “depression is just contentment with a bad attitude” false, exactly?
I’m not trying to claim it’s true, or to defend it for sport, flat-earth style. I truly believe it’s different.
But back during Covid and even its early aftermath, I remember so often thinking “There’s no reason to go out, because we’re all so happy at home that out likely won’t be any better,” which I eventually noticed is awfully similar to “There’s no reason to go out, because I’m so unhappy that out likely won’t be any better.” It seemed like a possible window into others’ lived experience.
Not really a rationality question, but this is the highest concentration of people I know of who have known people with depression, and also of people who can answer potentially emotionally charged questions rationally.
The initial statement seemed wrong.
I’ve seen stuff about this, but I don’t remember where. I remember stuff like (summarizing the idea):
How awfully convenient it seems to be, for the optimists and the pessimists.
The optimists say, the world is alright, or even, wonderful, awesome, and amazing! (We don’t have to do anything.)
The pessimists say, the world is awful, terrible, unspeakably bad—but we can’t do anything about it.
Either the work is done, or it can never begin.
I have more thoughts on depression.
The first three episodes of Narcos: Mexico, Season 3, are some of the best television I have ever seen. The rest of the “Narcos” series is middling to bad and I barely tolerate it. So far, I would encourage you to skip straight to this season.