I have similar experience with it today (before reading your article) https://www.lesswrong.com/editPost?postId=28XBkxauWQAMZeXiF&key=22b1b42041523ea8d1a1f6d33423ac
I agree that this over-confidence is disturbing :(
We already live in a world in which any kid can start a chain reaction that is difficult to stop and contain: fire. We responded by:
making a social norm of not allowing kids to buy or use fire-starting tools
separating houses by 1.5 times their height
adding sprinklers, and requiring them by law
having a state-funded agency to stop fires
Honestly, I still don’t understand very well what exactly stops evil/crazy people from starting forest fires whenever they want to. Norms that punish violators? A small gain-to-risk ratio?
Also, I wonder to what extent our own “thinking” is based on concepts we ourselves understand. I’d bet I don’t really understand what concepts most of my own thinking processes use.
Like: what are the exact concepts I use when I throw a ball? Is there a term for velocity, the gravitational constant, or air friction, or is it just some completely “alien” computation, “inlined” and “tree-shaken” of any unneeded abstractions, which just sends motor outputs given the target position?
Or: what concepts do I use to know what word to place at this spot in this sentence? Do I use concepts like “subject”, “verb”, or “sentiment”, or do I rather just go with the flow subconsciously, having just a vague idea of the direction I am going with this argument?
Or: what concepts do I really use when deciding to rotate the steering wheel 2 degrees to the right when driving a car through a forest road’s gentle turn? Do I think about “angles”, “asphalt”, “trees”, “centrifugal force”, “tire friction”, or do I rather just try to push the future toward the direction where the road ahead looks straighter to me, somehow just knowing that this steering wheel “straightens” the image I see?
Or: how exactly do I solve (not: verify an already written proof of) a math problem? How does the solution pop into my mind? Is there some systematic search over all possible terms and derivations, or rather some giant hash-map-like interconnected store of “related tricks and transformations I’ve seen before” from which candidates get proposed?
I think my point is that we should not conflate the way we actually solve problems (subconsciously?) with the way we talk (consciously) about solutions we’ve already found, whether verifying them ourselves (the inner monologue) or conveying them to another person. First, the Release and Debug binaries can differ (riding a bike for the first time is a completely different experience than the 2000th attempt). Second, the on-the-wire format and the data structure before serialization can be very different (the way I explain how to solve an equation to my kid is not exactly how I solve it).
I think that training a separate AI to interpret the inner workings of another AI for us is risky, the same way a Public Relations department or a lawyer doesn’t necessarily give you an honest picture of what the client is really up to.
Also, there’s much talk about the distinction between System 1 and System 2, or subconsciousness and consciousness, etc.
But do we really take seriously the implication of all that: the concepts the conscious part of our mind uses to “explain” subconscious actions have almost nothing to do with how they actually happened. If we force the AI to use these concepts, it will either lie to us (“Your honor, as we shall soon see the defendant wanted to..”) or be crippled (have you tried to drive a car using just the concepts from a physics textbook?). But even in the latter case it looks like a lie to me, because even if the AI is really using the concepts it claims/seems/is reported to be using, there’s still a mismatch within myself: I think I now understand that the AI works just like me, while in reality I work completely differently than I thought. How bad that is depends on the problem domain, IMHO. This might be pretty fine if the AI is trying to solve a problem like “how to throw a ball”, where a program using physics equations is actually also a good way of doing it. But once we get to more complicated stuff, like operating an autonomous drone on the battlefield or governing a country’s budget, I think there’s a risk, because we don’t really know how we ourselves make these kinds of decisions.
Based on the title alone I was expecting a completely different article: about how our human brains had originally evolved to be so big and great just to outsmart other humans in the political games ever increasing in complexity over millennia and
==thus==>
our value system already steers us to manipulate and deceive not only others but also ourselves, so that we don’t even realize that this is what our goal system is really about, which makes us more effective at performing those manipulations with a straight face
==so==>
any successful attempt at aligning a super-intelligence to our values will actually result in a super-manipulator which can perfectly hide this from everyone, including its own self-diagnostics
It’s already happening: https://githubcopilotinvestigation.com/ (which I learned about yesterday from the is-github-copilot-in-legal-trouble post)
I think it would be an interesting plot twist: humanity saved from AI FOOM by the big IT companies having to obey the intellectual property rights they themselves defended for so many years :)
One concrete piece of advice on cracking eggs with two hands: try to pull your thumbs in opposite directions, as if you wanted to tear the egg in half (as opposed to pushing them in).
Sorry for “XY Problem”-ing this, but I felt a strong, sad emotion when reading your post and couldn’t resist trying to help—you wrote:
Unless I’m eating with other people, food for me is fuel.
Have you tried to rearrange your life so that you can eat breakfast together with people you care about much more often, to the point where you no longer want to make it as quick as possible?
There are only so many ways our hardware can be stimulated to feel happy; don’t give up on “eating together with close people”!
Thank you! I’ve read up to and including section 4. Previously I knew a bit about neural networks, but had no experience with RL, and in particular didn’t know how RL can actually bridge the gap between multiple actions and a sparse reward (as in: an hour of Starcraft gameplay just to learn whether you’ve lost or won). Your article helped me realize how it is achieved—IIUC by:
0. focusing on trying to predict what the reward will be more than on maximizing it
1. using a recursive approach to infinite sum: sum(everything)=e+sum(verything).
2. using a different neural network on the LHS than on the RHS of this equality, so that one of them can be considered “fixed” and thus not taking part in backpropagation. Cool! (See the sketch below for how I understand this.)
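Here is that sketch: a minimal PyTorch-flavored rendering of points 0–2 as I understood them, not the article’s actual implementation (all names and details are my own):

```python
import torch
import torch.nn.functional as F

# Two copies of the same architecture: q_net is the one being trained,
# target_net is a periodically refreshed frozen snapshot (point 2).
# Both map a state to a vector of Q-values, one entry per action.

def td_target(reward, next_state, target_net, gamma=0.99):
    # Point 1: sum(everything) = e + sum(verything), where
    #   e              = reward (the immediate term)
    #   sum(verything) = gamma * max over a' of Q_target(s', a')
    with torch.no_grad():  # frozen: no gradients flow into target_net
        future = target_net(next_state).max(dim=-1).values
    return reward + gamma * future

def dqn_loss(q_net, target_net, state, action, reward, next_state):
    # Point 0: q_net is trained to *predict* the target, not to maximize it.
    predicted = q_net(state).gather(-1, action.unsqueeze(-1)).squeeze(-1)
    return F.mse_loss(predicted, td_target(reward, next_state, target_net))
```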
Honestly, I am still a bit fuzzy on how exactly we solve the problem that most “e”s in e+sum(verything) will be zero—before reading the article I thought the secret sauce would be some imaginary “self-rewarding” which eventually sums up to the same total reward but in smaller increments over time (like maybe we should be a little happy about each vanishing pixel in Arkanoid, or each produced unit in Starcraft, even if there’s no direct reward for that from the environment)?
After reading your explanation I have some vague intuition that even if “e” is zero, e+sum(verything) has a better chance of figuring out what the future rewards will be, because we are one step closer to those rewards, and somehow this small dent is enough to get things going. I imagine it works somewhat like this: if there are 5 keys I have to press in succession to score one point, then sooner or later I will sample the state after 4 presses and fix my predictor’s output for that state, as the reward is the immediate result of one keypress; then I’ll be ready to learn what to do after 3 presses, as I will already know what happens after the 4th keypress, so I can better predict this two-action chain; and so on, learning to predict the consequences of longer and longer combos. Is that right?
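To check this intuition I wrote a tiny tabular version of the 5-keypress example (my own toy construction, not from the article): a chain of 5 states where only the last transition pays anything, updated with the e + gamma*sum(verything) rule:

```python
# States 0..4; the correct keypress in state i leads to state i+1;
# only the final transition (state 4) pays reward 1.
gamma = 0.9
V = [0.0] * 5  # predicted future reward for each state

for sweep in range(1, 6):
    for s in range(5):  # forward order: value creeps back one state per sweep
        r = 1.0 if s == 4 else 0.0           # "e": zero everywhere but the end
        future = V[s + 1] if s < 4 else 0.0  # "sum(verything)"
        V[s] = r + gamma * future
    print(sweep, [round(v, 2) for v in V])

# sweep 1: [0.0, 0.0, 0.0, 0.0, 1.0]
# sweep 2: [0.0, 0.0, 0.0, 0.9, 1.0]
# ...
# sweep 5: [0.66, 0.73, 0.81, 0.9, 1.0]
```

Each sweep, knowledge of the final reward propagates one state further back, exactly as in the keypress story above.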
Things which could improve the article:
1. Could you please go over each occurrence of the word “reward” in this article and make it clear whether it means “reward at step t” or “sum of rewards from step t onwards”? It was really difficult for me to understand some parts of the explanation because of this ambiguity.
2. In the pseudo-code box you used the formula “r(j) + Q(s’, a’, q)”. I was expecting “r(j)+gamma*q(s’,a’)” instead. This is because I thought it should be similar to what we saw earlier in the math formula (the one with red, green and blue parts), and I interpreted Q(s’,a’,theta^-) in the math formula to mean something like “the network output for state s’, proposed action a’, and weights set according to the old frozen settings”, which I believe should be q(s’,a’) in the pseudocode. If I am wrong, then please clarify why we don’t need the multiplication by gamma, why Q can take three arguments, and why q is one of them, as none of this is clear to me.
3. Some explanation of how this sparsity of rewards is really addressed in games which give you zero reward until the very end.
So what end state does Putin want to achieve by invading Ukraine? If Ukraine becomes part of Russia, then Russia will border NATO states.
Thank you for sharing!
Hello glich! Thanks for writing this whole series. When I first read it a year ago, I thought to myself that instead of impulsively going off to implement it right away, I would wait one year to first hear from you how your strategy worked out.
So.. How are you doing?
Wouldn’t the same argumentation lead to the conclusion that the world should have already ended soon after we figured out how to make the atomic bomb?
I don’t know how to write a novel with a world that survives in equilibrium longer than a week (and this is one reason I’ve asked this question—I’d like to read others’ ideas), but I suspect that just as the atomic bomb releases insane amounts of energy yet we have reasons not to set it off repeatedly, mages in such a world would have good reasons to avoid destroying it. Perhaps there’s not much to gain from doing so, maybe there’s M.A.D., maybe once you are smart enough to do it you are also smart enough not to, maybe you have to swear not to do it.
It could also be the case that I am too confident in our nuclear security. What’s currently our best reason not to blow ourselves up? Is it that nuclear energy costs a lot?
“translater” → “translator”?
“An division” → “A division”
Lots of details could matter, and the spareness of the writing only hints at what could be going on “for really reals”.
Thank you, this was enlightening for me—somehow, though I’ve read a few books and watched a few movies in my life, I hadn’t realized what you put here plainly: that these cuts are a device for the author to hide some truth from me (ok, this was obvious in “Memento”). I must’ve been very naive, as I simply thought it had more to do with MTV culture/catering to the short attention span of the audience. It’s funny how this technique becomes immediately obvious to me once I mentally flip roles with the author and ask, “how would I hide something from the reader, or mislead them into believing some alternative explanation, while not outright lying?”.
Hm, perhaps a similar, but more visible and annoying technique/plot device is when the author abruptly ends a conversation between two characters by some explosion or arrival of third person, and they never get to finish their sentence or clarify some misunderstanding. On some level this is the same trick, but between two characters, as opposed to between author and reader.
I now wonder what other “manipulation” techniques I have been subjected to. Anyone care to list some they have become aware of?
Given that Vi is counting seconds from encountering the soldiers to their collapse, AND that there are three dots between this scene and the scene where Miriam says “I’ve been there since Z-Day.” (which technically is an inequality in the opposite direction than I need, but Miriam’s choice of this particular wording looks suggestive to me), I’d venture a guess that the Z-Day virus was released by Vi in the facility, and Miriam is trying to blame the rogue AI for it. I read this story as Vi and Miriam having already crossed the line of “the end justifies the means”: they simply infect and kill the “innocent” soldiers protecting the headquarters of their commander, an em/AI which Vi and Miriam perceive as a threat that needs to be eliminated at all costs.
[p.s.: I wrote the above comment before realizing that I had somehow skipped the 8th episode, and now, after reading it, I think Vi and Miriam are cleaning up a mess they created themselves—the rogue AI they fight in the 9th episode is the one they released; it just took over command of the army by pretending to be their real commander]
Hello, very intriguing story!
“solder” appears twice in the text—it should be “soldier”
What is “Vi didn’t wait for her translator.” supposed to mean? I’m a bit confused because of the earlier “She left her cornea and other electronics with Miriam on the scout ship.”. Is it supposed to hint that Vi has non-electronic ‘machines’ (such as the translator) in her body, or is it just a statement about her overriding her natural instinct/reflex (normally she’d just wait for the translation, but this time she had a plan laid down in advance, which she would have executed even if she still had the electronic translator)? Do people in the far future really need to wait for a translator?
“You will die. No matter what actions you take, all the possible branches end with your death. Still, you try to pick the optimal path, because that’s what your brain’s architecture knows how to do: pick the optimal branch. You try to salvage this approach by proposing more and more complicated goal functions: instead of the final value, let’s look at the sum over time, or the average, or the max, or maybe ascribe some other value to death, or try to extend the summation beyond it, or whatever. Your brain is a hammer, and it needs a nail. But it never occurs to you that life is not something one needs to optimize. This is not an instance of the problem your brain is built to solve, and it looks silly to me that you try to fit it by force into your preferred format. This is your inductive bias, so strong that you probably don’t get what I’m trying to say to you: yes, you’ll die, but this doesn’t count.”
(I’m surprised nobody has written this in 12 years, or at least my eyes can’t find it)
“The appeared” → “They appeared”
This discussion suggests that the puzzles presented to the guesser should be associated with a “stake”—a numeric value which says how much you (the asker) care about this particular question being answered correctly (i.e. how risk-averse you are on this particular occasion). Can this be somehow incorporated into the reward function itself, or does it need to be a separate input? (Is “I want to know if this stock will go up or down, and I care 10 times as much about this question as about whether it will rain today” the same thing as “Please estimate p for the following two questions, where the reward function for the first one is f(x)=10(x-x^2) and for the second is f(x)=x-x^2”?) Does it require some additional output channel from the guesser (“I am 90% confident that p is 80%”, or maybe even “Here’s my distribution over the values of p \in (0,1)”), or does it somehow collapse into one dimension anyway (does “I am 90% confident that p is 80% and 10% that it’s 70%” collapse to “I think p is 79%”? Does a distribution over p collapse to its expected value?).
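Trying to partially answer my own last question: if I assume (my assumption, not something from the post) that the guesser’s report x is rewarded with a proper quadratic/Brier-style score -(x-Y)^2 against the binary outcome Y, then a distribution over p really does collapse to its expected value:

```python
import numpy as np

# My assumption: reporting x is scored with the Brier-style reward -(x - Y)**2,
# where Y in {0, 1} is the eventual outcome. The guesser believes
# "p = 0.8 with probability 0.9, and p = 0.7 with probability 0.1".
beliefs = [(0.9, 0.8), (0.1, 0.7)]

def expected_score(x):
    # Average over both belief components and both outcomes of Y.
    return sum(w * (p * -(x - 1.0) ** 2 + (1 - p) * -(x - 0.0) ** 2)
               for w, p in beliefs)

xs = np.linspace(0.0, 1.0, 10001)
best = xs[np.argmax([expected_score(x) for x in xs])]
print(best)  # 0.79 = 0.9*0.8 + 0.1*0.7 = E[p]
```

Note also that scaling the reward by 10 doesn’t move the argmax, so at least under this assumption the “stake” seems to matter only when the guesser has to allocate limited effort across many questions.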
IDK if this will be important to you, but I’d like to thank you for this comment, as it relieved my back pain after 8 years! Thank you @p.b. for asking for clarification and not giving up after the first response. Thank you @Steven Byrens for writing the article and taking the time to respond.
8 fucking years..
I read this article and its comments a month ago. Immediately after reading, the pain was gone. (I have never had mystical experiences like enlightenment, so the closest thing I can personally compare it to was the “perspectival shift” I felt ten years ago when “the map is not the territory” finally clicked.)
I know—it could’ve been “just a placebo effect”—but as the author argues, who cares; that’s kind of the main point of the claim. Still, I was afraid of giving myself false hope—there were several few-day-long remissions of the pain scattered across these 8 years, but it always returned—which is why I gave myself and this method a month before writing this comment. So far it works!
I know—“post hoc ergo propter hoc” is not the best heuristic—there could be other explanations for my pain relief. For example, a week or two before reading this article I started following this exercise routine daily. However, I had paused the routine for the three days before reading your article, and the pain relief happened exactly when I finished reading your comment, so IMO the timing and rarity (8 years...) of the event really suggest this comment is what helped. I still do the exercise routine, and it surely contributes and helps, too. But I do the routine just once, in the morning, and I can consciously feel how, whenever the pain starts to raise its head again throughout the day, I can make a mental move inspired by this article to restore calm and dissolve the pain.
And this is definitely how it felt from the inside! In the hope that it will help somebody else alleviate their pain, here are some specific patterns of thought induced by this article that I found helpful:
“oh, so my pain-center is simply confused about the signals, it is screaming like a child who can’t express well what’s wrong, and I was overreacting. I should show it love, not anger, I should calm it down, I must be the adult in the room and figure out what’s the real problem here.”
“I should ignore the pain by gently putting the pain to the side (like you do to the thoughts during meditation) as opposed to fighting through it. Like hitting snooze, vs clenching my jaw and fist to overcome it.”
“yeah, I’ve heard you pain-center, but I think you are mistaken about the magnitude and source of the problem, and I am actively working on the solution to the real problem, so please do not distract me while I am helping you”
“the pain-center is presenting me with a whole crayon-drawn image of a tiger, but it was just connecting the dots creatively, and there really was no tiger, just the dots”. I think this is the most helpful metaphor for me. I can feel how I dissolve the full certainty of “the pain of the whole upper back” into individual, small, shaky dots of unsure signals from small patches of the back.
“looks like it was just one small place around this shoulder blade which started the alarm, maybe I should just change the position of my right arm—oh, yes, that brought relief, good”
“ok, so this part near neck is so tense it started complaining, and this was probably because I was trying too hard to finish answering this email before visiting the restroom—let’s just give myself a pause and treat the body more gently”.
“ok, I need to be more precise: which patch of my back is in pain right now? If I can’t tell, then perhaps it’s something in the environment that is causing stress, or some thought, or some anticipation, or maybe some physiological need? Let’s look around and find out what this alarm is about”
Bohr’s horseshoe: “I was told that it works even if you don’t believe in it”
I just imagine a volume knob on the pain and turn it down
I am really excited about all this positive change in my mind, because, as one can imagine (and if you can’t, recall the main character of House M.D.), constant pain corrupts other parts of your mind and life. It acts like a prior for interpreting every sentence from family members and every event in life. It’s a crony belief, a self-sustaining “bitch eating crackers” syndrome. It took 8 years to build this thought-cancer, and it will probably take some time to disband it, but I see the progress already.
Also, I am “counterfactually frightened” by how close I came to completely missing this solution to my problem. I was actively seeking, you see, fruitlessly, for years! I had so much luck: to have started reading LW long ago; to have subscribed to Scott Alexander’s blog (I even read his original review of “unlearn your pain” from 2016, yet it sounded negative and (I) concentrated too much on discrediting the underlying model of action, so perhaps I could have fixed my pain 6 years earlier); to have developed a habit of reading LW, searching for interesting things, and reading the comments, not just the article. Thank you again for this article and this comment thread. When I imagine how sad the future would be if I hadn’t read it that afternoon, I want to cry...