To point one, if I feel an excitement and eagerness about the thing, and if I expect I would feel sad if the thing were suddenly taken away, then I can be pretty sure that it’s important to me. But — and this relates to point two — it’s hard to care about the same thing for weeks or months or years at a time with the same intensity. Some projects of mine have oscillated between providing deep meaning and being a major drag, depending on contingent factors. This might manifest as a sense of ugh arising around certain facets of the activity. Usually the ugh goes away eventually. Sometimes it doesn’t, and you either accept that the unpleasantness is part and parcel with the fun, or you decide it’s not worth it.
As far as I can tell, meaning is a feeling, something like a passive sense that you’re on the right track. The feeling is generated when you are working on something that you personally enjoy and care about, and when you are socializing sufficiently often with people you enjoy and care about. “Friends and hobbies are the meaning of life” is how I might phrase it.
Note that the activity that you spend your time on could be collecting all the stars in Mario64, as long as you actually care about completing the task. However, you tend to find it harder to care about things that don’t involve winning status or helping people, especially as you get older.
I think some people get themselves into psychological trouble by deciding that all of the things they enjoy aren’t “important” and that interacting with people they care about is a “distraction”. They paint themselves into a corner where the only thing they allow themselves to consider doing is something for which they feel no emotional attraction. They feel like they should enjoy it because they’ve decided it’s important, but they don’t, and then they feel guilty about that. The solution to this is to recognize the kind of animal you are and try to feed the needs you actually have rather than the ones you wish you had.
I’m interested as well. As someone trying to grow the Denver rationality community, I want to be aware of failure modes.
The idea of AI alignment rests on the assumption that there is a finite, stable set of data about a person, which could be used to predict their choices, and which is actually morally good. The reasoning behind these assumptions is that if they do not hold, then learning is impossible, useless, or will not converge.
Is it true that these assumptions are required for AI alignment?
I don’t think it would be impossible to build an AI that is sufficiently aligned to know that, at pretty much any given moment, I don’t want to be spontaneously injured, or be accused of doing something that will reliably cause all my peers to hate me, or for a loved one to die. There’s quite a broad list of “easy” specific “alignment questions” that virtually 100% of humans will agree on in virtually 100% of circumstances. We could do worse than just building a partially aligned AI that makes sure we avoid fates worse than death, individually and collectively.
On the other hand, I agree completely that coupling the concepts of “AI alignment” and “optimization” seems pretty fraught. I’ve wondered if the “optimal” environment for the human animal might be a re-creation of the Pleistocene, except with, y’know, immortality, and carefully managed, exciting-but-not-harrowing levels of resource scarcity.
You may already know this, but almost all YouTube videos have an automatically generated transcript. Click the “...” at the bottom right of the video panel and select “Open transcript” from the pulldown. YouTube’s automatic speech transcription is very good.
This exceeded my expectations. You kept it short and to the point, and the description of the technique was very clear. I look forward to more episodes.
Have you—or anyone, really—put much thought into the implications of these ideas for AI alignment?
If it’s true that modeling humans at the level of constitutive subagents renders a more accurate description of human behavior, then any true solution to the alignment problem will need to respect this internal incoherence in humans.
This is potentially a very positive development, I think, because it suggests that a human can be modeled as a collection of relatively simple subagent utility functions, which interact and compete in complex but predictable ways. This sounds closer to a gears-level portrayal of what is happening inside a human, in contrast to descriptions of humans as having a single convoluted and impossible-to-pin-down utility function.
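As a toy sketch of what I mean (entirely my own illustration; all the subagent names and numbers are made up), each subagent can have a trivially simple utility function, and the apparent complexity comes from how their context-dependent salience weights the competition:

```python
# Toy illustration, not a claim about actual cognition: a "person" modeled as a
# few simple subagent utility functions competing over one action choice.
actions = ["work on the paper", "call a friend", "scroll the internet"]

subagent_utilities = {
    "achiever":    {"work on the paper": 0.9, "call a friend": 0.2, "scroll the internet": 0.1},
    "socializer":  {"work on the paper": 0.1, "call a friend": 0.9, "scroll the internet": 0.3},
    "rest-seeker": {"work on the paper": 0.1, "call a friend": 0.3, "scroll the internet": 0.8},
}

# Context determines how loudly each subagent gets to "vote" right now.
salience = {"achiever": 0.5, "socializer": 0.3, "rest-seeker": 0.2}

def chosen_action():
    scores = {a: sum(salience[name] * utils[a] for name, utils in subagent_utilities.items())
              for a in actions}
    return max(scores, key=scores.get)

print(chosen_action())  # each piece is legible; the interaction does the interesting work
```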
I don’t know if you’re at all familiar with Mark Lippman’s Folding material and his ontology for mental phenomenology. My attempt to summarize his framework of mental phenomena is as follows: there are belief-like objects (expectations, tacit or explicit, complex or simple), goal-like objects (desirable states or settings or contexts), affordances (context-activated representations of the current potential action space) and intention-like objects (plans coordinating immediate felt intentions, via affordances, toward goal-states). All cognition is “generated” by the actions and interactions of these fundamental units, which I infer must be something like neurologically fundamental. Fish and maybe even worms probably have something like beliefs, goals, affordances and intentions. Ours are just bigger, more layered, more nested and more interconnected.
The reason I bring this up is that Folding was a bit of a kick in the head to my view on subagents. Instead of seeing subagents as fundamental, I now see them as expressions of latent goal-like and belief-like objects, with the brain implementing some kind of passive program that pursues goals and avoids anticipated suffering, even if you’re not aware you have these goals or these expectations. In other words, the sense of there being a subagent is your brain running a background program that activates and acts upon the implications of these more fundamental yet hidden goals and beliefs.
None of this is at all in contradiction to anything in your Sequence. It’s more like a slightly different framing, where a “Protector Subagent” is reduced to an expression of a belief-like object via a self-protective background process. It all adds up to the same thing, pretty much, but it might be more gears-level. Or maybe not.
Could you elaborate on how you’re using the word “symmetrical” here?
The best I can do after thinking about it for a bit is compute every possible combination of units under 200 supply, multiply that by the possible positions of those units in space, multiply that by the possible combinations of buildings on the map and their potential locations in space, multiply that by the possible combinations of upgrades, multiply that by the amount of resources in all available mineral/vespene sources … I can already spot a few oversimplifications in what I just wrote, and I can think of even more things that need to be accounted for. The shields/hitpoints/energy of every unit. Combinatorially gigantic.
Just the number of potential positions of a single unit on the map is already huge.
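To make “combinatorially gigantic” a bit more concrete, here is a back-of-envelope sketch; the map dimensions, per-tile granularity, and unit count are numbers I’m assuming purely for illustration:

```python
# Rough lower bound on the SC2 state space from unit positions alone.
# Assumed for illustration: a 128x128-tile map, ~100 distinguishable sub-positions
# per tile, and up to 200 units on the field. Everything else (unit types, buildings,
# upgrades, HP/shields/energy, resources) is ignored, so this badly undercounts.
import math

positions_per_unit = 128 * 128 * 100   # ~1.6 million placements for one unit
max_units = 200

# Treating unit placements as independent is itself an approximation,
# but it's fine for getting the order of magnitude:
log10_states = max_units * math.log10(positions_per_unit)
print(f"~10^{log10_states:.0f} configurations from positions alone")
# Go's state space is usually quoted around 10^170; this dwarfs it before we've
# counted anything except where the units are standing.
```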
But AlphaStar doesn’t really explore much of this space. It finds out pretty quickly that there’s really no reason to explore the parts of the space that include building random buildings in weird map locations. It explores and optimizes around the parts of the state space that look reasonably close to human play, because that was its starting point, and it’s not going to find superior strategies randomly, not without a lot of optimization in isolation.
That’s one thing I would love to see, actually: a version of the agent trained purely on self-play, without a basis in human replays. Does it ever discover proxy plays or other esoteric cheese without a starting point provided by the human replays?
Before now, it wasn’t immediately obvious that SC2 is a game that can be played superhumanly well without anything that looks like long-term planning or counterfactual reasoning. The way humans play it relies on a combination of past experience, narrow skills, and “what-if” mental simulation of the opponent. Building a superhuman SC2 agent out of nothing more than LSTM units indicates that you can completely do away with planning, even when the action space is very large, even when the state space is VERY large, even when the possibilities are combinatorially enormous. Yes, humans can get good at SC2 with much less than 200 years of time played (although those humans are usually studying the replays of other masters to bootstrap their understanding) but I think it’s worthwhile to focus on the inverse of this observation: that a sophisticated problem domain which looks like it ought to require planning and model-based counterfactual reasoning actually requires no such thing. What other problem domains seem like they ought to require planning and counterfactual reasoning, but can probably be conquered with nothing more advanced than a deep LSTM network?
(I haven’t seen anyone bother to compute an estimate of the size of the state-space of SC2 relative to, for example, Go or Chess, and I’m not sure if there’s even a coherent way to go about it.)
The freedom to speculate wildly is what makes this topic so fun.
My mental model would say that you have a particular pattern-recognition module that classifies objects as “chair”, along with a weight for how well the current instance matches the category. An object can be a prototypical perfect Platonic chair, or an almost-chair, or maybe a chair if you flip it over, or not a chair at all.
When you look at a chair, this pattern-recognition module immediately classifies it, and then brings online another module, which makes available all the relevant physical affordances and the linguistic and logical implications of a chair being present in your environment. Recognizing something as a chair feels identical to recognizing something as a thing-in-which-I-can-sit. Similarly, you don’t have to puzzle out the implications of a tiger walking into the room right now. The fear response will coincide with the recognition of the tiger.
When you try to introspect on chairness, what you’re doing is tossing imagined sense percepts at yourself and observing the responses of the chairness-detecting module. This allows you to generate an abstract representation of your own chairness classifier. But this abstract representation is absolutely not the same thing as the chairness classifier, any more than your abstract cogitation about what the “+” operator does is the same thing as the mental operation of adding two numbers together.
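A toy way to picture the distinction (my own illustration; nothing here is meant as a model of real neural machinery): the classifier is an opaque function you can only query, and the abstract representation is a separate, lossy summary you build by probing it with imagined examples:

```python
# Toy illustration: an opaque "chairness" module vs. the abstract model you
# build of it by introspection. The feature names and weights are invented.
def chairness(percept: dict) -> float:
    # Stand-in for the pattern-recognition module: graded match from 0 to 1.
    score = 0.0
    score += 0.5 if percept.get("can_sit_on") else 0.0
    score += 0.3 if percept.get("has_legs") else 0.0
    score += 0.2 if percept.get("has_back") else 0.0
    return score

# "Introspection": tossing imagined sense percepts at the module and recording responses.
imagined = {
    "office chair":        {"can_sit_on": True,  "has_legs": True,  "has_back": True},
    "stool":               {"can_sit_on": True,  "has_legs": True,  "has_back": False},
    "fallen log":          {"can_sit_on": True,  "has_legs": False, "has_back": False},
    "painting of a chair": {"can_sit_on": False, "has_legs": False, "has_back": False},
}
abstract_model = {name: chairness(p) for name, p in imagined.items()}
print(abstract_model)  # a summary of the classifier's behavior, not the classifier itself
```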
I think a lot of confusion about the nature of human thinking stems from the inability to internally distinguish between the abstracted symbol for a mental phenomenon and the mental phenomenon itself. This dovetails with IFS in an interesting way, in that it can be difficult to distinguish between thinking about a particular Part in the abstract, and actually getting into contact with that Part in a way that causes it to shift.
I’m not sure why you say that the unconscious modules communicating with each other would necessarily contradict the idea of us being conscious of exactly the stuff that’s in the workspace, but I tend to agree that considering the contents of our consciousness and the contents of the workspace to be strictly isomorphic seems to be too strong.
I may be simply misunderstanding something. My sense is that when you open the fridge to get a yogurt and your brain shouts “HOW DID CYPHER GET INTO THE MATRIX TO MEET SMITH WITHOUT SOMEONE TO HELP HIM PLUG IN?”, this is a kind of thought that arises from some process meticulously checking your epistemic state for logical inconsistencies (rather esoteric and complex ones), and it seems to come from nowhere. Doesn’t this imply that some submodules of your brain are thinking abstractly and logically about The Matrix completely outside of your conscious awareness? If so, then this either implies that the subconscious processing of individual submodules can be very complex and abstract without needing to share information with other submodules, or that information sharing between submodules can occur without you being consciously aware of it.
A third possibility would be that you were actually consciously thinking about The Matrix in a kind of inattentive, distracted way, and it only seems like the thought came out of nowhere. This would be far from the most shocking example of the brain simply lying to you about its operations.
The most obvious example of this kind of thing is the “flash of insight” that we all experience from time to time, where a complex, multi-part solution to a problem intrudes on our awareness as if from nowhere. This seems to be a clear case of the unconscious working on this problem in the background, identifying its solution as a valid one still in the background, and injecting the fully-formed idea into awareness with high salience.
It’s a bit like the phenomenon of being able to pick out your own name from a babble of crowded conversation, except applied to the unconscious activity of the mind. This, however, implies that much complex inter-agent communication and abstract problem solving is happening subconsciously. And this seems to contradict the view that only very simple conceptual packages are passed through to the Global Workspace, and that we must necessarily be conscious of our own abstract problem solving.
My own perceptions during meditation (and during normal life) would suggest that the subconscious/unconscious is doing very complex and abstract “thinking” without my being aware of its workings, and intermittently injecting bits and pieces of its ruminations into awareness based on something like an expectation that the gestalt self might want to act on that information.
This seems contrary to the view that “what we are aware/conscious of” is isomorphic to “the Global Workspace”. It seems that subconscious modules are chattering away amongst themselves almost constantly, using channels that are either inaccessible to consciousness or severely muted.
I really look forward to this Sequence.
I would suspect the failure of most social movements is overdetermined. Social movements by default are designed to change the status quo, and the status quo tends to be stable and intrinsically resistant to change. Social movements are often ideologically originated and may be aimed at achieving something practically impossible.
Another phrasing might be that most social movements fail because a sober analysis would have shown that there was never any realistic possibility for most social movements to succeed, even if they had more resources, smarter people and better planning.
I’ve improved most dramatically at writing by getting very specific feedback from people who are clearly better than me. I consider myself lucky to have had a small handful of teachers and professors willing to put in the time to critique my writing at the sentence and word level.
Recently I had a work of fiction of mine minutely critiqued by a professional author and experienced a similar sense of “leveling up”. For example, I’ve thought for years that I understood what “show don’t tell” means. But my gracious editor in this case was able to point out multiple instances in my story where I was “telling” when I could be “showing”. Once he pointed these out, I understood on a deeper level what to pay attention to.
One interesting thing about getting feedback on writing is that someone who is truly better than you can usually provide suggestions that you immediately recognize as correct. You may think your writing is fine, even great, but you’ll recognize true improvements as being obviously correct and better, once pointed out. The process of becoming better at writing is the accumulation of such individual corrections.
My daughter is just starting to learn subtraction. She was very frustrated by it, and if I verbally asked “What’s seven minus five?” she was about 50% likely to give the right answer. I asked her a sequence of simple subtraction problems and she consistently performed at about that level. In the course of our back and forth I switched my phrasing to the form “You have seven apples and you take away five, how many are left?” and she immediately started answering the questions 100% correctly, and very rapidly too. Experimentally I switched back to the prior form and she started getting them wrong again. It was apparent to me that simply phrasing the problem in terms of concrete objects was activating something like visualization, which made the problems easy, and phrasing it as abstract numbers was failing to flip that switch. So as you say, for trickier arithmetic problems, it may be the case that which mental circuits are “activated automatically” determines the first answer you arrive at, and you can exploit that effect with edge cases like this.
It seems so obvious to me that the benefits of preschool would wear off after a short number of years that I feel like I must be missing something. How could it be otherwise, given the current system? This is all completely setting aside the developmental limitations of small children.
Let’s take two kids, Jamie and Alex. Pretend that there are no developmental limitations on children’s brains and that they can be taught to read equally well at ages 3, 4, 5, and 6.
Alex starts preschool at age 3 and they can read at a 1st grade level by the time they enter Kindergarten.
Jamie does not do any preschool and cannot read at all when they enter Kindergarten.
By the end of Kindergarten, Alex can read slightly better than 1st grade level, but not a lot better, since the curriculum hasn’t been challenging. It’s basically been a rehash of what they already can do. Jamie can read at the expected grade level by the end of Kindergarten.
By the end of first grade, no accommodations have been made for the fact that Alex is a slightly advanced reader. Both kids are given essentially the same pool of books to read. Alex has not skipped a grade or been put into some secret fast-track program for kids who went to preschool, because no such program exists. So by the end of first grade, they can read about equally well. Maybe Alex reads slightly better, but since no real pressure is being put on this advantage that would cause it to compound rather than diminish, it naturally diminishes until both students are at the same level.
Acting as though anything else would happen doesn’t make sense to me. It’s not like each year a child spends in school exerts some kind of Education Force on their brain which accrues generalized scholastic ability. Kindergartners are taught kindergarten level math and reading skills; kids entering kindergarten who already possess these skills only benefit until the other kids catch up.
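Here is the same toy story as a few lines of code, with invented growth numbers, just to make the convergence dynamic explicit:

```python
# Toy model (all numbers invented): reading level grows toward the ceiling the
# curriculum actually teaches to, so an early head start decays instead of compounding.
grade_ceiling = {"K": 1.0, "1st": 2.0, "2nd": 3.0}   # level targeted each year

def year_of_school(level, ceiling, catch_up_rate=0.8):
    # Instruction only pulls you up toward the ceiling; nothing pushes you past it.
    return level + catch_up_rate * max(0.0, ceiling - level)

alex, jamie = 1.0, 0.0   # Alex enters kindergarten at a 1st-grade level, Jamie at zero
for grade, ceiling in grade_ceiling.items():
    alex = year_of_school(alex, ceiling)
    jamie = year_of_school(jamie, ceiling)
    print(f"end of {grade}: Alex {alex:.2f}, Jamie {jamie:.2f}")
# The gap shrinks every year because the curriculum never targets Alex's actual level.
```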
So IMO the problem isn’t that preschool “doesn’t do anything”. The problem is that the system as it stands doesn’t actually utilize the potential advantage of preschool. We are pretty far away from a system that would do so; such a system would need to tailor the specific educational content to the specific child.
My four year old can read pretty well and can write well enough that you can puzzle out what he’s trying to communicate. But there is no expectation that he’s going to skip kindergarten because of this. So in what sense could this ever be a long-term academic advantage?
“Real” stoicism seems to demand total relinquishment of all attachments, to almost exactly the same degree that “real” Buddhism does. I think this is a pathological thing to want.
Yes, it’s psychologically beneficial to be less upset about being stuck in traffic. When you’re already stuck in traffic and can’t do anything about it, your choice to not be upset about it is simply a choice to avoid needless suffering.
One might argue that it’s even better to let yourself be really annoyed by being stuck in traffic, and then permit your annoyance to motivate you to take actions to avoid being stuck in traffic in the future.
The sort of person who would legitimately not care if their child died would also have to be different from me in a number of other very important ways in order to be a reasonably consistent agent. For example, if a stoic claims to be emotionally indifferent between “child death” and “child flourishing”, then what actually motivates them? Why do anything, why make any choice? At least Buddhist thought is honest about this, and admits that the only truly consistent solution is a purely monastic life of meditation and the aggressive pursuit of non-existence. Stoicism, as far as I can tell, refuses to bite the bullet on the conclusions that follow from its premises.
1) I think a lot of people think they’re stoic when in actuality they’ve just never had anything bad happen to them. Modern life offers relatively few opportunities to test stoicism, and by default, everyone fails such tests without truly significant preparation.
2) Stoicism is actually a huge drag.
With regard to whatever objects give you delight, are useful, or are deeply loved, remember to tell yourself of what general nature they are, beginning from the most insignificant things. If, for example, you are fond of a specific ceramic cup, remind yourself that it is only ceramic cups in general of which you are fond. Then, if it breaks, you will not be disturbed. If you kiss your child, or your wife, say that you only kiss things which are human, and thus you will not be disturbed if either of them dies.
- Epictetus, The Enchiridion
Who wants to live like this? I want to be disturbed if a loved one dies.