Mathematician turned alignment researcher. Probably happy to chat about math, current ML, or long-term AI thoughts.
The basics—Nathaniel Monson (nmonson1.github.io)
If the review panel recommends a paper for a spotlight, there is a better than 50% chance a similarly-constituted review panel would have rejected the paper from the conference entirely:
https://blog.neurips.cc/2021/12/08/the-neurips-2021-consistency-experiment/
Can you name the organization?
I have actually tried this, not in tug-of-war, but with moving a stuck car (one end of the rope affixed to the car, the other to a tree or lamppost or something). In that situation, where the objects aren’t actively adjusting to thwart you, it works quite well!
I appreciated your post (indeed, I found it very moving) and found some of the other comments frustrating, as I believe you did. I think, though, that I can see a part of where they are coming from. I’ll preface by saying I don’t have strong beliefs on this myself, but I’ll try to translate (my guess at) their world model.
I think the typical EA/LWer thinks that most charities are ineffective to the point of uselessness, that this is due to them not being smart/rational about a lot of things, and is very familiar with examples like the Millennium Villages. They probably believe it costs roughly 5,000 USD to save a life, which makes your line “Many of us are used to the ads that boast of every 2-3 dollars saving a life...” read like you haven’t engaged much with their world. They agree that institutions matter a huge amount and that many forms of aid fail because of bad institutions.
They probably also believe the exact shape of the dose-response curve to treating poverty with direct aid is unknown, but have a prior of it being positively sloped but flatter than we wish. There is a popular rationalist technique of “if x seems like it is helping the problem, just not as much as you wish, try way, way more x” (e.g., light for SAD).
I would guess your post reads to them like someone finding out that the dose-response curve is very flat and that many charities are ineffective, and then writing “maybe the dose-response curve isn’t even positively sloped!” It reads to them like the claim “no (feasible) amount of direct aid will help with poverty,” followed by evidence that the slope is not as steep as we all wish. I don’t think any of your evidence suggests aid cannot have a positive effect, just that the amount necessary for that effect to be permanent is quite high.
Add to this your ending of donating money to GiveDirectly, and it seems like you are either behaving irrationally, or you agree that it has some marginal positive impact and were preaching to the choir.
As I said, I appreciated it, and the work that goes into making your world model and preparing it for posting, and engaging with commenters. Thank you.
If you click the link where OP introduces the term, it’s the Wikipedia page for psychopathy. Wiki lists 3 primary traits for it, one of which is DAE.
This is (related to) a very old idea: https://en.wikipedia.org/wiki/Method_of_loci
Thanks for doing the math on this :)
My first instinct is that I should choose blue, and the more I’ve thought about it, the more that seems correct. (Rough logic: The only way no-one dies is if either >50% choose blue, or if 100% choose red. I think chances of everyone choosing red are vanishingly small, so I should push in the direction with a wide range of ways to get the ideal outcome.)
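For concreteness, here’s a minimal sketch of the payoff structure I’m assuming (based on the usual framing of the question; the exact rules in the post may differ slightly):

```python
# Toy model of the pill game as I read it: nobody dies if literally everyone
# picks red, or if blue reaches at least 50%; otherwise the blue minority dies.
# (These rules are my reading of the setup, not a quote from the post.)
def fraction_who_die(frac_blue: float) -> float:
    if frac_blue == 0.0:    # everyone chose red: nobody dies
        return 0.0
    if frac_blue >= 0.5:    # blue is at least half: everyone is saved
        return 0.0
    return frac_blue        # a strict minority chose blue: they die

for f in (0.0, 0.01, 0.3, 0.5, 0.9):
    print(f"{f:.0%} blue -> {fraction_who_die(f):.0%} die")

# "Everyone red" is a single point, while any blue share >= 50% also gives zero
# deaths -- that is the much wider target I'd rather push toward.
```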
I do think the most important issue not mentioned here is a social-signal, first-mover one: If, before most people have chosen, someone loudly sends a signal of “everyone should do what I did, and choose X!”, then I think we should all go along with that and signal-boost it.
This is more a tangent than a direct response—I think I fundamentally agree with almost everything you wrote—but I don’t think virtue ethics requires tossing out the other two (although I agree each of the other two requires tossing out the other).
I view virtue ethics as saying, roughly, “the actually important thing almost always is not how you act in contrived edge-case thought experiments, but rather how you habitually act in day-to-day circumstances. Thus you should worry less, probably much much less, about said thought experiments, and worry more about virtuous behavior in all the circumstances where deontology and utilitarianism have no major conflicts.” I take it as making a claim about the correct use of time and thought-energy, rather than about perfectly correct morality. It thus can extend to “...and we think (D/U) ethics are ultimately best served this way, and please use (D/U) ethics if one of those corner cases ever shows up” for either deontology or (several versions of) utilitarianism, basically smoothly.
That doesn’t necessarily seem correct to me. If, e.g., OpenAI develops a superintelligent, non-deceptive AI, then I’d expect some of the first questions they’d ask it to be of the form “Are there questions which we would regret asking you, according to our own current values? How can we avoid asking you those while still getting lots of use and insight from you? What are some standard prefaces we should attach to questions to make sure following through on your answer is good for us? What are some security measures we can take to make sure our users’ lives are generally improved by interacting with you? What are some security measures we can take to minimize the chances of a world turning out very badly according to our own desires?” Etc.
I am genuinely confused why this is on LessWrong instead of the EA Forum. What do you think the distribution of giving money is like in each place, and what do you think the distribution of responses to the drowning child argument is like in each?
Alright, I’ll bite. As a CDT fan, I will happily take the 25 dollars. I’ll email you on setting up the experiment. If you’d like, we could have a third party hold money in escrow?
I’m open to some policy which will cap our losses if you don’t want to risk $2050, or conversely, something which will give a bonus if one of us wins by more than $5 or something.
As far as Newcomb’s problem goes, what if you find a superintelligent agent that says it tortures and kills anyone who would have oneboxed in Newcomb? This seems roughly as likely to me as finding the Omega from the original problem. Do you still think the right thing to do now is commit to oneboxing before you have any reason to think that commitment has positive EV?
One issue that I think OpenAI didn’t convince me they had dealt with is that saying “neuron activations are well correlated with x” is different from being able to say what specifically a neuron does mechanistically. I think of this similarly to how I think of the limitations of picking max-activating examples from a dataset or doing gradient methods to find high activations: finding the argmax of a function doesn’t necessarily tell you much about the function’s... well, functionality. (I’ll sketch a toy example below.)
This seems like it might have a related obstacle. While this method could, e.g., make it easier to find a focus for mechanistic interpretability, I think the bulk of the hard work would still be ahead.
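To make the argmax point concrete, here is a purely hypothetical toy sketch (not drawn from OpenAI’s paper): two simulated “neurons” share the same max-activating input but behave very differently everywhere else, so the argmax alone tells you little about mechanism.

```python
import numpy as np

# Two toy "neurons", modeled as scalar functions of a 1-D input, with exactly the
# same max-activating input but very different behavior away from that maximum.
xs = np.linspace(-3.0, 3.0, 601)

def neuron_a(x):
    return np.exp(-10.0 * (x - 1.0) ** 2)   # sharply selective: only fires near x = 1

def neuron_b(x):
    return np.exp(-0.05 * (x - 1.0) ** 2)   # broadly tuned: fires for almost any x

print("argmax A:", xs[np.argmax(neuron_a(xs))])   # ~1.0
print("argmax B:", xs[np.argmax(neuron_b(xs))])   # ~1.0 (same "max-activating example")
print("A(-2):", float(neuron_a(-2.0)))            # ~0.0  -> very different behavior
print("B(-2):", float(neuron_b(-2.0)))            # ~0.64    away from the maximum
```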
Relatedly, I’d really like to be able to attach private notes to authors’ names. There are pairs of people on LW with names I find easy to mistake for each other, and being able to look at the author of a post or comment and see a self-note (“This is the user who is really insightful about X”, “Don’t start arguing with this person, it takes forever and goes nowhere”, etc.) would be very helpful.
Lots of your comments on various posts seem rude to me—should I be attempting to severely punish you?
This was very interesting, thanks for writing it :)
My zero-knowledge instinct is that sound-wave communication would be very likely to evolve in most environments. Motion → pressure differentials seems pretty inevitable, so would almost always be a useful sensory modality. And any information channel that is easy to both sense and affect seems likely to be used for communication. Curious to hear your thoughts if your intuition is that it would be rare.
I’m an AI alignment researcher who will be moving to the bay area soon (the two facts are unconnected—I’m moving due to my partner getting a shiny new job). I’m interested in connecting with other folks in the field, and feeling like I have coworkers. My background is academic mathematics.
1) In academia, I could show up at a department (or just look at their website) and find a schedule of colloquia/seminars/etc. ranging from 1/month to 10+/week (depending on department size etc.). Are there similar things I should be aware of for AI folks near the Bay?
2) Where (if anywhere) do independent alignment people tend to work (and how could I go about renting an office there?) I’ve heard of Constellation and Rose Garden Inn as the locations for several alignment organizations/events—do they also have office spaces for independent researchers?
3) Anything else I should know about?
I agree we are an existence proof for general intelligence. For alignment, what is the less intelligent thing whose goals humanity has remained robustly aligned to?
Yep! It might be easier to visualize with a train on tracks—the rope needs to be parallel to the intended direction of movement. Suppose the rope is nearly perfectly taut and tied to something directly in front of the train. Pulling the rope sideways with 100 newtons requires the perpendicular component of the rope’s force to be 100 N, definitionally. But the rope can only exert force along itself, so if it misses being taut by θ radians, it’ll be exerting enough force F along its length that F·sin(θ) = 100. But if the rope is very close to perfectly taut, then sin(θ) ≈ θ ≈ 0, so F = 100/sin(θ) blows up and (in the limit) you’re exerting infinite force.
This advantage fades pretty quickly as the rope’s angle grows away from zero, so you then need to secure the car so it won’t move back (rocks under the tires or something), re-tighten the rope, and iterate.
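A minimal numeric sketch of that amplification (idealized massless, inextensible rope, no friction losses; the 100 N sideways pull from the example above):

```python
import math

# The tension T along the rope must supply the 100 N perpendicular component:
# T * sin(theta) = 100, so T = 100 / sin(theta), which blows up as theta -> 0.
F_perp = 100.0  # newtons of sideways pull on the rope

for theta in (0.5, 0.1, 0.05, 0.01):  # radians away from perfectly taut
    tension = F_perp / math.sin(theta)
    print(f"theta = {theta:>4} rad -> tension along rope ~ {tension:7.0f} N")
```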
I think it is 2-way, which is why many (almost all?) alignment researchers have spent a significant amount of time looking at ML models and capabilities, and have guesses about where those are going.
Not OP, but relevant—I spent the last ~6 months going to meetings with [biggest name at a top-20 ML university]’s group. He seems to me like a clearly very smart guy (and was very generous in allowing me to join), but I thought it was quite striking that almost all his interests were questions of the form “I wonder if we can get a model to do x” or “if we modify the training in way y, what will happen?” A few times I proposed projects about “maybe if we try z, we can figure out why b happens,” and he was never very interested—a near-exact quote of his response was “even if we figured that out successfully, I don’t see anything new we could get [the model] to do”.
At one point I explicitly asked him about his lack of interest in a more general theory of what neural nets are good at and why—his response was roughly that he’s thought about it and the problem is too hard, comparing it to P=NP.
To be clear, I think he’s an exceptionally good ML researcher, but his vision of the field looks to me more like a naturalist studying behavior than a biologist studying anatomy, which is very different from what I expected (and from the standard my shoulder-John is holding people to).
EDITED—removed identity of Professor.