Are there types of published alignment research that you think were (more likely to be) good to publish? If so, I’d be curious to see a list.
Chi Nguyen
I would like it if there were a well-researched LessWrong post on the pros and cons of different contraceptives. Same deal with a good post on how to treat or prevent urinary tract infections, although I’m less excited about that.
I’d be willing to pay some of my private money for this to get done. Maybe up to £1000? Open to considering higher amounts.
It would mostly be a public service as I’m kind of fine with my current contraception. So, I’m also looking for people to chip in (either to offer more money or just to take some of the monetary burden off me!)
Examples of content that I would like to see included:
Clarity on the contraception-and-depression question. E.g., apparently theory says that hormonal IUDs should carry less depression risk than pills, but in empirical studies it looks like it’s the other way around? Can I trust the studies?
Some perspective on the trade-offs involved. E.g., maybe I can choose between a 5% increased chance of depression vs. a 100% increased chance of blood clots. But maybe basically no one gets blood clots anyway, and then I’d rather take the increased blood clot risk! (See the toy calculation after this list.) But because the medical system cares more about avoiding death than about my preferences, my doctor will never recommend the blood-clot option to me, or something like that.
If there wasn’t already a post on this (but I think there is), info on the fact that it’s totally fine to *not* take 7-day pill breaks every month, and that you can just take the pill continuously. (Although I think it might be recommended to take a short break every X months.)
Some realistic outlook on how much pain and what effects on menstruation I should expect
Various potential benefits from contraceptives aside from contraception
On the UTI side: Is the cranberry stuff a myth or is it a myth that it’s a myth or is it a myth that it’s a myth that it’s a myth?
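To make the trade-off point above concrete, here’s a toy absolute-vs-relative-risk calculation. All the numbers are made up for illustration (the 10% and 0.05% base rates are assumptions, not figures from any study):

```python
# Toy numbers only: illustrating relative vs. absolute risk, not real data.
baseline = {"depression": 0.10, "blood_clot": 0.0005}  # hypothetical yearly base rates

# Option A: +5 percentage points of depression risk (an absolute increase).
extra_depression_risk = 0.05
# Option B: +100% blood clot risk (a relative increase, i.e. it doubles the base rate).
extra_clot_risk = baseline["blood_clot"] * 1.0

print(f"Extra depression cases per 1000 women/year: {extra_depression_risk * 1000:.1f}")  # 50.0
print(f"Extra blood clot cases per 1000 women/year: {extra_clot_risk * 1000:.1f}")        # 0.5
# A 100x difference in absolute terms -- though the outcomes differ in severity,
# so you'd still want to weight them before choosing.
```

The point being that a scary-sounding 100% relative increase can be tiny in absolute terms; the base rate does most of the work.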
Alternatively: If there actually already are really good resources on this topic out there, please let me know!
edit: We’re sorted :)
Hello, I’m Chi, the friend, in case you wanna check out my LessWrong, although my EA forum account probably says more. Also, £50 referral bonus if you refer a person we end up moving in with!
Also, we don’t really know whether the Warren Street place will work out but are looking for flatmates either way. Potential other accommodation would likely be in N1, NW1, W1, or WC1
Greg Brockman and Sam Altman (cosigned):
[...]
First, we have raised awareness of the risks and opportunities of AGI so that the world can better prepare for it. We’ve repeatedly demonstrated the incredible possibilities from scaling up deep learning

*chokes on coffee*
Thanks for posting this! I really enjoyed the read.
Feedback on the accompanying poll: I was going to fill it out. Then I saw that I have to look up and list the titles I can (not) relate to instead of just being able to click “(strongly) relate/don’t relate” on a long list of titles. (I think the relevant function for this in Forms is “Matrix” or something.) And my reaction was “ugh, work”. I think I might still fill it in, but I’m much less likely to. If others feel the same, maybe you wanna change the poll?
Hi, thanks for this comment and the links.
I agree that it’s a pretty vast topic. I agree that the questions are personalized in the sense that there are many different personal factors to this question, although the bullets I listed weren’t actually really personalized to me. One hope I had with posting to LessWrong was that I trust people here to be able to do some of the “what’s most relevant to include” thinking (e.g., everything that affects ≥10% of women between 20 and 40, plus everything that’s of more interest on LessWrong than elsewhere, such as irreversible contraception). I agree it’s a tall order though.
For talking to my doctor: I found my experience of talking to doctors pretty frustrating, to be honest. I think I’ve learned much more about contraception (including about where my doctors were misinformed) via the internet or friends than from doctors. I don’t doubt that there are excellent doctors out there, but it’s difficult to find them. The advice about looking up the people who wrote medical guidelines seems solid.
That being said, while I’m interested in the topic myself, I was mostly thinking that it would be good for the LessWrong/EA community to have a reliable source. (I’m mostly constrained to hormonal contraception and have already tried out a couple, so my remaining search space is relatively small.) I think it could save lots of women many hours of research into which contraception to take + productivity loss from trying out or permanently choosing suboptimal contraception.
You prompted me to try out the D-Mannose, thanks! I’ve had it lying around, but was always too inert to research whether it actually works, so I never bothered to take it.
I find this comment super interesting because
a) before, I would have expected many more people to be scared of being eaten by piranhas on LessWrong than on the EA Forum, not vice versa. In fact, I didn’t even consider that people could find the EA Forum scarier than LessWrong. (Well, before FTX anyway.)
b) my current read of the EA Forum (and this has been the case for a while) is that forum people like it when you say something like “People should value things other than impact (more)” and that you’re more likely to be eaten by piranhas for saying “People should value impact more” than vice versa.
Take this as a slight nudge towards posting on the EA Forum perhaps, although I don’t really have an opinion on whether 2) and 3) might still be true.
Sorry for replying so late! I was quite busy this week.
I initially wanted to commission someone and expected that I’d have to pay four digits. Someone suggested I put down a bounty. I’m not familiar with putting bounties on things, and I wanted to avoid getting myself into a situation where I feel like I have to pay the full amount for
work that’s poor,
work that’s decent but much less detailed than I had envisioned, or
each of multiple reports.
I think I’m happy to pay the full amount for a report that
is transparent in its reasoning, so I can trust it,
tells me how much to trust study results, e.g., describes potential flaws and caveats for the studies it looked at,
is roughly on the level of detail indicated by what I wrote under “the type of content I would like to see included” (ideally, the person writing wouldn’t treat my list as a shopping list, but would use their common sense to include the things I’d be interested in), and
is the only report of this type that claims the bounty.
The first two are the most important ones. (And the last one is weird.) If it’s much less detailed but fulfils the other criteria, I’d still be happy to pay triple digits.
As your later comment says, I think this is a pretty complex topic, and I can imagine that £2000 wouldn’t actually cover the work needed to do such a report well.
I think before someone seriously puts time into this, they should probably just contact me, both to spare us awkward duplicate work and submissions, and to set expectations on the payment. I’ll edit my post to be clearer on this.
Thanks! I already mention this in the post, but just wanted to clarify that Paul only read the first third/half (wherever his last comment is) in case people missed that and mistakenly take the second half at face value.
Edit: Just went back to the post and noticed I don’t really clearly say that.
minus Cullen O’Keefe who worked on policy and legal (so was not a clear cut case of working on safety),
I think Cullen was on the same team as Daniel (might be misremembering), so if you count Daniel, I’d also count Cullen. (Unless you wanna count Daniel because he previously was more directly part of technical AI safety research at OAI.)
Whoa, I didn’t know about this survey, pretty cool! Interesting results overall.
It’s notable that 6% of people also report they’d prefer absolute certainty of hell over not existing, which seems totally insane from the point of view of my preferences. The 11% that prefer a trillion miserable sentient beings over a million happy sentient beings also seems wild to me. (Those two questions are also more correlated with each other than the other questions are.)
First of all: Thanks for asking. I was being lazy with this and your questions forced me to come up with a response which forced me to actually think about my plan.
Concrete changes
1) I’m currently doing in-person Pomodoro co-working with a friend every weekday, but I had planned that before this post IIRC, and have definitely known for a while that it’s a huge boost for me.
In-person co-working and the type of work I do seem somewhat situational/hard to sustain/hard to quickly change sometimes. For some reason (perhaps because I feel a bit meh about virtual co-working), I’ve never tried Focusmate, and this made me more likely to try it in the future if and when my in-person co-working fizzles out.
2) The things that were a high mix of resonating with me and new were “identifying as hard-working” and “finding ways of reframing work as non-work”. (I was previously aware that things would often be fun if I didn’t think of them as work and become “ugh” as soon as they are work, but just knowing that there is another person who is successfully managing this property of theirs is really encouraging and helpful for thinking about solutions.)
Over the last few months, I’ve introduced the habit of checking in with myself at various times during the day, and especially when I’m struggling with something (kind of spontaneous mini meditations). I’m hoping that I can piggy-back on that to try out the identity and reframing thing. (Although this comment just prompted me to actually go and write those down on post-its and hang them where I can see them, so I don’t forget, so thanks for asking!)
3) I am currently testing out having a productive hobby for my weekends. (This ties into not reframing work things as “not work”. Also, I am often strict with my weekends in a way that I wanna experiment with relaxing, given one of the responses I got. Also prompted by the concept of doing something enjoyable and rewarding to regenerate instead of resting.) I’ll monitor the effects of that on my mental health quite closely because I think it could end quite badly, but it has been fun this weekend.
3.5) I often refrain from doing work things I feel energy and motivation for because it’s too late in the day or otherwise “not work-time”. I think this overall serves me well in various ways. But as a result of this post, I am more likely to try relaxing this a bit in the future. I am already tracking my work and sleep hours, so hopefully that will give me some basis to check how it affects my productivity. (And 4 will hopefully help too.)
4) Not directly as a consequence of this post, but related: I started thinking about how to set work targets for different time intervals and how to consistently set and review them. (It was kind of crazy to realise that I don’t already do this! Plans ≠ Targets.) This is a priority for me at the moment, and I am interviewing people about this. I expect this to feed into this whole hard-working topic, and maybe some of the responses about working hard will influence how I go about this.
Other minor updates or things that I won’t try immediately but that I’m more likely to try in the future now:
Decided not to prioritise improving diet, exercise, and sleep for the sake of becoming more hard-working.
Not being frustrated that there is no magical link: general growth as a person --> more hard-working
Maybe: Using the Freedom app (I’ve had good experiences with Cold Turkey, but it’s not on my phone.)
Maybe: Doing more on paper
Maybe: Kanban boards
Maybe: Meetings with myself
Maybe: Experiment with stimulants (I can get them prescribed but dropped them for various reasons)
Some overall remarks
My biggest update was just learning that people do permanently become more hard-working well into their 20s, through means that aren’t only either meds or changing roles, meaning there is a point to me trying more non-med things that might increase how hard-working I am in the short term. Previously, I was really unsure to what degree hard-workingness might just be a very stable trait across a lifetime, at least if you don’t drastically change the kind of work you do or your work environment in ways that are difficult to actually pull off. Tbf, I’m still not sure, but I am more hopeful than previously.
From that point of view, I found it really encouraging that some people mentioned having a concrete, time-constrained period where, for some reason, they were much more hard-working than previously, and then keeping that up even when ~everything about their work situation changed.
For context: I tracked my work hours for roughly a year. My week-to-week tends to be very heterogeneous, and through the tracking I realised that none of the things I tracked during that year seemed to have any relationship to how much I work week-to-week, other than having hard “real” deadlines. The overall trend was also very flat, which felt a bit discouraging.
Thanks! I actually agree with a lot of what you say. Lack of excitement about existing intervention ideas is part of the reason why I’m not all in on this agenda at the moment. Although in part I’m just bottlenecked by lack of technical expertise (and it’s not like people had great ideas for how to align AIs at the beginning of the field...), so I don’t want people to overupdate from “Chi doesn’t have great ideas.”
With that out of the way, here are some of my thoughts:
We can try to prevent silly path-dependencies in (controlled or uncontrolled, i.e. misaligned) AIs. As a start, we can use DT benchmarks to study how DT endorsements and behaviour change under different conditions, and how DT competence scales with size compared to other capabilities. I think humanity is unlikely to care a ton about AIs’ DT views, and there might be path-dependencies. So like, I guess I’m saying I agree with “let’s try to make the AI philosophically competent.”
This depends a lot on whether you think there are any path-dependencies conditional on ~solving alignment. Or if humanity will, over time, just be wise enough to figure everything out regardless of the starting point.
One source of silly path-dependencies is if AIs’ native DT depends on the training process and we want to de-bias against that. (See for example this or this for some research on what different training processes should incentivise.) Honestly, I have no idea how much things like that matter. Humans aren’t all CDT even though my very limited understanding of evolution is that it should, in the limit, incentivise CDT.
I think depending on what you think about the default of how AIs/AI-powered earth-originating civilisation will arrive at conclusions about ECL, you might think some nudging towards the DT views you favour is more or less justified. Maybe we can also find properties of DTs that we are more confident in (e.g. “does this or that in decision problem X”) than whole specified DTs, which, yeah, I have no clue about. Other than “probably not CDT.”
If the AI is uncontrolled/misaligned, there are things we can do to make it more likely it is interested in ECL, which I expect to be net good for the agents I try to acausally cooperate with. For example, maybe we can make misaligned AI’s utility function more likely to have diminishing returns or do something else that would make its values more porous. (I’m using the term in a somewhat broader way than Bostrom.)
This depends a lot on whether you think we have any influence over AIs we don’t fully control.
It might be important and mutable that future AIs don’t take any actions that decorrelate them from other agents (i.e., things that decrease the AI’s acausal influence) before they discover and implement ECL. So, we might try to just make them aware of that early.
You might think that’s just not how correlation or updatelessness work, such that there’s no rush. Or that this is a potential source of value loss but a pretty negligible one.
Things that aren’t about making AIs more likely to do ECL: Something not mentioned, but there might be some trades that we have to do now. For example, maybe ECL makes it super important to be nice to AIs we’re training. (I mostly lean no on this question (at least for “super important”), but it’s confusing.) I also find it plausible we want to do ECL with other pre-ASI civilisations who might or might not succeed at alignment and, if we succeed and they fail, partly optimise for their values. It’s unclear to me whether this requires us to get people to spiritually commit to this now, before we know whether we’ll succeed at alignment or not. Or whether updatelessness somehow sorts this out, because if we (or the other civ) were to succeed at alignment, we would have seen that this is the right policy, and done this retroactively.
Yeah, you’re right that we assume that you care about what’s going on outside the lightcone! If that’s not the case (or only a little bit the case), that would limit the action-relevance of ECL.
(That said, there might be some weird simulations-shenanigans or cooperating with future earth-AI that would still make you care about ECL to some extent although my best guess is that they shouldn’t move you too much. This is not really my focus though and I haven’t properly thought through ECL for people with indexical values.)
Thanks! I felt kind of sheepish about making a top-level post/question out of this but will do so now. Feel free to delete my comment here if you think that makes sense.
Copied from my comment on this from the EA forum:
Yeah, that’s a bit confusing. I think technically, yes, IDA is iterated distillation and amplification, and Iterated Amplification is just IA. However, IIRC many people referred to Paul Christiano’s research agenda as IDA even though his sequence is called Iterated Amplification, so I stuck to the abbreviation that I saw more often while also sticking to the ‘official’ name. (I also buried a comment on this in footnote 6.)
I think lately, I’ve mostly seen people refer to the agenda and ideas as Iterated Amplification. (And IIRC I also think the amplification is the more relevant part.)
I agree that it’s very not ideal and maybe I should just switch it to Iterated Distillation and Amplification :/
The “entity giving the payout” in practice for ECL would just be the world states you end up in, and it requires you to care about the environment of the person you’re playing the PD with.
So, defecting might just be optimising my local environment for my own values, and cooperating would be optimising my local environment for some aggregate of my own values and the values of the person I’m playing with. So, it only works if there are positive-sum aggregates and if each player cares about what the other does to their local environment.
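A minimal sketch of that structure, with made-up payoff numbers (the payoff values and the weight `w` are illustrative assumptions, not anything from the ECL literature):

```python
# Toy symmetric PD where the "payout" is just the resulting world state:
# what happens in my local environment plus what happens in theirs.

# local_value[(my_action, their_action)] = value produced in MY local environment
local_value = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def my_total_value(my_action, their_action, w):
    """My values over world states: my environment plus theirs, weighted by w.
    w > 0 means I care about what happens in the other player's environment."""
    mine = local_value[(my_action, their_action)]
    theirs = local_value[(their_action, my_action)]  # symmetric game
    return mine + w * theirs

# If I care about their environment (w = 1) and treat our choices as correlated
# (they do what I do), mutual cooperation beats mutual defection: 6 > 2.
print(my_total_value("C", "C", w=1))  # 6
print(my_total_value("D", "D", w=1))  # 2
# With purely indexical values (w = 0), defection dominates as usual.
```

The two conditions in the comment show up directly: `w > 0` is “each player cares about what the other does to their local environment”, and the (C, C) cell beating (D, D) in aggregate is the “positive-sum aggregate”.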
Letting onlookers know that I responded in this comment thread
Thanks for the comment and I’m glad you like the post :)
On the other topic: I’m sorry, I’m afraid I can’t be very helpful here. I’d be somewhat surprised if I’d have had a good answer to this a year ago and certainly don’t have one now.
Some cop-out answers:
I often found reading his remarks about corrigibility (and his discussions with others in the comments) on posts focused on something else more useful for finding out whether his thinking changed on this than his blog posts that were obviously concentrating on corrigibility
You might have some luck reading through some of his newer blog posts and seeing if you can spot some mentions there
In case this was about “his current views” as opposed to “the views I tried to represent here, which are one year old”: The comments he left are from this summer, so you can get some idea from there/maybe assume that he endorses the parts I wrote that he didn’t comment on (at least in the first third of the doc or so, while he was still leaving comments)
FWIW, I just looked through my docs and found a “resources” doc with the following links under corrigibility:
Can corrigibility be learned safely?
Problems with amplification/distillation
Addressing three problems with counterfactual corrigibility
Not vouching for any of those being the most up-to-date or relevant ones. I’m pretty sure I made this list early on in the process, and it probably doesn’t represent what I considered the latest Paul-view.
Correct me if I’m wrong, but isn’t Conjecture legally a company? Maybe their profit model isn’t actually foundation models? I’m not actually trying to imply things; I just thought the wording was weird in that context and was wondering whether Conjecture has a different legal structure than I thought.