For the preference learning skepticism: does this extend to the research direction (not yet a research area) of modelling long-term preferences / preferences-on-reflection? This is more along the lines of the “AI-assisted deliberation” direction from ARCHES.

To me it seems like AI alignment that can capture preferences-on-reflection could be used to find solutions to many other problems. Though there are good reasons to expect that we’d still want to do other work (because we might need theoretical understanding and okay solutions before AI reaches the point where it can help with research, because we want to do work ourselves to be able to check the solutions that AIs reach, etc.).

It also seems like areas like FairML and Computational Social Choice will require preference learning as a component: my guess is that people’s exact preferences about fairness won’t have a simple mathematical formulation, and will instead need to be learned. I could buy the position that the necessary progress in preference learning will happen by default because of other incentives.
One thing I’d like to see is some more fleshed-out examples of the kinds of governance demands that you think might be important in the future and would be bottlenecked on research progress in these areas.
It seems that in principle a version of debate where only one agent makes statements and the other chooses which statements to expand could work, but it seems to require the judge to be very strict that each statement is 100% true. It seems hard to apply this kind of system to statements outside of formal mathematics.

Systems where both agents can make statements seem like they might be less vulnerable to judges accepting statements that aren’t 100% true. For one example, if both agents take turns being the arguer, then if both agents submit a path that is judged to be correct, you can stipulate that the agent with the shorter path wins (like imposing a simplicity prior).
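As a minimal sketch of that selection rule (all names here are hypothetical, and `judge_accepts` stands in for the human judge):

```python
# Hypothetical sketch of the symmetric variant: both agents submit an
# argument path, the judge checks each, and among the paths judged
# correct the shorter one wins (imposing a simplicity prior).

def pick_winner(paths, judge_accepts):
    """paths: dict mapping agent name -> list of statements.
    judge_accepts: callable returning True if a path is judged correct."""
    accepted = {agent: path for agent, path in paths.items()
                if judge_accepts(path)}
    if not accepted:
        return None  # neither agent produced an accepted argument
    # Shortest accepted path wins.
    return min(accepted, key=lambda agent: len(accepted[agent]))
```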
HCH could implement the decomposition oracle by searching over the space of all possible decompositions (it would just be quite expensive).
https://www.kialo.com/ lets people build debates on controversial topics in a hierarchical structure (more like stock debate, with both sides providing arguments), but doesn’t seem to have been used for explanations/arguments. I’d also be pretty interested to see more attempts at hierarchical explanations.
I think there are situations where you can still have subproblems where the output of the subproblem is long. A contrived example: suppose you have a problem where you want to calculate XOR(f(a), f(b)), where f(a) and f(b) are long strings. It seems reasonable to decompose into x=f(a), y=f(b), z=XOR(x, y), despite x and y being long, because there’s a simple way to combine them.

If we had an AI system that could work on “making progress on a problem for an hour”, then write down a complete description of everything it had figured out and pass that to another AI system, I’d count that as dividing the problem into subproblems, just in a way that’s probably inefficient.

I’d evaluate decompositions into subproblems by something like the total cost of solving a problem by dividing it into subproblems. Some decompositions would be efficient and others would be inefficient; sometimes this would be because the output is large, but in other cases it could be because it takes a long time to write the input, or because there’s a lot of work repeated between subproblems.
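A toy version of that contrived example (`f` here is an arbitrary stand-in that produces long outputs):

```python
import hashlib

# Toy version of the contrived decomposition: f produces long strings,
# but the combination step (XOR) is simple even though x and y are long.

def f(seed: bytes) -> bytes:
    """Stand-in for an expensive subcomputation with a long output."""
    return b"".join(hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
                    for i in range(1000))  # 32,000-byte output

def solve(a: bytes, b: bytes) -> bytes:
    x = f(a)  # subproblem 1: long output
    y = f(b)  # subproblem 2: long output
    # z = XOR(x, y): a simple way to combine the long outputs
    return bytes(p ^ q for p, q in zip(x, y))

print(solve(b"a", b"b")[:8].hex())
```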
Okay, that makes more sense now. My understanding is that for question X, answer Y from the ML system, and amplification system A, the verification in your quote is asking A to answer “Would A(Z) output answer Y to question X?”, as opposed to asking A to answer X and then checking whether its answer equals Y. This can be at most as hard as running the original system, and maybe could be much more efficient.
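In code, the contrast might look something like this (all names hypothetical; `amplify` stands in for A, and Z is whatever extra information A has access to):

```python
# Hypothetical sketch of the two protocols being contrasted.

def check_by_resolving(amplify, Z, X, Y):
    # Ask A(Z) the original question and compare with the ML system's
    # answer Y: always as hard as solving X from scratch.
    return amplify(Z, X) == Y

def check_by_verifying(amplify, Z, X, Y):
    # Ask A(Z) the verification question directly; this can be much
    # cheaper when verifying an answer is easier than producing one.
    return amplify(Z, f"Would A(Z) output answer {Y!r} to question {X!r}?")
```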
From the COVID Symptom Study in the UK (an app-based questionnaire): “10 per cent of those taking part in the survey had symptoms of long Covid for a month, with between 1.5 and 2 per cent still experiencing them after three months”, and they claim “long Covid is likely a bigger issue than excess deaths as a result of Covid, which are between 0.5 per cent and 1 per cent”.
It’s an app-based survey, so not necessarily representative of the population. It’s not clear how severe the 3-month cases are, though they state “The most common reported symptom has been described by doctors as ‘profound fatigue’”. The article also summarizes other related studies.
Right, but in the post the implicitly represented Z is used by an amplification or debate system, because it contains more information than a human can quickly read and use (so are you assuming it’s simple to verify the results of amplification/debate systems?)
for extremely large Z which are represented only implicitly as in Paul’s post, we might not always check whether the model matches the ground truth by actually generating the ground truth and instead just ask the human to verify the answer given Z
I’m not sure what “just ask the human to verify the answer given Z” looks like, for implicitly represented Z
I’m skeptical of this.
Wuhan needed 2 months of lockdown: https://en.wikipedia.org/wiki/COVID-19_pandemic_lockdown_in_Hubei
I’d expect that imposing China-style lockdowns in the West would require significant force and might end up causing a large-scale panic in and of itself.
I’d expect that any lockdown in the West wouldn’t have been effective enough to stamp out 100% of cases, and if you don’t eradicate it then you need ongoing measures or it will just flare up again later, so one strictly enforced lockdown wouldn’t cut it. (Though maybe you could do very rigorous contact tracing and lock down just the people who might have been in contact with cases, which could be less costly than a full lockdown but would probably still need significant enforcement.)
https://www.microcovid.org/paper/2-riskiness#fn6 discusses https://covid.joinzoe.com/post/covid-long-term, an app-based survey claiming 1 in 10 people still have symptoms after 3 weeks (but since people can just sign up for the app, I’d guess this is harder to interpret than the telephone survey). Microcovid.org uses this 1-in-10 figure as the estimate for the chance of some ongoing health consequence, and claims the risk of ongoing health problems from a 1% chance of COVID is equivalent to the risk from 1 year of driving (but this comparison involves even more assumptions).
https://www.cdc.gov/mmwr/volumes/69/wr/mm6930e1.htm found that ~1 in 5 18-34-year-olds with no underlying health conditions had symptoms 3 weeks later (a telephone survey of people who’d been symptomatic and had a positive test).
Other discussion in comments of https://www.lesswrong.com/posts/ahYxBHLmG7TiGDqxG/do-we-have-updated-data-about-the-risk-of-permanent-chronic
This study isn’t addressing fatigue and is still in progress, but it is looking for long-term neurological problems, and might be another weak bit of evidence when it releases results: https://www.cambridgebrainsciences.com/studies/covid-brain-study
Seems like it could be helpful if people who’ve thought about this would also predict on the question of what the survey value would be today (e.g. via Elicit snapshots).
Google’s Meena (2.6 billion parameters, February 2020) creates original puns in “Conversation D”, and I think “Cross-turn Repetition Example 2” is absurdist comedy, but maybe more as a result of the way the model fails.
Human: I like how cute cows are
Meena: Ah, interesting. You like how they are cute, right?
Human: yeah, and they’re realy smart
Meena: I heard they go to college
Human: cows go to college?
Meena: I heard that a cow went to Harvard.
Human: what did the cow study?
Meena: Bovine sciences.
Human: do horses go to Harvard?
Meena: Horses go to Hayvard.
Human: that’s a pretty good joke, I feel like you led me into it
Meena: You were trying to steer it elsewhere, I can see it.
Are there any sources that describe why 14 months is the trial period (or provide justification for picking trial periods of various lengths)?
Seems like it ought to be more of a continuous variable, rather than this discrete 14-month trial: at time t, we’ve observed x people for y months to see if they have weird long-term side effects, so we should be willing to vaccinate z more people.
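As a toy illustration of what such a continuous rule could look like (this is my own hypothetical model, not from any source: it uses the rule-of-three upper bound for zero observed events, plus a made-up harm budget):

```python
# Toy model of a continuous alternative to a fixed 14-month trial.
# Assumption: x trial participants observed for y months with zero
# serious long-term side effects. The "rule of three" gives ~3/x as an
# approximate 95% upper bound on the rate of side effects that show up
# within y months. We then cap expected harms among new vaccinees.

def max_additional_vaccinations(x: int, harm_budget: float) -> float:
    """Return z, how many more people we'd vaccinate at time t.

    Keeping (expected harms) <= z * (3/x) below harm_budget gives
    z <= harm_budget * x / 3. The bound only covers side effects that
    manifest within the y months observed so far.
    """
    p_upper = 3.0 / x  # ~95% upper bound on the side-effect rate
    return harm_budget / p_upper

# e.g. 10,000 participants observed with no long-term side effects,
# tolerating at most 100 expected harms:
print(max_additional_vaccinations(10_000, 100.0))  # ~333,333 people
```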
The Chrome extension Netflix Party lets you synchronize playing the same video on Netflix with other people, which you can use along with Skype to watch something together.
(You can always fall back to counting down “3,2,1” to start playing the video at the same time, but the experience is nicer if you ever need to pause and resume)
The worry I’d have about this interpretability direction is that we become very good at telling stories about what 95% of the weights in neural networks do, but the remaining 5% hides some important stuff, which could end up including things like mesa-optimizers or deception. Do you have thoughts on that?