Sounds a little like StarWeb? Recently read a lovely article about a similar but different game, Monster Island, which was a thing from 1989 to 2017.
But yes, my default assumption would be that the particular conversation you’re referring to never resulted in a game that saw the light of day; I’ve seen many detailed game design discussions among people I’ve known meet the same fate.
Thanks, I agree that’s a better analogy. Though of course, it isn’t necessary that the employees (participants in a sandwiching project) be unaware of the CEO’s (sandwiching project overseer’s) goal; I was only highlighting that they need not be aware of it, in order to make it clear that the goals of the human helpers/judges aren’t especially relevant to what sandwiching, debate, etc. is really about. But of course if it turns out that having the human helpers know what the ultimate goal is helps, then they’re absolutely allowed to be in on it...

Perhaps this is a bit glib, but arguably some of the most profitable companies in the mobile game space have essentially built product assembly lines to churn out fairly derivative games that are nevertheless unique enough to do well on the charts, and they absolutely do it by factoring the project of “making a game” into different bits that are done by different people (programmers, artists, voice actors, etc.), some of whom might not have any particular need to know what the product will look like as a whole to play their part.
However, I don’t want to press too hard on this game example as you may or may not consider this ‘cognitive work’ and as it has other disanalogies with what we are actually talking about here. And to a certain degree I share your intuition that factoring certain kinds of tasks is probably very hard: if it weren’t, we might expect to see a lot more non-manufacturing companies whose main employee base consists of assembly lines (or hierarchies of assembly lines, or whatever) requiring workers with general intelligence but few specialized rare skills, which I think is the broader point you’re making in this comment. I think that’s right, although I also think there are reasons for this that go beyond just the difficulty of task factorization, and which don’t all apply in the HCH etc. case, as some other commenters have pointed out.
We start with some ML model which has lots of knowledge from many different fields, like GPT-n. We also have a human who has a domain-specific problem to solve (like e.g. a coding problem, or a translation to another language) but lacks the relevant domain knowledge (e.g. coding skills, or language fluency). The problem, roughly speaking, is to get the ML model and the human to work as a team, and produce an outcome at-least-as-good as a human expert in the domain. In other words, we want to factorize the “expert knowledge” and the “having a use-case” parts of the problem. ... This sort of problem comes up all the time in real-world businesses. We could just as easily consider a product designer at a tech startup (who knows what they want but little about coding), an engineer (who knows lots about coding but doesn’t understand what the designer wants)...
These examples conflate “what the human who provided the task to the AI+human combined system wants” with “what the human who is working together with the AI wants” in a way that I think is confusing and sort of misses the point of sandwiching. In sandwiching, “what the human wants” is implicit in the choice of task, but the “what the human wants” part isn’t really what is being delegated or factored off to the human who is working together with the AI; what THAT human wants doesn’t enter into it at all.

Using Cotra’s initial example to belabor the point: if someone figured out a way to get some non-medically-trained humans to work together with a mediocre medical-advice-giving AI in such a way that the output of the combined human+AI team is actually good medical advice, it doesn’t matter whether those non-medically-trained humans actually care that the result is good medical advice; they might not even individually know what the purpose of the system is, and just be focused on whatever their piece of the task is—say, verifying the correctness of individual steps of a chain of reasoning generated by the system, or checking that each step logically follows from the previous, or whatever.

Of course this might be really time intensive, but if you can improve even slightly on the performance of the original mediocre system, then hopefully you can train a new AI system to match the performance of the original AI+human system by imitation learning, and bootstrap from there.
The point, as I understand it, is that if we can get human+AI systems to progress from “mediocre” to “excellent” (in other words, to remain aligned with the designer’s goal)—despite the fact that the only feedback involved is from humans who wouldn’t even be mediocre at achieving the designer’s goal if they were asked to do it themselves—and if we can do it in a way that generalizes across all kinds of tasks, then that would be really promising. To me, it seems hard enough that we definitely shouldn’t take a few failed attempts as evidence that it can’t be done, but not so hard as to seem obviously impossible.
I just shared this info with an immune-compromised relative, thanks so much for this.
When I see young healthy people potentially obsessing, turning life into some sort of morbid probability matrix because one particular potential risk (Long Covid) has been made more salient and blameworthy, I sympathize a lot less.
ONS’s latest survey finds 2.8% of the UK population report that they are currently experiencing long COVID symptoms: 67% of that 2.8% report that the symptoms adversely affect their day-to-day activities. Separately, they’ve estimated that 70% of England has had COVID at least once; weighting their estimates for England/Scotland/Wales/NI suggests about 68% of the UK has had it. So conditional on having caught COVID at least once, we have ~3% of the population experiencing symptoms that adversely affect day-to-day activities for at least a month and often much longer. (Table 7 of the associated dataset implies that for each individual symptom, well over half have been experiencing those symptoms for “at least 12 weeks”, which is consistent with Fig 3 in this earlier survey.)
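The back-of-envelope arithmetic above can be checked in a few lines (the input figures are the ONS estimates as quoted; the calculation itself is mine):

```python
# ONS figures as quoted above (taken from the text, not re-derived)
p_current_long_covid = 0.028  # share of UK population currently reporting long COVID
p_affects_daily_life = 0.67   # of those, share reporting adverse day-to-day effects
p_ever_had_covid = 0.68       # estimated share of UK population ever infected

# P(long COVID with day-to-day impact | caught COVID at least once)
p_conditional = p_current_long_covid * p_affects_daily_life / p_ever_had_covid
print(f"{p_conditional:.1%}")  # prints 2.8%, i.e. the ~3% stated above
```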
Anyway, if every time or few times that I catch COVID equates to a ~3% chance of long covid that adversely affects my day-to-day activities for a long time, for me that’s high enough that it justifies having categories of things that I do less often than I used to, categories of things that I do while masked, and categories that I do with no precautions. We don’t generally go around criticizing people for “obsessing” when they take other slightly inconvenient actions to mitigate other low-probability risks (wearing seatbelts; having a diet composed of more healthy-but-less-delicious than unhealthy-and-more-delicious foods; cutting down on alcohol; etc.). So this constant criticism of people who are choosing to make changes to reduce their long COVID risk does rub me the wrong way.
The poster’s concern is with long COVID, which can certainly have effects that a lot of people would consider severe. The “severe” COVID that has a baseline of less than 1% for the young and healthy refers to COVID that requires hospitalization. Long Covid rates are higher.
I was slightly surprised to find that even fine-tuning GPT-Neo-125M for a long time on many sequences of letters followed by spaces, followed by a colon, followed by the same sequence in reverse, was not enough to get it to pick up the pattern—probably because the positional encoding vectors make the difference between e.g. “18 tokens away” and “19 tokens away” rather subtle. However, I then tried fine-tuning on a similar dataset with numbers in between (e.g. “1 W 2 O 3 R 4 D 5 S : 5 S 4 D 3 R 2 O 1 W”, or some similar representation—can’t remember exactly, but something roughly like that) and it picked up the pattern right away. Data representation matters a lot!
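For concreteness, here is a minimal sketch of the two data representations described above. The exact original layout isn’t recorded, so the token arrangement here is an illustrative guess:

```python
import random
import string

# Illustrative reconstruction of the two fine-tuning data formats described
# above; details (separators, lengths) are assumptions, not the original code.
def make_example(indexed: bool) -> str:
    letters = random.choices(string.ascii_uppercase, k=random.randint(3, 10))
    if indexed:
        # e.g. "1 W 2 O 3 R 4 D 5 S : 5 S 4 D 3 R 2 O 1 W"
        fwd = " ".join(f"{i + 1} {c}" for i, c in enumerate(letters))
        rev = " ".join(f"{i + 1} {c}" for i, c in reversed(list(enumerate(letters))))
    else:
        # e.g. "W O R D S : S D R O W"
        fwd = " ".join(letters)
        rev = " ".join(reversed(letters))
    return f"{fwd} : {rev}"

random.seed(0)
print(make_example(indexed=False))
print(make_example(indexed=True))
```

The indexed variant gives the model an explicit position marker next to each letter, so matching a letter to its mirror position no longer depends on subtle positional-encoding differences.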
Thanks for digging into this a bit, and I should have linked directly to the paper rather than to an article with the headline “Heart-disease risk soars after COVID” with a focus on relative risks, since as you say the absolute risks are very important for putting things into perspective. For what it’s worth, I agree with Zvi’s final conclusion (“That’s not nothing, but it’s not enough that you shouldn’t live your life”).

That said, an additional 1.2 out of every 100 people experiencing heart failure in the first 12 months after COVID-19 infection, if that holds up in reality, seems like it may have some effects at a population level (suggests that cardiologists will be in more demand, if nothing else). I can imagine that for some people with low risk tolerances, or with high preexisting cardiac risk, it might be a factor in wanting to live one’s life slightly differently than one did prepandemic.
It’d have been nice if they’d included a breakdown of when the vaccinated participants tested positive relative to when they were vaccinated. Supplementary table 21 notes that virtually no one was vaccinated prior to enrollment in the study, but that 62% in the COVID group (56% in the control group) had been vaccinated by the end.
Yes, this seems correct. Unfortunately it already sounds difficult to get out:
‘My future is taken away from me’: Russians flee to escape consequences of Moscow’s war | Russia | The Guardian
“Those seeking to leave faced a severe lack of available flights after western countries closed their airspace to Russian airlines. Moscow has also closed its airspace to much of the west in response.
Flights to Yerevan, Istanbul and Belgrade were completely sold out for the coming days while a one-way ticket to Dubai was priced at over £3,000 ($4,006) – compared with £250 ($334) in ordinary times – according to the flight aggregator Skyscanner. Train tickets from St Petersburg to Helsinki were also sold out on Thursday and Friday.”
Also sounds from the article like border officials are extensively questioning folks at the border, scrolling through any private chats that haven’t been deleted on messaging apps, etc. Be careful all.
Just to clarify—given that your first link seems concerned about athlete collapses/deaths following vaccination (supposedly, although the comments there imply insufficient fact-checking), but your second link is about athlete collapses/deaths following COVID-19 infection and your comment is on a post about long COVID, is your concern about heart issues following vaccination or following COVID infection?

If the latter, yes, heart disease and stroke do seem to be more probable following COVID infection according to this recent large study. It should be noted that the control group came from 2017, but the effect sizes they find are so large that it doesn’t seem like differences in average heart disease frequency between 2017 and 2022 in a counterfactual world without COVID are especially relevant.
The way that data set is presented is infuriating – there are tables that list raw counts without reference to the sample size (maybe it’s an estimated raw number for the whole country, in which case they’re quite small).
This is the UK Office for National Statistics—their usual is to report estimated numbers for the whole country. Easy to miss but it’s in thousands—scroll to the far right of each table with raw numbers and you’ll see that stated near the top. So Table 1 estimates 1,332,000 UK residents with Long COVID, which is in line with the 2% figure stated in Table 4 if we assume that it’s talking about the whole country.
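Sanity-checking that Table 1 estimate against the overall UK population (the population figure here is my assumption, roughly the 2021 number, not from the dataset):

```python
long_covid_estimate = 1_332_000  # Table 1 estimate quoted above
uk_population = 67_000_000       # rough 2021 UK population (my assumption)
print(f"{long_covid_estimate / uk_population:.1%}")  # prints 2.0%
```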
I presume this is listing their health conditions before Covid since it makes no sense the other way, but am still somewhat confused.
Footnote 7 says “Health/disability status is self-reported by study participants rather than clinically diagnosed. From February 2021 study participants were asked to exclude any symptoms related to COVID-19 when reporting their health/disability status. However, in practice it may be difficult for some participants to separate long COVID symptoms from unrelated exacerbation of pre-existing conditions, so these results should be treated with caution.”
What’s even stranger is this is now people who had Covid over 12 weeks ago, instead of the general population, and the estimate has gone down – 2.06% to 1.46%.
The title of the table can be parsed different ways, but I’m pretty sure that what this table is showing is, “Of people living in private households with self-reported long COVID, what proportion of them say that they first had COVID at least 12 weeks previously” (1.46%). We can see from Footnote 1 that the definition of Long Covid for this study was “Would you describe yourself as having ‘long COVID’, that is, you are still experiencing symptoms more than 4 weeks after you first had COVID-19, that are not explained by something else?” So presumably the remaining 98.54% of people with self-reported long COVID said that they first had COVID at least 4 weeks but less than 12 weeks previously.
The table with the 2.06% is saying, “Of all people in the country, what percentage of them have long COVID of any duration”, i.e., 4 weeks or longer. So I don’t think there’s a contradiction here.
Matt Bell was referencing the UK data set above so I have no idea how he can get 2.8%, and in fact my reading of the link says he has it somewhat lower than that but still strangely high.
I also tried and failed to figure out how he gets this number.
There is a separate study by the Office for National Statistics with controls (a later one than the one Matt Bell mentions, with different methodology) that I found useful—report is here—though annoyingly it doesn’t break the data down by individual symptoms. Figures 1 and 2 are also illustrative with respect to duration of symptoms. The report is pretty comprehensive but the data tables are here; Tables 1-4 show comparisons to controls.

Bottom line is summarized by the points at the top, reproduced below; note that only “Approach 3” uses self-reported long COVID:

“Approach 1: Prevalence of any symptom at a point in time after infection. Among study participants with COVID-19, 5.0% reported any of 12 common symptoms 12 to 16 weeks after infection; however, prevalence was 3.4% in a control group of participants without a positive test for COVID-19, demonstrating the relative commonness of these symptoms in the population at any given time.
Approach 2: Prevalence of continuous symptoms after infection. Among study participants with COVID-19, 3.0% experienced any of 12 common symptoms for a continuous period of at least 12 weeks from infection, compared with 0.5% in the control group; this estimate of 3.0% is based on a similar approach to the one we published in April 2021 (13.7%), but is substantially lower because of a combination of longer study follow-up time and updated statistical methodology. The corresponding prevalence estimate when considering only participants who were symptomatic at the acute phase of infection was 6.7%.
Approach 3: Prevalence of self-reported long COVID. An estimated 11.7% of study participants with COVID-19 would describe themselves as experiencing long COVID (based on self-classification rather than reporting one of the 12 common symptoms) 12 weeks after infection, and may therefore meet the clinical case definition of post-COVID-19 syndrome, falling to 7.5% when considering long COVID that resulted in limitation to day-to-day activities; these percentages increased to 17.7% and 11.8% respectively when considering only participants who were symptomatic at the acute phase of infection.”
UK’s ONS has a nice comparison with controls which shows a clear difference, see Fig 1. (Note that this release uses laboratory-confirmed COVID-19 only, unlike some of their other releases.)
Given that the early data I’ve seen suggests that efficacy of 3 doses vs. omicron is similar to that of 2 doses vs. delta—probably a bit lower, but at least in the same universe—I’ve been using it largely as is, multiplying the final output by 2 to 3 based on what I’ve seen about the household transmission rate of Omicron relative to Delta. I know some other boosted people who have used it in a similar fashion. There’s so much uncertainty in the model assumptions that its best use in my view is to get a very broad-strokes, order-of-magnitude idea of the risk, which has been extremely useful for friends and relatives who have just wanted a baseline idea of whether the risk of getting COVID when participating in a particular activity is more like .01% or .1% or 1% or 10%. (Note: I doubt that said friends and relatives would have been able to use it in this way without my help, since it requires a little math and they’re not math types.) So I guess my main recommendations would be:
- Don’t get rid of it even if you aren’t confident in the Omicron data—if you can produce results that are probably in the right order of magnitude, it’s still useful! If you aren’t up for a full Omicron overhaul, but you think there’s some back-of-the-envelope adjustment that could give results that are probably the correct order of magnitude, I think applying that—with suitable caveats about accuracy—would be preferable to taking the site down or leaving it as is.
- It’s easy to forget how many people are not math people whatsoever. Best practice in risk communication is often considered to be communicating numbers as percentages, as well as contextualized frequencies—not just ‘X-in-a-million’, but something like “X out of Y people (for context, Y is roughly the number of people living in Z)”—as there are a lot of people who don’t really understand percentages and need a little context to understand frequencies. In my ideal world the output would make the chance of getting COVID from this specific activity clear as a percentage and as a contextualized frequency, as well as the chance of getting COVID from this activity in a year under the assumption that you do this activity every N weeks, where N can be entered by the user.
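As an illustration of the kind of output I have in mind (this is my own sketch, not the site’s actual code; the function name, interface, and wording are all made up):

```python
# Hypothetical risk formatter: reports a per-event probability as a
# percentage, a contextualized "1 in N" frequency, and an annualized risk
# assuming the activity is repeated every `n_weeks` weeks.
def describe_risk(p_per_event: float, n_weeks: int = 1) -> str:
    events_per_year = 52 / n_weeks
    # chance of at least one infection over a year of repeated exposures
    p_yearly = 1 - (1 - p_per_event) ** events_per_year
    one_in = round(1 / p_per_event)
    return (f"Per event: {p_per_event:.2%} (about 1 in {one_in:,} people); "
            f"doing this every {n_weeks} week(s): {p_yearly:.1%} per year")

print(describe_risk(0.001, n_weeks=2))
```

A real version would also want the “for context, N is roughly the number of people living in Z” framing, but that requires a lookup table of reference populations.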
Thank you for such a comprehensive rundown! I’ve bookmarked this as I expect/hope to be in a situation in the future when this comes in handy.
I hate to say it, but the images are not coming through for me, as perhaps you’ve already noticed!
Really, though, shouldn’t we be able to do something to protect the elderly or other vulnerable people without causing everyone else six months of financial hardship and lost relationships?”

“Six months...” the man squirms. “I might need you to do this for a year or two.”
Not exactly a fair description of what the public health measures have been. What country has been in lockdown for “a year or two” (besides China)?

> The harms caused by COVID suppression were larger than the harms of COVID itself for most people.

Possibly, but I doubt the same can be said for the net hedon loss. The great-uncle who died of COVID may have been quite old, but he still probably had a few years ahead of him: an expected 11 if he was 75, or 6 if he was 85. Those are years his family misses out on spending with him as well. The 10% of those infected who are still experiencing symptoms after 12 weeks (not depression: most frequently fatigue, cough, headache, loss of taste, loss of smell, myalgia), most of whom are likely to still be experiencing these issues for another 12 weeks or more, are not mentioned, nor is the impact of this on their own lives and livelihoods.

Most importantly, this really seems to strawman our poor bureaucrat, as he doesn’t even mention the actual point of these measures: to serve as a stopgap until herd immunity, ideally primarily by vaccination so as to mitigate the above harms + further harms caused by hospital overload. Meanwhile, vaccination is the primary thing that our actual public health bureaucrats have been hammering on for the past year. I get the feeling that this isn’t discussed in this post because it doesn’t fit the narrative.

(edited to add context to initial quote)
The number of experiences I’ve had of reading an abstract and later finding that the results provided extraordinarily poor evidence for the claims (or alternatively, extraordinarily good evidence—hard to predict what I will find if I haven’t read anything by the authors before...) makes this system suspect. This seems partially conceded in the fictive dialogue (“You don’t even have to dig into the methodology a lot”)—but it helps to look at it at least a little. I knew a senior academic whose system was as follows: read the abstract (to see if the topic of the paper is of any interest at all) but don’t believe any claims in it; then skim the methodology and results and update based on that. This makes a bit more sense to me.
Relevant: From OpenAI’s “Training Verifiers To Solve Math Word Problems”: “We also note that it is important to allow the model to generate the full natural language solution before outputting a final answer. If we instead finetune a 6B model to directly output the final answer without any intermediate steps, performance drops drastically from 20.6% to 5.2%.” Also the “exploration” linked in the post, as well as my own little exploration restricted to modulo operations on many-digit numbers (via step-by-step long division!), on which LMs do very poorly without generating intermediate steps. (But see also Hendrycks et al.: “We also experiment with using step-by-step solutions. We find that having models generate their own step-by-step solutions before producing an answer actually degrades accuracy. We qualitatively assess these generated solutions and find that while many steps remain illogical, they are often related to the question. Finally, we show that step-by-step solutions can still provide benefits today. We find that providing partial ground truth step-by-step solutions can improve performance, and that providing models with step-by-step solutions at training time also increases accuracy.”)
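To give a flavor of what that modulo exploration involved (this is a fresh illustrative sketch, not the original code), a step-by-step long-division trace for “a mod m” can be generated digit by digit, the way one might construct intermediate-step training data:

```python
# Build a long-division style chain of intermediate steps for "a mod m".
# Each step carries the running remainder forward one digit at a time.
def mod_trace(a: int, m: int) -> str:
    steps = []
    rem = 0
    for digit in str(a):
        rem = rem * 10 + int(digit)
        q, r = divmod(rem, m)
        steps.append(f"bring down {digit}: {rem} = {q} * {m} + {r}")
        rem = r
    steps.append(f"answer: {a} mod {m} = {rem}")
    return "\n".join(steps)

print(mod_trace(123456, 7))
```

Asked to produce only the final remainder in one shot, small LMs fail badly; trained or prompted with traces like this, each step is a small, locally checkable operation.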
https://www.bbc.co.uk/news/health-57667987 - Hard to say what’s “likely” with this government, but it’s what the Joint Committee on Vaccination and Immunisation has advised