Vote up this comment if you would be most likely to read a post on Less Wrong or another friendly blog.
I appreciate the effort, and I agree with most of the points made, but I think resurrect-LW projects are probably doomed unless we can get a proactive, responsive admin/moderation team. Nick Tarleton talked about this a bit last year:
“A tangential note on third-party technical contributions to LW (if that’s a thing you care about): the uncertainty about whether changes will be accepted, uncertainty about and lack of visibility into how that decision is made or even who makes it, and lack of a known process for making pull requests or getting feedback on ideas are incredibly anti-motivating.” (http://lesswrong.com/lw/n0l/lesswrong_20/cy8e)
That’s obviously problematic, but I think it goes way beyond just contributing code. As far as I know, right now, there’s no one person with both the technical and moral authority to:
set the rules that all participants have to abide by, and enforce them
decide principles for what’s on-topic and what’s off-topic
receive reports of trolls, and warn or ban them
respond to complaints about the site not working well
decide what the site features should be, and implement the high-priority ones
Pretty much any successful subreddit, even smallish ones, will have a team of admins who handle this stuff, and who can be trusted to look at things that pop up within a day or so (at least collectively). The highest intellectual-quality subreddit I know of, /r/AskHistorians, has extremely active and rigorous moderation, to the extent that a majority of comments are often deleted. Since we aren’t on Reddit itself, I don’t think we need to go quite that far, but there has to be something in place.
I think saying “we” here dramatically over-indexes on personal observation. I’d bet that most overweight Americans have never eaten only untasty food for an extended period (say, longer than a month); and that those who have tried it found it sucked and stopped doing it. Only eating untasty food really sucks! For comparison, everyone knows that smoking is awful for your health; it’s expensive, leaves bad odors, and so on. And I’d bet that most smokers would find “never smoke again” easier and more pleasant (in the long run) than “never eat tasty food again”. Yet the vast majority of smokers continue smoking:
https://news.gallup.com/poll/156833/one-five-adults-smoke-tied-time-low.aspx
I edited the MNIST bit to clarify, but a big point here is that there are tasks where 99.9% is “pretty much 100%” and tasks where it’s really really not (eg. operating heavy machinery); and right now, most models, datasets, systems and evaluation metrics are designed around the first scenario, rather than the second.
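To make that gap concrete, here’s a quick back-of-envelope sketch in Python; the accuracy figures and the decisions-per-day count are illustrative assumptions, not real measurements:

```python
# Chance of getting through a day with zero errors, for a "99.9% accurate"
# system vs. a far more reliable one. The accuracy figures and the
# decisions-per-day count are illustrative assumptions.

decisions_per_day = 10_000  # e.g. a busy perception or control system

for accuracy in (0.999, 0.9999999):
    p_error_free_day = accuracy ** decisions_per_day
    print(f"accuracy={accuracy}: P(error-free day) = {p_error_free_day:.6f}")

# accuracy=0.999:      P(error-free day) ~ 0.000045 -- an error is near-certain
# accuracy=0.9999999:  P(error-free day) ~ 0.999    -- errors are genuinely rare
```

For MNIST-style benchmarks the first row is fine; for heavy machinery it isn’t.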
Intentional murder seems analogous to misalignment, not error. If you count random suicides as bugs, you get a big numerator but an even bigger denominator; the overall US suicide rate is ~1:7,000 per year, and that includes lots of people with awful chronic health problems. If you assume a 1:20,000 random suicide rate, and that 40% of people can kill themselves in a minute (roughly the US gun ownership rate), then the odds of doing it per waking-minute decision are ~1:(20,000 * 60 * 16 * 365 * 0.4) = 1:3,000,000,000; i.e., the rate of not doing it is ~99.99999997%.
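For anyone who wants to check the arithmetic, here is the same estimate spelled out in Python (all inputs are the rough assumptions above, not precise statistics):

```python
# Back-of-envelope check of the estimate above. All inputs are the
# rough assumptions from the comment, not precise statistics.

annual_suicide_odds = 1 / 20_000          # assumed "random" suicide rate per year
waking_minutes_per_year = 60 * 16 * 365   # 16 waking hours a day
fraction_with_means = 0.4                 # ~US gun ownership rate

# Spread the annual rate over every waking minute of the people who
# could act on the decision within a minute.
per_minute_odds = annual_suicide_odds / (waking_minutes_per_year * fraction_with_means)

print(f"1 in {1 / per_minute_odds:,.0f}")     # 1 in 2,803,200,000 (~3 billion)
print(f"{(1 - per_minute_odds) * 100:.8f}%")  # ~99.99999996% (the text rounds
                                              # the denominator to 3 billion)
```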
You say “yet again”, but random pilot suicides are incredibly rare! Wikipedia counts eight on commercial flights in the last fifty years, out of a billion or so total flights, and some of those cases are ambiguous and it’s not clear what happened: https://en.wikipedia.org/wiki/Suicide_by_pilot
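Put as a per-flight rate, using the same ballpark numbers:

```python
# ~8 pilot-suicide crashes over ~1 billion commercial flights in 50 years.
crashes = 8
flights = 1_000_000_000
print(f"~1 in {flights // crashes:,} flights")  # ~1 in 125,000,000
```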
This is just a guess, but I think CFAR and the CFAR-sphere would be more effective if they focused more on hypothesis generation (or “imagination”, although that term is very broad). Eg., a year or so ago, a friend of mine in the Thiel-sphere proposed starting a new country by hauling nuclear power plants to Antarctica, and then just putting heaters on the ground to melt all the ice. As it happens, I think this is a stupid idea (hot air rises, so the newly heated air would just blow away, pulling in more cold air from the surroundings). But it is an idea, and the same person came up with (and implemented) a profitable business plan six months or so later. I can imagine HJPEV coming up with that idea, or Elon Musk, or von Neumann, or Google X; I don’t think most people in the CFAR-sphere would. It’s just not the kind of thing I think they’ve focused on practicing.
Clients are free to publish whatever they like, but we are very strict about patient confidentiality, and do not release any patient information without express written consent.
I think it’s true, and really important, that the salience of AI risk will increase as the technology advances. People will take it more seriously than they have before; I see that all the time in random personal conversations. But being more concerned about a problem doesn’t imply the ability to solve it. It won’t increase your base intelligence stats, or suddenly give your group new abilities or plans that it didn’t have last month. I’ll elide the details because it’s a political debate, but just last week, I saw a study finding that whenever one problem got lots of media attention, the “solutions” people tried wound up making the problem worse the next year. High salience is an important tool, but it is nowhere near sufficient, and can even be outright counterproductive.
Hey! Thanks for writing all of this up. A few questions, in no particular order:
The CFAR fundraiser page says that CFAR “search[es] through hundreds of hours of potential curricula, and test[s] them on smart, caring, motivated individuals to find the techniques that people actually end up finding useful in the weeks, months and years after our workshops.” Could you give a few examples of curricula that worked well, and curricula that worked less well? What kind of testing methodology was used to evaluate the results, and in what ways is that methodology better (or worse) than methods used by academic psychologists?
One can imagine a scale for the effectiveness of training programs. Say, 0 points is a program where you play Minesweeper all day, and 100 points is a program that could take randomly chosen people and make them as skilled as Einstein, Bismarck, or von Neumann. Where would CFAR rank its workshops on this scale, and how much improvement does CFAR feel like there has been from year to year? Where on this scale would CFAR place other training programs, such as MIT grad school, Landmark Forum, or popular self-help/productivity books like Getting Things Done or How to Win Friends and Influence People? (One could also choose different scale endpoints, if mine seem suboptimal.)
While discussing goals for 2015, you note that “We created a metric for strategic usefulness, solidly hitting the first goal; we started tracking that metric, solidly hitting the second goal.” What does the metric for strategic usefulness look like, and how has CFAR’s score on the metric changed from 2012 through now? What would a failure scenario (ie. where CFAR did not achieve this goal) have looked like, and how likely do you think that failure scenario was?
CFAR places a lot of emphasis on “epistemic rationality”, or the process of discovering truth. What important truths have been discovered by CFAR staff or alumni, which would probably not have been discovered without CFAR, and which were not previously known by any of the staff/alumni (or by popular media outlets)? (If the truths discovered are sensitive, I can post a GPG public key, although I think it would be better to openly publish them if that’s practical.)
You say that “As our understanding of the art grew, it became clear to us that “figure out true things”, “be effective”, and “do-gooding” weren’t separate things per se, but aspects of a core thing.” Could you be more specific about what this cashes out to in concrete terms; ie. what the world would look like if this were true, and what the world would look like if this were false? How strong is the empirical evidence that we live in the first world, and not the second? Historically, adjusted for things we probably can’t change (like eg. IQ and genetics), how strong have the correlations been between truth-seeking people like Einstein, effective people like Deng Xiaoping, and do-gooding people like Norman Borlaug?
How many CFAR alumni have been accepted into Y Combinator, either as part of a for-profit or a non-profit team, after attending a CFAR workshop?
Good to ask, but I’m not sure what it would be. The code is just a linear regression I did in a spreadsheet, and eyeballing the data points, it doesn’t look like there are any patterns that a regression is missing. I tried it several different ways (comparing to different smaller models, comparing to averages of smaller models, excluding extreme values, etc.) and the correlation was always zero. Here’s the raw data:
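(A minimal sketch of the kind of check described, in Python; the arrays below are hypothetical placeholders, not the actual raw numbers.)

```python
import numpy as np
from scipy import stats

# Hypothetical placeholder data, standing in for the spreadsheet columns.
small_model_score = np.array([0.12, 0.35, 0.48, 0.61, 0.70])
large_model_gain = np.array([0.40, 0.11, 0.52, 0.23, 0.38])

# Plain linear regression, as done in the spreadsheet.
fit = stats.linregress(small_model_score, large_model_gain)
print(f"r = {fit.rvalue:.3f}, p = {fit.pvalue:.3f}")

# One of the robustness checks mentioned: drop the extreme x-values and refit.
interior = (small_model_score > small_model_score.min()) & \
           (small_model_score < small_model_score.max())
trimmed = stats.linregress(small_model_score[interior], large_model_gain[interior])
print(f"trimmed r = {trimmed.rvalue:.3f}")
```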
It’s hard to know if there is some critical bug in all the results being reported in the Gopher, Chinchilla, and PaLM papers, since we don’t have access to the models, but it would surprise me if a bug like that went uncaught across multiple independent teams.
Eliezer’s writeup on corrigibility has now been published (the posts below by “Iarwain”, embedded within his new story Mad Investor Chaos). That said, you might not want to look at it if you’re still writing your own version and don’t want to be anchored by his ideas.
Although I largely agree, there is little actual experimental support for Maslow’s theory. He mostly just made it up. See http://lesswrong.com/lw/2j/schools_proliferating_without_evidence/ . See also eg.:
“The uncritical acceptance of Maslow’s need hierarchy theory despite the lack of empirical evidence is discussed and the need for a review of recent empirical evidence is emphasized. A review of ten factor-analytic and three ranking studies testing Maslow’s theory showed only partial support for the concept of need hierarchy. A large number of cross-sectional studies showed no clear evidence for Maslow’s deprivation/domination proposition except with regard to self-actualization. Longitudinal studies testing Maslow’s gratification/activation proposition showed no support, and the limited support received from cross-sectional studies is questionable due to numerous measurement problems.”
from “Maslow reconsidered: A review of research on the need hierarchy theory”, by Wahba and Bridwell.
“If you don’t teach your children the One True Religion, you’re a lousy parent.”
Given that the One True Religion is actually correct, wouldn’t you, in fact, be a lousy parent if you did not teach it? Someone who claims to be a Christian and yet doesn’t teach their kids about Christianity is, under their incorrect belief system, condemning them to an eternity of torture, which surely qualifies as being a lousy parent in my book.
EDIT: This comment described a bunch of emails between me and Leverage that I think would be relevant here, but I misremembered something about the thread (it was from 2017), and I’m not sure if I should post the full text so people can get the most accurate info (see below discussion), so I’ve deleted it for now. My apologies for the confusion.
I was including tech support under “admin/moderation”; obviously, the ability to eg. IP ban people is important (along with access to the code and the database generally). Sorry for any confusion.
“I had not remembered, until that time, how the Roman Empire rose, and brought peace and order, and lasted through so many centuries, until I forgot that things had ever been otherwise; and yet the Empire fell, and barbarians overran my city, and the learning that I had possessed was lost. The modern world became more fragile to my eyes; it was not the first modern world.”
I think the Romans, at least the more philosophical and intellectual ones, were perfectly well aware that this would happen to them eventually. After the fall of Carthage:
“Scipio, when he looked upon the city as it was utterly perishing and in the last throes of its complete destruction, is said to have shed tears and wept openly for his enemies. After being wrapped in thought for long, and realizing that all cities, nations, and authorities must, like men, meet their doom; that this happened to Ilium, once a prosperous city, to the empires of Assyria, Media, and Persia, the greatest of their time, and to Macedonia itself, the brilliance of which was so recent, either deliberately or the verses escaping him, he said:
A day will come when sacred Troy shall perish,
And Priam and his people shall be slain.
And when Polybius speaking with freedom to him, for he was his teacher, asked him what he meant by the words, they say that without any attempt at concealment he named his own country, for which he feared when he reflected on the fate of all things human. Polybius actually heard him and recalls it in his history.”—Appian, Punica
Upvote this post if you support moving it back to 50.
“As far as how to optimally distribute money to charity, that is very much an unsolved problem, but I think it’s one that we can mostly worry about when we get that far.”
I like the rest of your proposal, but I seriously think we need to look more carefully at this part. Once a billion dollars is already on the line, it’s worthwhile for large charities that won’t do very much good to spend $100M on marketing for a 12% chance at getting it, which does no one any good at all (except the marketing companies). If we make the decision beforehand, even if it is completely arbitrary (eg., we take all the charities recommended by GiveWell and put them on a giant roulette wheel), then charities won’t spend large amounts of money competing amongst themselves for it, which would defeat the original purpose.
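The incentive arithmetic is straightforward; here is a tiny sketch (the 12% chance is just the illustrative figure from above):

```python
# Why pre-committing matters: while the prize is still undecided, a big
# marketing campaign is individually rational for each large charity.
prize = 1_000_000_000          # $1B on the line
marketing_spend = 100_000_000  # $100M campaign
win_probability = 0.12         # illustrative chance the campaign wins the prize

expected_profit = win_probability * prize - marketing_spend
print(f"Expected profit from campaigning: ${expected_profit:,.0f}")  # $20,000,000

# Positive expected value for the charity, but the $100M itself is pure
# deadweight loss; fixing the recipient in advance (even by lottery)
# removes the incentive entirely.
```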
Since this blog post was made, more than 25 donors have donated a total of more than $1,500 to the Singularity Research Challenge. I would like to thank each and every one of you for your generosity.
EDIT: It’s now 35 donors and $3,000.
EDIT 2: It’s now 40 donors and $3,500.
EDIT 3: It’s now 45 donors and $4,500.
Vote up this comment if you would be most likely to read an academic paper, downloadable over the Internet as a PDF.
Fantastic post! I agree with most of it, but I notice that Eliezer’s post has a strong tone of “this is really actually important, the modal scenario is that we literally all die, people aren’t taking this seriously and I need more help”. More measured or academic writing, even when it agrees in principle, doesn’t have the same tone or feeling of urgency. This has good effects (shaking people awake) and bad effects (panic/despair), but it’s a critical difference and my guess is the effects are net positive right now.