# Jsevillamol

Karma: 458
• Thank you for the feedback, I think what you say makes sense.

I’d be interested in seeing whether we can pin down in exactly what sense Switch parameters are “weaker”. Is it because of the lower precision? Because of model sparsity (and is Switch sparse in its parameters, or just sparsely activated)?

What typology of parameters do you think would make sense / be useful to include?

# [Question] Parameter count of ML systems through time?

19 Apr 2021 12:54 UTC
31 points
• re: “I’d expect experts to care more about the specific details than I would”

Good point. We tried to account for this by making it so that the experts do not have to agree or disagree directly with each sentence but instead choose the least bad of two extreme positions.

But in practice one of the experts bypassed the system by refusing to answer Q1 and Q2 and leaving an answer in the space for comments.

• Street fighting math:

Let’s model experts as independent draws of a binary random variable with bias $P$. Our initial prior over the chance of an expert choosing the pro-uniformity option (i.e. over $P$) is uniform. Then, if our sample has $A$ people who chose the pro-uniformity option and $B$ people who chose the anti-uniformity option, we update our beliefs over $P$ to a $\mathrm{Beta}(1+A, 1+B)$, per the usual Laplace’s rule calculation.

To scale this up to a sample of, e.g., $n$ people, we compute the mean of $n$ independent draws of a $\mathrm{Bernoulli}(P)$, where $P$ is drawn from the posterior Beta. By the central limit theorem, this mean is approximately normal with mean $P$ and variance equal to the variance of the Bernoulli divided by $n$, i.e. $\frac{1}{n}P(1-P)$.

We can use this to compute the approximate probability that the majority of experts in the expanded sample will be pro-uniformity, by integrating the probability that this normal exceeds $1/2$ over the possible values of $P$.

So for example we have $A=1$, $B=3$ in Q1, so for a survey of $n=100$ participants we can approximate the chance of the majority selecting option $A$ as:

```python
import numpy as np
import scipy.stats as stats

A = 1    # pro-uniformity responses
B = 3    # anti-uniformity responses
n = 100  # size of the hypothetical expanded survey

posterior = stats.beta(A + 1, B + 1)  # Laplace's rule posterior over P
ps = np.linspace(0.0001, 0.9999, 10_000)

# For each P, the mean of n Bernoulli(P) draws is approximately
# Normal(P, P(1-P)/n); integrate its upper tail against the posterior.
tail = 1 - stats.norm.cdf(1 / 2, loc=ps, scale=np.sqrt(ps * (1 - ps) / n))
print(np.mean(tail * posterior.pdf(ps)))
```

which gives about $0.19$.

For Q2 we have $A=1$, $B=4$, so the probability of the majority selecting option $A$ is about $0.12$.

For Q3 we have $A=6$, $B=0$, so the probability of the majority selecting option $A$ is about $0.99$.
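The Q2 and Q3 figures can be reproduced with the same calculation wrapped in a helper (a sketch of the model above; the function name and structure are mine):

```python
import numpy as np
import scipy.stats as stats

def majority_prob(A, B, n=100, grid=10_000):
    """Approximate P(majority of n respondents picks option A), given
    A pro and B anti responses and a uniform prior over the bias P."""
    posterior = stats.beta(A + 1, B + 1)  # Laplace's rule
    ps = np.linspace(0.0001, 0.9999, grid)
    # CLT: the mean of n Bernoulli(P) draws is ~ Normal(P, P(1-P)/n)
    tail = 1 - stats.norm.cdf(1 / 2, loc=ps, scale=np.sqrt(ps * (1 - ps) / n))
    return np.mean(tail * posterior.pdf(ps))  # integrate over the posterior

print(majority_prob(1, 3))  # Q1, ~0.19
print(majority_prob(1, 4))  # Q2, ~0.12
print(majority_prob(6, 0))  # Q3, ~0.99
```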

EDIT: rephrased the estimates so they match the probabilities one would enter in the Elicit questions

# Survey on cortical uniformity—an expert amplification exercise

23 Feb 2021 22:13 UTC
37 points
• re: importance of oversight

I do not think we really disagree on this point. I also believe that looking at the state of the computer is not as important as having an understanding of how the program is going to operate and how to shape its incentives.

re: How quantum computing will affect ML

I basically agree that the most plausible way QC can affect AI alignment is by providing computational speedups—but I think this mostly changes the timelines rather than violating any specific assumptions in usual AI alignment research.

Relatedly, I am skeptical that we will see better-than-quadratic speedups (i.e. beyond Grover’s quadratic speedup): to get better-than-quadratic speedups you need to overcome many challenges that, for now, it is not clear can be overcome outside of very contrived problem setups [REF].

In fact, I think the speedups will not even be quadratic, because you “lose” the quadratic speedup when parallelizing quantum computation (in the sense that the speedup does not scale quadratically with the number of cores).

• Suggestion 1: Utility != reward by Vladimir Mikulik. This post attempts to distill the core ideas of mesa alignment. This kind of distillation increases the surface area of AI Alignment, which addresses one of the key bottlenecks of the field (getting people familiarized with it, motivated to work on it, and equipped with some open questions to work on). I would like an in-depth review because it might help us learn how to do this better!

Suggestion 2: my coauthor Pablo Moreno and I would be interested in feedback on our post about quantum computing and AI alignment. We do not think the ideas in the paper are useful in the sense of getting us closer to AI alignment, but I think it is useful to have signposts explaining why avenues that might seem attractive to people coming into the field are not worth exploring, while introducing those people to the field in a familiar way (in this case, our audience is quantum computing experts). One thing that confuses me: some people have approached me after we published the post asking why I think quantum computing is useful for AI alignment, so I’d be interested in feedback on what went wrong in the communication process, given the deflationary nature of the article.

• Amazing initiative, John—you might give yourself a D but I am giving you an A+ no doubt.

Trying to decide if I should recommend this to my family.

In Spain, we had 18,000 confirmed COVID cases in January 2021. I assume real cases are at least 20,000. Some projections estimate that laypeople might not get vaccinated for another 10 months, so the potential benefit of a widespread DIY vaccine is avoiding ~200k cases of COVID-19 (optimistically assuming linear growth of cases).

Spain’s population is 47 million, so the naïve chance of COVID for an individual before vaccines are widely available is 2e4 × 10 / 5e7, i.e. about 1 in 250.

Let’s say the DIY vaccine has a 10% chance of working on a given individual. If we take the side effects of the vaccine to be as bad as catching COVID-19 itself, then I want the chance of a serious side effect to be lower than 1 in 2500 for the DIY vaccine to be worth it.

Taking into account the risk of preparing it incorrectly plus general precaution, the chances of a serious side effect look to me more like 1 in 100 than 1 in 1000.

So I do not think, given my beliefs, that I should recommend it. Is this reasoning broadly correct? What is a good baseline for the chances of a side effect in a new peptide vaccine?
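Laying out the arithmetic above explicitly (every number is an assumption stated in the comment, not data):

```python
# Back-of-the-envelope check for the DIY vaccine decision.
monthly_cases = 2e4        # assumed real cases per month in Spain
months_to_vaccine = 10     # projected wait until widespread vaccination
population = 5e7           # Spain's ~47M population, rounded as in the comment

p_covid = monthly_cases * months_to_vaccine / population
p_vaccine_works = 0.10     # assumed DIY vaccine efficacy

# If a serious side effect is as bad as catching COVID-19, the vaccine is
# worth taking only when p_side_effect < p_covid * p_vaccine_works.
break_even = p_covid * p_vaccine_works
print(f"P(covid) ~ 1 in {1 / p_covid:.0f}")                        # 1 in 250
print(f"break-even side-effect risk ~ 1 in {1 / break_even:.0f}")  # 1 in 2500
```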

• This post is great! I love the visualizations. And I hadn’t made the explicit connection between iterated convolution and CLT!

# [Question] Critiques of the Agent Foundations agenda?

24 Nov 2020 16:11 UTC
15 points
• I don’t think so.

What I am describing is a strategy for managing your efforts so as to spend as little as possible while still meeting your goals (when you do not know in advance how much effort will be needed to solve a given problem).

So presumably if this heuristic applies to the problems you want to solve, you spend less on each problem and thus you’ll tackle more problems in total.
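A toy model of the heuristic (my own illustration, not from the post): if you double your effort on every attempt, the total effort spent across all attempts stays within a factor of 4 of the unknown effort the problem actually required.

```python
def total_effort(required):
    """Total effort spent if each attempt doubles the previous one,
    stopping at the first attempt that meets the unknown requirement."""
    effort, total = 1, 0
    while True:
        total += effort
        if effort >= required:
            return total
        effort *= 2

# The doubling strategy never overspends by more than 4x:
assert all(total_effort(r) < 4 * r for r in range(1, 1000))
```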

# Spend twice as much effort every time you attempt to solve a problem

15 Nov 2020 18:37 UTC
48 points
• I think this helped me understand you a bit better—thank you

Let me try paraphrasing this:

> Humans are our best example of a sort-of-general intelligence. And humans have a lazy, satisficing, ‘small-scale’ kind of reasoning that is mostly only well suited to activities close to their ‘training regime’. Hence AGIs may be the same—and in particular, if AGIs are trained with reinforcement learning and heavily rewarded for following human intentions, this may be a likely outcome.

Is that pointing in the direction you intended?

1. Break the door with your shoulders

2. Use the window

3. Break the wall with your fists

4. Scream for help until somebody comes

5. Call a locksmith

6. Light a paper on fire to trigger the smoke alarm and wait for the firemen to rescue you

7. Hide in the closet and wait for your captors to come back—then run for your life

8. Discover how to time travel—time travel forward into the future until there is no room

9. Wait until the house becomes old and crumbles

10. Pick the lock with a paperclip

11. Shred the bed into a string, pass it through the pet door, lasso the lock and open it

12. Google how to make a bomb and blast the wall

13. Open the door

14. Wait for somebody to pass by, attract their attention by hitting the window, and ask for help by writing on a notepad

15. Write your location on a paper and slide it under the door, hoping it will find its way to someone who can help

16. Use the vents

17. Use that handy secret door you built a while ago, the one your wife called you crazy for building

18. Send a message through the internet asking for help

19. Order a pizza, ask for help when they arrive

20. Burn the door

21. Melt the door with a smelting tool

22. Shoot at the lock with a gun

23. Push against the door until you quantum tunnel through it

24. Melt the lock with the Breaking Bad melting lock stuff (probably google that first)

25. There is no door—overcome your fears and cross the emptiness

26. Split your mattress in half with a kitchen knife, fit the split mattress through the window to make a landing spot and jump onto it

27. Make a paper plane with instructions for someone to help and throw it out of the window

28. Make a rope with your duvet and slide yourself down to the street

29. Make a makeshift glider with your duvet and jump out of the window—hopefully it will slow you down enough to not die

30. Climb out of the window and into the next room

31. Dig the soil under the door until you can fit through

33. Break the window with a chair and climb outside

34. Grow a tree under the door and let it lift the door for you

35. Use a clothes hanger to slide along the clothesline between your building and your neighbour’s. Apologize to the neighbour for disrupting their sleep.

36. Hit the ceiling with a broom to make the house rats come out. Attach a message to them and send them back into their hole, and on to your neighbour

37. Meditate until somebody opens the door

38. Train your flexibility for years until you fit through the dog door

39. Build a makeshift battering ram with the wooden frame of the bed

40. Unmount the hinges with a screwdriver and remove the door

41. Try random combinations until you find the password

42. Look for the key over the door frame

43. Collect dust and blow it over the numpad. The dust sticks to the three greasiest digits. Try the 6 possible combinations until the door opens.

44. Find the model number of the lock. Call the manufacturer pretending to be the owner. Wait five minutes while listening to hold music. Explain you are locked in. Realize you are talking to an automated receptionist. Ask to talk to a real person. Explain you are locked in. Follow all instructions.

45. Do not be in the room in the first place

46. Try figuring out if you really need to escape in the first place

47. Swap consciousness with the other body you left outside the room

48. Complain to your captor that the room is too small and you are claustrophobic. Hope they are understanding.

49. Pretend to have a heart attack, wait for your captor to carry you outside

50. Check out ideas on how to escape in the LessWrong babble challenge

• Let me try to paraphrase this:

In the first paragraph you are saying that “seeking influence” is not something that a system will learn to do if that was not a possible strategy in the training regime. (but couldn’t it appear as an emergent property? Certainly humans were not trained to launch rockets—but they nevertheless did?)

In the second paragraph you are saying that common sense sometimes allows you to modify the goals you were given (but for this to apply to AI systems, wouldn’t they need to have common sense in the first place, which kind of assumes that the AI is already aligned?)

In the third paragraph it seems to me that you are saying that humans have some goals with a built-in override mechanism—e.g. in general humans have a goal of eating delicious cake, but they will forego this goal in the interest of seeking water if they are about to die of dehydration (but doesn’t this seem to be a consequence of these goals being just instrumental proxies for the complex thing that humans actually care about?)

I think I am confused because I do not understand your overall point, so the three paragraphs seem to be saying wildly different things to me.

• I notice I am surprised that you write

> However, the link from instrumentally convergent goals to dangerous influence-seeking is only applicable to agents which have final goals large-scale enough to benefit from these instrumental goals

without addressing the “Riemann disaster” or “Paperclip maximizer” examples [1]:

• Riemann hypothesis catastrophe. An AI, given the final goal of evaluating the Riemann hypothesis, pursues this goal by transforming the Solar System into “computronium” (physical resources arranged in a way that is optimized for computation)— including the atoms in the bodies of whomever once cared about the answer.

• Paperclip AI. An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacture of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips.

Do you think that the argument motivating these examples is invalid?

Do you disagree with the claim that even systems with very modest and specific goals will have incentives to seek influence to perform their tasks better?

• Thank you for pointing this out!

I have a sense that log-odds are an underappreciated tool, and this makes me excited to experiment with them more—the “shared and distinct bits of evidence” framework also seems very natural.

On the other hand, if the Goddess of Bayesian evidence likes log-odds so much, why did she make expected utility linear in probability? (I am genuinely confused about this)
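As a toy illustration of the appeal of log-odds for aggregating forecasts (my own example, not from the post): averaging forecasts in log-odds space lets a confident forecaster’s extra bits of evidence move the pooled estimate, whereas averaging raw probabilities mostly ignores them.

```python
import math

def logodds(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

forecasts = [0.9, 0.99]  # one forecaster has ~3.5 extra bits of evidence

mean_prob = sum(forecasts) / len(forecasts)
mean_logodds = sigmoid(sum(logodds(p) for p in forecasts) / len(forecasts))
print(mean_prob)               # 0.945
print(round(mean_logodds, 3))  # 0.968, pulled toward the confident forecast
```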

# Aggregating forecasts

23 Jul 2020 18:04 UTC
14 points