Vivek Hebbar

Karma: 881

Vivek Hebbar 8 Sep 2021 0:23 UTC
LW: 2 AF: 2
AF
on: The ground of optimization
Is a metal bar an optimizer? Looking at the temperature distribution, there is a clear set of target states (states of uniform temperature) with a much larger basin of attraction (all temperature distributions that don’t vaporize the bar).
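A rough way to see the "basin of attraction" claim is to simulate heat diffusion along an insulated bar and check that very different starting distributions end up at the same uniform temperature. This is only a sketch under assumptions that are not in the original comment: a 1-D bar split into 20 cells, insulated ends, an explicit finite-difference update, and a hypothetical helper named `relax_bar`.

```python
def relax_bar(temps, alpha=0.25, steps=5000):
    """Explicit finite-difference heat diffusion on an insulated 1-D bar.

    alpha is the (assumed) dimensionless diffusion number; values up to 0.5
    keep this explicit scheme stable.
    """
    t = list(temps)
    n = len(t)
    for _ in range(steps):
        new = t[:]
        for i in range(n):
            # Insulated ends: mirror the edge cell so no heat leaves the bar.
            left = t[i - 1] if i > 0 else t[i]
            right = t[i + 1] if i < n - 1 else t[i]
            new[i] = t[i] + alpha * (left - 2 * t[i] + right)
        t = new
    return t


# Two very different starting distributions with the same total heat.
hot_left = [100.0] * 10 + [0.0] * 10
hot_middle = [0.0] * 8 + [250.0] * 4 + [0.0] * 8

for start in (hot_left, hot_middle):
    end = relax_bar(start)
    mean_before = sum(start) / len(start)
    mean_after = sum(end) / len(end)
    spread_after = max(end) - min(end)
    print(mean_before, mean_after, spread_after)
```

Both runs conserve the mean temperature while the spread between the hottest and coldest cell collapses towards zero, which is the sense in which uniform temperature acts as the target set.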
I suppose we could consider the second law of thermodynamics to be the true optimizer in this case. The consequence is that any* isolated physical system is trivially an optimizing system towards higher entropy.
In general, it seems like this optimization criterion is very easy to satisfy if we don’t specify what exactly we care about as a meaningful aspect of the system. Even the bottle cap ‘optimizes’ for trivial things like maintaining its shape (against the perturbation of elastic deformation).
Do you think this will become a problem when using this definition for AI? For example, we might find that a particular program incidentally tends to ‘optimize’ certain simple measures such as the average magnitude of network weights, or some other functions of weights, loss, policy, etc. to a set point/range. We may then find slightly more complex things being optimized that look like sub-goals (which could in a certain context be unwanted or dangerous). How would we know where to draw the line? It seems like the definition would classify lots of things as optimization, and it would be up to us to decide which kinds are interesting or concerning and which ones are as trivial as the bottle cap maintaining its shape.
That being said, I really like this definition. I just think it should be extended to classify the interestingness of a given optimization. An AI agent which competently pursues complex goals is a much more interesting optimizer than a metal bar, even though the bar seems more robust (deleting a tiny piece of metal won’t stop it from conducting; deleting a tiny piece in the AI’s computer could totally disable it).
Also a nitpick on the section about whether the universe is an optimizing system:
I don’t think it is correct to say that the target space is almost as big as the basin of attraction. Either:
- We use area to represent the number of macroscopic states—in this case, the target space is extremely small (one state only(?) -- an ultra-low-density bath of particles with uniform temperature). The universe is an extremely powerful optimizer from this perspective, with the caveat that it takes almost forever to achieve its target.
- We use area to represent the number of microscopic states (as I think you intended). In this case, I think the target space is exactly identical to the basin of attraction. Low entropy microstates are not any less likely than high entropy microstates—there just happen to be astronomically fewer of them. There is no ‘optimizing force’ pushing the universe out of these states. From the microstate perspective, there is no reason to exclude them from the target zone, since any small and unremarkable subset of the target space will display the property that the system tends to stumble out of it at random.
I would say that the first lens is almost always better than the second, since macro-states are what we actually care about and how we naturally divide the configuration space of a system.
Finally, just want to say this is an amazing post! I love the style as well as the content. The diagrams make it really easy to get an intuitive picture.
*Unsure about the existence of exceptions (can an isolated system be contrived that fails to reach the global max for entropy?)

Vivek Hebbar 18 Sep 2021 2:19 UTC
3 points
on: Jitters No Evidence of Stupidity in RL
The random jittering reminds me of the random movements of the stock market: As new information trickles in, the estimate of the optimal point jitters around noisily, rather than following a smooth trajectory. If the value being estimated is Utility(action A) - Utility(action B), then we would expect the agent to jitter between the two actions when the estimate is near zero, like some sort of random walk repeatedly crossing the axis.
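A toy simulation of this picture, as a sketch only: the estimate of Utility(action A) - Utility(action B) is modeled as a Gaussian random walk, and the agent picks A whenever the estimate is positive. The function name `count_action_switches`, the step count, and the noise scale are all assumptions made for illustration.

```python
import random


def count_action_switches(start, steps=10_000, noise=1.0, seed=0):
    """Count how often the preferred action flips as the estimate random-walks."""
    rng = random.Random(seed)
    estimate = start  # current estimate of U(A) - U(B)
    action = "A" if estimate > 0 else "B"
    switches = 0
    for _ in range(steps):
        estimate += rng.gauss(0.0, noise)  # new information nudges the estimate
        new_action = "A" if estimate > 0 else "B"
        if new_action != action:
            switches += 1
            action = new_action
    return switches


near = [count_action_switches(0.0, seed=s) for s in range(20)]
far = [count_action_switches(1000.0, seed=s) for s in range(20)]
print("switches when the estimate starts near zero:", near)
print("switches when the estimate starts far from zero:", far)
```

When the starting estimate is far from zero relative to the noise, the walk essentially never reaches the axis and the action stays fixed; when it starts near zero, the action flips repeatedly even though the agent is behaving sensibly at every step.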

Vivek Hebbar 15 Oct 2021 7:55 UTC
1 point
on: Individual Rationality Needn’t Generalize to Rational Consensus
The obvious solution is to use probabilities rather than absolute judgements of true/false. We would still have the issue that, in general, the average of two products differs from the product of two averages. This inconsistency is much smaller, though, and can be dealt with using a more nuanced calculation (accounting for the possibly correlated distributions behind the point estimates) if absolutely necessary.
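A small worked example of the remaining inconsistency, with made-up numbers: two hypothetical judges each assign probabilities to two premises, and each judge is assumed to treat the premises as independent, so their probability for the conjunction is the product. None of these figures come from the post.

```python
# Each tuple is one judge's (P(premise 1), P(premise 2)); values are illustrative.
judges = [(0.9, 0.5), (0.5, 0.9)]


def average(values):
    values = list(values)
    return sum(values) / len(values)


# Aggregate the conclusions: each judge multiplies first, then we average.
average_of_products = average(p * q for p, q in judges)

# Aggregate the premises: average each premise first, then multiply.
product_of_averages = average(p for p, _ in judges) * average(q for _, q in judges)

gap = product_of_averages - average_of_products
print(average_of_products, product_of_averages, gap)
```

Here the two aggregation orders give roughly 0.45 and 0.49, a gap of about 0.04. With the same judges forced into true/false verdicts, the two orders could disagree outright, which is why the probabilistic version is the milder problem.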

Vivek Hebbar 12 Nov 2021 12:30 UTC
LW: 4 AF: 1
AF
on: Discussion with Eliezer Yudkowsky on AGI interventions
What probability do you assign to the proposition “Prosaic alignment will fail”?
1. Purely based on your inside view model
2. After updating on everyone else’s views
Same question for:
“More than 50% of the prosaic alignment work done by the top 7 researchers is nearly useless”

Vivek Hebbar 12 Nov 2021 22:47 UTC
8 points
in reply to: benjamincosman’s comment on: Discussion with Eliezer Yudkowsky on AGI interventions
Presumably a Bayesian reasoner using expected value would never reach max utility, because there would always be a non-zero probability that the goal hasn’t been achieved, and the course of action which increases its success estimate from 99.9999% to 99.99999999% probably involves turning part of the universe into computronium.

Vivek Hebbar 12 Nov 2021 23:07 UTC
5 points
in reply to: AVoropaev’s comment on: Discussion with Eliezer Yudkowsky on AGI interventions
Taking over the lightcone is the default behavior. If you can create an AGI which doesn’t do this, you’ve already figured out how to put some constraint on its activities. Notably, not destroying the lightcone implies that the AGI doesn’t create other AGIs which go off and destroy the lightcone.

Vivek Hebbar 18 Nov 2021 6:50 UTC
1 point
in reply to: Unknown’s comment on: A Premature Word on AI
If you sum over an infinite number of worlds and weight them using a reasonable simplicity measure (like description length), this shouldn’t be a problem.

Vivek Hebbar 18 Nov 2021 11:14 UTC
2 points
in reply to: Eliezer Yudkowsky’s comment on: A positive case for how we might succeed at prosaic AI alignment
This seems like a very important crux—maybe there should be a scheduled debate on this?

Vivek Hebbar 22 Nov 2021 6:23 UTC
7 points
in reply to: Dave Orr’s comment on: The Meta-Puzzle
My solution (rot13′d): “Vs lbh nfxrq zr vs V nz fvatyr, V jbhyq fnl lrf”

Vivek Hebbar 29 Nov 2021 0:51 UTC
1 point
in reply to: nostalgebraist’s comment on: human psycholinguists: a critical appraisal
Maybe add a disclaimer at the start of the post?

Vivek Hebbar 9 Feb 2022 23:28 UTC
6 points
in reply to: shminux’s comment on: Epistemic Legibility
Did you mean level 2 for the CDC?

Vivek Hebbar 18 Feb 2022 6:05 UTC
1 point
on: Create a prediction market in two minutes on Manifold Markets
Does anyone know why this is a thing:
Why is the “Payout at 91%” displayed as $99 instead of 0.91*102 ~= $93 (or lower if 4% is taxed to pay out the question creator)?

Vivek Hebbar 18 Feb 2022 6:06 UTC
1 point
in reply to: Vivek Hebbar’s comment on: Create a prediction market in two minutes on Manifold Markets
Great platform btw, I’m having a lot of fun with it!

Vivek Hebbar 24 Feb 2022 1:31 UTC
LW: 1 AF: 1
AF
in reply to: gwern’s comment on: Transformer inductive biases & RASP
Nice! Do you know if the author of that post was involved in RASP?

Vivek Hebbar 24 Feb 2022 1:34 UTC
LW: 1 AF: 1
AF
in reply to: Vivek Hebbar’s comment on: Transformer inductive biases & RASP
Checked; the answer is no: https://www.lesswrong.com/posts/Lq6jo5j9ty4sezT7r/teaser-hard-coding-transformer-models?commentId=ET24eiKK6FSJNef7G

Vivek Hebbar 24 Feb 2022 1:36 UTC
5 points
in reply to: MadHatter’s comment on: Teaser: Hard-coding Transformer Models
Any update on this (applying for funding)?

Vivek Hebbar 24 Feb 2022 7:59 UTC
3 points
on: Do, Then Think
Exception: If there is left-tail risk, then think first.

Vivek Hebbar 26 Feb 2022 0:23 UTC
−1 points
on: Seeking models of LW’s aversion to religion
“Many religions claim things that are straightforwardly false (e.g., “Jesus physically rose from the dead.”)”
False with what probability?

Vivek Hebbar 27 Mar 2022 12:17 UTC
LW: 3 AF: 3
AF
in reply to: Morten Hustveit’s comment on: When people ask for your P(doom), do you give them your inside view or your betting odds?
Oh, this is definitely not what I meant.
“Betting odds” == Your actual belief after factoring in other people’s opinions
“Inside view” == What your models predict, before factoring in other opinions or the possibility of being completely wrong

Vivek Hebbar 7 Apr 2022 8:37 UTC
LW: 2 AF: 1
AF
in reply to: Thomas Kwa’s comment on: [Link] A minimal viable product for alignment
Is the claim here that the 2^200 “persuasive ideas” would actually pass the scrutiny of top human researchers (for example, Paul Christiano studies one of them for a week and concludes that it is probably a full solution)? Or do you just mean that they would look promising in a shorter evaluation done for training purposes?