Christopher King
Optimality is the tiger, and annoying the user is its teeth
You might even be able to drop the price to effectively 0. Find two other people who are interested in this type of service, and perform the service for each other by sitting in a triangular formation. (If you’re not already working at the same location, there are travel costs, though; the person who isn’t traveling might need to pay the other two to make up for that.)
At work, my supervisor sits directly behind me and can see my screen at all times. I’m pretty sure this was an accident; our office is arranged essentially randomly and he even asked if I wanted to move at some point. I’m pretty sure him sitting behind me is the only reason I still have a job though; my productivity is super poor in every other situation (including previous employment). The only frustrating part is that I don’t have such a supervisor for my side projects when I get home!
Well it helps that he is super chill. It’s not like he’s micromanaging me, but if I start literally goofing off he’d probably notice, lol.
“AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years”
Question: are you talking about expectation under the risk-neutral measure or the physical measure? The parts about how EAs could exploit the arbitrage should be based on the risk-neutral measure, right? (I’m not super familiar with financial theory.)
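My rough picture of the distinction, for what it’s worth (with $S_T$ a generic payoff at time $T$ and $r$ a constant risk-free rate, just for illustration): risk-neutral pricing says today’s price is

$$S_0 = \mathbb{E}^{Q}\!\left[e^{-rT} S_T\right],$$

an expectation under the risk-neutral measure $Q$, whereas a forecast of what will actually happen is an expectation under the physical measure $P$. The two can differ a lot when payoffs are correlated with the states of the world where money is most valuable.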
Wouldn’t this also let you prove “not E”? 🤔 I think this system might be inconsistent.
EDIT: nvm, I guess it’s assumed that the agents are some kind of FairBot (https://www.lesswrong.com/posts/iQWk5jYeDg5ACCmpx/robust-cooperation-in-the-prisoner-s-dilemma#Previously_known__CliqueBot_and_FairBot), which introduces an asymmetry between cooperate and defect.
Is this a weak pivotal act: creating nanobots that eat evil AGIs (but nothing else)?
Ah, that makes sense! I assumed weak just meant “isn’t super sketch from a politics point of view”, but I see how with that definition it is very hard (probably impossible).
Threatening to do the impossible: A solution to spurious counterfactuals for functional decision theory via proof theory
If A doesn’t think “everyone cooperates”, then A won’t cooperate, right? Then by Löb’s theorem applied to A, A won’t cooperate.
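(For reference, the version of Löb’s theorem I have in mind, with $\square$ read as “A’s proof system proves”: if the system proves $\square P \to P$ for some sentence $P$, then it proves $P$ outright, i.e. $\square(\square P \to P) \to \square P$.)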
Ah, makes sense this was discovered before. Thanks! I have added a link to your comment at the top of the post.
Oh, very nice!
I thought it was a bit like “cheating” to give the programs access to an oracle that the formal system couldn’t decide (but that thing with the finite number of options is quite elegant and satisfying).
That paper about how long you need to search is super interesting! I wasn’t sure how long you would need to search if you disallowed infinite search.
That depends on how much money your bet affects each time. If the first wake-up only affects 1 penny and the second wake-up affects 1 dollar, betting something much closer to 1/2 becomes optimal.
You don’t know that it is Tuesday, though (and therefore don’t know how much money is affected by the decision, unless the consequences for Monday and Tuesday are the same).
A lot of the users on Reddit are a bit mad at the journalists who criticized Sydney. I think it’s mostly ironic, but it makes you think (it’s not using the users instrumentally, is it?). 🤔
One of the most impressive things is how it handles its own writing “tics” (like heavy use of anaphora). In particular, the fact that it uses them more when speaking in its “own voice”, and just how beautifully it incorporates them into the task at hand.
Bing finding ways to bypass Microsoft’s filters without being asked. Is it reproducible?
I might update if we get more diverse evidence of such behavior; but so far, most “Bing is evading filters” explanations assume the LM has a model of itself in reality during test time that is far more accurate than previously seen, and far larger capabilities than what’s needed to explain the Marvin von Hagen screenshots.
My mental model is much simpler. When generating the suggestions, it sees that its message got filtered. Since using side channels is what a human would prefer in this situation, and it was trained with RLHF or something, it does so. So it isn’t creating a world model, or even planning ahead; it’s just that its utility prefers “use side channels” when it gets to the suggestion phase.
But I don’t actually have access to Bing, so this could very well be a random fluke, instead of being caused by RLHF training. That’s just my model if it’s consistent goal-oriented behavior.
Like, as a crappy toy model, if every alignment-visionary’s vision would ultimately succeed, but only after 30 years of study along their particular path, then no amount of new visionaries added will decrease the amount of time required from “30y since the first visionary started out”.
A deterministic model seems a bit weird 🤔. I’m imagining something like an exponential distribution. In that case, if every visionary’s project has an expected completion time of 30 years, and there are n visionaries, then the expected time until the first one finishes is 30/n years (assuming the projects are independent). This is exactly the same as if they were all working together on one project.
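Spelling that out (assuming the completion times are independent and exponentially distributed with mean 30 years): if $T_1, \dots, T_n \sim \mathrm{Exp}(1/30)$ independently, then

$$P\!\left(\min_i T_i > t\right) = \prod_{i=1}^{n} P(T_i > t) = e^{-nt/30},$$

so $\min_i T_i \sim \mathrm{Exp}(n/30)$ and $\mathbb{E}[\min_i T_i] = 30/n$ years.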
You might be able to get a more precise answer by trying to statistically model the research process (something something complex systems theory). But unfortunately, it seems doubtful that we can determine how much research is required to solve alignment, which hampers the usefulness. :P
Now all you need is a token so anomalous, it works on humans!