Christopher King
Optimality is the tiger, and annoying the user is its teeth
You might even be able to drop the price to effectively 0. Find two other people who are interested in this type of service, and perform the service for each other by sitting in a triangular formation. (If you’re not already working at the same location, there are travel costs, though; the person who isn’t traveling might need to pay the other two to make up for that.)
At work, my supervisor sits directly behind me and can see my screen at all times. I’m pretty sure this was an accident; our office is arranged essentially randomly and he even asked if I wanted to move at some point. I’m pretty sure him sitting behind me is the only reason I still have a job though; my productivity is super poor in every other situation (including previous employment). The only frustrating part is that I don’t have such a supervisor for my side projects when I get home!
Well it helps that he is super chill. It’s not like he’s micromanaging me, but if I start literally goofing off he’d probably notice, lol.
“AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years”
Question: are you talking about expectation under the risk-neutral measure or the physical measure? The parts about how EAs could exploit the arbitrage should be based on the risk-neutral measure, right? (I’m not super familiar with financial theory.)
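My rough picture of the distinction, for what it’s worth (with $S_T$ a generic payoff at time $T$ and $r$ a constant risk-free rate, just for illustration): risk-neutral pricing says today’s price is

$$S_0 = \mathbb{E}^{Q}\!\left[e^{-rT} S_T\right],$$

an expectation under the risk-neutral measure $Q$, whereas a forecast of what will actually happen is an expectation under the physical measure $P$. The two can differ a lot when payoffs are correlated with the states of the world where money is most valuable.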
Wouldn’t this also let you prove “not E”? 🤔 I think this system might be inconsistent.
EDIT: nvm, I guess it’s assumed that the agents are some kind of FairBot (https://www.lesswrong.com/posts/iQWk5jYeDg5ACCmpx/robust-cooperation-in-the-prisoner-s-dilemma#Previously_known__CliqueBot_and_FairBot), which introduces an asymmetry between cooperate and defect.
Is this a weak pivotal act: creating nanobots that eat evil AGIs (but nothing else)?
Ah, that makes sense! I assumed weak just meant “isn’t super sketch from a politics point of view”, but I see how with that definition it is very hard (probably impossible).
Threatening to do the impossible: A solution to spurious counterfactuals for functional decision theory via proof theory
If A doesn’t think “everyone cooperates”, then A won’t cooperate, right? Then by Löb’s theorem applied to A, A won’t cooperate.
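(For reference, the version of Löb’s theorem I have in mind, with $\square$ read as “A’s proof system proves”: if the system proves $\square P \to P$ for some sentence $P$, then it proves $P$ outright, i.e. $\square(\square P \to P) \to \square P$.)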
Ah, makes sense this was discovered before. Thanks! I have added a link to your comment at the top of the post.
Oh, very nice!
I thought it was a bit like “cheating” to give the programs access to an oracle that the formal system couldn’t decide (but that thing with the finite number of options is quite elegant and satisfying).
That paper about how long you need to search is super interesting! I wasn’t sure how long you would need to search if you disallowed infinite search.
That depends on how much money your bet affects each time. If the first wake-up only affects 1 penny and the second wake-up affects 1 dollar, betting something much closer to 1/2 becomes optimal.
You don’t know that it is Tuesday, though (and therefore don’t know how much money is affected by the decision, unless the consequences for Monday and Tuesday are the same).
A lot of the users on Reddit are a bit mad at the journalists who criticized Sydney. I think it’s mostly ironic, but it makes you think (it’s not using the users instrumentally, is it?). 🤔
One of the most impressive things is how it handles its own writing “tics” (like heavy use of anaphora). In particular, the fact that it uses them more when speaking in its “own voice”, and just how beautifully it incorporates them into the task at hand.
Bing finding ways to bypass Microsoft’s filters without being asked. Is it reproducible?
I might update if we get more diverse evidence of such behavior; but so far, most “Bing is evading filters” explanations assume the LM has a model of itself in reality during test time that is far more accurate than previously seen, and far larger capabilities than what’s needed to explain the Marvin von Hagen screenshots.
My mental model is much simpler. When generating the suggestions, it sees that its message got filtered. Since using side channels is what a human would prefer in this situation, and it was trained with RLHF or something, it does so. So it isn’t creating a world model, or even planning ahead; it’s just that its utility prefers “use side channels” when it gets to the suggestion phase.
But I don’t actually have access to Bing, so this could very well be a random fluke, instead of being caused by RLHF training. That’s just my model if it’s consistent goal-oriented behavior.
Like, as a crappy toy model, if every alignment-visionary’s vision would ultimately succeed, but only after 30 years of study along their particular path, then no amount of new visionaries added will decrease the amount of time required from “30y since the first visionary started out”.
A deterministic model seems a bit weird 🤔. I’m imagining something like an exponential distribution. In that case, if every visionary’s project has an expected completion time of 30 years, and there are n visionaries, then the expected time until the first one finishes is 30/n years (assuming the projects are independent). This is exactly the same as if they were all working together on one project.
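Spelling that out (assuming the completion times are independent and exponentially distributed with mean 30 years): if $T_1, \dots, T_n \sim \mathrm{Exp}(1/30)$ independently, then

$$P\!\left(\min_i T_i > t\right) = \prod_{i=1}^{n} P(T_i > t) = e^{-nt/30},$$

so $\min_i T_i \sim \mathrm{Exp}(n/30)$ and $\mathbb{E}[\min_i T_i] = 30/n$ years.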
You might be able to get a more precise answer by trying to statistically model the research process (something something complex systems theory). But unfortunately, it seems doubtful that we can determine how much research is required to solve alignment, which hampers the usefulness. :P
Now all you need is a token so anomalous, it works on humans!