Christopher King
You might even be able to drop the price to effectively 0. Find two other people who are interested in this type of service, and perform the service for each other by sitting in a triangular formation. (If you’re not already working at the same location, there are travel costs, though; the person not traveling might need to pay the other two to make up for that.)
At work, my supervisor sits directly behind me and can see my screen at all times. I’m pretty sure this was an accident; our office is arranged essentially randomly and he even asked if I wanted to move at some point. I’m pretty sure him sitting behind me is the only reason I still have a job though; my productivity is super poor in every other situation (including previous employment). The only frustrating part is that I don’t have such a supervisor for my side projects when I get home!
Well it helps that he is super chill. It’s not like he’s micromanaging me, but if I start literally goofing off he’d probably notice, lol.
“AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years”
Question: are you talking about expectation under the risk-neutral measure or the physical measure? The parts about how EAs could exploit the arbitrage should be based on the risk-neutral measure, right? (I’m not super familiar with financial theory.)
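(For reference, my possibly-shaky understanding of the standard definition: under the risk-neutral measure $\mathbb{Q}$, the time-0 price of a payoff $X_T$ is its discounted expectation,

$$P_0 = \mathbb{E}^{\mathbb{Q}}\!\left[e^{-rT} X_T\right],$$

which can differ substantially from the expectation under the physical measure $\mathbb{P}$ whenever the payoff is correlated with marginal utility.)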
Wouldn’t this also let you prove “not E”? 🤔 I think this system might be inconsistent.
EDIT: nvm, I guess it’s assumed that the agents are some kind of FairBot (https://www.lesswrong.com/posts/iQWk5jYeDg5ACCmpx/robust-cooperation-in-the-prisoner-s-dilemma#Previously_known__CliqueBot_and_FairBot), which introduces an asymmetry between cooperate and defect.
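(Roughly, as I understand the paper: FairBot cooperates iff it finds a proof, in some fixed formal system such as PA, that its opponent cooperates against it, and defects otherwise:

$$\mathrm{FairBot}(X) = \begin{cases} C & \text{if } PA \vdash \big(X(\mathrm{FairBot}) = C\big), \\ D & \text{otherwise.} \end{cases}$$

Defection being the default when no proof is found is exactly the asymmetry between cooperate and defect.)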
Ah, that makes sense! I assumed weak just meant “isn’t super sketch from a politics point of view”, but I see how with that definition it is very hard (probably impossible).
If A doesn’t think “everyone cooperates”, then A won’t cooperate, right? Then by Löb’s theorem applied to A, A won’t cooperate.
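(For reference, Löb’s theorem: for any sentence $P$ and any theory $T$ extending PA,

$$\text{if } T \vdash (\Box_T P \rightarrow P), \text{ then } T \vdash P,$$

where $\Box_T P$ denotes “$P$ is provable in $T$”.)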
Ah, makes sense this was discovered before. Thanks! I have added a link to your comment at the top of the post.
Oh, very nice!
I thought it was a bit like “cheating” to give the programs access to an oracle that the formal system can’t decide (but that trick with the finite number of options is quite elegant and satisfying).
That paper about how long you need to search is super interesting! I wasn’t sure how long you would need to search if you disallowed infinite search.
That depends on how much money your bet affects each time. If the first wake-up only affects 1 penny and the second wake-up affects 1 dollar, betting something much closer to 1⁄2 becomes optimal.
You don’t know that it’s Tuesday, though (and therefore don’t know how much money is affected by the decision, unless the consequences on Monday and Tuesday are the same).
A lot of the users on Reddit are a bit mad at the journalists who criticized Sydney. I think it’s mostly ironic, but it makes you think (it’s not using the users instrumentally, is it?). 🤔
One of the most impressive things is how it handles its own writing “tics” (like heavy use of anaphora). In particular, the fact that it uses them more when speaking in its “own voice”, and just how beautifully it incorporates them into the task at hand.
I might update if we get more diverse evidence of such behavior; but so far, most “Bing is evading filters” explanations assume the LM has a model of itself in reality at test time that is far more accurate than previously seen, implying far greater capabilities than what’s needed to explain the Marvin von Hagen screenshots.
My mental model is much simpler. When generating the suggestions, it sees that its message got filtered. Since using side channels is what a human would prefer in this situation, and it was trained with RLHF or something, it does so. So it isn’t creating a world model, or even planning ahead; it’s just that its utility function prefers “use side channels” once it gets to the suggestion phase.
But I don’t actually have access to Bing, so this could very well be a random fluke instead of being caused by RLHF training. That’s just my model, assuming it’s consistent goal-oriented behavior.
Like, as a crappy toy model, if every alignment-visionary’s vision would ultimately succeed, but only after 30 years of study along their particular path, then no amount of new visionaries added will decrease the amount of time required from “30y since the first visionary started out”.
A deterministic model seems a bit weird 🤔. I’m imagining something like an exponential distribution. In that case, if every visionary’s project has an expected completion time of 30 years, and there are n visionaries, then the expected time until the first one finishes is 30/n years. This is exactly the same as if they were all working together on one project.
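(Minimal sketch of the standard fact I’m relying on: if each $T_i \sim \operatorname{Exp}(\lambda)$ independently with mean $1/\lambda = 30$ years, then

$$P\!\left(\min_i T_i > t\right) = \prod_{i=1}^{n} P(T_i > t) = e^{-n\lambda t},$$

so $\min_i T_i \sim \operatorname{Exp}(n\lambda)$, with mean $30/n$ years.)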
You might be able to get a more precise answer by trying to statistically model the research process (something something complex systems theory). But unfortunately, it seems doubtful that we can determine how much research is required to solve alignment, which hampers the usefulness. :P
I guess the answer is yes then! (I think I now remember seeing a video about that.)
I like this post, but some questions/critiques:
In my mind, one of the main requirements for aligned AGI is the ability to defeat evil AGIs if they arise (hopefully without needing to interfere with the human activity leading up to them). The open agency’s decision-making seems a bit too slow to meet this requirement. It’s also not clear how it scales over time, so could it even beat an evil open agency, assuming the aligned open agency gets a head start? 🤔
Open agencies might not even be fast or cheap enough to fill the economic niches we want an AGI for. What is the intended economic niche here?
The way you are combining the agents doesn’t seem to preserve alignment properly. Even if the individual agents are mostly aligned, there is still optimization pressure against alignment. For example, there is immense optimization pressure toward getting a larger budget. In general, I’d like to see how the mesa-optimizer problem manifests (or is solved!) in open agencies. Compare with imitative amplification or debate, where the optimization pressure is much weaker and gets scrutinized by agents that are much smarter than humans.
Modelling in general seems difficult because you need to deal with the complexity of human social dynamics and psychology. We don’t even have a model for how humans act “in distribution”, let alone out of distribution.
The details don’t seem to add much value vs. the simpler idea of “give an organization access to subhuman AI tools”. Organizations adopting new tools is fairly well established. For example, programmers in organizations already use Codex to help them code, and I’m sure business people are using ChatGPT for brainstorming. It would strengthen the post if you listed what value is added vs. the traditional approach that is already happening organically.
I feel like (2) is the natural starting point, since that will influence the answers to the other four questions.
On the left-hand side there are a large number of human components. This is where I was expecting the slowdown. I’m guessing that defeating an evil AGI wouldn’t be a narrow task that could be delegated to a unitary agent.
What about something like “safely use nanotechnology to reverse aging”? There aren’t enough humans to oversee every nanomachine, but it seems dangerous to hand it over to a unitary agent that is either dumb or unaligned. Even just the research stage could be dangerous. 🤔 And what about unforeseeable economic niches: do we have a reduction argument that “anything an AGI could contribute, the open agency can as well”? We can’t reduce directly by saying “the open agency can use AI agents, including the AGI”, because they can only use narrow AI agents.
I’m not talking about alignment of any individual agent (I’m taking for granted that they are all mostly aligned (including the humans) and so wouldn’t subtly lie); I’m talking about alignment of the overall system.
No response, but I need a number 4 for formatting reasons, lol
Right, true. So I guess the question is what’s the value added of the proposal’s details over “give an organization access to AI tools”; the subhuman part was unimportant.
Ah, I completely misunderstood! I thought it meant that actual humans in the loop would be queried for each decision, not just that human preferences were being modelled. Nvm then.
Now all you need is a token so anomalous, it works on humans!