Then we’ll need a “thought process tampering awareness” evaluation.
> if AIs were completing 1-month-long self-contained software engineering tasks (e.g. what a smart intern might do in the first month)
This doesn’t seem like a good example to me.
The sort of tasks we’re talking about are extrapolations of current benchmark tasks, so it’s more like: what a programming savant with almost no ability to interact with colleagues or search out new context might do in a month given a self-contained, thoroughly specced and vetted task.
I expect current systems will naively scale to that, but not to the abilities of an arbitrary intern because that requires skills that aren’t tested in the benchmarks.
Some people have strong negative priors toward AI in general.
When the GPT-3 API first came out, I built a little chatbot program to show my friends/family. Two people (out of maybe 15) flat out refused to put in a message because they just didn’t like the idea of talking to an AI.
I think it’s more of an instinctual reaction than something thought through. There’s probably a deeper psychological explanation, but I don’t want to speculate.
Rather than having objective standards, I find a growth-centric approach to be most effective. Optimizing for output is easy to Goodhart, so as much as possible I treat output as a metric rather than a goal. It’s important that I’m getting more done now than I was a year ago, for example, but I don’t explicitly aim for a particular output on a day-to-day basis. Instead I aim to optimize my processes and improve my skills, which leads to increased output. That applies not just to work performance, but to many things.
> How much do you get done in a typical month/half year?
Measuring this objectively is hard, but roughly one large project (big new feature, new application, major design overhaul) per month, or more if the projects I’m working on are smaller.
> How much do you consider aspirational but realistic to get done in a typical month/half year?
I’ve managed to get projects that I’d normally finish in a month done in 2 weeks or so by crunching hard, but I’m generally pretty consistent with output on the scale of months/half years. I definitely don’t aim for that.
> How much do you consider on the low end but okay to get done in a typical month/half year?
I wouldn’t be too upset if a project goes over by 25% due to low output (they can go over longer if there are unexpected issues, but that’s another thing). Again though, I’m pretty consistent on the scale of months/half years, so this rarely happens.
> What kind of output would you want to see out of a researcher/community organiser/other independent worker within a month/half a year to be impressed/not be disappointed? (Assuming this amount is representative of them)
I don’t have objective standards here. If I get the impression they are genuinely putting in a good effort and improving with time, I’m happy. Different people have different strengths, and a person might work quite slowly relative to the average, but produce very high quality work. If they continue improving their output, eventually it will be high (for whatever standard of “high” you like). If they’re putting in effort and not improving, they might not be in the right line of work, and then I’d be disappointed.
> What’s the minimum output you would want to see out of a researcher/community organiser/other independent worker to be in favour of them getting funding to continue their work? (Assuming this amount is representative of them)
This is a knapsack problem. Calculate the score = (expected output * expected value of work per unit of output) / funding required for each person that needs funding, sort the list in descending order, and allocate funding in order from top to bottom. You don’t need to fully solve the knapsack problem here, because leftover funding can be carried over.
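To make that concrete, here’s a minimal sketch of the greedy version I have in mind; the field names and numbers are made up purely for illustration:

```python
# Greedy approximation to the funding knapsack: score each applicant by
# expected value per unit of funding, then fund in descending score order.
# All fields and figures here are illustrative placeholders.

def allocate(applicants, budget):
    """applicants: list of dicts with 'name', 'expected_output',
    'value_per_unit_output', and 'funding_required'."""
    scored = sorted(
        applicants,
        key=lambda a: (a["expected_output"] * a["value_per_unit_output"])
        / a["funding_required"],
        reverse=True,
    )
    funded, remaining = [], budget
    for a in scored:
        if a["funding_required"] <= remaining:
            funded.append(a["name"])
            remaining -= a["funding_required"]
    return funded, remaining  # leftover budget can be carried over


funded, leftover = allocate(
    [
        {"name": "A", "expected_output": 10, "value_per_unit_output": 3, "funding_required": 20},
        {"name": "B", "expected_output": 4, "value_per_unit_output": 8, "funding_required": 15},
        {"name": "C", "expected_output": 6, "value_per_unit_output": 2, "funding_required": 30},
    ],
    budget=40,
)
print(funded, leftover)  # -> ['B', 'A'] 5
```

Because leftover funding carries over, the greedy ordering is good enough here and you don’t need an exact knapsack solver.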
> What’s the minimum output you would want to see out of your friend to feel good about them continuing their current work? (Assuming this amount is representative of them)
Their average output over the last 12 months should be higher than their average output over the previous 12, by some non-trivial amount.
Agreed, this was an expected result. It’s nice to have a functioning example to point to for LLMs in an RLHF context, though.
From one perspective, nature does kind of incentivize cooperation in the long term. See The Goddess of Everything Else.
Is there a reason to believe this is likely? Outside of a strong optimization pressure for niceness (of which there is definitely some, but relative to other optimization pressures it’s relatively weak), I’d expect these organizations to end up at roughly the average of the niceness levels possible in their situation.
A quick Google search of probe tuning doesn’t turn up anything. Do you have more info on it?
> Probe-tuning doesn’t train on LLM’s own “original rollouts” at all, only on LLM’s activations during the context pass through the LLM.
This sounds like regular fine-tuning to me. Unless you mean that the loss is calculated based on one (or multiple) of the network’s activations rather than on the output logits.
Edit: I think I get what you mean now. You want to hook a probe to a model and fine-tune it to perform well as a probe classifier, right?
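If that’s right, here’s a rough sketch of the setup I’m picturing: attach a linear probe to one layer’s hidden states and backpropagate a classification loss through it, rather than training on the LM logits. The model name, layer choice, and probe head below are placeholders of my own, not anything from the probe-tuning proposal.

```python
# Rough sketch of my interpretation: a linear probe reads one layer's hidden
# states, and the training loss comes from the probe's classification output
# instead of the LM's next-token logits. Whether the base model stays frozen
# (a classic probe) or gets fine-tuned alongside the probe is exactly the
# detail I'm unsure about; flip `train_base_model` to switch between the two.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"          # placeholder model
probe_layer = -1             # placeholder: probe the final hidden layer
train_base_model = False     # False = frozen-model linear probe

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
probe = torch.nn.Linear(model.config.hidden_size, 2)  # binary probe head

for p in model.parameters():
    p.requires_grad = train_base_model

params = list(probe.parameters()) + (list(model.parameters()) if train_base_model else [])
optimizer = torch.optim.Adam(params, lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

def train_step(texts, labels):
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    hidden = model(**inputs).hidden_states[probe_layer]  # (batch, seq, hidden)
    features = hidden[:, -1, :]  # last-position activation (ignoring padding, for brevity)
    loss = loss_fn(probe(features), torch.tensor(labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

print(train_step(["an example passage", "another passage"], [0, 1]))
```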
It’s also possible that there is some elegant, abstract “intelligence” concept (analogous to arithmetic) which evolution built into us but we don’t understand yet and from which language developed. It just turns out that if you already have language, it’s easier to work backwards from there to “intelligence” than to build it from scratch.
This probably isn’t the case, but I secretly wonder if the people in camp #1 are p-zombies.
Not very familiar with US culture here: is AI safety not extremely blue-tribe coded right now?
How does the logic here work if you change the question to be about human history?
Guessing a 50/50 coin flip is obviously impossible, but if Omega asks whether you are in the last 50% of “human history”, the doomsday argument (not that I subscribe to it) is more compelling. The key point of the doomsday argument is that humanity’s growth is exponential, therefore if we’re the median birth-rank human and we continue to grow, we don’t actually have that long (in wall-time) to live.
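To spell out the wall-time point, here’s a back-of-the-envelope version with my own simplifying assumptions (not part of the original argument):

```latex
% Assume cumulative births grow roughly exponentially, $N(t) = N_0 e^{rt}$,
% and that I sit at the median birth rank, so about $N_0$ births remain.
% Those arrive within a time $T$ satisfying
\[
  N_0 \, e^{rT} = 2 N_0
  \quad\Longrightarrow\quad
  T = \frac{\ln 2}{r},
\]
% i.e. roughly one doubling time of cumulative births, which is not long in
% wall-clock terms while growth stays exponential.
```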
Please don’t do this, this is dangerous.
How much Test E did you take? 200mg/ml, but how many ml?
Usually, one dose of testosterone isn’t enough for a noticeable difference in mental state, and by the time it is enough you’ll need a plan for managing mental side effects from your increased estrogen.
I’m usually a pretty big fan of bioengineering, self-experimentation, etc. but this strikes me as particularly reckless.
Is anyone worried about AI one-shotting comprehensive nano-technology? It can make as many tries as it wants, and in fact, we’ll be giving it as many tries as we can.
I think the long gap between GPT-3 and GPT-4 can be explained by Chinchilla. That was the point where OpenAI realized their models were undertrained for their size, and switched focus from scaling to fine-tuning for a couple of years. InstructGPT, Codex, text-davinci-003, and GPT-3.5 were all released in this period.
GPT-4 can handle tabletop RPGs incredibly well. You just have to ask it to DM a Dungeons and Dragons 5e game, give it some pointers about narrative style, game setting, etc. and you’re off.
For the first couple of hours of play it’s basically as good as a human, but annoyingly it starts to degrade after that, making more mistakes and forgetting things. I don’t think it’s a context length issue, because it forgets info that’s definitely within context, but I can think of a few other things that could be the cause.
It seems implied that the chance of a drought here is 50%. If there is a 50% chance of basically any major disaster in the foreseeable future, the correct action is “Prepare Now!”.
This advice also applies to the aligned case. And all of the inbetweens. And to most other scenarios.
Disclaimer: I run an “AI companion” app, which has fulfilled the role of a romantic partner for a handful of people.
This is the main benefit I see of talking about your issues with an AI. Current-gen (RLHF-tuned) LLMs are fantastic at therapy-esque conversations, acting as a mirror to allow the human to reflect on their own thoughts and beliefs. Their weak point (as a conversational partner) right now is lacking agency and consistent views of their own, but that’s not what everyone needs.
I enjoyed the article and think it points at some important things, but agree with Stephen that it might not point to a useful distinction.
Purely anecdotally: I don’t get absorbed into books easily (I very much enjoy reading, but don’t get the level of immersion you describe), feel emotional conflict as two distinct feelings or thoughts warring in my mind, can have IFS conversations, etc. but am absolutely hopeless at multi-tasking, dividing my attention, etc.
Meanwhile, my wife is the polar opposite. She gets immersed in books, feels one emotion at a time, empathizes compulsively, etc. but is great at multi-tasking.
Maybe the threaded model just doesn’t apply to multi-tasking, but that would surprise me. I would expect multi-tasking to be an obvious benefit of having a “multi-threaded” brain.