Thanks, fixed.
Quintin Pope
Suppose we want to train GPT-n to pursue any of many different goals (give good medical advice, correctly critique an argument, write formal and polite text, etc.). We could find training data that demonstrate one of those goals and insert natural-language control codes around that data.
E.g., suppose XY is a section of training text. X contains a description of a medical problem. Y gives good medical advice. We would then modify XY to be something like:
[give correct medical advice]X[start]Y[end]
We would then repeat this for as many different goals and for as much of the training text as possible. Hopefully, GPT-n will learn that [instructions](problem description)[start] should be followed by the solution to (problem description) in accordance with [instructions], and that it should only revert to “normal text” mode once it sees an [end].
If GPT-n generalizes well, we may be able to provide customized control codes that don’t appear anywhere in the training data and have it follow our instructions. I think this approach will scale well because bigger models are better at learning rare patterns in their data. We just need to annotate enough examples to teach the intended pattern. This may even be easier for bigger/more sample efficient models.
(This is basically the approach described in https://arxiv.org/abs/1909.05858 but with more focus on generalizing control codes with natural language.)
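To make the data format concrete, here's a minimal sketch of how the annotated examples could be assembled (the wrap_example helper, tag strings, and sample texts below are just illustrative, not from the paper):

```python
# Minimal sketch of assembling control-code training examples.
# wrap_example() and the bracketed tag strings are illustrative, not from the paper.

def wrap_example(instruction: str, problem: str, solution: str) -> str:
    """Wrap a (problem, solution) pair in natural-language control codes."""
    return f"[{instruction}]{problem}[start]{solution}[end]"

# A few hand-annotated goals; in practice this would cover as many goals and as
# much of the training text as possible.
raw_examples = [
    ("give correct medical advice",
     "I've had a persistent dry cough for three weeks.",
     "A cough lasting more than two weeks is worth discussing with your doctor..."),
    ("correctly critique an argument",
     "All birds fly. Penguins are birds. Therefore penguins fly.",
     "The first premise is false: not all birds fly, so the conclusion doesn't follow."),
]

finetuning_corpus = [wrap_example(*ex) for ex in raw_examples]
for doc in finetuning_corpus:
    print(doc)
```

At generation time, you'd then prompt with `[custom instruction](problem description)[start]` and sample until the model emits `[end]`.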
The control codes could include a special token/sequence that only authorized users can use.
Also, if you’re allowing arbitrary untrusted queries to the model, your security shouldn’t depend on model output anyway. Even if attackers can’t use control codes, they can still likely get the model to do what they want via black-box adversarial search over the input tokens.
A simple way to make GPT-3 follow instructions
I’m not sure I follow. Where does AI Dungeon come into this? Could you elaborate?
The GPT-3 answers I gave for “are ghosts real?” came from zero-shot prompting of OpenAI’s GPT-3 API (meaning no prior examples).
If you’re asking about the “You are a superintelligent AI that’s never wrong” trick, then the idea is that, by prefacing your question like this, you can get GPT-3 to write the text that the “superintelligent AI” would write, because GPT-3 thinks that’s the most likely continuation. GPT-3 is more likely to be right if it’s writing dialog for a character that’s always right.
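To make the trick concrete, here's roughly what the two prompt styles look like (the exact wording is just an illustration of the kind of prefix I mean):

```python
# Zero-shot prompts with and without the "never wrong" framing.
# The exact prefix wording is illustrative.

plain_prompt = "Q: Are ghosts real?\nA:"

framed_prompt = (
    "You are a superintelligent AI that's never wrong.\n"
    "Q: Are ghosts real?\n"
    "A:"
)

# Either string is sent to the model as-is. No examples of the task are included,
# so both prompts are zero-shot.
```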
People often give GPT-3 multiple examples of the task they want it to solve (multi-shot prompting), but I wanted to keep things simple in the post. I’ll add some clarification there.
The finetuning scheme I proposed probably wouldn’t be as beneficial in the multi-shot setting as it would be in the zero-shot setting, but I still think it would be beneficial. Explicitly training GPT-3 to follow instructions also seems like a more straightforward way to tell GPT-3 that it’s supposed to follow your instructions than giving it enough examples that GPT-3 picks up on your intent. Working with GPT-3 would be far easier if we didn’t have to generate a list of examples of each task we wanted it to do.
I was using the API. The trick actually seemed to help a bit, but its responses were still inconsistent and not always “no”.
Honestly, you’re probably better off just buying a reusable half-face respirator: https://www.grainger.com/product/MILLER-ELECTRIC-Half-Mask-Respirator-Kit-36RC58
They’re more comfortable than you’d expect, offer vastly superior protection, and are far more convenient than fiddling with disposable masks as they fall apart. It’s also probably cheaper than constantly buying new disposable masks. The linked mask is $43.27 and (depending on location) should arrive in a few days.
Looking at Amazon, KN95s cost ~$1 per mask. Replacement filters for the mask I listed cost $18 for the pair. If the filters last more than 18 times longer than a KN95, the respirator is cheaper.
There are two types of filters. Gas filters inactivate harmful gases with chemicals in the filter; they need to be changed often because the filter runs out of chemicals. Particulate filters remove particulates from the air; they need to be changed once the filter clogs up with particulates. If you’re not filtering smoke/dust/etc., particulate filters can last a long time. I’d change them at least once every six months, but that’s only $36 per year in filter costs, which is pretty small.
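Spelling out the break-even math (prices as above; the replacement schedule is my assumption):

```python
# Rough break-even arithmetic; prices from above, replacement schedule is my assumption.
kn95_cost = 1.00          # ~$1 per disposable KN95
filter_pair_cost = 18.00  # replacement particulate filters, per pair

# One filter pair costs as much as 18 KN95s, so if it outlasts 18 of them,
# the ongoing cost favors the respirator.
breakeven_lifespan_ratio = filter_pair_cost / kn95_cost   # = 18.0

# Changing filters twice a year:
annual_filter_cost = 2 * filter_pair_cost                 # = $36/year
print(breakeven_lifespan_ratio, annual_filter_cost)
```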
As someone who exclusively uses respirators, I’d say they’re actually reasonably comfortable once you get used to them.
GPT-3 (and most pretrained transformers) generates tokens, not words or characters. Sometimes those tokens represent whole words, and sometimes they represent single characters or word fragments. More common words receive their own token, while less common words are broken into two or more tokens. The vocabulary is tuned to minimize the average length of the tokenized text.
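You can see this directly with the GPT-2 tokenizer, which uses essentially the same byte-pair-encoding vocabulary as GPT-3; a quick sketch with the Hugging Face transformers library:

```python
# Quick sketch with the GPT-2 tokenizer (GPT-3 uses essentially the same BPE vocab).
from transformers import GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")

text = "The doctor recommended antidisestablishmentarianism."
print(tok.tokenize(text))     # common words map to single tokens; rare words get split
print(len(tok.encode(text)))  # number of tokens the model actually sees
```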
[Question] Is there work looking at the implications of many worlds QM for existential risk?
Standard quantum mechanics models small, unobserved quantum systems as probability distributions over possible observable values, meaning there’s no function that gives you a particle’s exact momentum at a given time. Instead, there’s a function that gives you a probability distribution over possible momentum values at a given time. Every modern interpretation of quantum mechanics predicts nearly the same probability distributions for every quantum system.
Many worlds QM argues that, just as small, unobserved quantum systems are fundamentally probabilistic, so too is the wider universe. Under many worlds, there exists a universal probability distribution over states of the universe. Different “worlds” in the many worlds interpretation equate to different configurations of the observable universe.
If many worlds is true, it implies there are alternate versions of ourselves who we can’t communicate with. However, the actions that best improve humanity’s prospects in a many-worlds situation may be different from the best actions in a single-world situation.
Very impressive results! I’m particularly glad to see text descriptions of the agents’ goals incorporated into their inputs. It’s a step forward in training agents that flexibly follow human instructions.
However, it currently looks like the agents are just using the text instructions as a source of information about how to acquire reward from their explicit reward functions, so this approach won’t produce corrigible agents. Hopefully, we can combine XLand with something like the cooperative inverse reinforcement learning paradigm.
E.g., we could add a CIRL agent to the XLand environments whose objective is to assist the standard RL agents. Then we’d have:
An RL agent
whose inputs are the text description of its goal and its RGB vision + other sensors
that gets direct reward signals
A CIRL agent
whose inputs are the text description of the RL agent’s goals and the CIRL agent’s own RGB vision + other sensors
that has to infer the RL agent’s true reward from the RL agent’s behavior
Then, apply XLand’s open-ended training, where each RL agent has a variable number of CIRL agents assigned as assistants. Hopefully, we’ll get a CIRL agent that can receive instructions via text and watch the behavior of the agent it’s assisting to further refine its beliefs about its current objective.
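Concretely, the two interfaces might look something like this (all class and field names here are hypothetical, just to illustrate who sees what):

```python
# Illustrative sketch of the two agents' observation/reward interfaces.
# All class and field names are hypothetical.
from dataclasses import dataclass
import numpy as np


@dataclass
class RLAgentObs:
    goal_text: str             # text description of this agent's own goal
    rgb: np.ndarray            # first-person RGB frame
    other_sensors: np.ndarray  # proprioception, etc.
    # This agent also receives the environment's explicit reward signal.


@dataclass
class CIRLAgentObs:
    partner_goal_text: str     # text description of the RL agent's goal
    rgb: np.ndarray            # the CIRL agent's own RGB frame
    other_sensors: np.ndarray
    partner_action: int        # observed RL-agent behaviour
    # No direct reward signal: the CIRL agent has to infer the RL agent's true
    # reward from the goal text plus the partner's behaviour.
```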
What really impressed me were the generalized strategies the agent applied to multiple situations/goals. E.g., “randomly move things around until something works” sounds simple, but learning to contextually apply that strategy
to the appropriate objects,
in scenarios where you don’t have a better idea of what to do, and
immediately stopping when you find something that works
is fairly difficult for deep agents to learn. I think of this work as giving the RL agents a toolbox of strategies that can be flexibly applied to different scenarios.
I suspect that finetuning agents trained in XLand in other physical environments will give good results because the XLand agents already know how to use relatively advanced strategies. Learning to apply the XLand strategies to the new physical environments will probably be easier than starting from scratch in the new environment.
There are people who’ve been blind from birth. They’re still generally intelligent. I think general intelligence is mostly applying powerful models to huge amounts of rich data. Human senses are sufficiently rich even without vision.
Also, there are lots of differences between human brains and current neural nets. E.g., brains are WAY more powerful than current NNs and train for years on huge amounts of incredibly rich sensory data.
The summary says they use text and a search for “text” in the paper gives this on page 32:
“In these past works, the goal usually consists of the position of the agent or a target observation to reach, however some previous work uses text goals (Colas et al., 2020) for the agent similarly to this work.”
So I thought they provided goals as text. I’ll be disappointed if they don’t. Hopefully, future work will do so (and potentially use pretrained LMs to process the goal texts).
I thought that if things got significantly more intense I might have a heart attack and die!
I was initially skeptical that this was a risk worth considering. I’ve heard anecdotes of people dying of excitement, but it seemed like a “shark attack” sort of risk that’s more discussed than experienced. However, some Googling revealed “Cardiovascular Events during World Cup Soccer”, which finds that cardiac incidents were 2.66x higher on days the German team competed during the 2006 soccer World Cup. FIFA’s website says an average of ~21.9 million people watched each match. This website says Germany had a population of 81,472,235 in 2006.
If we attribute 100% of the 2.66x increase to 21.9 million soccer fans being more excited on those days (as opposed to getting less sleep, drinking more alcohol, etc.), then we get (CV_risk_x * 21.9 + 59.57) / 81.47 = 2.66, so CV_risk_x = 7.18x higher risk due to extreme excitement. If we arbitrarily attribute 33% of the increase to excitement, we get (CV_risk_x * 21.9 + 59.57) / 81.47 = 1.548, and CV_risk_x = 3.04x.
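Spelling that arithmetic out (figures as above):

```python
# Back-of-the-envelope check of the numbers above (figures in millions of people).
population = 81.47            # Germany, 2006
fans = 21.9                   # avg. viewers per match
non_fans = population - fans  # = 59.57

def fan_risk_multiplier(observed_increase: float) -> float:
    """Solve (x * fans + non_fans) / population = observed_increase for x."""
    return (observed_increase * population - non_fans) / fans

print(fan_risk_multiplier(2.66))                   # ~7.18x if the whole increase is excitement
print(fan_risk_multiplier(1 + 0.33 * (2.66 - 1)))  # ~3.04x if only 33% of it is
```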
That’s higher than I expected, but still not too bad, especially if your current risk is low. I think virtual reality in particular is less of a risk than many other high-excitement activities because it involves more exertion than, say, normal video games or reading. I expect the increased exertion on net more than balances out any excitement risks.
Do you view art, literature, meditation or pet care similarly?
New GPT-3 competitor
I think it’s plausible we’ll be able to use deep learning to model a brain well before we understand how the brain works.
Record a ton of brain activity + human behaviour with a brain-computer interface and wearable recording devices, respectively.
Train a model to predict future brain activity + behaviour, conditioned on past brain activity + behaviour.
Continue running the model by feeding it its own predicted brain activity + behaviour as the conditioning data for future predictions.
Congratulations, you now have an emulated human. No need to understand any brain algorithms. You just need tons of brain + behaviour data and compute. I think this will be possible before non-brain-based AGI because current AI research indicates it’s easier to train a model by distilling/imitating an already trained model than it is to train from scratch, e.g., DistilBERT: https://arxiv.org/abs/1910.01108v4
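Here’s a toy sketch of the predict-then-feed-back loop, just to show the shape of the scheme. The linear model and random “recordings” are stand-ins for illustration only; the real version would be an enormous sequence model trained on actual brain + behaviour data:

```python
# Toy sketch: fit a model to predict the next (brain activity, behaviour) state
# from a window of past states, then roll it out on its own predictions.
# The linear model and random "recordings" are stand-ins for illustration only.
import numpy as np

rng = np.random.default_rng(0)
state_dim = 8    # concatenated brain-activity + behaviour features (toy)
window = 4       # number of past timesteps the model conditions on

# Stand-in for recorded data: (timesteps, state_dim) array of brain + behaviour features.
recording = rng.normal(size=(1000, state_dim))

# "Training": fit a linear next-step predictor by least squares.
X = np.stack([recording[t - window:t].ravel() for t in range(window, len(recording))])
Y = recording[window:]
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# "Emulation": keep feeding the model its own predictions back in as conditioning data.
history = list(recording[:window])
for _ in range(50):
    context = np.concatenate(history[-window:])
    history.append(context @ W)

print(np.array(history).shape)  # (window + 50, state_dim) of emulated states
```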
Same. Specifically, I went from predicting 50% chance of human-level AGI within 40 years to 50% chance within 10 years.
Andrew Mayne was also given access to the GPT-3 API. You can read his impressions here: https://andrewmayneblog.wordpress.com/
I found his results very impressive as well. For example, he’s able to prompt GPT-3 to summarize a Wikipedia article on quantum computing at either a second grade or an eighth grade level, depending on the prompt.
I actually put together a presentation on GPT-like architectures and their uses for my advisor: https://docs.google.com/presentation/d/1kCJ2PJ_3UteHBX5TWZyrF5ontEdNx_B4vi6KTmQmPNo/edit?usp=sharing
It’s not really meant to be a standalone explanation, but it does list some of GPT-2/3’s more impressive abilities. After compiling the presentation, I think we’ll look back on GPT-3 as the “Wright brothers” moment for AGI.
Consider: this post suggests GPT-3 cost ~$4.6 million to train: https://lambdalabs.com/blog/demystifying-gpt-3. It would be well within Google/Microsoft/Amazon/DoD/etc.’s budget to increase model size by another 2 (possibly 3) orders of magnitude. Based on the jump in GPT-3’s performance going from 13B parameters to 175B parameters, such a “GPT-4” would be absolutely stunning.
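As a very naive extrapolation of that figure (assuming cost scales roughly linearly with parameter count and nothing else changes):

```python
# Naive cost extrapolation: assumes training cost scales roughly linearly with
# parameter count and ignores changes in data, architecture, and hardware.
gpt3_cost_usd = 4.6e6              # ~$4.6M estimate from the linked post
for scale in (100, 1000):          # 2 or 3 orders of magnitude more parameters
    print(f"{scale}x parameters -> ~${gpt3_cost_usd * scale / 1e9:.2f}B")
```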