I personally feel that the fact that it was such an effortless attempt makes it more impressive, and really hammers home the lesson we need to take away from this. It’s one thing to put in a great deal of effort to defeat some defences. It’s another to completely smash through them with the flick of a wrist.
FactorialCode
Props to whoever petrov_day_admin_account was for successfully red-teaming lesswrong.
As much as I hate to say it, I don’t think that it makes much sense for the main hub of the rationalist movement to move away from Berkeley and the Bay Area. There are several rationalist adjacent organizations that are firmly planted in Berkely. The ones that are most salient to me are the AI and AI safety orgs. You have OpenAI, MIRI, CHAI, BAIR, etc. Some of these could participate in a coordinated move, but others are effectively locked in place due to their tight connections with larger institutions.
Ehh, Singapore is a good place to do business and live temporarily. But mandatory military service for all male citizens and second gen permanent residents, along with the work culture make it unsuitable as a permanent location to live. Not to mention that there’s a massive culture gap between the rats and the Singaporeans.
I think the cooperative advantages mentioned here have really been overlooked when it comes to forecasting AI impacts, especially in slow takeoff scenarios. A lot of forecasts, like what WFLL, mainly posit AI’s competing with each other. Consequently Molochian dynamics come into play and humans easily lose control of the future. But with these sorts of cooperative advantages, AIs are in an excellent position to not be subject to those forces and all the strategic disadvantages they bring with them. This applies even if an AI is “merely” at the human level. I could easily see an outcome that from a human perspective looks like a singleton taking over, but is in reality a collective of similar/identical AI’s working together with superhuman coordination capabilities.
I’ll also add source-code-swapping and greater transparency to the list of cooperative advantages at an AI’s disposal. Different AIs that would normally get stuck in a multipolar traps might not stay stuck for long if they can do things analogous to source code swap prisoners dilemmas.
Just use bleeding edge tech to analyze ancient knowledge from the god of information theory himself.
This paper seems to be a good summary and puts a lower bound on entropy of human models of english somewhere between 0.65 and 1.10 BPC. If I had to guess, the real number is probably closer 0.8-1.0 BPC as the mentioned paper was able to pull up the lower bound for hebrew by about 0.2 BPC. Assuming that regular english compresses to an average of 4* tokens per character, GPT-3 clocks in at 1.73/ln(2)/4 = 0.62 BPC. This is lower than the lower bound mentioned in the paper.
So, am I right in thinking that if someone took random internet text and fed it to me word by word and asked me to predict the next word, I’d do about as well as GPT-2 and significantly worse than GPT-3?
That would also be my guess. In terms of data entropy, I think GPT-3 is probably already well into the superhuman realm.
I suspect this is mainly because GPT-3 is much better at modelling “high frequency” patterns and features in text that account for a lot of the entropy, but that humans ignore because they have low mutual information with the things humans care about. OTOH, GPT-3 also has extensive knowledge of pretty much everything, so it might be leveraging that and other things to make better predictions than you.
*(ask Gwern for details, this is the number I got in my own experiments with the tokenizer)
I’m OOTL, can someone send me a couple links that explain the game theory that’s being referenced when talking about a “battle of the sexes”? I have a vague intuition from the name alone, but I feel this is referencing a post I haven’t read.
Edit: https://en.wikipedia.org/wiki/Battle_of_the_sexes_(game_theory)
I’m gonna go with barely, if at all. When you wear a surgical mask and you breath in, a lot of air flows in from the edges, without actually passing through the mask, so the mask doesn’t have very good opportunity to filter the air. At least with N95 and N99 mask, you have a seal around your face, and this forces the air through the filter. Your probably better off wearing a wet bandana or towel that’s been tied in such a way as to seal around your face, but that might make it hard to breath.
I found this, which suggests that they’re generally ineffective. https://www.cdph.ca.gov/Programs/EPO/Pages/Wildfire Pages/N95-Respirators-FAQs.aspx
Yeah, I’ll second the caution to draw any conclusions from this. Especially because this is macroeconomics.
https://en.wikipedia.org/wiki/Sectoral_balances
It is my understanding that this is broadly correct. It is also my understanding that this is not common knowledge.
Agentic Language Model Memes
One hypothesis I have is that even in the situation where there is no goal distribution and the agent has a single goal, subjective uncertainty makes powerful states instrumentally convergent. The motivating real world analogy being that you are better able to deal with unforeseen circumstances when you have more money.
I’ve gone through a similar phase. In my experience you eventually come to terms with those risks and they stop bothering you. That being said, mitigating x and s-risks has become one of my top priorities. I now spend a great deal of my own time and resources on the task.
I also found learning to meditate helps with general anxiety and accelerates the process of coming to terms with the possibility of terrible outcomes.
The way I was envisioning it is that if you had some easily identifiable concept in one model, e.g. a latent dimension/feature that corresponds to the log odd of something being in a picture, you would train the model to match the behaviour of that feature when given data from the original generative model. Theoretically any loss function will do as long as the optimum corresponds to the situation where your “classifier” behaves exactly like the original feature in the old model when both of them are looking at the same data.
In practice though, we’re compute bound and nothing is perfect and so you need to answer other questions to determine the objective. Most of them will be related to why you need to be able to point at the original concept of interest in the first place. The acceptability of misclassifying any given input or world-state as being or not being an example of the category of interest is going to depend heavily on things like the cost of false positives/negatives and exactly which situations get misclassified by the model.
The thing about it working or not working is a good point though, and how to know that we’ve successfully mapped a concept would require a degree of testing, and possibly human judgement. You could do this by looking for situations where the new and old concepts don’t line up, and seeing what inputs/world states those correspond to, possibly interpreted through the old model with more human understandable concepts.
I will admit upon further reflection that the process I’m describing is hacky, but I’m relatively confident that the general idea would be a good approach to cross-model ontology identification.
I think you can loosen (b) quite a bit if you task a separate model with “delineating” the concept in the new network. The procedure does effectively give you access to infinite data, so the boundary for the old concept in the new model can be as complicated as your compute budget allows. Up to and including identifying high level concepts in low level physics simulations.
I think the eventual solution here (and a major technical problem of alignment) is to take an internal notion learned by one model (i.e. found via introspection tools), back out a universal representation of the real-world pattern it represents, then match that real-world pattern against the internals of a different model in order to find the “corresponding” internal notion.
Can’t you just run the model in a generative mode associated with that internal notion, then feed that output as a set of observations into your new model and see what lights up in it’s mind? This should work as long as both models predict the same input modality. I could see this working pretty well for matching up concepts between the latent spaces of different VAEs. Doing this might be a bit less obvious in the case of autoregressive models, but certainly not impossible.
I think this is pretty straight forward to test. GPT-3 gives joint probabilities of string continuations given context strings.
Step 1: Give it 2 promps, one suggesting that it is playing the role of a smart person, and one where it is playing the roll of a dumb person.
Step 2: Ask the “person” a question that demonstrates that persons intelligence. (something like a math problem or otherwise)
Step 2: Write continuations where the person answers correctly and incorrectly
Step 3: Compare the relative probabilities GPT-3 assigns to each continuation given the promps and questions.
If GPT-3 is sandbagging itself, it will assign a notably higher probability to the correct answer when conditioned on the smart person prompt than when conditioned on the dumb person prompt. If it’s not, it will give similar probabilities in both cases.
Step 4: Repeat the experiment with problems of increasing difficulty and plot the relative probability gap. This will show the limits of GPT-3′s reflexive intelligence. (I say reflexive because it can be instructed to solve problems it otherwise couldn’t with the amount of serial computations at it’s disposal by carrying out an algorithm as part of its output, as is the case with parity)
This is an easy $1000 for anyone who has access to the beta API.
Hypothesis: Unlike the language models before it and ignoring context length issues, GPT-3′s primary limitation is that it’s output mirrors the distribution it was trained on. Without further intervention, it will write things that are no more coherent than the average person could put together. By conditioning it on output from smart people, GPT-3 can be switched into a mode where it outputs smart text.
According to Gwern, it fails the Parity Task.
I haven’t actually figured that out yet, but several people in this thread have proposed takeaways. I’m leaning towards “social engineering is unreasonably effective”. That or something related to keeping a security mindset.