Consider a model tasked with predicting characters in text with a set of 64 characters (52 uppercase and lowercase letters, along with some punctuation).
Wait, people are doing this, instead of just turning words into numbers and having ‘models’ learn those? Anything GPT-sized and getting results?
Once you start training, the easiest win is to simply notice how frequent each character is; just noticing that uppercase letters are rare, spaces are common, vowels are common, etc. could get your error down to 4-5 bits.
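A rough way to see where a figure like that comes from (my own sketch, not from the post; the file name is a placeholder): a uniform guess over 64 characters costs log2(64) = 6 bits per character, and just swapping in the observed character frequencies typically lands in the 4-5 bit range quoted above.

```python
import math
from collections import Counter

def bits_per_char(text):
    """Entropy of the unigram character distribution, in bits per character.

    This is roughly the loss of a predictor that ignores all context and
    only knows how common each character is.
    """
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# "sample.txt" stands in for any ordinary English text file.
sample = open("sample.txt", encoding="utf-8").read()
print(f"uniform over 64 characters: {math.log2(64):.2f} bits/char")
print(f"unigram frequencies only:   {bits_per_char(sample):.2f} bits/char")
```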
A diverse dataset will also include mistakes, and the more common a mistake is, the more likely it is to be learned as correct. Like its versus it’s (maybe there are also structural reasons to make that mistake? There’s a reason we do it, after all), and whether a name ending in s should be followed by ’s or just an apostrophe.
For example, it might learn that “George W” tends to be followed by “ashington”.
I am now imagining a sentence completion task where the answer is George Washington, but the model predicts George W Bush instead, or vice versa.
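To make the “George W” example concrete, here's a minimal character-level n-gram sketch (my own toy code, nothing like how GPT actually works): it memorizes which character follows each 8-character context, so whichever continuation dominates the training text wins the prediction.

```python
from collections import Counter, defaultdict

def train_char_ngrams(corpus, n=8):
    """Count which character follows each length-n context in the corpus."""
    model = defaultdict(Counter)
    for i in range(len(corpus) - n):
        model[corpus[i:i + n]][corpus[i + n]] += 1
    return model

def greedy_continue(model, n, prefix, length=9):
    """Repeatedly append the single most likely next character."""
    out = prefix
    for _ in range(length):
        context = out[-n:]
        if context not in model:
            break
        out += model[context].most_common(1)[0][0]
    return out

# Whichever continuation dominates the toy corpus wins the prediction.
corpus = ("George Washington was the first president. " * 5
          + "George W. Bush was the 43rd president. ")
model = train_char_ngrams(corpus, n=8)
print(greedy_continue(model, n=8, prefix="George W"))  # -> "George Washington"
```

Flip the repetition counts in the toy corpus and the same prefix continues as “George W. Bush” instead.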
As a simple example of how the scaling hypothesis affects AI safety research, it suggests that the training objective (“predict the next word”) is relatively unimportant in determining properties of the trained agent; in contrast, the dataset is much more important. This suggests that analyses based on the “reward function used to train the agent” are probably not going to be very predictive of the systems we actually build.
A fascinating point (though how much compute it requires is relevant). Still, even if it were scaled up a lot, what could a program that plays GTA do?
Problem: This definition fails to account for cases of knowledge where the map is represented in a very different way that doesn’t resemble the territory, such as when a map is represented by a sequence of zeros and ones in a computer.
While this is a problem, it’s not obvious how to overcome it, probably because it’s a real problem. If you found a textbook in another language, would you be able to recognize it? Figure out what it was about? If it was physical, yes (by looking at the pictures); if it was digital and stored in a way your computer couldn’t read, probably not.
Problem: In the real world, nearly every region of space will have high mutual information with the rest of the world. For example, by this definition, a rock accumulates lots of knowledge as photons incident on its face affect the properties of specific electrons in the rock giving it lots of information.
It seems that if I point a camera at a place and have it send what it sees to a screen, then the result:
has a lot of mutual information with that place
looks like what it ‘models’
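A toy calculation of the first point (my own sketch, with made-up numbers): a screen that tracks a four-state scene carries close to the full two bits of mutual information about it. The same calculation also shows why the second point is needed: relabeling the screen’s states leaves the mutual information unchanged, so mutual information alone can’t distinguish a map that looks like the territory from a scrambled one.

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits, given a joint probability table p(x, y)."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# Toy world: 4 equally likely scenes in front of the camera, and a screen
# that reproduces the scene faithfully 97% of the time.
n_scenes = 4
channel = np.full((n_scenes, n_scenes), 0.01)   # p(screen | scene)
np.fill_diagonal(channel, 0.97)
camera_joint = channel / n_scenes               # joint with uniform p(scene)

# Relabel the screen states with a fixed permutation: the screen no longer
# "looks like" the place, yet the mutual information is exactly the same.
perm = np.array([2, 0, 3, 1])
scrambled_joint = camera_joint[:, perm]

print(f"faithful screen:  {mutual_information(camera_joint):.3f} bits")
print(f"scrambled screen: {mutual_information(scrambled_joint):.3f} bits")
```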
More generally, if we could figure things out about the world by looking at a rock, then this definition might seem less of an issue.
Looking for something like an intersection of definitions seems to have worked well here.
Problem: A video camera that constantly records would accumulate much more knowledge by this definition than a human, even though the human is much more able to construct models and act on them.
A human will also move around, learn about the shape of the world, etc. While we don’t look at cameras and think ‘that’s a flat-earther’, I expect there is a lot of contrast between the information held by the system that includes the camera and the information held by international travelers.
The next iteration of the idea would be a drone that flies around the world, perhaps with very powerful batteries or refueling stations. However, reaching this stage seems to have softened that problem somewhat: flying requires knowing how to fly, and there are issues around avoiding weather, not getting destroyed, repair (or self-repair)...
it seems wrong to say that during the map-making process knowledge was not accumulating.
Let’s say it was. Then the process that destroyed it worked against that. Similarly:
“sphex” characterizes activity taken by a mindless automaton
Never mind, I just looked it up, and apparently that isn’t real.
I thought that was in The Sequences, but this search yielded no results:
https://www.readthesequences.com/Search?q=sphex&action=search
It’s true that destroyed knowledge seems not useful for precipitating action (though we might like more of a distinction here between ‘knowledge’ and ‘life’; that is, destroyed knowledge is ‘dead’). Why this can’t be rescued with counterfactuals isn’t clear.
Also, if we send a ship out to map the coastline, and it doesn’t come back:
Maybe sailing around the coast is treacherous.
Or it ran into pirates, invaders, or something.
Repeated attempts, meeting with failure, may allow us to infer (‘through absence rather than presence’) some knowledge.
Wait, people are doing this, instead of just turning words into numbers and having ‘models’ learn those? Anything GPT-sized and getting results?
Not totally sure. There are advantages to character-level models, e.g. you can represent Twitter handles (which a word-embedding-based approach can have trouble with). People have definitely trained character-level RNNs in the past. But I don’t know enough about NLP to say whether people have trained large models at the character level. (GPT uses byte pair encoding.)
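A minimal sketch of that trade-off (my own toy code; the vocabulary and the handle are made up): a fixed word vocabulary maps an unseen Twitter handle to an unknown token, while a character-level tokenizer represents it exactly. Byte pair encoding, which GPT uses, sits in between, splitting rare strings into subword pieces.

```python
def word_tokenize(text, vocab):
    """Word-level tokenization: anything outside the fixed vocabulary becomes <unk>."""
    return [w if w in vocab else "<unk>" for w in text.split()]

def char_tokenize(text, alphabet):
    """Character-level tokenization: any string over the alphabet survives intact."""
    return [c if c in alphabet else "<unk>" for c in text]

vocab = {"the", "model", "predicts", "characters"}            # tiny made-up word vocabulary
alphabet = set("abcdefghijklmnopqrstuvwxyz"
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789@_ .")

tweet = "@some_handle_2021 predicts characters"               # made-up handle
print(word_tokenize(tweet, vocab))     # the handle collapses to '<unk>'
print(char_tokenize(tweet, alphabet))  # every character is representable
```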
Why this can’t be rescued with counterfactuals isn’t clear.
I suspect Alex would say that it isn’t clear how to define what a “counterfactual” is given the constraints he has (all you get is a physical closed system and a region of space within that system).