About 5 years to learn the mindset, frequent discussion with peers in a shared culture, less frequent interaction and feedback from experienced mentors. May as well talk about “building an AI alignment department.”
rotatingpaguro
I watched some of the 10 min videos (in Italian).
General comments:
I like them better the more the principle “show, don’t tell” is applied. You tend to talk fast and a lot, so when you give a concrete example it’s a nice break. I guess that telling is cheaper to start with, though.
Although you say that the latest videos are higher quality, and in some sense I can see that they look more professional, I found the video about how to discuss funnier and more engaging because of your acting. The newer videos are more rigid: you stand in front of the camera and talk mechanically (it reminds me of my Italian teachers in high school). But maybe it is effective to look like a serious teacher, to convey the sense that the ideas are important?
Specific comment: in the video about books on decisions, I think the example about losing all your possessions vs. gaining 3x is an example of utility != money, not of irrationality. It’s pretty confusing, because if I lose everything I can’t pay the bills, buy food, etc. It’s a fallacy when I have €1000 and I won’t risk €1 to make €3.
(GPT-4 also suggested there is a “social” motivation of “fitting in”, but I think it is hardly the case: is there any society where people are ostracised for not possessing difficult skills? I doubt so. This is especially so in the social groups of children, where, if anything, the opposite is the case: kids may be ostracised for trying to become “too smart”, or “learning too much”.)
I disagree about children. I’ve seen classrooms where the competition is about having good grades, and the kids with good grades are the bullies / are socially on top. In general there are many social contexts in which you have to be good at something difficult in order to fit in. I’ve seen it both in selective schools, where you could expect it, and in ghettos.
Ilya Sutskever says something about this in an interview:
https://www.youtube.com/watch?v=Yf1o0TQzry8
My recollection: optimization on predicting the next token finds intelligent schemes that can be coordinated to go further than the humans that produced the tokens in the first place. Think of GPT-n being at least as smart and knowledgeable as the best human in every specialized domain, with the combination of all these abilities at once allowing it to go further than any single human or coordinated group of humans.
I would not have come to trust a socially-able-Eliezer. He’s pure. Let him be that.
Is there a trick to write a utility satisficer as a utility maximizer?
By “utility maximizer” I mean the ideal Bayesian agent from decision theory that outputs those actions $a$ which maximize some expected utility $\mathbb{E}[U(s) \mid a]$ over states of the world $s$.
By “utility satisficer” I mean an agent that searches for actions that make $\mathbb{E}[U(s) \mid a]$ greater than some threshold $T$ short of the ideally attainable maximum, and contents itself with the first such action found. For reference, let’s fix that $\max_a \mathbb{E}[U(s) \mid a] = 1$ and set the satisficer threshold to $T = 1/2$.
The satisficer is not something that maximizes $\min(\mathbb{E}[U(s) \mid a], T)$. That would again be a utility maximizer, but with utility $\min(U, T)$, and it would run into the usual alignment problems. The satisficer reasons on $U$. However, I’m curious if there is still a way to start from a satisficer with utility $U$ and threshold $T$ and define a maximizer with utility $V$ that is functionally equivalent to the satisficer.
As said, it looks clear to me that $V = \min(U, T)$ won’t work.
Of course it is possible to write something like $V(a) = \mathbf{1}[a = a_{\text{satisficer}}]$, but it is not interesting. It is not a compact utility to encode.
Is there some useful equivalence of intermediate complexity and generality? If there were, I expect it would make me think alignment is more difficult.
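To make the distinction concrete, here is a minimal sketch in Python (my own illustration, not part of the original question; the action set, expected utilities, and threshold are made-up numbers) of a maximizer and a satisficer over a finite action set:

```python
import numpy as np

# Finite action set whose expected utilities E[U(s)|a] are already computed.
# The maximizer returns the argmax; the satisficer returns the first action
# whose expected utility clears the threshold T.
def maximizer(expected_utility):
    return int(np.argmax(expected_utility))

def satisficer(expected_utility, T):
    for a, eu in enumerate(expected_utility):
        if eu > T:
            return a
    return maximizer(expected_utility)  # nothing clears T: fall back to argmax

eu = np.array([0.2, 0.6, 0.9, 0.7])
T = 0.5 * eu.max()
print(maximizer(eu), satisficer(eu, T))  # 2 vs. 1: generally different actions
```

Note that the satisficer’s choice depends on the order in which actions are enumerated, which is part of why it is not obviously the argmax of any compactly specified utility.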
I agree with what you write, but it does not answer the question. From the links you provide I arrived at “Quantilizers maximize expected utility subject to a conservative cost constraint”, which says that a quantilizer (a more accurate formalization of a satisficer as I defined it) maximizes expected utility subject to a constraint obtained by pessimizing over all possible cost functions, from the action-generation mechanism to the action selection. This is relevant, but it does not translate the satisficer into a maximizer, unless it is possible to express that constraint in the utility function (maybe it’s possible; I don’t see how to do it).
-
I have trouble framing this in my mind because I do not understand what the distribution is relative to. In the strictest sense, the distribution of internet text is the internet text itself, and everything GPT outputs is an error. In a broader sense, what counts as an error and what doesn’t? I think there’s something meaningful here, but I cannot pinpoint it clearly.
-
This strongly shows that GPT won’t be able to stay coherent with some initial state, which was already clear from it being autoregressive. It only weakly indicates that GPT won’t learn, somewhere in its weights, the correct schemes to play chess, which could then be somehow elicited.
-
How does this not apply to humans? It seems to me we humans do have a finite context window, within which we can interact with a permanent associative memory system to stay coherent over a longer term. The next obvious step with LLMs is introducing tokens that represent actions and having the model interact with other subsystems or the external world, as many people are trying to do (e.g., PaLM-E). If this direction of improvement pans out, I would argue that LLMs leading to these “augmented LLMs” would not count as “LLMs being doomed”.
3a) It applies to humans, and humans are doomed :)
LLMs are already somewhat able to generate dialogues where they err and then correct themselves in a systematic way (e.g., Reflexion). If there really were a need to create large datasets of err-and-correct text, I do not exclude that they could be generated with the assistance of existing LLMs.
-
I have not read the paper you link, but I have this expectation about it: that the limitation of imitation learning is proved in a context that lacks richness compared to imitating language.
My intuition is: I have myself experienced failing to learn just by imitating an expert playing a game as well as possible. But if someone explains their actions to me, I can then learn something.
Language is flexible and recursive: you can in principle represent anything in the real world with language, including language itself and how to think. If the learner somehow manages to tap into this recursiveness, it can shortcut the levels. It will learn how to act meaningfully not because it has covered all the possible examples of long-term sequences that lead to a goal, but because it has seen many schemes that map to how the expert thinks.
I cannot learn chess efficiently by observing a grandmaster play many matches and jotting down all the moves. I could do it if the grandmaster were a short program implemented in chess moves.
Ok, now I understand better and I agree with this point: it’s like how you learn something faster when a teacher lets you try in small steps and corrects your errors at a granular level, instead of leaving you alone in front of a large task you blankly stare at.
For a response to this, see my comment above.
It’s not enough that the agent build the Turing machine that implements the expert, it needs to furthermore modify itself to behave like that Turing machine, otherwise you just have some Turing machine in the environment doing its own thing, and the agent still can’t behave like the expert.
I don’t care whether the agent is “really doing the thing himself” or not. I care that the end result is the overall system imitating the expert. Of course my extreme example is in some sense not useful: I’m saying “the expert is already building the agent you want, so you can imitate it to build the agent you want.” The point of the example is to show a simple, crisp way in which the proof fails.
So yeah, I don’t know how to clearly move from the very hypothetical counterexample to something less hypothetical. To start, I can have the agent “do the work himself” by having the expert run the machine it defined with its own cognition. This is in principle possible in the autoregressive paradigm, since if you consider the stepping function as the agent, it is fed its previous output. However, there’s some contrivance in having the expert define the machine in the initial sequence and then run it, in such a way that the learner gets both the definition and the running part from imitation. I don’t have a clear picture in my mind. And next I’d have to transfer the intuition somehow to the domain of human language.
I agree overall with the rest of your analysis, in particular with thinking about this in terms of threshold coherence lengths. If the learner somehow needs to infer the expert’s Turing machine from its actions, the relevant point is indeed how long the specification of such a machine is.
I’m not sure I understand correctly: you are saying that it’s easier for the LLM to stay coherent in the relevant sense than it looks from LeCun’s argument, right?
It wasn’t working for me either; it worked after switching from “Markdown” to “LessWrong Docs”.
I tried the embedding just now and it did not work unless I activated the “LessWrong Docs” editing mode instead of the default “Markdown” one; that was not clear.
The preceding decade had contained a military defeat, a puppet government installed by the conquerors, a coup, a counter-coup, and a counter-counter coup; things in Athens were actually unstable, and it’s not difficult to be sympathetic to the view that this was the wrong time to be picking nits.
This makes me suspicious about the claim that Socrates was an existential societal threat from the point of view of the denizens.
Overall, I’m here because LW is the way it is. I think I manage to be both constructive and critical. I expect old-timers to get bored and move on, like I myself have done with every social space after long enough. I don’t expect that exploring very specific theories of LW vibe trends will yield insight in the end.
I’m a natural at this kind of stuff. Yet, growing up during my twenties, I became more like that.
It seems to me you imply that rationalism was a key component. Sometimes I wonder about that for myself. Currently I put more probability on there being many many trajectories of this kind and the specificities not mattering much. Other people have completely different experiences, converge on the same conclusions, and then think their experiences were key. Maybe it’s good old growing up.
Epistemic status: Bayesian rant.
I don’t agree about Bayesian vs. Frequentist, in the sense that I think frequentist = complex+slow.
-
Right now, most common models can be set up Bayesianly in a probabilistic programming system like PyMC or Stan and be fit much more comfortably than the frequentist equivalent. In particular, it’s easy and straightforward to extract uncertainties from the posterior samples.
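For instance, a minimal sketch with PyMC (my own illustrative example; the data and model are made up): a toy linear regression fit Bayesianly, with parameter uncertainties read directly off the posterior samples.

```python
import numpy as np
import pymc as pm

# Simulated data for a toy linear regression.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

with pm.Model():
    intercept = pm.Normal("intercept", 0, 10)
    slope = pm.Normal("slope", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    pm.Normal("obs", mu=intercept + slope * x, sigma=sigma, observed=y)
    idata = pm.sample(progressbar=False)  # MCMC over the posterior

# Uncertainties come straight from the posterior samples.
print(idata.posterior["slope"].mean().item(),
      idata.posterior["slope"].std().item())
```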
-
When that was not the case, I still think that common models were more easily derived in the Bayesian framework, e.g., classic ANOVA up to Scheffé intervals (pages of proofs, the usual contrivances of confidence intervals and multiple testing), versus doing the Bayesian version (prior × likelihood = oh, it’s a Student-t! done). (ref. Berger & Casella for the frequentist ANOVA)
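As a reminder of the kind of derivation I mean (the standard textbook result, stated here from memory): for normal data with the usual noninformative prior, the marginal posterior of the mean is a Student-t,

$$y_i \mid \mu, \sigma^2 \sim N(\mu, \sigma^2), \quad p(\mu, \sigma^2) \propto \frac{1}{\sigma^2} \;\implies\; \left.\frac{\mu - \bar{y}}{s/\sqrt{n}} \,\right| y \;\sim\; t_{n-1},$$

with $\bar{y}$ the sample mean and $s^2$ the usual sample variance, so the credible interval for $\mu$ coincides numerically with the classical $t$ interval.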
-
If you complain “maximum likelihood is easier than the posterior”, I answer “Laplace approximation”, and indeed it’s metis that the observed Fisher information works better than the expected Fisher information. In HEP they somehow learned empirically, over time, to plot contour curves of the likelihood. Bayes was within yourself all along.
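A minimal sketch of what I mean by Laplace approximation (my own toy example, made-up data): find the posterior mode, then use the inverse of the curvature (the observed information) at that mode as the approximate posterior covariance.

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: normal observations with unknown mean and scale.
rng = np.random.default_rng(1)
y = rng.normal(loc=1.5, scale=2.0, size=50)

def neg_log_post(theta):
    # Negative log-posterior with a flat prior on (mu, log_sigma).
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum((y - mu) ** 2) / sigma**2 + y.size * log_sigma

res = minimize(neg_log_post, x0=[0.0, 0.0])  # posterior mode
cov = res.hess_inv  # BFGS inverse-Hessian estimate, used as Laplace covariance
print(res.x, np.sqrt(np.diag(cov)))  # approximate posterior means and sds
```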
-
If you say “IPW”, I answer IT’S NOT EFFICIENT IN FINITE SAMPLES DAMN YOU WHY ARE YOU USING IT WHY DO THE SAME PEOPLE WHO CORRECTLY OBSERVE IT’S NOT EFFICIENT INSIST ON USING IT I DON’T KNOW
-
Ridge regression is more simply introduced and understood as a Normal prior.
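In formulas (the standard correspondence): the ridge estimator is the posterior mode for Gaussian noise with a Gaussian prior on the coefficients,

$$\hat{\beta}_{\text{ridge}} = \arg\min_\beta \left\{ \lVert y - X\beta \rVert^2 + \lambda \lVert \beta \rVert^2 \right\} = \arg\max_\beta \, p(y \mid \beta)\, p(\beta), \qquad \beta \sim N\!\left(0, \tfrac{\sigma^2}{\lambda} I\right),$$

where $\sigma^2$ is the noise variance, so the penalty strength is just the prior width.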
-
Regularization of histogram deconvolution in HEP is more simply understood as prior. (ref. Cowan)
-
Every regularization is more simply understood as prior I guess.
-
Simulated inference is more simply understood as Approximate Bayesian Computation, and it’s also easier. Guess which field died? (Ok, they are not really the same thing, but an SI person I met does think ABC is the Bayesian equivalent and that Bayesians stole their limelight.)
-
Random effects models and their accessories are more straightforward in every aspect in the Bayesian formulation. (Poor frequentist student: “Why do I have to use the Bayesian estimator in the frequentist model for such-and-such quantity? Why are there all these variants of the frequentist version, each failing badly in weird ways depending on what I’m doing? Why are there two types of uncertainty around? REML or not REML? (Depends: are you doing a test?) p-value or halved p-value? (Depends...) Do I have to worry about multiple comparisons?”)
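For concreteness, a minimal sketch of a random-intercept model in PyMC (my own illustration, simulated data): the group effects and their spread are just more parameters, and every uncertainty is read off the same posterior.

```python
import numpy as np
import pymc as pm

# Simulated grouped data: 8 groups, 20 observations each.
rng = np.random.default_rng(2)
n_groups, n_per = 8, 20
group = np.repeat(np.arange(n_groups), n_per)
y = 5.0 + rng.normal(0.0, 1.0, n_groups)[group] + rng.normal(0.0, 0.5, group.size)

with pm.Model():
    mu = pm.Normal("mu", 0, 10)                      # population mean
    tau = pm.HalfNormal("tau", 1)                    # between-group spread
    eff = pm.Normal("eff", 0, tau, shape=n_groups)   # group effects
    sigma = pm.HalfNormal("sigma", 1)                # within-group noise
    pm.Normal("obs", mu=mu + eff[group], sigma=sigma, observed=y)
    idata = pm.sample(progressbar=False)

# Group-level and population-level uncertainties come from the same posterior.
print(idata.posterior["eff"].std(dim=("chain", "draw")).values)
print(idata.posterior["tau"].mean().item())
```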
-
Did you know that state of the art in causal inference is Bayesian? (BART, see ACIC challenge)
-
Did you know that Bayesian tree methods blow the shit out of frequentist ones? (BART vs. random forest) And as usual it’s easier to compute the uncertainties of anything.
-
If you are frequentist, the multiple comparisons problem will haunt you, or more probably you’ll stop caring. Unless you have money, in which case you’ll hire a team of experts to deal with it. (Have you heard about the pinnacle of multiple-testing correction theory, graphical alpha-propagation? A very sophisticated method, indeed, worth a look.)
Maybe you can make a case that frequentism is still quicker because there are no real rules, so you can pull a formula out of your hat, say lo!, and compute it, but I think this is a stretch. For example, you can decide to take the arithmetic mean of your data just because, and be done. I’d say that’s not a fair comparison, because you have to be at a comparable performance level for the K/T prior to weigh in, and if you want to get the statistical properties of arbitrary estimators right, it starts getting more complicated.
-
Or what Yudkowsky calls “security mindset”; he insists a lot on this on Arbital, on deploying all the options.
I understood that the argument given for (2) (getting out of distribution) came from (1) (the $(1-\epsilon)^n$ thing), so could you restate the argument for (2) in a self-contained way, without using (1)?
When you compute $H(A,B)$, you sum the terms $P(a)P(b)\log P(a,b)$. I think you should be summing $P(a,b)\log P(a,b)$ instead.
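A quick numerical check of the difference (my own made-up joint distribution; the two sums agree only when $A$ and $B$ are independent):

```python
import numpy as np

# Made-up joint distribution P(a, b) over two binary variables.
P = np.array([[0.4, 0.1],
              [0.2, 0.3]])
Pa = P.sum(axis=1)  # marginal P(a)
Pb = P.sum(axis=0)  # marginal P(b)

# Joint entropy weights each log term by P(a, b) ...
H_joint = -np.sum(P * np.log(P))
# ... whereas weighting by the product of marginals P(a)P(b) gives a
# different number unless A and B are independent.
H_other = -np.sum(np.outer(Pa, Pb) * np.log(P))

print(H_joint, H_other)  # ~1.28 vs ~1.46, not the same quantity
```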