Is this because they think it would hurt their reputation, or because they think Remmelt would make the program a bad experience for them?
This comment has two disagree votes, which I interpret as other people seeing the flowchart. I see it too. If it still doesn’t work for you for some reason, you can also see it here: AISC ToC Graph—Google Drawings
This makes a lot of sense to me. For some reason it reminds me of some stuart armstrong OOD-generalization work for alternative safeguarding strategies to imperfect value extrapolation? I can’t find a good link though.
I also thought it would be interesting to mention the link to the idea in linguistics that a word is specified by all the different contexts it is specified in and so a symbol is a probability distribution of contextual meaning. From the perspective of this post, wouldn’t natural language work a bit as a redundancy specifier in that case and so LLMs are more alignable than RL agents? (I don’t think I’m making a novel argument here, I just thought it would be interesting to point out.)
What is the threat model you are probing for here?
The prompting approach suffers from LLMs’ shallowness in strategic thinking; I would not be surprised if, prompting a language model several times to come up with an encoding scheme would result in a highly concentrated distribution.
> A saving grace may be that ‘detecting steg is happening at all’ seems easier than ’recovering the plaintext message’
While i agree in principle, as in all red/blue team interactions, you may not want your blue team to be trigger happy; you still want to be able to extract some usefulness out of your scheming models, thus making this hope somewhat less robust. This would be a cool research question nonetheless.
> Both of these [steganography inducing methods] involve finetuning the model
Note that in Hidden in Plain Text, we also work with in-context RL, which is not explicitly finetuning the model, but just giving rewards to rollouts in context.
Yes, my conclusion is “it’s hopeless”.
(NLP assumes that you could reverse-engineer someone’s thought processes by observing their eye movements. That looking in one direction means “the person is trying to remember something they saw”, looking in another direction means “the person is trying to listen to their inner voice”, etc., you get like five or six categories. And when you listen to people talking, by their choice of words you can find out whether they are “visual” or “auditive” or “kinesthetic” type. So if you put these two things together, you get a recipe like “first think about a memory that includes some bodily feelings, then turn on your auditive imagination, then briefly switch to visual imagination, then back to auditive, then write it down”. They believe that this is all you need. I believe that it misses… well, all the important details.)
I have heard from many people near AI Safety camp that they also have judged AI safety camp to have gotten worse as a result of this.
Hm. This does give me serious pause. I think I’m pretty close to the camps but I haven’t heard this. If you’d be willing to share some of what’s been relayed to you here or privately, that might change my decision. But what I’ve seen of the recent camps still just seemed very obviously good to me?
I don’t think Remmelt has gone more crank on the margin since I interacted with him in AISC6. I thought AISC6 was fantastic and everything I’ve heard about the camps since then still seemed pretty great.I am somewhat worried about how it’ll do without Linda. But I think there’s a good shot Robert can fill the gap. I know he has good technical knowledge, and from what I hear integrating him as an organiser seems to have worked well. My edition didn’t have Linda as organiser either.
I think I’d rather support this again than hope something even better will come along to replace it when it dies. Value is fragile.
Eternity can seem kinda terrifying.
A lifeist doesn’t say “You must decide now to live literally forever no matter what happens.”!
OK, I guess I got some assumption wrong, but please explain to me which one.
people use dating apps such as OkCupid with the intention of finding a potential sexual partner (as opposed to e.g. trying to find a platonic friend)
if someone is looking for a potential sexual partner, and finds someone such that the idea of having sex with him feels disgusting, she swipes left or whatever is the UI action for “go away” (as opposed to keeping the contact just in case the feeling might change in future)
I think that section “You are simpler than Microsoft Word” is just plain wrong, because it assumes one UTM. But Kolmogorov complexity is defined only up to the choice of UTM.
Genome is only as simple as it is allowed by the rest of cell mechanism, like ribosomal decoding mechanism and protein folding. Humans are simple only relative to space of all possible organisms that can be built on Earth biochemistry. Conversively, Word is complex only relatively to all sets of x86 processor instructions or all sets of C programs, or whatever you used for definition of Word size. To properly compare complexity of both things, you need to translate from one language to another. How large should be genome of organism capable to run Word? It seems reasonable that simulation of human organism up to nucleotides will be very large if we write it in C, and I think genome of organism capable to run Word just as good as modern PC will be much larger than human genome.
Can frontier language models engage in collusion for steganography? Here is a write-up of a preliminary result along these lines, showing that Deepseek-v3 may be able to collude with other instances of itself to do steganography. And also that this steganography might be more subtle than we think.
Epistemic status; highly uncertain (and I’m sorry if this ends up being overclaiming, but I’m very excited at the moment).
Slide deck: https://docs.google.com/presentation/d/1JASDLDlGZcoHwHQRjOJOcrKRp7G0_eeyfzkCLsR7m7E/edit?usp=sharing
Other (more compelling to me) reasons for being a “deathist”:
Eternity can seem kinda terrifying.
In particular, death is insurance against the worst outcomes lasting forever. Things will always return to neutral eventually and stay there.
In what sense were you lifeist and now deathist? Why the change?
Yeah, you will get much better fills if you walk your options limit orders (manually or automatically, see here for an example of an automatic implementation: https://www.schwab.com/content/how-to-place-walk-limit-order). Market makers will often fill your nearly-mid-market limit orders within seconds.
The systematic reviews of randomized controlled trials also say that vegetables and fruits (and, generally, minimally processed whole plant foods) are good for reducing all-cause mortality and heart disease. It’s not just the cohort studies. (1, 2)
The large cohort studies are great because they have large sample sizes, look at people living in normal conditions, and look at the most important outcomes (all-cause mortality, heart disease) over long periods of time.
The randomized controlled trials are great because they address confounding, directly. They’re smaller, the controlled feeding studies usually look at people in unusual situations, like living in wards, and they look at shorter periods of time. But they address confounding, which is a huge benefit. Causal inference methods can in principle address confounding too, but that is more difficult and we are not at that level.
In practice, systematic reviews of randomized controlled trials and systematic reviews of large cohort studies reach mostly the same conclusions. I would agree with not believing claims solely based on correlational studies, but that is not necessary since the randomized trials exist.
I do buy that people have various specific nutritional requirements, and that not eating vegetables and fruits means you risk having deficits in various places. The same is true of basically any exclusionary diet chosen for whatever reason, and especially true for e.g. vegans.
In practice, the only thing that seems to be an actual issue is fiber.
Saturated fat is a big issue! It increases your risk of heart disease, and that’s one of the most important causes of death. Nutrition researchers and physicians and cardiologists have been agreeing on this for 20–25 years and there are systematic reviews of controlled trials. Plant foods other than coconut just have a lot less saturated fat than non-plant foods.
Another big issue is the nutrient density (density relative to energy), particularly in underconsumed nutrients. The minimally processed whole plant foods are more dense overall, relative to energy, in the essential vitamins, minerals, omega−3 and omega−6, dietary fiber, relative to the dietary reference intakes. Food composition databases don’t account for bioavailability, but nevertheless. This is especially true for vegetables rather than nuts, fruits, and grain. If you think eggs have high nutrient density, try beating mushrooms, spinach, broccoli—the most nutrient-dense foods are plant foods. This means plant foods allow you to construct a better diet, even if it doesn’t happen automatically.
The NASEM’s adequate intake for fiber is higher than the recommendations of countries outside North America, yet the dose-response curves just show the health benefits for dietary fiber continuing to get higher until the values are high enough that there is no data to know whether they continue going up. Fiber consumption is low even relative to the lower recommendations.
Protein quality doesn’t seem to be an important concern. All plant foods contain all the essential amino acids. Yes, protein quality is lower overall than in typical animal foods, but that matters little because it’s trivial to meet the recommended dietary allowances for all the essential amino acids by eating beans or tofu, and the systematic reviews support plant protein having better health outcomes than animal protein (the systematic reviews tell you what actually matters, and apparently what actually matters is reducing TMAO levels and saturated fat).
The main actual issues with minimally processed whole plant foods are vitamins D and B12. D can technically be managed with UV-irradiated mushrooms, but they are expensive, so supplementing D is the more likely choice. B12 is not solvable. It is present (and bioavailable) in mushrooms but the quantities are small, so either supplementing B12 or adding seafood (likely the least bad animal food) is necessary.
These lines of argument do have the problem that our health decisions should be based primarily on systematic reviews of dietary patterns, rather than individual foods or individual nutrients, and they should be based on health outcomes and especially the most important ones (all-cause mortality, heart disease, cancer).
Government Food Labels Are Often Obvious Nonsense
Yes, you can get more detailed and accurate data using Cronometer, or foodb.ca for the missing nutrients (and way too much information). Thankfully, most minimally processed foods are covered, so the inaccurate FDA nutrition facts can be ignored. Classification systems like Nova and Nutri-Score also don’t seem worthwhile to me due to inaccuracy and oversimplification.
This may be not factually true, btw, - current LLMs can create good models of past people without running past simulation of their previous life explicitly.
Yup, I agree.
It is a variant of Doomsday argument. This idea is even more controversial than simulation argument. There is no future with many people in it.
This makes my counterargument even stronger! The counterargument says that if a Friendly AI has no issues with simulating conscious beings in general, then it would (on expectations) simulate more observers in blissful worlds than in worlds like ours.
If the Doomsday Argument tells us that Friendly AI didn’t simulate more observers in blissful worlds than in worlds like ours, then that gives us even more reasons to think that we are not being simulated by a Friendly AI in the way that you have described.
I really like the idea of finding steering vectors that maximize downstream differences, and I have a few follow-up questions.
Have you tried/considered modifying c_fc (the MLP encoder layer) bias instead of c_proj (the MLP decoder layer) bias? I don’t know about this context, but (i) c_fc makes more intuitive sense as a location to change for me, (ii) I have seen more success playing with it in the past than c_proj, and (iii) they are not-equivalent because of the non-linearity between them.
I like how you control for radius by projecting gradients onto the tangent space and projecting the steering vector of the sphere, but have you tried using cosine distance as the loss function so there is less incentive for R to naturally blow up? Let in .
When you do iterative search for next steering vectors, I do not expect that constraining the search to an orthogonal subspace to previously found steering vectors to be very helpful, since the orthogonal vectors might very well be mapped into the same downstream part of latent space. Since the memory demands are quite cheap for learning steering vectors, I would be interested in seeing an objective which learned a matrix of steering vectors simultaneously, maximizing the sum of pairwise distances. Suppose we are learning vectors simultaneously.
But this form of the objective makes it more transparent that a natural solution is to make each steering vector turn the output into gibberish (unless the LM latent space treats all gibberish alike, which I admit is possible). So maybe we would want a tunable term which encourages staying close to the unsteered activations, while staying far from the other steered activations.Lastly, I would be interested in seeing the final output probability distribution over tokens instead of using KL for the distance, since in that domain we can extract very fine grained information from the model’s activations. Let in
Not sure how representative your guess is of most dating app users. Certainly isn’t the case for me.
Yeah. I think the part of the DNA specifying the brain is comparable to something like the training algorithm + initial weights of an LLM. I don’t know how much space those would take if compressed, but probably very little, with the resulting model being much bigger than that. (And the model is in turn much smaller than the set of training data that went into creating it.)
Page 79-80 of the Whole Brain Emulation roadmap gave estimated storage requirements for uploading a human brain. The estimate depends on what we expect to be the scale on which the brain needs to be emulated. Workshop consensus at the time was that the most likely scale would be level 4-6 (see p. 13-14). This would put the storage requirements somewhere between 8000 and 1 million terabytes.
I vouch for Robert as a good replacement for me.
Hopefully there is enough funding to onboard a third person for next camp. Running AISC at the current scale is a three person job. But I need to take a break from organising.