Your point b) seems like it should also make you somewhat sceptical of any of this work accelerating AI capabilities, unless you believe that capabilities-focused actors would change their actions based on forecasts while safety-focused actors wouldn’t. Obviously, this is a matter of degree, and it could be the case that the same amount of action-changing by both actors still leads to worse outcomes.
I think that if OpenAI unveiled GPT-4 and it did not perform noticeably better than GPT-3 despite having a lot more parameters, that would be a somewhat important update. And it seems like a similar kind of update could be produced by well-conducted research on scaling laws for problem complexity.
Most recent large safety projects seem to be focused on language models. So in case the evidence pointed towards problem complexity not mattering that much, I would expect the shift in prioritization towards more RL safety research to outweigh the effect on capability improvements (especially for the small version of the project, about which larger actors might not care that much). I am also sceptical about whether the capabilities of the safety community are in fact increasing exponentially.
I am also confused about the resources/reputation framing. To me, this is a lot more about making better predictions about when we will get transformative AI and how this AI might work, such that we can use the available resources as efficiently as possible by prioritizing the right kind of work and hedging for different scenarios to an appropriate degree. This is particularly true for the scenario where complexity matters a lot (which I find overwhelmingly likely), in which too much focus on very short timelines might be somewhat costly (obviously, none of these experiments can remotely rule out short timelines, but I do expect that they could attenuate how much people update on the XLand results).
Still, I do agree that it might make sense to publish any results on this somewhat cautiously.
I agree that switching the simulator could be useful where feasible (you’d need another simulator with compatible state and action spaces and somewhat similar dynamics).
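As a minimal sketch of the compatibility check I have in mind (assuming Gym-style interfaces; `spaces_compatible` is just a made-up helper):

```python
import gym  # assuming Gym-style environment interfaces


def spaces_compatible(env_a: gym.Env, env_b: gym.Env) -> bool:
    """A policy trained in env_a can at least be *run* in env_b if the
    observation and action spaces match; whether the dynamics are similar
    enough for the transfer to be informative is a separate, harder question."""
    return (env_a.observation_space == env_b.observation_space
            and env_a.action_space == env_b.action_space)
```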
It indeed seems pretty plausible that instructions will be given in natural language in the future. However, I am not sure that would affect scaling very much, so I’d focus scaling experiments on the simpler case without NLP, for which learning has already been shown to work.
IIRC, transformers can be quite difficult to get to work in an RL setting. Perhaps this is different for PIO, but I cannot find any statements about this in the paper you link.
I guess finetuning a model to produce truthful statements directly is nontrivial (especially without a discriminator model) because there are many possible truthful and many possible false responses to a question?
Oh, right; I seem to have confused Gibbard-Satterthwaite with Arrow.
Do you know whether there are other extensions of Arrow’s theorem to single-winner elections? Having a voting method return a full ranking of alternatives does not appear to be super important in practice...
Doesn’t Gibbard’s theorem retain most of Arrow’s bite?
Re neural networks: all one-billion-parameter networks should be computable in polynomial time, but there exist functions that are not expressible by a one-billion-parameter network (perhaps unless you allow for an arbitrary choice of nonlinearity).
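To spell out the second claim, a crude counting sketch (this assumes finite-precision weights and a fixed nonlinearity, which is exactly what the parenthetical would relax):

$$\#\{\text{networks with } 10^9 \text{ weights of } b \text{ bits each}\} \le 2^{10^9 b} < 2^{2^n} = \#\{f : \{0,1\}^n \to \{0,1\}\} \quad \text{whenever } 2^n > 10^9 b,$$

so already for Boolean functions on $n \ge 35$ inputs (with 32-bit weights), some function has no such network computing it.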
“If the prices do not converge, then they must oscillate infinitely around some point. A trader could exploit the logical inductor by buying the sentence at a high point on the oscillation and selling at a low one.”
I know that this is an informal summary, but I don’t find this point intuitively convincing. Wouldn’t the trader also need to be able to predict the oscillation?
If I understood correctly, the model was trained on Chinese text and was probably quite expensive to train.
Do you know whether these Chinese models usually get “translated” to English, or whether there is a “fair” way of comparing models that were (mainly) trained on different languages? (I’d imagine that even the tokenization might be quite different for Chinese.)
I don’t really know a lot about performance metrics for language models. Is there a good reason for believing that LAMBADA scores should be comparable for different languages?
“This desiderata is often difficult to reconcile with clear scoring, since complexity in forecasts generally requires complexity in scoring.”
Can you elaborate on this? In some sense, log scoring is simple and can be applied to very complex distributions; are you saying that this would still be “complex scoring” because the complex forecast needs to be evaluated, or is your point about something different?
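For concreteness, here is what I mean by log scoring staying simple even when the forecast is complex (a toy sketch; the mixture is an arbitrary stand-in for a “complex” forecast distribution):

```python
import numpy as np
from scipy import stats

# A deliberately complex forecast: a two-component Gaussian mixture density.
def forecast_density(x):
    return 0.3 * stats.norm(0, 1).pdf(x) + 0.7 * stats.norm(5, 2).pdf(x)

# The log scoring rule itself stays one line, however complex the forecast:
outcome = 4.2  # hypothetical realized value
log_score = np.log(forecast_density(outcome))
```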
Partial resolution could also help with getting some partial signal on long-term forecasts.
In particular, if we know that a forecasting target is growing monotonically over time (like “date at which X happens” or “cumulative number of X before a specified date”), we can split P(outcome = T) into P(outcome > lower bound) * P(outcome = T | outcome > lower bound). If we use log scoring, we then get log(P(outcome > lower bound)) as an upper bound on the score (toy sketch below).
If forecasts came in the form of more detailed models, it should be possible to use a similar approach to calculate bounds based on conditioning on more complicated events as well.
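A minimal sketch of that decomposition, with made-up numbers (the `pmf` stands in for a forecast over “year in which X first happens”):

```python
import numpy as np

# Hypothetical forecast over "year in which X first happens".
years = np.arange(2025, 2061)
pmf = np.exp(-0.1 * (years - 2025))
pmf /= pmf.sum()  # normalize to a proper distribution

# Suppose that by 2030 the event has not happened, so outcome > 2030 is known.
lower_bound = 2030
p_above = pmf[years > lower_bound].sum()

# For the eventual outcome T > 2030, the log score decomposes as
#   log P(T) = log P(outcome > 2030) + log P(T | outcome > 2030),
# and since the conditional term is <= 0, the first term upper-bounds the
# final score and could be paid out now as a partial resolution.
score_upper_bound = np.log(p_above)
print(f"Upper bound on eventual log score: {score_upper_bound:.3f}")
```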
I don’t know what performance measure is used to select superforecasters, but updating frequently seems to usually improve your accuracy score on GJ Open as well (see “Activity Loading” in this thread on the EA forum).
“Beginners in college-level math would learn about functions, the basics of linear systems, and the difference between quantitative and qualitative data, all at the same time.”
This seems to be the standard approach for undergraduate-level mathematics at university, at least in Europe.
Makes sense; I was thinking about rewards as a function of the next state rather than the current one.
I can still imagine that things would work if we replaced the difference in Q-values with the difference in the values of the autoencoded next state. If that were true, it would a) affect my interpretation of the results and b) potentially make it easier to answer your open questions by providing a simplified version of the problem (rough sketch below).
Edit: I guess the “chaos unfolds over time” property of the SafeLife environment makes it unlikely that this would work?
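For concreteness, the variant I have in mind (all names and the one-step model `step` are hypothetical; the first function is just the usual AUP-style penalty, shown for comparison):

```python
import numpy as np

def aup_penalty(q_aux, state, action, noop_action):
    # Standard AUP-style penalty: change in an auxiliary Q-value
    # relative to doing nothing.
    return abs(q_aux(state, action) - q_aux(state, noop_action))

def autoencoded_state_penalty(encode, step, state, action, noop_action):
    # Proposed variant: compare the autoencoded *next states* directly,
    # penalizing actions whose immediate effect (in the compressed state
    # representation) differs from the no-op's effect. In spirit, this is
    # like setting the discount factor of the AUP Q-functions to 0.
    z_action = encode(step(state, action))
    z_noop = encode(step(state, noop_action))
    return np.linalg.norm(z_action - z_noop, ord=1)
```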
I’m curious whether AUP or the autoencoder/random projection does more of the work here. Did you test how well AUP and AUP_proj with a discount factor of 0 for the AUP Q-functions do?
“So if you wouldn’t sacrifice >0.01 AUC for the sake of what a human thinks is the “reasonable” explanation to a problem, in the above thought experiment, then why sacrifice unknown amounts of lost accuracy for the sake of explainability?”
You could think of explainability as some form of regularization to reduce overfitting (to the test set).
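As a toy illustration of that framing (my own construction; the penalty weight `lam` and using tree depth as an explainability proxy are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lam = 0.005  # how much AUC we are willing to trade per unit of tree depth
candidates = []
for depth in range(1, 15):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    # Penalizing complexity acts like regularization: it discourages models
    # that buy tiny AUC gains at the price of overfitting the test set.
    candidates.append((auc - lam * depth, depth, auc))

best_penalized_score, best_depth, best_auc = max(candidates)
```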
“Overall, access to the AI strongly improved the subjects’ accuracy from below 50% to around 70%, which was further boosted to a value slightly below the AI’s accuracy of 75% when users also saw explanations.”
But this seems to be a function of the AI system’s actual performance, the human’s expectations of said performance, as well as the human’s baseline performance. So I’d expect it to vary a lot between tasks and with different systems.
“My own guess is that humans are capable of surviving far more severe climate shifts than those projected in nuclear winter scenarios. Humans are more robust than most any other mammal to drastic changes in temperature, as evidenced by our global range, even in pre-historic times”
I think it is worth noting that the speed of climate shifts might play an important role, as a lot of human adaptability seems to rely on gradual cultural evolution. While modern information technology has greatly sped up the potential for cultural evolution, I am unsure if these speedups are robust to a full-scale nuclear war.