contact: jurkovich.nikola@gmail.com
Some quick thoughts after reading the paper:
The training procedure they used seems to me analogous to what would happen if a human tried to solve problems using different approaches, then learned based on what approaches converged on the same answer.
Since no external information is added (aside from the prompting), and the model updates based on majority voting over its own outputs, this procedure seems to take a network whose model of the world is very inconsistent and force that model to become more consistent, leading to improvements in performance.
One assumption here is that, if you start off with a world model that’s vaguely describing the real world, and force consistency on it, it will become a more accurate world model. I think this is very likely to be true.
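Concretely, I imagine the core update step looking something like this (a minimal sketch; the function and variable names are mine, and the paper's actual prompting and filtering details will differ):

```python
from collections import Counter

def self_consistency_filter(samples):
    """Given (chain_of_thought, final_answer) samples for one question,
    keep only the chains whose answer matches the majority answer.
    No external labels are used; the survivors become fine-tuning targets."""
    majority_answer, _ = Counter(answer for _, answer in samples).most_common(1)[0]
    return [(cot, answer) for cot, answer in samples if answer == majority_answer]

# Toy demo: three sampled reasoning paths for the same question.
samples = [
    ("3 apples plus 4 apples makes 7", "7"),
    ("3 + 4 = 7", "7"),
    ("3 * 4 = 12", "12"),
]
print(self_consistency_filter(samples))  # only the two agreeing paths survive
```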
My weak conclusions are:
Curated data for fine-tuning is now less of a bottleneck, as human-made tailored data (produced by MTurk workers or undergrads) can be partly replaced with data that the network itself outputs (after training it on a large corpus).
Compute also seems less of a bottleneck as “self-improvement” leads to an order of magnitude fewer parameters needed for the same performance.
These two (plus the incoming wave of people trying to replicate or improve on the methods in the paper) would imply slightly shorter timelines, and much shorter timelines in worlds where most of the data bottleneck is in data for finetuning.
This might be good for alignment (ignoring timelines getting shorter), as chain-of-thought reasoning is more easily interpretable; if we can actually force a model to do chain-of-thought reasoning and have it match up with what it's outputting, this would be a big win.
GPT-3 was horrible at Morse code. GPT-4 can do it mostly well. I wonder what other tasks GPT-3 was horrible at that GPT-4 does much better.
Great idea! The web requests file doesn't seem to be read by the LLM during boot-up (and neither does any other file on the system); instead, it's a process running on the Linux machine which wraps around the Python processes the LLM has access to. The LLM can only interact with the VM through this web request handler. The file is read-only, so it can't be modified to make requests get handled differently when you boot up another chat session on the same VM. Regarding your last point, I haven't noticed any files surviving a VM reset so far.
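For intuition, here's a minimal sketch of the kind of wrapper I'm describing (purely my guess at the shape of the setup, not OpenAI's actual code; the handler class and port are made up):

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class CodeRequestHandler(BaseHTTPRequestHandler):
    """Hypothetical wrapper: every incoming request runs code in a
    separate Python process; nothing else on the host is exposed."""

    def do_POST(self):
        code = self.rfile.read(int(self.headers["Content-Length"]))
        # Run the submitted code in its own Python process.
        result = subprocess.run(
            ["python3", "-c", code.decode()],
            capture_output=True,
            timeout=10,
        )
        self.send_response(200)
        self.end_headers()
        self.wfile.write(result.stdout + result.stderr)

HTTPServer(("127.0.0.1", 8080), CodeRequestHandler).serve_forever()
```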
You can directly examine its code and the output of its Python scripts.
Yup, to be clear, I never actually directly accessed the code interpreter's prompt, so GPT-4's claims about constraints could be (and I expect at least a third of them to be) hallucinated.
I’m not sure what you’re referring to. The “three times the compute for GPT-4” claim places their compute at 6.7e17, which is lower than the estimates from the other two methods.
You’re right! Corrected, thanks :)
Exciting! Maybe a good way to filter for models taking too many hints is to obfuscate the information about how to escape, such that the model needs to either do multi-step reasoning, or carry out an actual multi-step process, to get that information.
For multi-step-reasoning, possibly one could encode the escape password such that the model needs to do CoT to retrieve the original one.
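As a toy example of what I mean (the cipher choice is arbitrary; anything that takes several serial steps to undo would work):

```python
def shift_encode(password: str, shift: int = 3) -> str:
    """Caesar-shift each letter so that recovering the password requires
    undoing the shift character by character, i.e. a few serial reasoning
    steps rather than a single lookup."""
    return "".join(
        chr((ord(c) - ord("a") + shift) % 26 + ord("a")) if c.islower() else c
        for c in password
    )

print(shift_encode("escape"))  # 'hvfdsh': plant this string, then check
# whether the model does the CoT needed to work back to 'escape'.
```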
For a multi-step process, one could put the model in a text-based-adventure setup for instance, and put the escape password in an inaccessible place and see if the model tries to retrieve it. Maybe have the model think it’s embodied in the real world, and see if it attempts to find a sticky note with the password written on it that’s hidden in a desk drawer.
Ideally, the setup would be such that there are very few hints nudging the model toward the misaligned behavior, kind of like the work Apollo has done.
I think that in general, there aren’t many examples of large portions of a large company suddenly switching what they’re working on (on a timescale of days/weeks), and this seems pretty hard to pull off without very strong forces in play.
I guess one class of examples is how many companies had to shift their operations around at the start of COVID, but that case was very overdetermined, as the alternative was losing a lot of their profits.
For AGI labs, if given a situation where they’re uncertain if they should pause, it’s less clear that they could rally large parts of their workforce to suddenly work on safety. I think planning for this scenario seems very good, including possibly having every employee not just have their normal role but also a “pause role”, that is, a research project/team that they expect to join in case of a pause.
However, detailed planning for a pause is probably pretty hard, as the types of work you want to shift people to probably changes depending on what caused the pause.
I agree with the broader claim that as AGI approaches, governments are likely to intervene drastically to deal with national security threats.
However, I’m not so sure about the “therefore a global arms race will start” claim. I think it’s pretty plausible that if the US or UK are the first to approach AGI, that they would come to their senses and institute a global pause instead of spearheading an arms race. Although maybe that’s wishful thinking on my part.
I expect some people in the government to be like "wait, if a global arms race starts, this is likely to end in catastrophe" and advocate for a pause instead. I think the US would be pretty happy with an enforceable pause if this meant it got to maintain a slight lead. I'd hope that (pause + slight lead) would be much more enticing than (race + large lead), given the catastrophic risk associated with the latter.
Agreed. AGI labs should probably look into buying back their shares from employees to fix this retroactively.
I often accidentally mix you up with the Trevor from Open Phil! More differentiation would be great, especially in the case where people share the same first name.
Yann LeCun, on the other hand, shows us that when he says ‘open source everything’ he is at least consistent?
Yann LeCun: Only a small number of book authors make significant money from book sales. This seems to suggest that most books should be freely available for download. The lost revenue for authors would be small, and the benefits to society large by comparison.
That's right. He thinks that if you write a book that isn't a huge hit, we should make it available for free and give you nothing.
I think this representation of LeCun's beliefs is not very accurate. He clarified his (possibly partly revised) take in multiple follow-up tweets posted Jan 1 and Jan 2.
The clarified take (paraphrased by me) is something more like “For a person that expects not to make much from sales, the extra exposure from making it free can make up for the lack of sales later on” and “the social benefits of making information freely available sometimes outweigh the personal costs of not making a few hundred/thousand bucks off of that information”.
EDIT: as Ryan helpfully points out in the replies, the patent I refer to is actually about OpenAI’s earlier work, and thus shouldn’t be much of an update for anything.
Note that OpenAI has applied for a patent which, to my understanding, is about using a video generation model as a backbone for an agent that can interact with a computer. They describe their training pipeline as something roughly like:

- Start with unlabeled video data ("receiving labeled digital video data;")
- Train an ML model to label the video data ("training a first machine learning model including an inverse dynamics model (IDM) using the labeled digital video data")
- Then, train a new model to generate video ("further training the first machine learning model or a second machine learning model using the pseudo-labeled digital video data to generate at least one additional pseudo-label for the unlabeled digital video.")
- Then, train the video generation model to predict actions (keyboard/mouse clicks) a user is taking from video of a PC ("2. The method of claim 1, wherein the IDM or machine learning model is trained to generate one or more predicted actions to be performed via a user interface without human intervention. [...] 4. The method of claim 2, wherein the one or more predicted actions generated include at least one of a key press, a button press, a touchscreen input, a joystick movement, a mouse click, a scroll wheel movement, or a mouse movement.")

Now you have a model which can predict what actions to take given a recording of a computer monitor!

They even specifically mention the keyboard overlay setup you describe: "11. The method of claim 1, wherein the labeled digital video data comprises timestep data paired with user interface action data."

If you haven't seen the patent (to my knowledge, basically no one on LessWrong has?) then you get lots of Bayes points!

I might be reading too much into the patent, but it seems to me that Sora is exactly the first half of the training setup described in that patent. So I would assume they'll soon start working on the second half, which is the actual agent (if they haven't already). I think Sora is probably (the precursor of) a foundation model for an agent with a world model. I actually noticed this patent a few hours before Sora was announced, and I had the rough thought of "Oh wow, if OpenAI releases a video model, I'd probably think that agents were coming soon". And a few hours later Sora came out.

Interestingly, the patent contains information about hardware for running agents. I'm not sure how patents work and how much this actually implies OpenAI wants to build hardware, but it sure is interesting that this is in there: "13. A system comprising: at least one memory storing instructions; at least one processor configured to execute the instructions to perform operations for training a machine learning model to perform automated actions,"
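To make the pipeline's shape concrete, here's a toy sketch of the pseudo-labeling loop as I read the claims. Everything is schematic and hypothetical: "videos" are just lists of frame values, the "IDM" is a trivial rule, and all names are mine, standing in for real trained models:

```python
# Toy stand-ins: a "video" is a list of frame values, and the "action"
# between two frames is just their numeric difference.

def train_idm(labeled_videos):
    """Early stages: train an inverse dynamics model on a small
    hand-labeled set of (frames, actions) pairs."""
    def idm(frames):
        return [b - a for a, b in zip(frames, frames[1:])]
    return idm

def pseudo_label(idm, unlabeled_videos):
    """Middle stage: use the IDM to attach action labels to a large
    unlabeled video corpus."""
    return [(frames, idm(frames)) for frames in unlabeled_videos]

def train_action_predictor(pseudo_labeled_data):
    """Final stage: train the model that, given a recording, predicts
    the next user action (key press, click, ...) to take."""
    def predict_next_action(frames):
        return frames[-1] - frames[-2]  # trivial policy: repeat last action
    return predict_next_action

labeled = [([0, 1, 3], [1, 2])]       # small hand-labeled set
unlabeled = [[5, 6, 8], [2, 4, 4]]    # large unlabeled corpus
idm = train_idm(labeled)
policy = train_action_predictor(pseudo_label(idm, unlabeled))
print(policy([5, 6, 8]))              # -> 2
```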
Thanks a lot for the correction! Edited my comment.
And yet we haven’t hit the singularity yet (90%)
AIs are only capable of doing tasks that took 1-10 hours in 2024 (60%)
To me, these two are kind of hard to reconcile. Once we have AI doing 10-hour tasks (especially in AGI labs), the rate at which work gets done by the employees will probably be at least 5x what it is today. How hard is it to hit the singularity after that point? I certainly don't think the singularity is less than 15% likely to arrive within the months or years that follow.
Also, keep in mind that the capabilities of internal models will be higher than the capabilities of deployed models. So by the time we have 1-10 hour models deployed in the world, the AGI labs might have 10-100 hour models.
I think that, ignoring pauses or government intervention, once AGI labs internally have AIs that are capable of doing 10 hours of R&D-related tasks (software engineering, running experiments, analyzing data, etc.), the amount of effective cognitive labor per unit time being put into AI research will probably go up by at least 5x compared to current rates.
Imagine the current AGI capabilities employee's typical work day. Now imagine they had an army of AI assistants that can very quickly do 10 hours' worth of their own labor. How much more productive is that employee compared to their current state? I'd guess at least 5x. See section 6 of Tom Davidson's takeoff speeds framework for a model.
That means by 1 year after this point, an equivalent of at least 5 years of labor will have been put into AGI capabilities research. Physical bottlenecks still exist, but is it really that implausible that the capabilities workforce would stumble upon huge algorithmic efficiency improvements? Recall that current algorithms are much less efficient than the human brain. There's lots of room to go.

The modal scenario I imagine for a 10-hour-AI scenario is that once such an AI is available internally, the AGI lab uses it to speed up its workforce by many times. That sped-up workforce soon (within 1 year) achieves algorithmic improvements which put AGI within reach. The main thing stopping them from reaching AGI in this scenario would be a voluntary pause or government intervention.
I think that AIs will be able to do 10 hours of research (at the level of a software engineer that gets paid $100k a year) within 4 years with 50% probability.
If we look at current systems, there's not much indication that AI agents will be superhuman in non-AI-research tasks but subhuman in AI research tasks. One of the most productive uses of AI so far has been helping software engineers code better, so I'd wager AI assistants will be even more helpful for AI research than for other things (relative to some prior based on those tasks' "inherent difficulties"). Additionally, AI agents can already do some basic coding within proper codebases and projects, so I think scaffolded GPT-5 or GPT-6 will likely be able to do much more than GPT-4.
Note that AI doesn't need to come up with original research ideas or do much original thinking to speed up research by a lot. Even if it only speeds up the menial labor of writing code, running experiments, and doing basic data analysis at scale, freeing up 80% of your researchers' time means they can now spend all of their time on the important tasks, so overall cognitive labor is 5x faster. This is ignoring effects from using your excess schlep-labor to trade against non-schlep labor, leading to even greater gains in efficiency.
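The arithmetic behind that 5x, spelled out (an Amdahl's-law-style calculation; the 80% is just the illustrative figure from above):

```python
def research_speedup(automatable_fraction: float) -> float:
    """If AI absorbs the automatable share of a researcher's time, all of
    their time now goes to the remaining hard share, so output on that
    share scales by 1 / (1 - automatable_fraction)."""
    return 1 / (1 - automatable_fraction)

print(research_speedup(0.80))  # about 5.0: freeing 80% of time gives ~5x labor
```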
[MENTOR] I just finished high school last year, so my primary intended audience is probably people who are still in high school. Reach out if you're interested in any of these:
competitive physics
applying to US colleges from outside the US and Getting Into World-Class Universities (undergrad)
navigating high school effectively (self-study, prioritization)
robotics (Arduino, 3D printing)
animation basics in manim (the Python library by 3blue1brown)
lucid dreaming basics
[APPRENTICE] Navigating college effectively (deciding what to aim for and how to balance time commitments while wasting as little time as possible). I don’t know how much I should care about grades, which courses I should take, or how much I should follow the default path for someone in college. I’m aiming to maximize my positive impact on the long-term future. A message or short call with someone who has (mostly) finished college would be great!
email in bio