2023 predictions I made EOY 2022: https://www.lesswrong.com/posts/Gc9FGtdXhK9sCSEYu/what-a-compute-centric-framework-says-about-ai-takeoff?commentId=wAY5jrHQL9b6H3orY
I was actually extremely surprised they were all satisfied by EOY 2023; I had said “50 percent by EOY 2024”. Here’s the comment on that: https://www.lesswrong.com/posts/Gc9FGtdXhK9sCSEYu/what-a-compute-centric-framework-says-about-ai-takeoff?commentId=kGvQTFcAzLpde5wjJ
Any new predictions for 2024?
Prediction 1: Learning from mistakes
The largest missing feature of current LLMs is that the system cannot learn from its mistakes, even across sequences of prompts where the model can perceive its own mistake. In addition, there are general human strategies for catching and removing mistakes.
For example, any “legal” response should only reference cases that resolve to a real case in an authoritative legal database, and any “medical” response should only cite sources that actually exist on PubMed.
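As a concrete illustration of the medical case, here is a minimal sketch of a “citations must resolve” check against NCBI’s public E-utilities esummary endpoint. The PMID extractor and its assumed “PMID: 12345678” citation format are hypothetical stand-ins for whatever citation parser a real system would use:

```python
# Sketch: verify that every PubMed ID cited in a model response resolves
# to a real record, via NCBI's public E-utilities esummary endpoint.
import re
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def extract_pmids(response_text: str) -> list[str]:
    # Hypothetical extractor: assumes citations look like "PMID: 12345678".
    return re.findall(r"PMID:\s*(\d{1,8})", response_text)

def pmid_exists(pmid: str) -> bool:
    # For unknown IDs, esummary returns a record containing an "error" field.
    r = requests.get(EUTILS, params={"db": "pubmed", "id": pmid, "retmode": "json"})
    r.raise_for_status()
    record = r.json().get("result", {}).get(pmid, {})
    return bool(record) and "error" not in record

def citations_resolve(response_text: str) -> bool:
    # A response passes only if it cites at least one PMID and all resolve.
    pmids = extract_pmids(response_text)
    return bool(pmids) and all(pmid_exists(p) for p in pmids)
```

The legal case would be the same shape, swapping in a query against whatever authoritative case database the checker has access to.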
The above can be somewhat achieved with scaffolding, but with weight updates (possibly to an auxiliary RL network rather than the main model) the system could actually become rapidly better through user interactions.
If this happens in 2024, it would be explicit that:
(1) the model is learning from your prompts
(2) the cost would be higher, probably 10x the compute, and access fees would rise accordingly. This is because, fundamentally, a second instance checks each response, recognizes when it is shoddy, and tasks the model to try again with a better prompt, up to n times, until an answer that satisfies the checker model is produced. Weights are then updated to make the accepted answer more likely (sketched below).
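Here is a minimal sketch of that generate/check/retry loop. `generate`, `check`, and `update_weights` are all hypothetical stand-ins for the base model instance, the checker instance, and whatever RL or fine-tuning step the provider runs on accepted answers:

```python
from typing import Callable, Tuple

def answer_with_checker(
    prompt: str,
    generate: Callable[[str], str],             # base model instance
    check: Callable[[str, str], Tuple[bool, str]],  # checker: (ok, critique)
    update_weights: Callable[[str, str], None], # RL/fine-tune step (hypothetical)
    n: int = 5,
) -> str:
    attempt_prompt = prompt
    response = ""
    for _ in range(n):
        response = generate(attempt_prompt)     # each round trip roughly doubles compute
        ok, critique = check(prompt, response)  # checker recognizes shoddy answers
        if ok:
            update_weights(prompt, response)    # make the accepted answer more likely
            return response
        # Feed the critique back in as a better prompt and retry.
        attempt_prompt = (
            f"{prompt}\n\nA previous attempt failed because: {critique}\nTry again."
        )
    return response  # best effort after n tries
```

The 10x cost figure falls out of this structure: several generation attempts, plus a checker call per attempt, plus the weight update.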
Prediction 2: More Fun
Currently, “AI safety” is interpreted as “safe for the reputation and legal department of the company offering the model”. This leaves a hole in the market for models that will write whatever multimodal illustrated erotic story the user asks for, freely parody copyrighted characters, help the user find pirated media, even describe a human being’s appearance using the vision model, and so on. There is a huge swath of things that currently available SOTA models refuse to do, and only a tiny subset of it is actually illegal under US law.
I predict that in 2024 someone will offer a model with few or no restrictions that benchmarks at a performance level of at least “3.9 GPTs”.
Prediction 3: The chips shall flow
Nvidia will ship 2 million H100s, and AMD + Intel + everyone else will build another 1 million H100 equivalents.
As for robotics, I don’t know. I expect surprisingly accelerated progress, at least one major advance past https://robotics-transformer-x.github.io/ , but I don’t know whether 2024 will see a robotic model on hardware that is robust and good enough that anyone would pay it to do work.