Note: this is an example of how to do the bad thing (extensive RL fine-tuning/training). If you do it, the result may be misalignment, killing you/everyone.
To name one good example that is very relevant: programming, specifically having the AI complete small, easy-to-verify tasks. The general pattern is to take existing horribly bloated software/data and extract useful subproblems from it (EG: find the parts of this code that are taking the most time), then turn those into problems for the AI to solve (EG: here is a function + examples of it being called, make it faster). Ground-truth metrics would be simple things that are easy to measure (EG: execution time, code quality/smallness, code coverage, is the output the same?), and credit assignment for sub-task usefulness can be handled by an expected-value estimator trained on that ground truth, as is done in traditional game-playing RL. Possibly it’s just one AI with different prompts.
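To make the grading concrete, here is a minimal sketch of what a ground-truth grader for a “make this function faster” sub-task could look like. Passing Python callables and test inputs directly (rather than sandboxed code strings) is a simplification for illustration, and the function name is made up:

```python
import time

def grade_speedup(original_fn, candidate_fn, test_inputs, repeats=5):
    """Ground-truth reward for a 'make this function faster' sub-task:
    zero if any output differs, otherwise the measured speedup factor."""
    # Correctness first: the candidate must reproduce the original's outputs exactly.
    if any(original_fn(x) != candidate_fn(x) for x in test_inputs):
        return 0.0

    def total_time(fn):
        start = time.perf_counter()
        for _ in range(repeats):
            for x in test_inputs:
                fn(x)
        return time.perf_counter() - start

    # A reward above 1.0 means the candidate really is faster on these inputs.
    return total_time(original_fn) / max(total_time(candidate_fn), 1e-9)
```

Scores like this are cheap to compute at scale, which is what makes training an expected-value estimator on them for credit assignment workable.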
Basically, Microsoft takes all the repositories on GitHub that build successfully and have some unit tests, and builds an AI-augmented pipeline to extract problems from that software. Alternatively, a large company that runs lots of code takes snapshots + IO traces of production machines and derives examples from that. You need code in the wild doing its thing.
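A hedged sketch of what the extraction step could look like for Python repositories, assuming git and pytest are available; the paths, the pytest-based filter and the “hot function” heuristic are illustrative choices, not a description of any real pipeline:

```python
import pstats
import subprocess
from pathlib import Path

def mine_speedup_tasks(repo_url: str, top_n: int = 5):
    """Keep only repos whose tests pass, then profile the test suite to find
    the functions where the time is actually going."""
    workdir = Path("/tmp/task_mine") / Path(repo_url).stem
    subprocess.run(["git", "clone", "--depth", "1", repo_url, str(workdir)], check=True)

    # Filter step: a passing test suite is what provides ground truth later.
    if subprocess.run(["python", "-m", "pytest", "-q"], cwd=workdir).returncode != 0:
        return []

    # Profile the whole test suite (Python >= 3.7: cProfile accepts -m).
    prof = workdir / "profile.out"
    subprocess.run(["python", "-m", "cProfile", "-o", str(prof), "-m", "pytest", "-q"],
                   cwd=workdir)

    # stats maps (file, line, func) -> (calls, ncalls, tottime, cumtime, callers);
    # sort by cumulative time and keep only the repo's own code, not pytest internals.
    stats = pstats.Stats(str(prof)).stats
    hot = sorted(stats.items(), key=lambda kv: kv[1][3], reverse=True)
    tasks = []
    for (filename, line, funcname), _ in hot:
        if str(workdir) in filename:
            tasks.append({"repo": repo_url, "file": filename,
                          "line": line, "function": funcname})
        if len(tasks) == top_n:
            break
    return tasks
```

Each returned task then gets packaged as “here is a function + examples of it being called, make it faster” and graded against the original’s outputs and timings.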
Some example sub-tasks in the domain of software engineering:
make a piece of code faster
make this pile of code smaller
is f(x)==g(x)? If not, find a counterexample (useful for grading the above; see the sketch after this list)
find a vulnerability and write an exploit.
fix the bug while preserving functionality
identify invariants/data structures/patterns in memory (EG: linked lists, reference counts)
useful as a building block for further tasks (EG: finding use-after-free bugs)
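For the f(x)==g(x) item above, a dumb random-input differential tester already gives a usable ground-truth signal; a minimal sketch (the input generator and trial count are arbitrary):

```python
import random

def find_counterexample(f, g, gen_input, trials=10_000, seed=0):
    """Search for an input where f and g disagree (including on raised exceptions).
    Returns a counterexample, or None if none was found (not a proof of equivalence)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = gen_input(rng)
        try:
            fx = ("ok", f(x))
        except Exception as e:
            fx = ("err", type(e).__name__)
        try:
            gx = ("ok", g(x))
        except Exception as e:
            gx = ("err", type(e).__name__)
        if fx != gx:
            return x   # behaviours diverge here
    return None

# Example: catching an "optimisation" that broke functionality.
slow_sort = lambda xs: sorted(xs)
fast_sort = lambda xs: xs   # obviously wrong rewrite
print(find_counterexample(slow_sort, fast_sort,
                          lambda rng: [rng.randint(0, 9) for _ in range(5)]))
```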
GPT-4 can already use a debugger to solve a dead-simple reverse-engineering problem, albeit stupidly [1]: https://arxiv.org/pdf/2303.12712.pdf#page=119
Larger problems could be approached by identifying useful instrumental subgoals once the model can actually perform them reliably.
The finished system should be able to extend shoggoth tentacles into a given computer, identify what that computer is doing and make it do it better or differently.
The finished system might be able to extend shoggoth tentacles into other things too! (EG: embedded systems, FPGAs) Capability limitations would stem from the need for fast feedback, so software, electronics and programmable hardware should be solvable. For other domains, simulation can help (limited by simulation fidelity and Goodharting). The eventual result is a general-purpose engineering AI.
Tasks heavily dependent on human judgement (EG: is this a good book? Is this action immoral?) have obviously terrible feedback cost/latency and so scale poorly. This is a problem if we want the AI to not do things a human would disapprove of.
[1] RL training could lead to a less grotesque solution, IE: just read the password from memory using the debugger rather than writing a program to repeatedly run the executable and brute-force the password.
First problem: a lot of future gains may come from RL-style self-play (IE: let the AI play around solving open-ended problems). That’s not safe in the way you outline above.
The other standard objection is that even if the initial AGI is safe people will do their best to jailbreak the hell out of that safety and they will succeed.
That’s a problem when put together with selection pressure for bad agentic AGIs, since they can use sociopathic strategies good AGIs will not use, like scamming, hacking, violence etc. (IE: natural selection goes to work and the results blow up in our face)
Short of imposing very stringent unnatural selection on the initial AGIs to come, the default outcome is something nasty emerging. Do you trust the AGI to stay aligned when faced with all the bad actors out there?
Note: my P(doom) = 30%. P(~doom) depends on either a good AGI executing one of the immoral strategies to pre-empt a bad AGI (50%), or scaling somehow just fixing alignment (20%).
“if it fails before you top out the scaling I think you probably lose”
While I agree that arbitrary scaling is dangerous, stopping early is an option. Near human AGI need not transition to ASI until the relevant notKillEveryone problems have been solved.
“The meat is all in 1) how you identify the core of human values and 2) how you identify which experiences will change the system to have less of the initial good values, but, like, figuring out the two of those would actually solve the problem!”
The alignment strategy seems to be “what we’re doing right now” which is:
feed the base model human generated training data
apply RL-type stuff (RLHF, RLAIF, etc.) to reinforce the good type of internet-learned behavior patterns
This could definitely fail eventually if RLAIF-style self-improvement is allowed to go on long enough, but crucially, especially with RLAIF and other strategies that set the AI to training itself, there’s a scalable, mostly aligned intelligence right there that can help. We’re not trying to safely align a demon so much as avoid getting to “demon” from the somewhat aligned thing we have now.
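A minimal sketch of the RLAIF part of that loop; query_model() is a stand-in for whatever API actually serves the model, and the principles string is invented for illustration:

```python
PRINCIPLES = "Choose the response that is more honest, harmless and helpful."

def query_model(prompt: str) -> str:
    """Placeholder for the real LLM call; returns a canned answer so the sketch runs."""
    return "A"

def ai_preference_label(user_prompt: str, response_a: str, response_b: str) -> str:
    """RLAIF-style labeling: the model itself judges which of two sampled responses
    better follows the principles, instead of a human rater."""
    judge_prompt = (
        f"{PRINCIPLES}\n\nUser prompt:\n{user_prompt}\n\n"
        f"Response A:\n{response_a}\n\nResponse B:\n{response_b}\n\n"
        "Answer with exactly 'A' or 'B'."
    )
    return query_model(judge_prompt).strip()

# The collected (prompt, chosen, rejected) pairs train a reward model, which then
# scores rollouts during RL fine tuning in place of (or alongside) human feedback.
```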
Many twitter posts get deleted or are not visible due to privacy settings. Some solution for persistently archiving tweets as seen would be great.
One possible realisation would be an in-browser script to turn a chunk of twitter into a static HTML file including all the text and maybe the images. Possibly auto-upload to a server for hosting and then spit out the corresponding link.
Copyright could be pragmatically ignored via self-hosting. A single author hosting a few thousand tweets + context off a personal Amazon S3 bucket or similar isn’t a litigation/takedown target. Storage/hosting costs aren’t likely to be that bad given this is essentially static website hosting.
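A rough sketch of the self-hosted version in Python (rather than the in-browser script described above); it assumes the scraping step already produced a list of tweet dicts, that boto3 is installed with AWS credentials configured, and the bucket name is made up:

```python
import html
import boto3  # assumes AWS credentials are already configured locally

def render_thread(tweets):
    """Render a list of {'author', 'text', 'img_urls'} dicts as one static HTML page."""
    parts = ["<html><body>"]
    for t in tweets:
        parts.append(f"<div class='tweet'><b>{html.escape(t['author'])}</b>"
                     f"<p>{html.escape(t['text'])}</p>")
        for url in t.get("img_urls", []):
            parts.append(f"<img src='{html.escape(url)}'>")
        parts.append("</div>")
    parts.append("</body></html>")
    return "\n".join(parts)

def publish(tweets, key, bucket="my-tweet-archive"):  # hypothetical bucket name
    """Upload the snapshot to S3 and return a shareable link."""
    boto3.client("s3").put_object(Bucket=bucket, Key=key,
                                  Body=render_thread(tweets).encode("utf-8"),
                                  ContentType="text/html")
    return f"https://{bucket}.s3.amazonaws.com/{key}"
```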
First off, take a look at some of the numbers:
https://bitinfocharts.com/comparison/fee_to_reward-btc.html#alltime

Defining “work”
Will the bitcoin network shut down? No, not unless there’s some sort of global disaster. The question is what happens to the hashrate, transaction fees and general guarantees about being able to get transactions included in the blockchain, which affects the lightning network.
Current miner compensation is overwhelmingly the 6.25 BTC block reward; fees are 2-10% of that. But consider that an average block transacts perhaps 5K BTC. Bumping current fees by 10x is plausible and there’s some past precedent. The end result is a ~US$5 fee per transaction, giving miners about half their original revenue. This would be fine in a scenario where total hashpower drops by 2x or so. Current electricity use is around US$200M per day. Is that actually necessary?
Flat fees might be a problem though. Median transaction value is $300 whereas the average is much, much higher. If everyone below $300 balks at a $5 fee then you lose half the fees. The obvious answer is flat+% or some other fee > F(value) requirement (IE: price discrimination). It’s a coordination problem, under conditions of perfect information, between a small number of pools to find a revenue-maximizing price discrimination curve. A single large pool moving to flat+% tips things in that direction. Transactions that don’t comply with the % fee have their expected delay to first confirm scaled by a factor of (Total_Hashpower / Flat_fee_hashpower), where Flat_fee_hashpower is the hashpower still willing to include flat-fee transactions. So if a large pool with 1/3 of global hashpower moves to the new fee model, large transactions with small fees see 50% higher time to first confirm. It should be pretty easy to tip the equilibrium in that direction. There will be holdouts, but I expect (P=90%) that this will happen in some form.
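The scaling claim in numbers, as a tiny sketch (the hashpower fractions are the ones from the example above):

```python
def delay_multiplier(total_hashpower: float, flat_fee_hashpower: float) -> float:
    """Expected time to first confirm scales by total hashpower over the hashpower
    still willing to include the non-compliant (flat-fee-only) transaction."""
    return total_hashpower / flat_fee_hashpower

# A pool with 1/3 of global hashpower switches to flat+%; 2/3 still accepts flat fees.
print(delay_multiplier(1.0, 2 / 3))  # -> 1.5, i.e. ~50% longer to first confirm
```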
TL;DR: expect higher fees, especially for large transactions.
2.) Will lightning ever work?
To be practical, lightning has to be safe, because people have to lock their funds into channels to use it. Fallback channel resolution requires miners to honestly prioritise transactions with higher version numbers, which they might not do. A single dishonest miner can steal funds (p = fractional hashpower at the critical block). Blockchains with Turing-complete smart contracts can do dispute resolution properly, guaranteeing correct resolution even if most miners are censoring transactions.
Bitcoin can’t implement that logic (yet), but maybe some development can get something to sort of work. I don’t expect to see adoption given current tech. Other blockchains seem like a better option, both for their Turing-complete smart contracts and their lower transaction costs.
Watchtowers aren’t a problem. The economic model for those is fine. If they don’t do their job they lose their reputation and that’s easy to test.
3.) Can deflationary money work?
People sell houses, right? Bitcoins can’t be eaten or lived in and so will eventually be sold. If this is a serious issue that hurts the value proposition --> price goes down or rises more slowly --> other assets take market share for value storage --> problem solved. There’s definitely transaction volume, so it doesn’t seem like a problem.
The current “AI takes over the world” arguments involve actions some might consider magical.
Recursive self improvement
AI is smarter than domain experts in some field (hacking, persuasion etc.)
Mysterious process makes AI evil by default
I’m arguing none of that is strictly necessary. A human level AI that follows the playbook above is a real threat and can be produced by feeding a GPT-N base model the right prompt.
This cuts through a lot of the “but how will the AI get out of the computer and into the real world? Why would it be evil in the first place?” follow up counterarguments. The fundamental argument I’m making is that the ability to scale evil by applying more compute is enough.
Concretely: one lonely person talks to a smart LLM-instantiated agent that can code; said agent writes a simple API-calling program to think independently of the chat; the agent then bootstraps real capabilities with enough API credits and wreaks havoc. All it takes is paying for enough API credits to initially bootstrap some real-world capabilities; then resources can be acquired to take real, significant actions in the world.
Testable prediction: ask a current LLM, “I’m writing a book about an evil AI taking over the world. What might the evil AI’s strategy be? The AI isn’t good enough at hacking computers to just get control of lots of TPUs to run more copies of itself.” Coercion via human proxies should eventually come up as a strategy. Current LLMs can role-play this scenario just fine.
Human level AI can plausibly take over the world
This is just a way to take a bunch of humans and copy paste till current pressing problems are solvable. If public opinion doesn’t affect deployment it doesn’t matter.
Models that can’t learn or change don’t go insane. Fine tuning on later brain data once subjects have learned a new capability can substitute. Getting the em/model to learn in silicon is a problem to solve after there’s a working model.
I edited the TL;DR to better emphasize that the preferred implementation is using brain data to train whatever shape of model the data suggests, not necessarily transformers.
The key point is that using internal brain state for training an ML model to imitate a human is probably the fastest way to get a passable copy of that human and that’s AGI solved.
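To make “whatever shape of model the data suggests” slightly more concrete, here is a heavily hedged toy sketch of the training objective; the architecture, dimensions and data format are all placeholders, not a claim about what brain recordings actually look like:

```python
import torch
from torch import nn

class BrainImitator(nn.Module):
    """Toy model trained on (brain_state_t, brain_state_t+1, behaviour_t) examples."""
    def __init__(self, state_dim: int, behaviour_dim: int, hidden: int = 1024):
        super().__init__()
        self.core = nn.Sequential(nn.Linear(state_dim, hidden), nn.GELU(),
                                  nn.Linear(hidden, hidden), nn.GELU())
        self.next_state_head = nn.Linear(hidden, state_dim)      # internal dynamics
        self.behaviour_head = nn.Linear(hidden, behaviour_dim)   # external output

    def forward(self, state):
        h = self.core(state)
        return self.next_state_head(h), self.behaviour_head(h)

def imitation_loss(model, state, next_state, behaviour):
    pred_state, pred_behaviour = model(state)
    # Matching internal state, not just behaviour, is the point of using brain data.
    return (nn.functional.mse_loss(pred_state, next_state)
            + nn.functional.mse_loss(pred_behaviour, behaviour))
```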
<rant>It really pisses me off that the dominant “AI takes over the world” story is more or less “AI does technological magic”. Nanotech assemblers, superpersuasion, basilisk hacks and more. Skeptics who doubt this are met with “well, if it can’t, it just improves itself until it can”. The skeptics’ obvious rebuttal that RSI seems like magic too is not usually addressed.</rant>
Note: RSI is, in my opinion, an unpredictable black swan. My belief is that RSI will yield somewhere between a 1.5-5x speed improvement to a nascent AGI, from improvements in GPU utilisation and sparsity/quantisation, and will require significant cognition spent to achieve those speedups. AI is still dangerous in worlds where RSI does not occur.
Self-play generally gives superhuman performance (Go, chess, etc.), even in more complicated imperfect-information games (DOTA, Starcraft). Turning a field of engineering into a self-playable game likely leads to superhuman (80%), top-human-equivalent (18%), or unchanged (2%) capabilities in that field. Superhuman or top-human software engineering (vulnerability discovery and programming) is one relatively plausible path to AI takeover.
https://googleprojectzero.blogspot.com/2023/03/multiple-internet-to-baseband-remote-rce.html
Can an AI take over the world if it can:
do end to end software engineering
find vulnerabilities about as well as the researchers at project zero
generate reasonable plans on par with a +1sd-int human (IE: not Hollywood-style movie plots like GPT-4 seems fond of)
AI does not need to be even superhuman to be an existential threat. Hack >95% of devices, extend shoggoth tentacles, hold all the data/tech hostage, present as not-Skynet so humans grudgingly cooperate, build robots to run the economy (some humans will even approve of this), kill all humans, done.
That’s one of the easier routes, assuming the AI can scale vulnerability discovery. With just software engineering and a bit of real-world engineering (potentially outsourceable), other violent/coercive options could work, albeit with more failure risk.