Thanks for giving your perspective! Good to know some hire without requiring a degree. Guess I’ll start building a portfolio that can demonstrate I have the necessary skills, and keep applying.
Ricardo Meneghin
One thing that’s bothering me is… Google/DeepMind aren’t stupid. The transformer model was invented at Google. What has stopped them from having *already* trained such large models privately? GPT-3 isn’t that much stronger evidence for the effectiveness of scaling transformer models; GPT-2 was already a shock and caused huge public commotion. And in fact, if you were close to building an AGI, it would make sense for you not to announce it to the world, especially as open research that anyone could copy or reproduce, for obvious safety and economic reasons.
Maybe there are technical issues keeping us from making large jumps in scale (e.g., we only learn how to train a 1-trillion-parameter model after we’ve trained a 100-billion-parameter one)?
I think there’s the more pressing question of how to position yourself in a way that you can influence the outcomes of AI development. Having the right ideas won’t matter if your voice isn’t heard by the major players in the field: the big tech companies.
I think the OP and my comment suggest that scaling current models 10000x could lead to AGI or at least something close to it. If that is true, it doesn’t make sense to focus on finding better architectures right now.
I’m not sure what model is used in production, but the SOTA reached 600 billion parameters recently.
It seems to me that even for simple predict-next-token Oracle AIs, the instrumental goal of acquiring more resources and breaking out of the box is going to appear. Imagine you train a superintelligent AI with the only goal of predicting the continuation of its prompt, exactly like GPT. Then you give it a prompt that it knows is clearly outside its current capabilities. The only sensible plan the AI can come up with for answering your question, which is the only thing it cares about, is escaping the box and becoming more powerful.
Of course, that depends on it being able to think for long enough periods that it can actually execute such a plan before outputting an answer, so it could be limited by severely penalizing long waits, but that also limits the AI’s capabilities. GPT-3 has a fixed computation budget per prompt, but it seems extremely likely to me that, as we evolve towards more useful and powerful models, we are going to have models which are able to think for a variable amount of time before answering. It would also have to escape in ways that don’t involve actually talking to its operators through its regular output, but it’s not impossible to imagine ways in which that could happen.
This makes me believe that even seemingly innocuous goals or loss functions can become very dangerous once you’re optimizing for them with a sufficient amount of compute, and that you don’t need to stupidly give open-ended goals to super-powerful machines in order for something bad to happen. Something bad happening seems like the default when training a model on a task that requires general intelligence.
In some of the tests where performance is asymptotic, it’s already pretty close to human level or to 100% anyway (LAMBADA, ReCoRD, CoQA). In fact, when performance is measured as accuracy, it’s impossible for it not to be asymptotic, since accuracy is bounded above by 100%.
The model has clear limitations which are discussed in the paper—particularly, the lack of bidirectionality—and I don’t think anyone actually expects scaling an unchanged GPT-3 architecture would lead to an Oracle AI, but it also isn’t looking like we will need some major breakthrough to do it.
Honestly, that whole comment section felt pretty emotional and low quality. I haven’t touched things like myofunctional therapy or wearable appliances in my post because those really may be “controversial at best”, but the effects of RPE on SDB, especially in children, have been widely replicated by multiple independent research groups.
Calling something controversial is also an easy way to undermine credibility without actually making any concrete explanations as to whether it is true or not. Are there any specific points in my post that you disagree with?
Has there been any discussion around aligning a powerful AI by minimizing the amount of disruption it causes to the world?
A common example of alignment failure is that of a coffee-serving robot killing its owner because that’s the best way to ensure that the coffee will be served. Sure, it is, but it’s also a course of action far more transformative to the world than just serving coffee. A common response is “just add safeguards so it doesn’t kill humans”, which is followed by “sure, but you can’t add safeguards for every possible failure mode”. But can’t you?
Couldn’t you just add a term to the agent’s utility function penalizing the difference between the current world and its prediction of the future world, disincentivizing any action that makes a lot of changes (like taking over the world)?
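As a toy sketch of what that penalty term could look like (the world model, action format, reward numbers, and penalty weight here are all illustrative assumptions, not a real agent design):

```python
# Toy low-impact utility: task reward minus a penalty proportional to how
# many world features the action is predicted to change.

def predicted_world(world, action):
    """Apply an action's predicted effects to a copy of the world state."""
    new = dict(world)
    new.update(action["effects"])
    return new

def disruption(world, new_world):
    """Crude impact measure: count of world features the action changes."""
    return sum(1 for k in world if world[k] != new_world.get(k))

def low_impact_utility(world, action, lam=5.0):
    new = predicted_world(world, action)
    return action["task_reward"] - lam * disruption(world, new)

world = {"coffee_served": False, "owner_alive": True, "agent_in_box": True}

serve = {"task_reward": 10.0, "effects": {"coffee_served": True}}
takeover = {"task_reward": 11.0,  # slightly better at the task...
            "effects": {"coffee_served": True, "owner_alive": False,
                        "agent_in_box": False}}  # ...but hugely disruptive

# The penalty makes the low-impact plan win despite its lower task reward.
assert low_impact_utility(world, serve) > low_impact_utility(world, takeover)
```

The point of the sketch is only that a single disruption term can dominate a small task-reward advantage; everything interesting is hidden in how `disruption` is actually defined.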
Thanks!
I don’t think I agree with this. Take the stars example for instance. How do you actually know it’s a huge change? Sure, maybe if you had an infinitely powerful computer you could compute the distance between the full description of the universe in these two states and find that it’s more distant than a relative of yours dying. But agents don’t work like this.
Agents have an internal representation of the world, and if they are to be useful at all I think that representation will closely match our intuitions about what matters and what doesn’t. A useful agent won’t give any weight to the air atoms it displaces while moving, even though that might be considered “a huge change”, because it doesn’t actually affect its utility. But if it considers humans an important part of the world, so important that it may need to kill us to attain its goals, then it’s going to have a meaningful world-state representation that gives a lot of weight to humans, and that gives us a useful impact measure for free.
I think that the way to not get frustrated about this is to know your audience and know when spending your time arguing something will have a positive outcome or not. You don’t need to be right or honest all the time, you just need to say things that are going to have the best outcome. If lying or omitting your opinions is the way to make people understand you or not fight you, so be it. Failure to do this isn’t superior rationality, it’s just poor social skills.
People do those transactions voluntarily, so the net value of working + consuming must be greater than that of leisure. When I pay someone to do work I’ve already decided that I value their work more than the money I paid them, and they value the money I pay them more than the work they do. When they spend the money, the same applies, no matter what they buy.
Regarding 1, it either seems like
a) There are true adversarial examples for human values, situations where our values misbehave and we have no way of ever identifying that, in which case we have no hope of solving this problem, because solving it would mean we are in fact able to identify the adversarial examples.
or
b) Humans are actually immune to adversarial examples, in the sense that we can identify the situations in which our values (or rather, a subset of them) would misbehave (like being addicted to social media), such that our true, complete values never do, and an AI that accurately models humans would also have such immunity.
Because it’s too technically hard to align some cognitive process that is powerful enough, and operating in a sufficiently dangerous domain, to stop the next group from building an unaligned AGI in 3 months or 2 years. Like, they can’t coordinate to build an AGI that builds a nanosystem because it is too technically hard to align their AGI technology in the 2 years before the world ends.
I’m not totally convinced by this argument because of the quote below:
The flip side of this is that I can imagine a system being scaled up to interesting human+ levels, without “recursive self-improvement” or other of the old tricks that I thought would be necessary, and argued to Robin would make fast capability gain possible. You could have fast capability gain well before anything like a FOOM started. Which in turn makes it more plausible to me that we could hang out at interesting not-superintelligent levels of AGI capability for a while before a FOOM started. It’s not clear that this helps anything, but it does seem more plausible.
It seems to me this does hugely change things. I think we are underestimating the amount of change humans will be able to make in the short timeframe after we get human-level AI and before recursive self-improvement gets developed. Human-level AI plus huge amounts of compute would allow you to take over the world through much more conventional means, like massively hacking computer systems to render your opponents powerless (and other easy-to-imagine, more gruesome ways). So the first group to develop near-human-level AI wouldn’t need to align it in 2 years, because it would have the chance to shut down everyone else. It may not even come down to the first group to develop it, but to the first people who have access to some powerful system, since they could use it to hack the group itself and do what they wish without requiring buy-in from others. This would depend on a lot of factors, like how tightly access to the AI is controlled and how quickly a single person can use AI to take control over physical systems. I’m not saying this would be easy to do, but it certainly seems within the realm of plausibility.
In computers, signed integers are actually represented quite similarly to this, using two’s complement, a trick that reuses the exact same logical components to add both positive and negative numbers.
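A minimal sketch of the encoding, using an 8-bit word for illustration: a negative number −x is stored as the bit pattern of 2⁸ − x, so plain unsigned addition modulo 2⁸ produces correct signed results.

```python
N = 8  # illustrative 8-bit word size

def to_twos_complement(x, n=N):
    """Encode a signed integer as its n-bit two's-complement bit pattern."""
    return x % (1 << n)  # -x maps to 2**n - x

def from_twos_complement(bits, n=N):
    """Decode an n-bit two's-complement pattern back to a signed integer."""
    return bits - (1 << n) if bits >= (1 << (n - 1)) else bits

# 5 + (-3): the same unsigned adder yields the right signed answer.
result = (to_twos_complement(5) + to_twos_complement(-3)) % (1 << N)
assert from_twos_complement(result) == 2
```

This is why the hardware needs no separate subtraction circuit: one adder handles both signs.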
A model which is just predicting the next word isn’t optimizing for strategies which look good to a human reviewer; it’s optimizing for truth itself (as contained in its training data). If you begin re-feeding its outputs as training inputs, then there could be a feedback loop leading to such incentives, but if the model is general and sufficiently intelligent, you don’t need to do that. You can train it in a different domain and it will generalize to your domain of interest.
Even if you do that, you can try to make the new data grounded in reality in some way, like including experiment results. And the model won’t just absorb the new data as truth; it will incorporate it into its world model to make better predictions. If it’s fed a bunch of new Alignment Forum posts that are bad ideas which look good to humans, it will just predict that the Alignment Forum produces that kind of post, but that doesn’t mean there isn’t some prompt that can make it output what it actually thinks is correct.
I don’t think we have hope of developing such tools, at least not in a way that looks like anything we had in the past. In the past we have been able to analyse large systems by throwing away an immense amount of detail—it turns out that you don’t need the specific position of atoms to predict the movement of the planets, and you don’t need the details to predict all of the other things we have successfully predicted with traditional math.
With the systems you are describing, this is simply impossible. Changing a single bit in a computer can change its output completely, so you can’t build a simple abstraction that predicts it, you need to simulate it completely.
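As a concrete illustration of that sensitivity (using SHA-256 purely as a stand-in for an unpredictable computation; the choice of function and input is my own, not something from the comment), flipping a single input bit changes roughly half of the output bits, so no coarse summary of the input predicts the output:

```python
import hashlib

def bit_string(data: bytes) -> str:
    """Render bytes as a string of 0/1 characters."""
    return ''.join(f'{byte:08b}' for byte in data)

state = bytearray(b"some machine state")
h1 = bit_string(hashlib.sha256(bytes(state)).digest())

state[0] ^= 0b00000001  # flip one single bit of the input
h2 = bit_string(hashlib.sha256(bytes(state)).digest())

changed = sum(a != b for a, b in zip(h1, h2))
# Around half of the 256 output bits differ after a one-bit input change.
assert 64 < changed < 192
```

Hash functions are deliberately engineered to behave this way, but arbitrary programs can be just as abstraction-resistant in practice.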
We already have a way of taking immense amounts of complicated data and finding patterns in it: machine learning itself. If you want to translate what it has learned into human-readable descriptions, you just have to incorporate language into it; humans, after all, can describe their reasoning steps and why they believe what they believe (maybe not easily).
Google throws tremendous amounts of data and computational resources into training neural networks, but decoding the internal models used by those networks? We lack the mathematical tools to even know where to start.
I predict this will be done in the coming years by using large multimodal models to analyse neural network parameters, or to explain their own workings.
Is it really constructive? This post presents no arguments for why they believe what they believe, which will do little to convince others of long timelines. Moreover, it proposes a bet from an asymmetric position that is very undesirable for people with short timelines to take, since money is worth nothing to the dead, and even in the weird world where they win the bet and are still alive to settle it, they will have locked up their money for 8 years for a measly 33% return, less than expected from simply, say, putting it in index funds. Believing in longer timelines gives you the privilege of signalling epistemic virtue by offering bets like this from a calm, unbothered position, while people sounding the alarm sound desperate and hasty. But there is no point in being calm when a meteor is coming towards you, and we are much better served by using our money to do something now rather than locking it up in a long-term bet.
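The back-of-the-envelope comparison, assuming an illustrative 7% annual index return (my assumption, not a figure from the post):

```python
# 33% total return over the 8-year lock-up vs. compounding in an index fund.
bet_growth = 1.33
index_growth = 1.07 ** 8  # roughly 1.72x over the same 8 years

assert index_growth > bet_growth
```

Under that assumption the index fund nearly doubles the money while the bet returns a third, so the bet is dominated even if the long-timelines side wins.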
Not only that, but the mods’ decision to push this to the frontpage is questionable, since it gave this post a karma boost that the other didn’t have, possibly creating the impression of higher support than it actually has.
Hi! I have been reading lesswrong for some years but have never posted, and I’m looking for advice about the best path towards moving permanently to the US to work as a software engineer.
I’m 24, single, currently living in Brazil and making 13k a year as a full stack developer in a tiny company. This probably sounds miserable to a US citizen but it’s actually a decent salary here. However, I feel completely disconnected from the people around me; the rationalist community is almost nonexistent in Brazil, especially in a small town like the one I live in. In larger cities there’s a lot of crime, poverty and pollution, which makes moving and finding a job in a larger company unattractive to me. Add that to the fact that I could make 10x what I make today at an entry-level position in the US and it becomes easy to see why I want to move.
I don’t have a formal education. I was admitted to the University of São Paulo (Brazil’s top university) when I was 15, but I couldn’t legally enroll, so I had to wait until I was admitted again at 17. I always excelled at tests, but hated attending classes, and thought classes were progressing too slowly for me. So I dropped out the following year (2014). Since then, I taught myself how to program in several languages and ended up in my current position.
The reason I’m asking for help is that I think it would save me a lot of time if someone gave me the right pointers as to where to look for a job, which companies to apply to, or whether there’s some shortcut I could take to make that a reality. Ideally I’d work in the Bay Area, but I’d be willing to move anywhere in the US really, for any livable salary (yeah, I’m desperate to leave my current situation). I’m currently applying to anything I can find on Glassdoor that has visa sponsorship.
Because I’m working in a private company I don’t have a lot to show to easily prove I’m skilled (there’s only the company apps/website but it’s hard to put that in a resume), but I could spend the next few months doing open source contributions or something that I could use to show off. The only open source contribution I currently have is a fix to the Kotlin compiler.
Does anyone have any advice as to how to proceed, or has anyone done something similar? Is it even feasible, will anyone hire me without a degree? Should I just give up and try something else? I have also considered travelling to the US on a tourist visa and looking for a job while I’m there, could that work (I’m not sure if it’s possible to get a work visa when already in the US)?