If you are going to read just one thing I wrote, read The Problem of the Criterion.
More AI related stuff collected over at PAISRI
Sure, differences are as real as the minds making them are. Once you have minds, those minds start perceiving differentiation, since they need to extract information from the environment to function. So I guess I'm saying I don't see what your objection is in this last comment, since as far as I can tell you haven't posited anything that actually disagrees with my point. I think it's a bit weird to call the differentiation you're referring to "objective", but you explained what you mean.
Eh. I feel like I’m just sharing what they posted because I think people here are interested. I’m not trying to make any claims.
The title of this post is the same title they used for the notice on their website with basically the same text as the email.
My personal commentary: probably not much of a loss, but directionally it sucks. No charity was receiving much money this way as far as I know, but it was a nice way to feel a bit better about shopping on Amazon since you got to pick where their charitable giving went, and I'm sure it had some marginal impact for some charities. I'm also sure there's a backstory, like wanting to not be neutral on where the funds go because they leave it up to customers to choose from any registered charity, but that's not present in the announcement.
Isn't aiming it at the goals we would want it to have a special case of aiming it at any target we want? And wouldn't whatever goals we'd want it to have be informed by our ontology? So what I'm saying is I think there's a case where the generality of your claim breaks down.
I think that the big claim the post relies on is that values are a natural abstraction, and the Natural Abstractions Hypothesis holds. Now this is admittedly very different from the thesis that value is complex and fragile.
It is not that AI would naturally learn human values, but that it’s relatively easy for us to point at human values/Do What I Mean/Corrigibility, and that they are natural abstractions.
This is not a claim that is satisfied by default, but is a claim that would be relatively easy to satisfy if true.
If this is the case, my concern seems yet more warranted, as this is hoping we won't suffer a false-positive alignment scheme that looks like it could work but won't. Given the high cost of getting things wrong, we should minimize false positive risks, which means not pursuing some ideas because the risk if they are wrong is too high.
For what it’s worth, I think you’re running headlong into an instance of the problem of the criterion and enjoy seeing how you’re grappling with it. I’ve tagged this post as such.
Reading this post I think it insufficiently addresses motivations, purpose, reward functions, etc. to make the bold claim that perfect world-model interpretability is sufficient for alignment. I think this because ontology is not the whole of action. Two agents with the same ontology and very different purposes would behave in very different ways.
Perhaps I’m being unfair, but I’m not convinced that you’re not making the same mistake as when people claim any sufficiently intelligent AI would be naturally good.
This seems straightforward to me: reification is a process by which our brain picks out patterns/features and encodes them so we can recognize them again and make sense of the world given our limited hardware. We can then think in terms of those patterns and gloss over the details because the details often aren’t relevant for various things.
The reason we reify things one way versus another depends on what we care about, i.e. our purposes.
To me this seems obvious: noumena feel real to most people because they’re captured by their ontology. It takes a lot of work for a human mind to learn not to jump straight from sensation to reification, and even with training there’s only so much a person can do because the mind has lots of low-level reification “built in” that happens prior to conscious awareness. Cf. noticing
Oh, I thought I already explained that. There’s at least two different ways “exist” can be meant here, and I think we’re talking past each other.
For some thing to exist implies it must exist ontologically, i.e. in the map. Otherwise it is not yet a thing. So I'm saying there's a difference between what we might call existence and being. You exist, in the sense of being an ontological thing, only by virtue of reification, but you are by virtue of the whole world being.
Additionally, bullshit gets worse the more you try to optimize, because you start putting worse optimizers in charge of making decisions. I think this is where the worst bullshit comes from: you hire people who just barely know how to do their jobs; they hire people they don't actually need, because hiring people is what managers are supposed to do; then they have to find something for those people to do. Those hires playact at being productive because that's what they were hired to do, and the business doesn't notice for a while because it's focused on trying to optimize past the point of marginally cost-effective returns. This is where the worst bullshit pops up and is most salient, but it's all downstream of Goodharting.
I think this is because there's an active watering down of terms happening in some corners of AI capabilities research as a result of trying to only tackle subproblems in alignment and not being abundantly clear that these are subproblems rather than the whole thing.
Agreed. I think the two theories of bullshit jobs miss how bullshit comes into existence.
Bullshit is actually just the fallout of Goodhart’s Curse.
(note: it’s possible this is what Zvi means by 2 but he’s saying it in a weird way)
You start out wanting something, like to maximize profits. You do everything reasonable in your power to increase profits. You hit a wall and don’t realize it and keep applying optimization. You throw more resources after marginally worse returns until you start actively making things worse by trying to earn more.
One of the consequences of this is bullshit jobs.
Let me give you an example. Jon works for a SaaS startup. His job is to maximize reliability. The product is already achieving 3 9s, but customers want 4 because they have some vague sense that more is better and their competitor is offering 4. Jon knows that going from 3 to 4 will 10x COGS and tells the executives as much, but they really want to close those deals. Everyone knows 3 9s is actually enough for the customers, but the customers want 4, so Jon has to give it to them, because otherwise the company can't close deals.
Now Jon has to spend on bullshit. He quadruples the size of his team and they start building all kinds of things to eke out more reliability. In order to pull this off they make tradeoffs that slow down product development. The company is now able to offer 4 9s and close deals, but it suffers deadweight loss from paying for 4 9s when customers only really need 3 (if only customers would understand that they would suffer no material loss by living with 3).
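To make the 3 vs 4 nines gap concrete, here's a minimal sketch of the standard availability arithmetic (my own illustration, not anything from Jon's actual numbers): each extra nine cuts the allowed downtime by 10x, so that fourth nine buys customers roughly eight fewer hours of downtime per year.

```python
# Standard "N nines" availability arithmetic (illustrative sketch only).
# N nines of uptime means an allowed downtime fraction of 10**-N.
HOURS_PER_YEAR = 24 * 365.25

for nines in (3, 4):
    downtime_fraction = 10 ** -nines
    downtime_hours = downtime_fraction * HOURS_PER_YEAR
    print(f"{nines} nines ({100 * (1 - downtime_fraction):.2f}% uptime): "
          f"~{downtime_hours:.2f} hours of downtime per year")

# 3 nines: ~8.77 hours of downtime per year
# 4 nines: ~0.88 hours (~53 minutes) of downtime per year
```

That hour or so of extra uptime per year is all the 10x in COGS is buying.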
This same story plays out in every function across the company. Marketing and sales are full of folks chasing deals that will never pay back their cost of acquisition. HR and legal are full of folks protecting the company against threats that will never materialize. Support is full of reps who help low revenue customers who end up costing the company money to keep on the books. And on and on.
By the time anyone realizes, the company has suffered several quarters of losses. They do a layoff, restructure, and refocus on the business that's actually profitable. Everyone is happy for a while, but then they demand more growth, restarting the business cycle.
Bullshit jobs are not mysterious. They are literally just Goodharting.
Thus, we should not expect them to go away thanks to AI unless all jobs go away; we should just expect them to change, though I think not in the way Zvi expects. Bullshit doesn't exist for its own sake. Bullshit exists due to Goodharting. So bullshit will change to fit the context of wherever humans are perceived to provide additional value. The bullshit will continue up until some point at which humans are completely unneeded by AI.
Hmm, I feel like there’s some misunderstanding here maybe?
What you’re calling “strong alignment” seems more like what most folks I talk to mean by “alignment”. What you call “alignment” seems more like what we often call “corrigibility”.
You're right that corrigibility is not enough to get alignment on its own (i.e. that "alignment" is not enough to get "strong alignment"), but it's necessary.
Anna Salamon made a point like this in a post several years ago: https://www.lesswrong.com/posts/ZGzDNfNCXzfx6hYAH/how-to-learn-soft-skills
It’s something that really stuck with me. Not all minds are alike, and it’s often worth finding your own words to say things that others have said. It’s useful to you, and it can be useful to others.
The thing I think many LW writers get wrong is that they aren’t humble about it. They rediscover something and act like they invented it, mostly because there seems to be some implicit belief that we’re better than those who came before us because we have Rationality(tm). I’ve been guilty of this, as have many others.
I saw another thing recently which put this idea about reinventing ideas in a new light. The author mentioned that when they were studying in a yeshiva everyone celebrated when one of the students rediscovered an argument made by an earlier writer, and the older the original author the better. It was a sign that the person was really grasping the ideas and was getting closer to God than the other students.
Whatever you think of studying rabbinical texts, this seems like a healthy sentiment to adopt when someone rediscovers an idea.
Adding on to the points about elasticity: arguably, states with welfare systems already enable people to work very few hours in line with predictions; it just requires navigating benefit programs and being content with living standards that make one low status in the eyes of others. It might not be unreasonable to say that most of the elasticity is driven by status needs and a desire to keep up with others rather than be content with less.
Why does there need to be structure? We can just have a non-uniform distribution of energy around the universe in order for there to be information to extract. I guess you could call this “structure” but that seems like a stretch to me.
I don't know if I can convince you. You seem pretty convinced that there are natural abstractions or something like them. I'm pretty suspicious that there are natural abstractions; instead I think there are useful abstractions, but they are all contingent on how the minds creating those abstractions are organized, and no abstractions meaningfully exist independent of the minds that create them. Perhaps the structure of our universe limits how minds work in ways that de facto mean we all create ontology within certain constraints, but I don't think we know enough to prove this.
On my view, any sense in which abstractions seem natural is a kind of typical mind fallacy.