I suspect that it becomes more and more rate limiting as technological progress speeds up.
Like, to a first approximation, I think there’s a fixed cost to learning to use and take full advantage of a new tool. Let’s say that cost is a few weeks of experimentation and tinkering. If importantly new tools are invented on a cadence of once every 3 years, that fixed cost is negligible. But if importantly new tools are dropping every week, the fixed cost becomes much more of a big deal.
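As a toy version of that arithmetic (the 3-week learning cost and the cadences are just illustrative assumptions, not measured figures):

```python
# Toy model of the fixed-cost argument above: what fraction of your time goes to
# the fixed cost of learning new tools, as a function of how often importantly
# new tools appear? (The 3-week learning cost is an illustrative assumption.)

LEARNING_COST_WEEKS = 3  # assumed fixed cost to learn a new tool

def fraction_of_time_learning(weeks_between_new_tools: float) -> float:
    """Fraction of each tool cycle spent paying the fixed learning cost (capped at 1)."""
    return min(1.0, LEARNING_COST_WEEKS / weeks_between_new_tools)

for label, cadence_weeks in [("every 3 years", 156), ("every year", 52),
                             ("every month", 4), ("every week", 1)]:
    print(f"new tools {label}: {fraction_of_time_learning(cadence_weeks):.0%} of time spent learning")

# new tools every 3 years: 2% of time spent learning
# new tools every year: 6% of time spent learning
# new tools every month: 75% of time spent learning
# new tools every week: 100% of time spent learning
```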
If you’re so price sensitive that $1000 is meaningful, well, uh, try to find a solution to this crisis. I’m not saying one exists, but there are survival risks to poverty.
Lol. I’m not impoverished, but I want to cheaply experiment with having a car. It isn’t worth it to throw away $30,000 on a thing that I’m not going to get much value from.
I recall a Chris Olah post in which he talks about using AIs as a tool for understanding the world, by letting the AI learn, and then using interpretability tools to study the abstractions that the AI uncovers.
I thought he specifically mentioned “using AI as a microscope.”
Is that a real post, or am I misremembering this one?
Are there any hidden risks to buying or owning a car that someone who’s never been a car owner might neglect?
I’m considering buying a very old (ie from the 1990s), very cheap (under $1000, ideally) minivan, as an experiment.
That’s inexpensive enough that I’m not that worried about it completely breaking down on me. I’m willing to just eat the monetary cost for the information value.
However, maybe there are other costs or other risks that I’m not tracking, that make this a worse idea.
- Some ways that a car can break make it dangerous, instead of non-functional.
- Maybe if a car breaks down in the middle of Route 66, the government fines you a bunch?
- Something something car insurance?
Are there other things that I should know? What are the major things that one should check for to avoid buying a lemon?
Assume I’m not aware of even the most drop-dead basic stuff. I’m probably not.
(Also, I’m in the market for a minivan, or other car with 3 rows of seats. If you have an old car like that which you would like to sell, or if you know someone who does, get in touch.
Do note that I am extremely price sensitive, but I would pay somewhat more than $1000 for a car, if I were confident that it was not a lemon.)
Question: Have Moral Mazes been getting worse over time? Could the growth of Moral Mazes be the cause of cost disease?
I was thinking about how I could answer this question. I think that the thing that I need is a good quantitative measure of how “mazy” an organization is.
I considered the metric of “how much output for each input”, but 1) that metric is just cost disease itself, so it doesn’t help us distinguish the mazy cause from other possible causes, and 2) if you’re good enough at rent seeking, maybe you can get high revenue despite your poor production.
What metric could we use?
Is there a standard article on what “the critical risk period” is?
I thought I remembered an Arbital post, but I can’t seem to find it.
My guess was: you could have a different map for different parts of the globe, ie a part that focuses on Africa (and therefore has minimal distortions of Africa), and a separate part for America, and a separate part for Asia, and so on.
Is there a LessWrong article that unifies physical determinism and choice / “free will”? Something about thinking of yourself as the algorithm computed on this brain?
I’m not opposed to getting random flash-from-past sequences posts in my notifications.
[Eli’s personal notes. Feel free to ignore or engage]
Any distinction between good and bad behavior with any nuance seems very hard to me.
Related to the following, from here.
But if I want to help Bob figure out whether he should vote for Alice—whether voting for Alice would ultimately help create the kind of society he wants—that can’t be done by trial and error. To solve such tasks we need to understand what we are doing and why it will yield good outcomes. We still need to use data in order to improve over time, but we need to understand how to update on new data in order to improve.
Some examples of easy-to-measure vs. hard-to-measure goals:
Persuading me, vs. helping me figure out what’s true. (Thanks to Wei Dai for making this example crisp.)
Reducing my feeling of uncertainty, vs. increasing my knowledge about the world.
Improving my reported life satisfaction, vs. actually helping me live a good life.
Reducing reported crimes, vs. actually preventing crime.
Increasing my wealth on paper, vs. increasing my effective control over resources.
. . .
I think my true reason is not that all reasoning about humans is dangerous, but that it seems very difficult to separate out safe reasoning about humans from dangerous reasoning about humans.
Thinking further, this is because of something like...the “good” strategies for engaging with humans are continuous with the “bad” strategies for engaging with humans (ie dark arts persuasion is continuous with good communication), but if your AI is only reasoning about a domain that doesn’t have humans, then deceptive strategies are isolated in strategy space from the other strategies that work (namely, mastering the domain, instead of tricking the judge).
Because of this isolation of deceptive strategies, we can notice them more easily?
[Eli’s notes, that you can ignore or engage with]
Threats: This seems to be in direct conflict with alignment—roughly speaking, either your AI system is aligned with you and can be threatened, or it is not aligned with you and then threats against it don’t hurt you. Given that choice, I definitely prefer alignment.
Well, it might be the case that a system is aligned but is mistakenly running an exploitable decision theory. I think the idea is we would prefer to have things set up so that failures are contained, ie if your AI is running an exploitable decision theory, that problem doesn’t cascade into even worse problems.
I’m not sure if “avoiding human models” actually meets this criterion, but it does seem useful to aim for systems that don’t fail catastrophically if you get something wrong.
[Eli’s personal notes. Feel free to ignore or to engage.]
Supposing we intend the first use of AGI to be solving some bounded and well-specified task, but we misunderstand or badly implement it so much that what we end up with is actually unboundedly optimising some objective function. Then it seems better if that objective is something abstract like puzzle solving rather than something more directly connected to human preferences: consider, as a toy example, if the sign (positive/negative) around the objective were wrong.
The basic idea here is that if we screw up so badly that what we thought was a safely bounded tool-AI is actually optimizing to tile the universe with something, it is better if it tiles the universe with data-centers doing math proofs than with something that refers to what humans want?
Why would that be?
[Eli’s personal notes. Feel free to ignore or engage.]
We suggest that an important factor in the answer to this question is whether the AGI system was built using human modelling or not. If it produced a solution to the transit design problem (that humans approve of) without human modelling, then we would more readily trust its outputs. If it produced a solution we approve of with human modelling, then although we expect the outputs to be in many ways about good transit system design (our actual preferences) and in many ways suited to being approved by humans, to the extent that these two targets come apart we must worry about having overfit to the human model at the expense of the good design. (Why not the other way around? Because our assessment of the sandboxed results uses human judgement, not an independent metric for satisfaction of our actual preferences.)
Short summary: If an AI system is only modeling the problem that we want it to solve, and it produces a solution that looks good to us, we can be pretty confident that it is actually a good solution.
Whereas, if it is modeling some problem, and modeling us, we can’t be sure where the solution lies on the spectrum of “actually good” solutions vs. “bad solutions that appear good to us.”
Does anyone know why this just showed up in my notifications as a new post?
Is there any particular reason why I should assign more credibility to Moral Mazes / Robert Jackall than I would to the work of any other sociologist?
(My prior on sociologists is that they sometimes produce useful frameworks, but generally rely on subjective hard-to-verify and especially theory-laden methodology, and are very often straightforwardly ideologically motivated.)
I imagine that someone else could write a different book, based on the same kind of anthropological research, that highlights different features of the corporate world, to tell the opposite story.
And that’s without anyone trying to be deceptive. There’s just a fundamental problem with case studies: they don’t tell you what’s typical, they only give you examples.
I can totally imagine that Jackall landed on this narrative somehow, found that it held together and just confirmation biased for the rest of his career. Once his basic thesis was well-known, and associated with his name, it seems hard for something like that NOT to happen.
And this leaves me unsure what to do with the data of Moral Mazes. Should I default assume that Jackall’s characterization is a good description of the corporate world? Or should I throw this out as a useless set of examples confirmation biased together? Or something else?
It seems like the question of “is most of the world dominated by Moral Mazes?” is an extremely important one. But also, it seems to me that it’s not operationalized enough to have a meaningful answer. At best, it seems like this is a thing that happens sometimes.
Why does PredictionBook 1) allow you to make 100% credence predictions and 2) bucket 99% credence in the 90% bucket instead of the 100% bucket?
Does anyone know?
It means I either need to have an unsightly graph where my measured accuracy falls to 0 in the 100% bucket, or take the unseemly approach of putting 100% (rounding up, of course, not literally 100%) on some extremely likely prediction.
The bucketing also means that if I make many 99% predictions, but few 90% predictions (for instance), I’ll appear uncalibrated even if I have perfect calibration (since items in that bucket would be accurate more than 90% of the time). Not realizing this, I might think that I need to adjust more.
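As a sanity check on that last point, here’s a minimal simulation (the 500/50 split of predictions is just an illustrative assumption, and it assumes the pooling behavior described above):

```python
import random

# Simulate a perfectly calibrated forecaster whose 90%-99% predictions all get
# pooled into a single bucket labeled "90%" (the bucketing behavior described above).
random.seed(0)

predictions = [0.99] * 500 + [0.90] * 50   # many 99% predictions, few 90% ones
outcomes = [random.random() < p for p in predictions]  # each resolves true with exactly its stated probability

bucket_accuracy = sum(outcomes) / len(outcomes)
print(f"'90%' bucket measured accuracy: {bucket_accuracy:.1%}")
# Prints roughly 98%: the bucket looks miscalibrated against its 90% label
# (as if I were underconfident), even though every individual prediction was
# perfectly calibrated.
```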
This is an amazing anecdote.
Are you going to stand there on the other side of the door and think about important AI problems while the old lady struggles to open it?
I visualized this scenario and laughed out loud.