If you never miss a commuter train, you’re always at the station too early. If you never miss a holiday flight, that’s fine.
If you’ve never failed a job interview, you could probably be getting paid much more. If you never get fired, you might be leaving something on the table, but I wouldn’t complain.
If your jokes never offend anyone, you’re not going to be a standup comedian. If your jokes always offend someone, consider that you might not be that funny after all.
A pessimist won’t be disappointed. But an optimist might be happier. The pessimist will be right a lot more, though.
If your business never encounters fraud, you could be saving money on security measures. If everyone knew exactly how likely they were to get caught, you’d have to spend a lot more. Or perhaps a lot less. Maybe there’s some cheap signaling you could do?
If you have a low risk tolerance, you’re leaving a lot of value on the table. If you’re insensitive or oblivious to the downsides, you’ll lose a lot more.
I think of this as “in practice you converge faster to the optimum if you sometimes overshoot, so do that when overshooting is affordable”, with the counterexample that learning to drive shouldn’t involve accidentally killing a couple of people.
1: I see the main point of OP as a variance-expectation trade-off, where variance is bad when you’re risk averse, i.e. when bad outcomes are much worse than good outcomes are good. Perhaps you meant this—what you said reads like you may have meant that the process of overshooting teaches you new stuff.
2: When learning to park in an empty parking lot, I realized I was consistently turning too early, so I decided to aim far enough over that I’d expect to overshoot just as often/by as much; this suddenly made me much better and got me to learn the right time to turn faster. Notably, there was no risk of hitting someone if I overshot to the right instead of to the left.
I haven’t fleshed out my idea clearly. I’m saying something like “in asymmetric scenarios, the more costly failures are, the harder it is to reach the optimum (for a given level of risk-averseness)” + “in hindsight, most people will think they were too risk-averse about most things”. Upon reflection, it isn’t centrally relevant to what OP is saying.
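The convergence intuition can be sketched as a toy search problem (function names and all numbers here are hypothetical, mine rather than anyone’s real method): if overshooting is affordable, you can bisect toward the true boundary; if you must never overshoot, you can only creep up on it from the safe side, which takes far more tries.

```python
def find_threshold_bisect(is_too_late, lo=0.0, hi=60.0, tol=0.5):
    """Overshooting allowed: binary-search the latest safe departure time."""
    steps = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_too_late(mid):  # overshot: missed the train, but learned a lot
            hi = mid
        else:
            lo = mid
        steps += 1
    return lo, steps

def find_threshold_creep(is_too_late, lo=0.0, step=0.5):
    """Overshooting forbidden: inch up from the safe side in small increments."""
    steps = 0
    while not is_too_late(lo + step):
        lo += step
        steps += 1
    return lo, steps

THRESHOLD = 42.0  # hypothetical true latest departure, minutes past the hour
_, bisect_steps = find_threshold_bisect(lambda t: t > THRESHOLD)
_, creep_steps = find_threshold_creep(lambda t: t > THRESHOLD)
# bisection: 7 probes; creeping: 84 probes
```

The creep strategy never gets any information from the far side of the boundary, which is exactly the asymmetric-cost situation above.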
While reading the OpenAI Operator System Card, the following paragraph on page 5 seemed a bit weird:
We found it fruitful to think in terms of misaligned actors, where:
the user might be misaligned (the user asks for a harmful task),
the model might be misaligned (the model makes a harmful mistake), or
the website might be misaligned (the website is adversarial in some way).
Interesting use of language here. I can understand calling the user or the website misaligned, understood as alignment relative to laws or OpenAI’s goals. But why call a model misaligned when it makes a mistake? To me, misalignment would mean doing that on purpose.
Later, the same phenomenon is described like this:
The second category of harm is if the model mistakenly takes some action misaligned with the user’s intent, and that action causes some harm to the user or others.
Is this yet another attempt to erode the meaning of “alignment”?
My bet is that this isn’t an attempt to erode alignment, but is instead based on thinking that lumping together intentional bad actions with mistakes is a reasonable starting point for building safeguards. Then the report doesn’t distinguish between these due to a communication failure. It could also just be a more generic communication failure.
(I don’t know if I agree that lumping these together is a good starting point to start experimenting with safeguards, but it doesn’t seem crazy.)
Communication is indeed hard, and it’s certainly possible that this isn’t intentional. On the other hand, making mistakes is quite suspicious when they’re also useful for your agenda. But I agree that we probably shouldn’t read too much into it. The system card doesn’t even mention the possibility of the model acting maliciously, so maybe that’s simply not in scope for it?
The primary value of the Effective Altruism community comes from providing a social group where incentives around charity spending are better aligned with utilitarianism. Information sharing is secondary. This also explains why people like to attend so many EA events: even though it doesn’t make much sense for actually doing good, it provides the social reward for it. This dynamic is undervalued in impact estimates, and organizing more community-building fun would be quite valuable.
(loosely held opinion)
(motivated reasoning warning: I mostly care about the fun stuff anyway)
Yes, social incentives are important. But it is also important that people donate to actually effective charities… otherwise they could get the same (maybe even better!) social rewards by going to a local church.
Given that social rewards are usually only very loosely correlated with how good something is, it is great to have a community that aligns them better. But it is easy to Goodhart these things. (For example, by attending EA events but not actually donating… maybe with the excuse that “I will donate later… much later…”.)
Flights with a return ticket are often only 20-50% more expensive than a one-way ticket. Sometimes the return ticket is cheaper than the one-way! Since profit margins in air travel are in the low single digits, and providing the flight doesn’t get much cheaper by having the same person fly back later, something interesting must be going on. A similar thing sometimes occurs with transfers, where a flight sequence A-B-C is cheaper than just B-C. You’re not allowed to just buy that and then fly only B-C; they’ll cancel your later legs if you miss the first one.
At least partially, it’s a question of price discrimination. The most price-sensitive customers fly roundtrips, e.g. for vacations, and they can typically be quite flexible about both timing and destination. This is also part of the reason why you can sometimes get cheap flights by booking well in advance.
I’m somewhat price-sensitive and really like one-way tickets. My vacations sometimes include me just deciding one day that I’ve had enough and flying back home the same evening. It’s very liberating to not have fixed plans.
There are ways to game the system. As is almost always the case in the service industry, they’re Out to Get You, and gaming the system requires Getting Ready. I’ve sometimes spent more time researching flights than actually flying. This would be pretty irrational, except that it’s a nice game that I enjoy. Sometimes I overdo it. Good habits die hard.
Concrete tips:
Check if the return ticket is cheaper than one way. Then just don’t show up on the return flight.
Book a one-way ticket for A-B-C and then never board the B-C flight (skiplagging). Technically forbidden, but likely not a problem if you don’t do it too often.
Book a flexible roundtrip ticket and then move the return flight around as needed. Sometimes you can even change the airport you depart the return flight from. Flex tickets are often 30% or so more expensive but still way cheaper than two one-ways.
Never book from a carrier’s website without checking Google Flights or a similar aggregator first. Sometimes you save 50%.
Fly in the middle of the week, much cheaper than weekends.
Don’t spend 10 hours researching a 3 hour flight costing less than your hourly wage.
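That last tip is just an expected-value calculation. A minimal sketch, where the function name and all figures are made up for illustration:

```python
def max_worthwhile_research_hours(expected_savings, hourly_value):
    """Stop researching once the time spent costs more than the savings
    you realistically expect to find."""
    return expected_savings / hourly_value

# hypothetical: you expect to shave ~60 EUR off the fare
# and value your time at 30 EUR/h
cap = max_worthwhile_research_hours(60, 30)  # 2.0 hours, not 10
```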
Someone wrote a “contra” post for my post! I’m a real rationalist blogger now! At least until I start thinking I need to achieve some higher goal like writing something actually good. But I sure will ride this high for the next week or so.
In other news, I attended, and perhaps slightly organized, a small 1-day LWCW-inspired unconference in Espoo, Finland. I was, as usual, facilitating circling and hotseat. Other interesting stuff occurred too. The experience for me was quasi-transcendental, personality-wise. Or perhaps this simply continues the fake enlightenment arc I’ve been having over the past week or two. In any case, this is the stuff I crave.
On an unrelated note, optimization is the process of extracting fun from something. Or perhaps fun is the process of optimizing it out of the world. “All models are wrong, some are useful”: this one is hopefully useless, and thus a great source of fun until it becomes useful.
There’s an interesting dual asymmetry in cybersecurity: the defender needs to make only a single mistake to lose, and attackers can observe many targets, waiting for such mistakes. Then again, if the defender makes no mistakes, there’s literally nothing an attacker can do.
Of course, the above is not strictly true: a defence-in-depth approach can sometimes make a particular mistake inconsequential. This, in turn, can lead defenders to ignore such mistakes while they’re not exploitable.
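A toy probability model makes the defence-in-depth point concrete. It assumes the layers fail independently, which real layers rarely do, so treat this as an illustration rather than an estimate:

```python
import math

def breach_probability(layer_failure_probs):
    """Toy model: the attack succeeds only if every
    (assumed independent) defensive layer fails."""
    return math.prod(layer_failure_probs)

single = breach_probability([0.1])             # one 10% hole: 10% breach chance
stacked = breach_probability([0.1, 0.1, 0.1])  # three stacked layers: ~0.1%
```

Under this model a single mistake in one layer stops being fatal, which is exactly why it also becomes tempting to leave it unfixed.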
Modern software supply chains are long and wide. A typical piece of software might depend on thousands of libraries, and nobody can realistically audit them all. And there’s hardware, too; processor-level vulnerabilities in particular are not realistically avoidable.
The cost of exploiting vulns is going down quickly. The cost of finding and fixing them is falling quickly too. It’s going to be really interesting to see what the new equilibrium is going to be like.
Ellison is the CTO of Oracle, one of the three companies running the Stargate Project. Even if aligning AI systems to some values can be solved, selecting those values badly can still be approximately as bad as the AI just killing everyone. Moral philosophy continues to be an open problem.