When I prompt GPT-5, it’s already out of distribution, because the training data mostly isn’t GPT prompts, and none of it is GPT-5 prompts. If I prompt with “this is a rap battle between Dath Ilan and Earthsea”, that’s not a high-likelihood sentence in the training data. And then the response is also out of distribution, because the training data mostly isn’t GPT responses, and none of it is GPT-5 responses.
So why do we think that the responses are further out of distribution than the prompts?
Possible answer: because we select prompts with human ingenuity and trial and error, so the prompts that survive tend to work well and to sit effectively closer to the distribution, whereas the responses are not filtered in the same way.
But the responses are optimized only to be in distribution, whereas the prompts are also optimized for achieving some human objective, like generating a funny rap battle. So once the optimizer achieves some threshold of reliability, the error rate should go down as text is generated, not up.
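One crude way to put numbers on “how out of distribution” is to score how surprising a piece of text is to some public language model. That proxy is my own choice, not anything argued above: model likelihood stands in for “likelihood in the training data”, and GPT-2 stands in for a model whose training data and likelihoods we can actually inspect.

```python
# Sketch: average per-token negative log-likelihood under a public causal LM.
# GPT-2 is a stand-in (it is not GPT-5), and model likelihood is only a proxy
# for "likelihood in the training data" -- both of those are my assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def avg_nll(text: str) -> float:
    """Average negative log-likelihood per token (higher = more surprising)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return mean cross-entropy loss.
        loss = model(input_ids=ids, labels=ids).loss
    return loss.item()

for text in [
    "This is a rap battle between Dath Ilan and Earthsea.",    # the prompt
    "The weather tomorrow is expected to be sunny and warm.",  # mundane text, for scale
]:
    print(f"{avg_nll(text):5.2f}  {text!r}")
```

Running the same function over a prompt and over the response it produced gives one model-relative, side-by-side way to ask the question in this section; it doesn’t settle it.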
To make the analogy more concrete, suppose that Alice posts a 43-point thesis, MacGyver Ruin: A List Of Lethalities, similar to AGI Ruin, which argues that MacGyver is planning to sink our ship and that this is likely to lead to the ship sinking. In point 2 of 43, Alice claims that:
Then, Bob comes along and posts a 24-minute reply, concluding with:
I suppose this updates my probability of the boilers exploding downwards, just as I would update a little upwards if Bob had been similarly cagey in the opposite direction.
It doesn’t measurably update my probability of the ship sinking, because the boilers exploding isn’t a load-bearing part of the argument, just one concrete example of how the ship might go down; if the boilers are fine, a MacGyver who wants the ship sunk has plenty of other routes. This is a common phenomenon in probability when there are agents in play: blocking one path mostly redirects the optimization somewhere else.
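A toy sketch of that phenomenon, with numbers, a route count, and an independence assumption that are entirely mine rather than anything in Alice’s or Bob’s posts: when a claim is a disjunction over many roughly interchangeable routes, revising one route barely moves the total.

```python
# Toy numbers of my own, assuming the routes to sinking are independent;
# nothing here comes from Alice's or Bob's posts.
def p_sink(route_probs):
    """P(at least one route succeeds) = 1 - P(every route fails)."""
    p_all_fail = 1.0
    for p in route_probs:
        p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail

before = [0.3] * 10         # ten candidate routes, the boilers among them
after = [0.1] + [0.3] * 9   # Bob's reply lowers the boiler route to 0.1

print(f"{p_sink(before):.3f}")  # ~0.972
print(f"{p_sink(after):.3f}")   # ~0.964 -- the headline probability barely moves
```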