zeshen 1 Dec 2022 2:24 UTC
18 points
5
on: Be less scared of overconfidence
This is a nice post that echoes many points in Eliezer’s book Inadequate Equilibria. In short, it is entirely possible to outperform ‘experts’ or ‘the market’ if there are reasons to believe that these systems converge to a sub-optimal equilibrium, and even more so when you have more information than the ‘experts’, as in your Wave vs Theorem example.

zeshen 23 Aug 2022 20:18 UTC
8 points
0
on: Toni Kurz and the Insanity of Climbing Mountains
“Because it’s there”—George Mallory in 1923, when asked why he wanted to climb Everest. He died in his summit attempt the following year.

Summary of ML Safety Course

zeshen 27 Sep 2022 13:05 UTC

7 points

0 comments · 6 min read · LW link

zeshen 1 May 2023 16:16 UTC
7 points
3
on: Support me in a Week-Long Picketing Campaign Near OpenAI’s HQ: Seeking Support and Ideas from the LessWrong Community
This seems to be another one of those instances where I wish there was a dual-voting system for posts. I would’ve liked to strongly disagree with the contents of the post without discouraging well-intentioned people from posting here.

zeshen 30 Apr 2023 17:00 UTC
6 points
3
on: No, *You* Need to Write Clearer
I feel like a substantial amount of the disagreement between alignment researchers is not object-level but semantic. I remember seeing instances where person X writes a post about how he/she disagrees with a point that person Y made, and person Y responds that that wasn’t even the point at all. In many cases, it appears that simply saying what you don’t mean could have prevented a lot of unnecessary misunderstandings.

[Question] Is there a way to sort LW search results by date posted?

zeshen 12 Mar 2023 4:56 UTC

5 points

1 comment · 1 min read · LW link

zeshen 23 Dec 2023 10:37 UTC
4 points
1
on: Funding case: AI Safety Camp
Strong upvoted. I was a participant in AISC8, on the team that went on to launch AI Standards Lab, which I think counterfactually would not have been launched if not for AISC.

zeshen 12 Feb 2023 8:31 UTC
4 points
0
on: Rationality-related things I don’t know as of 2023
Thanks for this post. I’ve always had the impression that everyone around LW has been familiar with these concepts since they were kids and now knows them by heart, while I’ve been struggling with some of them for the longest time. It’s comforting to me that there are long-time LWers who don’t necessarily fully understand all of this stuff either.

zeshen 19 Nov 2022 10:27 UTC
4 points
3
on: Reflective Consequentialism
This post makes sense to me, though it feels almost trivial. I’m puzzled by the backlash against consequentialism; it just feels like people are overreacting. Or maybe the ‘backlash’ isn’t actually as strong as I’m reading it to be.
I’d think of virtue ethics as some sort of equilibrium that society has landed itself in after all these years of being a species capable of thinking about ethics. It’s not the best, but you’d need more than naive utilitarianism to beat it (this EA forum post feels like common sense to me too), and that something more is what you describe as reflective consequentialism. It seems like it all boils down to: be a consequentialist, as long as you 1) account for second-order and higher effects, and 2) account for bad calculation due to corrupted hardware.

zeshen 2 Sep 2022 10:16 UTC
4 points
1
on: Levelling Up in AI Safety Research Engineering
This is a great guide, thank you. However, in my experience as someone completely new to the field, 100-200 hours on each level is very optimistic. I’ve easily spent double or triple that on the first two levels and still have not reached a comfortable level.

zeshen 19 Mar 2024 8:56 UTC
3 points
0
on: Using axis lines for good or evil
My first impression was also that axis lines are a matter of aesthetics. But then I browsed The Economist’s visual style guide and realized they also do something similar, i.e. omit the y-axis line (in fact, they omit the y-axis line on basically all their line and scatter plots, but almost always keep the gridlines).
They also ran an article about their own errors in data visualization, though it is probably fairly introductory for the median LW reader.

zeshen 8 Nov 2022 11:28 UTC
3 points
0
in reply to: Jonathan Yan’s comment on: Has anyone increased their AGI timelines?
I’d love to see a post with your reasoning.

zeshen 2 Nov 2022 11:08 UTC
LW: 3 AF: 3
AF
on: [AN #112]: Engineering a Safer World
I got the book (thanks to Conjecture) after doing the Intro to ML Safety Course where the book was recommended. I then browsed through the book and thought of writing a review of it—and I found this post instead, which is a much better review than I would have written, so thanks a lot for this!

Let me just put down a few thoughts that might be relevant for someone else considering picking up this book.
Target audience: Right at the beginning of the book, the author says “This book is written for the sophisticated practitioner rather than the academic researcher or the general public.” I think this is relevant, as the book goes to a level of detail way beyond what’s needed to get a good overview of engineering safety.

Relevance to AI safety: I feel like most engineering safety concepts are not applicable to alignment, firstly because an AGI would likely not have any human involvement in its optimization process, and secondly because the basic underlying STAMP constructs of safety constraints, hierarchical safety control structures, and process models are simply more applicable to engineering systems. As stated on p. 100, “STAMP focuses particular attention on the role of constraints in safety management,” and I highly doubt an AGI can be bounded by constraints. Nevertheless, Chapter 8, “STPA: A New Hazard Analysis Technique”, which describes STPA (System Theoretic Process Analysis), may be somewhat relevant to designing safety interlocks. Also, the final chapter (13), on Managing Safety and the Safety Culture, is broadly applicable to any field that involves safety.
Criticisms of conventional techniques: The book often claims that techniques like STAMP and STPA are superior to conventional techniques like HAZOP, and gives quotes by reviewers that attest to their superiority. I don’t know if those criticisms are really fair, given that the new techniques have not really been adopted, at least in the oil and gas industry, which, for all its flaws, takes safety very seriously. Perhaps the criticisms are fair for very outdated safety practices. Nevertheless, the general concepts of engineering safety feel quite similar whether one uses the ‘conventional’ techniques or the ‘new’ techniques described in the book.
Overall, I think this book provides a good overview of engineering safety concepts, but for a general audience (or alignment researchers) it goes into too much detail on specific case studies and arguments.