zeshen

Karma: 367

Feedback welcomed: www.admonymous.co/zeshen

zeshen 1 Dec 2022 2:24 UTC
18 points
5
on: Be less scared of overconfidence
This is a nice post that echoes many points in Eliezer’s book Inadequate Equilibria. In short, it is entirely possible to outperform ‘experts’ or ‘the market’ if there are reasons to believe that these systems converge to a sub-optimal equilibrium, and even more so when you have more information than the ‘experts’, as in your Wave vs Theorem example.

zeshen 26 Apr 2024 9:00 UTC
9 points
4
on: LLMs seem (relatively) safe
Thanks for this post. This is generally how I feel as well, but my (exaggerated) model of the AI alignment community would immediately attack me by saying “if you don’t find AI scary, you either don’t understand the arguments on AI safety or you don’t know how advanced AI has gotten”. In my opinion, a few years ago we were concerned about recursively self-improving AIs, and that seemed genuinely plausible and scary. But somehow, they didn’t really happen (or haven’t happened yet) despite people trying all sorts of ways to make them happen. And instead of an intelligence explosion, what we got was an extremely predictable improvement trend which was a function of only two things, i.e. data and compute. This made me qualitatively update my p(doom) downwards, and I was genuinely surprised that many people went the other way instead, updating upwards as LLMs got better.

zeshen 23 Aug 2022 20:18 UTC
8 points
0
on: Toni Kurz and the Insanity of Climbing Mountains
“Because it’s there”—George Mallory in 1923, when asked why he wanted to climb Everest. He died in his summit attempt the following year.

zeshen 1 May 2023 16:16 UTC
7 points
3
on: Support me in a Week-Long Picketing Campaign Near OpenAI’s HQ: Seeking Support and Ideas from the LessWrong Community
This seems to be another one of those instances where I wish there was a dual-voting system for posts. I would’ve liked to strong-disagree with the contents of the post without discouraging well-intentioned people from posting here.

zeshen 30 Apr 2023 17:00 UTC
6 points
3
on: No, *You* Need to Write Clearer
I feel like a substantial share of the disagreements between alignment researchers are not object-level but semantic, and I remember seeing instances where person X writes a post about how they disagree with a point that person Y made, with person Y responding that that wasn’t even the point at all. In many cases, it appears that simply saying what you don’t mean could have prevented a lot of unnecessary misunderstandings.

zeshen 23 Dec 2023 10:37 UTC
4 points
1
on: Funding case: AI Safety Camp
Strong upvoted. I was a participant of AISC8 in the team that went on to launch AI Standards Lab, which I think counterfactually would not have been launched if not for AISC.

zeshen 12 Feb 2023 8:31 UTC
4 points
0
on: Rationality-related things I don’t know as of 2023
Thanks for this post. I’ve always had the impression that everyone around LW has been familiar with these concepts since they were kids and now knows them by heart, while I’ve been struggling with some of these concepts for the longest time. It’s comforting to me that there are long-time LWers who don’t necessarily fully understand all of this stuff either.

zeshen 19 Nov 2022 10:27 UTC
4 points
3
on: Reflective Consequentialism
This post makes sense to me, though it feels almost trivial. I’m puzzled by the backlash against consequentialism; it just feels like people are overreacting. Or maybe the ‘backlash’ isn’t actually as strong as I’m reading it to be.
I’d think of virtue ethics as some sort of equilibrium that society has landed itself in after all these years of being a species capable of thinking about ethics. It’s not the best, but you’d need more than naive utilitarianism to beat it (this EA forum post feels like common sense to me too), and that something more is what you describe as reflective consequentialism. It seems like it all boils down to: be a consequentialist, as long as you 1) account for second-order and higher effects, and 2) account for bad calculation due to corrupted hardware.

zeshen 2 Sep 2022 10:16 UTC
4 points
1
on: Levelling Up in AI Safety Research Engineering
This is a great guide. Thank you. However, in my experience as someone completely new to the field, 100-200 hours on each level is very optimistic. I’ve easily spent double or triple that duration on the first two levels and still have not reached a comfortable level.

zeshen 3 May 2024 3:43 UTC
3 points
0
on: Why is AGI/ASI Inevitable?
Can’t people decide simply not to build AGI/ASI?
Yeah, many people, like the majority of users on this forum, have decided to not build AGI. On the other hand, other people have decided to build AGI and are working hard towards it.

Side note: LessWrong has a feature for posting as a Question; you might want to use it for questions in the future.

zeshen 19 Mar 2024 8:56 UTC
3 points
0
on: Using axis lines for good or evil
My first impression was also that axis lines are a matter of aesthetics. But then I browsed The Economist’s visual style guide and realized they also do something similar, i.e. omit the y-axis line (in fact, they omit the y-axis line on basically all their line and scatter plots, but almost always keep the gridlines).
Here’s also an article they ran about their errors in data visualization, albeit probably fairly introductory for the median LW reader.

zeshen 8 Nov 2022 11:28 UTC
3 points
0
in reply to: Jonathan Yan’s comment on: Has anyone increased their AGI timelines?
I’d love to see a post with your reasoning.

zeshen 2 Nov 2022 11:08 UTC
LW: 3 AF: 3
AF
on: [AN #112]: Engineering a Safer World
I got the book (thanks to Conjecture) after doing the Intro to ML Safety Course where the book was recommended. I then browsed through the book and thought of writing a review of it—and I found this post instead, which is a much better review than I would have written, so thanks a lot for this!

Let me just put down a few thoughts that might be relevant for someone else considering picking up this book.
Target audience: Right at the beginning of the book, the author says “This book is written for the sophisticated practitioner rather than the academic researcher or the general public.” I think this is relevant, as the book goes to a level of detail way beyond what’s needed to get a good overview of engineering safety.

Relevance to AI safety: I feel like most engineering safety concepts are not applicable to alignment, firstly because an AGI would likely not have any human involvement in its optimization process, and secondly because the basic underlying STAMP constructs of safety constraints, hierarchical safety control structures, and process models are simply more applicable to engineering systems. As stated on p. 100, “STAMP focuses particular attention on the role of constraints in safety management,” and I highly doubt an AGI can be bounded by constraints. Nevertheless, Chapter 8, “STPA: A New Hazard Analysis Technique”, which describes STPA (System-Theoretic Process Analysis), may be somewhat relevant to designing safety interlocks. Also, the final chapter (13), on Managing Safety and the Safety Culture, is broadly applicable to any field that involves safety.
Criticisms of conventional techniques: The book often mentions that techniques like STAMP and STPA are superior to conventional techniques like HAZOP, and gives quotes from reviewers that attest to their superiority. I don’t know if those criticisms are really fair, given that the new techniques are not really adopted, at least in the oil and gas industry, which, for all its flaws, takes safety very seriously. Perhaps the criticisms could be fair for very outdated safety practices. Nevertheless, the general concepts of engineering safety feel quite similar whether one uses the ‘conventional’ techniques or the ‘new’ techniques described in the book.
Overall, I think this book provides a good overview of engineering safety concepts, but for the general audience (or alignment researchers) it goes into too much detail on specific case studies and arguments.

zeshen 27 Oct 2022 12:47 UTC
3 points
0
on: Epistemic modesty and how I think about AI risk
This is actually what my PhD research is largely about: Are these risks actually likely to materialize? Can we quantify how likely, at least in some loose way? Can we quantify our uncertainty about those likelihoods in some useful way? And how do we make the best decisions we can if we are so uncertain about things?
I’d be really interested in your findings.

zeshen 24 Oct 2022 9:48 UTC
3 points
0
in reply to: mic’s comment on: aisafety.community—A living document of AI safety communities
If there’s an existing database of university groups already, it would be great to include a link to that database, perhaps under “Local EA Group”. Thanks!

zeshen 18 Oct 2022 14:53 UTC
3 points
on: ‘Utility Indifference’ (2010) by FHI researcher Stuart Armstrong
The link in the post no longer works. Here’s one that works.

zeshen 15 Aug 2022 15:59 UTC
3 points
1
on: Comparing Four Approaches to Inner Alignment
This post has helped me clear up some confusions that I had about inner misalignment for the longest time. Thank you.

zeshen 23 Jul 2022 7:51 UTC
3 points
on: Clarifying Consequentialists in the Solomonoff Prior
The Solomonoff, or Universal, prior is a probability distribution over strings of a certain alphabet (usually over all strings of 1s and 0s). It is defined by taking the set of all Turing machines (TMs) which output strings, assigning to each a weight proportional to
The image right after the paragraph above fails to display even after multiple refreshes. The same happens on the AF post. I tried different browsers, but it didn’t work.

zeshen 28 Feb 2021 14:46 UTC
3 points
on: Consequentialism FAQ
Can this be an article on LW please? This link isn’t very pretty and the raikoth link doesn’t work. Thanks!

zeshen 4 Apr 2023 5:04 UTC
2 points
1
on: “Carefully Bootstrapped Alignment” is organizationally hard
Everyone in any position of power (which includes engineers who are doing a lot of intellectual heavy-lifting, who could take insights with them to another company), thinks of it as one of their primary jobs to be ready to stop
In some industries, Stop Work Authorities are implemented, where any employee at any level in the organisation has the power to stop any work deemed unsafe at any time. I wonder if something similar in spirit would be feasible to implement in top AI labs.