Feedback welcomed: www.admonymous.co/zeshen
zeshen
aisafety.community—A living document of AI safety communities
I missed the crux of the alignment problem the whole time
My Thoughts on the ML Safety Course
A newcomer’s guide to the technical AI safety field
What if we approach AI safety like a technical engineering safety problem
Levels of goals and alignment
Embedding safety in ML development
“Because it’s there”—George Mallory in 1923, when asked why he wanted to climb Everest. He died in his summit attempt the following year.
Summary of ML Safety Course
This seems to be another one of those instances where I wish there was a dual-voting system for posts. I would’ve liked to strong disagree with the contents of the post without discouraging well-intentioned people from posting here.
I feel like a substantial amount of disagreement between alignment researchers is not object-level but semantic, and I remember seeing instances where person X writes a post about how he/she disagrees with a point that person Y made, with person Y responding that that wasn’t even the point at all. In many cases, it appears that simply saying what you don’t mean could have prevented a lot of unnecessary misunderstandings.
[Question] Is there a way to sort LW search results by date posted?
Strong upvoted. I was a participant in AISC8, on the team that went on to launch AI Standards Lab, which I think counterfactually would not have been launched if not for AISC.
Thanks for this post. I’ve always had the impression that everyone around LW has been familiar with these concepts since they were kids and now knows them by heart, while I’ve been struggling with some of these concepts for the longest time. It’s comforting to me that there are long-time LWers who don’t necessarily fully understand all of this stuff either.
This post makes sense to me, though it feels almost trivial. I’m puzzled by the backlash against consequentialism; it just feels like people are overreacting. Or maybe the ‘backlash’ isn’t actually as strong as I’m reading it to be.
I’d think of virtue ethics as some sort of equilibrium that society has landed itself in after all these years of being a species capable of thinking about ethics. It’s not the best, but to beat it you’d need more than naive utilitarianism, namely what you describe as reflective consequentialism (this EA forum post feels like common sense to me too). It seems like it all boils down to: be a consequentialist, as long as you 1) account for second-order and higher effects, and 2) account for bad calculation due to corrupted hardware.
This is a great guide, thank you. However, in my experience as someone completely new to the field, 100-200 hours on each level is very optimistic. I’ve easily spent double or triple that time on the first two levels and still have not reached a comfortable level.
My first impression was also that axis lines are a matter of aesthetics. But then I browsed The Economist’s visual styleguide and realized they also do something similar, i.e. omit the y-axis line (in fact, they omit the y-axis line on basically all their line / scatter plots, but almost always maintain the gridlines).
Here’s also an article they ran about their errors in data visualization, albeit probably fairly introductory for the median LW reader.
I’d love to see a post with your reasonings.
I got the book (thanks to Conjecture) after doing the Intro to ML Safety Course where the book was recommended. I then browsed through the book and thought of writing a review of it—and I found this post instead, which is a much better review than I would have written, so thanks a lot for this!
Let me just put down a few thoughts that might be relevant for someone else considering picking up this book.
Target audience: Right at the beginning of the book, the author says “This book is written for the sophisticated practitioner rather than the academic researcher or the general public.” I think this is relevant, as the book goes to a level of detail way beyond what’s needed to get a good overview of engineering safety.
Relevance to AI safety: I feel like most engineering safety concepts are not applicable to alignment, firstly because an AGI would likely not have any human involvement in its optimization process, and secondly because the basic underlying STAMP constructs of safety constraints, hierarchical safety control structures, and process models are simply more applicable to engineering systems. As stated on p. 100, “STAMP focuses particular attention on the role of constraints in safety management,” and I highly doubt an AGI can be bounded by constraints. Nevertheless, Chapter 8 (STPA: A New Hazard Analysis Technique), which describes STPA (System Theoretic Process Analysis), may be somewhat relevant to designing safety interlocks. Also, the final chapter (13) on Managing Safety and the Safety Culture is broadly applicable to any field that involves safety.
Criticisms on conventional techniques: The book often mentions that techniques like STAMP and STPA are superior to conventional techniques like HAZOP, and gives quotes by reviewers that attest to their superiority. I don’t know if those criticisms are really fair, given that the new techniques are not really adopted, at least in the oil and gas industry, which, for all its flaws, takes safety very seriously. Perhaps the criticisms could be fair for very outdated safety practices. Nevertheless, the general concepts of engineering safety feel quite similar whether one uses ‘conventional’ techniques or the ‘new’ techniques described in the book.
Overall, I think this book provides a good overview of engineering safety concepts, but for a general audience (or alignment researchers) it goes into too much detail on specific case studies and arguments.
This is a nice post that echoes many points in Eliezer’s book Inadequate Equilibria. In short, it is entirely possible for you to outperform ‘experts’ or ‘the market’ if there are reasons to believe that these systems converge to a sub-optimal equilibrium, and even more so when you have more information than the ‘experts’, like in your Wave vs Theorem example.