Completed an undergrad in CS and Math at Columbia, where I helped run Columbia Effective Altruism and Columbia AI Alignment Club (CAIAC). I’m pursuing a career in technical AI alignment research (probably).
RohanS
Here’s our current best guess at how the type signature of subproblems differs from e.g. an outermost objective. You know how, when you say your goal is to “buy some yoghurt”, there are a bunch of implicit additional objectives like “don’t spend all your savings”, “don’t turn Japan into computronium”, “don’t die”, etc? Those implicit objectives are about respecting modularity; they’re a defining part of a “gap in a partial plan”. An “outermost objective” doesn’t have those implicit extra constraints, and is therefore of a fundamentally different type from subproblems.
Most of the things you think of day-to-day as “problems” are, cognitively, subproblems.
Do you have a starting point for formalizing this? It sounds like subproblems are roughly proxies that could be Goodharted if (common sense) background goals aren’t respected. Maybe a candidate starting point for formalizing subproblems, relative to an outermost objective, is “utility functions that closely match the outermost objective in a narrow domain”?
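To make the “narrow-domain proxy” idea concrete, here’s a minimal toy sketch (my own illustration with made-up utility functions, not anything from the thread): a proxy utility that closely tracks the true utility on a narrow domain, but gets Goodharted as soon as the optimizer leaves that domain.

```python
# Toy illustration (hypothetical functions): a proxy that matches the
# true utility on a narrow domain but diverges badly outside it.

def true_utility(x):
    # True objective: increases at first, then declines (peaks at x = 5).
    return x - x * x / 10

def proxy_utility(x):
    # Proxy: agrees closely with true_utility for small x, but just says
    # "more is better" forever.
    return x

# On the narrow domain [0, 1], the proxy tracks the true utility closely.
assert all(
    abs(true_utility(x / 10) - proxy_utility(x / 10)) < 0.11
    for x in range(11)
)

# But hard optimization of the proxy leaves the narrow domain:
# at x = 100 the proxy looks great while true utility has collapsed.
x_star = 100.0
assert proxy_utility(x_star) == 100.0
assert true_utility(x_star) == -900.0
```

The point of the sketch is just that “closely matches the outermost objective in a narrow domain” is exactly the condition under which unconstrained optimization of the proxy can fail, which is why the implicit background constraints seem to be doing real work.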
Notes on “How do we become confident in the safety of a machine learning system?”
Quick Thoughts on Language Models
Lots of interesting thoughts, thanks for sharing!
You seem to have an unconventional view about death informed by your metaphysics (suggested by your responses to 56, 89, and 96), but I don’t fully see what it is. Can you elaborate?
The basic idea of 85 is that we generally agree there have been moral catastrophes in the past, such as widespread slavery. Are there ongoing moral catastrophes? I think factory farming is a pretty obvious one. There’s a philosophy paper called “The Possibility of an Ongoing Moral Catastrophe” that gives more context.
~100 Interesting Questions
A Thorough Introduction to Abstraction
Content and Takeaways from SERI MATS Training Program with John Wentworth
How is there more than one solution manifold? If a solution manifold is a behavior manifold that corresponds to a global minimum of the train loss, and we’re looking at an overparameterized regime, then isn’t there only one solution manifold, the one corresponding to zero train loss?
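For concreteness, here’s the toy picture I have in mind (a made-up two-parameter example, not from the post): with more parameters than data points, the zero-train-loss solutions form a single connected set in parameter space.

```python
# Toy overparameterized model: 2 parameters, 1 data point.
# Every (w1, w2) with w1*x1 + w2*x2 == y achieves exactly zero train
# loss, so the zero-loss solutions form one connected 1-D manifold
# (a line in parameter space).
x1, x2, y = 1.0, 2.0, 3.0

def train_loss(w1, w2):
    # Squared error of the linear model on the single data point.
    return (w1 * x1 + w2 * x2 - y) ** 2

# Parameterize the solution line: pick w2 freely, solve for w1.
for w2 in [0.0, 0.5, 1.0, -2.0]:
    w1 = (y - w2 * x2) / x1
    assert train_loss(w1, w2) < 1e-12
```

In this picture there is exactly one solution manifold, which is what makes me unsure how multiple solution manifolds arise in the setting you describe.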
Could you please point out the work you have in mind here?