Yeah, more or less. In the abstract, I “suppose that unlimited computation and a complete low-level causal model of the world and the adult human brains in it are available.” I’ve tended to imagine this as an oracle that just has a causal model of the actual world and the brains in it. But whole brain emulations would likely also suffice.
In the code, the causal models of the world and brains in it would be passed as parameters to the metaethical_ai_u function in main. The world w and each element of the set bs would be an instance of the causal_markov_model class.
Each brain gets associated with an instance of the decision_algorithm class by calling the class function implemented_by. A decision algorithm models the brain in higher-level concepts like credences and preferences, as opposed to bare causal states. And yeah, in determining both the decision algorithm implemented by a brain and its rational values, we look at its responses to all possible inputs.
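To make the shape of this concrete, here is a minimal Python sketch of the wiring described above. It is not my actual code: the Python class names, the `transition` field, the `start` state, and the helper `pair_brains_with_algorithms` are all assumptions made up for illustration, and a "decision algorithm" here is just a table of responses to a tiny finite input set rather than a model of credences and preferences.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class CausalMarkovModel:
    """Toy stand-in for causal_markov_model: a named low-level transition rule."""
    name: str
    transition: Callable[[str, str], str]  # (state, input) -> next state


@dataclass(frozen=True)
class DecisionAlgorithm:
    """Toy stand-in for decision_algorithm: here just a table of responses."""
    responses: Tuple[Tuple[str, str], ...]

    @classmethod
    def implemented_by(cls, brain, inputs, start="rest"):
        # Probe the brain with every input in the (finite, toy) input set.
        return cls(tuple((i, brain.transition(start, i)) for i in sorted(inputs)))


def pair_brains_with_algorithms(w, bs, inputs):
    """Hypothetical first step of metaethical_ai_u: one algorithm per brain in bs."""
    # w is accepted only to mirror the described signature; this sketch ignores it.
    return {b.name: DecisionAlgorithm.implemented_by(b, inputs) for b in bs}


world = CausalMarkovModel("world", lambda state, inp: state)
echo_brain = CausalMarkovModel("echo", lambda state, inp: inp.upper())
algorithms = pair_brains_with_algorithms(world, {echo_brain}, {"b", "a"})
```

The sketch stops at pairing each brain with an algorithm; it makes no attempt at the later steps of working out rational values.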
For implementation, we aim for isomorphic, coherent, instrumentally rational and parsimonious explanations. For rational values, we aggregate the values of possible continuations weighting more heavily those that better satisfied the agent’s own higher-order decision criteria without introducing too much unrelated distortion of values.
If you or anyone else could point to a specific function in my code that we don’t know how to compute, I’d be very interested to hear that. The only place that I know of that is uncomputable is in calculating Kolmogorov complexity, but that could be replaced by some finite approximation. The rest should be computable, though its complexity may be super-duper exponentially exponential.
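As one illustration of what a finite approximation could look like, a common stand-in for Kolmogorov complexity is the length of a description after running it through an off-the-shelf compressor. This is a swapped-in technique, not what my code does, and the function name `approx_complexity` is hypothetical. The compressed length only gives a compressor-dependent upper-bound proxy, so it can badly overestimate the complexity of strings whose regularities the compressor fails to find.

```python
import hashlib
import zlib


def approx_complexity(description: bytes) -> int:
    """Computable proxy for Kolmogorov complexity: zlib-compressed length in bytes."""
    return len(zlib.compress(description, 9))


# A highly regular 1000-byte string versus 1024 bytes with no obvious pattern.
regular = b"ab" * 500
irregular = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(32))

regular_score = approx_complexity(regular)
irregular_score = approx_complexity(irregular)
```

Under this proxy the regular string scores far lower than the irregular one, which is the ordering a parsimony criterion needs even if the absolute numbers mean little.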
In the early stages, I would often find, as you expect, components that I thought would be fairly straightforward to define technically, only to realize upon digging in that they were not so clear and required more philosophical progress. Over time, these shrank to something more like technical details than philosophical gaps, until I didn’t find even technical gaps.
Then I started writing automated tests and uncovered more bugs, though for the most part these were pretty minor, where I think a sympathetic programmer could probably work out what was meant to be done. I think around 42% of the procedures defined now have an automated test. Admittedly, these are generally the easier functions and simpler test cases. It turns out that writing code intended for an infinitely powerful computer doesn’t exactly lend itself to being tested on current machines. (Having a proper testing framework, however, with the ability to stub and mock objects might help considerably.)
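For what stubbing might buy here, consider a procedure that depends on some intractable scoring step. In the Python sketch below, `best_explanation` and its `complexity` parameter are hypothetical names invented for the example, not functions from my code. The test swaps the intractable step for a `unittest.mock.Mock` returning canned values, so the selection logic can be checked on an ordinary machine.

```python
from unittest.mock import Mock


def best_explanation(brain, candidates, complexity):
    """Pick the candidate explanation with the lowest complexity score."""
    return min(candidates, key=lambda c: complexity(brain, c))


# Stub out the intractable scoring step with canned values.
canned_scores = {"simple": 1, "baroque": 9}
complexity_stub = Mock(side_effect=lambda brain, c: canned_scores[c])

chosen = best_explanation("brain-0", ["baroque", "simple"], complexity_stub)

# The selection logic is exercised without ever computing a real score.
assert chosen == "simple"
assert complexity_stub.call_count == 2
```

This only tests the code around the expensive call, of course; it says nothing about whether the real scoring step is right.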
There are likely still many bugs in the untested parts, but I would expect them to be fairly minor. Still, I’m only one person, so I’d love to have more eyes on it. I also like the schema idea and have often thought of my work as a scaffold. Even if you disagree with one component, you might be able to just slot in a different philosophical theory. Perhaps you could even replace every component but still retain something of the flavor of my theory! I just hope it’s more like replacing Newtonian mechanics than phlogiston.
I agree that there can be a skill involved in observation but isn’t there also a cost in attention and energy? In that case, it probably isn’t wise to try to observe anything and everything. Perhaps there are some principles for noticing when observation is likely to be worthwhile.
I also worry about generalizing too much from the example of fiction, which is often crafted to try to make nothing arbitrary. That property seems far less likely to apply to reality.
If you mean an AGI that optimizes for human values exactly as they currently are will be unaligned, you may have a point. But I think many of us are hoping to get it to optimize for an idealized version of human values.
Both eliminative materialism and reductionism can acknowledge that consciousness is not necessary for explanation and seek a physical explanation. But while eliminativists conclude that there is no such thing as consciousness, reductionists say we simply would have discovered that consciousness is different from what we might have initially thought and is a physical phenomenon. Is there a reason you favor the former?
One might think eliminativism is metaphysically simpler, but reductionism doesn’t really posit more stuff; it is more like just allowing synonyms for various combinations of the same stuff.
Reductionism seems much more charitable. If you can interpret someone either as talking falsely nearly all the time or as often speaking truth, even if some of what they said would need to be revised, I think you’d need a compelling reason to attribute the false claims.
Reductionism also seems necessary to make sense of our values, which often make essential reference to consciousness. How would an eliminativist make sense of suffering being bad if there’s no such thing as conscious suffering? Strictly speaking, a classical hedonic utilitarian who is an eliminative materialist seems committed to the view that nothing really matters and everything is permitted.