Focused on alignment field-building and AI alignment, especially interested in agent foundations
Jonas Hallgren
[Question] Does agent foundations cover all future ML systems?
It feels kind of weird that this post only has 50 upvotes and is hidden in the layers of LessWrong like some skeleton in the closet waiting to strike at an opportune moment. A lot of big names commented on this post, and even though it's not entirely true and misrepresents what happened to an extent, it would make sense to promote this type of post anyway. It sets a bad example if we don't, since it suggests we don't encourage criticism, which seems very anti-rational. Maybe a summary article of this incident could be written and put on the main website? It doesn't make sense to me that a post with a whopping 900 comments should be this hidden, and it sure doesn't look good from an outside perspective.
Maybe this isn't the most productive comment, but I just wanted to say that this was a really good post. It's right up my alley, combining video games and academics at the same time, and I would therefore like to declare it a certified hood classic. (Apparently Grammarly thinks this comment is formal, which is pretty funny.)
Wow, this changed my life! Never thought I would find something this mind-blowingly overpowered on LessWrong!
[Question] Is it worth making a database for moral predictions?
But the problem runs deeper than that. If we draw an arrow in the direction of the deterministic function, we will be drawing an arrow of time from the more refined version of the structure to the coarser version of that structure, which is in the opposite direction of all of our examples.
As I currently understand this after thinking about it for a bit, we are talking about the coarseness of the model from the perspective of the timeframe the model is in, not the timeframe we are in. It would make sense for our predictions of the model to become coarser with each step forward in time if we are predicting it from a fixed point into the future. I don't know if this makes sense, but I would be grateful for a clarification!
Good question. This applies at the scale of systems: for example, a democratic system is going to be inherently more reversible than a non-democratic one. An action that goes against the reversibility of a system could, for example, be the removal of freedom of speech, as it would narrow down the potential pathways of future civilizations. Reversibility has an inherent opportunity cost, as it asks us to take into consideration the possibility that other moral theories are correct. This is like Pascal's mugging, but with the stakes that if we have the wrong moral theory, we lose a lot. It also means that a purely utilitarian lens might find reversibility less attractive: some actions that look good from a utilitarian standpoint, such as turning everything into hedonium, are bad from a reversibility standpoint, since we can't change anything from there.
I will post my favourite poem to describe how I feel:
Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.
Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.
Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.
Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do not go gentle into that good night.
Grave men, near death, who see with blinding sight
Blind eyes could blaze like meteors and be gay,
Rage, rage against the dying of the light.
And you, my father, there on the sad height,
Curse, bless, me now with your fierce tears, I pray.
Do not go gentle into that good night.
Rage, rage against the dying of the light.
I will not go gentle into that cold night.