If I understand correctly, your issue is with the Markov Property of MDPs? It simplifies the computation of the policy by not requiring knowledge of the path by which the agent arrived at a given state; but it also discards any information about the history that is not written down into the state itself. Not sure if you know it or if it is that useful, but this section of “Reinforcement Learning: An Introduction” discusses ways to go beyond MDPs and the Markov property.
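To make the property concrete, here is a minimal toy sketch (the states and transition table are my own invented example, not from the book): the next state is sampled from the current state alone, so the history is simply never an input.

```python
import random

# Toy Markov chain: each state maps to (next_state, probability) pairs.
# The history is not an argument anywhere below -- that is the Markov property.
transitions = {
    "start": [("left", 0.5), ("right", 0.5)],
    "left":  [("goal", 1.0)],
    "right": [("goal", 1.0)],
    "goal":  [("goal", 1.0)],
}

def step(state):
    """Sample the next state from the current state alone."""
    next_states, weights = zip(*transitions[state])
    return random.choices(next_states, weights=weights)[0]

# Two different histories ending in "left" give identical futures,
# because `step` never sees how we arrived there.
print(step("left"))  # always "goal"
```

Any information about the path that you want the agent to use has to be baked into the state itself, which is exactly the trade-off described above.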
That’s a great idea! Would some people be interested in a more structured version of this, something like a writing group where everyone shares their writing and the others comment on it?
Either way, I’m interested in getting feedback on something I’m currently writing, which I will probably finish drafting by the end of this week. I’m interested in feedback on both content and readability.
I’m also happy to comment on structure, arguments, and readability for others.
I really like the idea that preferences are observed after the fact, because I feel like there is some truth to it for human beings. We act, and then become self-aware of our reactions and thoughts, which leads us to formulate some values. Even when we then act contrary to those values, we feel shitty, at least inside.
But that doesn’t address the question of where these judgements and initial reactions come from, nor how this self-awareness influences subsequent actions.
Still, this makes me want to read the rest of your research!
I am definitely interested in participating, as I’m learning the field and starting to work on research. For the moment I don’t feel like I can run one of these myself, but I’ll get there eventually, and will volunteer then.
Thanks a lot for the recommendation! I’ll look into it.
Hmm, good idea. At least it can’t get worse. ^^
True. Do you think I should still list and quickly explain, someplace, the stories that are “useless” for this point?
I saw there is a Coronavirus tag now. Is there some way to use this tag to hide any post related to the topic? I only managed to reach the page listing only those posts, and I think pretty much all the value of such a tag lies in filtering them out. I mean, if I want to see many posts with coronavirus news or advice, I can just look at the front page; I don’t need a tag.
Great post! This makes me think of the problem of specification in formal methods: what you managed to formalize is not necessarily what you wanted to formalize. This is why certified software is only as good as the specification that was used for this certification. And that’s one of my main intuitions about the issues of AI safety.
One part of the problem of specification is probably about interfacing, as you write, between the maths of the real world and the maths we can understand/certify. But one thing I feel was not mentioned here is the issue of what I would call unknown ambiguity. One of the biggest difficulties in proving properties of programs and algorithms is that many parts of the behavior are considered obvious by the designer. Think of something like “the number of processes cannot be 0”, or “this variable will never take this value, even if it’s of the right type”. Most of the time, once you make these obvious parts explicit, you can finish the proof. But sometimes the “trivial” part was hiding the real problem, which breaks the whole thing.
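As a toy illustration of such a hidden “obvious” assumption (my own invented example, not from the post): a proof of fair work division can silently rely on the worker count never being zero.

```python
def share_per_worker(total_work: int, n_workers: int) -> int:
    # The designer considers this "obvious" and may never write it down;
    # the correctness argument silently depends on it.
    assert n_workers > 0, "hidden invariant: at least one worker"
    return total_work // n_workers

print(share_per_worker(10, 2))  # 5 -- fine while the invariant holds
# share_per_worker(10, 0) crashes: the "trivial" part was the real problem.
```

Making the invariant explicit (here as an assertion) is exactly the step that either lets the proof go through or reveals that the trivial-looking part was where the bug lived.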
So I think another scarce resource is people who can make explicit every bit of the system. People who can dig into every nitpick and rebuild everything from scratch.
Do you have references to posts by those people who think goal-directedness is binary-ish? That would be very useful, thanks. :)
I don’t get why the clients’ AU, from the perspective of the robber, doesn’t drop when the robber enters, or just before. Even if I’m the robber and I know they won’t like it and won’t be able to do things after I’m in, they can still do things in the bank before I’m in. And if they’re out before I come in, their AU will be the same as if I had never been there.
If you do a matrix multiplication the obvious way, it amounts to dot products of rows and columns (one for each element of the resulting matrix). So it seems to me that improving matrix-matrix multiplication performance comes down to improving the performance of dot products.
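The “obvious way” can be sketched in a few lines (a minimal illustration, not a performance-oriented implementation): entry (i, j) of the result is the dot product of row i of A with column j of B.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def matmul(A, B):
    """Naive matrix multiplication: one dot product per output entry."""
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    return [[dot(A[i], [B[r][j] for r in range(k)])  # row i of A . column j of B
             for j in range(m)]
            for i in range(n)]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

This makes the cost structure visible: n·m output entries, each a length-k dot product, hence the classic O(n·k·m) operation count of the naive algorithm.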
This seems like a decent explanation of Hardware Matrix Multiplication, even if it lacks concrete sources.
As for tensors, I think these references explain them better than I can at my current level. But the intuition is that a tensor is a generalization of a matrix to higher dimensions, with additional properties under transformations.
Interesting discussion of epistemic status at the end. I like the intellectual honesty behind it, but your point that they are now internalized also makes sense.
On the part about writing while thinking of the audience, I want to recommend the best book about writing I ever read: Writing with Style, by John Trimble. Although it’s not perfect, a writing book that starts by explaining the thought process of experienced writers, and how it differs from that of the novice, is just amazingly useful.
I’ll let Trimble have the last word:
Books on writing tend to be windy, boring, and impractical. I intend this one to be different—short, fun, and genuinely useful.
One connection my Babble made while reading this post is between Circumambulation and The Feynman Method. The latter is inspired by an event in the biography of the late Richard Feynman, where he wrote in a notebook all the things he knew about physics, and poked into every hole.
My Prune tells me this is probably irrelevant, since Circumambulation in this post seems more about the blocks to the generation of ideas than the deep understanding of a subject. But I don’t have to listen to him.
I really like how the posts in this sequence use technical analogies. You refer to some advanced concepts like expanders, but they don’t feel tacked onto the ideas. I even learned about implicit representations of graphs! (though I already knew about bounded-degree graphs)
One nitpick: Ramanujan probably had an amazing Prune too. I feel he’s impressive because he was right so many times. And when he went astray, it was apparently because his lack of schooling in mathematics made him overlook some aspects of the problem. That feels like the combination of an amazing Babble and an amazing Prune, with the Babble getting the better of the Prune for the mistakes.
Thanks for this awesome post! I like the babble/prune distinction, but the analogy to randomized algorithms was probably the most helpful idea in here for me. It made perfect sense, since a lot of probabilistic algorithms really are simple combinations of random babble and efficient pruning.
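To spell out what I mean (my own toy sketch, not from the post): a generate-and-test randomized algorithm really does decompose into a cheap random generator (Babble) and a cheap deterministic check (Prune), here searching for a multiple of 7.

```python
import random

def babble(rng):
    """Random generation: propose a candidate cheaply."""
    return rng.randrange(1, 100)

def prune(candidate):
    """Efficient check: keep only candidates with the property we want."""
    return candidate % 7 == 0

def babble_and_prune(seed=0, max_tries=1000):
    """Repeat babble + prune until a candidate survives (or give up)."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        c = babble(rng)
        if prune(c):
            return c
    return None

print(babble_and_prune())  # a multiple of 7 (which one depends on the seed)
```

The whole cleverness lives in making the prune step fast and reliable; the babble step is allowed to be dumb, which is exactly the structure of many Monte Carlo algorithms.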
This analogy makes me wonder: given that many in complexity theory believe that BPP = P, what would the consequences of derandomization be for Babble and Prune? Will we eventually be able to babble deterministically, so that we have a high guaranteed probability of finding what we are looking for while pruning?
A slight issue with the post: I disagree that poetry is pure babble / phonetic babble. Some parts of poetry are only about sounds and images, but many poems try to compress and share a feeling, an idea, an intuition. That is to say, meaning matters in poetry.
What I came up with before reading the spoilers or the next posts in the sequence:
A big deal is any event that significantly changes my expected ability to accomplish my goals (whether by having an impact specific to me, or an objective impact).