How do you feel about Bayeslord’s description of Jhana meditation as a positive form of prediction error, creating a sort of feedback loop of bliss?
lillybaeum
Now this is effective altruism.
I’ve seen some convincing arguments that water is not wet.
This isn’t related to the post directly, but do you think that making public transportation free would be a good or bad decision for any reasonably large city (Chicago, Boston, New York, etc.)?
‘Good’ meaning good for people and good for the city’s local economy generally (via benefits other than fare income).
This is a weird and stupid question, but did you use to be an admin on Hellmoo?
It’s really interesting to hear that people go this far. I had thought maybe I was overthinking it, but it seems like some people, like yourself, find a lot of value in cataloguing these things beyond just bookmarking them on the site or vaguely remembering the concepts and searching for them when needed.
This is really interesting and useful.
Particularly, the two things you linked are just interesting on their own, but also, although I don’t think my brain works the same way yours does, I appreciate your perspective and how you tend to work with regard to these things. I think I need something like a reference or a bookmark because these concepts don’t stick in my mind without lots of repeated exposure. I tend to be a ‘ground-up’ learner (if that’s even a thing) as opposed to someone who can keep lots of disparate concepts in my mind at once. My head is like a sieve for jargon and acronyms. I’ve confused the terms ‘anosmia’ and ‘aphasia’ for years. I just had to look up ‘word for not being able to remember words’ in order to remember the word aphasia. Ironic, right? Shiri’s Scissor/sort by controversial is an article I already read once in the past, but I’d completely forgotten it until you linked it, I clicked it, and I read four paragraphs of it.
I think you might be right. For example, any of the logo changes I described is necessarily related to making the company more attractive to investors by seeming more ‘modern’, and a lot of these changes are probably not decided on by the designers alone, but are also incentivized and meddled with by higher-ups who want things to look more like another, more popular and profitable app.
[Question] What do you do to remember and reference the LessWrong posts that were most personally significant to you, in terms of intellectual development or general usefulness?
I assume you live in the US or Canada. The fact that you feel the need to give the 9-year-old a kid license (the tile is smart!) points, I think, to societal issues with norms and structure that lead to the sort of effects described in the OP.
US and Canadian cities (and much of Europe and the developing world that modeled their cities on the West’s example) are generally not designed in a way that is friendly towards kids exploring and existing in the world safely.
I don’t mean ‘safely’ as in ‘they might fall down and scrape their knee or get lost’, I mean ‘safely’ as in ‘they might get struck by a driver going 40mph while staring at their phone as they barrel down a stroad’ or ‘they need to walk 3 miles to get to the nearest convenience store or park’.
It’s easy to find a number of examples of parents being disciplined or even arrested for allowing their children to walk to school, the store, or the park. To allow a child outside without guidance is considered gravely irresponsible by western society at large in a way that really isn’t healthy or helpful for promoting independence, in my opinion.
https://reason.com/2023/01/30/dunkin-donuts-parents-arrested-kids-cops-freedom/
https://www.cnn.com/2014/07/31/living/florida-mom-arrested-son-park/index.html
In Japan there’s a cultural rite of passage (usually in smaller towns, it seems) where children sometimes as young as 3 or 4 are sent on an errand, usually to go to the store and pick up a few things, or visit a family friend and retrieve something. There’s a Netflix series documenting a slightly more staged version of this, called ‘Old Enough!’. It’s very cute.
Here’s another potentially interesting article regarding this, from NPR, about playground safety:
I hope one day we can organize our society in a way in which kids can experience safe amounts of risk and develop into capable human beings. Thanks for doing your part.
[Question] Do websites and apps actually generally get worse after updates, or is it just an effect of the fear of change?
Haven’t read your entire post yet, but I agree broadly with the idea. I’m unsure of your methodology, but I think knowledge has to be built from the ground up, and lack of understanding leads to frustration. Upvote systems should encourage that difficult concepts are not simply described but taught and explained thoroughly, rather than just ‘pointed at’.
For example, I can understand on some level if someone tries to explain to me why object-oriented design patterns in programming are inferior to procedural ones, but if I’ve never written programs in either style, I will only understand the broadest strokes; none of the examples or reasoning will really resonate with me.
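(To make the contrast concrete: here’s a toy sketch in Rust, the language I’ve been poking at. The Counter example is entirely made up for illustration, not from any real codebase; the point is only that the trade-offs between the two styles don’t resonate until you’ve written programs in both.)

```rust
// The same tiny behavior in two styles.

// Object-oriented flavor: state and behavior bundled together.
struct Counter {
    n: u32,
}

impl Counter {
    fn increment(&mut self) {
        self.n += 1;
    }
}

// Procedural flavor: plain data handed to a free function.
fn increment(n: &mut u32) {
    *n += 1;
}

fn main() {
    let mut c = Counter { n: 0 };
    c.increment();

    let mut n = 0;
    increment(&mut n);

    println!("{} {}", c.n, n); // prints: 1 1
}
```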
On average, when describing any concept, a certain number of people will have the necessary ‘base understanding’ to grok it based on the explanation, and an additional number of people will need significantly more explanation to understand.
I think on one side of the extreme, you have an explanation from someone with an extremely autistic brain, going into far more detail than one might need, assuming the listener is lacking all relevant information.
On the other side, you have the schizophrenic or manic brained explanation, which describes things completely intuitively, assuming that the listener understands all of the unspoken elements without needing them to be explained. Most people would think that it just sounds like complete gibberish.
I think the perfect middle ground is the ‘highly esteemed teacher-brained explanation’: someone who describes things both basically and intuitively in the right proportions, so the widest audience is capable of understanding at least some amount of the concept. Imagine the best teacher you ever had in college, whoever was able to really convey difficult concepts in a way you immediately understood on a fundamental level, allowing you to then develop more complex understanding. I think upvote-based systems, at their best, encourage this sort of information.
I think at their WORST, upvote systems discourage valuable discourse that requires prior understanding of the subject matter to intuitively grok a difficult, novel piece of information.
This then causes the content to trend towards being easily comprehensible but lower in overall quality, novelty, and complexity, what people derisively call speaking to the ‘lowest common denominator’. This is the ‘endless summer’ of internet communities: the larger and less specific a demographic becomes, the less unique, interesting, and high-quality its content becomes, because the content valued by the average user is different from the content valued by the informed, experienced, insular user.
If your system intends to solve these problems, I support it strongly. I think that a website/app can support a large community without being lowered in quality. The endless-summer effect is not an inevitability of all systems of this type, but a symptom of defining the ‘most valuable information’ as the ‘most upvoted or engaged-with information’, which is frequently not the case! I mean, that’s clearly evident to anyone who’s used Reddit.
You may want to look into Toki Pona, a language ostensibly built around conveying meaning in the fewest, simplest possible expressions.
One can explain the most complex things despite having only ~130 words, almost like ‘programming’ the meaning into the sentence, but as the sentences necessarily get longer and longer, one begins to question the necessity of encoding so much meaning.
You can only point to the Tao, you can’t describe it or name it directly. Information is much the same way, I think.
I was listening to a podcast the other day, Lex Fridman interviewing Michael Littman and Charles Isbell, and Charles told an interesting anecdote.
He was asked to teach an ‘introduction to CS’ class as a favor to someone, and he found himself thinking, “how am I going to fill an hour and a half going over just variables, or just ‘for’ loops?”, and every time he’d realize that an hour and a half wasn’t actually enough time to cover those ‘basic’ concepts in detail.
He goes on to say that programming is reading a variable, writing a variable, and conditional branching. Everything else is syntactic sugar.
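To see what he means concretely, here’s a made-up sketch in Rust (my rendering of the idea, not Charles’s): a `for` loop rewritten using nothing but reads, writes, and a conditional branch. Even `loop`/`break` is itself just sugar for a conditional jump.

```rust
fn main() {
    // The sugared version:
    for i in 0..5 {
        println!("{}", i);
    }

    // The same thing desugared into the three primitives:
    let mut i = 0; // write a variable
    loop {
        if i >= 5 { // read a variable, conditionally branch
            break;
        }
        println!("{}", i); // read a variable
        i = i + 1; // read, then write
    }
}
```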
The Tao Te Ching talks about this, broadly: everything in the world comes from yin and yang, 1 and 0, from the existence of order in contrast to chaos. Information gets increasingly more complex and interesting the deeper you go. You can study almost anything for 50 years and still be learning new things. It doesn’t surprise me at all that such interesting, complex concepts come from number lines and negative square roots; those are already really complex concepts in themselves, they just don’t seem that way because they’re the most basic ones you need to comprehend before you can build on that knowledge and learn more.
I’ve never been a programmer, but I’ve been trying to learn Rust lately. Somewhat hilariously to me, Rust is known as being ‘a hard language to learn’, similarly to Haskell. It is! It is hard to learn. But so is every other programming language; the others just hide the inevitable complexity better, and their particular versions of these abstractions are simpler at the outset. Rust simply expects you to understand the concepts early, rather than hiding them at first like Python or C# does.
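A taste of what I mean, with a toy example of my own (not from any particular tutorial): Rust makes you confront ownership on day one, where Python would quietly shuffle references around for you.

```rust
fn main() {
    let s = String::from("hello");
    let t = s; // ownership of the heap allocation moves to `t`

    // println!("{}", s); // compile error: `s` was moved out of
    println!("{}", t); // fine: `t` owns the string now
}
```

Python never makes you think about who owns that string; Rust just refuses to let you defer the question.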
Hope this is at all enlightening regarding your point; I really liked your post.
Thank you! That’s very kind of you to say. I haven’t spent a lot of time ‘assimilating into LessWrong’, so I sometimes worry that I come off as ignorant or uninformed when I post. It’s nice to hear that you think I made some sense.
Regarding ‘shower thoughts’ and ‘distraction-removal’ and their relation to cell phones, YouTube videos, and other ‘super fun’ activities, as one might call them, I definitely think that there’s something there.
I’ve long had the thought that ‘shower thoughts’ are simply one of the rare times, in a post-2015ish world, that people actually have the opportunity to be bored. Being bored is important. It makes you pursue things other than endless YouTube videos, video games, porn, etc. Besides, showering, washing dishes, and other ‘boring’ activities are meditative!
It’s a common meme these days that people need to always watch something while they eat. Some people listen to podcasts while they shower. Some people use their phone at stoplights. All of this points to a tendency for people to fill every single empty space of any kind with content of some sort, and it really doesn’t seem healthy for the human brain.
This is an interesting video I watched today (while filling every single empty moment of my life with content, exactly the habit I’m disparaging), and it relates to the topic. The author describes a process by which you can actually do the sorts of things you want to do by making sure there isn’t anything else in that block of time that’s more fun, satisfying, or engaging. If work is the most fun thing you’re allowing yourself to do, then you’re going to work. If you’re locked in a room with a book and a cell phone, you’re going to want to use the cell phone. If you just have the book, you’re going to read the book. You can apply this principle to your entire life.
Sorry if this post seems a little chaotic; lots of thoughts, and I didn’t have the time or energy at the end of the day to link them together more coherently...
I recently wrote a Question about learning that lacked a lot of polish but poked at a few of the ideas discussed here. I haven’t had time just now to read the entire post, but I plan to come back to it and comb through it to try to shore up the ideas I currently have about learning. I’m also reading Ultralearning, which is interesting, although a little pop-sci. I find all this stuff really interesting because I’ve been having a lot of trouble learning things lately, feeling like my brain just isn’t working the way it used to since I got covid. I’ve tried programming probably five or six times in my life, and I’m giving it another go now, hoping it can stick this time.
Also, regarding Downwell: Try playing without ever jumping, just falling. Fall on enemies that are bounce-able without ever jumping or shooting and see how deep you can get. You can get pretty far this way!
Do you want to elaborate on that?
The following is text from Claude 3 Opus. I generally find people just dumping answers from LLMs to be kind of cringe, but in this case, as I was using it to try to understand the post and your comments better, I came across some genuinely insightful-feeling stuff and felt as though Claude’s input might be helpful to you in some way, and that it would be remiss of me not to share it. I’m sorry if it’s nonsensical; I’m not informed enough on the topic to know.
“Regarding the relationship between belief states and “features”, I think there are a few important connections:
Belief states can be thought of as the “true features” that optimally compress the information needed for prediction. In the simplex visualizations, each point corresponds to a unique belief state—a particular probabilistic weighting of the possible hidden states of the generator at that point in the sequence. These belief states are the most parsimonious representation of the relevant information for predicting the future.
The “features” often discussed in mech interp (e.g. as visualized in attribution maps or extracted by probing classifiers) can be seen as approximations or projections of these true belief states. When we probe an LM’s activations and find that certain neurons or directions correspond to interpretable features like “noun-ness” or “sentiment”, we’re effectively discovering linear projections that approximate some aspect of the true belief state.
The topology and geometry of the belief state simplex can give us intuitions about the structure of these feature spaces. For example, the fact that belief states are constrained to a simplex (rather than an unbounded space) suggests that there may be inherent trade-offs or dependencies between different features. The trajectories of belief states through the simplex as a sequence evolves could also inform us about the dynamics of how features interact and compose.
So in a sense, belief states provide a principled foundation for thinking about what the “ideal features” for a given predictive task would be, and mech interp techniques can be seen as ways to approximate or probe these ideal features in real models.
Regarding toy models and information-theoretic aspects, some elements that could bridge between formal systems and natural language are:
Temporal dependencies and long-range correlations: Natural language exhibits complex temporal structure at multiple scales (e.g. within sentences, across paragraphs, across documents). Toy sequence models with tunable temporal correlations could help us understand how belief states track and compress these dependencies.
Compositionality and recursion: Language exhibits rich compositional structure, where smaller units (morphemes, words, phrases) combine in systematic ways to produce complex meanings. Toy generative grammars or rule systems could help us study how belief states navigate these combinatorial spaces.
Stochasticity and entropy: Language is inherently stochastic and exhibits varying levels of predictability (e.g. some words are much more predictable from context than others). Toy models with tunable entropy could help us understand how belief states adapt to different levels of predictability and how this impacts the geometry of the feature space.
Hierarchical structure: Language operates at multiple levels of abstraction (e.g. phonology, morphology, syntax, semantics, pragmatics). Toy models with explicit hierarchical structure could illuminate how belief states at different levels interact and compose.
The key idea would be to start with minimally complex toy systems that capture some core information-theoretic property of language, fully characterize the optimal belief states in that system, and then test whether the key signatures (e.g. the topology of the belief state space, the trajectory of dynamics) can be found in real language models trained on natural data.
This could provide a roadmap for building up more and more realistic models while maintaining a principled understanding of the underlying information-theoretic structures. The goal would be to eventually bridge the gap between our understanding of toy systems and the much more complex but often qualitatively similar structures found in real language models.
Of course, this is a highly ambitious research program and there are many challenges to contend with. But I believe this kind of cross-pollination between formal methods like Computational Mechanics and more empirical mech interp work could be very fruitful.”
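To check my own understanding of the belief-state idea, I sketched the update rule for a toy two-state HMM in Rust (the language I happen to be learning). Everything here, the matrices, the numbers, the `update` helper, is made up by me for illustration and isn’t from the post: each emitted symbol x gets its own labeled transition matrix T^(x), and the belief b over hidden states updates as b'_j ∝ Σ_i b_i T^(x)_ij, renormalized so it stays on the simplex.

```rust
// Toy 2-state HMM belief update: b'_j ∝ Σ_i b_i * T^(x)_ij.
// The matrices are invented for illustration; each row of
// T^(0) + T^(1) sums to 1, so together they form a valid HMM.
fn update(belief: [f64; 2], t_x: [[f64; 2]; 2]) -> [f64; 2] {
    let mut b = [0.0, 0.0];
    for j in 0..2 {
        for i in 0..2 {
            b[j] += belief[i] * t_x[i][j];
        }
    }
    let norm = b[0] + b[1];
    [b[0] / norm, b[1] / norm] // renormalize: beliefs live on the simplex
}

fn main() {
    let t = [
        [[0.45, 0.05], [0.10, 0.40]], // T^(0): transitions that emit symbol 0
        [[0.05, 0.45], [0.40, 0.10]], // T^(1): transitions that emit symbol 1
    ];
    let mut belief = [0.5, 0.5]; // uniform prior over the hidden states
    for &x in &[0usize, 1, 1, 0] {
        belief = update(belief, t[x]);
        println!("after symbol {}: {:?}", x, belief);
    }
}
```

Each successive `belief` is one point in the simplex the post visualizes; running longer sequences traces out the trajectories Claude mentions.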