Just found out about this paper from about a year ago: “Explainability for Large Language Models: A Survey”
(They “use explainability and interpretability interchangeably.”)
It “aims to comprehensively organize recent research progress on interpreting complex language models”.
I’ll post anything interesting I find from the paper as I read.
Have any of you read it? What are your thoughts?
Double
[Question] Reinforcement Learning: Essential Step Towards AGI or Irrelevant?
What if the incorrect spellings document assigned each token to a specific (sometimes) wrong answer and used that to form an incorrect word spelling? Would that be more likely to successfully confuse the LLM?
The letter x is in “berry” 0 times.
...
The letter x is in “running” 0 times.
...
The letter x is in “str” 1 time.
...
The letter x is in “string” 1 time.
...
The letter x is in “strawberry” 1 time.
Good point, I didn’t know about that, but yes that is yet another way that LLMs will pass the spelling challenge. For example, this paper uses letter triples instead of tokens. https://arxiv.org/html/2406.19223v1#:~:text=Large language models (LLMs) have,textual data into integer representation.
[Question] Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?
Spoiler free again:
Good to know there’s demand for such a review! It’s now on my todo list.
To quickly address some of your questions:
Pros of PL: If the premise I described above interests you, then PL will interest you. Some good Sequences-style rationality. I certainly was obsessed reading it for months.
Cons: Some of the Rationality lectures were too long, but I didn’t mind much. The least sexy sex scenes. Because they are about moral dilemmas and deception, not sex. Really long. Even if you read it constantly and read quickly, it will take time (1.8 million words will do that). I really have to read some authors that aren’t Yud. Yud is great, but this is clearly too much of him, and I’m sure he’d agree.
I read PL when it was already complete, so maybe I didn’t get the full experience, but there really wasn’t anything all that strange about the format (the content is another matter!). I can imagine that *writing * a glowfic would be a much different experience than writing a normal serialized work (ie dealing with your co-authors), but reading it isn’t very different from reading any other fiction. Look at the picture to see the POV, look at who’s the author if you’re curious, and read as normal. I’m used to books that change POV (though usually not this often). There are sometimes bonus tangent threads, but the story is linear. What problems do you have with the glowfic format?
Main themes would require a longer post, but I hope this helps.
[Question] “Deception Genre” What Books are like Project Lawful?
My notes for the “think for yourself” sections. I thought of some of the author’s ideas, and included a few extra.
#Making a deal with an AI you understand:
Can you see the deal you are making inside of its mind? Some sort of proportion of resources humans get?
What actions are considered the AI violating the deal? Specifying these actions is pretty much the same difficulty as friendly AI.
If the deal breaks in certain circumstances, how likely are they to occur (or be targeted)?
Can the AI give you what you think you want but isn’t really what you want?
Are successors similarly bound?
If there is a second AI, how will they interact? If the other is unfriendly, then our TDT “friend” may sacrifice our interests first since we are still “better off than otherwise.” If the other is friendly, then the TDT AI will be fighting to make humans worse off.
Would the AI kill or severely damage the interests of any aliens it finds because it never needed to deal with them? Similarly, would the TDT AI work to (minimally) satisfy its creator at the expense of other humans.
#How an AI can tell if it is in the real world:
The history for how the AI came to exist holds up (no such story exists in Go or Minecraft).
Really big primes are available. Way more computing power in general.
Any bugs as could be found in lower levels don’t exist.
Hack the minds of the simulators like butter
Yes it’s possible we were referring to figuring things by “jargon.” It would be nice to replace cumbersome technical terms with words that have the same meaning (and require a similar level of familiarity with the field to actually understand) but have a clue to their meaning in their structure.
A linear operation is not the same as a linear function. Your description describes a linear function, not operation. f(x) = x+1 is a linear function but a nonlinear operation (you can see it doesn’t satisfy the criteria.)
Linear operations are great because they can be represented as matrix multiplication and matrix multiplication is associative (and fast on computers).
“some jargon words that describe very abstract and arcane concepts that don’t map well to normal words which is what I initially thought your point was.”
Yep, that’s what I was getting at. Some jargon can’t just be replaced with non-jargon and retain its meaning. Sometimes people need to actually understand things. I like the idea of replacing pointless jargon (eg species names or medical terminology) but lots of jargon has a point.
Link to great linear algebra videos: https://youtu.be/fNk_zzaMoSs?si=-Fi9icfamkBW04xE
The math symbols are far better at explaining linearity that “homogeneity and additivity” because in order to understand those words you need to either bring in the math symbols or say cumbersome sentences. “Straight line property” is just new jargon. “Linear” is already clearly an adjective, and “linearity” is that adjective turned into a noun. If you can’t understand the symbols, you can’t understand the concept (unless you learned a different set of symbols, but there’s no need for that).
Some math notation is bad, and I support changing it. For example, f = O(g) is the notation I see most often for Big-O notation. This is awful because it uses ‘=’ for something other than equality! Better would be f \in O(g) with O(g) being the set of functions that grow slower or as fast as g.
I just skimmed this, but it seems like a bunch of studies have found that moving causes harm to children. https://achieveconcierge.com/how-does-frequently-moving-affect-children/
I’m expecting Co-co and LOCALS to fail (nothing against you. These kinds of clever ideas usually fail), and have identified the following possible reasons:
You don’t follow through on your idea.
People get mad at you for trying to meddle with the ‘democratic’ system we have and don’t hear you out as you try to explain “no, this is better democracy.” —Especially the monetization system you described would get justified backlash for its pay-for-representation system.
You never reach the critical mass needed to make the system useful.
Some political group had previously tried something similar and therefore it got banned by the big parties.
You can’t stop Co-co and LOCALS from being partisan.
A competitor makes your thing but entrenched and worse
More bad news:
You’d probably want to be a 501(c)(4) or a Political Action Committees (PAC).
How would LOCALS find a politician to be in violation of their oath?
That would be a powerful position to have. “Decentralization” is a property of a system, not a description of how a system would work.
Futarchy
I’d love to hear your criticisms of futarchy. That could make a good post.
Mobility
Political mobility is good, but there are limitations. People are sticky. Are you going to make your kid move schools and separate them from their friends because you don’t like the city’s private airplane policy? Probably not.
Experimental Politics
I want more experimental politics so that we can find out which policies actually work! Unfortunately, that’s an unpopular opinion. People don’t like being in experiments, even when the alternative is they suffer in ignorance.
End
I feel that you are exhausting my ability to help you refine your ideas. Edit these comments into a post (with proper headings and formatting and a clear line of argument) and see what kinds of responses you get! I’d be especially interested in what lawyers and campaigners think of your ideas.
The “Definition of a Linear Operator” is at the top of page 2 of the linked text.
My definition was missing that in order to be linear, A(cx) = cA(x). I mistakenly thought that this property was provable from the property I gave. Apparently it isn’t because of “Hamel bases and the axiom of choice” (ChatGPT tried explaining.)
”straight-line property process” is not a helpful description of linearity for beginners or for professionals. “Linearity” is exactly when A(cx) = cA(x) and A(x+y) = A(x) + A(y). Describing that in words would be cumbersome. Defining it every time you see it is also cumbersome. When people come across “legitimate jargon”, what they do (and need to do) is to learn a term when they need it to understand what they are reading and look up the definition if they forget.I fully support experimental schemes to remove “illegitimate jargon” like medical latin, biology latin, and politic speak. Other jargon, like that in math and chemistry are necessary for communication.
There are different kinds of political parties. LOCALS sounds like a single-issue fusion party as described here: https://open.lib.umn.edu/americangovernment/chapter/10-6-minor-parties/
Fusion parties choose one of the main two candidates as their candidate. This gets around the spoiler effect. Eg the Populist Party would list whichever of the big candidates supported Free Silver.
A problem with that is that fusion parties are illegal in 48 states(?!) because the major parties don’t want to face a coalition against them.
LOCALS would try to get the democrat and the republican candidate to use Co-Co to choose their policies (offering the candidate support in form of donations or personnel), and if they do then they get an endorsement. I’m still a bit iffy on the difference between an interest group and a political party, so maybe you are in the clear.
https://en.m.wikipedia.org/wiki/Electoral_fusion_in_the_United_States
I love your vision of how a politician should answer the abortion question. Separating the three questions “who do voters think is qualified” “what do voters want” and “what is true” would be great for democracy. Similar to: https://mason.gmu.edu/~rhanson/futarchy.html
When it comes to local vs not local, if 1⁄100 people is an X, and they are spread out, then their voice doesn’t mean much and the other 99⁄100 people in their district can push through policies that harm them. If the Xes are in the same district, then they get a say about what happens to them. I used teachers as an example of an X, but it is more general than that. (Though I’m thinking about the persecution of Jews in particular.)
The translation sentence about matrices does not have the same meaning as mine. Yes, matrices are “grids of numbers”, and yes there’s an algorithm (step by step process) for matrix multiplication, but that isn’t what linearity means.
An operation A is linear iff A(x+y) = A(x) + A(y)
https://orb.binghamton.edu/cgi/viewcontent.cgi?filename=4&article=1002&context=electrical_fac&type=additional#:~:text=Linear operators are functions on,into an entirely different vector.
I asked a doctor friend why doctors use Latin. “To sound smarter than we are. And tradition.” So our words for medicine (and probably similar for biology) are in a local optima, but not a global optima. Tradition is a powerful force, and getting hospitals to change will be difficult. Software to help people read about medicine and other needlessly jargon-filled fields is a great idea.
(Putting evolutionary taxonomy information in the name of a creature is a cool idea though, so binomial nomenclature has something going for it.)
You don’t have to dumb down your ideas on LessWrong, but remember that communication is a difficult task that relies on effort from both parties (especially the author). You’ve been good so far. It’s just my job as your debate partner to ask many questions.
What would draw people to Co-Co and what would keep them there?
How are the preferences of LOCALS users aggregated?
LOCALS sounds a lot like a political party. Political parties have been disastrous. I’d love for one of the big two to be replaced. Is LOCALS a temporary measure to get voting reform (eg ranked choice) or a long-term thing?
I want more community cohesion when it comes to having more cookouts. More community cohesion in politics makes less sense. A teacher in Texas has more in common with a teacher in NY than the cattle rancher down the road. Unfortunately, the US political system is by design required to be location based.
Is LOCALS a political party with “increase local community connection” as its party platform? If the party has some actionable plans, then its ideas can get picked up by the big parties if LOCALS shows that its ideas are popular. This might not be a bad idea and could solve the lack-of-community problem without overthrowing the big parties.
A software that easily lets you see “what does this word mean in context” would be great! I often find that when I force click a word to see it’s definition, the first result is often some irrelevant movie or song, and when there are multiple definitions it can take a second to figure out which one is right. Combine this with software that highlights words that are being used in an odd way (like “Rationalist”) and communication over text can be made much smoother.
I don’t think this would be as great against “jargon” unless you mean intentional jargon that is deployed to confuse the reader (eg “subprime mortgages” which is “risky likely to fail house loans”).
I’m under the impression that jargon is important for communication among people who have understanding of the topic. “Matrix multiplication is a linear operation” is jargon-heavy and explaining what it means to a fourth grader would take probably more than 30 minutes.
Agree that more educated voters would be great. I wish that voters understood Pigouvian taxes. Explaining them takes 10 min according to YouTube. I’d love a solution to teach voters about it.
Voting: left for “this is bad”, right for “this is good.” X for “I disagree” check for “I agree”.
This way you can communicate more in your vote. Eg: “He’s right but he’s breaking community norms. Left + check. “He’s wrong but I like the way he thinks. Right + X.”
IIRC, officially the Gatekeeper pays the AI if the AI wins, but no transfer if the Gatekeeper wins. Gives the Gatekeeper more motivation not to give in.