• # Good arguments—notes on Craft of Research chapter 7

Arguments take place in 5 parts.

1. Claim: What do you want me to believe?

2. Reasons: Why should I agree?

3. Evidence: How do you know? Can you back it up?

4. Acknowledgment and Response: But what about … ?

5. Warrant: How does that follow?

This can be modeled as a conversation with readers, where the reader prompts the writer to take the next step on the list.

Claims ought to be supported with reasons. Reasons ought to be based on evidence. Arguments are recursive: part of an argument is an acknowledgment of an anticipated response, and another argument addresses that response. Finally, when the distance between a claim and a reason grows large, we draw connections with something called warrants.

The logic of warrants proceeds in generalities and instances. A general circumstance predictably leads to a general consequence, and if you have an instance of the circumstance you can infer an instance of the consequence.
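As a rough sketch, the generality-and-instance pattern can be written as an inference rule. The symbols are assumptions introduced here for illustration, not the book's notation: $C(s)$ means "the circumstance holds in situation $s$", $Q(s)$ means "the consequence holds in situation $s$", and $a$ is the particular case being argued about. Real warrants are usually weaker than a strict universal ("predictably" rather than "always"), so treat the quantifier as an idealization.

```latex
% Warrant (general rule) plus reason (an instance of the circumstance)
% licenses the claim (an instance of the consequence).
\[
\frac{\overbrace{\forall s\,\bigl(C(s) \rightarrow Q(s)\bigr)}^{\text{warrant}}
      \qquad
      \overbrace{C(a)}^{\text{reason}}}
     {\underbrace{Q(a)}_{\text{claim}}}
\]
```

On this reading, a reader can resist the claim in two ways: by denying that the warrant holds in general, or by denying that the case at hand really is an instance of the circumstance.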

Arguments in real-life papers are more complex than the 5 steps suggest, because

• Claims should be supported by two or more reasons

• A writer can anticipate and address numerous responses. As I mentioned, arguments are recursive, especially in the anticipated response stage, but also each reason and warrant can necessitate a subargument.

You might embrace a claim too early, perhaps even before you have done much research, because you “know” you can prove it. But falling back on that kind of certainty will just keep you from doing your best thinking.

• # Sources—notes on Craft of Research chapters 5 and 6

## Primary, secondary, and tertiary sources

Primary sources provide you with the “raw data” or evidence you will use to develop, test, and ultimately justify your hypothesis or claim. Secondary sources are books, articles, or reports that are based on primary sources and are intended for scholarly or professional audiences. Tertiary sources are books and articles that synthesize and report on secondary sources for general readers, such as textbooks, articles in encyclopedias, and articles in mass-circulation publications.

The distinction between primary and secondary sources comes from 19th-century historians, and the idea of tertiary sources came later. The boundaries can be fuzzy, and they certainly depend on the task at hand.

I want to reason about what these distinctions look like in the alignment community, and whether or not they’re important.

The rest of chapter five is about how to use libraries and information technologies, and evaluating sources for relevance and reliability.

Chapter 6 starts off with the kinds of things you should be looking for while you read.

## Look for creative agreement

• Offer additional support. You can offer new evidence to support a source’s claim.

• Confirm unsupported claims. You can prove something that a source only assumes or speculates about.

• Apply a claim more widely. You can extend a position.

## Look for creative disagreement

• Contradictions of kind. A source says something is one kind of thing, but it’s another.

• Part-whole contradictions. You can show that a source mistakes how the parts of something are related.

• Developmental or historical contradictions. You can show that a source mistakes the origin or development of a topic.

• External cause-effect contradictions. You can show that a source mistakes a causal relationship.

• Contradictions of perspective. Most contradictions don’t change a conceptual framework, but when you contradict a “standard” view of things, you urge others to think in a new way.

The rest of chapter 6 is a few more notes about what you’re looking for while reading (evidence, reasons), how to take notes, and how to stay organized while doing this.

# The alignment community

I think I see the creative agreement modes and the creative disagreement modes floating around in posts. Would it be more helpful if writers decided on one or two of these modes before sitting down to write?

Moreover, what is a primary source in the alignment community? Surely if one is writing about inner alignment, a primary source is the Risks from Learned Optimization paper. But what are Risks’ primary, secondary, tertiary sources? Does it matter?

Now look at Arbital. Arbital started off as a tertiary source, but articles that seemed more like primary sources started appearing there. I remember distinctly thinking “what’s up with that?” It struck me as awkward for Arbital to change its identity like that, but I end up thinking about and citing the articles that seem more like primary sources.

There’s also the problem that unwritten stuff in the memeplex is the real “primary” source. The first person who happens to write it down looks like they’re writing a primary source, when what they’re doing is really more like writing a secondary or even tertiary source.

• # notes (from a very jr researcher) on alignment training pipeline

Training for alignment research is one part competence (at math, CS, philosophy) and another part having an inside view / gears-level model of the actual problem. Competence can be outsourced to universities and independent study, but an inside view / gears-level model of the actual problem requires community support.

A background assumption I’m working with is that training as a longtermist is not always synchronized with legible-to-academia training. It might be the case that jr researchers ought to publication-maximize for a period of time even if it’s at the expense of their training. This does not mean that training as a longtermist is always or even often orthogonal to legible-to-academia training; the two can be highly synchronized, but it depends on the occasion.

It’s common to ask what relative weight should be assigned to competence building (textbooks, exercises) vs. understanding the literature (reading papers and the Alignment Forum), but perhaps there is a third category: honing your threat model and theory of change.

I spoke with a sr researcher recently who roughly said that a threat model with a theory of change is almost sufficient for an inside view / gears-level model. I’m working from the theory that a honed threat model and theory of change are important for calculating interventions. See Alice and Bob in Rohin’s faq.

I’ve been trying to hone my inside view / gears-level model of the actual problem by doing exercises with a group of peers weekly. But the sr researcher I spoke to said that mentorship trees of 1:1 time, not exercises that jrs can just do independently or in groups, are the only way it can happen. This is troublesome to me, as the bottleneck becomes mentors’ time. I’m not so much worried about the hopefully merit-based process of mentors figuring out who’s worth their time as I am about the overall throughput. It gets worse, though: what if the process is credentialist?

Take a look at the Critch quote from the top of Rohin’s faq:

I get a lot of emails from folks with strong math backgrounds (mostly, PhD students in math at top schools) who are looking to transition to working on AI alignment /​ AI x-risk.

Is he implicitly saying that he offloads some of the filtering work to admissions people at top schools? Presumably people from non-top schools are also emailing him, but he doesn’t mention them.

I’d like to see a claim that admissions people at top schools are trustworthy. No one has argued this to my knowledge. I think sometimes the movement falls back on status games, unless there is some intrinsic benefit to “top schools” (besides building social power/​capital) that everyone is aware of. (Indeed if someone’s argument is that they identified a lever that requires a lot of social power/​capital, then they can maybe put that top school on their resume to use, but if the lever is strictly high quality useful research (instead of say steering a federal government) this doesn’t seem to apply).

• Is he implicitly saying that he offloads some of the filtering work to admissions people at top schools?

I don’t think Critch’s saying that the best way to get his attention is through cold emails backed up by credentials. The whole post is about him not using that as a filter to decide who’s worth his time, and about how people should instead produce good technical writing to get attention.

• Critch’s written somewhere that if you can get into UC Berkeley, he’ll automatically allow you to become his student, because getting into UC Berkeley is a good enough filter.

• Where did he say that? Given that he’s working at UC Berkeley I would expect him to treat UC Berkeley students preferentially for reasons that aren’t just about UC Berkeley being able to filter.

It’s natural that you can sign up for one of the classes he teaches at UC Berkeley by being a student of UC Berkeley.

Being enrolled at MIT might be just as hard as being enrolled at UC Berkeley, but it doesn’t give you the same access to courses taught at UC Berkeley by its faculty.

• http://acritch.com/ai-berkeley/

If you get into one of the following programs at Berkeley:

• a PhD program in computer science, mathematics, logic, or statistics, or

• a postdoc specializing in cognitive science, cybersecurity, economics, evolutionary biology, mechanism design, neuroscience, or moral philosophy,

… then I will personally help you find an advisor who is supportive of you researching AI alignment, and introduce you to other researchers in Berkeley with related interests.

and also

While my time is fairly limited, I care a lot about this field, and you getting into Berkeley is a reasonable filter for taking time away from my own research to help you kickstart yours.

• Okay, he does speak about using Berkeley as a filter, but he doesn’t speak about taking people as his students.

It seems to be about helping people at UC Berkeley connect with other people at UC Berkeley.

• # Questions and Problems—thoughts on chapter 4 of Craft of Research

Last time we discussed the difference between information and a question or a problem, and I suggested that the novelty-satisfied mode of information presentation isn’t as good as addressing actual questions or problems. In chapter 3, which I have not typed up thoughts about, a three-step procedure is introduced:

1. Topic: “I am studying …”

2. Question: “… because I want to find out what/why/how …”

3. Significance: “… to help my reader understand …”

As we elaborate on the different kinds of problems, we will vary this framework and launch exercises from it.

Some questions raise problems, others do not. A question raises a problem if not answering it keeps us from knowing something more important than its answer.

The basic feedback loop introduced in this chapter relates practical with conceptual problems and relates research questions with research answers.

Practical problem -> motivates -> research question -> defines -> conceptual/research problem -> leads to -> research answer -> helps to solve -> practical problem (loop)

## What should we do vs. what do we know—practical vs conceptual problems

Opposite each other in the loop are practical problems and conceptual problems. Practical problems are simply those which imply uncertainty over decisions or actions, while conceptual problems are those which only imply uncertainty over understanding. Concretely, your bike chain breaking is a practical problem because you don’t know where to get it fixed, implying that the research task of finding bike shops will reduce your uncertainty about how to fix the bike chain.

### Conditions and consequences

The structure of a problem is that it has a condition (or situation) and the (undesirable) consequences of that condition. The condition-costs model of problems holds for both practical problems and conceptual problems, but comes in slightly different flavors. In the practical problem case, the condition and costs are immediate and observed. However, a chain of “so what?” must be walked.

Readers judge the significance of your problem not by the cost you pay but by the cost they pay if you don’t solve it… To make your problem their problem, you must frame it from their point of view, so that they see its cost to them.

One person’s cost may be another person’s condition, so when stating the cost you ought to imagine a Socratic “so what?” voice, forcing you to articulate more immediate costs until the Socratic voice has to really reach in order to say that it’s not a real cost.

The conceptual problem case is where intangibles play in. The condition in that case is always the simple lack of knowledge or understanding of something. The cost in that case is simple ignorance.

### Modus tollens

A helpful exercise: if you find yourself saying “we want to understand x so that we can y”, try flipping it to “we can’t y if we don’t understand x”. This shifts the burden onto the reader to provide ways in which we can y without understanding x. You can do this iteratively: come up with zs which you can’t do without y, and so on.
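The flip can be spelled out in propositional form. This is only a sketch, and the notation is an assumption introduced here rather than anything from the book: $U_x$ stands for “we understand x”, $C_y$ for “we can do y”, and $C_z$ for “we can do z”. It also assumes that the original sentence is read as claiming that understanding x is necessary for y.

```latex
\begin{align*}
  & C_y \rightarrow U_x
    && \text{(doing $y$ requires understanding $x$)} \\
  & \neg U_x \rightarrow \neg C_y
    && \text{(contrapositive: ``we can't $y$ if we don't understand $x$'')} \\
  & \frac{C_y \rightarrow U_x \qquad \neg U_x}{\neg C_y}
    && \text{(modus tollens)} \\
  & (C_z \rightarrow C_y) \wedge (C_y \rightarrow U_x)
    \;\Longrightarrow\; (\neg U_x \rightarrow \neg C_z)
    && \text{(iterating with a $z$ that requires $y$)}
\end{align*}
```

Under this reading, a reader who wants to resist the conclusion has to reject the first line, which amounts to exhibiting a way to do y without understanding x.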

## Pure vs. applied research

Research is pure when the significance stage of the topic-question-significance frame refers only to knowing, not to doing. Research is applied when the significance step refers to doing. Notice that the question step, even in applied research, refers to knowing or understanding.

### Connecting research to practical consequences

You might find that the significance stage is stretching a bit to relate the conceptual understanding gained from the question stage. Sometimes you can modify and add a fourth step to the topic-question-significance frame and make it into topic-conceptual question-conceptual significance-possible practical application. Splitting significance into two helps you draw reasonable, plausible applications. A claimed application is a stretch when it is not plausible. Note: the authors suggest that there is a class of conceptual papers in which you want to save practical implications entirely for the conclusion, that for a certain kind of paper practical applications do not belong in the introduction.

## AI safety

One characteristic of AI safety that makes it difficult both to do and to interface with is that the chains of “so what” are often very long. The path from deconfusion research to everyone dying or not dying feels like a stretch if not done carefully, and has a lot of steps when done carefully. As I mentioned in my last post, it’s easy to get sucked into the “novel information for its own sake” regime, at least as a reader. More practically oriented approaches are perhaps those that seek new regimes for how to even train models, and the “so what?” is answered “so we have dramatically fewer OODR failures” or something. The condition-costs framework seems really beneficial for articulating alignment agendas and directions.

## Misc

• “Researchers often begin a project without a clear idea of what the problem even is.”

• Look for problems as you read. When you see contradictions, inconsistencies, or incomplete explanations, tentatively assume that readers would or should feel the same.

• Ask not “Can I solve it?” but “will my readers think it ought to be solved?”

• “Try to formulate a question you think is worth answering, so that down the road, you’ll know how to find a problem others think is worth solving.”

• # The audience models of research—thoughts on Craft of Research chapter 2

Writers can’t avoid creating some role for themselves and their readers, planned or not:

1. I’ve found some new and interesting information—I have information for you

2. I’ve found a solution to an important practical problem—I can help you fix a problem

3. I’ve found an answer to an important question—I can help you understand something better

The authors recommend assuming one of these three. There is of course a wider gap between information and the neighborhood of problems and questions than there is between problems and questions! Later on in chapter four the authors provide a graph illustrating problems and questions: Practical problem -> motivates -> Research question -> defines -> Conceptual/research problem. Information, when provided mostly for novelty, however, is not in this cycle. Information can be aimed at problems or questions and can play a role in providing solutions or answers, but it can also be offered for “its own sake”.

I’m reminded of a paper/​post I started but never finished, on providing a poset-like structure to capabilities. I thought it would be useful if you could give a precise ordering on a set of agents, to assign supervising/​overseeing responsibilities. Looking back, providing this poset would just be a cool piece of information, effectively: I wasn’t motivated by a question or problem so much as “look at what we can do”. Yes, I can post-hoc think of a question or a problem that the research would address, but that was not my prevailing seed of a reason for starting the project. Is the role of the researcher primarily a writing thing, though, applying mostly to the final draft? Perhaps it’s appropriate for early stages of the research to involve multi-role drifting, even if it’s better for the reader experience if you settle on one role in the end.

Additionally, it occurs to me that maybe the “I have information for you” mode is just a cheaper version of the question/problem modes. Sometimes I think of something that might lead to cool new information (either a theory or an experiment), and I’m engaged more by the potential for novelty than I am by the potential for applications.

I think I’d like to become more problem-driven: to derive possibilities for research from problems, and make sure I’m not just seeking novelty. At the end of the day, I don’t think these roles are “equal”; I think the problem-driven role is the best one, the one we should aspire to.

[When you adopt one of these three roles, you must] cast your readers in a complementary role by offering them a social contract: I’ll play my part if you play yours … if you cast them in a role they won’t accept, you’re likely to lose them entirely… You must report your research in a way that motivates your readers to play the role you have imagined for them.

The three reader roles complementing the three writer roles are

1. Entertain me

2. Help me solve my practical problem

3. Help me understand something better

It’s basically stated that your choice of writer role implies a particular reader role, 1 mapping to 1, 2 mapping to 2, and 3 mapping to 3.

Role 1 speaks to an important difficulty in the x-risk, EA, alignment community, which is how not to get drawn into the phenomenal sensation of insight when something isn’t going to help you on a problem. At my local EA meetup I sometimes worry that the impact of our speaker events is low, because the audience may not meaningfully update even though they’re intellectually engaged. Put another way, intellectual engagement can be goodhartable: the sensation of insight can distract you from your resolve to shatter your bottlenecks and save the world if it becomes an end in itself. Should researchers who want to be careful about this avoid the first role entirely? Should the alignment literature look upon the first reader role as a failure mode? We talk about a lot of cool stuff, and it can be easy to be drawn in by the cool factor, like some of the non-EA rationalists I’ve met at meetups.

I’m not saying reader role number two absolutely must dominate, because it can diverge from deconfusion which is better captured by reader role number three.

## Division of labor between reader and writer, writer roles do not always imply exactly one reader role

Isn’t it the case that deconfusion / writer role three research can be disseminated to practically minded (as opposed to theoretically minded) people, and then those people turn question-answer into problem-solution? You can write in the question-answer regime, but there may be that (rare) reader who interprets it in the problem-solution regime! This seems to be an extremely good thing that we should find a way to encourage. In general, reading that drifts across multiple roles seems like the most engaged kind of reading.

• There’s a gap in my inside view of the problem. Part of me thinks that capabilities progress such as out-of-distribution robustness or the 4 tenets described in Open Problems in Cooperative AI is necessary for AI to be transformative, i.e. a prereq of TAI, and another part of me thinks AI will be xrisky and unstable if it progresses along other aspects but not along the axis of those capabilities.

There’s a geometry here: transformative / not transformative crossed with dangerous / not dangerous.

To have an inside view I must be able to adequately navigate between the quadrants with respect to outcomes, interventions, etc.

• If something can learn fast enough, then its out-of-distribution performance won’t matter as much. (OOD performance will still matter, but it’ll have less to learn where it’s good, and more to learn where it’s not.*)

*Although generalization ability seems like the reason learning matters. So I see why it seems necessary for ‘transformation’.

• testing latex in spoiler tag

Testing code block in spoiler tag

:::hm? x :: Bool -> Int -> String :::