Formative vs. summative evaluations
(This is a series of comments that have been turned into a post.)
In the field of usability engineering, there are two kinds of usability evaluations: formative and summative.
Formative evaluations are done as early as possible. Not just “before the product is shipped”, but before it’s in beta, or in alpha, or in pre-alpha; before there’s any code—as soon as there’s anything at all that you can show to users (even paper prototypes), or apply heuristic analysis to, you start doing formative evaluations. Then you keep doing them, on each new prototype, on each new feature, continuously—and the results of these evaluations should inform design and implementation decisions at each step. Sometimes (indeed, often) a formative evaluation will reveal that you’re going down the wrong path, and need to throw out a bunch of work and start over; or the evaluation will reveal some deep conceptual or practical problem, which may require substantial re-thinking and re-planning. That’s the point of doing formative evaluations; you want to find out about these problems as soon as possible, not after you’ve invested a ton of development resources (which you’ll be understandably reluctant to scrap).
Summative evaluations are done at or near the end of the development process, where you’re evaluating what is essentially a finished product. You might uncover some last-minute bugs to be fixed; you might tweak some things here and there. (In theory, a summative evaluation may lead to a decision not to ship a product at all. In practice, this doesn’t really happen.)
It is an accepted truism among usability professionals that any company, org, or development team that only or mostly does summative evaluations, and neglects or disdains formative evaluations, is not serious about usability.
Summative evaluations are useless for correcting serious flaws. (That is not their purpose.) They can’t be used to steer your development process toward the optimal design—how could they? By the time you do your summative evaluation, it’s far too late to make any consequential design decisions. You’ve already got a finished design, a chosen and built architecture, and overall a mostly, or even entirely, finished product. You cannot simply “bolt usability onto” a poorly-designed piece of software or hardware or anything. It’s got to be designed with usability in mind from the ground up. And you need formative evaluation for that.
The same principles apply when evaluating, not products, but ideas.
The time for clarifications like “what did you mean by this word” or “can you give a real-world example” is immediately.
The time for pointing out problems with basic underlying assumptions or mistakes in motivating ideas is immediately.
The time for figuring out whether the ideas or claims in a post are even coherent, or falsifiable, or whether readers even agree on what the post is saying, is immediately.
Immediately—before an idea is absorbed into the local culture, before it becomes the foundation of a dozen more posts that build on it as an assumption, before it balloons into a whole “sequence”—when there’s still time to say “oops” with minimal cost, to course-correct, to notice important caveats or important implications, to avoid pitfalls of terminology, or (in some cases) to throw the whole thing out, shrug, and say “ah well, back to the drawing board”.
To start doing all of this only many months later is way, way too late.
Note that “formative evaluations” need not be (indeed, will rarely be) complete works by single contributors. In intellectual discussion as in usability engineering, evaluations will often be collaborative efforts, contributed to by multiple commenters.
And, accordingly, evaluations (whether formative or summative) are made of parts. If a commenter writes “what are some examples”, or “what did you mean by that word?”, or any such thing, that’s not an evaluation. That’s a small contribution to a collaborative process of evaluation. It makes no sense at all to judge the value of such a comment by comparing it to some complete analysis. That is much like saying that a piston is of low value compared to an automobile. We’re not being presented with one of each and then asked to choose which one to take—the automobile or the piston. We’re here to make automobiles out of pistons (and many other parts besides).
Such parts, then, must be judged on their effectiveness as part of the whole—how effectively they contribute to the process of constructing the overall evaluation. Questions like “what are examples of this concept you describe?”, or “what does this word, which is central to your post, actually mean?”, are contributions with an unusually high density of value; they offer a very high return on investment. They have the virtuous property of spending very few words to achieve the effect of pointing to a critical lacuna in the discussion, thus efficiently selecting—out of the many, many things which may potentially be discussed in the comments under a post—a particular avenue of discussion which is among the most likely to clarify, correct, and otherwise improve the ideas in the post.
Of course, reviews serve a purpose as well. So do summative evaluations.
But if our only real evaluations are the summative ones, then we are not serious about wanting to be less wrong.
I certainly agree with the emphasis on formative over summative evaluation, but I think the application of these concepts later in this post isn’t quite right.
A core issue for posts (or any other medium, really) which present new ideas is that they usually won’t give the best presentation/explanation of the idea. After all, it’s new, people are still figuring out where the edges of the concept are, what misunderstandings are common in trying to communicate it, how it does/doesn’t generalize, etc. And crucially, that all holds even when the idea is a good one.
So a challenge of useful formative evaluation of new ideas is to separate “fixable” issues, like poor presentation or the idea just not being fully explored yet, from “unfixable” issues, problems which are core and fundamental to the entire idea. And of course this challenge is further exacerbated by various “fixes” requiring specific skill sets which some people possess, but most don’t.
One example consequence of all that: in practice, “can you give a real-world example?” is usually a much more useful contribution to discussion of a new idea than “what do you mean by this word?”. Accurately explaining what one means by a word is an extremely difficult skillset which very few people possess; almost anyone asked what they mean by a word will give some definition or explanation which does not match even their own intuitions about the word, even when their own intuitive understanding is basically correct. (As evidence, one can look at the “definitions” people offer for standard everyday words; think Plato’s chicken.) On the other hand, people are usually able to give real-world examples when their ideas have any concrete basis at all, and this is a useful step in both clarifying and communicating the idea.
Another example, which came up when writing bounty problems a few months back: we’re pretty sure our problems are gesturing at something real and important, and the high-level mathematical operationalization is right, but some details of the operationalization might be off. This leads to an important asymmetry between the value of a proof vs. a counterexample. A proof would be strong evidence that the exact operationalization we have is basically correct. The value of a counterexample, however, depends on the details. If the counterexample merely attacks a noncentral detail of the operationalization, then it would have some value in highlighting that we need to tweak the operationalization, but would not achieve most of the value of solving the problem. On the other hand, a counterexample which rules out anything even remotely similar to the claim, striking directly at the core idea, would achieve the main value.
I agree with this. I think that’s one of the reasons why it’s so important to have discussion in the comments to a post be as unconstrained as possible—so that ideas can be developed, the best versions of them discovered (or constructed).
Totally agreed.
“What do you mean by this word?” is generally not asking for a definition, or a fully accurate explanation, or any such thing; rather, it’s asking for any kind of useful handle on what is even being discussed.
Take the linked example. The answer which ended up being given to “what do you mean by this word” was pretty helpful! But it wasn’t really a “definition”, nor all that “accurate” an explanation… but any further discussion[1] could now proceed usefully on the basis of the ideas in that response, rather than floundering because the intended meaning of the word in question was totally opaque to many readers.
Which, sadly, failed to take place, due to destructive interference from the LW mods.
Unconstrained discussion is not necessarily conducive to robust formative evaluation. Not all criticism is constructive. In fact, criticism can derail discussions by exhausting and demoralizing one’s conversation partner, or by wasting time on noncentral issues.
Steering conversation productively and in a way that reinforces healthy working relationships imposes substantial constraints on the conversation. These constraints are a good thing.
In the marketplace of ideas, given a sufficiently well-designed and well-incentivized karma system, useful and constructive criticism will be rewarded, while useless and counterproductive comments will be punished. This gives the author of a post good initial feedback on what comments contain worthwhile insights that should be addressed/incorporated. The rest, if focused on noncentral issues or appearing useless to the author, can be answered like this:
If this response seems to elide actually important criticisms made by the initial commenter, it is highly likely other commenters will jump in and point out the author’s response is insufficient. This would give further feedback to the author that they should maybe come back and think about the commenter’s perspective some more.
I believe the LW karma system is an instance of this “well-designed and well-incentivized karma system” I was talking about earlier. At the very least, it is more so an instance of it than literally any other comment ranking system I have come across in my life. Somewhat surprisingly to me, several high-status users on this site (including mods!) disagree. I have carefully considered their perspective, and rejected it as self-serving and flatly wrong.
The sibling comment by @sunwillrise says most of what I’d want to say to this, and better than I would’ve said it, so I have little to add but that I endorse his response.
I’ll add just one thing:
In my experience, this is a euphemism for “ensuring that high-status people retain their high status, and that (high-status) conflict-averse people don’t have to deal with real disagreements” approximately 99% of the time.
That’s because this principle is so obvious that it typically goes without saying. Speaking such phrases aloud can indeed serve as a form of rhetoric to defend status hierarchies. Yet when one person is deploying non-constructive criticism, high-status people have the most clout to push back. If their high status genuinely depends on bona fide healthy working relationships and productive conversations, then they’ll be extra motivated to deliver this pushback.
In summary:
The fact that these phrases can also serve as rhetoric to defend status hierarchies is not a particularly strong argument against the ideas they express.
Status hierarchies often depend on productive conversations and healthy working relationships, so it’s no surprise that defending one is entangled with defending the other.
Separately from my other comment:
First, the thing about “one person is [saying/writing whatever things that are somehow problematic]” is that if the things in question are just obviously bad, then they can be downvoted and ignored and then that’s the end of it. If it’s some criticism that has a known and obvious answer, reply with a link to that answer and then ignore. If the criticism is due to the critic having failed to read some part of the post, point that out and then ignore. And so on.
The only reason why you would perceive any need to “push back” is if the criticism were not obviously bad. But if it’s not obviously bad, well, then it deserves an answer. Even if it actually is bad! (And why are you so sure that it’s bad, in this case? But let’s set that aside…)
And in this sort of case, well, by all means “push back”—with a reply. (And once again, please note that it won’t do at all to say “ah but what if the critic persistently makes the same bad criticism”, etc.—in that case, hyperlinks are your friend. Answer it once, then link back to the answer. This is one of the many, many benefits of public discussions: you can write a thing once, and refer back to it thenceforth.)
(This is approximately the same point as made earlier by @sunwillrise.)
Second, as I’ve noted before, the idea that only “constructive” criticism is good is false. Sometimes destructive criticism is good. (“Whatever can be destroyed by the truth, should be.”) The proper standard is not whether criticism is “constructive”; it’s whether the criticism is correct and relevant.
(In the Soviet Union, it was a standard ploy to reply to certain sorts of arguments or views by saying that they are, for instance, “not in accordance with dialectical materialism”, or “not Marxist-Leninist”, or “that’s Trotskyism!”, etc. This was a considered knockdown argument (and in practical terms it was, because arguing against it means “straight to gulag”)—but of course if you say something and your interlocutor says “that’s Trotskyism!”, this does not actually address the question of whether it’s… true. “Constructive” is one of these sorts of words, which functions as a way to sidestep the question of truth. “That criticism is not constructive!”—yeah, I’m sure it’s also Trotskyism, but let’s get back to the point, shall we?)
Third, on the question of “healthy working relationships” and so on. Sure, it’s usually good, all else being equal, to have a healthy working relationship with someone. But if you make “healthy working relationship” an optimization target, then it gets Goodharted immediately—the measures taken to “reinforce healthy working relationships” immediately become decoupled from the reasons why it’s good (all else being equal! which it’s often not!) to have healthy working relationships. (And of course this is generally deliberate. This sort of line is used, quite knowingly, by high-status people, as a euphemism for “threats to my status are unacceptable”.)
Responding to criticism of any kind is costly in time and emotional energy. Criticism may feel unpleasant to the recipient, or turn out to be incorrect or unimportant, and that’s OK to a certain extent. We need to tolerate a certain amount of net-negative criticism so that people feel like they can afford to make occasional mistakes when attempting to deliver constructive criticism. When an individual delivers a large amount of net-negative criticism (i.e., a mix of incorrect, unpleasant, and time-consuming criticism) over an extended period of time, that eventually becomes a problem.
Constraints on discussion are typically implicit guardrails, respected by most people, that make participating in the conversation, project, or community sustainable for most of its members. Those guardrails are sometimes self-imposed (e.g., individuals weighing the consequences of saying vs. not saying X) and sometimes externally enforced (e.g., criticizing the critic, or criticizing the community for having the wrong approach to tolerating criticism).
My argument is that it would be a very bad idea to optimize for the fewest possible guardrails on people’s ability to criticize, and that this seems to be what you’re advocating for in the space of online discourse, at least on LessWrong (“it’s so important to have discussion in the comments to a post be as unconstrained as possible”). A policy of zero guardrails allows conversations to be routinely derailed by Gish gallops, and I claim not only that this can happen, but that it’s common knowledge that it happens consistently. There are a number of strategies communities can use to impose guardrails that mitigate this problem, all of which do have substantial costs that those communities appear largely willing to bear because of the much higher cost of allowing Gish gallops to destroy spaces for meaningful discourse.
In certain situations, like Soviet Russia, brutal enforcement of excessive guardrails against true and important criticism can become a much bigger problem than the sort of Gish gallops that degrade online discourse and that we are discussing here. But I regard that as so different from the problem that we’re discussing here that it’s a red herring and I’m not interested in further discussing comparisons between guardrails in online discourse and anti-speech enforcement in totalitarian regimes.
I wholly reject this entire framework.
We ought not even consider the question of whether criticism is “unpleasant”. That it’s unpleasant to receive criticism is just an obvious, banal fact about human psychology. We take it as a baseline assumption, but it’s completely misguided to endorse that reaction. It is a bias to be overcome. Otherwise… well, we’ve been over this.
If criticism is incorrect, then say why it’s incorrect. That’s the whole point of having a discussion. You speak as if everyone always knows in advance what is correct and what is not! If that were true, what the heck would be the point of… any of this? This whole website, the whole rationalist project?
(As for the notion that only “constructive” criticism is good—well, I’ve already addressed that.)
For one thing, I do not advocate, and have never advocated, a policy of having zero guardrails. There ought not to be personal insults, like “you’re an idiot and an asshole” (which is to say, such things should always receive moderator attention, with a view toward heavily discouraging them; I can conceive of exceptions where such comments may be allowed, but they ought to be exceedingly rare). There ought not to be vulgarity. There ought not to be doxxing. There ought not to be spam. The posting of dumb memes and similar low-value content should be discouraged. AI-written text should be heavily policed. Probably I could think of several other obvious sorts of “guardrails” if I gave it more thought, and doubtless you could also. So please refrain from claiming that I endorse a “zero guardrails” policy; I don’t.
As for Gish gallops (and similar things)—I agree that such things are bad! But you know what is a very easy way of dealing with them?
Posting a reply that says “that’s a Gish gallop”.
And then downvoting the comment, and moving on with your life.
(And, as described earlier, if your judgment on that question is mistaken, then other commenters can reply to say “actually, no, that comment makes good points, you’re wrong about it being a Gish gallop, and here’s why”.)
(Indeed, this seems like an excellent role for the moderators to take on: when someone posts bad content like Gish gallops, comment to point this out; when someone unfairly labels a good comment as a Gish gallop, comment to point that out, too.)
You’re quite thoroughly mistaken about this. The problem is not the brutality of the enforcement (do you think that the problem went away when the punishment stopped being “straight to gulag” and became more like “you can kiss goodbye to any career advancement or professional accolades”?); the problem is the ideological approach itself—the “it’s Trotskyism!” reply. If that sort of thing is even allowed to stand without receiving the withering scorn that it deserves, and even more so if it is enforced as the officially sanctioned and presumed-to-be-correct reply, then it’s utterly corrosive to any kind of intellectual work or truth-seeking.
That you see this as a “red herring” is a huge mistake on your part. This kind of problem arises in many forms, and it is fatal to the sort of project that Less Wrong is ostensibly engaged in.
“For one thing, I do not advocate, and have never advocated, a policy of having zero guardrails.”
I’m glad to hear you are comfortable with at least some guardrails. But you did specifically say “it’s so important to have discussion in the comments to a post be as unconstrained as possible.”
“As constrained as possible” means “no guardrails.” If you meant something different, you should have said what you meant, or at least acknowledged that you made a mistake.
(Brackets correcting what I presume is a typo.)
Here is an easy test to see the incorrectness of that reading of my comment: if, on a literal reading, it seems like I really meant “no guardrails” in the sense that you claim to have taken me to mean, then this would mean that I’d be opposed to the moderators deleting obvious (e.g., Russian penis enlargement pill) spam. Does this seem remotely plausible to you?
And before you protest further, let me remind you that we’ve already had this conversation. The link is to a comment thread where I say, in direct response to you specifically, that I dislike and do not endorse vulgarity and name-calling.
Additionally, in this comment (posted in the comment section of the same post as the one linked above), I say:
In this comment (in that same comment section), I say:
Finally, in this more recent comment (on the topic about “moderation tools” etc.), I say:
I have consistently and unambiguously expressed opposition to such behaviors, and support for rules forbidding such behaviors, including (I emphasize again) in direct response to you, personally.
Perhaps you forgot about those past statements. If so, let this be a reminder. I hope that there will be no further confusion on your part about what my position on this matter is.
Can you say more about what makes criticism “constructive” vs. “non-constructive”? If the idea is that constructive criticism proposes solutions (here’s how the thing could be better) rather than just pointing out problems (here’s why the thing is bad), then requiring criticism to be constructive seems bad, because if critics have to propose a solution at the same time that they point out a problem, that prevents pointing out problems that don’t have an immediately apparent solution (but for which a solution might be found in time with further discussion).
I don’t have a comprehensive definition of constructive criticism—I know it when I see it. Presenting solutions is not necessary for constructive criticism.
To take a stab at defining it, constructive criticism is about optimizing the alignment between proposed methods and true goals given the resources available. Non-constructive criticism may be actively worsening the alignment, but it could also just be inefficient. The risk of inefficiency is the issue I’ve been pointing out above.
Without claiming that this is necessarily un-virtuous, I hope that you can see how this sort of thing is evidence for the claim that “constructive criticism” is not primarily a good criterion for truth-seeking, but rather is primarily a weapon for suppression of criticism.
I confess that I have no idea what exactly you could mean by this. I think that it would be most helpful if you could supplement this intensional definition with an extensional one.
In my experience, this is also a euphemism for “threats to the status of high-status people” approximately 99% of the time.
To lay cards on the table, what are you trying to accomplish with this back and forth with me? I’m trying to find common ground and make some important distinctions, but it seems to me like you’re trying to vent or cut off the discussion. Is that what you’re trying to do?
By no means. I am saying true and relevant things, in response to things you are saying which seem to me to be seriously mistaken. The purpose of this is to enable all of us to become less wrong about these very important issues.
I don’t perceive you as having understood what I was saying, or as having addressed my central points.
Please feel free to point out what you think I got wrong, what I missed, etc.
Not only in comments; some ideas are better tested by trying to apply or build on them, rather than just discussing theoretically. It does produce a body of work to reevaluate, though, so those posts should indicate the premises somehow.
Totally agreed. (Ideally, we’d see something like: a post that says “I have an idea, how can I test it? I’m thinking I can try X” [and then the commenters might add Y and Z as additional suggestions for testing] → (the OP and/or other people go and test the idea) → one or more posts that say “I did X Y Z to test that idea, here’s what I observed” [and then there’s discussion] → (rinse, repeat). That’s not necessarily a formal procedure, but something like this cycle happening on a regular basis, with ongoing integration of the learnings from each iteration, would definitely be what we’d want to see, if our interest lies in useful ideas about the real world.)
I think this is good advice. As some feedback, I’d focus on the fact that usability testing is a pull operation, not a push—it takes a lot of effort to guide customers/reviewers toward helpful dimensions. The lessons don’t necessarily apply to online forums or other communication channels.
I also think, that around here, some ideas seem to take root despite significant challenges in early comments—this isn’t a matter of not getting feedback, but of not listening to feedback.
Regardless, this is good advice to people who are looking to strengthen their ideas.
Seems right (Frame Control is an example of this).
Nevertheless, salient top-level comments criticizing the post are nonetheless important for signaling that strong yet intelligent pushback exists (important because of Lonely Dissent reasons) and for coordinating opposition to attempts to introduce a flawed/incoherent/flat-out wrong concept into the site zeitgeist (examples that jump to mind are Kenshō, Circling, etc).
Could you say more about this? I’m not sure that I know what you’re referring to, here.
That’s… true-ish, I think, but my intuition here is that lack of feedback (of a slightly different sort) is responsible for this effect too. But I am not very sure of this; I’d have to think about some specific examples of this to have a better idea of what’s going on there.
It might be worth noting that the formative/summative distinction originated in educational theory and is widely used by schoolteachers.
That makes a lot more sense, since the Summative Evaluation can inform changes in the way next year’s cohort of students is to be taught (or the next round, whenever the material is taught to a new group)—so even if you can’t do anything about the last cohort, you can fix it for the next. Whereas in the way it’s been presented in this post for Product Design and “ideas”—the die has already been cast, forever.
(Note: the parent comment was originally asking about formative evaluations.)
No… this is definitely untrue for product design! That’s the whole point! Formative evaluations are done before anything has been set forever! You do formative evaluations at the beginning of the product development process (and then continuously, throughout the rest of said process).
I really do urge you to follow the link to the NN Group’s article about formative vs. summative evaluations in usability engineering. This is not some sort of utopian, hypothetical, pie-in-the-sky proposal that I’m describing here—it’s how things are in fact done, routinely, in many, many companies and organizations and project teams. (The idea of the “minimum viable product” is a related one. Likewise mockups, proofs of concept, etc.—there are many versions of basically the same idea, and such practices are ubiquitous in effective teams and organizations.)
Sorry, I got confused, will fix—I meant “Summative”, the one that comes after the school year is over, or near its end.
Right… well… I do also explain in the post what summative evaluations are for (and why their usefulness is limited, although definitely not zero).
How does this work in practice when someone has an idea they want to present in a post? Are you suggesting that they go towards some kind of immediate peer review as a Formative Evaluation?
How many examples are there of this where the consensus was once treating it as an assumption, and now it isn’t? (I’m more asking within the Rationalist community, not historical paradigms like Heliocentrism or Miasma Theory or even the unfortunate wrongful accusation of Sunil Tripathi.)
Uh… no. “Peer review” is something that happens to a work after it’s been published (EDIT: Not quite true; see comments below). Formative evaluations are… the opposite of that. I mean, what are you asking, exactly? How do discussions of an idea work? In the usual way: you write a post, people write comments where they discuss the contents of the post. Comments like “what do you mean by that word?”, or “what are some examples of that?”, or “could you clarify what this part means?”, or “if this thing you say in this here part is true, then it seems like X follows—do you agree, and how does this affect your idea?”, or “interesting points; here are some further thoughts on this”, or “here is some related work—what are your thoughts on these things?”, or “does this apply to X?”, or… etc., etc. You know… discussion.

(If you’re not familiar with evaluation in usability engineering, it might also help clarify things if you clicked the links I include and read about how it’s done there; analogies to discussion of posts on a forum should present themselves pretty clearly, I think.)
I don’t understand the connection between your question and what you quoted. I never said anything about anything that “the consensus was once treating it as an assumption, and now it isn’t”, so I’m not sure why I would have any examples of this. Please clarify what you’re asking here?
Incorrect; peer review is reviewing a draft before the work is published.
But you’re saying formative evaluations must happen before an idea is absorbed into the local culture. Isn’t the moment it’s posted already too late for that? Since you’re stressing the importance of instantaneous evaluation. I’m just getting very confused about what this process looks like on a forum.
There are a lot of links, which one would you prioritize?
Which specific ideas in the past have you seen absorbed into the local culture and become the foundation of a dozen or more posts prematurely, or that needed more formative evaluation? Those examples.
True, that was definitely a misstatement on my part.
However, what is still the case is that peer review happens when the work is complete. It doesn’t inform the study design, it doesn’t effect serious changes in the direction of the work, etc.
No. Of course not. Why would that be the case?
Instantaneous…? No…
I… genuinely don’t see what you’re saying here. Why would when it’s posted be too late? This doesn’t make any sense to me.
I mean… I described it. It works by… discussing a post. In the comments. This is… really straightforward and ordinary. I’m really not suggesting anything weird here.
Huh? No, there’s just the one link: “Formative vs. Summative Evaluations”, at nngroup.com.
Oh, I see. Sure, here’s one: “frame control”.
Peer review usually results in papers being accepted with minor or major revisions, and very much can and does effect serious changes in the study design. You can read the peer-review back-and-forth in many journals; it is often pretty interesting. Machine learning and computer science are different because they usually publish in conference proceedings. That means there are very tight deadlines, so it’s more common to rebut the reviewers’ comments than to make anything beyond very minor changes. In my opinion, that’s why peer review is regarded so poorly in ML: there’s not much paper improvement going on as a result of the process.
The study design… of the paper being reviewed?
Yes.
Ok… how does this work exactly? You submit your paper for peer review, they say “you should’ve done this differently from the start”, and you go back and start over…?
Isn’t that… basically what I said? You submit a basically finished product, which might get rejected and you have to start over. But you’re not submitting a paper for peer review midway through the study, right…?
You submit a finished product, yes, and it can in principle be accepted without revisions, though I have never actually heard of that happening, to me or to anyone I know. Or it might get rejected (but if so, no, you don’t have to start over: if it was sent for review, you will receive feedback you can use to improve the study, and you may be invited to resubmit after making those changes, or you might submit the same paper to a different journal). Hopefully, it is accepted with major or minor revisions, so you go away and make the requested changes over a few more months, and then the reviewers take another look. These changes can be, though are not always, significant alterations to the study design.
Examples from my recent experience: I recently submitted a paper that developed a new data analysis method and then evaluated it on two different synthetic datasets. The editor then asked for revisions: obtaining and using observational data as well as synthetic data. That’s not changing the original study design, but it is a new chunk of research, a lot of work, and the results have to be interpreted differently. Another paper that I co-authored has been asked for major revisions which, if implemented, would be a massive change to the setup, data used, analysis methodology, and narrative of the paper. The lead author is still deciding whether to do that or to withdraw and resubmit somewhere else. On the other hand, I have often been asked for only minor text changes to explain things more clearly.
In Nature, the peer review files are openly available for each article, and they are pretty interesting to read, because papers there often go through quite significant changes before publication. That’s a good way to get a sense of how papers and studies can evolve as they go through the peer review process. But, yeah, I assure you, in my experience as an author and a reviewer, it is a collaborative process that can really reshape the study design in some cases.
Ah, so you’re just suggesting that people ask more questions like “what does this important word mean?” and “can you give more examples of that?” That is, the onus falls on the commenters, not on some kind of micro-peer-review panel before an author publishes a post?
That the overall process doesn’t need to change; people should just ask more of these kinds of questions, and do so right after the post is published. Am I oversimplifying it?
At the risk of sounding like a broken record: all you’re imploring people to do is ask more of these types of questions in the comments, immediately?
Yes, of course. Just regular commenting on a post. (Of course, the “draft sharing” feature of LW that lets an author share a post draft with some small set of users, for them to comment on it prior to publication, is sort of like a “micro-peer-review panel”, and that’s fine too. Although I wouldn’t call it anything like that; it’s just… commenting on a draft.)
Right, for people to ask more of these kinds of questions, and for authors to invite more of these kinds of questions, and for authors and other commenters to take these kinds of questions as invitations for discussion.
And for the moderation system to not interfere in this process. (This has been the biggest obstacle by far to anything like what I suggest working properly on Less Wrong.)