Scientifically optimizing education: Hard problem, or solved problem? Introducing the Theory of Direct Instruction
Re-edited to remove/integrate much of the added notes—Sep 5th:
This is a long post, and it was a first attempt to simply start trying to explain the whole topic, and see what kind of mistakes I made in the communication.
I did indeed make many mistakes, and started to feel that I should ask people not to read this original attempt at first, and so posted added notes to the beginning to say so and try to clear up the worst confusions.
But now that my audience is starting to close the inferential gap themselves thanks to amazingly wonderful people like Misha with “What Direct Instruction Is”, I think important points that I tried to express in this original foray might start to become more transparent to that audience.
It’s still a very long post, with lots of new terminology, and, as Alicorn said, “sales-y enthusiasm”.
If you do read it, I must ask that you please don’t skim, giving me the benefit of the doubt that anything confusing or nonsensical-seeming might actually be something that’s important and meaningful in some non-obvious way that you do not yet understand, and that some of the ‘sales-y enthusiasm’ and “applause lights” may have been intended to serve some useful purpose.
Again, please don’t skim (although it is completely my fault if you feel like skimming!), because I just don’t know how to do any better until I get more feedback on how the complete whole of what I wrote is understood.
If you do start skimming, and give up, just tell me where you did so.
[The “added notes” from the first edit I’ve removed, and will go through at a later time to extract anything that was original and useful and integrate it into the post itself or whatever.]
Thank you.
Begin original:
In this post, I’m going to introduce Direct Instruction, or DI (pronounced Dee-Eye, capital D, capital I, accept no imitations). DI is essentially the theory of how to find the best way to teach anything to anyone. And I mean a theory in the true scientific sense: parsimonious, rigorously pinned down by experiment, and with an impressive history of predictive successes.
Furthermore, it bestows upon the skillful wielder astonishing powers of engineering, allowing them to accomplish educational feats that are nothing short of spectacular compared to what’s traditionally accepted as, well, acceptable. If DI were universally implemented in the school system, it would easily raise the average intelligence to a level that would be considered genius by the standards of today’s average.
Or say someone wanted to set up a community that could consistently raise all its citizens (starting from children or adults) to formidable heights of intelligence, abilities, and rationality. DI would be one of the foundational tools they’d need to make it happen.
It’s obvious how this should interest anyone with the LessWrongian mission of changing the world from the crazy-stupid mess it is now to a sane, smart, good place to live.
And that’s my main purpose in writing this post: to interest. I’m not going to write a tutorial to teach or scholarly report to convince, because that would be redundant with resources already out there (and a lot more work!). Instead, I’m going to do my best to compress a very broad, very deep subject into a (relatively) short “Hey, check this out!”-style piece of writing, quickly hitting the highlights of the science and history well enough to explain why you should go follow the links I’ll post and get your hands on the books I’ll list.
[If you find my compression to be more confusing than intriguing in any place, please help me fix that with feedback.]
Once we’re all on the same page with respect to this background, I’ll be able to write another post on the details of how we could use this powerful tool to win. I’ll talk about the highly unusual round-about way I originally came to find out about DI myself, and how that allowed me to notice some creative strategies that (I think) should allow DI to win against strongly established anti-rational forces in the field of education (“How we can help DI win”), as well as ways in which we could apply DI towards our goals ourselves, within our community (“How DI can help us win”).
Again, “what we can do for DI”/“what DI can do for us” in a later post. This post will just be a super-abridged intro to DI itself, pretty much saying “Hey, check this out!”.
And at this point, I feel I should move on to a *slightly* more concrete description of what DI looks like (but nowhere near as concrete as showing a particular program [like “Reading Mastery Signatures level K”] at this point).
So, what does DI look like in practice?
One thing that it’s important to understand about DI is that, while it’s certainly possible for someone who’s very proficient with the theory to teach amazingly well, it is not necessary for the teacher to even know the difference between induction and deduction.
Because of the very logical nature of the sequences implied for teaching any particular learning outcome, following algorithms that are thoroughly understood by their designers, it is possible for a small number of expert ‘educational engineers’ to create scripted courses that whole schools of non-experts can easily be trained to use for great success. (In somewhat the same way we can have a sufficient number of pilots in the world without having to train each and every one of them to build their own planes, and certainly not expecting them to cobble them together from bits.)
These courses control what the teacher does and says, provide carefully matched expansion activities and independent student work, and even specify the correction procedures that may be necessary. They also provide tests to place new students at the appropriate level in the program, and frequent tests of mastery throughout the course.
These courses are designed using logical rules derived from the basic axioms of the theory (which have been empirically pinned down as correct!), and then, like any high-quality complex machine, field-tested to find any design-errors and correct them before full-scale production (or rather, printing).
[And yes, this logical, algorithmic aspect of the instruction means that DI would be extremely well-suited to creating computer-delivered lessons. If you remember the big fuss about ‘Computer Assisted Learning’ ages ago, yup, DI finally makes it possible for CAL to actually deliver on all those gushing promises.]
Unfortunately, this blessing of scalability-thanks-to-algorithmic-scriptability is also DI’s curse. The very idea of using such tightly scripted lessons immediately sticks in the craw of the vast majority of teachers, and showing them graphs of the overwhelming data showing how much better it is for the students is not very effective, and nor is explaining the theory or attempting to straighten out their philosophical confusions.
For instance, an often articulated concern is that these scripted lessons will restrict the creative freedom of the teacher. A common counter is that this is analogous to claiming the creative freedom of a driver is restricted by not having them design and build their vehicle as they drive it, knowing nothing about the science and engineering necessary.
But of course, this rhetoric, while pretty damn well aligned with evidence, has not been amazingly successful as a strategic tool. It seems that the only significant way in which normal teachers are converted over to DI is by actually using a program correctly themselves, and seeing the amazing difference in their own kids. Unfortunately this does not lead to a multiplication factor greater than one in the spread of DI.
But the subject of DI’s historical and continuing struggle to overthrow the anti-scientific establishment of the field of education is covered in some of the resources I’ll list at the end of this post, so I won’t go into any more detail here. And again, I will discuss creative strategies for turning the tide of this struggle in a later post. At this point, knowing that this is an audience that does properly appreciate experimental evidence, I will move to a discussion of something called Project Follow-Through.
Evidence from Project Follow-Through:
Project Follow-Through was originally conceived as a social program in “The War Against Poverty”, but, due to lack of funding, ended up morphing instead into the largest educational experiment in history. It ran nine years from 1968 to 1977, cost like a billion bucks, and involved over 200,000 students in over 170 communities across the US, from kindergarten through to grade three.
It had a ‘planned variation’ design, otherwise described as a ‘horse race’ between all the different models popular in the field of education at the time, comparing them as composite wholes to find which worked best (rather than prematurely trying to isolate the effects of different variables within the models). And despite some name changes, the range of ideas in these models is pretty much representative of the common ideas in the field today.
Each school site was ‘sponsored’ by one of the competing models, or was self-sponsored. Sponsors got funding and some support to make sure their models actually got implemented in all the schools they were responsible for.
The majority of involved communities had disadvantaged populations, and the average level of performance in the controls was at the 20th percentile.
Data was handled by two third-parties, with Stanford Research Institute using a variety of standardized achievement tests to collect it all, and Abt Associates doing the analysis.
Here’s a graph showing performance in different basic skills for the nine models with sufficient data to evaluate, relative to that 20th percentile baseline:
Pretty striking, eh?
Here’s another graph showing children’s gains/losses for the nine models in the area of basic skills (the above-graphed skills lumped together), cognitive skills (things like problem solving, creative thinking, etc), and affective skills (things like self-esteem, sense of responsibility for own learning, attitude towards school, etc). Baseline zero represents children who did not participate in Follow-Through.
You can see along the bottom that although models had been pre-classified as focused on primarily addressing one of these three areas, none of the ‘affective’ models had a positive effect on affective skills, and none of the models had a positive effect on cognitive skills except for ‘basic skills-oriented’ DI, which raised everything, and quite a lot.
Also, while I’ve never looked very deeply into the details of the other models (for rather the same reason you don’t look very deeply into the details of various tribal witch-doctor systems when you want to know about physics), I’d bet that DI was the sponsor with the most problems getting sites to implement the model properly. These results by no means show the limits of what DI can do.
Follow Through was by far the largest and best-funded experiment, doing the most comprehensive comparison of DI and the other competing ‘theories’ in the field. It is also easy to tell as a dramatic story, hence my selection of it for an introduction.
However, there have been many other interesting experiments since, demonstrating impressive things that DI makes possible (and confirming that both low-performers and high-performers are best served by DI!). One researcher who conducted a meta-analysis of 34 studies making 173 direct experimental comparisons of DI and non-DI educational interventions said this:
The mean effect size average per study is more than .75, which confirms that the overall effect is substantial. … effects of .75 and above are rare in educational research. DI’s consistent achievement of such scores is unique in educational research. [My emphasis]
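To get a rough feel for what an effect size of .75 means, here is a minimal sketch. It assumes the effect size is a standardized mean difference (the gap between group means, measured in standard deviations) and that scores are roughly normally distributed. The quote does not say which measure the meta-analysis used, so both of those are my assumptions, and the function name is made up for illustration.

```python
from statistics import NormalDist


def percentile_after_shift(effect_size, start_percentile=50.0):
    """Percentile reached by a student who starts at start_percentile
    and moves up by effect_size standard deviations.

    Assumes normally distributed scores (an assumption, not from the quote).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = std_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * std_normal.cdf(start_z + effect_size)


# A student at the 50th percentile who gains 0.75 standard deviations
# ends up at roughly the 77th percentile of the original distribution.
print(round(percentile_after_shift(0.75)))
```

Under those assumptions, an effect of .75 moves a middle-of-the-pack student past roughly three quarters of the comparison group.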
Again, I’ll list resources at the end. At this point, I’m going to move on from evidence demonstrating DI’s outstanding superiority in the field of education, to the theory of DI itself.
A quick sketch of the basic theory:
The LessWrong audience should be uniquely prepared to ‘click’ on DI theory, already understanding things like extensional/intensional definitions, ‘looking into the dark’, thingspace, and being more likely to respond with a “hm, that sounds like it might be interesting” than a blank look if someone says ‘guided induction’.
Still, because of the depth of the subject, I had particular trouble in compressing this section, because I had to choose between:
a) Writing this section as a detailed-and-easy-to-follow intro to the very beginning, but leaving you with no clear idea of how far it goes from there
or
b) Writing it as a super-abridged whirlwind tour in order to better capture the full breadth, but with some risk of ending up burying you in an avalanche of new terminology and lightning jumps from concept to concept
I ended up opting for (b) here, since a tutorial for the basics already exists as an online open module at Athabasca University (as usual, link later), and my purpose in this post is, as I said, more to pique interest than to teach.
[But again, if you find the whirlwind below more confusing than intriguing, please help me fix that with some feedback.]
So, I’m gonna jump right in and kick things off the same way as the book Theory of Instruction: Principles and Applications, and tell you that ‘the analysis of cognitive learning is at the intersection of three other analyses’. One is the behavioral analysis of the learner, also called the ‘response-locus analysis’ in DI. It’s covered in DI theory, but all I’m gonna do here is note that and move on to the other two analyses: that of the communication used to teach, and that of the knowledge systems being taught.
These, the analysis of communications and the analysis of knowledge systems, form the ‘stimulus-locus analysis’, and are the utterly fascinating first focus of DI.
Imagine you want a student to learn something, so you present an instructional sequence, and it fails. You wonder, why? If you had a hundred copies of that student, and you presented the exact same sequence to all of them, would it fail 100% of the time? Or would it succeed some portion of the time because the student has some random chance of correctly ‘guessing what it’s supposed to mean’ occasionally?
Of course you can’t do an experiment like that, because there’s no way you could control the variable of the learner finely enough. So what about controlling the variable of the stimulus used to communicate with the learner?
What you could do is create a ‘logically faultless communication’, with a structure that you know from logical analysis of the communication itself will be successful with a learner with certain characteristics. (Then, even if the instruction fails, you end up with some highly specific information about the learner, which you can then use to figure out how to create success, by applying it to a behavioral analysis of the learner).
[The term “logically faultless communication” does not suggest that if a learner fails to learn, then the learner is the problem and not the theory. In fact, the most common aphorism in the DI community is, “If the learner hasn’t learned, the teacher hasn’t taught”. Until this seems perfectly consistent to you, you will know for sure you are not understanding the technical meaning of “logically faultless communication”.]
The basic axioms of the stimulus-locus analysis, therefore, are:
1) The learning mechanism of the learner can learn any concept/quality from examples
2) The learning mechanism generalizes based on the samenesses of examples (it ‘makes up a rule’)
(Note that how exactly the ‘learning mechanism’ does these things is unimportant here; this isn’t a theory of learning, but of instruction.)
Given these axioms, and a minimal amount of information about the learner’s prior knowledge, it’s now possible to design the logically faultless communication as a sequence of positive and negative examples of the concept to be taught. The major principles for doing this, which logically follow from the axioms, are:
- Signals of positive and negative must be clear and consistent
- Only the features the learner is supposed to generalize should be shared by the whole set of positive examples
- Greatly different positive examples must be juxtaposed to show the range of variation of the concept
- Minimally different positive and negative examples must be juxtaposed to show the borders of the concept
- The instruction must integrate a test of generalization
This is why I say that a huge part of the basics of DI is ‘guided-induction’ (my term, not used in the field).
Aside: If you’re familiar with the logical induction game ‘zendo’, it’s like you’re playing some sort of backwards version where you as the Master are trying to communicate the koan to the Students as well as possible by first showing a sequence of koans with and without Buddha-nature, and marked so, and then presenting a sequence of unmarked koans for the Students to respond to.
That’s a quick sketch of the basis of the analysis of communications (skipping over lots of details of how this plays out in controlling extrapolation, stipulation, and interpolation and stuff).
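As a toy illustration of two of the principles listed above, here is a minimal sketch in Python. Everything in it is my own assumption for the sake of the example: representing each example as a dictionary of features, the function names, and the sample sequence for “red”. It is not how DI designers actually build or check a sequence; it only shows that the principles are concrete enough to test mechanically.

```python
def shared_features(examples):
    """Return the (feature, value) pairs common to every example."""
    common = set(examples[0].items())
    for example in examples[1:]:
        common &= set(example.items())
    return common


def differences(a, b):
    """Return the names of the features on which two examples differ."""
    return {name for name in a if a[name] != b[name]}


def boundary_pairs(sequence):
    """Adjacent positive/negative pairs that differ in exactly one feature."""
    pairs = []
    for (ex1, label1), (ex2, label2) in zip(sequence, sequence[1:]):
        if label1 != label2 and len(differences(ex1, ex2)) == 1:
            pairs.append((ex1, ex2))
    return pairs


# Hypothetical sequence for teaching "red"; True marks a positive example.
sequence = [
    ({"color": "red", "shape": "ball", "size": "small"}, True),
    # Greatly different positive: shows the range of the concept.
    ({"color": "red", "shape": "cup", "size": "large"}, True),
    # Minimally different negative: shows the border of the concept.
    ({"color": "blue", "shape": "cup", "size": "large"}, False),
    ({"color": "red", "shape": "box", "size": "small"}, True),
]

positives = [example for example, is_positive in sequence if is_positive]

# Only the feature to be generalized should be shared by all positives.
print(shared_features(positives))

# Number of adjacent positive/negative pairs differing in one feature.
print(len(boundary_pairs(sequence)))
```

If the positives had also all been small, `shared_features` would return the size as well, flagging that a learner could reasonably ‘make up the rule’ that the word being taught means “small”.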
Now, as I said, the analysis of communications is the first part of the stimulus-locus analysis, and it leads directly to the second part: the analysis of knowledge systems.
The aim of the knowledge-systems analysis is to create a classification scheme that groups concepts by their samenesses in logical structure, so that samenesses in the logical structure of concepts are systematically related to samenesses in the logical structure of the communications used to teach the concepts. Thus classification of anything you want to teach in this scheme will tell you the basic template forms you must use and the steps you must go through to design effective instruction for it.
I won’t go into any details of this hierarchy here, since that would involve explaining a lot of terminology and the concepts behind it, but that’s all in Theory of Instruction, along with details on the response-locus analysis, and details of designing programs, field-testing them, and using the data from the field-tests to correct design-errors and optimize the whole thing.
I also want to share a quote from Siegfried Engelmann, ‘the father of DI’, about when he was writing the text Theory of Instruction with colleague Douglas Carnine:
If we drew a unique logical conclusion about behavior, Doug would indicate that he knew of no experimental data on this issue and would ask if I knew of any empirical data. The answer was usually “no,” so Doug would conduct a study.
Ten studies alone were done on all the details of the template for teaching a basic non-comparative concept like “red”. For instance, the presence of negative examples, the number of features they differed from positives by, the way examples were juxtaposed, variations in presentation wording, etc. In every case, DI theory’s unique and detailed predictions were validated.
However, at this point I feel that I have probably focused enough on the very basic principles of the theory, the application of which is relatively obvious to the teaching of basic discriminations, but at a greater inferential distance to more advanced concepts. Therefore I’m going to hop over to one of my favorite short examples of the original kind of thinking that comes out of DI about how to teach things.
A quick example of a more advanced application of the stimulus-locus analysis:
This section is a quick adaptation of something I wrote elsewhere. I’m including it here as an example of how DI can produce unexpectedly original conclusions about how to teach various things, which differ greatly from what is intuitively obvious, but which, once understood, are obviously logically overwhelmingly superior.
This particular story starts with a list I once saw presented in a book as an example of possible long-term goals for a kindergarten class. In context, it was just used as an example of what an explicit list of goals might look like, but it was the very last bullet that caught my eye:
“Develop basic math concepts (for examples, numbers 1-20 and shapes)”
Numbers 1-20.
How can I best put this.… EPIC FAIL!
It’s not at all obvious at first why this is so wrong, so I’ll explain by outlining the correct way to teach the transformation relationship between numerals and their English names, and the rationale for this method.
You don’t teach 1-20 first, you teach 1-99. And you do it in a very special order:
- First, teach 1-10
- Then, do the 40s, 60s, 70s, 80s, and 90s
Why? Because this is the simplest, most regular, largest subset of this numeral-name transformation relationship.
The rule is simply, “First you say the number of tens, add a ‘ty’ (which is just a distorted ‘ten’), and then follow it with the number of ones if it’s not a zero”. So you see “41”, and you think “okay, that’s four-ty-one”.
[Note that this verbal explanation I just presented is not how this is presented to the kids. This is a description of what you’re teaching, not how.]
- Then you move on to teaching the 20s, 30s, and 50s.
Why? Because these form another large subset, which involves one more addition to the rules that governed the last subset: You simply distort “two” to “twen”, “three” to “thir”, and “five” to “fif”, so that you get “twen-ty-one” rather than “two-ty-one”.
- Then you can move on to 14, 16, 17, 18, and 19.
This subset is far smaller, and involves more complicated behaviors.
The part of the number’s name that tells the ones digit comes first, followed by the part that tells the tens digit, “-teen”, which is another, different distortion of ten.
You think: “14 → ten-four → teen-four (distort) → four-teen (invert).”
- Then you can do 13 and 15 (the tiniest subset), which are the same as above, but also involve another distortion of “three” → “thir” and “five” → “fif”. (Luckily this distortion is already familiar to the kids from working the 30s and 50s! Clever, eh?)
- And finally, the wacky irregulars 12 and 11 can be thrown in.
This order is optimal for making clear to the learner that there is an orderly relationship here. They get the simple rules that cover the largest single group of cases, then they get the slightly more complicated rules that cover the next largest subtype, etc.
This makes clear what the basic pattern is, that the exceptions are exceptions, and exactly how they are exceptions.
Thus you can teach 1-99 far faster and easier than you can 1-20.
[Note that the student must not be worked to mastery on each subtype before the introduction of the next, because this would induce stipulation that the subtype was universal, but given proper pacing for any sequence of introduction, this order is optimal.]
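The subtype rules above are regular enough to write down as a short program. The following is a minimal sketch under my own assumptions: the function and table names are made up, and the output is the hyphenated spoken pattern used in this post (“four-ty-one”), not standard English spelling. It describes what is being taught, not how it is presented to the kids.

```python
ONES = ["", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine"]
# Distorted stems: "two" -> "twen", "three" -> "thir", "five" -> "fif".
DISTORTED = {2: "twen", 3: "thir", 5: "fif"}
IRREGULAR = {11: "eleven", 12: "twelve"}


def subtype(n):
    """Teaching-order subtype for 1 <= n <= 99 (1 is introduced first)."""
    tens = n // 10
    if n <= 10:
        return 1
    if tens in (4, 6, 7, 8, 9):
        return 2  # regular: tens stem + "ty" + ones
    if tens in (2, 3, 5):
        return 3  # same rule, with a distorted tens stem
    if n in (14, 16, 17, 18, 19):
        return 4  # inverted: ones part first, then "teen"
    if n in (13, 15):
        return 5  # inverted, with a distorted ones stem
    return 6      # the irregulars 11 and 12


def name(n):
    """Spoken-pattern name of n, hyphenated to show the parts."""
    if not 1 <= n <= 99:
        raise ValueError("only 1-99 are covered")
    tens, ones = divmod(n, 10)
    if n < 10:
        return ONES[n]
    if n == 10:
        return "ten"
    if n in IRREGULAR:
        return IRREGULAR[n]
    if tens == 1:
        # 12 was handled above, so only "thir" and "fif" can apply here.
        return DISTORTED.get(ones, ONES[ones]) + "-teen"
    parts = [DISTORTED.get(tens, ONES[tens]), "ty"]
    if ones:
        parts.append(ONES[ones])
    return "-".join(parts)


# Order of introduction: by subtype, then by value within a subtype.
teaching_order = sorted(range(1, 100), key=lambda n: (subtype(n), n))

print(name(41), name(21), name(14), name(13))
print(teaching_order[:12])
```

Counting the members of each subtype makes the argument for the ordering visible: the first rule after 1-10 covers fifty numbers, the next covers thirty, and the teens that usually get taught second are split across the three smallest groups.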
Can you imagine being a very young kid, truly naïve to this concept (not having had a ridiculous amount of informal exposure at home as a kid from a non-disadvantaged background), and having someone try to teach you 11 to 20 after just getting up to 1 to 10?
For 11 and 12 you’re thinking somewhere in your brain, “Okay, does every number have its own unique name as you keep counting up?” (You might also wonder: “How many numbers are there, anyway?”)
For 13 you really haven’t seen anything that contradicts that ‘every number gets its own unique name’ hypothesis (how likely do you think it is to occur to you that the ‘thir’ is related to the ‘3’ in ‘13’ and the ‘teen’ to the ‘1’? Nah, ain’t gonna happen).
At 14 it might occur to you to wonder if the ‘four’ in ‘fourteen’ has something to do with the ‘4’ in ‘14’, but since ‘FIFteen’ doesn’t seem to have a ‘five’ in it, you’ll move that hypothesis to the backburner.
At the introduction of 16 you’ll go, “Hm, I wonder if the ‘six’ in ‘sixteen’ is related to the ‘6’ in… Naaaah, I’m not gonna fall for that one again!”
At 17 you start to reconsider it. 18 and 19 bring it back up to full level of serious consideration, by which time you’re pretty sure there’s at least some bits with some sort of pattern in here...
And then they throw ‘twenty’ at you.
Huh? I mean, huh?
Now hopefully you can see how the obvious intuitive way of teaching something can be not merely, “Oh, maybe it could be better if you did, like, this or that”, but actually downright horrifyingly logically broken and wrong and bad.
Whosoever adopts this crazy ‘teach my kindergarteners 1-20’ goal is going to horribly slow down and confuse their kids. Not just ‘they might be able to teach it better’. They’re doing it wrong.
In DI, a relationship like this one between numerals and their names is classified as a ‘transformation concept’, and the treatment I described above is called ‘subtype analysis’. Hopefully it should now seem quite reasonable to think that this abstract concept could be applied to the teaching of many other not obviously related things (like grammatical conjugation rules in a foreign language, for instance), and that similarly surprising-yet-logical conclusions would be drawn by the stimulus-locus analysis for other concepts in the classification schemes as well as these ‘transformations’.
Moving on, I feel at this point that I am unlikely to improve the quality of my super-abridged compression significantly per unit of additional agonizing over it, and I will now present the promised list of resources.
Resources on DI online and in print:
- The introductory open module at Athabasca University
This provides a very short biography of Siegfried Engelmann (as I mentioned, the ‘father of DI’), an overview of Project Follow-Through and associated history, and a much easier to follow introduction to the basics of the theory and the application of the stimulus-locus analysis to the first of the ‘basic forms’ in the classification hierarchy.
- The book, Theory of Instruction: Principles and Applications
This is pretty much the equivalent of Newton’s Principia for the field of education, except luckily it’s not written in Latin.
Ironically, in reading this text you will often find yourself wishing that the techniques in this book had been applied to the book itself (and seeing quite clearly how they could be). Understandably though, the authors had to first articulate it all rigorously for themselves, and having done so, and given the low interest in the field in a true scientific theory, they decided to focus their engineering efforts on creating more programs for school children instead.
Nevertheless, the LessWrong audience shouldn’t find it too difficult. The AthabascaU module largely covers the basics presented in the first few modules, and having read that and thus already having the concepts in mind, it’s quite easy to adapt to the language, after which the extremely logical nature of the ideas presented makes it quite easy to follow.
- The book, Research on Direct Instruction: 25 Years Beyond DISTAR [DISTAR was an early set of DI programs focused on arithmetic and reading]
This is the source of the quote from the meta-analysis I mentioned. It also covers studies on other things such as a program for teaching deaf and non-deaf people to interpret spoken words transformed into tactile vibrations, and some experiments that falsified Piaget’s developmental theory (!)
Theory of Instruction also has a section on research.
- Engelmann has also written two books intended for a popular audience with titles that may be overly provocative from a strategic standpoint, but are definitely spot-on in terms of accuracy: “War Against the School’s Academic Child Abuse” and “Teaching Needy Kids in Our Backwards System”.
These books deal with many educational issues which are more often than not both historical and current. Much of the material is presented in a partially autobiographical context.
Although I believe we are going to be able to largely step around most of the frustrating quagmires of institutionalized irrationality detailed in these books, I believe it’s still good to have a good understanding of exactly what it is we’re side-stepping, and many interesting bits of science and things are tied into a common narrative framework too. And finally, since they’re written for a popular audience and quite easy reads, I would definitely recommend these books as worthwhile.
- Engelmann’s personal website zigsite.com has many interesting short (and not-so-short) documents. I would recommend “Curriculum as the cause of failure” (a couple pages are duplicated in that pdf) and its contextual prologue, for instance, and the video interviews.
How much of the material you’re interested in depends on how much you just want to know only about the science itself, and how much you want to know about the horrible lack of science in the field outside of DI.
- I also found the interview by Children of the Code (a dyslexia organization) to be worthwhile. It’s in two parts, here, and here.
Conclusion:
That is probably a sufficient amount of material for now. I’m hoping that your first taste will draw you in to voraciously devouring everything you can get your hands on, as happened for me.
However, there is one very significant way in which my experience will differ from yours. As I mentioned in passing, I originally found out about DI in a very unusual round-about way. In fact, I became interested in it long before I first heard of it.
To make a long, complicated story as short and streamlined as possible:
Some years back, I decided I wanted to teach myself French, and after much failure, eventually stumbled upon a set of audio lessons that used something called the “Michel Thomas Method”.
The difference between these lessons and all the other ‘teach yourself’ and formal instruction I’d messed about with was simply incredible. In about a month of using these audio lessons on my mp3 player while walking or riding the bus, I had a strong grasp of the entire structure of the language, and could use it to express my own ideas in a conversational context.
Needless to say, I was a) very excited, and b) very angry that none of the supposed experts in language learning had told me about this sooner.
I wanted to know what this “Michel Thomas Method” actually was. Would it work for everyone, or just learners like me? Would it work for subjects other than languages?
I eventually tracked down a book called “The Learning Revolution” by Jonathan Solity (which I had to order from the UK), and it was here that I first found references to “Direct Instruction” and “Ziggy Engelmann”. I googled it and, like I said, was soon hooked.
But what I got from Solity’s book in the end (although not exactly what he said), was that everything in the Michel Thomas lessons that made them so unusually effective was an approximation of DI.
To explain the way I usually think about it now, I’ll make a short digression to summarize one of Engelmann’s articles criticizing “research-based” educational reforms in reading (“The Dalmatian and Its Spots” on zigsite.com if you want to read it yourself).
In it he basically says this:
- These reforms were targeted at mandating that reading instruction have certain features (eg. paying some attention to phonemic awareness), because research had shown that instruction that was effective had these features, and therefore if instruction had these features it would be effective
- However, this is like saying that all dalmatians have spots, and therefore if something has spots, it’s a dalmatian.
So I would now say that the Michel Thomas programs were dalmatians rather than merely spotted. A bit mangy, with some mutt in them, but dalmatian enough to suffice for many practical purposes.
Aside: Nobody knows whether Michel Thomas, now deceased, was ever directly aware of Engelmann’s work, but he must have at least started developing some of the principles independently, given details of his rather dramatic biography in Europe during WWII which I won’t go into. And independent recreations of the same things are common in science and technology, after all.
At any rate, my personal emotional experience—failing very hard at learning something I wanted to do, and then finally succeeding quickly and easily thanks to, surprise, an instructor that actually had a clue how to teach—is unquestionably responsible for a lot of the enthusiasm I have for this subject. And I just felt I should mention that.
A final aside: If you’re interested in learning a language yourself, I can personally recommend both the French and Spanish courses. (I haven’t used the German and Italian, and don’t know about the courses for other languages made by other people after Michel Thomas’s death.)
I can’t recommend that you simply download these from the internet, since that may be illegal in some jurisdictions, but there’s a good chance you can find a copy at a local library, as I originally did.
Having used these courses does provide an enlightening additional perspective on DI, as well as being, as I mentioned, the context in which I originally thought of some strategies that could allow DI to finally win against irrational forces in the educational establishment, which I will talk about in a later post.
Of course it’s not necessary to have the same experience yourself in order to understand what I’ll talk about, if you are not particularly interested in learning (or already know) French or Spanish, but if you are, then it would definitely be worthwhile.
And that, I believe, wraps up this super-sized “Hey, check this out!”
I look forward to your feedback.
I have only skimmed your post, but now feel motivated to leave feedback as requested. It is possible that some of my objections are misplaced, addressed somewhere in the depths of this article that my eyes glazedly passed over. In fact, my first complaint is that:
it is too long. LW tolerates long articles under limited circumstances and this doesn’t meet any of them (you’re not an established poster, don’t have fifty footnotes with sources, don’t apologize off the bat for length, and have missed many obvious opportunities for compression/excision). You should have made it much shorter (500 words about what the hell Direct Instruction consists of) or much much shorter (a two-sentence blurb with a link to more information).
It’s sales-y. Full of applause lights (counted five instances of the string “rational” in your text). You claim that your intent is to pique interest, but that is not done by saying “This thing is interesting! This thing is interesting!” repeatedly in the local idiom.
It is badly structured. Rambles all over the place. If you laid out the contents of your article in conceptspace and made me walk from point to point in the order you present them, my feet would get tired and I would become dizzy. You have definitely not convinced me that you have learned a secret of how to teach things, on a meta as well as object level.
It makes you look like a crank. If DI needs this much fluff and meandering and enthusiastic pitching, it’s probably not interesting. Oops.
In fact, the only reason I am bothering to think about this article ever again, having successfully scrolled all the way down to the unnecessary signature, is that you do repeatedly ask for feedback. If you’re sincere about that: I invite you to post, as a reply to this comment, a 1-3 sentence description of what DI is, plus one sentence about whatever evidence (beyond your enthusiasm about it) exists for its splendidness. (Last sentence but not the first 1-3 can be/consist primarily of linkage.)
I’m in strong agreement, with the one addition that it would be interesting to see an explanation of DI using DI principles, and then an explanation of how the principles were used to shape the explanation.
I’d like to see this too. The example given of DI was way too contrived.
Yes.
I felt like I was slogging through this article to glean the relevant bits, which turned out to be the axioms and principles and the history and study evidence for the efficacy of the system.
To the author:
Keep those relevant bits. Expand on them with examples and ramifications. Cut everything else. Aggressively.
Assume that you’re talking to people who already care about the effectiveness of instructional methods; who are already basically aware of issues such as neurotypicality/neurodiversity even; and who are aware of the difference between saying you care about a goal, and actually acting to maximize that goal.
Assume that we already have knowledge and skills that we desperately want to teach, because we highly value ensuring that humanity retains that knowledge and those skills. Give us tools to accomplish our existing goals.
Ouch ouch ouch, but thank you for helping me.
I’ll have to think about a good “1-3 sentence description of what DI is” (in terms of what you can do with it, or in terms of its internal structure, or...?) while I’m at work, but as for evidence, if I said:
“Look at those two graphs from Project Follow-Through, because the meta-analysis says this stark difference between DI and other models of education keeps showing up in experiments comparing them.”
Is that at all useful?
How about:
Direct Instruction is an educational theory which extensively used experiments during its creation, and seeks to explain educational concepts in a sensible and efficient way. The amount of experience individual teachers can acquire pales in comparison to the amount of experience curriculum-builders can acquire, which DI takes advantage of by giving teachers heavily researched, and thus effective, scripts and practices to follow. Experiments dramatically verify the superiority of DI over other instruction methods.
In three sentences, you can’t explain anything about the method besides its inputs and its outputs, which is what other people are interested in. If they’ve got a use for your method, then they’ll start asking about its moving parts.
That explains DI’s origins, but doesn’t say anything about what it looks like and how it is different from other approaches. (Why is it called Direct Instruction, after all?)
Right, that’s because I don’t know anything about what it looks like and how it’s different from other approaches. (I haven’t dug any deeper than this article and Wikipedia.)
Agreed, the main problem is that you don’t seem to explain what the theory is until deep into the text. The section ‘what it looks like in practice’ doesn’t actually tell us anything other than that scripts are maybe involved, and that teachers don’t like it (personally I found the tone unnecessarily harsh there).
The next sections with graphs seem persuasive, but again at this point we don’t know what it is they are proposing, or where this data is coming from. Perhaps better would be just to link to the study with a quick (one or two sentence) summary of its findings.
Finally we get an explanation of what it is we have been talking about all this time in ‘a quick sketch of the basic theory.’ First paragraph is why we should like it, then an explanation of how you are explaining it.
As far as I can tell what it’s saying is:
All I get from the example is that it’s better to teach via rules than via memorisation.
At the risk of being harsh or flippant, is that it? That doesn’t seem like a massive educational innovation to me (though I should say I don’t know much about the US system which seems the example here).
If there is something more to it (which I would hope given your enthusiasm) you need to make it very clear what that is. All the stuff about evidence, teachers and examples is unhelpful until we know that.
[Hope this helps, apologies if it sounds harsh.]
From my understanding of the alternate education literature, pretty much everyone is bitter at teachers as a group and has pretty good reason to be. The harshness didn’t even register until now because I see it so often.
Is the thesis that the majority of teachers and education employees are engaged in an active conspiracy to prevent the introduction of new teaching methods? I find that difficult to believe without substantial evidence. (It sets off my warning lights by similarities to claims made by ‘alternative’ medicine advocates about the biased scientific establishment and general conspiracy theories.)
Even if a conspiracy existed, given the incentives individual teachers/schools/local governments have to improve performance of students, why wouldn’t they defect and reap the benefits?
I can’t speak in sweeping generality here, but I can speak from my own experience. (I’m finishing a dissertation in educational research and bump into this pretty frequently.)
Teachers who have been “in the trenches” for any appreciable length of time (say, 5+ years) have developed some sense of what is necessary from a pragmatic point of view. They’re usually extremely well-intentioned toward the children, and often for that reason they resist a lot of the suggestions brought to them from educational research. There’s this sense that education researchers are too ivory-tower to know how things really happen in a classroom. There are exceptions, but they’re actually quite rare.
However, not all education researchers are naive about this issue. I know of many who were teachers in public schools for a decade or so before they moved on to research to do something about the mess they personally encountered. They’re able to build face-to-face rapport more readily with teachers. But that’s simply not good enough to convince teachers as a collective to try something new, even if the pragmatics of how to do it are spelled out in excruciating detail. They seem to resist change purely because it’s change, especially if it interferes with their personal emotional impressions of what it means to teach (which, unfortunately, were instilled more by accidental impressions than by training or rational scrutiny).
In addition to this challenge, a painfully large portion of education researchers don’t apply reasonable scientific methods. The majority of articles I’ve encountered in this field amount to philosophical commentary, which is often based on spotty evidence that to me would constitute preliminary research. So surprise surprise, a lot of the reform suggestions being proposed are saturated in cognitive biases that no one seems to be at all aware of let alone make any effort to account for. And, thus, most “well-researched” reform suggestions really don’t work in classrooms! So on the rare occasions that teachers are willing to try something radically new based on a new theory (or are required to by fiat), in most cases it doesn’t make much of a difference at best.
And this is ignoring the fact that researchers frequently forget about (or never consider!) the inferential distance teachers have to traverse to even understand what they’re supposed to do let alone do it well enough to teach. The “New Math” of the 1960s was a spectacular failure both because the theory behind it was psychologically and neurologically bankrupt and also because the teachers weren’t able to implement the original idea anyway.
So no, I don’t think it’s a conspiracy. I think it’s more that teachers are tired of having people try to mechanize their profession as though it didn’t require any skill to interact with children, especially since the vast majority of attempts to do so that they’ve encountered failed dramatically—often because the reform wouldn’t have worked anyway, but often because it was implemented very poorly due to undertraining the teachers in the first place.
Of course it’s not a “conspiracy” per se. It’s just yer standard lost purposes at work, in a field where the population really doesn’t have a strong grasp of the basic philosophy of science, leaning very much to “romanticism” rather than “enlightenment”, I mean.
Does the Rosenhan experiment imply any ‘conspiracies’ in psychiatry?
Thanks for your help. It’s more… well, look: We at LessWrong are very familiar with the idea that if you really want to pin some hypothesis down, there’s a certain minimum amount of work you have to do. You can’t just magically jump there without certain bits of information, and processing the bits together in the right way. And most of the work goes into just locating the general area of the correct hypothesis.
Okay, that’s obvious. We get that. The big thing DI does (in the stimulus-locus analysis) is turn that sideways and apply it to teaching.
There are certain bits of information without which the naive student can not just magically figure out what we’re trying to communicate, so we need to sequence the introduction of those bits of information, make each logically unambiguous, and prompt the proper processing of the bits together in a manageable context that lets us ensure each step has gotten across properly before moving on to the next bit. The teaching communication should be designed to at least strongly imply the correct conclusion as early as possible.
That make any sense?
Aha!
This is a very useful and relevant explanation, which would have been unutterably more useful to read in the article itself.
Thank you, I’m glad to know I’m at least making some progress (I sweated over my first attempt for ages and it ended up terrible, but then just a little time of feedback and back and forth discussion seems to really be tightening up my understanding of what I need to communicate and how! That’s probably a highly generalizable principle :P - actually reminds me of some “just make an attempt already if failure is low cost!” post I saw some months back in main… can’t find it right now, but maybe you remember the one I mean. [Edit: it was “Just Try It”])
Anyway, is this also helpful? (Even ‘unutterably’ so? :P)
Agree that the article has serious structure and style problems, but a quick check at Wikipedia confirms that DI seems to be approximately as awesome as you say it is, and the biases of the educational establishment certainly seem worth investigating. Upvoted.
I agree it has serious problems. I knew that when I posted it! I just thought the harder bits of content studded here and there would anchor the rest better until people got a better idea what I was talking about from their own research. Clearly I made some miscalculations.
Do you think the notes I added to the beginning as a replacement are a good start at straightening out the mess of confusion I’ve made?
ADD-ON: Are this comment and this comment helpful? (Reading those as a conversation with FiftyTwo, I mean).
Oh, and by the way, I wanted to tell you that your post on “dissolving diseased questions” holds a special place in my heart. I found it soon after finding LessWrong itself (through HPMOR, through David Brin’s blog), and it was just wonderful, the feeling of, “There are other people in the world who think like this! And understand it better than I do, so not only can I communicate with them, I can learn from them!”
So, uh, yeah. Thanks =]
Downvoted. The information is too diluted, the post is too long and badly formatted, and after reading it I don’t know what Direct Instruction is and how it differs from other approaches. The post is written in the same style as all spammy advertisements recommending products whose only description is that they are amazing and revolutionary. In fact, I am very surprised that this post gained so many upvotes; if it hadn’t, I would probably have quit reading after the first paragraph.
To improve the structure, I’d suggest:
Explain as concisely as possible what DI is and how it is different from other approaches. You have said that DI somehow uses tested algorithms. Is that it? Are there any other postulates or specific characteristics of the method? Why is it called Direct?
Give a few actual examples. All I have found is the fictional example with teaching numbers (did somebody try your method to teach numbers in practice?) and an anecdote about your experience with learning languages (different people like different methods, why do you think your taste is universal here?), and there is no clear way to see how these examples correspond to DI.
Since your graphs are the most salient evidence of the effectiveness of DI, they should be accompanied by a link to their source.
A slightly less emotional and more technical style would improve apparent trustworthiness of the post. Specifically, don’t replace evidence by strong words (“whosoever adopts this crazy ‘teach my kindergarteners 1-20’ goal is going to horribly slow down and confuse their kids”).
Well… yeah, you pretty much nailed it with “too long and badly formatted”. NOT my best piece of writing ever. Yeesh did I ever drop the ball on that one.
Do the notes added at the beginning as a replacement for the whole long thing help to start clearing stuff up any?
The notes are somewhat helpful, yes. They directed me to the Athabasca University page, where there is a sort of … description. Well, to my dismay even they don’t say what DI is (except for six vague features, which isn’t much given that they say it is the only theory of instruction) before they give the readers an exercise on whether they can recognise DI or not. But perhaps that’s DI in practice.
Now, let me ask a few questions about the Athabasca University presentation. They say
It means that the teacher has on average 6 seconds to (a) convey new information (b) formulate the question (c) wait for the students’ replies (d) tell them whether they are right. This seems impossible even if we allow all learners responding at once and if the taught material needs only crude memorisation, with no students’ questions, no explanations, no writing, reading, looking at graphs… So, is the claim that each learner responds 10 times per minute correct?
There is another suspect claim:
This is almost a tautology (with great potential for equivocation in appropriate, fail, faultless, need etc.), but can be also viewed as a claim that DI doesn’t need revision (the answer I chose, the instruction has a problem and needs revision, has been claimed incorrect) no matter what evidence we get. Do the DI proponents in general think that when DI fails, it’s never a sign of errors in theory, but rather imperfections in implementation of the method?
And, more generally, all examples given may be used for teaching categorization of objects. How do you teach algorithms (such as multiplication)? How do you teach history and geography? How do you teach calculus? How do you teach scientific method? Not all knowledge can be reduced to questions of the form “does X have property Y” taught by presenting series of objects which either are or aren’t Y. In the whole presentation there was not a single practically applicable example. Children don’t need to go to school to learn what “is longer than” or “not horizontally aligned” means.
{continued from last comment because of character limit}
(Again, thank you so much for working with me so patiently to get through such a big inferential distance!)
You said:
“Faultless communication” is the basis of the stimulus-locus analysis branch of the theory. If faultless communication fails with a particular learner, that gives you specific information about how the learner is not using the two-attribute learning mechanism. That tells you to shift to the response-locus analysis branch of the theory to figure out how to modify the learner so that they do use it (and the stimulus-locus analysis is more just an application of normal behavioural analysis to the situations encountered around the context of DI). For instance, it could quite possibly be that the learner is able to respond, but there is a compliance issue (that’s usually relatively easy to diagnose and correct in one step). Or it could be that the learner is missing at least one logically necessary concept underlying the task, in which case the stim-loc tells you what to probe for, and the resp-loc tells you how. Once you find it, you shift back to the stim-loc to figure out how to teach the missing background bits and integrate them. Or if the learner simply can’t produce the response, you shape it (or apply context-shaping if they can produce it but in the wrong context).
Once a learner has been correctly placed in a full DI program, they’re in a context where the probability of compliance problems is drastically cut down, and continuously receiving positive reinforcement for compliance. And in the DI program the way in which later, more complex concepts logically depend on earlier, simpler concepts has already been accounted for, and students are brought to mastery on the logical pre-reqs before the dependent task is introduced, so that kind of problem is pretty much ruled-out. (Although understanding in detail how this is so requires more understanding of the knowledge-systems analysis portion of the stim-loc, and in the AthabascaU module you’ve only been shown how the communications-analysis applies to the first of the ‘basic form’ concepts in that hierarchy [that is, single-dimensional non-comparatives - ‘non-comparative’ meaning that the value of an example as positive or negative is absolute rather than relative to the preceding example])
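The point about bringing students to mastery on logical pre-reqs before a dependent task is introduced amounts to ordering skills so that nothing appears before what it depends on. Here is a minimal sketch of that ordering constraint alone, using a topological sort. The skill names and their dependencies are hypothetical, invented for illustration; they are not taken from any DI program, and real DI sequencing involves far more than this.

```python
from graphlib import TopologicalSorter

# Hypothetical skills: each maps to the set of skills it logically depends on.
prerequisites = {
    "count to 10": set(),
    "read digits": set(),
    "single-digit addition": {"count to 10", "read digits"},
    "multiplication as repeated addition": {"single-digit addition"},
}


def teaching_order(prereqs):
    """Return one order in which every skill comes after all of its prerequisites."""
    return list(TopologicalSorter(prereqs).static_order())


def respects_prerequisites(order, prereqs):
    """Check that no skill is introduced before something it depends on."""
    position = {skill: i for i, skill in enumerate(order)}
    return all(
        position[dep] < position[skill]
        for skill, deps in prereqs.items()
        for dep in deps
    )


order = teaching_order(prerequisites)
print(order)
print(respects_prerequisites(order, prerequisites))
```

Several valid orders may exist (the two independent starting skills can come in either order), so the sketch checks the constraint rather than expecting one fixed sequence.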
Still, the two parts of the stim-loc and resp-loc do interplay a lot in practice, of course.
You teach algorithms through ‘cognitive routines’ (a classification in the knowledge-systems analysis), if they can’t be sufficiently communicated as basic or… hold on, I should lay out a quick sketch of the hierarchy:
Basic forms:
single-dimensional non-comparatives
single-dimensional comparatives
multi-dimensional non-comparative (‘nouns’, and the reason why LW familiarity with “thingspace” should help with understanding DI)
[multi-dimensional comparatives seem to be implied to me, but Theory of Instruction doesn’t even mention them. I can see how they’d be a lot harder to construct sequences for, and would in practice be already ‘naturally’ generalized by the learner once they’ve got enough examples of basic forms of the other three types]
Joining forms:
transformations
correlated-features concepts
Joining forms are the two ways in which basic forms can be related to each other.
Transformations being generalizable systems of relating various examples of the same ‘type’ to corresponding regularities in the response [like grammar rules, spelling and reading, equivalent notations, and...I think I’m actually stipulating with those examples a much narrower range of variation in what transformations can cover than I should, but you know]
And correlated-features being communications about empirical relationships between two basic forms (“if the grade gets steeper, the stream runs faster” An example of a steeper grade is shown. “Did the stream run faster?” (not shown in the example, although they understand the verbal reference to the unshown sensory discrimination of ‘runs faster’ from a previous sequence). Learner: ‘yes’. “How do you know?” Learner: “Because the grade got steeper”).
Complex forms:
communications about events (‘fact-systems’)
cognitive routines
Complex forms being just that, complex systems of basic and joining forms.
Communications about events are kind of a systematic way of designing and teaching mind-maps (which applies to a lot of things from history and geography).
Cognitive routines are algorithms, overtized so that they can be treated as physical operations. (You get any of the factors wrong in a physical operation like opening a door—unlocking it first if necessary, how you turn the handle, direction of applied force—and the environment gives you feedback: the door stays closed! But you try to read a word the wrong way and the environment does nothing to prevent you from saying the incorrect word! Independent practice on cognitive routines new to the learner is a logically insane idea, and experiments can prove it!)
Any concept that can be classified as joining can also be treated as basic, and anything that can be classified as complex can be treated through joining or basic, if the learner is already familiar. Like, you could do a non-comparative sequence “is this calculus? yes/no”, but you couldn’t teach the discrimination of ‘red’ through a cognitive routine or fact system or one of the joining forms.
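The hierarchy sketched above, and the rule about which forms can be treated as which, can be written down as a small data structure. This is only an illustrative encoding of the comment's own outline: the `Level` enum, the `FORMS` table and the function name are assumptions made here, not terminology from Theory of Instruction, and the sketch ignores the "if the learner is already familiar" condition.

```python
from enum import IntEnum


class Level(IntEnum):
    """The three tiers of the outline, ordered from simplest to most complex."""
    BASIC = 1
    JOINING = 2
    COMPLEX = 3


# The forms named in the outline, each tagged with its tier.
FORMS = {
    "single-dimensional non-comparative": Level.BASIC,
    "single-dimensional comparative": Level.BASIC,
    "multi-dimensional non-comparative": Level.BASIC,
    "transformation": Level.JOINING,
    "correlated-features concept": Level.JOINING,
    "communication about events": Level.COMPLEX,
    "cognitive routine": Level.COMPLEX,
}


def can_be_treated_as(concept_level, treatment_level):
    """A concept may be taught at its own tier or a simpler one, never a more complex one."""
    return treatment_level <= concept_level


# 'Is this calculus? yes/no': a complex topic treated as a basic form.
print(can_be_treated_as(Level.COMPLEX, Level.BASIC))
# 'Red' is a basic discrimination, so it cannot be taught as a cognitive routine.
print(can_be_treated_as(Level.BASIC, FORMS["cognitive routine"]))
```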
But yeah, calculus as an unfamiliar topic could of course be largely approached as a body of inter-related cognitive routines (and their inter-relations mean the teaching of the whole body can be much simplified by applying single- and double-transformations).
Cognitive strategies like ‘scientific method’… well, read this comment. Like I say there, all the concepts represented by your brain on an idea like ‘reductionism’ must themselves be reducible somehow. We might not be practically able to reduce the whole huge thing in vaguely the same way we can’t calculate the exact aerodynamics of various shapes, but we can apply basic principles to do a lot better than just throwing something together (that analogy feels a bit looser than my other physics ones, but you get the point).
And I realize most of that was probably ridiculously hard to follow and pretty much most useful to me as practice reviewing the material, but unless you have some reason to think that the book Theory of Instruction is just 376 pages (not counting index and references) of crank techno-babble by two Ph.D.’s (fine, Zig Engelmann’s is honorary from Western Michigan University, but whatever, he’s also a recipient of a Council of Scientific Society Presidents award) who are respected by multiple other Ph.D.’s they’ve collaborated with on books and papers and the DI programs themselves… and that the contents of the book have nothing to do with the reason that the DI programs they designed actually manage to achieve success in experiments like nothing else in the field of education has...
Just get your hands on the book! Because as much as I wish I could I’m not gonna be able to repost everything in it as a series of blog posts any time soon! Check a local university library, or just order it from ADI if you can’t find a copy! (It’s forty bucks, not exactly a huge expense!)
Again, thank you thank you thank you SO much for being patient and working with me so well through such a huge inferential distance!
Thank you so much for going to the trouble of writing such long and thoughtful feedback! Especially since it’s obvious that I still have a lot of this unclear for you.
Actually, I should ask while I’m here: why are you being so diligent about pursuing this?
For me, the most obvious direction of thought would seem to be as follows (but I’m not too sure about my judgement on what would be obvious to someone who doesn’t yet understand the theory, hence why I’m checking with you):
The differences between DI and anything else (‘normal’ education and competing models together) as shown on those graphs from Project F-T are really impressive
And it did say in the meta-analysis that “DI’s consistent achievement of such scores is unique in educational research”, so the F-T results aren’t likely to be a random fluke
So there must be some explanation for that, and the possibility that the people who make this DI stuff actually know what they’re talking about—something complex and non-obvious that I don’t yet understand—at least deserves some serious consideration (...despite the fact that this idiot has so far done a terrible job at explaining what that might be :P)
If that does seem to accurately reflect what’s going on in your mind, please do tell me, because that seems like it would be of great use in fixing up my “DI to LW” communication problems.
Anyway, I’m gonna use an analogy to explain to you what this challenge of communication feels like from my perspective, and then I’ll try to give you some meatier replies to your questions.
Analogy: Imagine that you were trying to explain physics to someone who had never even heard of it. Why it’s exciting in and of itself, and the amazing engineering feats it allows you to accomplish. You gave them an introductory module on Newton’s three laws and they came back and said, “Honestly, it seemed pretty vague. And the axiom ‘moving objects stay moving unless they’re made still, and still objects stay still unless they’re made to move’ seems almost like a tautology. And how on Earth does this allow us to create ‘amazingly faster transportation’?” (Note it does not occur to them to ask the last question: “How does this allow us to engineer trains and bridges etc?”)
(Please remember, this isn’t meant as an argument by analogy! I just think it could help you to understand what I say better if you have some idea what it feels like for me trying to find good explanations. On to meatier bits.)
So you asked:
I dunno. Offhand, it sounds plausible as something a good presenter could achieve for many sections of the programs, but it’s not like it’s mandated by the theory or empirically shown to be necessary. Get your hands on the Michel Thomas lessons for a personal experience with how this is actually not too implausible. And I could try to scan a lesson from a kindergarten reading program or something for you some time, too. (I’m from Canada, and I’m staying with a homestay during my internship in Baltimore, so I’d have to ask them if their scanner is working).
Anyway, before I go any further, have you read this short comment yet? (Just because I want you to have that background and I wasn’t able to integrate it below.)
{goes to next comment because it hit the character limit per comment}
I am not much diligent, but even if I were, I doubt my ability to state true reasons for my participation in online discussions.
If it wasn’t clear, I didn’t mean differences in results, but differences in method. That’s still what I was complaining about: I have read several times how magnificent DI is, but still haven’t learned what the hell DI consists of. Well, I have a rough idea now, but it isn’t based on unambiguous statements.
This was getting interesting, but was interrupted exactly at the moment when I expected you to write the most important part: how does a DI teacher explain Newton’s laws? Can you show?
From the continuation comment:
This sounds extremely vague (much vaguer than Newton’s laws ever sounded to me). Faultless communication is, as far as I understand, a technical term with some precise meaning. What’s its meaning? How is it defined? What are the basics of the stimulus-locus theory? I assume the majority of LW readers aren’t familiar with the theory and if it is a key component of DI, you should give at least a brief explanation of its basics.
Once more, nine paragraphs or so and I am not able to make sense of it (probably because I don’t know the specialised vocabulary). Somewhere in your original post you said that DI is based on algorithms which teachers apply and this doesn’t need the teacher to understand DI on a theoretical level. So, consider me such a teacher who wants to teach multiplication and give me an algorithm to follow.
I… find myself quite surprised at the way my understanding of your response to my question (round the first three bullets) doesn’t seem to address what I meant to ask. Was I not clear enough, or were you just skimming around there (not that I don’t understand you skimming occasionally at this point)?
Man, I just read the first sentence of this comment back to myself, and...
Well, I’ve been working on less than four hours of sleep a night for the past three days. I’ma try to keep this short by giving only a limited treatment of one point you asked about, go to bed, and give you something more detailed later.
All right, I’ll ask in the DI community for advice on good examples of places in programs that teach cognitive routines (well, places that review the whole routine at once, since the initial teaching of all the components is distributed over long sections of the script, of course). (I’ll also ask if they can give me the reference to the experimental evidence on the 1-20 vs 1-99 thing, and so on.)
But yeah, the section of Theory of Instruction on “Constructing Cognitive Routines” begins on page 191 of the text, so you being a bit confused after only nine paragraphs written by a student pretty much reciting an outline of their own mental notes is not that odd.
If you could possibly find the time to check the online catalogs of any university libraries near you to see if they have the book… because if you could easily get your hands on a copy, it wouldn’t be too hard to just try skimming the section and chapter summaries.
Too many serifs.
It sounds minor, but yes, this. It’s jarring to see a long article that doesn’t use the same text style as the rest of the site.
Thanks, THAT at least is easily fixable. Didn’t even notice it with all the other formatting headaches from copy-pasting it from an open office file.
Downvoted for the extremely high ratio of enthusiasm to crunchy bits. In fact for all I know there are zero crunchy bits; I gave up about halfway down the page, when the benefits of the method were being iterated for the third time with no hint of any specifics.
Sorry, yeah, I see what you mean now. The crunchy bits were meant to be a few studs of hard content holding down the rest until people filled in more of the whole hard structure of the subject with their own research, with the excitement serving as, well, yeah, a sales pitch. To study. It’s okay to try to talk people into studying, right? :P
But yeah, that turned out to be a serious miscalculation on my part. Do the notes added at the beginning as a replacement for the whole long thing help to start clearing stuff up any?
Your notes seem to mainly consist of a long-winded apology for being long-winded. So, no. However, the link to the correct DI article on la Wik was helpful. Suggest you move it to the top. You may also want to steal the one shining jewel of that Wiki article, the summarising sentence: “Direct Instruction (DI) is an instructional method that is focused on systematic curriculum design and skillful implementation of a prescribed behavioral script.” Thank you, anonymous Wiki editor, although “skillful implementation of” is superfluous.
Man, if you only knew how wrong that is. Just something like the timing of the teacher’s signal for the student to respond can be crucial. It’s not that complicated to train a teacher to do it right, but if you don’t, hoh-boy.
To explain, imagine you have a group of very young, very naive children, and you need to induce knowledge of the basic-form concept “getting wider” (a single-dimensional comparative [‘comparative’ meaning the value of each example is relative to the previous one rather than absolute]).
You have for this a logically unambiguous communication, made of a series of positive and negative examples that is sufficient to zero in on that concept in conceptspace by showing what it is and is not, with the ordering and juxtaposition of minimal differences (between negatives and positives, to show difference) and large differences (between positives, to show sameness) tightly controlled, and a consistent prompt for the learner to properly process each example, and a test that they have done so integrated right in to the sequence. (Again, guided induction.)
The teacher is presenting examples through continuous conversion (moving their hands farther apart and closer together, or leaving them the same), and saying “watch the space between my hands. I’ll tell you if it gets wider or doesn’t get wider.” and then “it got wider/didn’t get wider” (for the first maybe five examples, which are modeled) and then “did it get wider?” for the rest, to which the kids go “yes/no” (or any clear response they can reliably produce; it can be transformed later anyway).
This works great if done properly, but imagine if the teacher messes up just the timing by asking the question while their hand is still moving? The kids will be lost!
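The sequence described above can be sketched roughly in code. This is only an illustration under stated assumptions: the widths are made-up numbers, and the function names `label_examples` and `teacher_lines` are hypothetical, not taken from any DI program. The one rule it does take from the description is that each example is judged against the previous one, with the first five modeled and the rest tested.

```python
def label_examples(widths, modeled=5):
    """Label each example relative to the previous one (a comparative concept).

    Returns (got_wider, mode) pairs; the first `modeled` examples are
    demonstrated by the teacher, the rest are asked as questions.
    """
    steps = []
    for i in range(1, len(widths)):
        got_wider = widths[i] > widths[i - 1]
        mode = "model" if i <= modeled else "test"
        steps.append((got_wider, mode))
    return steps


def teacher_lines(steps):
    """Turn labeled steps into the teacher's wording for each example."""
    lines = ["Watch the space between my hands. "
             "I'll tell you if it gets wider or doesn't get wider."]
    for got_wider, mode in steps:
        if mode == "model":
            lines.append("It got wider." if got_wider else "It didn't get wider.")
        else:
            # Asked only after the hands have stopped moving (timing matters).
            lines.append("Did it get wider?")
    return lines


# Hypothetical hand separations: big jumps between positives,
# small changes and no-change cases as negatives.
widths = [10, 30, 30, 12, 14, 13, 40, 40, 41]
steps = label_examples(widths)
lines = teacher_lines(steps)
expected_answers = ["yes" if g else "no" for g, m in steps if m == "test"]
```

The sketch says nothing about how to choose the examples well; it only shows that the label for each example depends on the one before it, which is why asking the question mid-movement leaves the learner with nothing definite to compare.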
Also see this and this
If you then still can’t find anything worthwhile in what I have to say, I’m sorry for wasting your time right now, and I’ll work with the people who are already at a closer inferential distance to the topic until I can drill back far enough to communicate clearly with people at a greater distance like you.
Thanks.
The point I’m making is that any method whatsoever requires some degree of skillful implementation; consequently, the phrase does not convey any information. All you’re doing is giving an example of what a skillful implementation looks like in this case, namely, attention to timing.
Okay. I was thinking ‘not superfluous’ as in a non-trivial detail in actually turning around a failed school serving a disadvantaged population. But saying that it ‘conveys NO information’ is technically correct, by a very narrow definition. Although I think the wiki poster made the right choice in including it, since there are often logically implied things that nonetheless should be pointed out to the reader.
Anyway, did you find anything worthwhile in those two comments I linked? Or should I stop bothering you for now?
Ok, ok, I didn’t mean to be quite that harsh. Your later comments did give a bit more useful detail. :)
I… thank you =]
I like to think that I’m pretty good at taking things in the best way possible. I’m happy to have been told by one person in a private message, “good on ya for having a thick skin”, and I’m trying to live up to that, but I have never had the pleasure of dealing so extensively with people who are just like me in regards to this quality I’m vaguely gesturing towards right now.
I mean, I knew intellectually that the people on less wrong are unusually awesome in that way like you, but it had still never really been my personal experience before, so my emotional belief was much weaker, and I was starting to feel like, “Oh, maybe everyone really is offended that I had the gall to show them such a terrible mess and ask for feedback. Maybe I should have kept trying to work it over by myself until it was perfect rather than just making an attempt...”
Oh, and if you found my later comments to have more useful detail, am I doing a good job continuing that improvement with this (first half and second half) or is that a step in the wrong direction?
You seem to be moving in the right direction: In the comments you linked, you are laying out some jargon of the DI and explaining what it means in simpler words. If at all possible, you might also supply examples taken from actual teaching; for example, when you say
it would be helpful if you could descend from the abstraction for a moment and say “For example, one time I was trying to teach trig using DI; the student was not getting why the double-angle formulas work, so the stim-loc told me I should look for a faulty understanding of [something], and I checked that by [something else]...” Failing real anecdotes, a fictional one (clearly marked, of course!) using a real DI locus schema could also be helpful. But at any rate you’re now explaining what sort of techniques are involved and what the ‘detailed curriculum’ is to consist of, and my desire for anecdotes is more about how to present it rather than what to say.
Sorry I promised I’d type that section out yesterday, but didn’t. Honestly, I’ve been juggling so many things I’d need to kage bunshin myself with my computer to handle them all.
(Yes that’s right, I just made a “Naruto” reference. :P
Can you imagine what a “Naruto” equivalent of HPMOR would be like?
...I can’t. Other than “awesome”.)
Anyway, rather than typing out the section, I found a scanner and signed up at photobucket.
Here’s the page. The section I was referring to starts at Prescriptive Applications of Programs [“programs” meaning the task analysis], and ends at the Summary.
Okay, I’ll type out the section on “Prescriptive Applications of [Task Analysis]” from page 143 of Theory of Instruction and the accompanying figures tomorrow (been working on less than four hours a night of sleep for the past three days, so I’ma keep this short right now).
The concrete example there is based off:
And the two examples of probed items are both correlated-features concepts.
But yeah, if you could possibly find the time to check the online catalogs of any university libraries near you to see if they have the book… because if you could easily get your hands on a copy, it wouldn’t be too hard to just try skimming the section and chapter summaries.
I’ll also ask in the DI community for advice on good examples of places in programs that teach cognitive routines, and ask if they can give me the reference to the experimental evidence on the 1-20 vs 1-99 thing, and so on.
How was DI tested? I can see it being a good fit for traditional “what is the answer to this question” exams, but less so for problem solving and creative thinking.
The material quoted relates DI to elementary education. Is it supposed to be good all the way up to postgraduate level?
My general feeling is “How can this possibly work?” I can’t imagine anyone teaching, say, calculus who doesn’t understand calculus thoroughly enough to answer every possible question. If this Direct Instruction thing has mechanics for dealing with that, I don’t see them. Maybe the lesson plans involved are so wonderful that every student gets it straight off and doesn’t have questions? Then the trick is in the wonderful lesson plans that I don’t see how anyone could come up with without 10 years’ experience of teaching calculus.
I suppose having better-than-average lesson plans and making teachers stick to them would yield an improvement when the teachers don’t really know what they’re doing. I don’t know if that constitutes a novel education method. It certainly doesn’t seem to be the way that Direct Instruction claims to succeed, so if that’s what’s good about it, that’s a problem.
I also don’t see any other feature of Direct Instruction that would actually make it good for teaching. But it’s possible that by reading this post and the Wikipedia article, I haven’t actually discovered the key insights of DI, and now I’m trying to figure out how the hell a horse runs so fast just by having a tail.
In that case, OP, the ball is in your court. What are the things that make DI work?
Edit: after looking at Vaniver’s post I think maybe I misunderstood the idea; maybe the point is to have really really good lesson plans. This doesn’t address the issue of answering student questions, but I can imagine a way it potentially gets addressed. Maybe students who are confused by something (and we hope there are not too many of them) are just left alone until they do poorly on the next assessment, at which point the lesson plan targeted at them tries to answer all possible sources of confusion.
Is this how it’s done? If it is, I suppose I’m not confident that it won’t work well, and I’d be open to considering those pretty graphs as evidence that it might work well. My main worry is that this method seems like it would do worse and worse as the material gets more advanced, and it might not work at all past middle school level material.
Thanks for your feedback. Does this and this help?
And the rewrite added to the beginning of the original post as a replacement for the whole long thing?
From the wikipedia article:
I can see how this would be useful in certain areas like learning the names of numbers, where obviously there is no benefit to innovation. However, I would be very cautious about applying it to wider areas, as much of what school systems are meant to teach is methods of researching and acquiring beliefs, not mere memorisation. Doesn’t this just make it easier to memorise the teacher’s password?
I would be especially cautious of trying to teach rationality this way.
[This is based on my admittedly limited understanding of the subject, so if there are details I have missed that tackle these concerns please explain.]
Seems our conversation’s getting a bit chopped up. Continuing from this (and forgetting the mixed up di/DI confusion from wikipedia)...
I’m going to quote “Theory of Instruction”:
“If the goal of instruction is to teach the learner to discover a particular relationship, actual practice in discovery is imperative. Such practice can be provided through a cognitive routine, however. The routine is demonstrated with some examples. The learner then applies the routine to other examples. By encouraging the learner to make up problems that may be solved by the routine and then testing them to see if the routine works, we provide the learner with a framework for discovery.” (pp 191-192)
Cognitive strategies are not mysteriously complete wholes. They reduce to generalizations and cross-fertilizations in pools of good cognitive routines, and cognitive routines are made up of moving parts themselves.
These moving parts are joining form concepts—transformation and correlated-feature relationships—which boil down to basic form discriminations of single-dimensional comparatives/non-comparatives and ‘nouns’ (multi-dimensional non-comparatives).
All the concepts represented by your brain on the idea of ‘reductionism’ must themselves be reducible somehow.
This making any sense? Some of the terminology related to the hierarchies in the knowledge system analysis might be slightly opaque...
Upvoted in anticipation of the DI Sequence that will be forthcoming over the next dozen months or so. Yes? :)
Well, one of the things I intend to do once I master the application of DI theory myself is create a DI course covering the material in Theory of Instruction itself.
But I don’t think I’m going to blog the entire contents of a 376-page textbook in the next year, what with the huge amount of studying and practice I have to do myself on a more advanced level, and my full-time internship as an elementary teacher.
My intention here is simply to interest some LWers in joining me in my studies, so that they have the opportunity to catch up to me as soon as possible (and given the intelligence distribution on LW, hopefully have some of them surpass me!).
Do I still get to keep your upvote?
PS: Just occurred to me that there’s a terminology mismatch I may have to remember to explain at some point. In the context of DI theory, “sequence” is usually used to mean one of the shortest units of instruction, like, directed at a single basic-form or joining-form concept (a simple joining form concept! Not a transformation concept with many sub-types, which would also be split into multiple sequences).
A ‘program’ refers to a relatively short series of sequences directed at bringing the learner to mastery of a “task” for which those concepts are logically necessary.
How an entire course like “Reading Mastery” for the entire grade-level of kindergarten is unambiguously referred to I’m honestly not sure offhand. I think it’s usually clear from context.
Very interesting. Upvoted, despite structural problems, but that has all been said already.
I have a minor interest in education (my mother is a teacher and I’m kinda her outside adviser), but I’m mostly interested in how to apply this to self-teaching. I can see how this would be awesome if you had a competent teacher to design a course for you, but how can I use DI if no such course exists?
As far as I can tell, one key idea is to construct a series of minimal examples to illustrate one (and only one) new concept. I’ve been using this for language learning very successfully so far (and it works even better combined with spaced repetition), but how do I use this for more abstract or complex concepts?
Mathematical proofs are something that I’m still struggling with in general (both constructing and understanding them). Let’s take the relatively simple proof that sqrt(2) is irrational. The presentation is fairly typical: it’s terse, no motivation for any step is provided, and the whole setup is confusing. Even worse, I don’t even see how I can apply the idea of example cases to this proof. It’s not a general property, so I can’t look at other cases. How do I now transform this into something not-confusing using DI? What would a DI teacher do here? (And ideally, how can I do this on my own without understanding it first?)
(As an aspiring polyglot, I also picked up the French Michel Thomas course and will test it this week. I had previously given up on French because of how awful the learning material for it was, compared to Japanese or Latin, and because it’s only a minor language for me, so I can’t be bothered to design my own material. Maybe this will work better.)
Cool, I look forward to discussing the French lessons with you (although honestly I’ve lately been practicing Spanish a lot more).
Remember to ask me for some small charts I made that will help you immensely in properly producing all the standard French phonemes that differ from the standard (American) English set at some point.
DI could probably be somewhat adapted to help in self-teaching, since it would at least give you a useful classification system for what possible logical structure the ideas you’re looking for might have… but that’s the possible structures of the most basic components, which for an advanced subject are arranged in relationships which make the whole thing exponentially more complex. Although different complex ideas can often differ in small ways, but...
Yeah, I dunno. It would have at least a little use I’m sure, but I would bet it couldn’t produce anywhere near the level of “magic”-seeming results that a good DI program designed by someone who already understands the material can.
But really, the basic answer is, “No. DI is simply the application of practical epistemology to teaching. When you are learning something on your own, you have to apply practical epistemology in the normal direction.”
Oh, and if you’re interested in Japanese but aren’t yet at a very high level, look for the book “Japanese Verbs & Essentials of Grammar” by Rita L. Lampkin. I took Japanese in high school, and the school only had a grade 11 class. I came into it below the level of the other students, but the teacher bumped me up to being the only grade 12 Japanese student in the school during that year. I believe that book was the single most significant factor in that, and any enabling traits I had on the side, I’m sure you already have too.
Actually, even though I still achieved a very weak grasp of the language at my peak and quickly lost most of that when I stopped practicing after grade 12, I think I would be able to apply DI theory to use that reference work to produce some great instruction on using the language expressively in a conversational context once I master the use of DI theory more.
(Essentially I would be using the book as a ‘prosthetic’ understanding of the material while I designed the instruction. I haven’t given it much thought, but I think this may not be possible in the same way with mathematical proofs [cuz that’s more cognitive routines than just transformation concepts, and cognitive routines are higher in the hierarchy of the knowledge-system analysis because they incorporate transformations as components.])
And that’s actually a project I’ve been thinking for a while about how I might eventually do.
You sound like you could make an awesome collaborator on something like that, if you got your hands on a copy of Theory of Instruction.
[Edit: note that language instruction will generally be much easier to design (for any student who already has a good adult grasp of their native language).
This is because it’s largely just subtype analysis of single- and double-transformations that you need to understand, which means you really don’t need the sections of Theory of Instruction that deal with cognitive routines, diagnosis and corrections, the response-locus analysis, and philosophical and research issues, and that’s the entire last half of the book!
Well, a bit of the response-locus analysis would be useful for teaching the production of new phonemes, but not that much theoretical detail.
The major practical difficulty would be tracking the schedule of the integrated review, simply because of the sheer number of distinct entries (vocabulary words and grammatical patterns), but it would be relatively easy to design a Computer Assisted Design program to help with that.
Oh, and the subtype analysis, which is the most complicated theoretical part, needed for ordering the introduction of the transformation concepts, is only needed for teaching all the basic grammatical structures of the language.
Once you’re done with that, all that’s left is just vocabulary and idioms, which pretty much just follow the same logical template over and over again.
(The only difference is that you’d start to be able to provide more and more of the definitions and directions of the instruction entirely within the target language.)]
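As a rough idea of what such a tracking helper might look like, here is a minimal sketch. Everything specific in it is an assumption for illustration: the review offsets, the function name `build_review_schedule`, and the sample entries are made up, and nothing here is taken from Theory of Instruction or from any existing DI program.

```python
from collections import defaultdict

# Hypothetical rule: review each entry this many lessons after it is introduced.
REVIEW_OFFSETS = (1, 3, 7)


def build_review_schedule(introductions, offsets=REVIEW_OFFSETS):
    """Map lesson number -> sorted list of entries due for review in that lesson.

    `introductions` maps each entry (a vocabulary word or grammatical
    pattern) to the lesson in which it is first taught.
    """
    schedule = defaultdict(list)
    for entry, lesson in introductions.items():
        for offset in offsets:
            schedule[lesson + offset].append(entry)
    return {lesson: sorted(entries) for lesson, entries in sorted(schedule.items())}


# Made-up sample entries for a Japanese course.
introductions = {"taberu": 1, "nomu": 1, "-masu form": 2}
schedule = build_review_schedule(introductions)

for lesson, entries in schedule.items():
    print(f"Lesson {lesson}: review {', '.join(entries)}")
```

The bookkeeping is the easy part. Deciding which entries need review, and when, is where the actual design work would be, and a real tool would replace the fixed offsets with whatever rule the course designer settles on.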
It’s a particular property, which you can apply to other cases by substituting some other particularity—for example, replace 2 by any other number throughout and see whether the proof still goes through. Doing this sort of thing will tell you how the proof works and why it works.
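To make the substitution idea concrete, here is a small sketch. It assumes the usual textbook proof, whose key step is "if 2 divides p², then 2 divides p", and checks by brute force whether that step survives when 2 is replaced by another number k. The function name `key_step_holds` and the search limit are illustrative choices, and a finite check like this is a way of probing the proof, not a proof itself.

```python
def key_step_holds(k, limit=200):
    """Check 'k divides p*p implies k divides p' for every 1 <= p <= limit."""
    return all(p % k == 0
               for p in range(1, limit + 1)
               if (p * p) % k == 0)


for k in (2, 3, 4, 6, 9, 12):
    status = "goes through" if key_step_holds(k) else "breaks"
    print(f"k = {k}: key step {status}")
```

The step holds for 2, 3 and 6 but breaks for 4 (take p = 2) and 9 (take p = 3), which matches the fact that the square roots of 4 and 9 are rational. It also breaks for 12 (take p = 6) even though the square root of 12 is irrational, so a broken step only shows that this particular argument needs adjusting, not that the conclusion is false.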
I’m interested in knowing how this goes. I’ve never got very far when learning other languages, although I have to say I’m not impressed by the Michel Thomas web site or the Amazon reviews. I suspect that his method would drive me up the wall.
Feedback because you asked.
This article took so long to say much of anything about Direct Instruction I wondered if it might be a sales pitch. My eyes glazed over halfway through. I can see from your participation in the comments thread that you are sincere, and having trouble articulating what DI is in an accessible way.
So I hopped on Wikipedia and looked up “Direct Instruction.” The first paragraph told me everything I need to know:
“Direct instruction is a general term for the explicit teaching of a skill-set using lectures or demonstrations of the material, rather than exploratory models such as inquiry-based learning.
“This method is often contrasted with tutorials, participatory laboratory classes, discussion, recitation, seminars, workshops, observation, case study, active learning, practica or internships. Usually it involves some explication of the skill or subject matter to be taught and may or may not include an opportunity for student participation or individual practice. Some direct instruction is usually part of other methodologies, such as athletic coaching.”
Okay, that does sound pretty cool. As someone with teaching experience, I’ve made use of it without knowing it was a thing, many times. And apparently, studies suggest it’s exceptionally great at getting the point across—awesome! My perception that the most notable incidences of using that method were especially effective with my students may not be off the mark!
I suggest you try to find a way to state, in your own words, what the bit I quoted is saying—or, failing that, just explicitly quote Wikipedia within the first or second paragraph, and keep your whole essay to about 500 words or so pending the community’s reaction and desire to learn more.
Maybe I’m being stupid here, but I find it hard to see the difference between Wiki’s “explicit teaching of a skill-set using lectures or demonstrations of the material” and, say, “traditional blackboard-and-chalk instruction”. Possibly I’m missing a joke? If not, what is the difference between “teaching [using] lectures” and, um, lectures?
That’s...not what’s being contrasted here...
Glad to hear it. In that case, it seems that the Wiki description is rather a bad one, since it could give such a misleading impression, and some different summary is needed.
Thanks for your feedback… and patience. The wikipedia articles (there are two that differ only in the capitalization of one letter!) are confused.
The one with the lowercase “instruction” fluidly mixes up “little-di” and “big-DI”.
“Little-di” is merely instruction that is somehow more direct than some other supposed norm. “Big-DI” is not “little-di” in rather the same way that linear algebra is not algebra that has been put in a line.
Do the notes added at the beginning as a replacement for the whole long thing help to start clearing stuff up any?
And does this help?
I think this is a good subject to write about and that you can work this into a nice article if you continue to boil it down. Others have given some good (and maybe a bit harsh?) feedback. All I want to add is that for a post this long, it would be much more useful to have reference numbers (or better yet, links) within your article. I found myself initially having questions and wanting to look at references and then lost that motivation once I finally got to them. Also, I think your conclusion has too much new content.
There are plenty of very valid criticisms of my horrible writing here that have been repeated a few times each, but the one about my ‘conclusion’ adding too much new content is spot-on and original.
Do the added notes at the beginning as a replacement for the whole long thing help? Like with the shorter more immediate list of references (and the clarification of why I wanted to put that information about Michel Thomas in there, despite having to shoehorn it in)?
I think if you keep the same structure (introduction, evidence, explanation, example), tighten it up and briefly summarise DI at the beginning, it would be more effective. It looks like some people have interpreted your informality as a sales pitch, so I’d tone it down a bit. Personally I found the post compelling and will look into DI more in the future. As far as I can tell, it ties in with my own belief that knowledge is a set of skills and should be taught by breaking down those skills into components and training them (as you would any other skill).
What is the meaning of the two graphs? How are they consistent?
ETA: the graphs come from here, but that doesn’t help. My guess is that they have different baselines, the first using the previous measurement of 20% and the second using the control group, if it regressed to 30%. But they still don’t look consistent to me.
When I looked at Project Follow-Through, about 10 years ago, when DI was considerably less popular than it is now, the graphs looked very different. No single method beat out all the others, though three or four were clearly better than the others.
I don’t see the problems others are talking about at all. To me this seems entirely awesome, and I was surprised when I got to the comments and people didn’t agree with me.
There’s too much of the interesting-if-true about it. The quoted statistics say that in aggregate it’s awesomely successful, but the article only gives an imaginary example of how DI is done, with strident assertions that this obviously must logically work and no claim that this method of teaching numbers has ever actually been used. There’s also the claim that because this obviously must work, if it doesn’t it’s the teachers’ fault for not doing what they’re told, which is pretty much a standard rationality failure.
So there may be something awesome here but if so, it doesn’t come through very well from the posting.
I found the article interesting and exciting, and emailed the article to a few people that I thought would be interested, and am just now getting to the comments. I am unsurprised that the people who went directly to the comments and left one had a different impression.