An experience I had recently that I found interesting:
So, you may have noticed that I’m interested in causality. Part of my upcoming research is using pcalg (which you may have heard of) to identify the relationships between sensors on semiconductor manufacturing equipment, so that we can apply earlier work from my lab on identifying which subsystem of a complex dynamic system is the root cause of an error. That work has previously been applied in automotive engineering, where we have strong first-principles models of how the systems interact; now we want to apply it to semiconductor manufacturing, where we don’t have such models and need to learn them from data.
Time to get R installed and pcalg downloaded correctly on Ubuntu: ~2 hours. (One of pcalg’s dependencies requires R 3.0, which means editing Ubuntu’s package sources to get it instead of the default R 2.14, and a handful of other things went wrong along the way.)
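For anyone hitting the same wall, the R-side part of the install looked roughly like this. This is a sketch, assuming the stumbling block is pcalg’s Bioconductor-hosted dependencies (graph and RBGL), which don’t come from CRAN; the exact dependency list may differ by version:

```r
# Sketch of the install, assuming pcalg's graph-handling dependencies
# (graph, RBGL) come from Bioconductor rather than CRAN; this was the
# usual stumbling block at the time.
source("http://bioconductor.org/biocLite.R")  # Bioconductor installer
biocLite(c("graph", "RBGL"))                  # graph structures + algorithms
install.packages("pcalg")                     # the causal-discovery package
```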
Time to figure out how to get my data into R with labels: ~2 minutes.
Time to run the algorithms to discover the causal network for the subsystem I have data for now: ~2 seconds.
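For concreteness, the whole 2-minute-plus-2-second part is only a few lines of R. This is a sketch following the standard pcalg recipe, not my actual script; the file name is hypothetical, and using gaussCItest assumes roughly Gaussian sensor readings:

```r
library(pcalg)

# Hypothetical file: a CSV whose header row carries the sensor labels.
d <- read.csv("subsystem_sensors.csv", header = TRUE)

# Sufficient statistics for the conditional-independence tests:
# the correlation matrix and the sample size.
suffStat <- list(C = cor(d), n = nrow(d))

# Run the PC algorithm; the result is a CPDAG over the sensors.
pc.fit <- pc(suffStat, indepTest = gaussCItest,
             alpha = 0.01, labels = colnames(d))
plot(pc.fit)
```

Once this skeleton exists, pointing it at a new dataset really is just changing the file name, which is the capital-versus-labor point below.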
I’m not sure I should also count the time spent learning about causality in the first place (which I would probably estimate at ~2 weeks), but it’s striking how much of the investment in generating the results is capital, and how little of it is labor. That is, now that I have the package downloaded, I can do this easily for other datasets. Time to start picking some low-hanging fruit.
(Living in the future is awesome: as much as I complained about all the various rabbit holes I had to go down while installing pcalg, it took far less time than it would have taken me to code the algorithms myself, and I doubt I would have done anywhere near as good a job of it.)
Absolutely. When I look at my own projects, they go like ‘gathering and cleaning data: 2 months. Figuring out the right analysis the first time: 2 days. Runtime of analysis: 2 hours.’
The first time this happened to me, it drove me nuts. It reminded me of writing my first program, where it took maybe 20 minutes to write even with looking everything up, and then 2 hours to debug. That was when the true horror of programming struck me. Years later, when I came across Maurice Wilkes’s famous line, “I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs,” I instantly knew it for truth.
This realization caused one of my big life mistakes, I think. It struck me in high school, and so I foolishly switched my focus from computer science to physics (I think; there may have been a subject or two in between) because I disliked debugging. Later, I realized that programming was both powerful and inescapable, so I’d have to get over how much debugging sucked, which by then I had the emotional maturity to do (and which I suspect I could have done back then, had I also realized that much of my intellectual output in basically any field would be programming).
I think the whole experience is also interesting on a meta-level. Since programming is essentially the same as logical reasoning, it goes to show that humans are very nearly incapable of creating long chains of reasoning without making mistakes, often extremely subtle ones. Sometimes finding them provides insight (especially in multi-threaded code or with memory manipulation), although most often it’s just you failing to pay attention.
Threading is not normally part of logical reasoning. Compare with mathematics, where even flawed proofs are usually (though not always) proofs of correct results. I think a large part of the difficulty of writing correct programs is the immaturity of our tools.
(Same for http://lesswrong.com/lw/j8f/anonymous_feedback_forms_revisited/: downloading the data and figuring out how to join the two CSVs in just the right way took an irritating hour. The logistic regression? Maybe 3 minutes of playing around with different predictor variables.)
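(The 3-minute part in R is essentially one glm call. This is a hedged sketch rather than the actual analysis; the data-frame and column names here are placeholders, not from the linked dataset:)

```r
# Hypothetical sketch: once the two CSVs are joined into one frame,
# the logistic regression itself is nearly a one-liner. The names
# (responses, accounts, gave_feedback, karma, account_age) are placeholders.
merged <- merge(responses, accounts, by = "user_id")
fit <- glm(gave_feedback ~ karma + account_age,
           data = merged, family = binomial)
summary(fit)  # coefficients, z-values, significance
```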