# Darmani

Karma: 1,017
• Really appreciate this informative and well-written answer. Nice to hear from someone on the ground about SELinux instead of the NSA’s own presentations.

• I phrased my question about time and space badly. I was interested in proving the time and space behavior of the software “under scrutiny”, not in the resource consumption of the verification systems themselves.

LOL!

I know a few people who have worked in this area. Jan Hoffmann and Peng Gong have worked on automatically inferring complexity. Tristan Knoth has gone the other way, including resource bounds in specs for program synthesis. There’s a guy who did an MIT Ph.D. on building an operating system in Go, and as part of it needed an analyzer that can upper-bound the memory consumption of a system call. I met someone at CU Boulder working under Bor-Yuh Evan Chang who was also doing static analysis of memory usage, but I forget who.

So, those are some things that were going on. Just about all of these are 5+ years old, and I have no more recent updates. I’ve gone to one of Peng’s talks but read none of these papers.

• I must disagree with the first claim. Defense-in-depth is very much a thing in cybersecurity. The whole “attack surface” idea assumes that, if you compromise any application, you can take over an entire machine or network of machines. That is still sometimes true, but less and less so. Think it’s game over if you get root on a machine? Not if it’s running SELinux.

Hey, can I ask an almost unrelated question that you’re free to ignore or answer as a private message OR answer here? How good is formal verification for time and space these days?

I can speak only in broad strokes here, as I have not published in verification. My publications are predominantly in programming tools of some form, mostly in program transformation and synthesis.

There are two main subfields that fight over the term “verification”: model checking and mechanized/interactive theorem proving. This is not counting people like Dawson Engler, who write very unsound static analysis tools but call it “verification” anyway. I give an ultra-brief overview of verification in https://www.pathsensitive.com/2021/03/why-programmers-shouldnt-learn-theory.html

I am more knowledgeable about mechanized theorem proving, since my department has multiple labs that work in this area and I’ve taken a few of their seminars. But asking about time/space of verification really just makes sense for the automated part. I attended CAV in 2015 and went to a few model checking talks at ICSE 2016, and more recently talked to a friend on AWS’s verification team about what some people there are doing with CBMC. Okay, and I guess I talked to someone who used to do model checking on train systems in France just two days ago. Outside of that exposure, I am super not-up-to-date with what’s going on. But I’d still expect massive breakthroughs to make the news rounds over to my corner of academia, so I’ll give my sense of the status quo.

Explicit state enumeration can crush programs with millions or billions of states, while symbolic model checking routinely handles $10^{100}$ states.

Those are both very small numbers. To go bigger, you need induction or abstraction, something fully automated methods are still bad at.

Yes, we can handle exponentially large things, but the exponential still wins. There’s a saying about SAT solvers: “either it runs instantly or it takes forever.” I believe this is less true of model checking, though still true. (Also, many model checkers use SAT.)

If you want to model check something, either you check a very small program like a device driver, or you develop some kind of abstract model and check that instead.

• I agree with just about everything you said, as well as several more criticisms along those lines that you didn’t say. I am probably more familiar with these issues than anyone else on this website, with the possible exception of Jason Gross.

Now, suppose we can magic all that away. How much then will this reduce AI risk?

# [Question] How much does cybersecurity reduce AI risk?

12 Jun 2022 22:13 UTC
33 points
• I don’t see what this parable has to do with Bayesianism or Frequentism.

I thought this was going to be some kind of trap or joke around how “probability of belief in Bayesianism” is a nonsense question in Frequentism.

• I do not. I mostly know of this field from conversations with people in my lab who work in this area, including Osbert Bastani. (I’m more on the pure programming-languages side, not an AI guy.) Those conversations kinda died during COVID when no one was going into the office, plus the people working in this area moved on to their faculty positions.

I think being able to backtrace through a tree counts as victory, at least in comparison to neural nets. You can make a similar criticism about any large software system.

You’re right about the random forest; I goofed there. Luckily, I also happen to know of another Osbert paper, and this one does indeed do a similar trick for neural nets (specifically for reinforcement learning): https://proceedings.neurips.cc/paper/2018/file/e6d8545daa42d5ced125a4bf747b3688-Paper.pdf

# Cambridge LW Meetup: Books That Change

8 May 2022 5:23 UTC
5 points

# Cambridge LW Meetup: Bean on Why You Should Stop Worrying and Love the Bomb

5 Apr 2022 18:34 UTC
9 points

# Nuclear Deterrence 101 (and why the US can’t even hint at intervening in Ukraine)

18 Mar 2022 7:25 UTC
34 points
• I’m a certified life coach, and several of these are questions found in life coaching.

E.g.:

Is there something you could do about that problem in the next five minutes?

Feeling stuck sucks. Have you spent a five minute timer generating options?

What’s the twenty minute / minimum viable product version of this overwhelming-feeling thing?

These are all part of a broader technique of breaking down a problem. (I can probably find a name for it in my book.) E.g.: someone comes in saying they’re really bad at X, and you ask them to actually rate their skill and then ask what they could do to become 5% better.

You want to do that but don’t think you will? Do you want to make a concrete plan now?

Do you want to just set an alarm on your phone now as a reminder? (from Damon Sasi)

These are all part of the “commitment” phase of a coaching session, which basically looks like walking someone through SMART goals.

Do you know anyone else who might have struggled with or succeeded at that? Have you talked to them about it? (from Damon Sasi)

Who do you know who you could ask for help from?

I can’t say these are instances of a named technique, but they are things you’d commonly find a coach asking. Helping someone look inside themselves for resources they already have is a pretty significant component of coaching.

There’s a major technique in coaching not represented here called championing. Championing is giving someone positive encouragement by reinforcing some underlying quality. E.g.: “You’ve shown a lot of determination to get this far, and I know you’ll be able to use it to succeed at X.”

• I realize now that this expressed as a DAG looks identical to precommitment.

Except, I also think it’s a faithful representation of the typical Newcomb scenario.

Paradox only arises if you can say “I am a two-boxer” (by picking up two boxes) while you were predicted to be a one-boxer. This can only happen if there are multiple nodes for two-boxing set to different values.

But really, this is a problem of the kind solved by super-specs in my Onward! paper. There is a constraint that the prediction of two-boxing must be the same as the actual two-boxing. Traditional causal DAGs can only express this by making them literally the same node; super-specs allow more flexibility. I’m not sure exactly how it’s handled in FDT, but it has a similar analysis of the problem (“CDT breaks correlations”).
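To make the “same node” trick concrete, here’s a minimal Python sketch of my own (an illustration, not the formalism from the Onward! paper) of Newcomb’s problem as a straight-line causal program, where a single disposition node feeds both the prediction and the actual choice:

```python
import random

# Newcomb's problem as a straight-line causal program (illustrative sketch).
# The constraint "predicted two-boxing = actual two-boxing" is enforced by
# having both read from the single disposition node.
def newcomb():
    two_boxer = random.choice([True, False])      # the one underlying node
    prediction = two_boxer                        # perfect predictor reads it
    opaque_box = 0 if prediction else 1_000_000   # box filled per the rule
    take_both = two_boxer                         # choice reads the same node
    return opaque_box + (1_000 if take_both else 0)
```

Because prediction and choice share a node, no run can realize “predicted one-boxer, actual two-boxer,” which is exactly the paradoxical combination ruled out above.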

• Okay, I see how that technique of breaking circularity in the model looks like precommitment.

I still don’t see what this has to do with counterfactuals though.

• I don’t understand what counterfactuals have to do with Newcomb’s problem. You decide either “I am a one-boxer” or “I am a two-boxer,” the boxes get filled according to a rule, and then you pick deterministically according to a rule. It’s all forward reasoning; it’s just a bit weird because the action in question happens way before you are faced with the boxes. I don’t see any updating on a factual world to infer outcomes in a counterfactual world.

“Prediction” in this context is a synonym for conditioning. $P(y \mid x)$ is defined as $P(x, y) / P(x)$.

If intervention sounds circular... I don’t know what to say other than read Chapter 1 of Pearl (https://www.amazon.com/Causality-Reasoning-Inference-Judea-Pearl/dp/052189560X).

To give a two-sentence technical explanation:

A structural causal model is a straight-line program with some random inputs. It looks like this:

```
u1 = randBool()
rain = u1
sprinkler = !rain
wet_grass = rain || sprinkler
```

These are usually written with nodes and graphs, but the two presentations are equivalent, and one can translate easily between them.

In the basic Pearl setup, an intervention consists of replacing one of the assignments above with an assignment to a constant. Here is an intervention turning the sprinkler off:

```
u1 = randBool()
rain = u1
sprinkler = false
wet_grass = rain || sprinkler
```

From this, one can easily compute that $P(\text{wet\_grass} \mid do(\text{sprinkler} = \text{false})) = 1/2$.

If you want the technical development of counterfactuals that my post is based on, read Pearl Chapter 7, or Google around for the “twin network construction.”

Or I’ll just show you in code below how you compute the counterfactual “I see the sprinkler is on, so, if it hadn’t come on, the grass would not be wet,” which is written $P(\neg\text{wet\_grass}_{\text{sprinkler} = \text{false}} \mid \text{sprinkler} = \text{true})$.

We construct a new program:

```
u1 = randBool()
rain = u1
sprinkler_factual = !rain
wet_grass_factual = rain || sprinkler_factual
sprinkler_counterfactual = false
wet_grass_counterfactual = rain || sprinkler_counterfactual
```

This is now reduced to a pure statistical problem. Run this program a bunch of times, filter down to only the runs where sprinkler_factual is true, and you’ll find that wet_grass_counterfactual is false in all of them.

If you write this program as a dataflow graph, you see everything that happens after the intervention point being duplicated, but the background variables (the rain) are shared between them. This graph is the twin network, and this technique is called the “twin network construction.” It can also be thought of as what the do(y | x → e) operator is doing in our Omega language.
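Here’s a minimal runnable version of that run-and-filter procedure, a Python sketch of my own whose variable names mirror the program above:

```python
import random

def run_once():
    u1 = random.choice([True, False])     # shared background variable
    rain = u1
    sprinkler_factual = not rain                           # factual world
    wet_grass_factual = rain or sprinkler_factual
    sprinkler_counterfactual = False                       # intervened twin
    wet_grass_counterfactual = rain or sprinkler_counterfactual
    return sprinkler_factual, wet_grass_counterfactual

samples = [run_once() for _ in range(10_000)]
# Keep only runs where the sprinkler was factually on, then check the twin.
matching = [wet_cf for spr_f, wet_cf in samples if spr_f]
print(all(not wet_cf for wet_cf in matching))  # True: grass dry in every one
```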

• While I can see this working in theory, in practice it’s more complicated, as it isn’t obvious from immediate inspection to what extent an argument is or isn’t dependent on counterfactuals. I mean, counterfactuals are everywhere! Part of the problem is that the clearest explanation of such a scheme would likely make use of counterfactuals, even if it were later shown that these aren’t necessary.

1. Is the explanation in the “What is a Counterfactual” post linked above circular?

2. Is the explanation in the post somehow not an explanation of counterfactuals?

The key unanswered question (well, some people claim to have solutions) in Functional Decision Theory is how to construct the logical counterfactuals that it depends on.

I read a large chunk of the FDT paper while drafting my last comment.

The quoted sentence may hint at the root of the trouble that I and some others here seem to have in understanding what you want. You seem to be asking about the way “counterfactual” is used in a particular paper, not in general.

It is glossed over and not explained in full detail in the FDT paper, but it seems to mainly rely on extra constraints on allowable interventions, similar to the “super-specs” in one of my other papers: https://www.jameskoppel.com/files/papers/demystifying_dependence.pdf

I’m going to go try to model Newcomb’s problem and some of the other FDT examples in Omega. If I’m successful, it’s evidence that there’s nothing more interesting going on than what’s in my causal hierarchy post.

• I’m having a little trouble understanding the question. I think you may be thinking of either philosophical abduction/induction or logical abduction/induction.

Abduction in this article is just computing P(y | x) when x is a causal descendant of y. It’s not conceptually different from any other kind of conditioning.

In a different context, I can say that I’m fond of Isil Dillig’s thesis work on an abductive SAT solver and its application to program verification, but that’s very unrelated.

• I’m not surprised by this reaction, seeing as I jumped straight into banging it out rather than first checking that I understood your confusion. And I still don’t understand your confusion, so my best hope was to give a very clear, computational explanation of counterfactuals with no circularity, in hopes that it helps.

Anyway, let’s have some back and forth right here. I’m having trouble teasing apart the different threads of thought that I’m reading.

After intervening on our decision node do we just project forward as per Causal Decision Theory or do we want to do something like Functional Decision Theory that allows back-projecting as well?

I think I’ll need to see some formulae to be sure I know what you’re talking about. I understand the core of decision theory to be about how to score potential actions, which seems like a pretty separate question from understanding counterfactuals.

More specifically, I understand that each decision theory provides two components: (1) a type of probabilistic model for modeling relevant scenarios, and (2) a probabilistic query that it says should be used to evaluate potential actions. Evidential decision theory uses an arbitrary probability distribution as its model and evaluates actions by P(outcome | action). Causal decision theory uses a causal Bayes net (a set of interventional distributions) and the query P(outcome | do(action)). I understand FDT less well, but basically view it as similar to CDT, except that it intervenes on the input to a decision procedure rather than on the output.
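To illustrate the difference between those two queries with the earlier sprinkler program (my own example, not anything from the FDT paper): conditioning on sprinkler = false lets you infer backwards that it rained, while intervening does not.

```python
import random

def model(do_sprinkler=None):
    rain = random.choice([True, False])
    # An intervention replaces the sprinkler's equation with a constant.
    sprinkler = (not rain) if do_sprinkler is None else do_sprinkler
    wet_grass = rain or sprinkler
    return sprinkler, wet_grass

N = 100_000
# EDT-style query P(wet_grass | sprinkler = false): observing the sprinkler
# off implies it rained, so the grass is always wet.
obs = [wet for spr, wet in (model() for _ in range(N)) if not spr]
print(sum(obs) / len(obs))   # ~1.0

# CDT-style query P(wet_grass | do(sprinkler = false)): forcing the sprinkler
# off breaks that back-inference, so the grass is wet only when it rains.
dos = [wet for _, wet in (model(do_sprinkler=False) for _ in range(N))]
print(sum(dos) / len(dos))   # ~0.5
```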

But all this is separate from the question of how to compute counterfactuals, and I don’t understand why you bring this up.

When trying to answer these questions, this naturally leads us to ask, “What exactly are these counterfactual things anyway?” and that path (in my opinion) leads to circularity.

I still understand this to be the core of your question. Can you explain what questions remain about “what is a counterfactual” after reading my post?