A question and a tail

This is a rambling post, and I would appreciate your criticism to help me dry it out or delete it altogether.

It seems that however small a question I research by reviewing [botanical] literature, there is always a much more complex question, rather difficult to state rigorously, that I have to ask for the first one to be meaningful. The second answer (or tier of answers) doesn’t add much to the information I will build upon, but it might (just might!) add uncertainty to the result or allow predictions in advance. How do we use it in advance? We don’t usually apply formal reasoning, and yet somehow we use it!

1.

Consider: a certain invasive plant has a host of adaptations beneficial to its success. (They probably wouldn’t be sufficient if there were some actual effort to manage manmade ecosystems, but duh.) A trait many invasive plants share is the ability to increase their ploidy: from 2 to 3, 4, 6, 8 or even 10 sets of homologous chromosomes. (Polyploidization sometimes happens even in single cells in somatic (= non-reproductive) tissues, so it’s really a heavily used shortcut.)

Now, suppose I want to see how a different specific property of the species behaves abroad. I will have to check the ploidy level, of course! Quick, what does the literature say, how many chromosomes can it have?

...but wait. Make no mistake, I do have to count them; but what if there is a continent-wide study showing that it generally has 4n in Eastern Europe? That would allow me to at least expect 4n, or whatever number they found, and see if there is any research specifically dealing with this situation within its native range.

...but wait. Of course, those findings will be useful in discussion if I find 4n, but if I don’t, they will be just a point in the overall space of possibilities. Still relevant, but not worth putting much explanatory weight on.

Something in my brain evaluated the usefulness of a piece of data other people have found, which I myself have yet to look up and of whose exact content I have no idea (perhaps there are simply no other reports!), and placed it in the context of what I really expect to do.


2.

Okay, if I can think so about other people’s writings without even reading them, then maybe I can compile a dummy set of data I expect right now and compare them to those I will find in the literature. And later, to actual data. Here’s a simplified problem that doesn’t approach labwork on any scale (I don’t want to add too many qualifiers).

Let us ‘measure’ 8 parameters and check whether there have been studies that found correlations between at least some of them (and maybe with some other ones). Then let us try to see whether our expectations based on knowledge of the study area and casual surveys fit our expectations based on published research in any specific way. We are not ready to put forth any causal structure (no real data yet), though we strongly suspect (80%) that all the parameters are in some way linked to each other.

The following table is rough and repetitive, but I think it is useful as an illustration of how things brew in a not-very-clever student’s [my own] head. The numbers are ‘dimensionless’, distributions are normal, the total number of studies measuring each parameter is 7 or less, and all correlations are no less than 0.8.

| Parameter | Total range | Our expected data ±SE | Reported data range* | Our imaginary correlations | Reported correlations |
|---|---|---|---|---|---|
| A | 1-12 | 8±1 | 4-10 | A&F, A&H | A&D, A&F, A doesn’t correlate with anything if nothing else correlates with anything |
| B | 1-5 | 2±1 | 1-4 | B&C, B&E, B&G, B&H | B doesn’t correlate with E if F&H |
| C | 1-100 | 35±20 | 80±7 (only one other study) | C&B, C&F, C&H | Unknown |
| D | 1-28 | 6±2 | 2-18 | D&F | D&G (and then E&F) |
| E | 1-500 | 200±46 | 150-480 | E&B, E&G | E&F if D&G |
| F | 1-50 | 47±8 | 8-45 | F&A, F&C, F&D, F&H if A&H | F&A, F&H (and then B doesn’t correlate with E) |
| G | 1-25 | 18±2 | 11-20 | G&B, G&E | G&D (and then E&F) |
| H | 1-40 | 23±10 | 1-40 | H&A, H&B, H&C, H&F (and then H&A) | H&F (and then B doesn’t correlate with E) |

*as in, ‘for this species, out of the 1-12 that are altogether possible, only 4-10 have been observed so far. It might mean that 4-10 is the actual range, but the prior for that is about 60%, due to differences in the methodologies used by various researchers and to the fact that only some of the species’ habitats have been studied’ etc.
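One way to make the ‘expected vs. reported’ comparison in the table explicit is a few lines of Python. This is only a rough sketch under my own assumptions: the rule (does the expected mean ± 1 SE sit inside, partly overlap, or fall outside the reported range?) is a crude convention rather than a proper test, the single-study report for C (80±7) is treated as the interval 73-87, and the names `classify` and `coverage` are made up for illustration.

```python
# Values copied from the table:
# parameter -> (total range, expected mean, expected SE, reported range).
# Assumption: C's single-study report of 80±7 is treated as the interval 73-87.
table = {
    "A": ((1, 12), 8, 1, (4, 10)),
    "B": ((1, 5), 2, 1, (1, 4)),
    "C": ((1, 100), 35, 20, (73, 87)),
    "D": ((1, 28), 6, 2, (2, 18)),
    "E": ((1, 500), 200, 46, (150, 480)),
    "F": ((1, 50), 47, 8, (8, 45)),
    "G": ((1, 25), 18, 2, (11, 20)),
    "H": ((1, 40), 23, 10, (1, 40)),
}


def classify(mean, se, reported):
    """Place the interval mean ± 1 SE relative to the reported range."""
    lo, hi = reported
    if lo <= mean - se and mean + se <= hi:
        return "inside"
    if mean + se < lo or mean - se > hi:
        return "outside"
    return "overlaps"


def coverage(total, reported):
    """Fraction of the total possible range that has been reported so far."""
    return (reported[1] - reported[0]) / (total[1] - total[0])


verdicts = {}
for name, (total, mean, se, reported) in table.items():
    verdicts[name] = classify(mean, se, reported)
    print(name, verdicts[name], f"reported covers {coverage(total, reported):.0%} of total range")
```

A parameter whose reported range covers the whole possible range (H here) cannot disagree with any expectation, so an ‘inside’ verdict for it carries no information; that is one reason to look at the verdict and the coverage together.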


Now, I understand that this is hardly the most profitable presentation method, and that statistics has advanced a lot since Pearson and everything. It is just that I find it difficult to compare graphs with diagrams with clouds along axes, as they are published in different papers. I only want to guesstimate whether my data fit a pattern, to discuss them qualitatively: to stratify the parameters in such a way that I place explanatory weight on some of them, and report the others to give a full picture. I have to do this explicitly, because I know I am doing it implicitly; it’s a feeling I get, of my brain working and deciding and not showing me what it has.

I cannot say much about A, only that maybe A, H and F do have something in common, perhaps something I haven’t measured. B looks rather suspicious; I will need to reread that other report. C is intriguing, but ultimately belongs to the ‘lower value stratum’, and maybe those correlations I found are spurious; if only there were a way to reduce the variability… but it won’t be cost-efficient. E, F, D and G also might be worth discussing together. F by itself doesn’t seem very meaningful, unless there is a causal connection to the others; too bad one can imagine many plausible explanations for that. I will probably start the discussion with H, since it has probably been studied for other plants and at least something has already been proposed.
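The same kind of bookkeeping can be done for the two correlation columns. The sketch below is my own simplification: it treats every correlation as an unordered pair, ignores the conditions attached to the reported ones (‘if D&G’, ‘and then E&F’) except for the single reported non-correlation of B with E given F&H, and uses the made-up helper names `pairs` and `show`.

```python
def pairs(text):
    """Turn a string like 'A&F, A&H' into a set of unordered pairs."""
    return {frozenset(item.split("&")) for item in text.replace(" ", "").split(",")}


def show(pair_set):
    """Render a set of pairs as a sorted list of 'X&Y' strings."""
    return sorted("&".join(sorted(p)) for p in pair_set)


# Unique pairs from the 'Our imaginary correlations' column.
imagined = pairs("A&F, A&H, B&C, B&E, B&G, B&H, C&F, C&H, D&F, E&G, F&H")
# Unique pairs from the 'Reported correlations' column, conditions ignored.
reported = pairs("A&D, A&F, D&G, E&F, F&H")

shared = imagined & reported          # expected and already reported
only_imagined = imagined - reported   # expected, but nobody has reported them
only_reported = reported - imagined   # reported, but not expected

# Reported: B does not correlate with E if F&H holds.
# We imagine both B&E and F&H, so the two pictures disagree about B.
b_conflict = frozenset("BE") in imagined and frozenset("FH") in imagined

print("shared:       ", show(shared))
print("only imagined:", show(only_imagined))
print("only reported:", show(only_reported))
print("B&E conflicts with the literature:", b_conflict)
```

Under these assumptions only A&F and F&H appear in both columns, and the B&E expectation directly contradicts the one report about B, which is a more explicit version of the ‘B looks rather suspicious’ feeling.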

Now, when I have my own data, I will see where they deviate from my expectations; that will be some knowledge I can put into words, and I will hopefully start calibrating myself on these matters. And on matters of structuring a Discussion :)