Not OP, but I think Functional Threshold Power (FTP) is fine. I don’t know of any literature directly comparing it to VO2max, but much of the literature on VO2max didn’t actually measure VO2max; it used proxies like “maximum gradient at which a participant can walk for 3 minutes” (called the Balke treadmill test). When meta-analyses report that VO2max strongly predicts health outcomes, what they usually* mean is “VO2max, and also various proxies for VO2max, when thrown together into a meta-analysis, strongly predict health outcomes”. So as far as I can tell from what (little) research I’ve looked at, there are a lot of metrics that work and it’s not clear which ones work better than others. And FTP seems like as good a measure as any.
For example, have a look at Table 2 in “Impact of Cardiorespiratory Fitness on All-Cause and Disease-Specific Mortality: Advances Since 2009”, which gives a list of studies and what measure each study used. You can see that they used a variety of fitness metrics.
*I’ve only actually looked at two meta-analyses.