Wednesday, April 29, 2020

Functional Threshold Power : A Scientific Scrutiny

Certain entities in the world have put forward competing claims about cycling performance concepts, test protocols, and training zones that the rest of the world is expected to adhere to. The astute athlete cum observer would want to find out which of these claims stand up to scientific scrutiny, and to separate myth from fact.

In that spirit, this post appraises the definition and estimation techniques of Functional Threshold Power (FTP), which are at the core of this power-based training concept. It follows from the last post, where I explored a scientifically vetted threshold concept called critical power (CP) and most of its nuances, including application-related issues.

This post attempts to explain, with simple arguments and scientific references, why FTP, however "useful" a performance metric it may be to some people, is at best a pseudo-scientific concept.

I. Introduction to FTP : Definition and Estimation

FTP was conceptualized as a practical, field-based method of estimating a threshold phenomenon using cycling power meter technology. Like critical power, it is used as an endurance index both to design training prescriptions and to classify cycling talent. A threshold, as a reminder, is an intensity marker just above which physiological responses change sharply, and below which they attain a steady state within a tolerance band.

The training concept was formalized by Coggan in the book Training and Racing with a Power Meter (TARWAPM) in the early 2000s, and an ecosystem was built around FTP consisting of sister metrics (NP, IF, TSS, etc.) and software marketed by the Training Peaks group. Some of the history behind how this all came to be is documented on TARWAPM's blog page.

Andrew Coggan PhD signing copies of TARWAPM books. Source : TARWAPM's blog page. 

Let me cut to the chase and quote the definition of FTP from the 3rd edition of TARWAPM, marking in red some key terms that I will explore further:

"FTP is the highest power that a rider can sustain in a quasi-steady state without fatiguing. When power exceeds FTP, fatigue will occur much sooner (generally after approximately one hour in well-trained cyclists), whereas power just below FTP can be maintained considerably longer. [1]"  --- (1)

The text lists around 7 different methods to estimate FTP.

1) From a power & time-frequency distribution chart from cycling training and racing data.
2) From routine steady power intervals, repeats or longer climbs.
3) From normalized power (NP) during hard mass-start races of approximately one hour.
4) From a one-hour time trial by inspecting a smoothed time-series plot of power.
5) From a power-duration model obtained by testing for CP, where the resulting model-derived value of CP is suggested to be interchangeable with FTP.
6) From the proprietary mFTP model in WKO4.
7) From the FTP testing protocol consisting of a 28-minute warmup, a main set of 20 minutes, and a cooldown of 10-15 minutes. 

The 7th method is probably the most notoriously proliferated in the cycling lexicon. Its premise is that subtracting 5% from the power of the 20-minute main-set time trial, ridden after a hard warm-up, estimates FTP (hereafter called FTP20 for simplicity). Only recently has A. Coggan distanced himself from this estimation technique, saying that it was Hunter Allen's contribution and not his.
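As a concrete illustration of the FTP20 arithmetic, the sketch below finds the best 20-minute rolling mean in a power file and multiplies it by 0.95. It is a minimal Python sketch, assuming clean 1 Hz power samples in watts and ignoring the warm-up and cool-down structure of the protocol; the function names are hypothetical and do not come from TARWAPM or any training software.

```python
def best_rolling_mean(power_w, window_s):
    """Highest mean power over any contiguous window of `window_s` samples.

    Assumes 1 Hz data, so one sample = one second.
    """
    if len(power_w) < window_s:
        raise ValueError("ride is shorter than the window")
    window_sum = sum(power_w[:window_s])
    best = window_sum
    # Slide the window one sample at a time: add the new sample, drop the oldest.
    for i in range(window_s, len(power_w)):
        window_sum += power_w[i] - power_w[i - window_s]
        best = max(best, window_sum)
    return best / window_s


def ftp20(power_w, correction=0.95):
    """FTP20: best 20-minute mean power multiplied by a fixed correction factor."""
    return correction * best_rolling_mean(power_w, 20 * 60)


# Hypothetical ride: 10 min at 200 W, a 20 min effort at 300 W, 10 min at 150 W.
ride = [200] * 600 + [300] * 1200 + [150] * 600
print(round(ftp20(ride), 1))  # 285.0
```

The `correction` parameter is exposed on purpose: the fixed 0.95 factor is exactly the assumption questioned later in this post.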

There are other methods of estimating FTP, coded into popular programs like Zwift and Trainer Road, and yet another confusing bunch of "new" test protocols on Training Peaks' website. The validity of these techniques is in question, quite apart from the obvious danger of under- or overestimating FTP in some individuals. Therefore, this post focuses purely on the original concept of FTP and its test protocols as codified in TARWAPM.

II. A Deeper Inspection of FTP

Let me inspect in slightly more detail the terms highlighted in red in the definition of FTP in (1). 

A) The Issue of "Quasi-steady State" 

A quasi-steady state describes a transient situation in which physiological variables such as blood lactate and VO2 are rising but remain within a zone of uncertainty. In field power data, a quasi-steady state can be inferred from how much power varies over the duration of the effort.
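One way to put numbers on "variation of power values over time" is to summarize an effort's minute-by-minute power with a coefficient of variation and a linear drift. The Python sketch below does that, assuming 1 Hz samples. These two statistics are my own illustrative choice, not a definition of quasi-steady state from TARWAPM or the studies cited here, and any tolerance band you compare them against would be an assumption.

```python
from statistics import mean, pstdev


def minute_means(power_w):
    """Average 1 Hz samples into consecutive full-minute means (a trailing partial minute is dropped)."""
    n = len(power_w) // 60
    return [mean(power_w[i * 60:(i + 1) * 60]) for i in range(n)]


def power_stability(power_w):
    """Return (coefficient of variation, linear drift in W/min) of minute-mean power."""
    m = minute_means(power_w)
    if len(m) < 2:
        raise ValueError("need at least two full minutes of data")
    avg = mean(m)
    cv = pstdev(m) / avg
    # Ordinary least-squares slope of minute-mean power against minute index.
    t = list(range(len(m)))
    t_bar = mean(t)
    num = sum((ti - t_bar) * (pi - avg) for ti, pi in zip(t, m))
    den = sum((ti - t_bar) ** 2 for ti in t)
    return cv, num / den


# Hypothetical 30-minute efforts: perfectly constant vs. creeping up 1 W every minute.
print(power_stability([250] * 1800))
print(power_stability([250 + i // 60 for i in range(1800)]))
```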

Scientific studies demonstrate that it is possible to work at a "quasi-steady state" at critical power, where workloads are on the order of 10-15 W higher than what can be sustained for one hour. As discussed in my previous post, critical power corresponds to a workload approximately midway between lactate threshold (or gas exchange threshold) and VO2max.

Moreover, the time to exhaustion at that workload is well under an hour (group means of roughly 18 and 23 minutes in the two studies shown in Figures 1 and 2). This has been demonstrated in both physically active subjects and competitive cyclists. In addition, even a shorter maximal time trial lasting around 30 minutes showed quasi-steady-state behavior in both power output and physiological variables (see Figures 2B, 2C).

Therefore, the FTP concept systematically underestimates the highest workload (wattage) that can be sustained at a quasi-steady state.

Figure 1 : The Poole study showed that in physically active subjects, constant-load exercise at critical power, a metabolic demand well above that of lactate threshold, resulted in a steady-state VO2 (group mean). Time to exhaustion in these subjects was 17.7 +/- 1.2 minutes. From research reference [6].
Figure 2 : The de Lucas study in competitive cyclists showed that a quasi-steady-state VO2 was achieved at workloads at CP. Here again, the group mean time to exhaustion was 22.9 +/- 7.5 minutes. From research reference [7].
Figure 2B : Quasi-steady state power outputs were shown in intense 30 min TTs conducted on well-trained triathletes. See reference [12]. 

Figure 2C : Quasi-steady states in metabolic demand and blood lactate were shown in a 30-minute TT in well-trained triathletes, an effort much shorter and more intense than a one-hour effort. Moreover, the study demonstrates that subjects can sustain very high blood lactate values, some above 10 mM, for an extended time. See reference [12].

B) The Issue of "Absence of Fatigue"

The highest workload one can sustain for about an hour cannot occur in the absence of fatigue.

A well-documented study in which well-trained cyclists completed 4, 20 and 40 km time trials demonstrated that central and peripheral fatigue occurred at all distances, including the 40 km TT, which took approximately 65 minutes to complete. The pattern shifts from peripheral-dominant fatigue over 4 km towards central fatigue over 20 km. In other words, the decline in the ability to produce force that originates within the central nervous system was greater in the longer time trials. Given these data, the notion of exercising at the highest quasi-steady-state workload "in the absence of fatigue" is incorrect.
Figure 3 : Exercise-induced impairment in the ability to produce muscular force, measured in well-trained cyclists who completed 4 km, 20 km and 40 km time trials. Both central and peripheral fatigue are prominent in time trials of all durations, with central fatigue greatest in the longer time trials. Adapted from [8].

C) The Issue of "Highest Power" at Quasi-steady State 

Following the arguments in A), the power output that can be sustained for "approximately" an hour does not correspond to the "highest power" at which a quasi-steady state can be achieved. The study referenced in Figure 3 also shows that lactate values in the 20 km TT stabilized around the 8 km mark and barely increased until the end spurt, even though the 20 km TT was ridden 15-20 W higher than the 40 km TT.

FTP arbitrarily ties the "highest power at quasi-steady state" to a duration of "approximately" one hour, which the evidence does not support. What is obvious, though, is that the workload at FTP is unambiguously a steady state and therefore is not the highest quasi-steady-state intensity.

Figure 4 : Lactate values in a 20 km TT stabilize at the 8 km mark and barely increase until the end spurt, despite the 20 km TT being ridden 15-20 W higher than a 40 km TT. The latter took 65 min to complete, which by the FTP definition would correspond to "approximately" one hour. Adapted from [8].

D) The Issue of "Functional Threshold" As a Surrogate for Laboratory-Based Testing

FTP originated as a practical field-based alternative to lactate threshold testing. Conceptually, a "threshold" marks a sharp distinction between the physiological responses to exercise slightly below and slightly above a specific intensity.

However, as we have seen above, the maximum intensity at which a quasi-steady state can be attained is sustainable for much less than an hour. So it is questionable whether FTP can lay claim to accurately representing threshold across a wide group of people.

That said, how well does FTP agree with laboratory indicators of threshold in scientific studies?

Let's take a look : 

1) FTP compared against individual anaerobic threshold (IAT) power:  Although empirical demonstrations have shown FTP20 and IAT are close, authors of one recent study stated: "…it is difficult to accept FTP as a thoroughly valid concept. We found large limits of agreement between most variables, suggesting a high level of inter-individual variability in the relationship between FTP20 vs. FTP60 and between both measurements vs. IAT (me: stepwise lactate profile test)."  [2]  

In other words, the wide limits of agreement in a Bland-Altman plot show that agreement between a surrogate method (FTP) and a laboratory-based marker, here IAT, is ambiguous at the level of the individual rider.

Figure 5 : Bland-Altman plot of FTP20 compared to individual anaerobic threshold (IAT) in 23 well-trained cyclists. See reference [2].
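The Bland-Altman statistics behind Figures 5 to 7 are simple to compute: the bias is the mean of the paired differences, and the 95% limits of agreement are the bias plus or minus 1.96 standard deviations of those differences. The Python sketch below uses made-up FTP20 and IAT values purely for illustration; they are not data from [2].

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements, differences taken as a - b."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired values in watts for five riders.
ftp20_w = [250, 280, 300, 320, 260]
iat_w = [245, 290, 285, 330, 262]
bias, lower, upper = bland_altman(ftp20_w, iat_w)
print(f"bias {bias:.1f} W, limits of agreement {lower:.1f} to {upper:.1f} W")
```

A small bias with wide limits is the pattern the cited authors describe: on average the two methods look similar, yet an individual rider's FTP can sit well above or below the laboratory value.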

2) Against maximum lactate steady state (MLSS) power: The same authors compared FTP with another threshold concept, MLSS, and found generally good agreement. However, even in this study, wide limits of agreement were observed between FTP20 and MLSS among groups of cyclists with different training statuses, implying ambiguous agreement between the two in heterogeneous samples [3].

Figure 6: Comparison of FTP20 with MLSS in 15 cyclists - 7 trained and 8 well trained. See reference [3].

Another study that examined the validity of FTP against MLSS concluded: "The results indicate that the PO at FTP95% is different to MLSS, and that changes in the PO at MLSS after training were not reflected by FTP95%.  Even when using an adjusted percentage (ie, 88% rather than 95% of FTP20), the large variability in the data is such that it would not be advisable to use this as a representation of MLSS." [14]
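For readers unfamiliar with how MLSS is determined: it takes several 30-minute constant-load trials, and MLSS is the highest power at which blood lactate does not rise by more than a set amount over the final 20 minutes. The Python sketch below assumes the commonly used 1 mmol/L criterion between minute 10 and minute 30; that is a convention I am assuming here, not necessarily the exact protocol of [3] or [14], and the trial values are invented.

```python
def mlss_power(trials, max_rise_mmol=1.0):
    """Highest constant power whose blood lactate rose by no more than `max_rise_mmol`
    between minute 10 and minute 30 of a 30-minute constant-load trial.

    `trials` maps power (W) -> (lactate at 10 min, lactate at 30 min) in mmol/L.
    Returns None if no trial meets the criterion.
    """
    steady = [p for p, (la10, la30) in trials.items() if la30 - la10 <= max_rise_mmol]
    return max(steady) if steady else None


# Hypothetical trials on separate days.
trials = {250: (2.1, 2.3), 265: (3.0, 3.6), 280: (4.2, 6.1)}
print(mlss_power(trials))  # 265
```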

3) Against Lactate Threshold (LT) power: Valenzuela and colleagues (including Foster) came to a similar conclusion to the previous two studies when comparing LT and FTP20. They wrote: ".....caution should be taken when using the FTP interchangeably with the LT as the bias between markers seems to depend on the athletes’ fitness status. Whereas the FTP provides a good estimate of the LT in trained cyclists, in recreational cyclists, FTP may underestimate LT." [4]

Figure 7 : Limits of agreement between FTP and lactate threshold studied in 20 healthy cyclists. See reference [4]. 

4) FTP Compared Against A Range of Blood Lactate Threshold Markers: One study compared FTP20 with a range of laboratory-based blood lactate measurements: LT (lactate threshold), LT at a blood lactate concentration of 4 mmol/L (LT4.0), Dmax-derived LT, and IAT. The main objective was to find, within a single study, the best correlate of FTP.
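Two of these lactate markers can be sketched in a few lines: LT4.0 as the power where the lactate curve crosses 4 mmol/L, and Dmax as the point on a fitted third-order polynomial farthest from the straight line joining the first and last stages. This Python sketch makes its own simplifying assumptions (linear interpolation for LT4.0, measured end points for the Dmax line, lactate rising monotonically across stages); it is not the exact procedure of [9], and the stage data are hypothetical.

```python
import numpy as np


def lt_fixed(power_w, lactate_mmol, target=4.0):
    """Power at a fixed lactate concentration (e.g. LT4.0) by linear interpolation.

    np.interp needs increasing x values, so lactate must rise across the stages.
    """
    return float(np.interp(target, lactate_mmol, power_w))


def dmax(power_w, lactate_mmol):
    """Power on a cubic lactate-power fit that lies farthest from the first-to-last-stage line."""
    p = np.asarray(power_w, dtype=float)
    la = np.asarray(lactate_mmol, dtype=float)
    coeffs = np.polyfit(p, la, 3)
    grid = np.linspace(p[0], p[-1], 2001)
    curve = np.polyval(coeffs, grid)
    slope = (la[-1] - la[0]) / (p[-1] - p[0])
    line = la[0] + slope * (grid - p[0])
    # For a fixed line, perpendicular distance is proportional to the vertical gap,
    # and a rising lactate curve sits below the line, so maximize (line - curve).
    return float(grid[np.argmax(line - curve)])


# Hypothetical incremental test: stage power (W) and end-of-stage lactate (mmol/L).
power = [150, 180, 210, 240, 270, 300]
lactate = [1.0, 1.1, 1.4, 2.0, 3.2, 5.5]
print(f"LT4.0 ~ {lt_fixed(power, lactate):.0f} W, Dmax ~ {dmax(power, lactate):.0f} W")
```

Different, equally defensible markers computed from the same lactate curve land at different wattages, which is one reason why "threshold" needs to be specified before FTP can be compared against it.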

All of these lactate-derived values differed significantly from FTP20. Although FTP correlated most strongly with LT4.0, the inter-individual data showed a large dispersion of approximately 100 W, which calls their equivalence into question. The study concluded: "...we suggest that FTP does not have an equivalent physiological basis to any of the tests used herein and, therefore, cannot be used interchangeably." [9]

Figure 8 : FTP compared to a host of lactate parameters in 20 competitive cyclists. See reference [9].
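Why doesn't a strong correlation settle the question? Correlation measures how tightly two variables move together, not whether they give the same number. The toy Python example below, with made-up values rather than data from [9], builds a laboratory marker that is a perfect linear function of FTP, so Pearson's r is exactly 1, yet the two disagree by 14 to 44 W.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical riders: lab value = 0.8 * FTP + 30 (integer-exact because FTPs are multiples of 5).
ftp_w = [220, 250, 280, 310, 340, 370]
lab_w = [4 * x // 5 + 30 for x in ftp_w]
diffs = [a - b for a, b in zip(ftp_w, lab_w)]

print(pearson_r(ftp_w, lab_w))                             # 1.0
print(sum(diffs) / len(diffs), min(diffs), max(diffs))     # 29.0 14 44
```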

Taken together, these studies show that the claim that FTP can serve as an accurate surrogate for laboratory-based measures of threshold is, at best, unfounded.

E) The Issue of FTP20 method and "False Sense of Precision" 

As a matter of practical convenience, the second and third editions of TARWAPM suggested the FTP20 method as a way to estimate 60-minute FTP.

The issue with this technique is that the 95% figure is probably an average for a large group of cyclists, but it does not apply exactly to you or me, mainly because of inter-individual variability [5]. Some people will be at 93%, some at 90%, some at 85%. A. Coggan himself showed this (see Figure 9).

One prominent exercise physiologist told me: "A value of 92-93% is probably closer on average, whereas a value of 95% would, therefore bring the estimated threshold back towards 30MMP (30 minute mean maximal power)!"

Figure 9 : 95% of 20min power is not necessarily one hour FTP. Source : Facebook fan-page of TARWAPM.
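To see what that spread in correction factors means in watts, the Python sketch below applies the percentages mentioned above (95, 93, 90 and 85%) to one hypothetical rider with a 20-minute power of 300 W. The rider and the numbers are illustrative, not taken from [5] or Figure 9.

```python
def ftp_estimates(p20_w, factors_pct=(95, 93, 90, 85)):
    """FTP estimates (W) from a single 20-minute power, one per correction factor (percent)."""
    return {pct: p20_w * pct / 100 for pct in factors_pct}


estimates = ftp_estimates(300)
print(estimates)  # {95: 285.0, 93: 279.0, 90: 270.0, 85: 255.0}
print(max(estimates.values()) - min(estimates.values()))  # 30.0
```

For this rider, the choice of factor alone moves "FTP" by 30 W, a spread that then carries into every training zone and stress score computed from it.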

As A. Coggan himself wrote in a white paper published in March 2003, the real effect of applying an arbitrary correction factor to 20-min power may simply be to convey a false sense of precision [10].

While it's understandable that he would like to distance himself from the FTP20 method, I would add that continuing to perpetuate it in the TARWAPM book does not make the false sense of precision go away. Besides, debating whether the correction factor should be 95% or 90% does not change the fact that FTP is arbitrarily linked to "approximately one hour", with an unfounded claim to being the "highest workload" at quasi-steady state. Do two wrongs make a right?

F) The Issue of FTP Derived as CP from the Work-Time Plot

In my previous post, we looked into several research studies showing how critical power defines the boundary between the heavy and severe intensity domains. In numerous studies, work rates at threshold markers such as LT and MLSS were found to be lower than the work rate at CP. In fact, CP falls somewhere midway between LT and VO2max, depending on which study you look at.

Given this, CP is a high-intensity workload that can be sustained for only about 30 minutes or less. Therefore, approximately one-hour power (FTP) and CP should not, in principle, be considered interchangeable without supporting data. As one research team noted, a fresh study involving a wider cohort of subjects would be worthwhile to continue testing this idea of interchangeability [11].

In a study conducted by Morgan, FTP20 and CP correlated with each other, but the limits of agreement were found to be relatively large (+10.9 to -13.1%), such that the authors argued: "...limits of agreement between CP and FTP in this study may be too large to be practically meaningful for athletes and coaches, and that the agreement between the two variables may be coincidental." [5]

The idea advanced in the 3rd edition of TARWAPM, that FTP can be estimated from a linearized work-time plot and considered interchangeable with critical power, is unfounded.
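The linearized work-time model treats the total work done in each maximal effort as a straight line in time, W = CP × t + W′, so CP is the slope and W′ (the finite work capacity above CP) is the intercept. A minimal least-squares fit is sketched below in Python with invented trial data; it illustrates the CP model only and says nothing about how TARWAPM then equates the result with FTP.

```python
def cp_from_work_time(trials):
    """Fit W = CP * t + W' by ordinary least squares.

    `trials` is a list of (duration_s, mean_power_w) from separate maximal efforts,
    with at least two distinct durations. Returns (CP in W, W' in J).
    """
    t = [d for d, _ in trials]
    w = [d * p for d, p in trials]  # total work (J) = mean power x duration
    n = len(trials)
    t_bar, w_bar = sum(t) / n, sum(w) / n
    cp = sum((ti - t_bar) * (wi - w_bar) for ti, wi in zip(t, w)) / sum((ti - t_bar) ** 2 for ti in t)
    w_prime = w_bar - cp * t_bar
    return cp, w_prime


# Hypothetical maximal efforts of 3, 6 and 12 minutes.
trials = [(180, 380), (360, 330), (720, 305)]
print(cp_from_work_time(trials))  # (280.0, 18000.0)
```

Nothing in this fit refers to one hour: CP is simply the slope of work against time over the trial durations used, which is why equating it with "approximately one-hour power" needs separate evidence.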

G) The Issue of Secret Sauce In Modeled FTP

In TARWAPM, one method estimates FTP by fitting a model to a collection of mean maximal power (MMP) values recorded within a specific time window. The FTP value is the resulting parameter solved from the fit, called modeled FTP or mFTP.

However, mFTP modeling is only available in the proprietary software WKO4 (now WKO5). On the Wattage forums, A. Coggan has claimed that data from over 200 MMP values show that mFTP can be sustained for 60 +/- 13 min, and he has used this as an argument that FTP is sustained for "approximately" an hour.

TARWAPM calls the modeling technique the "secret sauce", implying that the proprietary fitting method is not open to scrutiny; only its outputs are. This roadblock might explain why most studies have used the FTP20 estimation technique to explore their research questions. Compare this with the CP concept, which is essentially open source and amenable to research that advances our understanding across diverse groups of people and sporting activities.

H) The Issue of FTP Based Stress Metrics and One Hour 

While TARWAPM's definition of FTP as power sustainable for "approximately" an hour continues to be proliferated, other metrics in the FTP ecosystem, such as the Banister-style Training Stress Score (TSS), are still arbitrarily pegged to exactly one hour. The TSS formula has been designed so that 1 hour at FTP = 100 TSS. This indicates that the formula was built around an arbitrarily fixed value for convenience rather than on physiological reality. Since training prescription and fitness performance charts in the FTP ecosystem are based on TSS, the flaw propagates through the whole chain of calculations.
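The one-hour peg is visible directly in the arithmetic. The Python sketch below follows the widely published TSS formula (TSS = duration in seconds × NP × IF ÷ (FTP × 3600) × 100, with IF = NP / FTP) and the usual NP recipe (the fourth-power mean of a 30-second rolling average). It is a simplified illustration assuming clean 1 Hz data, not the implementation used by Training Peaks or WKO.

```python
def normalized_power(power_w):
    """NP: 30 s rolling mean of 1 Hz power, raised to the 4th power, averaged, then 4th root."""
    if len(power_w) < 30:
        raise ValueError("need at least 30 s of data")
    rolling = [sum(power_w[i - 30:i]) / 30 for i in range(30, len(power_w) + 1)]
    return (sum(r ** 4 for r in rolling) / len(rolling)) ** 0.25


def tss(power_w, ftp_w):
    """Training Stress Score = (seconds x NP x IF) / (FTP x 3600) x 100, with IF = NP / FTP."""
    np_w = normalized_power(power_w)
    intensity_factor = np_w / ftp_w
    return len(power_w) * np_w * intensity_factor / (ftp_w * 3600) * 100


ftp = 280  # hypothetical FTP in watts
print(round(tss([280] * 3600, ftp), 1))  # 100.0: one hour at FTP
print(round(tss([280] * 1800, ftp), 1))  # 50.0: half an hour at FTP
print(round(tss([252] * 3600, ftp), 1))  # 81.0: one hour at IF = 0.9
```

Algebraically the formula reduces to hours × IF² × 100, so one hour at IF = 1.0 scores exactly 100 by construction, whatever a given rider's actual time to exhaustion at FTP happens to be.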

III. Conclusion

FTP was born out of a perceived need for convenient field testing and, one might add, an entrepreneurial excitement to build a quantification ecosystem when power meters hit the market beginning in the late 1990s.

As a purely performance-based metric, FTP is "useful", just as the critical power concept and CP modeling are useful. However, compared with CP, remarkably few papers have scrutinized FTP, and many of those call its validity into question.

I conclude with a summary of reasons why FTP must be approached with caution by whoever is using it or plans to adopt it:

1) FTP's definition as the "highest" workload at which one can sustain a quasi-steady state is not supported by studies. FTP might therefore systematically underestimate the intensity at which quasi-steady states can be achieved. This also implies that FTP lies in an intensity region where one is unambiguously at steady state.

2) FTP's claim to be a valid and accurate surrogate for lab-based testing of a range of thresholds is unfounded. Moreover, any claim that concepts like critical power and FTP can be interchanged through modeling work is unfounded and probably a serious error. There have been recent calls by scientists to consider CP alone as the gold standard when the goal is to define maximum lactate steady state [13].

3) FTP's claim to be the power that can be sustained for approximately one hour without fatigue is most definitely incorrect.

4) Despite acknowledgement of variability, accompanying metrics in the FTP ecosystem like the Training Stress Score continue to be arbitrarily pegged to an hour (1 hour at FTP = 100 TSS). This spreads the already widespread confusion that FTP is 1-hour power, which it is not.

5) Widely proliferated estimation techniques for FTP, such as the FTP20 method, are incorrect. As the originator of the FTP concept himself wrote, the method may simply convey a false sense of precision, and promoting it in the TARWAPM book does not make that false precision go away.

Despite its conceptual flaws, I acknowledge that FTP has found favor with coaches and athletes who use it simply for its training value. However, testimonials and anecdotal evidence are not science. Claims made about FTP and its accompanying ecosystem warrant further scientific scrutiny, and the body of research available so far suggests that those claims are weak and not grounded in scientific fact.


1. Allen, Hunter & Coggan, Andrew. Training and Racing with a Power Meter. VeloPress. Kindle Edition.

2. Borszcz, Fernando & Tramontin, Artur & Bossi, Arthur & Carminatti, Lorival & Costa, Vitor. (2018). Functional Threshold Power in Cyclists: Validity of the Concept and Physiological Responses. International Journal of Sports Medicine. 39. 10.1055/s-0044-101546. 

3. Borszcz, Fernando & Tramontin, Artur & Costa, Vitor. (2019). Is the Functional Threshold Power Interchangeable With the Maximal Lactate Steady State in Trained Cyclists?. International Journal of Sports Physiology and Performance. 14. 1029-1035. 10.1123/ijspp.2018-0572. 

4. Valenzuela, Pedro L. & Morales, Javier S. & Foster, Carl & Lucia, Alejandro & de la Villa, Pedro. (2018). Is the Functional Threshold Power (FTP) a Valid Surrogate of the Lactate Threshold?. International Journal of Sports Physiology and Performance. 13. 10.1123/ijspp.2018-0008. 

5. Morgan, Paul & Black, Matthew & Bailey, Stephen & Jones, Andrew & Vanhatalo, Anni. (2018). Road cycle TT performance: Relationship to the power-duration model and association with FTP. Journal of Sports Sciences. 10.1080/02640414.2018.1535772. 

6. Poole, David & Ward, Susan & Gardner, Gerald & Whipp, Brian. (1988). Metabolic and respiratory profile of the upper limit for prolonged exercise in man. Ergonomics. 31. 1265-79. 10.1080/00140138808966766. 

7. de Lucas, Ricardo & Mendes de Souza, Kristopher & Costa, Vitor & Grossl, Talita & Guglielmo, Luiz Guilherme. (2013). Time to exhaustion at and above critical power in trained cyclists: The relationship between heavy and severe intensity domains. Science & Sports. 28. e9- e14. 10.1016/j.scispo.2012.04.004. 

8. Thomas, Kevin & Goodall, Stuart & Stone, Mark & Howatson, Glyn & Gibson, Alan & Ansley, Les. (2014). Central and Peripheral Fatigue in Male Cyclists after 4-, 20-, and 40-km Time Trials. Medicine and Science in Sports and Exercise. 47. 10.1249/MSS.0000000000000448.

9. Jeffries, Owen & Simmons, Richard & Patterson, Stephen & Waldron, Mark. (2019). Functional Threshold Power Is Not Equivalent to Lactate Parameters in Trained Cyclists. Journal of Strength and Conditioning Research. 1. 10.1519/JSC.0000000000003203. 

10. Coggan, Andrew. (2003). Training and racing using a power meter: an introduction. 

11. McGrath, Eanna & Mahony, Nick & Fleming, Neil & Donne, Bernard. (2019). Is the FTP Test a Reliable, Reproducible and Functional Assessment Tool in Highly-Trained Athletes?. International Journal of Exercise Science. 12. 1334-1345.

12. Perrey, Stephane & Grappe, Fred & Girard, A & Bringard, Aurélien & Groslambert, Alain & Bertucci, William & Rouillon, J. (2003). Physiological and Metabolic Responses of Triathletes to a Simulated 30-min Time-Trial in Cycling at Self-Selected Intensity. International Journal of Sports Medicine. 24. 138-43. 10.1055/s-2003-38200.

13. Jones, Andrew & Burnley, Mark & Black, Matthew & Poole, David & Vanhatalo, Anni. (2019). The maximal metabolic steady state: redefining the ‘gold standard’. Physiological Reports. 7. 10.14814/phy2.14098.

14. Inglis, Erin Calaine & Iannetta, Danilo & Passfield, Louis & Murias, Juan. (2019). Maximal Lactate Steady State Versus the 20-Minute Functional Threshold Power Test in Well-Trained Individuals: “Watts” the Big Deal?. International Journal of Sports Physiology and Performance. 1-7. 10.1123/ijspp.2019-0214. 
