Friday, April 3, 2020

Critical Power Concept in Exercise : Critique And Applications


Work rate and the duration for which that work can be sustained have a definite relationship in most living species, including humans, horses, mice and salamanders. In humans, its validity has been shown for running, cycling, swimming and rowing. It is valid for any activity where the limits of sustainable oxygen consumption are sufficiently challenged.

If you consider running, the most practical observation of such a relation is what runners know as the decrease of running speed with increase in distance, and vice versa. In other words, maximal work output is higher the shorter the distance (or time duration) and lower the longer the distance.

There is some high intensity value of movement speed between these two extremes, which could be held for a long time (in most published studies - well under one hour) without "blowing up".

Conceptually, critical velocity or critical power is approximately equal to the highest steady state speed or power output with the internal body state in homeostasis. In a recent review, Jones called CP the "gold standard" when the goal is to determine maximal metabolic steady state [11].


Exercise concepts must have good descriptions that link back to what actually takes place in the body. A good model would have a bio-energetic basis. In this respect, critical power (CP) has well established scientific underpinnings, unlike "other" training concepts in commercial circulation today. (There are of course models that are simply empirical, and do not help us understand how model parameters relate to something within our own bodies)

CP is thought to represent the highest rate of aerobic energy supply available for exercise. On an intensity spectrum, it forms the lower limit for the severe exercise intensity regime and an upper limit for the heavy exercise intensity regime. 

The breakdown of metabolic control variables when exercising above CP. Black dots = baseline values. Gray = new values at work > CP. Source [2].

In this severe intensity regime, intramuscular metabolic control breaks down, and such exhaustive exercise results in the attainment of low end-exercise pH, [bicarbonate] and [PCr] values irrespective of the chosen work rate and a continuous increase in blood [lactate], pulmonary VO2 rate and ventilation relative to baseline values.

CP becomes the "threshold" beyond which metabolic control is lost by the individual. 

Beyond CP, the slow component of VO2, which was previously under control, rises steeply enough to drive the body to VO2max within the span of a few minutes. The slow component of VO2 is thought to arise from the progressive recruitment of fast twitch muscle fibers. Considering this, exercise above CP always happens on 'borrowed time'.

Some 85% of the slow VO2 rise is linked to the recruitment of energetically costly fast-twitch (FT) muscle fibres as work intensity increases. The energy cost per unit force output is higher for FT fibers than for slow twitch (ST) fibers. The slow component of VO2 is not unique to humans; the same has been demonstrated in horses when they are exercised above their lactate threshold. [3]

The steep rise of slow component of VO2 at work > CP. Source [1]

In the hyperbolic critical power model, the term W' (vocally called W prime) represents a constant amount of work that can be performed above CP and is notionally equivalent to an energy store consisting of O2 reserves, high energy phosphates and a source related to anaerobic glycolysis.  The higher the sustained power output above the CP, the more rapidly the W' will be expended, and the greater will be the rate at which metabolites which have been associated with the fatigue process accumulate. 

The average time to exhaustion in work done above CP may be on the order of 10-15 minutes at most, depending on the size of the athlete's anaerobic reserves and motivation. In some laboratory tests, the average time to exhaustion in test subjects at work above CP was 13 minutes [1].

Even at CP, physiological steady state is not necessarily achieved. The time to failure at CP ranged from 25 minutes 1 second to 40 minutes 3 seconds [2]. This inter-individual variability hints at the obvious possibility that better trained athletes can sustain exercise at CP longer than less aerobically trained individuals. Some of this variation may also be linked to unfamiliarity with exercising at the estimated CP ("learning effect").

One definition of CP is that it is the "highest, non-steady-state intensity that can be maintained for a period in excess of 20 minutes, but generally no longer than 40 minutes." [2]


The work rate vs duration (power-time, or p-t) relation has been mathematically represented in various forms by scientists over the course of the 20th century.

They are listed as follows :

1) The exponential CP model (Hopkins)
2) The 3-parameter CP model (Morton)
3) The 2-parameter CP model (Hill)
4) The linear model (Moritani)
5) The inverse time CP model (Whipp)

These models and their mathematical representations are shown below :

CP models and their mathematical representation. Source [9].

Although there doesn't seem to be a consensus on which model is best, there has been relatively more attention and research on the hyperbolic forms [7]. The focus of this writeup is primarily on the use of the 2-parameter hyperbolic model, which may not be the best model but is the simplest to apply.

Note : This year, a new paper was published detailing an "omni duration" power duration model. Basically, the authors describe an adapted discontinuous mathematical function that helps some of the traditional CP models achieve a better fit at very long durations (more on protocol and duration dependencies below). Details of this model are in the paper in reference [10].


The 2-parameter hyperbolic form of the p-t relation is shown below from a paper on the topic, clearly demarcating boundaries of moderate, heavy and severe intensity domains [1].

Two parameters are of interest in this model :

1) Critical power : This is the horizontal asymptote of the hyperbola which, when read off the y-axis, yields a value of power that could "theoretically" be sustained forever but in reality corresponds to a maximal duration of 60 minutes or less. Its units are Watts.

2) W' : This is the curvature constant of the model, signifying a constant amount of "work" that can be done above critical power. Its units are kilojoules.

Below CP, physiological balance is attained. This corresponds to the heavy and moderate areas in the plot. Above CP, VO2 is driven towards maximum and eventual exercise failure. That area is shown as the severe intensity region.

 The geometrical descriptions of CP. Source [1]

In terms of power output and oxygen consumption, the second plot shows the values represented on the exercise intensity regime.

Range of attainable power output in a young male along with the oxygen consumption attained. Shown in the intensity range are lactate threshold (LT) and critical Power (CP) along with VO2max, the point which results in termination of exercise. Source [8]

The hyperbola may also be linearized, in which case the linear relationship becomes one between work done and time duration. The y-intercept would then correspond to W' while the slope of the line would be critical power or velocity. The linear Moritani model is not discussed further here.
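Although the linear form is not pursued further in this post, the linearization just described can be sketched in a few lines of Python. This is a minimal illustration, not a validated tool: the function name `fit_cp_linear` and the three trial results are made up for the example, and an ordinary least-squares line is assumed as the fitting method.

```python
def fit_cp_linear(durations_s, powers_w):
    """Fit work = W' + CP * time by ordinary least squares.

    Returns (cp_w, w_prime_j): the slope is CP in watts and the
    y-intercept is W' in joules.
    """
    # Work done in each trial (joules) = mean power (W) * duration (s)
    work_j = [p * t for p, t in zip(powers_w, durations_s)]
    n = len(durations_s)
    mean_t = sum(durations_s) / n
    mean_w = sum(work_j) / n
    sxx = sum((t - mean_t) ** 2 for t in durations_s)
    sxy = sum((t - mean_t) * (w - mean_w) for t, w in zip(durations_s, work_j))
    cp_w = sxy / sxx                    # slope of the work-time line
    w_prime_j = mean_w - cp_w * mean_t  # intercept of the work-time line
    return cp_w, w_prime_j


# Hypothetical time-to-exhaustion trials (illustrative numbers only)
durations_s = [420, 720, 1200]  # 7, 12 and 20 minutes
powers_w = [298, 278, 267]      # mean power held in each trial

cp_w, w_prime_j = fit_cp_linear(durations_s, powers_w)
print(f"CP = {cp_w:.0f} W, W' = {w_prime_j / 1000:.1f} kJ")
```

With these made-up trials the fit lands at roughly 250 W and 20 kJ. Real data will scatter around the line, and the quality of the estimate depends on the trial durations chosen.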


Any model is a mathematical simplification of a real world phenomenon and, by nature, is never fully correct. As far as the whole body CP concept is concerned, four major assumptions in the simple 2-parameter CP model have been documented :

1. There are only two components to the energy supply system, termed aerobic and anaerobic.
2. Aerobic supply is unlimited in capacity but rate limited, the limiting parameter being CP.
3. The anaerobic capacity is not rate limited but capacity limited.
4. Exhaustion, by implication, termination of exercise, occurs when all of the anaerobic work capacity is exhausted.

These assumptions have been treated beautifully by Morton, and the reader interested in understanding the details of each assumption should read reference [5] below. My conclusions from Morton's paper are as follows :

Assumption 1 : There are only two components to the energy supply system, termed aerobic and anaerobic.  Yes, this is largely true but only to an extent. The body has more than two energy systems.

Assumption 2 : Aerobic supply is unlimited in capacity but rate limited, the limiting parameter being CP.  This is not true, the aerobic capacity clearly has a limit in all humans. However, the statement that it is rate limited is correct. There is clearly a limit and you might define it by CP.

Assumption 3 : The anaerobic capacity is not rate limited but capacity limited.  True, explosive power generated from anaerobic capacity is limited. It is not true that it is rate limited.

Assumption 4 : Exhaustion, by implication, termination of exercise, occurs when all of the anaerobic work capacity is exhausted.  The human engine does not necessarily terminate exercise when all the glycogen stores, consequently, anaerobic work capacity, is exhausted. Research proves that at the point of exercise termination, there is still glycogen left in the body. The fine proof is that when nearing exhaustion, if the power output is just slightly lowered, subjects exercising should be able to continue on despite still working at supra-maximal power outputs.

All models have assumptions, and validating a model means checking that its assumptions hold. If they deviate from reality, the model is wrong, sometimes dead wrong. As with CP, similar assumptions can be written down for the concept of FTP, and the astute athlete and coach can examine each assumption to understand at what point the model fails and becomes inapplicable to the athlete.

Note : Around 9 total assumptions about the 2 parameter CP model have been treated in the paper by Morton [5].


As with any mathematical model, the GIGO (garbage in, garbage out) principle applies. All models are wrong, being simplified representations of reality. The CP models are not immune from this deficiency. Other concepts such as FTP also suffer from model related errors.

Some of the weaknesses in CP modeling are listed as follows :

1) CP is protocol and model dependent : Critical power and its calculation have both model and protocol dependency. In a fantastic research study, scientists compared several models for estimating CP using different combinations of time-to-exhaustion exercise sessions in 13 young recreational cyclists. They not only found that the 3-parameter CP model fit the data best, but also that when model fits came from duration combinations weighted towards the short durations, CP was over-estimated and W' under-estimated [9].

Different model fits and differences in parameters compared to criterion measure. Source [9].

In particular to our interest, the 2-parameter CP model was closest to the criterion measure only when mean duration combinations such as 7, 12 and 19 minutes were chosen, whereas when durations were consistently < 10 minutes, the model values were far from accurate [9].

There have been reports of large variations in the calculated value of W' arising from different models, particularly in sub-classes of athletes such as elite athletes [6].

2) Effect of very short only duration : When critical power is calculated from slope of the work-duration relationship using short supra-maximal exercises, the resulting power from models is higher than the power output which corresponds to a lab measured lactate "steady state" work intensity. The critical power also tends to be lower than maximal aerobic power [6].

3) Effect of long only duration : When critical power is calculated from very long sub-maximal exercise durations, the resulting power from the models tends to be lower than the power output which corresponds to a lab measured lactate steady state work intensity such as OBLA (onset of blood lactate accumulation) [6].

4) Due to the hyperbolic form of the model, small errors in CP translate to large errors in sustainable time duration. This reduces the predictive validity of CP when the model is misapplied by practitioners.
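To make point 4 concrete, here is a small sketch of how an error in CP propagates into predicted time to exhaustion under the 2-parameter hyperbolic model. The helper name `tlim_change_pct` and the numbers (a true CP of 250 W, a target of 300 W) are assumptions chosen for illustration. Because Tlim = W' / (P - CP), W' cancels out of the ratio between the two predictions.

```python
def tlim_change_pct(power_w, cp_true_w, cp_error_pct):
    """Percent change in predicted Tlim when CP is mis-estimated.

    With Tlim = W' / (P - CP), the ratio of the prediction made with the
    wrong CP to the one made with the true CP is
    (P - CP_true) / (P - CP_est), so W' is not needed.
    """
    cp_est_w = cp_true_w * (1 + cp_error_pct / 100)
    if power_w <= max(cp_true_w, cp_est_w):
        raise ValueError("target power must be above both CP values")
    return 100 * ((power_w - cp_true_w) / (power_w - cp_est_w) - 1)


# Hypothetical athlete: true CP 250 W, riding at 300 W
for cp_error_pct in (-2, 2, 5):
    change = tlim_change_pct(300, 250, cp_error_pct)
    print(f"CP off by {cp_error_pct:+d}% -> predicted Tlim off by {change:+.1f}%")
```

In this example a 2% over-estimate of CP inflates the predicted duration by about 11%, and the effect grows quickly as the target power gets closer to CP.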

To get around some of these duration specific weaknesses, I can suggest a few things :

a) Critical power is supposed to be a "heavy" work output very close to maximal running velocity, maximal power output and maximal oxygen uptake as measured in the lab.  As such, it has been suggested that critical power should only be calculated from exhaustion times corresponding to "heavy sub-maximal exercises".  The recommended exhaustion time range is suggested as 6 - 30 minutes [6].  Below and above this range, the validity of the classic CP models is questionable.

b) Owing to research done in [9], it is best to include a mix of test durations in order to balance the short supra-maximal with the long sub-maximal. When using 3 durations, something like 7, 12 and 20 minutes is recommended. When using just two durations, using a range from 12-20 minutes may provide more accurate CP and W' estimations. The 2-parameter CP model yields valid parameters with durations greater than 10 minutes. The 3-parameter hyperbolic CP model (Morton) is deemed protocol independent.


1) Multi-duration testing : The established lab practice is to model CP using several bouts of constant load exercise performed to failure at varying durations over several days. These bouts are administered in random order and the recommended time to exhaustion ranges from 1-20 minutes. The time to exhaustion in these exercises is plotted against power output. The hyperbolic 2-parameter Whipp model, when fit through this data, yields CP and W', where CP is the horizontal asymptote of the curve and W' is the area between the curve and CP, which represents a fixed quantity of work that can be done above CP before approaching complete exhaustion. However, the choice of durations would need to be scrutinized to yield a critical power that resembles a severe intensity workload.

2) A 3 minute all out test (3MAOT) has been scientifically established to point towards critical power. The idea with this test is that it is possible to deplete W’ in reasonably short time. Therefore, the idea of the test is to perform work all-out in a span of 3 minutes and deplete W'. The last 30 seconds of the 3 min all out test is supposedly close to the critical power.

There are indications from the scientific community that the 3MAOT field test overestimates CP and underestimates W', and is therefore not a reliable measure of capacity in "well trained athletes".

CP calculated from a 3MAOT test. Source [4].
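The 3MAOT idea can be sketched as follows. This is only an illustration under stated assumptions: power is assumed to be recorded once per second for 180 seconds, CP is taken as the mean power of the final 30 seconds as described above, and W' is taken as the work done above that end-test power (an assumption borrowed from common practice with this test, not something spelled out in this post). The function name `estimate_from_3mt` and the synthetic power trace are hypothetical.

```python
def estimate_from_3mt(power_1hz_w, tail_s=30):
    """Estimate (cp_w, w_prime_j) from a 1 Hz all-out power trace.

    CP is the mean power of the last `tail_s` seconds. W' is the work done
    above that value; with one sample per second, each watt above CP
    contributes one joule.
    """
    if len(power_1hz_w) < tail_s:
        raise ValueError("trace is shorter than the averaging window")
    tail = power_1hz_w[-tail_s:]
    cp_w = sum(tail) / len(tail)
    w_prime_j = sum(p - cp_w for p in power_1hz_w if p > cp_w)
    return cp_w, w_prime_j


# Synthetic trace: power falls linearly from 600 W to 260 W over 150 s,
# then stays at 260 W for the final 30 s (made-up numbers).
trace_w = [600 - 340 * i / 150 for i in range(150)] + [260] * 30

cp_w, w_prime_j = estimate_from_3mt(trace_w)
print(f"CP = {cp_w:.0f} W, W' = {w_prime_j / 1000:.1f} kJ")
```

A real trace is far noisier than this, and the caveat above about over-estimated CP and under-estimated W' applies to whatever numbers come out.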


1) Training Prescription : Once the critical power (or critical speed) has been determined, training prescription can be designed for an athlete using a percentage of critical power.

As a start, the following training levels can be constructed. It is a good start, at least in my experience.

Recovery : Best to go by perceived exertion
Endurance : 49-65% of CP
Tempo : 66-79% CP
Threshold : 80-92% of CP
Aerobic Power : 93-105% of CP
Anaerobic Capacity : > 105% of CP

These levels may have to be modified on a case by case basis depending on the critical power test and the athlete.
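The percentages above can be turned into watt ranges with a short helper. This is a sketch only: the name `training_zones` and the 300 W example are assumptions for illustration, and Recovery is left out because it is set by perceived exertion rather than power.

```python
# Zone bounds as integer percentages of CP, copied from the list above.
ZONE_PERCENTS = [
    ("Endurance", 49, 65),
    ("Tempo", 66, 79),
    ("Threshold", 80, 92),
    ("Aerobic Power", 93, 105),
    ("Anaerobic Capacity", 105, None),  # open-ended: anything above 105%
]


def training_zones(cp_w):
    """Return {zone name: (low_watts, high_watts)}; high is None if open-ended."""
    zones = {}
    for name, low_pct, high_pct in ZONE_PERCENTS:
        low_w = cp_w * low_pct / 100
        high_w = cp_w * high_pct / 100 if high_pct is not None else None
        zones[name] = (low_w, high_w)
    return zones


# Hypothetical athlete with a CP of 300 W
for name, (low_w, high_w) in training_zones(300).items():
    if high_w is None:
        print(f"{name}: above {low_w:.0f} W")
    else:
        print(f"{name}: {low_w:.0f}-{high_w:.0f} W")
```

Keeping the bounds in one table makes it easy to adjust them case by case, as suggested above.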

Pacing prescription may also be set for races where the use of running power is prevalent. A 10K race for a talented runner may be targeted at 95-100% CP. A 5K race performance may be targeted within a range of 100-105% CP. Again, experimentation is necessary with these ranges and no guidance can be set in stone, as courses are different and CP itself may exhibit small day-to-day variations.

For very long duration events, where it is known that CP actually decreases over time, it is not clear how effectively one could employ CP to set race prescription [12]. I suggest the use of a multi-pronged approach for marathons and ultra-marathons, involving the use of pace, heart rate and perceived exertion.

2) Predicting Time to Exhaustion :  A fairly fundamental application for the critical power (or velocity)  model is to help determine the time to exhaustion during work performed above CP.

With the simple 2 parameter hyperbolic form, time to exhaustion can be represented as :

Tlim = W′ /(P − CP)

As an example, setting W' = 20 kJ, CP = 250 W, P = 300 W :

Tlim = 20,000 J / (300 W - 250 W) = 400 s = 6.67 minutes.

This way, the optimum time duration to cover a distance without "blowing up" can be predicted.

As one can see from the above example, any errors in the estimation of W' and CP translate to errors in the predicted time to exhaustion.
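The same calculation can be written as a pair of small Python functions. This is a minimal sketch of the 2-parameter hyperbolic form only; the function names are made up for the example, and the second function simply rearranges the same equation to give the power that the model says can be held for a chosen duration.

```python
def time_to_exhaustion_s(power_w, cp_w, w_prime_j):
    """Tlim = W' / (P - CP), valid only for power above CP."""
    if power_w <= cp_w:
        raise ValueError("the model only predicts exhaustion above CP")
    return w_prime_j / (power_w - cp_w)


def sustainable_power_w(duration_s, cp_w, w_prime_j):
    """Rearranged form: P = CP + W' / t."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return cp_w + w_prime_j / duration_s


# Values from the worked example: W' = 20 kJ, CP = 250 W, P = 300 W
tlim_s = time_to_exhaustion_s(300, 250, 20_000)
print(f"Tlim = {tlim_s:.0f} s = {tlim_s / 60:.2f} min")  # Tlim = 400 s = 6.67 min

# Power the model predicts for a 10 minute effort with the same parameters
print(f"{sustainable_power_w(600, 250, 20_000):.0f} W")
```

The guard against power at or below CP matters because the equation returns a negative or undefined time there, which has no physical meaning in this model.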

3) Use in Software : Nowadays, software can easily fit the 2 parameter model to mean maximal exercise data yielding all the parameters from the applied model.

Golden Cheetah is open source software that does this. I will describe more on using Golden Cheetah and how it treats data in different models in another post, simply because the learning curve involved in using the software is steep. However, some introductory tutorials on modeling CP using GC are shown here.

As of today, Mark Liversedge tells me that the upcoming version of GC will feature the ability to overlay several more models on data, which would help the practitioner assess which of the models fits best.

Felipe Maturana, a PhD candidate, showed me an app developed in Shiny which allows you to model CP using a number of time to exhaustion trials. This would be a good tool to play around with for its sheer educational value.

Apart from this, the 2 and 3 parameter model can be programmed in Microsoft Excel. I have been using such models for some time but do find using Excel cumbersome.


While there are several exercise concepts out there, the critical power model has been one of the most rigorously studied in the scientific literature.

In this post, only one form of this model - the hyperbolic 2 parameter model - was described in a somewhat broad manner. There are several other models including 3 parameter and extended CP models. In future, this post will be expanded to include a treatment of those other models.

The concern over test protocol, quality of data and error propagation carries across to any CP model. The practitioner must be careful in the use of these models to advise exercise prescription, especially for talented elite athletes. Lab based physiological profiles will be better suited to making informed decisions for these athletes.

However, in a vast majority of recreational athletes, proper use of the field based testing protocol and the modeling based on the data will yield a useful approximation of the endurance capacity of an individual. That it is conceptually the highest power output or speed at physiological steady state is useful in training prescription. Practitioners will also be pleased in utilizing a very scientifically vetted training concept.

What remains to be seen is how the critical power concept marries with the central nervous system theory of fatigue. That the ultimate limiter of exercise performance is not the muscle but the brain was introduced more than a century ago by scientists.

Implicit in the effectiveness of applying the critical power concept is the idea that the performance that is analyzed must be maximal in nature, implying that the central drive must be maximal for that performance. The role of motivation and internal drive is significant enough to warrant further investigation as part of the critical power concept.

Readers are advised to expand on their knowledge and read the papers referenced below.


1. Jones, A. M., Vanhatalo, A., Burnley, M., Morton, R. H., & Poole, D. C. (2010). Critical power: implications for determination of VO2max and exercise tolerance. Med Sci Sports Exerc, 42(10), 1876-90.

2. Brickley, G., Doust, J., & Williams, C. (2002). Physiological responses during exercise to exhaustion at critical power. European journal of applied physiology, 88(1-2), 146-151.

3. Langsetmo, I., Weigle, G. E., Fedde, M. R., Erickson, H. H., Barstow, T. J., & Poole, D. C. (1997). VO2 kinetics in the horse during moderate and heavy exercise. Journal of Applied Physiology, 83(4), 1235-1241.

4. Miller, M. C., & Macdermid, P. W. (2015). Predictive validity of critical power, the onset of blood lactate and anaerobic capacity for cross-country mountain bike race performance. Sport Exerc Med Open J, 1(4), 105-110.

5. Morton, R.H. The critical power and related whole-body bioenergetic models. Eur J Appl Physiol 96, 339–354 (2006).

6. Vandewalle, Henry & Vautier, J-F & Kachouri, M & Lechevalier, J & Monod, H. (1997). Work-exhaustion time relationships and the critical power concept. A critical review. The Journal of sports medicine and physical fitness. 37. 89-102.

7. H. Monod & J. Scherrer (1965) The Work Capacity Of a Synergic Muscular Group, Ergonomics, 8:3, 329-338, DOI: 10.1080/00140136508930810

8. Mark Burnley & Andrew M. Jones (2018) Power–duration relationship: Physiology, fatigue, and the limits of human performance, European Journal of Sport Science, 18:1, 1-12, DOI: 10.1080/17461391.2016.1249524

9. Mattioni Maturana, Felipe & Fontana, Federico & Pogliaghi, Silvia & Passfield, Louis & Murias, Juan. (2017). Critical power: How different protocols and models affect its determination. Journal of Science and Medicine in Sport. 21. 10.1016/j.jsams.2017.11.015.

10. Puchowicz, Michael & Baker, Jonathan & Clarke, David. (2020). Development and field validation of an omni-domain power-duration model. Journal of Sports Sciences. 38. 1-13. 10.1080/02640414.2020.1735609.

11. Jones, Andrew & Burnley, Mark & Black, Matthew & Poole, David & Vanhatalo, Anni. (2019). The maximal metabolic steady state: redefining the ‘gold standard’. Physiological Reports. 7. 10.14814/phy2.14098.

12. Clark, Ida & Vanhatalo, Anni & Thompson, Christopher & Joseph, Charlotte & Black, Matthew & Blackwell, Jamie & Wylie, Lee & Tan, Rachel & Bailey, Stephen & Wilkins, Brad & Kirby, Brett & Jones, Andrew. (2019). Dynamics of the power-duration relationship during prolonged endurance exercise and influence of carbohydrate ingestion. Journal of Applied Physiology. 127. 10.1152/japplphysiol.00207.2019.

Sunday, June 16, 2019

The Poor Man's Giro : Amateur Science in a GT Mimicry

Early in May, I set out to do something in the name of science. I'd read about the physical demands of grand tour racing in research papers and wondered what that would translate to for an amateur rider who works 5 days a week in a day job.

The idea was simple. I set out to ride roughly 1/8th the daily distances in the 2019 Giro d'Italia. Each ride tried to capture the intent and spirit of the pro rides.

For example, if one day was the ITT, I'd go out and smash a little ITT of my own. If there was a mountain stage, then I'd go out and do some hill repeats (we do not have mountain passes in Abu Dhabi ! ). If the ride called for a flat stage, I'd go out on a 40km ride and end with a "solo sprint".

The challenge was called "Poor Man's Giro". I even created a little flyer for it and shared it on Twitter with the likes of sports analytics guru Alan Couzens and exercise scientist Stephen Seiler. 

All rides were attempted in the searing heat of Abu Dhabi. The rides were supported by nutrition from Secret Training U.A.E.

Fig 1 : Too poor to be a pro and ride a Grand Tour? The Poor Man's challenge is an answer!

Now, I faced a few challenges which I need to declare before we get started. Namely :

1) In addition to my day job, I also coach a running club and so squeezing in rides everyday became a challenge.

2) I went on vacation round about the 20th pro stage so I ended up completing just 18 "stages".

3) A couple of rides had to be done indoors on a Cybex ergometer.

4) I took one rest day more than necessary. It was inevitable, as I was too busy to squeeze in a ride on one or two occasions.

5) My time trial bike was not fitted with a power meter so power output wasn't captured for two TT's.

95% of the rides were done on a Colnago C40 road bike outfitted with a Powertap powermeter to capture the workload. Daily rides were uploaded into Strava and synced with GC to power the analytics.

Data Results

Below is the data from 18 rides. BikeStress is GC's implementation of TrainingPeaks' TSS, renamed when GC removed the trademarked "TSS" name from its software. TRIMPS have most likely been calculated using zonal points. IsoPower is GC's implementation of TrainingPeaks' NP, again renamed to avoid the trademarked metric.

Fig 2 : Ride, workload and stress parameters  from each day's ride of the Poor Man's Giro

To add a little bit of extra science to the investigation, daily HR and HRV related parameters were measured using a Faros ECG device hooked up to a Polar H10 chest strap. Protocol followed was 5 min supine-standing orthostatic format.

All the data was analyzed in Kubios to extract the mathematical nature of sympathetic and parasympathetic function. A self coded script threw the data onto a spreadsheet and automatically plotted the variables.

Fig 3 : Sets of plots showing the HR/HRV related parameters for the duration of Poor Man's Giro

Discussion of Results

We understand from the Grand Tours research done by Sanders that the stress associated with a time trial (TT) as a function of distance is the highest among flat (FLAT), semi-mountainous (SMT) and mountain stages (MT).

The authors find a typical average TT speed of 36.5 ± 12.9 kph, an average power output of 371 W at 177 ± 10 bpm, TRIMPS of 33 ± 32 AU and a TSS of 62 ± 32 AU. That translates to a TRIMPS/km of 3.39 ± 1.39 and a TSS/km of 3.39 ± 0.17 AU/km.

Table 2 from their research paper is very instructive of the performance parameters across the spectrum of stages. It is borrowed and pasted below for quick reference.

Fig 4 : Typical performance characteristics from Grand Tours for time trials (TT), flat (FLAT), semi-mountainous (SMT) and mountain stages (MT).

This can be compared to my own ride characteristics from Fig 2.

Time Trials : Agreeing with the research, the RPE associated with a solo TT is high, around 8.5-9. TRIMP points are 62 vs 58 (mine) which translates to a TRIMPS/km of between 4-5. This is the highest among all rides that I attempted.

Flat Stages : Agreeing with the research, the RPE associated with a flat stage is around 5 (pro = 5.8). TRIMP points are 298 vs 94 (mine), which translates to around 1/3rd the heart related stress, mainly due to the reduction in distance attempted. This translates to a TRIMPS/km of around 2 (pro = 1.55). Power output is around 137 W average, giving an average TSS/km of 2.9-3 (pro = 1.14). I presume pros show a lesser power related stress per km riding such long stages due to the draft effect.

Mountain Stages : The ride done on May 25 is a perfect example of a flat ride ending with several hill repeats to mimic the feel of climbing a mountain. The TSS/km and TRIMPS/km came out to 3.8 and 3 respectively, compared to the pro stats of 1.97 and 2.1 AU/km. So the stress was a bit greater on my part, and I probably intentionally made it that way when thinking about climbing.

Daily Accumulation Rates : For 3 weeks, the accumulation of stress was as follows :

The sum total of TRIMPS gained over 18 stages = 1937 AU = 108 TRIMPS/day.

The same for TSS (aka BikeStress in GC language) = 1386 AU = 77 TSS/day.

Total workload = 6467 kJ, with an accumulation rate = 359 kJ/day.
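For anyone repeating this exercise with their own totals, the daily rates are simply each total divided by the number of stages ridden. A small sketch follows, using the totals quoted above; the dictionary keys and the `daily_rate` helper are just illustrative names.

```python
STAGES_RIDDEN = 18

# Totals over the 18 stages, as quoted in the post
totals = {
    "TRIMPS (AU)": 1937,
    "BikeStress / TSS (AU)": 1386,
    "Work (kJ)": 6467,
}


def daily_rate(total, n_days):
    """Average accumulation per day over n_days."""
    if n_days <= 0:
        raise ValueError("n_days must be positive")
    return total / n_days


for metric, total in totals.items():
    print(f"{metric}: {daily_rate(total, STAGES_RIDDEN):.0f} per day")
```

Note that dividing by stages ridden ignores rest days, so the rate per calendar day would be somewhat lower.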

Daily HR and HRV related fatigue : The days after the hardest rides (TT's and MTs) on 11th, 18th  and 28th May respectively show significant drops in time related HRV parameters such as rMSSD and conversely  high supine resting pulses. 

Although all these parameters showed cyclical variations day in and day out, one standout feature was the steady rise in chronic HRV and the steady drop in chronic resting heart rates over the course of 18 days (chronic = long term).

In fact, the drop in resting heart rate, when compared to similar data from the beginning of the year, shows the difference very clearly. The long term difference seems to be a decrease of around 5 beats/min compared to the period prior to starting this mini challenge.

Fig 5 : Highlighted section showing the supine resting heart rate (daily acute and chronic over 7 days) compared with data from March 2019. 


Keeping with the spirit of amateur scientific investigation, an 18 day grand tour was mimicked during the period of the 2019 Giro d'Italia. Despite the limitations of a decreased work load, the aim of trying to match at least 1/8th the distance was more or less accomplished.

From the data, I conclude that heart related fitness parameters improved during those days, which shows the effect of a 108 TRIMPS/day and 77 TSS/day loading pattern. However, the data doesn't show the "delayed" effect of improvement that must have come 1 or 2 weeks after the 3 week training was concluded.

I hope to expand on this research during the period of the Tour de France. If you wish to join me in a Poor Man's TDF, please join !  Let's learn together. I can be found on Twitter.

*  *  *

Sunday, March 31, 2019

Machine Learning and Learning Humans

Perhaps I'm behind the times, but the field of 'machine learning' is all the rage these days. I only purport to know what it's all about from simple definitions found on the internet.

What I do understand is that 'Machine Learning' is a sub-field in the broad world of what's termed artificial intelligence. Using tools to teach artificial machines to automatically learn and improve their experiential knowledge based on collections of data sounds exciting and promising.

But do we really know how humans reason? At best, what we have are models of how humans are supposed to think intelligently. And perhaps more correctly, research has a model(s) of how a sub-set of humans from this planet are supposed to think 'intelligently' and make decisions on a daily basis. In other words, everything we know about what humans know about intelligent thinking is from a pool of subjects that volunteer to participate in research. Is my thinking far fetched?

Now, do humans need formal rules to make inferences? If Carly knows that chicken pox is associated with dark spots on the skin and that Jim has dark spots, she infers that Jim might have chicken pox. Did this conclusion require logic? No. It is entirely possible Carly used the content of the sentences to make a deduction, to imagine possibilities. 

The news media lately has been filled with humans trying to understand 'difficult, complex' topics, topics we have no precedent to learn from or use to navigate to a solution.

For instance, Brexiteers have little clue how to get Britain out of the European Union without incurring a series of dark uncertainties few really know about. Flight accident investigators scramble for answers how airplanes, an electronic 'thinking' machine made by humans, nose dived twice into the ground killing over 300 people in two separate instances less than 6 months apart. Separately, safety experts sing positive songs over completely automating speed limits in cars by 2020. We want to try and wrest control out of the human being, because ... it must be exciting.

Others look for clues on the ground explaining the precise moments of a meteor impact that apparently led to the disappearance of dinosaurs. This is another interesting piece of development and I wonder whether any machines were truly involved in this study. Why would you need a machine to study this issue anyway?

News stories show the complexities behind real learning, real decision making.  Can machines really imagine possibilities using content and the 'meanings' behind it, which might lead to reasoning based outside logic? And do we know enough of how humans give meaning to data in examples not needing logic before we take it as a given that machines can 'learn' the same things too, if we only force them to think in certain ways? Are explorations in these two fields, human learning and machine learning, going in parallel and feeding into each other?

What do we not know about humans that we don't put into machines, which eventually might lead to the creation of what essentially are incomplete models of humans? 

We try to mimic decision making in 'artificial intelligence' based on a limited set of knowledge we have about humans. The biases in that knowledge form the underbelly of the 'machine intelligence' we will have in our transportation systems, our appliances, and perhaps even in the robot that will help deliver your baby tomorrow. Aldous Huxley's 'brave new world' is really an uncertain world.

*  *  *

Monday, December 24, 2018

Surface Related VO2 Changes Not Reflected In Stryd Power - An Examination of Aubrey

In a recent paper by Aubrey published in the Journal of Strength and Conditioning Research, significant differences in oxygen consumption were reported from treadmill to overground running without evidence of a corresponding change in Stryd reported running power. Details can be found here

Two quick pieces of summary : 

a) The main bit of detail is that there was a significant change in VO2 not reflected in corresponding power readings between treadmill and overground running.

b) The other statement made in the paper is that a weak correlation was found between oxygen cost and power:weight ratio across all runners, elite or recreational, suggesting that "running power as assessed with the Stryd Power Meter, is not a great reflection of the metabolic demand of running in a mixed ability population of runners".

In a rebuttal of point b) in the paper, Dr. Snyder from Stryd accused the authors of "fatal methodological flaws" when they chose to normalize both metabolic rate and power/weight ratio with speed while pointing to a weak correlation between the two variables (r = 0.29, p = 0.02). 

Dr. Snyder's rebuttals are examined with the help of data from our old friends, the Dutch researchers from the Secret of Running group. From their blog, I extracted mean VO2 and mean power/weight ratios from treadmill testing for a randomly chosen subset of 6 runners.

Statement 1 : Rate of oxygen consumption is approximately proportional to speed (linearly dependent upon speed with a y-intercept of close to zero) across both elite and recreational runners (Batliner et al., 2018). This means all values for the rate of oxygen consumption measure when normalized by speed (otherwise known as cost of transport)* will be approximately constant, giving virtually no variation in these values other than that due to noise or subject variation. Therefore, regardless of Stryd power’s dependence upon speed, no correlation would be expected between the normalized measures. 

Aubrey normalized metabolic rate in ml/kg/min with speed measured in m/s. Such a division does not automatically result in the oxygen cost of transport.

In fact, the actual formula is :

Oxygen cost (ml O2/kg/km) = 60/3.6 * VO2 (ml O2/kg/min) / v (m/s)  --- 1)

So by calling this normalization "cost of transport", Dr. Snyder is not dimensionally correct, because the x-axis in the Aubrey paper shows values ranging from 9 to 17 (see Figure 1 in their paper). Such low double digit values cannot align with the oxygen cost of running, which is in the triple digits when expressed in ml O2/kg/km.
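To make the unit argument concrete, here is a minimal Python sketch of expression 1) next to the plain VO2/speed ratio. The function name and the sample values are my own illustrative assumptions, not numbers from Aubrey or Secret of Running.

```python
def oxygen_cost_ml_per_kg_km(vo2_ml_per_kg_min, speed_m_per_s):
    """Oxygen cost of running in ml O2/kg/km, following expression 1)."""
    # 60/3.6 equals 1000/60: it turns (ml/kg/min) / (m/s) into ml/kg/km
    return (60 / 3.6) * vo2_ml_per_kg_min / speed_m_per_s


# Hypothetical steady state values, chosen only for illustration
vo2 = 48.0    # ml O2/kg/min
speed = 4.0   # m/s, i.e. 14.4 kph

plain_ratio = vo2 / speed                       # what a bare division gives
cost = oxygen_cost_ml_per_kg_km(vo2, speed)     # oxygen cost per expression 1)

print(f"VO2/speed   : {plain_ratio:.1f}")
print(f"oxygen cost : {cost:.1f} ml O2/kg/km")
```

With inputs in this range the bare ratio lands in the low double digits while the oxygen cost lands in the triple digits, which is the mismatch described above.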

Behavior of oxygen consumption with speed can be examined from the data of Secret of Running. The plot in Fig.1 shows that for 6 different subjects, metabolic rate is mostly linearly proportional to speed. 

Fig 1 : Metabolic rate vs running speed measured in 6 subjects. Source of data : Secret of Running (Dijk, Megen). 

Converting these values to an oxygen cost of running with the formula in 1) transforms the plot into the one in Fig.2. As Dr. Snyder states, the linear relationship between speed and oxygen consumption becomes nearly constant save for noise and subject variation. In fact, when looking at this plot, the data looks less noisy for some runners (4,5,6) and more noisy for others (1,2,3). What is the source of this variation? Some explanation would be good.

Fig 2 : Oxygen cost of running vs running speed in 6 subjects. Source of data : Secret of Running (Dijk, Megen). 

What does research say about this relationship? According to the plot in Fig.3, there is a "general absence" of a change in oxygen cost as running speed increases.  However, because of the noise from the Stryd sensor, this constant relationship is not exactly seen.

From looking at Fig.2, we cannot make the claim that some individuals somehow magically reduce their oxygen cost as speed increases. The fundamental source of these fluctuations appears to be noise. It is precisely this noisy bit that requires further examination if such devices are to be applied among elite runners as a "surrogate" measure of oxygen cost. Not that I didn't warn about it on the Stryd Facebook page many moons ago.

Fig 3 : Oxygen cost in three different population groups

Statement 2 : Stryd power’s strong linear correlation with rate of oxygen consumption, however, indicates increasing Stryd power with increasing speed, meaning any variability would be reduced by normalization with speed. Thus, any correlation whatsoever between the normalized measures would be small and due to chance, unaccounted for nonlinearities, or subject variation, not the dominant linear relation with speed that underlies both non-normalized measures.

The relationship between Stryd power/weight ratio and treadmill speed can be examined in the Secret of Running data. By way of algorithmic implementation, Stryd power/weight is strongly linear in speed (Fig.4). But on closer inspection, not all subjects show linear proportionality. In fact, in this data, there doesn't appear to be anything close to a perfectly linear relationship. Almost all subjects' datapoints show a wavy pattern.

Some appear comical. Subject 6 shows a markedly higher power ramp between 15 and 16 kph compared to that between 16 and 17 kph.

Subject 1 on the other hand exhibits something that looks like a curvilinear relationship. 

What is the cause of these artifacts? 

There is no reason why some runners should take more effort to "jump" between two speeds compared to other speeds. The treadmill test is a continuously administered test with no "breaks" in between each speed. The other explanation could be the choice of value of VO2. It hasn't been explained by the authors of Secret of Running on what basis they chose steady state values. Were some intervals shorter than others, affecting the average of VO2 in that interval? 

Fig 4 : Stryd power/weight ratio vs running speed in 6 subjects. Source of data : Secret of Running (Dijk, Megen).

Normalizing the power/weight values by speed will dimensionally yield the energy cost of running through the formula :

ECOR (kJ/kg/km) = P (Watt/kg)/v (m/s)  ---- 2)

When the above data is normalized by speed using the expression in 2), we get the following plot. Again, due to random variations in the Stryd data, none of the subjects show a constancy in energy cost of running.

Fig. 5 : ECOR (calculated) in 6 subjects. Source of data : Secret of Running (Dijk, Megen). 
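As a rough sketch of the normalization in expression 2), the snippet below divides power/weight by speed at a few treadmill stages. The stage values are hypothetical placeholders of mine, not the Secret of Running data, and the function name is an assumption.

```python
def ecor_kj_per_kg_km(power_w_per_kg, speed_m_per_s):
    """Energy cost of running per expression 2)."""
    # (W/kg) / (m/s) = J/kg/m, which is numerically the same as kJ/kg/km
    return power_w_per_kg / speed_m_per_s


# Hypothetical treadmill stages: (belt speed in kph, Stryd power in W/kg)
stages = [(12, 3.50), (14, 4.10), (16, 4.62)]

# Treadmill speeds are quoted in kph, so convert to m/s before dividing
ecor_by_stage = [ecor_kj_per_kg_km(p, kph / 3.6) for kph, p in stages]

for (kph, _), ecor in zip(stages, ecor_by_stage):
    print(f"{kph} kph -> ECOR {ecor:.3f} kJ/kg/km")
```

If power were exactly proportional to speed, every stage would print the same ECOR. Small departures from proportionality in the power values show up directly as wiggles in the normalized number.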

The authors of Secret of Running have argued that the differences in ECOR among runners are of a fundamental nature, due to some being more experienced and more "efficient" than others. They suggest in their literature and books that it is important to reduce ECOR and that the Stryd powermeter is sensitive enough to measure ECOR.

However, I challenge this idea. I suggest that these authors re-examine whether changes in ECOR are really due to training status and running experience or simply due to random variations in the data, as Fig. 5 and Dr. Snyder's assertion show! Otherwise, different interpretations from different people appear to conflict.

Statement(s) 3 : [...] there is still a very strong linear relationship between the rate of oxygen consumption values and the Stryd power values. This strong dependence is obviously significantly reduced when these values are normalized by speed, giving a value only slightly larger than that found in the paper.
[....]Stryd power data are tailored to the individual, with power calculations being performed using input data for each specific subject, not across subjects. Therefore, if one were to actually validate Stryd power’s values as a training metric, as the paper’s title implies, correlation coefficients between rate of oxygen consumption and Stryd power should only be performed on a subject-by-subject basis

In keeping with Dr. Snyder's advice of analyzing Stryd data strictly on a subject-by-subject basis, I plot W/kg and metabolic rate of individual subjects separately on one plot and examine the strength of trendline linearity (Fig 6). Each subject's trendline and coefficient of determination are shown. The plot shows that changes in metabolic rate explain anywhere from 96% to 98% of the variation in W/kg. The relationship is strong but far from 100%. It also shows a similar picture to the data I have collected from my own laboratory VO2max testing.

Fig. 5 : Energy cost of running (calculated) vs oxygen cost of running (calculated).  Source of data : Secret of Running (Dijk, Megen). 
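For readers who want to repeat the subject-by-subject check, here is a minimal sketch of an ordinary least squares trendline and its coefficient of determination, fitted one runner at a time. The runner labels and numbers are hypothetical stand-ins I made up, and this is not necessarily how the trendlines in the plot were produced.

```python
def linear_fit_r2(x, y):
    """Least squares line y = intercept + slope*x and its R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical per-runner data: (VO2 in ml/kg/min, Stryd power in W/kg)
subjects = {
    "runner A": ([35.0, 41.0, 46.5, 52.0], [3.10, 3.55, 4.05, 4.50]),
    "runner B": ([38.0, 43.0, 49.0, 56.0], [3.30, 3.80, 4.15, 4.80]),
}

# Fit each runner separately instead of pooling everyone into one cloud
for name, (vo2, power) in subjects.items():
    slope, intercept, r2 = linear_fit_r2(vo2, power)
    print(f"{name}: slope = {slope:.3f}, R^2 = {r2:.3f}")
```

Keeping one fit per runner means differences in slope between runners cannot leak into the scatter, which is the point of the subject-by-subject advice.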

What Aubrey did in their paper (Figure 1) was pool all runners' data together by normalizing the metabolic rate and power/weight ratio by running speed. If we do the same for the dataset from Secret of Running, all relationships are blunted and the plot essentially becomes a scatter of points (Fig.6).

Fig 6 : Normalized specific power vs normalized VO2.  Source of data : Secret of Running (Dijk, Megen). 
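The blunting effect can be reproduced with a toy example. The sketch below builds one hypothetical runner whose VO2 and power both rise linearly with speed plus a small fixed wobble, then compares the Pearson correlation before and after dividing both by speed. The slopes and wobble values are assumptions of mine chosen for illustration, not measured data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


speed = [3.0, 3.5, 4.0, 4.5, 5.0]                # m/s
vo2_wobble = [0.5, -0.5, 0.5, -0.5, 0.5]         # hypothetical noise, ml/kg/min
power_wobble = [0.02, 0.02, -0.02, -0.02, 0.02]  # hypothetical noise, W/kg

# Both measures are driven by speed, each with its own unrelated wobble
vo2 = [12.0 * v + e for v, e in zip(speed, vo2_wobble)]
power = [1.05 * v + e for v, e in zip(speed, power_wobble)]

r_raw = pearson_r(vo2, power)

# Dividing by speed removes the shared driver and leaves mostly the wobble
vo2_per_speed = [a / v for a, v in zip(vo2, speed)]
power_per_speed = [p / v for p, v in zip(power, speed)]
r_norm = pearson_r(vo2_per_speed, power_per_speed)

print(f"raw r = {r_raw:.2f}, speed-normalized r = {r_norm:.2f}")
```

The raw correlation is close to 1 because speed drives both columns. Once speed is divided out, what remains is two nearly constant series plus unrelated wobble, so the correlation collapses toward zero.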

So the methodological error explained by Dr. Snyder seems to be correct. Aubrey must explain why they took this approach and from which earlier pieces of literature they borrowed this kind of analysis.

Stryd power and VO2 show a significant linear relationship. This relationship is pegged in two ways :

1) The Stryd powermeter, by way of its algorithm, reports increased watts with increased running speed on flat surfaces.

2) VO2 is positively proportional to speed in flat land running.

Statement 4 : Data collection methods are not consistent across surfaces, making effective comparison across surfaces impossible.

That Aubrey didn't fully explain data collection methods is a genuine accusation. However, by the same token, articles published by Secret of Running that were used in the chain emailing marketing efforts by Stryd also lacked clarity on how the authors conducted the tests.

For example, the authors Dijk & Megen stated that the energy cost of running increases uphill. The exact magnitude of the increase is in question. Is the specific increase just due to how the numerator in the algorithm (W/kg) is scaled to increase faster than the denominator (speed), and on what basis were the scaling factors decided? The correlational aspects of Stryd power and above ground gradient running are left to be explored and explained in scientific literature.


The methodological "fatal flaw" explained by Dr. Snyder in the Aubrey paper seems to be correct. Aubrey must explain why they took this approach and on which former pieces of literature they borrowed this kind of analysis from. A proper explanation for this choice is desired.

On cross-examining statements made with other data from Secret of Running group, Stryd power to weight ratio has a significant positively proportional relationship with speed. However, the data is not exactly linear, more wavy due to the presence of random variations and subject related issues and the slope of a linear trend line varies with subject.  

Both the energy cost of running and the oxygen cost of running, calculated by normalizing power/weight ratio and metabolic rate by speed respectively, are not exactly constant when seen in practice. Constancy is shown in literature but real data appears wavy, sometimes monotonically decreasing in certain runners. This may be due to random errors in the sensor and variations in sensor placement as well as experimental issues in the VO2 data, but these facts need to be appreciated.

Therefore, the Stryd as a powermeter must be used to make assertions about metabolic fitness only within subjects, as opposed to across subjects.

If we assume for a moment that Aubrey indeed did due diligence and considered steady state VO2 values across both treadmill and above ground running, the Stryd research team has some explaining to do as to why the observed differences in oxygen cost were not reflected in a corresponding difference in Stryd power. At the heart of this explanation lie several extrapolations various people are making on the internet about energy cost of running, running efficiency and oxygen economy, all on the basis of algorithms and no direct measurements of force or power.

Sunday, December 23, 2018

Examination of the Link Between Oxygen Uptake (VO2) and Stryd Run Power

Footpods utilizing 3D inertial measurement units to calculate external running power have been discussed previously on my blog several times. 

One of the purported advantages touted by product developers is the ability of the running "power meter" to track and inform about instantaneous metabolic rate (VO2). With the Stryd power pod, the existing support for this position has been that running power and VO2 are linearly proportional. 

In fact, a linear relationship has been shown on my blog earlier from a single VO2max test when we look at steady state values. But since the time I wrote it, I have gathered more data in order to re-examine the nature of this relationship in light of fitness changes in the body.


I completed two VO2max tests in a running laboratory a year apart in 2017 and 2018. Both tests were conducted by an experienced consultant who is also a PhD in Physiology & Exercise Sciences. Name withheld. 

On both tests, I wore a Stryd footpod on my shoes and ran with a self-selected cadence. Key information : I wore different shoes in each test but the position of the pod was standardized by mounting it on the second criss-cross lacing from the bottom. In 2017, I wore a Mizuno Ronin 5 and in 2018, I wore a Mizuno Sonic.

Treadmill grade was set to 1% and speed was increased by 2kph every 2 minutes until complete exhaustion. In 2017, I exhausted at 16kph. In 2018, I was fitter and exhausted at 18kph. 

There was no change in equipment - treadmill, masks or gas analyzers, heart rate chest strap and metabolic carts - used between the two tests. Physiological variables that changed were my body weight and running fitness between the two periods. I was 64kg in 2017 and just shy of 61kg in 2018.

I ran my personal best 10K of 41 minutes in January 2018 and posted several track PR's in the later months. Compared to 2017, actual performance data indicated increased running fitness. 

By special request, I obtained all the raw data from both tests, covering the variables measured during each test, for my own record.

Summary of Results

A 30s rolling average of weight normalized metabolic rate and the corresponding instantaneous heart rate against time are shown in separate plots below (Figs. 1, 2). Tabulated data shows that in 2018, I had significantly lower heart rates to achieve similar running speeds on the treadmill. I was fit enough to run into the 18kph territory and extended my time to exhaustion by a whopping 3 minutes. 
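For anyone wanting to smooth their own metabolic cart export the same way, here is a minimal sketch of a 30 s trailing rolling average over irregularly spaced samples. I am assuming a trailing window and breath-by-breath timestamps in seconds; the function name and the sample numbers are hypothetical.

```python
from collections import deque


def rolling_mean_by_time(times_s, values, window_s=30.0):
    """Trailing mean of `values` over the last `window_s` seconds.

    Samples need not be evenly spaced; `times_s` must be ascending.
    """
    window = deque()      # (time, value) pairs currently inside the window
    running_sum = 0.0
    smoothed = []
    for t, v in zip(times_s, values):
        window.append((t, v))
        running_sum += v
        # Drop samples that are 30 s old or older relative to time t
        while window[0][0] <= t - window_s:
            _, old_value = window.popleft()
            running_sum -= old_value
        smoothed.append(running_sum / len(window))
    return smoothed


# Hypothetical samples: time in seconds, VO2 in ml/kg/min
times = [0, 10, 20, 30, 40]
vo2 = [10.0, 20.0, 30.0, 40.0, 50.0]
print(rolling_mean_by_time(times, vo2))
```

A time-based window is used rather than a fixed number of samples because breath counts per 30 s change as breathing rate climbs during the test.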

The VO2 trace on the other hand shows an increase in oxygen consumption in 2018 with a corresponding increase in power to weight ratio. The differences are significant. For example, at 16kph, the difference in oxygen consumption between the two years is significant (p < 0.05, F = 68.96).

The strength of the correlation between oxygen demand and Stryd power weakened between 2017 and 2018: power explained 99% of the variance in VO2 in the former test and 96% in the latter. The relationship between 2018 oxygen consumption and power also seems not exactly linear (Fig. 3).

Fig 1 : Tabulated summary showing VO2, Stryd power and corresponding heart rate for 6 different speed regimes.

Fig 2 : VO2 and heart rate - time traces compared between two years.

Fig 3 : Strength of correlation between VO2 and Stryd power to weight ratio in two tests.

The specific percentage changes at each speed are shown for VO2 and power:weight ratio (Figs. 4, 5). Instantaneous VO2 measured by a metabolic cart is a scatter of points before achieving steady state, so a boxplot of the distribution is shown, with the median value being used to calculate % changes. The same has been done for Stryd power. Outliers are also shown, but median values are not affected by outliers.

Fig 4 : Comparison of VO2 distribution

Fig 5 : Comparison of Stryd power:weight ratio
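The median-based % change behind Figs. 4 and 5 can be sketched as follows. The sample lists are hypothetical values I picked to include one obvious outlier, not my actual test data.

```python
from statistics import median


def pct_change_of_medians(before, after):
    """Percent change from the median of `before` to the median of `after`."""
    m_before, m_after = median(before), median(after)
    return 100.0 * (m_after - m_before) / m_before


# Hypothetical VO2 samples (ml/kg/min) at one treadmill speed
vo2_2017 = [44.0, 45.5, 46.0, 47.0, 52.0]   # 52.0 acts as an outlier
vo2_2018 = [47.5, 48.0, 48.3, 49.0, 49.5]

change = pct_change_of_medians(vo2_2017, vo2_2018)
print(f"median VO2 change: {change:+.1f}%")
```

Swapping the 52.0 for any other value above 46.0 leaves the result untouched, which is why the median was preferred over the mean here.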


Shown above are two VO2max tests done a year and a few days apart. On both tests, I wore a Stryd footpod, each time on a different shoe.

Specific discussion points are as follows. Note :

1) The correlation between Stryd power to weight and lab tested VO2 is strong, however the degree of the correlation weakens from 2017 to 2018. The reported requirement for higher power to weight ratios and decreased economy for the same speeds conflicts with the lowered heart rate data and the increased time to exhaustion and higher speed attained on the second test.  In other words, one set of data indicating worsened power-speed efficiency appears to conflict with the actual performance on the test. Interpretations are open.

2) The boxplot distribution of VO2 at specific speeds is wide ranging and shows the organic nature of oxygen uptake rate, which varies with interval timing, run mechanics and the usage of elastic structures in the body. The boxplot distribution of algorithmic watts on the other hand is tight, which might potentially mislead when interpreting which value of run power corresponds to what oxygen demand. Therefore, caution must be exercised when comparing athlete(s) on the basis of run power to make value judgments of economical running. What is certain here is that Stryd power should be stated to be proportional only to steady state values of VO2, not transient data. If, for example, a runner runs outdoors in the heat with a slowly rising component of VO2, which is a completely organic way for the body to function, the meaning of the correlation of VO2 and Stryd power measured in one set of controlled conditions is lost in another.

3) The substantial decrease in heart rates to run the same speeds during the test shows increased cardiovascular fitness. This correlates very well with the Polar Run Index recorded with the Polar V800 for a period of 365 days between March 2017 and March 2018 (Fig. 6). In fact, around the January 2018 time frame, I'd been posting Run Indices in the 58-59 range, which predict my 5K/10K times within a margin of 1-2 minutes compared to actual performance.

Fig 6 : Author's Polar Run Index time series scatter obtained from Polar Flow for a period of 365 days from March 2017 - March 2018

4) An inspection of preferred cadences on the two tests indicates non-significant differences. The changes in cadence could not possibly explain the increased metabolic rate.

Fig 7 : Chosen stride rates between two VO2 tests conducted in 2017 and 2018.

5) An inspection of the speed error (device speed minus target belt speed) between the two years shows increased error in the second year, but within 2%. The reason for the increased error is not known, as calibration factors were not changed within the footpod.

Fig 8 : Computed % error in run speed = 100 x (Device measured speed - Belt Speed)/(Belt Speed)  
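The error definition in the Fig 8 caption translates directly into code. The sketch below uses hypothetical device readings of my own choosing, not the values from my tests.

```python
def speed_error_pct(device_speed, belt_speed):
    """% error in run speed = 100 x (device - belt) / belt."""
    return 100.0 * (device_speed - belt_speed) / belt_speed


# Hypothetical (belt speed, device reported speed) pairs in kph
readings = [(12.0, 12.06), (14.0, 14.14), (16.0, 16.24)]

errors = [speed_error_pct(device, belt) for belt, device in readings]
for (belt, _), err in zip(readings, errors):
    # A positive value means the footpod reads faster than the belt
    print(f"{belt:.0f} kph: {err:+.2f}%")
```

Both speeds only need to share the same unit, since the unit cancels in the ratio.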

6) The main variables that changed between the two tests were fitness, weight and the shoes worn. There is a possibility that simply wearing the meter on different shoes gave different readings but logically there is no reason why this should be so. However, on the Stryd forums, a variability in power measurements due to variations in mounting has been reported by users. 

7) Interpretations should be kept in context of sample size (n=1), the period of time between the two tests in which many things not accounted for may have changed (systematic changes in sensors, stiffness between shoe and treadmill interface, motivation, hydration status, calibration error).

Other Studies

1) In an outdoor setting, Aubrey et al. found statistically strong differences in oxygen consumption between different running surfaces that were not reflected in the strength of the differences in Stryd power to weight ratio (Aubrey, 2018). The device used was the first gen Stryd power meter worn on the chest.

2) In an indoor study of the influence of a change in cadence on running economy and Stryd power in competitive collegiate runners, investigators found that only 31% of the variability in running economy could be explained by power (Austin, 2018). They cautioned that the Stryd's power measures may not be sufficiently accurate to estimate differences in running economy of competitive runners. The device used was the second gen Stryd power meter worn on the shoe as a footpod.


A positive correlation exists between Stryd power and metabolic demand IN STEADY STATE. However, in light of the case reported here and the two other peer reviewed and published studies, caution must be exercised when applying Stryd power for metabolic profiling, specifically due to the points explored above. The value of a footpod powermeter to inform about "real time" metabolic demand in situations where minute but critical transient VO2 changes might be prevalent is suspect.

The true accuracy of this relationship is unknown in a large sample of runners in the different environmental conditions found in real world running. Interventions in running, such as a change in shoes, a change of mechanics, circadian rhythms, travel fatigue etc. may be reflected in VO2 but not in run power. This is a hypothesis, some of which is just starting to be shown in the research community. We hope the research community can come forward with more topic ideas and explorations.

As reported here, a worsened power-speed efficiency did not correlate with the increased time to exhaustion, higher speeds and better heart rate fitness achieved in the second VO2 test. This study shows there is both tenable and actionable value in longitudinal heart rate monitoring over long periods of time. Conventional measures such as heart rate are not superseded or replaced by running power meters but should be considered an essential ingredient of a holistic performance monitoring approach.


Austin, C., Hokanson, J., McGinnis, P., & Patrick, S. (2018). The Relationship between Running Power and Running Economy in Well-Trained Distance Runners. Sports, 6(4), 142.

Rachel Aubry, Geoff Power, J. B. (2018). An Assessment of Running Power as a Training Metric for Elite and Recreational Runners. Journal of Strength and Conditioning Research, 32(8), 2258–2264.

Polar Run Index Table 

Sunday, November 18, 2018

GPS Inaccuracy is a Non-Problem

There are those who say running is not a skill. Sure, unlike soccer or archery, it may not need massive amounts of skill, but the ability to pace by the internal "calibrator" in your head is absolutely a learned skill. That takes long hours of practice and generous amounts of emotional intelligence. Some people have more EI than others. Perhaps women are better long distance pacers for this reason? The debate continues.

The other day, I ran a relatively decent 10K with a simple tried-and-true method I always employ : hit kilometer landmarks at specific times. The race, an annual staple in the Abu Dhabi calendar, is not AIMS certified, but is run on a course that is reliable enough for most of us 8am-5pm working animals. The course is also an easy out and back with stretches of long road and one roundabout, so the effect of loops and of not running the tangents around those loops is absolutely minimal.

The trusted V800 GPS on my wrist always goes as a supplement, never a primary mode of pacing. Not surprisingly, the device would beep the kilometer split on-point in the beginning stretches of the race (corresponding to the position of kilometer signage) but, as the race progressed, it beeped anywhere between 10-20 metres before the marked landmark.

It's important to put this into perspective. At my running speed, the watch beeped 3-5 seconds before the actual km marker. Over the course of 42:08, I ran 10.17km according to the watch but the race distance was reported to be 10km. In other words, assuming that the course was marked out correctly, the receiver on my wrist relying on a system of 24 global positioning satellites in orbit over-reported distance by 1-2%.
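The numbers in this paragraph are easy to check. The sketch below takes the race figures quoted above (10.17km on the watch, a 10km course, 42:08 finish) and works out the signed distance error and how many seconds a 10-20 metre early beep represents. The variable and function names are mine.

```python
def distance_error_pct(watch_km, course_km):
    """Signed % error; positive means the watch logged more than the course."""
    return 100.0 * (watch_km - course_km) / course_km


watch_km = 10.17
course_km = 10.0
finish_s = 42 * 60 + 8            # 42:08 expressed in seconds

error_pct = distance_error_pct(watch_km, course_km)

# Average race speed in m/s, taking the course distance as the truth
speed_m_per_s = course_km * 1000 / finish_s

# Time taken to cover the 10 m and 20 m gaps between beep and marker
early_beep_s = [gap_m / speed_m_per_s for gap_m in (10, 20)]

print(f"distance error: {error_pct:+.1f}%")
print(f"early beep: {early_beep_s[0]:.1f} to {early_beep_s[1]:.1f} s")
```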

Is that really something to make a big hoopla about?

Don't Fuss, We're Finely Tuned Machines

An experienced 10K road runner would consistently pace within 1-5% of previous timings from race to race. They really are finely tuned machines. They already have an ingrained sense of pace from long hours of training and racing. The GPS is of little benefit except to help assess whether they are roughly where they need to be.

A beginner road marathoner on the other hand might be more reliant on the GPS. They feel they need the training wheel to help guide them along, perhaps more out of a sense of anxiousness that anything can go wrong on such a long distance if they were off from where they need to be. 

I argue that even this second class of individuals doesn't really need to depend primarily on GPS pace. With lots of hours of correct training, the human brain learns the forces and patterns of a marathon pace that is most comfortable and sustainable for a period of 2-4 hours. The primary reason for the trepidation in these runners is lack of adequate training. It's not GPS that's the problem.

Get a Hold of Precision, not Inaccuracy

In a review of a ridiculous measurement of the same segment of road measuring 10km around 1000 times by GPS, the German mathematician Helmut Winter (who was also responsible for creating the timing systems in Kipchoge's world record Berlin marathon) wrote on his blog : "The most important result of the analyses was a standard deviation of the distribution of about 2 m for a total distance of 10,000 m, ie a relative dispersion of the data of about 0.2 per thousand. The deviation from the mean of the measured distance was less than 10 cm in the regime."

Even during training, I argue that a minor device deviation is a non-factor if you know that it is off by the same amount every time.

For example, if the watch says you ran 7:57 min/mile but you really covered only about 3.75 miles in 30 minutes, you know that you really ran 8:00 min/mile, so the watch over-estimated pace by about 3s/mile every time. Over the course of a 3:30:00 marathon, the difference between what you actually ran and what the watch says you ran is a mere 265m or so.

On race day, even with tired bodies and weather fluctuations, such a runner can turn to the biological calibrator as primary guide and use a supplemental strategy of running every mile 3s/mile faster than what the watch should actually say in order to accommodate the margin of error.
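The pace arithmetic in this example can be sketched in a few lines. The inputs are the figures from the paragraph above; the helper names are mine, and the marathon gap assumes the watch stays off by the same amount for the whole 3:30:00.

```python
METERS_PER_MILE = 1609.344


def pace_s_per_mile(distance_miles, elapsed_s):
    """Average pace in seconds per mile."""
    return elapsed_s / distance_miles


def fmt_pace(seconds_per_mile):
    """Format a pace in seconds per mile as m:ss."""
    minutes, seconds = divmod(round(seconds_per_mile), 60)
    return f"{minutes}:{seconds:02d}"


watch_pace = 7 * 60 + 57                       # what the watch shows, s/mile
true_pace = pace_s_per_mile(3.75, 30 * 60)     # from distance actually covered
offset = true_pace - watch_pace                # how much the watch flatters you

# Distance gap over a marathon run in 3:30:00 at the true pace
marathon_s = 3 * 3600 + 30 * 60
watch_miles = marathon_s / watch_pace          # what the watch would log
true_miles = marathon_s / true_pace            # what was actually covered
gap_m = (watch_miles - true_miles) * METERS_PER_MILE

print(f"true pace {fmt_pace(true_pace)} min/mile, watch off by {offset:.0f} s/mile")
print(f"gap over the marathon: {gap_m:.0f} m")
```

Because the offset is constant, the correction is a single subtraction, which is why a precise but slightly inaccurate watch is still perfectly usable.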

Physiology is Not That Fussy

What about those who think if you don't hit training paces point blank, the sky will come crashing down?

Physiological reality is that there is an upper bound and lower bound to most training zones. A 20s/mile tolerance band to a threshold zone would be considerably more than the 3s/mile deviation in your GPS. Moreover, it is far better to incorporate multipace training to get your feet wet and learn different aspects of the water being tested.


We forget that the point of training is to roughly hit the bullseye every time and get on with life. Multipace training was how the Olympic stars of previous years broke world records! Instead, some hobby runners today want military grade accuracy, perhaps to land a missile in a specific spot of an ocean somewhere with a $500 watch. They can't sleep if device reported distance was off by 2%.

My argument is that trained humans are finely tuned machines to begin with. Distance road runners (who comprise probably 80-90% of the running population) can gain an ingrained sense of sustainable pace from long hours of training.

GPS inaccuracy is really a non-problem. What is a problem is that it is turned into a problem by those looking to dip into your pocket while marketing their own product. And one has to be wary about such hidden agendas.