Why am I sharing this, and why might another rider want to read this?
I'm a (modestly) credentialed, (modestly) working scientist with no present affiliation with any part of the bike industry. Teaching quantitative reasoning and technical communication skills is the lion's share of what I do for work. I'm not a 'sci-comm' professional trying to drive attention toward a podcast about bikes, or a bike YouTuber grabbing attention by deploying keywords like 'data' or 'science'. I don't have a vested interest in either bike setup being faster than the other. So let this be my attempt at a guide, by example, to reading bike science better or making bike science better. Here is a data set I measured myself, which means I can share the methodology behind it and help assess the strength of any inferences or insights we might draw from it.
I'm not going to reveal at the end that Setup A and Setup B are the first and second most expensive forks on the market. This isn't a review. I ride two bikes, and neither has that stuff. They are different. But how different? How do we talk about differences in a way that isn't bogus or hyperbolic? Read on!
How were these data collected?
Chip timers were attached to the fork of each bike, A and B, for the duration of the test. The timing system works on a predetermined track segment and measured elapsed time between a static start and the bike passing the ending sensor, similar to a chip-timed DH race run between the start gate and the first split. The static start eliminated the need to standardize entry speed into the segment. Lap times for the segment were collected across numerous days in 2024, with each session of data collection following the schedule ABABAB, for six total laps timed per day of riding. On some test days the complete schedule of six laps couldn't be recorded, and those days were noted for possible exclusion from the analysis. Laps that failed for any reason to be a full start-to-finish effort by the test rider were also noted for possible exclusion. Further details of data cleaning will be discussed later.
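To make those cleaning rules concrete, here's a minimal sketch in Python of how the lap log could be filtered. The file name and columns (date, bike, lap_order, time_s, exclude) are hypothetical stand-ins for illustration, not my actual log format.

import pandas as pd

# Hypothetical lap log: one row per lap with columns
# date, bike ('A'/'B'), lap_order (1-6), time_s, exclude (note or blank)
laps = pd.read_csv("laps.csv", parse_dates=["date"])

# Drop laps flagged for exclusion (crash, incomplete effort, etc.)
clean = laps[laps["exclude"].isna()].copy()

# Optionally keep only days that completed the full ABABAB schedule
lap_counts = clean.groupby("date")["time_s"].transform("count")
clean_full = clean[lap_counts == 6]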
What questions have been asked of the data so far?
To what extent are Bike Setup A and Bike Setup B different as measured by lap times?
To what extent does lap order within a session matter as measured by lap times?
How much of the difference in lap times can be attributed to the difference in bike setup A or B, versus other non-equipment factors not controlled in the experiment?
Let's take a first look at the data. This is a graph formatted specifically for readability of the raw data: the points are spread out in space in a way that makes each one visible but distorts one axis. Take a lot of caution in making inferences from it, for reasons I'll try to cover, but this format is better in many ways than a large table for taking in the entire raw data set.
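If you want to reproduce this kind of every-point-visible graph, here's one plausible way with matplotlib, reusing the hypothetical clean DataFrame from the sketch above. The horizontal jitter is my assumption about how to spread the points, and it's exactly the axis distortion I mentioned.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for bike, color in [("A", "tab:blue"), ("B", "tab:orange")]:
    sub = clean[clean["bike"] == bike]
    # A few hours of random horizontal jitter keeps overlapping laps
    # visible at the cost of slightly distorting the date axis
    jitter = pd.to_timedelta(np.random.uniform(-8, 8, len(sub)), unit="h")
    ax.scatter(sub["date"] + jitter, sub["time_s"],
               color=color, alpha=0.6, label=f"Setup {bike}")
ax.set_xlabel("session date")
ax.set_ylabel("lap time (s)")
ax.legend()
plt.show()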
Looking down at the unedited big picture, let's see what we can see.
This is wild! That's a lot of practice on the same trail. Betting you have every rock, bump, and line burned in your memory now.
If I'm eyeballing lines of best fit for A and B in the graph you shared, it seems like A starts out slower than B, then they converge to be equivalent from the end of June through the end of the test in November. At the widest gap between A and B, it's about 3 seconds (46-ish seconds per lap for A vs. 49 seconds for B, eyeballing lines of best fit). Which is not nothing; that's a 6+% difference. And n=70 for each set is pretty decent!
Now tell me why I'm wrong and all my assumptions are stupid.
[Edit: B is slower than A in the graph, duh, my bad]
this is awesome!
You were consistently slower on B through May and had a wider variation of times compared to setup A. In June, the times started looking much more similar. Setup A remained consistent, with less variation, throughout the whole period. That suggests setup B was newer and less familiar and improved as you adapted, while A stayed about the same, pointing to a setup you already knew well. You got about 10% faster on Setup B over the year, while A looks flat, and both end up at about the same time by year's end.
With the raw data I would break it down by min, max, median, and average by month. My assumption is that your variance from min to max decreased more on setup B from May to October. I'd be curious which one was better in October by those aggregations, as they look about the same visually and it's hard to differentiate.
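A quick sketch of that monthly breakdown, reusing the hypothetical clean DataFrame from earlier (pandas does the grouping):

# Min/max/median/mean lap time per month and per setup
monthly = (clean
           .assign(month=clean["date"].dt.to_period("M"))
           .groupby(["month", "bike"])["time_s"]
           .agg(["min", "max", "median", "mean"])
           .round(2))
print(monthly)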
It has been a while since I did much statistics, and I never fancied it much, but if you do a z-test, what do you get?
Oh man! This initially seems perfect for a paired t-test as your methods are going to allow you to control for changing conditions across days! However, the trend toward convergence across time does make me question the assumption of a “random sample of days.”
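For the mechanics, a paired t-test along those lines might look like this with scipy; pairing by day (averaging each setup's laps within a session) is one interpretation of the suggestion, and the clean DataFrame is still the hypothetical one from earlier.

from scipy import stats

# Average each setup's laps within a day, then pair the daily means
# so day-to-day conditions cancel out of the comparison
daily = (clean.groupby(["date", "bike"])["time_s"]
              .mean()
              .unstack("bike")
              .dropna())
t, p = stats.ttest_rel(daily["A"], daily["B"])
print(f"paired t = {t:.2f}, p = {p:.3f}, "
      f"mean A-B gap = {(daily['A'] - daily['B']).mean():.2f} s")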
Specifically, I am curious if bike setup A was your standard setup going into the test (or closer to it), and if we are seeing how long it takes for you to reach your peak pace on a new bike. Alternatively, are we seeing the gap close because the transition into summer and fall reduces some sort of trail conditions-related benefit present in the winter and spring (seasonality)?
Looking forward to following this thread! I love when I get to combine stats and mtb!
Two-way ANOVA (predict lap times from the factors of bike and date) with an interaction effect included. Then, if the interaction is present, you can test the slopes of the regression lines to determine whether the rate of change in lap times differs between bikes, as (Team Robots) notes above!
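A sketch of that model with statsmodels, with the caveat that I'm treating date as a numeric trend rather than a categorical factor, so the interaction term directly tests whether the two setups' improvement slopes differ (same hypothetical columns as before):

import statsmodels.formula.api as smf

# Days since the first session, as a numeric predictor
clean["day_num"] = (clean["date"] - clean["date"].min()).dt.days
model = smf.ols("time_s ~ C(bike) * day_num", data=clean).fit()
# The C(bike)[T.B]:day_num coefficient tests whether the slopes differ
print(model.summary())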
What was the difference between bike A and B? At the end of May/start of June you found something with bike setup B. But, interestingly, on an average day you rode better on bike setup A. It would be interesting to hear what you were testing or trying.
Based on the fact that the only portion of the data that could maybe be considered significantly different is winter through early spring, it would actually be interesting to look at how conditions may factor in. I could see there being a chance that setup A is only faster in wet cold months and otherwise essentially the same.
MaxxTerra on Bike B, MaxxGrip on Bike A. Rider got faster with practice and Bike A tyres wore out.
Would you be willing to send a file with the raw data? I am an engineer in the industry and would like to take a stab at some analysis, to maybe answer some questions and pull out contributing factors.
Nice to see the nerds are out in force. Fun topic and data set to discuss, so thanks for that.
I deal with these sorts of "basic" data sets and the questions that come with them regularly at the World Cups, as that is a big part of my role as a coach at the races. The general lack of rigour and understanding of causality, error, trial-to-trial variability, etc. at the races, and generally in the bike industry, is pretty surprising. But sadly, strict materialist/reductionist/mechanistic philosophies dominate, without any awareness of that domination, for a lot of people.
I won't comment much on applying classical statistical analysis to the data set provided, as others have done a great job already unpacking potential ways to investigate and infer from it. Of note, though, is the seeming lack of an initial commitment to inductive, deductive, or even abductive reasoning. I think this is key to lay out first, as otherwise some HARKing may happen (as it almost always does at the races and in the bike industry).
As we are already assuming this is Northern hemisphere data and as such the weather/seasons are playing a big role in the "learning effect" on display in both set-ups, a further "descriptive conversion" of the data would be really helpful. For example, knowing certain track characteristics and how they might interact with the set-ups and weather/seasonal changes would be highly valuable. The positivistic tradition of the "data will know best" regardless of context is seldom tenable in the non-linear interactions of complex systems.
Your question about trial order is really interesting, though; honestly, it's impressive you stuck to the ABAB order. The motor learning and control research is pretty dense with investigations into the impact of random, blocked, and mixed practice orders, and again the bike industry in general is not aware of the long history of research in this area. Sadly, most of it is built around key-pressing exercises in labs, or, if it is sport-related, it often concerns targeting/aiming or ballistic tasks like volleyball shots. So it's hard to extrapolate to something as variable as MTB.
What is of note, though, is that an ABAB or AABBAABB order is usually the most common when testing things (bikes/tyres/parts), and this comes again with an assumption of linearity and mechanistic functioning of humans. While the bike is a complicated system, the human is not; the human is complex. Depending on which theories of motor learning you sign up to (there are many), the implications of trial order are quite interesting. If the human needs a certain amount of time to "calibrate" or attune to changes in task, equipment, or environment, then an ABAABBA order might be best, but again, all these interactions are non-linear. So it's hard to make blanket recommendations, especially if we want to account for fatigue. The difficulty of the track and the magnitude of the set-up change need to be known quantities (and maybe qualia), even if "blind" in some way to the participant. Likewise, the ability of the rider is a key variable: their "knowledge of", i.e. how well they can ride any terrain, interacts with their "knowledge about" abilities, i.e. how rapidly they can interpret what a set-up change may mean and preemptively adjust their interaction with the track. That is highly individual-dependent, but it's a key performance variable in testing set-ups.
Long story short, I'm really looking forward to more insight/data if it's coming, but I would remain skeptical that a clear inductive process of data, to pattern, to conclusion is achievable (even if the conclusion is probabilistic). It's a big commitment to make when the non-linearity of rider-bike-track interactions is not accounted for.
Early observations and new questions
With the benefit of the big picture, and despite the funny business with the "time" axis having a misleading scale, y'all can just see that Bike Setup B approaches, and maybe even ends up faster than, Bike Setup A, eventually. I'll show how I did the statistical tests in the next post and ask whether they seem convincing and how you might have done them differently. But I think some cool things happened in the early observations. Here's where the parts start appearing that I think might help others as they think about their own testing, and where I can give a little more of the story.
Just like people have said, the test rider comes out of the winter holidays and New Year having built up the new, untested bike, Setup B. Setup A has been around long enough to be familiar. The first day isn't even a test day, just a chance to ride a few laps and see where things stand. I did a few laps on Setup B but hadn't yet gotten the second timing chip, or the method, or the questions firmly in my mind. I just wanted to know if Bike Setup B was good! The next session marks the beginning of the sampling, and it could have stopped right then.
Setup B is not good!
I can imagine writing up a little bike review in February. Hey guys, I did an untimed session the first week of January to shake down the new bike and see where my body was at after all the Christmas cookies. I did timed testing at the same track three weeks later with three runs each, ABABAB, and didn't have any crashes or big mistakes or other reasons to cast doubt on the times. The weather was great. Setup B is just plain slower. 3 seconds slower. 7% slower. I went in the wrong direction, so it's time to start thinking about Bike Setup C!
But questions can change in light of new observations. I wanted to know about the difference between the two bikes. I took some measurements. But there were hints already that I was getting information not just about the bikes.
Look at the difference in consistency between the first 6 runs of Setup A and the first 3 runs of Setup B. The spread in A is less than the spread in B, despite A having more runs (6 vs 3) logged on more days (2 vs 1). What are the plausible equipment reasons for Setup B to perform inconsistently?
Tire pressures? Suspension or brake inconsistency? The answer was no, not that I could find. So if it can't be explained by equipment performing inconsistently from run to run, then it's gotta be a non-equipment reason: something in the environment or the rider. So the test I designed doesn't just test bikes. Dang it.
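To put a number on "spread", something like the following would do, again with the hypothetical DataFrame; standard deviation and range are just two reasonable choices of measure.

# Spread of the earliest timed runs for each setup
for bike, n in [("A", 6), ("B", 3)]:
    first = (clean[clean["bike"] == bike]
             .sort_values(["date", "lap_order"])
             .head(n)["time_s"])
    print(f"Setup {bike}: std = {first.std():.2f} s, "
          f"range = {first.max() - first.min():.2f} s")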
Possible patterns, possible explanations
Now that the door has cracked open to considering that the test captures information that isn't simply about the nature (how fast?) of Bike Setup A versus Bike Setup B, a few more patterns jumped out. Lines of best fit are hypotheses, at least that's how I think of them. They aren't really in the data; they're a way of saying, graphically and with math: if we draw this line and presume there truly is a pattern here, what would that mean? And what would it predict, in a way we could test, to get some evidence that the pattern really was there in the first place?
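In that spirit, a trend line per setup is a one-line fit, and its slope is the hypothesis we can check against later laps (numpy, same hypothetical columns as before):

import numpy as np

day0 = clean["date"].min()
for bike in ["A", "B"]:
    sub = clean[clean["bike"] == bike]
    days = (sub["date"] - day0).dt.days
    # Degree-1 fit: np.polyfit returns (slope, intercept)
    slope, intercept = np.polyfit(days, sub["time_s"], 1)
    print(f"Setup {bike}: predicted lap = {intercept:.1f} {slope:+.3f} * day")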
Here we are in mid March, and it's beginning to look like my A/B bike test is really timing things like fatigue(?) (orange lines) or a sort of 'practice effect'(?) (blue line). It's become a rider test! And who knows what else? If these confounding factors are mixed into the data, how do I use the data to answer my original question about whether A or B is faster?
Intriguing. Keep the posts coming. I hate that MTB markets crap without data to prove the benefit.
As an aside, I'm amazed that setup B is slow at the start. With a new bike, I'd be quicker for sure on my very first run than on my old bike (I know because I've also done accurate timed testing), but I'd wait a while before I'm certain that a new bike's difference is noticeable.