Descriptive Analysis
Several measurements help in estimating what my marathon time will be. In this analysis, I collect information on pace, time, distance, and heart rate, all of which are important training factors that influence a marathon time.
To learn more about my data, I computed some basic statistics on the DataFrame. Overall mileage is useful for two reasons: shoe degradation and personal best time. Running shoes are generally good for 300 to 500 miles, depending on the shoe and how it is used, and replacing them in time is important for preventing injury. I went through two pairs of Mizuno Wave Riders while training, switching shoes on June 1st, 2019. As you can see below, I clocked 512.77 miles while training, and since I switched shoes partway through, I stayed within the 300 to 500 mile guideline.

Total Distance ran 512.77 miles
Average Distance ran 4.203032786885246 miles
Longest Distance ran 26.49 miles
Shortest Distance ran 0.13 miles
Fastest pace 7:50
Slowest pace 10:02
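As a rough illustration of how summary figures like these can be derived, here is a minimal sketch in plain Python. The `runs` list and the `format_pace` helper are hypothetical and are not taken from my actual DataFrame; the sketch also assumes each activity records a distance in miles and a duration in seconds.

```python
from statistics import mean

# Hypothetical sample: (distance in miles, duration in seconds) per activity.
runs = [(3.0, 1485), (5.2, 2730), (10.0, 5700), (0.5, 301)]


def format_pace(seconds_per_mile: float) -> str:
    """Format a pace given in seconds per mile as m:ss."""
    total = round(seconds_per_mile)
    return f"{total // 60}:{total % 60:02d}"


distances = [d for d, _ in runs]
paces = [secs / d for d, secs in runs]  # seconds per mile for each activity

summary = {
    "Total Distance ran": f"{sum(distances):.2f} miles",
    "Average Distance ran": f"{mean(distances):.2f} miles",
    "Longest Distance ran": f"{max(distances):.2f} miles",
    "Shortest Distance ran": f"{min(distances):.2f} miles",
    "Fastest pace": format_pace(min(paces)),  # lowest seconds per mile
    "Slowest pace": format_pace(max(paces)),  # highest seconds per mile
}

for label, value in summary.items():
    print(label, value)
```

Note that the fastest pace is the minimum seconds-per-mile value, not the maximum, which is an easy detail to flip.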
Additionally, higher mileage generally leads to a better PB (personal best) time. Studies suggest that “doubling up and running in a depleted state can boost fat-burning, train the body to use glycogen more efficiently, and stimulate mitochondria production (more mitochondria can delay fatigue)” (Marshal, 2019). On average I ran about 4.2 miles per “activity” in the DataFrame. While my running plan varied in mileage from day to day, I tried to split the longer runs into two runs a day to help my recovery as well as my personal best times.


Visualizations
Cardiovascular fitness level
A key factor in running a great marathon is gauging your level of cardiovascular fitness. This helps you see how you’re progressing as the weeks go by and whether you’re becoming stronger and building endurance. Your heart rate is a good indication of whether you’re becoming more cardiovascularly fit. For example, suppose you run 3 miles every day at an 8:15 minute-per-mile pace, and your average heart rate is 155 beats per minute with a max of 180. If you keep running those 3 miles consistently, your average heart rate should drop to about 145 with a max of 172, which shows your heart has grown stronger from the daily stress of running 3 miles.

Heatmap
The heatmap shows the correlation between distance and heart rate. The color bar on the side shows the strength of the correlation between two variables: lighter shades represent a high correlation and darker shades a low correlation. The distance and hr cells are highlighted in orange, showing a moderate correlation between the two. To me this indicates that I had a steady increase in cardiovascular fitness as I trained for the marathon.
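Each cell of a correlation heatmap is a single correlation coefficient. The sketch below computes the Pearson coefficient by hand for two columns and arranges the results as the kind of matrix a heatmap colors in. The `distance` and `avg_hr` values are made up for illustration, and I am assuming the Pearson coefficient here, which is the default of pandas `DataFrame.corr()`.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical values, not my real training data.
distance = [3.0, 4.0, 5.0, 6.5, 10.0, 13.1]  # miles
avg_hr = [150, 148, 155, 152, 158, 160]      # beats per minute

columns = {"distance": distance, "hr": avg_hr}

# Correlation matrix: one coefficient per pair of columns.
matrix = {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}

for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

The diagonal is always 1 because every column correlates perfectly with itself, so the off-diagonal distance and hr cell is the one worth reading.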


Hexbin Plot
The hexbin plot represents the relationship between distance and heart rate throughout the months of training. Hexbin plots are useful when you have a lot of data: instead of drawing overlapping points, the plotting window is split into hexagonal bins and the number of points in each bin is counted. The color indicates the number of points, with darker colors meaning more points. Looking at the graph, the lower-mileage runs fall in darker bins because I ran them more consistently, which made my heart rate more consistent over time.
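The counting idea behind a binned plot can be sketched in a few lines. To keep the arithmetic simple, this example uses rectangular bins rather than hexagons, so it is a simplification of what a hexbin plot does, not a reimplementation of it. The `points` data and the `bin_counts` helper are hypothetical.

```python
from collections import Counter

# Hypothetical (distance in miles, average heart rate in bpm) pairs.
points = [
    (3.0, 150), (3.1, 151), (3.2, 149), (3.0, 152),
    (6.0, 155), (6.2, 156), (13.1, 160),
]


def bin_counts(points, x_width, y_width):
    """Count points per rectangular bin; keys are (x_bin, y_bin) indices."""
    return Counter((int(x // x_width), int(y // y_width)) for x, y in points)


# 1-mile-wide bins on the distance axis, 5-bpm-wide bins on the heart rate axis.
counts = bin_counts(points, x_width=1.0, y_width=5.0)

# The bin holding the most points would be drawn in the darkest color.
densest_bin, n = counts.most_common(1)[0]
print(densest_bin, n)
```

In this made-up sample the densest bin covers the 3 to 4 mile runs at 150 to 155 bpm, mirroring how frequent short runs show up as the darkest region.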
Clustering with K-Means
Examining the relationship between Distance and Heart Rate

Creating the Clusters


K-Means graph
This graph represents subgroups of distances, each with an associated average heart rate, as determined by the K-means clustering algorithm. There are no labels in this graph because K-means clustering is an unsupervised learning technique. To interpret the graph, note that the center of each cluster, shown in red, is the mean of all observations belonging to that cluster. The observations in a given cluster are closer to that cluster’s center than to the center of any other cluster.
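The two properties described above (each center is the mean of its members, and each observation sits nearest to its own center) come straight from how K-means iterates. Below is a minimal plain-Python sketch of that loop, often called Lloyd's algorithm. It is not the code used to produce the graph; the data points, the starting centers, and the helper names are all hypothetical.

```python
from math import dist


def assign(points, centers):
    """Assignment step: put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def mean_point(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def kmeans(points, centers, max_iterations=50):
    """Alternate assignment and update steps until the centers stop moving."""
    for _ in range(max_iterations):
        clusters = assign(points, centers)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        updated = [mean_point(m) if m else c for m, c in zip(clusters, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


# Hypothetical (distance in miles, average heart rate in bpm) observations.
points = [(3.0, 150.0), (3.2, 148.0), (2.8, 152.0),
          (10.0, 160.0), (12.0, 162.0), (13.0, 158.0)]

centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(centers)
```

One caveat with this kind of sketch: distance and heart rate are on different scales, so in practice it is often worth standardizing both columns before clustering so that neither one dominates the distance calculation.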
Overview of Data

