In 2008, the Baltimore Ravens drafted quarterback Joe Flacco out of the University of Delaware with the 18th overall pick. Flacco saw substantial success early in his career, reaching the AFC Championship Game (the game before the Super Bowl) in his rookie year. Over the following four years, Flacco and the Ravens made the playoffs every season, made two more AFC Championship Game appearances, and won Super Bowl XLVII in the 2012 season. Ravens fans nicknamed him "January Joe" for his playoff performances. After this impressive start, Flacco signed what was then the largest contract of any NFL player. Naturally, a question began to be asked around the NFL: "Is Joe Flacco an elite quarterback?" Here, I try to shed some light on the subject by looking at quarterback statistics around the league during the first six years of Flacco's career, 2008 through 2013.
To start, we will import the libraries needed for gathering the data from sources on the internet, as well as for storing and plotting the data. The libraries used throughout are imported here.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn
from bs4 import BeautifulSoup
import requests
import warnings
import sklearn
from sklearn.linear_model import LinearRegression
!pip3 install lxml
warnings.filterwarnings('ignore')
from io import StringIO  # standard library; pd.read_html expects a file-like object

dfs = []
for year in range(2008, 2014):  # 2014 is not included
    # build the URL for this season's passing table
    url = f"https://www.pro-football-reference.com/years/{year}/passing.htm"
    # get the page from the server and find the table with the player stats in it
    response = requests.get(url)
    root = BeautifulSoup(response.content, "lxml")
    t = root.find("table")
    # convert the table to a dataframe
    df = pd.read_html(StringIO(str(t)))[0]
    # add a column that indicates the year each player's stats are from
    df["year"] = year
    # add the current dataframe to the list of dataframes already scraped
    dfs.append(df)
Now that all the data has been scraped into tables, we can remove unwanted rows. First, we remove rows that are unrelated to any player. We also remove any players who do not have any wins or losses associated with them. On occasion, teams run trick plays where someone other than the quarterback throws the ball; those players have no quarterback record, so this step removes them.
Additionally, we want to make sure each player we look at is not a backup quarterback who played only a few games. Their stats may be skewed for better or for worse and would be unreliable data points. Any player who did not start at least half of the 16-game season is removed.
starters = []
for df in dfs:
    # rows exist that are just repeats of the column titles. This removes them.
    df = df[~df["Rk"].astype(str).str.contains("Rk")]
    # drop players with no quarterback record (no wins or losses as a starter)
    df = df[df["QBrec"].notna()].copy()
    df["QBrec"] = df["QBrec"].astype(str)
    # convert the Games Started column to numbers
    df["GS"] = pd.to_numeric(df["GS"])
    # filter out quarterbacks who did not start at least half the season
    df = df[df["GS"] > 7]
    starters.append(df)
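To see what this filtering does, here is a minimal sketch on a small made-up table. The player names and numbers are hypothetical, not real scraped rows, and the helper name `keep_starters` is an assumption introduced only for this illustration.

```python
import pandas as pd

def keep_starters(df, min_starts=8):
    """Keep rows for quarterbacks with a record and at least min_starts starts."""
    # drop repeated header rows, which carry the text "Rk" in the Rk column
    df = df[~df["Rk"].astype(str).str.contains("Rk")]
    # drop players with no quarterback record (for example, trick-play passers)
    df = df[df["QBrec"].notna()].copy()
    # games started arrives as text, so convert it before comparing
    df["GS"] = pd.to_numeric(df["GS"])
    return df[df["GS"] >= min_starts]

# hypothetical rows that imitate the shape of the scraped table
toy = pd.DataFrame({
    "Rk": ["1", "2", "Rk", "3", "4"],
    "Player": ["Starter A", "Backup B", "Player", "Receiver C", "Starter D"],
    "QBrec": ["11-5-0", "1-2-0", "QBrec", None, "8-8-0"],
    "GS": ["16", "3", "GS", "0", "16"],
})

kept = keep_starters(toy)
print(kept["Player"].tolist())
```

Only the two full-season starters should survive: the repeated header row, the player without a record, and the three-game backup are all filtered out.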
Now that we have all the tables, we can combine them into one large table.
# combine the tables from each year (DataFrame.append was removed in pandas 2.0)
table = pd.concat(starters, ignore_index=True)
table
Now we can drop the columns that we do not wish to look at. Other stats can be kept if you would like to analyze them; however, these are the variables we will focus on.
table = table[["Player", "QBrec", "Cmp%", "TD", "Int", "ANY/A", "Rate", "Sk", "year"]]
table
Tidying the data is almost complete. Since quarterbacks have played different numbers of games, let us convert the counting statistics to per-game values. Additionally, we will take each player's record as starting quarterback and turn it into a win rate column.
winrate = []
names = []
completion = []
sacks = []
tds = []
ints = []
ratio = []
for x in table.itertuples():
    # calculate win rate from the QBrec column, formatted as "wins-losses-ties"
    record = x[2].split("-")
    games = int(record[0]) + int(record[1]) + int(record[2])
    winrate.append(int(record[0]) / games)
    # the original table uses * and + to indicate Pro Bowl and All-Pro selections
    names.append(x[1].replace("*", "").replace("+", "").strip())
    # completion rate is currently a percentage (0 to 100). convert it to a fraction (0 to 1).
    completion.append(float(x[3]) / 100)
    # convert the stats to occurrences per game started
    sacks.append(int(x[8]) / games)
    tds.append(int(x[4]) / games)
    ints.append(int(x[5]) / games)
    # touchdowns per interception (assumes at least one interception in the season)
    ratio.append(int(x[4]) / int(x[5]))
table["Player"] = names
table["win_rate"] = winrate
table["Cmp%"] = completion
table["Sk"] = sacks
table["TD"] = tds
table["Int"] = ints
table["td_int_ratio"] = ratio
table = table.drop("QBrec", axis=1)
table["ANY/A"] = table["ANY/A"].astype(float)
table["Rate"] = table["Rate"].astype(float)
table
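The record parsing is the least obvious step in the cell above, so here it is in isolation. This is a small sketch, not part of the original notebook; the function name `win_rate_from_record` is made up for the illustration.

```python
def win_rate_from_record(record):
    """Convert a "wins-losses-ties" string such as "11-5-0" into a win rate."""
    wins, losses, ties = (int(part) for part in record.split("-"))
    games = wins + losses + ties
    if games == 0:
        raise ValueError("record contains no games")
    # ties count as games played but not as wins, as in the loop above
    return wins / games

print(win_rate_from_record("11-5-0"))  # 11 wins in 16 starts
print(win_rate_from_record("8-7-1"))   # a tie lowers the rate without adding a loss
```

The same `games` total is what the per-game statistics are divided by, so a quarterback with a 11-5-0 record has his touchdowns, interceptions, and sacks divided by 16.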
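Update the column labels to reflect the change to per-game statistics. Rate (the NFL passer rating) is renamed QBR, short for quarterback rating; note that this is not the same statistic as ESPN's Total QBR.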
table = table.rename(columns={"Rate": "QBR", "Cmp%": "completion_rate", "Player": "name",
                              "Sk": "sack_rate", "TD": "td_rate", "Int": "int_rate"})
# move the "year" column to the end, as it makes more sense to be there.
table = table[[col for col in table if col != "year"] + ["year"]]
table
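Now our table is in a tidy form. Here is what each column represents:
name: the quarterback's name
completion_rate: fraction of pass attempts that were completed
td_rate: touchdown passes per game started
int_rate: interceptions thrown per game started
ANY/A: Adjusted Net Yards per Attempt, a passing efficiency measure that credits touchdowns and penalizes interceptions and sacks
QBR: the quarterback (passer) rating, a single number that combines completions, yards, touchdowns, and interceptions per attempt
sack_rate: sacks taken per game started
win_rate: fraction of the quarterback's starts that his team won
td_int_ratio: touchdown passes thrown per interception
year: the season the statistics come from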
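Two of these columns, ANY/A and the passer rating, are composite statistics that arrive precomputed from the scraped table. For reference, the sketch below shows how they are commonly defined, to the best of my understanding of the published formulas; the function names and the sample numbers are made up for the illustration and are not taken from the data.

```python
def any_per_attempt(pass_yards, pass_tds, interceptions, sack_yards, attempts, sacks):
    """Adjusted Net Yards per Attempt: touchdowns add 20 yards, interceptions cost 45."""
    adjusted_yards = pass_yards + 20 * pass_tds - 45 * interceptions - sack_yards
    return adjusted_yards / (attempts + sacks)

def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating: four per-attempt components, each clamped to [0, 2.375]."""
    def clamp(value):
        return max(0.0, min(value, 2.375))

    a = clamp((completions / attempts - 0.3) * 5)       # completion percentage
    b = clamp((yards / attempts - 3) * 0.25)            # yards per attempt
    c = clamp(touchdowns / attempts * 20)               # touchdown rate
    d = clamp(2.375 - interceptions / attempts * 25)    # interception rate
    return (a + b + c + d) / 6 * 100

# hypothetical season totals
print(round(any_per_attempt(3000, 20, 10, 200, 500, 30), 2))
# a game where every component hits its cap gives the maximum possible rating
print(round(passer_rating(10, 10, 150, 2, 0), 1))
```

Because every component of the passer rating is capped, the rating can never exceed roughly 158.3, which is why it behaves differently from ANY/A at the extremes.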
With the data in a neat and digestible form, let us make some charts to see whether any evidence or trend indicates that Joe Flacco is elite.
First, we will start by showing completion percentage versus win rate for the data.
plt.subplots(figsize=(12, 8))
# plot the information so each year is in one color
for y in table["year"].unique():
    year = table[table["year"] == y]
    # remove Joe Flacco from the plot
    year = year[~year["name"].str.contains("Flacco")]
    # plot that specific year of data
    plt.scatter(marker="x", x=year["win_rate"], y=year["completion_rate"], label=str(y))
plt.legend()
# plot and label Joe Flacco's seasons separately
for x in table.itertuples():
    name = x[1]
    completion_rate = x[2]
    win_rate = x[8]
    if name == "Joe Flacco":
        plt.scatter(marker="o", x=win_rate, y=completion_rate)
        plt.annotate(str(x[10]) + " " + name, (win_rate, completion_rate))
win_rate = np.array(table["win_rate"]).reshape(-1, 1)
comp_rate = np.array(table["completion_rate"]).reshape(-1, 1)
# make and plot regression line.
reg = LinearRegression().fit(win_rate, comp_rate)
plt.plot(win_rate, reg.predict(win_rate), '-')
x = "Win Rate"
y = "Completion Percentage"
plt.xlabel(x)
plt.ylabel(y)
plt.title(y + " vs. " + x)
plt.show()
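The regression line is only drawn on the chart, so it can help to put numbers on it as well. The sketch below is one way to do that; it uses NumPy's `polyfit` and `corrcoef` rather than scikit-learn's `LinearRegression` (for a single predictor, both give an ordinary least-squares line). The helper name `trend_summary` and the four data points are hypothetical.

```python
import numpy as np

def trend_summary(x, y):
    """Return the least-squares slope, intercept, and Pearson correlation of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # a degree-1 polynomial fit is a straight line: y = slope * x + intercept
    slope, intercept = np.polyfit(x, y, 1)
    # the off-diagonal entry of the 2x2 correlation matrix is Pearson's r
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# made-up win rates and completion rates, only to exercise the function
win_rate_demo = [0.25, 0.50, 0.75, 1.00]
completion_demo = [0.56, 0.60, 0.62, 0.66]

slope, intercept, r = trend_summary(win_rate_demo, completion_demo)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

On the real table, one would pass `table["win_rate"]` and the statistic of interest; a correlation near zero would mean the line on the chart says little about how that statistic relates to winning.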
So far, things don't look too great for Joe Flacco based on his passing accuracy. Let's see if he does better at taking care of the ball. We can plot the touchdown-to-interception ratio to see if he is a good decision maker. A couple of outliers compress the chart and make it hard to read, so we will only plot seasons in which a QB threw fewer than 8 touchdowns per interception; even approaching that mark is an astounding accomplishment. Note that the regression line is still fit on every season, including the hidden outliers.
plt.subplots(figsize=(12, 8))
col = "td_int_ratio"
# plot the information so each year is in one color
for y in table["year"].unique():
    year = table[table["year"] == y]
    # remove Joe Flacco from the plot
    year = year[~year["name"].str.contains("Flacco")]
    # remove outliers with a high td to int ratio for visibility
    year = year[year[col] < 8]
    # plot that specific year of data
    plt.scatter(marker="x", x=year["win_rate"], y=year[col], label=str(y))
plt.legend()
# plot and label Joe Flacco's seasons separately (x[9] is td_int_ratio)
for x in table.itertuples():
    name = x[1]
    win_rate = x[8]
    if name == "Joe Flacco":
        plt.scatter(marker="o", x=win_rate, y=x[9])
        plt.annotate(str(x[10]) + " " + name, (win_rate, x[9]))
win_rate = np.array(table["win_rate"]).reshape(-1, 1)
stat = np.array(table[col]).reshape(-1, 1)
# make and plot regression line.
reg = LinearRegression().fit(win_rate, stat)
plt.plot(win_rate, reg.predict(win_rate), '-')
x = "Win Rate"
y = "Touchdowns per Interception thrown"
plt.xlabel(x)
plt.ylabel(y)
plt.title(y + " vs. " + x)
plt.show()
After showing his touchdown-to-interception ratio, things are not great, but slightly better. As described at the end of the `Data Tidying` section, quarterback rating and Adjusted Net Yards per Attempt give a better overall idea of how a quarterback performs. We can now plot those to see if Joe Flacco stands out from the other quarterbacks.
lst = [("ANY/A", 5), ("QBR", 6)]
for p in lst:
    plt.subplots(figsize=(12, 8))
    # column name and its position in the row tuples from itertuples()
    col = p[0]
    pos = p[1]
    # plot the information so each year is in one color
    for y in table["year"].unique():
        year = table[table["year"] == y]
        # remove Joe Flacco from the plot
        year = year[~year["name"].str.contains("Flacco")]
        # plot that specific year of data
        plt.scatter(marker="x", x=year["win_rate"], y=year[col], label=str(y))
    plt.legend()
    # plot and label Joe Flacco's seasons separately
    for x in table.itertuples():
        name = x[1]
        win_rate = x[8]
        if name == "Joe Flacco":
            plt.scatter(marker="o", x=win_rate, y=x[pos])
            plt.annotate(str(x[10]) + " " + name, (win_rate, x[pos]))
    win_rate = np.array(table["win_rate"]).reshape(-1, 1)
    stat = np.array(table[col]).reshape(-1, 1)
    # make and plot regression line.
    reg = LinearRegression().fit(win_rate, stat)
    plt.plot(win_rate, reg.predict(win_rate), '-')
    x = "Win Rate"
    y = col
    plt.xlabel(x)
    plt.ylabel(y)
    plt.title(y + " vs. " + x)
    plt.show()
No outlier performances from Joe Flacco so far. Since the data is spread across six years, we can standardize it to show performance relative to the other quarterbacks that year. Standardized scores are easiest to interpret when the data is approximately normally distributed, so we check that first. Using seaborn, we can make violin plots of each year of data to see whether each distribution looks approximately normal. QBR slightly favored Joe Flacco over Adjusted Net Yards per Attempt, so for the sake of argument, let us standardize the QBR ratings for each year.
# define plot size and make violin plots to show an approximately normal distribution.
plt.subplots(figsize = (12,8))
seaborn.violinplot(x=table["year"], y=table["QBR"])
Good! The data looks approximately normally distributed: there is one peak on each violin, the data is not skewed too far to one side, and the median (the white dot) lies approximately in the center of the distribution. We can proceed with standardizing the QBR stat. As sports evolve, the game changes, and over the years football has become more passing oriented, so standardization is done by year.
Players with an above-average QBR will have a positive standardized score, and those below average will have a negative one. In a normal distribution, about two thirds (roughly 68%) of the data lies within one standard deviation of the mean. Learn more about standardization and what it means here.
# begin standardizing
standardized = []
for y in table["year"].unique():
    year = table[table["year"] == y]
    mean = year["QBR"].mean()
    # np.std uses the population standard deviation (ddof=0)
    std = np.std(year["QBR"])
    for x in year.itertuples():
        qbr = x[6]
        if std != 0:
            standard_qbr = (qbr - mean) / std
        else:
            standard_qbr = 0
        standardized.append(standard_qbr)
# this relies on the table rows being grouped by year, in the order the years first appear
table["std_qbr"] = standardized
table
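The same per-year standardization can also be written without explicit loops. The sketch below uses pandas `groupby` with `transform`; it is an alternative under the same assumptions (population standard deviation, a score of 0 when a year has no spread), not the method used above. The function name `standardize_by_group` and the toy values are hypothetical.

```python
import pandas as pd

def standardize_by_group(df, value_col, group_col):
    """Return z-scores of value_col computed separately within each group."""
    grouped = df.groupby(group_col)[value_col]
    # transform broadcasts each group's statistic back onto its own rows
    mean = grouped.transform("mean")
    std = grouped.transform(lambda s: s.std(ddof=0))
    # where() turns a zero spread into NaN so the division cannot blow up
    z = (df[value_col] - mean) / std.where(std != 0)
    # groups with no spread get a score of 0, as in the loop version
    return z.fillna(0.0)

# made-up ratings for two seasons
toy = pd.DataFrame({
    "year": [2008, 2008, 2008, 2009, 2009],
    "QBR": [80.0, 90.0, 100.0, 85.0, 85.0],
})
toy["std_qbr"] = standardize_by_group(toy, "QBR", "year")
print(toy)
```

Because the result is aligned on the DataFrame index, this version does not depend on the rows being sorted by year.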
Now that the data is standardized, we can plot it to see where Joe Flacco lies. We can make one plot for each year to show where he placed relative to other quarterbacks that year.
# plot the information by year, and player.
for y in table["year"].unique():
    year = table[table["year"] == y]
    plt.subplots(figsize=(12, 8))
    for x in year.itertuples():
        name = x[1]
        win_rate = x[8]
        std_score = x[11]
        # plot that specific year of data
        if name == "Joe Flacco":
            plt.scatter(marker="o", x=win_rate, y=std_score)
            plt.annotate(name, (win_rate, std_score))
        else:
            plt.scatter(marker="x", x=win_rate, y=std_score)
    win_rate = np.array(year["win_rate"]).reshape(-1, 1)
    std_qbr = np.array(year["std_qbr"]).reshape(-1, 1)
    # make and plot regression line.
    reg = LinearRegression().fit(win_rate, std_qbr)
    plt.plot(win_rate, reg.predict(win_rate), '-')
    x_label = "Win Rate"
    y_label = "Standardized QBR"
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.title(y_label + " vs. " + x_label + ", " + str(y))
    plt.show()
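Reading a position off six separate charts is a little awkward, so a small table of ranks can complement them. The sketch below ranks each quarterback's standardized score within his season; it assumes the column names `name`, `year`, and `std_qbr` from the table above, while the helper `rank_within_year`, the placeholder quarterback names, and the scores are invented for the illustration.

```python
import pandas as pd

def rank_within_year(df, value_col="std_qbr", year_col="year"):
    """Add each row's rank (1 = best) and the group size within its year."""
    out = df.copy()
    by_year = out.groupby(year_col)[value_col]
    # higher standardized score means a better rank, so rank in descending order
    out["rank"] = by_year.rank(ascending=False, method="min").astype(int)
    out["out_of"] = by_year.transform("count")
    return out

# hypothetical standardized scores for three placeholder quarterbacks
toy = pd.DataFrame({
    "name": ["QB A", "QB B", "QB C", "QB A", "QB B"],
    "year": [2008, 2008, 2008, 2009, 2009],
    "std_qbr": [1.2, -0.3, 0.1, 0.5, 0.9],
})

ranked = rank_within_year(toy)
one_player = ranked[ranked["name"] == "QB B"][["year", "std_qbr", "rank", "out_of"]]
print(one_player)
```

Filtering the real table on `name == "Joe Flacco"` in the same way would give one row per season, showing where he ranked among that year's qualifying starters.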
So, is Joe Flacco an Elite Quarterback?
Unfortunately, Joe Flacco's stats are average or below average in his 2008 through 2013 seasons. However, this does not mean Joe Flacco is not an elite quarterback. As mentioned in the introduction, he built his reputation for eliteness in the playoffs, which this analysis did not cover. His regular season stats are nothing spectacular. To show whether Joe Flacco is or is not an elite quarterback, we would need to dive into his postseason statistics. If I were working in a group for this project, this is definitely something I would have looked into.
Purely in terms of his regular season numbers from 2008 through 2013, Joe Flacco is not an elite quarterback. However, the quarterback is never the only player on the field. One shortcoming of this project is that the difficulty of opponents is not considered. For example, in the current 2020 season, the AFC North, the division the Ravens play in, has three playoff contenders. Since each team plays its division rivals twice, being in a tougher division (one whose teams have high win rates) could certainly affect a quarterback's on-paper performance. When the Ravens play their biggest rival, the Pittsburgh Steelers, the games are always nail-biters until the final whistle blows. Additionally, every team's defense is ranked, and the toughness of opposing defenses was not considered either. Another thing to consider is the talent of the team as a whole. This is not easy to quantify, but the number of seconds a quarterback has to throw without pressure is a good indicator of how good the offensive line is. More modern statistics could show how many yards separate receivers from the nearest defender, which indicates whether the receivers are able to get open. Even the coaching staff will affect a quarterback's on-paper performance, as some play callers are better at designing running plays and worse at designing pass plays. It is possible to take these factors into account; however, doing so is outside the scope of this project.
If you would like to read about Joe Flacco's historic playoff run, you can do so here. After he reached the Super Bowl, Bleacher Report, a popular sports site, wrote an article on why he should be considered an elite quarterback.
Thank you for taking the time to read my analysis of Joe Flacco as an elite quarterback. As a lifelong Ravens fan, I hope you had as much fun reading this as I did writing it.
</font>