Working with Spatial Data in R

Creating a NFL map in R with Leaflet

Inspired by this post in r/cfb, I wanted to see if I could a similar map for NFL teams using the leaflet package.

I first created a spreadsheet of the location of each team’s stadium and colors using data from Wikipedia and a website of team colors. I also downloaded a shapefile of U.S. counties from the U.S. Census Bureau.

First, I loaded the stadium and color data into R using read.csv, my data can be found here. Then, I converted the latitude and longitude variables into spatial coordinates using the sp package.

library(spdep)        # for tools that use shape and poly
library(leaflet)      # used to create interactie plots that can be rendered in HTML
library(rgeos)        # to find centroid

#Load csv into dataframe containing stadium locations
stadiums <- read.csv(data_path)

# Create Spatial DataFrame of Stadiums
sp.stadiums <- stadiums
coordinates(sp.stadiums) <- ~Long+Lat

# to see location of stadiums (no basemap)

I imported the county shapefile using readOGR() and removed all areas not in the contiguous United States.

# Read in us county shapefiles  <- readOGR(dsn = shape_path, layer = "cb_2016_us_county_500k", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/dcullen/Google Drive/ECON/analysis_cfb/county_shape", layer: "cb_2016_us_county_500k"
## with 3233 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
#  Remove  Alaska(02), Hawaii (15), Puerto Rico (72), Guam (66), Virgin Islands (78), American Samoa (60)
#  Mariana Islands (69), Micronesia (64), Marshall Islands (68), Palau (70), Minor Islands (74) <-[!$STATEFP %in% c("02", "15" ,"72", "66", "78", "60", "69",
                                        "64", "68", "70", "74"),]

# Remove other outling islands <-[!$STATEFP %in% c("81", "84", "86", "87", "89", "71", "76",
                                        "95", "79"),]

I use the gCentroid function in the rgeos package to find the centroid of every county in the shapefile.

# get the centroids and then convert them to a SpatialPointsDataFrame
county_centers <- SpatialPointsDataFrame(gCentroid(, byid=TRUE), 
                               , match.ID=FALSE)


The plot(county_center) with a cross on the location of each county center. Using gDistance I calculate the distance between each centroid and each stadium in the data. Then I merge the team data onto the distance data so that each row gives the distance to each stadium.

# Create matrix of distances between each county center and each stadium
# stadium in rows counties in columns
dist_matrix <- gDistance(county_centers,sp.stadiums, byid=TRUE)

# Merge stadium data to distance data each row is distance to stadium and there 
# will be a column of team names
dist_matrix <- merge(dist_matrix, stadiums, by=0, all=TRUE)

# Keep only distances and team names
drops <- c('Row.names', 'Lat', 'Long',  'Color1', 'Color2')
dist_matrix <- dist_matrix[ , !(names(dist_matrix) %in% drops)]

I create a function that finds the minimum distance in each column (county) and writes the team name to every cell in the column.

closest_f <- function(df) {
  i <- 1
  n <- ncol(df)
  while (i < n){
    df$ <- df[which(df[,i] == min(df[,i])), which(colnames(df)=="Team")]
    df[,i]    <- df$
    i <- i + 1
  # Drop Team and columns
  drops  <- c('Team', '')
  df     <- df[ , !(names(df) %in% drops)]

After running the function on the distance matrix, I merge the data onto the spatial points data frame using apply with the min option (the min option shouldn’t do anything because every entry in each column is the same). Then I merge the data back to the spatial polygons data frame. Now I have the closest stadium associated with every county in the contiguous United States.

closest <- closest_f(dist_matrix)

# Add closest team name to county_centers @data
county_centers@data$TEAM_0 <- apply(closest,2,min)

# Merge team data back to shapefile <- merge(, county_centers, by.x="NAME", by.y="NAME")

Finally we can create the leaflet map with our data. We first create the color palette using the colorFactor command to create two color palettes c1pal for the fill color and c2pal for the line color. Then we create the label that will display on mouse over to show the county name and the team name of the closest stadium.

## Create color palette 
# Read CSV that contains first and second colors
colors <- read.csv(data_path, stringsAsFactors = FALSE)
# Keep only columns team and color1 and color2
keep.c <- c('Color1', 'Team', 'Color2')
colors <- colors[,(names(colors) %in% keep.c) ]

# Sort colors alphabetically by team so colorfactor uses correct order
colors <- colors[order(colors$Team),]

c1 <- colors[["Color1"]]
c2 <- colors[["Color2"]]
t  <- colors[["Team"]]

# Create two color palettes for factor variables
c1pal <- colorFactor(c1, t)
c2pal <- colorFactor(c2, t)

## Create Label
labels <- sprintf(
  "<strong>%s </strong><br/>%s ",$NAME,$TEAM_0
) %>% lapply(htmltools::HTML)
county_popup <- paste0($NAME,"<br>",$TEAM_0)

## Create map of closest team to each county 
# color and weight to change outline
m <- leaflet() %>%
  addPolygons(data =, weight = 1.5, color = ~c2pal($TEAM_0) , smoothFactor = 0.5, 
              fillOpacity = 1,
              fillColor = ~c1pal($TEAM_0), 
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = TRUE), 
              label = labels,
              labelOptions = labelOptions(
                style = list("font-weight" = "normal", padding = "3px 8px"),
                textsize = "15px",
                direction = "auto")) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = -98.5795, lat = 39.8283, zoom = 4)