The use of weather data in economic research has become increasingly common. When studying developing countries, data from ground stations is often unreliable and sparse. A useful source of weather data is reanalysis data. In this example, I use local temperature and precipitation data from the ERA-Interim dataset, which is created using global atmospheric reanalysis by the European Centre for Medium-Range Weather Forecasts (ECMWF). Climate data is reported on a 0.125×0.1250.125×0.125 degree grid (approximately 15km by 15km at the equator) at six hour frequency from January 1, 2000 to December 31, 2015. The ERA-Interim reanalysis utilizes high frequency historical observations from a variety of sources: weather stations, ships, aircraft, weather balloons, radiosondes, and satellites. The reanalysis data is preferable because rainfall gauge data is sparse and low-quality in Africa.
The first thing I want to do is create a list of all of my netCDF files (one file for each year). I am going to create this list, so later I can loop through all the files in the list and perform the same operations to all files in the folder. I also read in my shaped file using the readOGR function from the rgdal package.
I want to create observations at the month level, so I write a function that takes in a netCDF data file and subsets across months and finds the average temperature for the month for each grid point.
month_average <- function( nc_input, var, output_folder) {  ## This function takes an .nc file of era-interim data,   ## a variable of interest (temperature), and an output   ## folder path and averages the observed temperatures   ## to a average monthly temperature and returns a list  ## of rasterstacks for the year    var.nc <- brick(nc_input, varname = var, layer = "time")  time   <- as.POSIXct(substr(var.nc@data@names, start=2, stop=20), format="%Y.%m.%d.%H.%M.%S")  months <- unique(format(time, "%m"))    # Find the most common year in file bc file has some obs. from prev and next year  yr     <- names(sort(table(format(time, "%Y")), decreasing=TRUE)[1])  df     <- data.frame(INDEX = 1:length(time), time = time)  # Create empty stack  x <- stack()    # Loop over months to create rasterstack for each month  for(MONTH in months){    subset  <- df[format(df$time, "%m") == MONTH,]    sub.var <- var.nc[[subset$INDEX]]        print(paste("Executing Average for Month: ", MONTH, yr))    av.var <- calc(sub.var, fun = mean)    names(av.var) <- paste0(var, ".", MONTH, ".", yr)    x <- stack ( x , av.var)  }  output <- list(x, yr)  return(output)}  Next I want to get rid of the extra grids and match to the area covered by my shape file.
match_brick_to_shape <- function( brick_input ) {  ## This funciton takes a raster brick and matches the data  ## to the coordinate susterm of my shapefile then crops  ## the data to only include the areas in the shapefile  ## returns the cropped raster brick  brick_input %>%    # Reproject raster brick to the shapefile's coordinate system.    projectRaster( . , crs = proj4string( shape ),                   method = "ngb" ) %>%    # Crop to match the size of my shapefile.    raster::crop( . , extent( shape ) ) %>%    return( . )}Now that my functions are ready, I write a loop to go through each file in my file list and runs both functions on my data. After, running the functions I compute spatial averages to aggregate the data to the district levels in my shape file and convert the temperature data from Kelvin to Celsius.
out <- data.frame()for (file in flist){  y  <- month_average( paste0("raw_data/weather_data/", file), "t2m", out_folder)  y1  <- y[[1]]  y2 <- match_brick_to_shape( y1 )    # Take means according to the shape.  # Make sure df = TRUE , so that output is a dataframe.  # Weights equal T to weight all cells inside polygon by % covered  z <- raster::extract( y2 , shape , df = TRUE, fun = mean, na.rm = TRUE, weights = TRUE  )  z <- z - 273.15  # Convert Kelvin to Celsius    # Combine data from all years   if (nrow(out) == 0) {    out <- z  } else{    out <- cbind.data.frame(out, z)  }}Lastly, I create a new shape file with the monthly temperature data. To so this I write the output data from the previous loop to the @data attribute of the shape file.
# Combine shapefile with average temperature datashape2 <- shapeout <- out[, !duplicated(colnames(out))] # Remove duplicated ID columnfor (i in (1:length(out))){ # Loop though each column and add to shapefile then rename new variable  shape2@data$out <- out[,i]  names(shape2@data)[[ncol(shape2@data)]] <- colnames(out[i])}Now that the data is prepared I want to map the average temperature in each district for a given year and month. I do this using the leaflet package to create an interactive map. I’m going to map the average temperature in each district for January 2015, so I use the variable t2_01_2015. The following leaflet code will create the map at the top of the page with popup labels when you hover over a district.
labels <- sprintf(  "<strong>%s, %s</strong><br/>%g °C",  shape2@data$NAME_1, shape2@data$NAME_0, round(shape2@data$t2m.01.1999, 2)) %>% lapply(htmltools::HTML)pal <- colorNumeric(  palette = "YlOrRd",  domain = shape2@data$t2m.01.1999)map <- leaflet(shape2) %>%  addPolygons(color = "#444444", weight = 1, smoothFactor = 0.5,              opacity = 1.0, fillOpacity = 0.5,              fillColor = ~pal(t2m.01.1999),              highlightOptions = highlightOptions(color = "white", weight = 2,                                                  bringToFront = TRUE),               label = labels,              labelOptions = labelOptions(                style = list("font-weight" = "normal", padding = "3px 8px"),                textsize = "15px",                direction = "auto")) %>%   addLegend(pal = pal,             values = ~shape2@data$t2m.01.1999, opacity = 0.7,             title = NULL,            labFormat = labelFormat(suffix = '°C'),            position = "bottomright")  %>%   addProviderTiles(providers$CartoDB.Positron)