Using Climate Reanalysis Data in R

Weather data is increasingly common in economic research. In developing countries, data from ground stations is often sparse and unreliable, and reanalysis data is a useful alternative. In this example, I use local temperature and precipitation data from the ERA-Interim dataset, a global atmospheric reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). Climate data is reported on a 0.125×0.125 degree grid (approximately 14 km by 14 km at the equator) at six-hour frequency from January 1, 2000 to December 31, 2015. The ERA-Interim reanalysis draws on high-frequency historical observations from a variety of sources: weather stations, ships, aircraft, weather balloons, radiosondes, and satellites. The reanalysis data is preferable here because rainfall gauge data in Africa is sparse and low-quality.

The first thing I want to do is create a list of all of my netCDF files (one file for each year), so that later I can loop through the list and perform the same operations on every file in the folder. I also read in my shapefile using the readOGR function from the rgdal package.
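
As a minimal sketch of the file-list step, the snippet below builds a list of `.nc` files with base R's `list.files`. The demo folder and the file names in it are hypothetical stand-ins created in a temporary directory; in the real workflow the folder would be the one holding the yearly netCDF files. Reading the shapefile is not shown here.

```r
# Hypothetical demo folder standing in for the folder of yearly netCDF files
demo_dir <- file.path(tempdir(), "weather_data_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("era_2000.nc", "era_2001.nc", "notes.txt")))

# Keep only files ending in .nc; names come back in alphabetical order
flist <- list.files(demo_dir, pattern = "\\.nc$")
print(flist)
```

Anchoring the pattern with `$` keeps files such as `notes.txt` out of the list, so the later loop only sees netCDF files.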

I want observations at the month level, so I write a function that takes a netCDF data file, subsets its layers by month, and finds the average temperature for each month at each grid point.

library(raster)

month_average <- function( nc_input, var, output_folder) {
  ## This function takes an .nc file of era-interim data, 
  ## a variable of interest (temperature), and an output 
  ## folder path and averages the observed temperatures 
  ## to an average monthly temperature and returns a list
  ## of rasterstacks for the year
  nc     <- brick(nc_input, varname = var, layer = "time")
  # Layer names look like "X2000.01.01.00.00.00"; drop the "X" and parse as UTC
  time   <- as.POSIXct(substr(names(nc), start=2, stop=20),
                       format="%Y.%m.%d.%H.%M.%S", tz="UTC")
  months <- unique(format(time, "%m"))
  # Find the most common year in file bc file has some obs. from prev and next year
  yr     <- names(sort(table(format(time, "%Y")), decreasing=TRUE)[1])
  df     <- data.frame(INDEX = 1:length(time), time = time)
  # Create empty stack
  x <- stack()
  # Loop over months to create rasterstack for each month
  for(MONTH in months){
    subset  <- df[format(df$time, "%m") == MONTH,]
    sub.var <- nc[[subset$INDEX]]
    print(paste("Executing Average for Month: ", MONTH, yr))
    av.var <- calc(sub.var, fun = mean)
    names(av.var) <- paste0(var, ".", MONTH, ".", yr)
    x <- stack( x , av.var)
  }
  # Return the monthly stack together with the year it belongs to
  output <- list(x, yr)
  return(output)
}
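
The date handling inside the function can be tried on its own, without any raster data. The sketch below uses made-up layer names in the `XYYYY.MM.DD.HH.MM.SS` style that the function expects, parses them, picks the most common year, and groups layer indices by month. The names and the helper variable `idx_by_month` are illustrative assumptions, not part of the original code.

```r
# Hypothetical layer names (made up for illustration)
layer_names <- c("X2000.01.01.00.00.00", "X2000.01.01.06.00.00",
                 "X2000.02.01.00.00.00", "X2000.02.01.06.00.00",
                 "X2000.02.01.12.00.00", "X2001.01.01.00.00.00")

# Drop the leading "X" and parse the remainder as a UTC timestamp
time <- as.POSIXct(substr(layer_names, start = 2, stop = 20),
                   format = "%Y.%m.%d.%H.%M.%S", tz = "UTC")

# Most common year among the layers
yr <- names(sort(table(format(time, "%Y")), decreasing = TRUE)[1])

# Layer indices belonging to each calendar month
idx_by_month <- split(seq_along(time), format(time, "%m"))

print(yr)
print(idx_by_month)
```

Note that grouping on month alone puts the stray 2001 layer into the January group. If observations from neighboring years should not influence the monthly mean, filtering the layers to `yr` before grouping would be one way to keep them out.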

Next I want to drop the grid cells outside my study area and match the data to the area covered by my shapefile.

library(magrittr)  # provides the %>% pipe

match_brick_to_shape <- function( brick_input ) {
  ## This function takes a raster brick and matches the data
  ## to the coordinate system of my shapefile then crops
  ## the data to only include the areas in the shapefile
  ## returns the cropped raster brick
  brick_input %>%
    # Reproject raster brick to the shapefile's coordinate system.
    projectRaster( . , crs = proj4string( shape ),
                   method = "ngb" ) %>%
    # Crop to match the size of my shapefile.
    raster::crop( . , extent( shape ) ) %>%
    return( . )
}

Now that my functions are ready, I write a loop that goes through each file in my file list and runs both functions on the data. After running the functions, I compute spatial averages to aggregate the data to the district level in my shapefile and convert the temperature data from Kelvin to Celsius.

out <- data.frame()
for (file in flist){
  y  <- month_average( paste0("raw_data/weather_data/", file), "t2m", out_folder)
  y1 <- y[[1]]
  y2 <- match_brick_to_shape( y1 )
  # Take means according to the shape.
  # Make sure df = TRUE , so that output is a dataframe.
  # Weights equal T to weight all cells inside polygon by % covered
  z <- raster::extract( y2 , shape , df = TRUE, fun = mean, na.rm = TRUE, weights = TRUE  )
  # Convert Kelvin to Celsius (the first column is the polygon ID, so skip it)
  z[, -1] <- z[, -1] - 273.15
  # Combine data from all years 
  if (nrow(out) == 0) {
    out <- z
  } else{
    out <- cbind(out, z)
  }
}
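
The conversion and combining steps can be checked on small data frames. The sketch below assumes each yearly result has an `ID` column followed by one column per month, as `raster::extract(..., df = TRUE)` returns. The values and the helper `kelvin_to_celsius` are made up for illustration.

```r
# Two hypothetical yearly outputs (temperatures in Kelvin, values made up)
z_2000 <- data.frame(ID = 1:2, t2m.01.2000 = c(293.15, 298.15))
z_2001 <- data.frame(ID = 1:2, t2m.01.2001 = c(294.15, 299.15))

# Hypothetical helper: convert every column except ID
kelvin_to_celsius <- function(z) {
  value_cols <- setdiff(names(z), "ID")
  z[value_cols] <- z[value_cols] - 273.15
  z
}

# Bind years side by side, then drop the repeated ID column
out <- cbind(kelvin_to_celsius(z_2000), kelvin_to_celsius(z_2001))
out <- out[, !duplicated(colnames(out))]
print(out)
```

Leaving `ID` out of the subtraction matters because subtracting 273.15 from the whole data frame would also shift the polygon identifiers.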

Lastly, I create a new shapefile with the monthly temperature data. To do this, I write the output data from the previous loop to the @data slot of the shapefile.

# Combine shapefile with average temperature data
shape2 <- shape
out <- out[, !duplicated(colnames(out))] # Remove duplicated ID column
for (i in seq_along(out)){ # Loop through each column and add to shapefile then rename new variable
  shape2@data$out <- out[,i]
  names(shape2@data)[[ncol(shape2@data)]] <- colnames(out)[i]
}
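
A one-step alternative to the column loop is to bind the two tables directly. The sketch below shows the idea on plain data frames, with `attrs` as a hypothetical stand-in for the shapefile's attribute table. It assumes the rows of `out` are in the same order as the polygons. The same `cbind` applied to `shape2@data` is expected to behave the same way, but that is an assumption and is not tested here.

```r
# Hypothetical stand-in for the shapefile's attribute table
attrs <- data.frame(NAME_0 = c("Country X", "Country X"),
                    NAME_1 = c("District A", "District B"))
# Hypothetical extracted temperatures, in the same row order as the polygons
out <- data.frame(ID = 1:2, t2m.01.2015 = c(21.4, 24.9))

# Guard against a silent misalignment before binding
stopifnot(nrow(attrs) == nrow(out))
attrs2 <- cbind(attrs, out)
print(names(attrs2))
```

The row-count check is cheap insurance: a mismatch between the number of polygons and the number of extracted rows would otherwise attach temperatures to the wrong districts.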

Now that the data is prepared, I want to map the average temperature in each district for a given year and month. I do this using the leaflet package to create an interactive map. I'm going to map the average temperature in each district for January 2015, so I use the variable t2m.01.2015. The following leaflet code will create the map at the top of the page, with popup labels that appear when you hover over a district.
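
The label text can be previewed before any map is drawn. The sketch below builds the same kind of HTML string with `sprintf`, using made-up district names and temperatures in place of the shapefile columns.

```r
# Hypothetical district names and temperatures, for illustration only
name_1 <- c("District A", "District B")
name_0 <- c("Country X", "Country X")
temp_c <- c(21.4567, 24.9)

# One HTML label per district: bold name, line break, rounded temperature
labels <- sprintf("<strong>%s, %s</strong><br/>%g &#176;C",
                  name_1, name_0, round(temp_c, 2))
print(labels)
```

`sprintf` is vectorized, so it produces one label per district. The `&#176;` entity renders as the degree sign once the string is passed through `htmltools::HTML`.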

library(leaflet)

# HTML labels shown on hover: district, country, and temperature
labels <- sprintf(
  "<strong>%s, %s</strong><br/>%g &#176;C",
  shape2@data$NAME_1, shape2@data$NAME_0, round(shape2@data$t2m.01.2015, 2)
) %>% lapply(htmltools::HTML)

# Color scale spanning the observed January 2015 temperatures
pal <- colorNumeric(
  palette = "YlOrRd",
  domain = shape2@data$t2m.01.2015
)

map <- leaflet(shape2) %>%
  addPolygons(color = "#444444", weight = 1, smoothFactor = 0.5,
              opacity = 1.0, fillOpacity = 0.5,
              fillColor = ~pal(t2m.01.2015),
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = TRUE), 
              label = labels,
              labelOptions = labelOptions(
                style = list("font-weight" = "normal", padding = "3px 8px"),
                textsize = "15px",
                direction = "auto")) %>% 
  addLegend(pal = pal, 
            values = ~t2m.01.2015, opacity = 0.7, 
            title = NULL,
            labFormat = labelFormat(suffix = '°C'),
            position = "bottomright")  %>%