Using Climate Reanalysis Data in R
The use of weather data in economic research has become increasingly common. When studying developing countries, data from ground stations is often unreliable and sparse, so reanalysis data is a useful alternative. In this example, I use local temperature and precipitation data from the ERA-Interim dataset, which is created using global atmospheric reanalysis by the European Centre for Medium-Range Weather Forecasts (ECMWF). Climate data is reported on a 0.125×0.125 degree grid (approximately 15km by 15km at the equator) at six-hour frequency from January 1, 2000 to December 31, 2015. The ERA-Interim reanalysis utilizes high-frequency historical observations from a variety of sources: weather stations, ships, aircraft, weather balloons, radiosondes, and satellites. The reanalysis data is preferable here because rainfall gauge data is sparse and low-quality in Africa.
The first thing I want to do is create a list of all of my netCDF files (one file for each year). I create this list so that I can later loop through it and perform the same operations on every file in the folder. I also read in my shapefile using the readOGR function from the rgdal package.
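To make the rest of the code self-contained, here is a minimal sketch of that setup. The weather-data folder matches the path used in the loop below, but the shapefile's dsn and layer names are assumptions.

library(raster)
library(rgdal)
library(magrittr)
library(leaflet)

# List the netCDF files (one per year) so we can loop over them later.
flist <- list.files("raw_data/weather_data", pattern = "\\.nc$")
# Read the district shapefile; the dsn and layer names are assumptions.
shape <- readOGR(dsn = "raw_data/shapefiles", layer = "districts")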
I want to create observations at the month level, so I write a function that takes in a netCDF data file, subsets the observations by month, and computes the average temperature for the month at each grid point.
month_average <- function(nc_input, var, output_folder) {
  ## This function takes an .nc file of ERA-Interim data, a variable of
  ## interest (temperature), and an output folder path. It averages the
  ## observed temperatures to an average monthly temperature and returns
  ## a list containing a RasterStack of monthly means and the year.
  var.nc <- brick(nc_input, varname = var, layer = "time")
  time <- as.POSIXct(substr(var.nc@data@names, start = 2, stop = 20),
                     format = "%Y.%m.%d.%H.%M.%S")
  months <- unique(format(time, "%m"))
  # Find the most common year in the file, because the file contains some
  # observations from the previous and next years.
  yr <- names(sort(table(format(time, "%Y")), decreasing = TRUE)[1])
  df <- data.frame(INDEX = 1:length(time), time = time)
  # Create an empty stack to hold the monthly averages.
  x <- stack()
  # Loop over months, adding one averaged layer per month to the stack.
  for (MONTH in months) {
    subset <- df[format(df$time, "%m") == MONTH, ]
    sub.var <- var.nc[[subset$INDEX]]
    print(paste("Executing Average for Month:", MONTH, yr))
    av.var <- calc(sub.var, fun = mean)
    names(av.var) <- paste0(var, ".", MONTH, ".", yr)
    x <- stack(x, av.var)
  }
  output <- list(x, yr)
  return(output)
}
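As a quick sanity check, the function can be run on a single file and its two return values unpacked (the file name here is hypothetical):

res <- month_average("raw_data/weather_data/era_interim_2015.nc", "t2m", out_folder)
monthly_stack <- res[[1]] # RasterStack with one averaged layer per month
yr <- res[[2]]            # the dominant year in the file, e.g. "2015"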
Next I want to get rid of the extra grid cells and match the raster to the area covered by my shapefile.
match_brick_to_shape <- function(brick_input) {
  ## This function takes a raster brick and matches the data to the
  ## coordinate system of my shapefile, then crops the data to include
  ## only the areas in the shapefile. Returns the cropped raster brick.
  brick_input %>%
    # Reproject the raster brick to the shapefile's coordinate system.
    projectRaster(crs = proj4string(shape), method = "ngb") %>%
    # Crop to match the extent of my shapefile.
    raster::crop(extent(shape))
}
Now that my functions are ready, I write a loop that goes through each file in my file list and runs both functions on the data. After running the functions, I compute spatial averages to aggregate the data to the district level in my shapefile and convert the temperature data from Kelvin to Celsius.
out <- data.frame()
for (file in flist) {
  y <- month_average(paste0("raw_data/weather_data/", file), "t2m", out_folder)
  y1 <- y[[1]]
  y2 <- match_brick_to_shape(y1)
  # Take means of the cells inside each polygon in the shapefile.
  # Set df = TRUE so that the output is a data frame, and weights = TRUE
  # to weight each cell by the fraction of it the polygon covers.
  z <- raster::extract(y2, shape, df = TRUE, fun = mean, na.rm = TRUE, weights = TRUE)
  z[, -1] <- z[, -1] - 273.15 # Convert Kelvin to Celsius (skip the ID column)
  # Combine data from all years
  if (nrow(out) == 0) {
    out <- z
  } else {
    out <- cbind.data.frame(out, z)
  }
}
Lastly, I create a new shapefile with the monthly temperature data. To do this, I write the output data from the previous loop to the @data attribute of the shapefile.
# Combine shapefile with average temperature data
shape2 <- shape
out <- out[, !duplicated(colnames(out))] # Remove duplicated ID columns
# Loop through each column, add it to the shapefile, then rename the new variable.
for (i in 1:length(out)) {
  shape2@data$out <- out[, i]
  names(shape2@data)[[ncol(shape2@data)]] <- colnames(out)[i]
}
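If you also want the augmented shapefile saved to disk, rgdal's writeOGR can write it out. This is a sketch; the dsn and layer names are assumptions.

# Write the shapefile with temperature data to disk (dsn/layer assumed).
writeOGR(shape2, dsn = "output", layer = "districts_temperature",
         driver = "ESRI Shapefile", overwrite_layer = TRUE)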
Now that the data is prepared, I want to map the average temperature in each district for a given year and month. I do this using the leaflet package to create an interactive map. I'm going to map the average temperature in each district for January 2015, so I use the variable t2m.01.2015. The following leaflet code will create the map at the top of the page, with labels that appear when you hover over a district.
labels <- sprintf(
  "<strong>%s, %s</strong><br/>%g °C",
  shape2@data$NAME_1, shape2@data$NAME_0, round(shape2@data$t2m.01.2015, 2)
) %>% lapply(htmltools::HTML)

pal <- colorNumeric(
  palette = "YlOrRd",
  domain = shape2@data$t2m.01.2015
)
map <- leaflet(shape2) %>%
  addPolygons(color = "#444444", weight = 1, smoothFactor = 0.5,
              opacity = 1.0, fillOpacity = 0.5,
              fillColor = ~pal(t2m.01.2015),
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = TRUE),
              label = labels,
              labelOptions = labelOptions(
                style = list("font-weight" = "normal", padding = "3px 8px"),
                textsize = "15px",
                direction = "auto")) %>%
  addLegend(pal = pal,
            values = ~t2m.01.2015, opacity = 0.7,
            title = NULL,
            labFormat = labelFormat(suffix = " °C"),
            position = "bottomright") %>%
  addProviderTiles(providers$CartoDB.Positron)
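Printing the map object renders the interactive map in the viewer. To embed it in a page, one option is to save it as a standalone HTML widget (the file name here is an assumption):

map # Render the interactive map
htmlwidgets::saveWidget(map, "temperature_map.html", selfcontained = TRUE)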