The use of weather data in economic research has become increasingly common. When studying developing countries, data from ground stations is often unreliable and sparse. A useful source of weather data is reanalysis data. In this example, I use local temperature and precipitation data from the ERA-Interim dataset, which is created using global atmospheric reanalysis by the European Centre for Medium-Range Weather Forecasts (ECMWF). Climate data is reported on a 0.125×0.125 degree grid (approximately 15km by 15km at the equator) at six-hour frequency from January 1, 2000 to December 31, 2015. The ERA-Interim reanalysis utilizes high frequency historical observations from a variety of sources: weather stations, ships, aircraft, weather balloons, radiosondes, and satellites. The reanalysis data is preferable because rainfall gauge data is sparse and low-quality in Africa.
The first thing I want to do is create a list of all of my netCDF files (one file for each year). I create this list so that I can later loop through it and perform the same operations on every file in the folder. I also read in my shapefile using the readOGR function from the rgdal package.
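The code for this step isn't shown above, but a minimal sketch might look like the following (the shapefile path and layer name are assumptions; the rest of the post expects the objects `flist` and `shape` to exist):

```r
library(raster)    # brick(), calc(), stack(), projectRaster(), crop(), extract()
library(ncdf4)     # netCDF support used by brick()
library(rgdal)     # readOGR()
library(magrittr)  # the %>% pipe used throughout
library(leaflet)   # interactive map at the end

# List the yearly netCDF files so we can loop over them later
flist <- list.files("raw_data/weather_data", pattern = "\\.nc$")

# Read in the district-level shapefile (dsn and layer are hypothetical)
shape <- readOGR(dsn = "raw_data/shapefile", layer = "districts")
```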
I want to create observations at the month level, so I write a function that takes in a netCDF data file, subsets it by month, and computes the average temperature at each grid point for each month.
```r
month_average <- function(nc_input, var, output_folder) {
  ## This function takes an .nc file of ERA-Interim data,
  ## a variable of interest (temperature), and an output
  ## folder path, averages the observed temperatures to
  ## an average monthly temperature, and returns a list
  ## containing a RasterStack for the year and the year itself
  var.nc <- brick(nc_input, varname = var, layer = "time")
  time <- as.POSIXct(substr(var.nc@data@names, start = 2, stop = 20),
                     format = "%Y.%m.%d.%H.%M.%S")
  months <- unique(format(time, "%m"))
  # Find the most common year in the file, because the file
  # contains some observations from the previous and next year
  yr <- names(sort(table(format(time, "%Y")), decreasing = TRUE)[1])
  df <- data.frame(INDEX = 1:length(time), time = time)
  # Create an empty stack to hold the monthly averages
  x <- stack()
  # Loop over months to create a raster layer for each month
  for (MONTH in months) {
    subset <- df[format(df$time, "%m") == MONTH, ]
    sub.var <- var.nc[[subset$INDEX]]
    print(paste("Executing Average for Month: ", MONTH, yr))
    av.var <- calc(sub.var, fun = mean)
    names(av.var) <- paste0(var, ".", MONTH, ".", yr)
    x <- stack(x, av.var)
  }
  output <- list(x, yr)
  return(output)
}
```
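Before moving on, the function can be sanity-checked on a single year's file (the file name below is hypothetical; note that `output_folder` is accepted but not actually used inside the function):

```r
# Hypothetical single-file run: returns list(RasterStack of monthly means, year)
y <- month_average("raw_data/weather_data/era_interim_2015.nc", "t2m", out_folder)
y[[1]]  # RasterStack with one layer per month, e.g. t2m.01.2015
y[[2]]  # "2015"
```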
Next I want to get rid of the extra grid cells and match the data to the area covered by my shapefile.

```r
match_brick_to_shape <- function(brick_input) {
  ## This function takes a raster brick, matches the data
  ## to the coordinate system of my shapefile, then crops
  ## the data to only include the areas in the shapefile.
  ## Returns the cropped raster brick.
  brick_input %>%
    # Reproject the raster brick to the shapefile's coordinate system
    projectRaster(., crs = proj4string(shape), method = "ngb") %>%
    # Crop to match the extent of my shapefile
    raster::crop(., extent(shape)) %>%
    return(.)
}
```

Now that my functions are ready, I write a loop that goes through each file in my file list and runs both functions on my data. After running the functions, I compute spatial averages to aggregate the data to the district level in my shapefile and convert the temperature data from Kelvin to Celsius.
```r
out <- data.frame()
for (file in flist) {
  y <- month_average(paste0("raw_data/weather_data/", file), "t2m", out_folder)
  y1 <- y[[1]]
  y2 <- match_brick_to_shape(y1)
  # Take means according to the shape.
  # Make sure df = TRUE, so that the output is a dataframe.
  # weights = TRUE weights all cells inside a polygon by the % covered
  z <- raster::extract(y2, shape, df = TRUE, fun = mean,
                       na.rm = TRUE, weights = TRUE)
  # Convert Kelvin to Celsius (skip the first column, which is the polygon ID)
  z[, -1] <- z[, -1] - 273.15
  # Combine data from all years
  if (nrow(out) == 0) {
    out <- z
  } else {
    out <- cbind.data.frame(out, z)
  }
}
```

Lastly, I create a new shapefile with the monthly temperature data. To do this, I write the output data from the previous loop to the @data attribute of the shapefile.
```r
# Combine shapefile with average temperature data
shape2 <- shape
out <- out[, !duplicated(colnames(out))]  # Remove duplicated ID columns
# Loop through each column, add it to the shapefile, then rename the new variable
for (i in 1:length(out)) {
  shape2@data$out <- out[, i]
  names(shape2@data)[[ncol(shape2@data)]] <- colnames(out)[i]
}
```

Now that the data is prepared, I want to map the average temperature in each district for a given year and month. I do this using the leaflet package to create an interactive map. I'm going to map the average temperature in each district for January 2015, so I use the variable t2m.01.2015 (following the naming convention set in month_average). The following leaflet code will create the map at the top of the page, with popup labels when you hover over a district.
```r
labels <- sprintf(
  "<strong>%s, %s</strong><br/>%g °C",
  shape2@data$NAME_1, shape2@data$NAME_0,
  round(shape2@data$t2m.01.2015, 2)) %>%
  lapply(htmltools::HTML)

pal <- colorNumeric(
  palette = "YlOrRd",
  domain = shape2@data$t2m.01.2015)

map <- leaflet(shape2) %>%
  addPolygons(color = "#444444", weight = 1, smoothFactor = 0.5,
              opacity = 1.0, fillOpacity = 0.5,
              fillColor = ~pal(t2m.01.2015),
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = TRUE),
              label = labels,
              labelOptions = labelOptions(
                style = list("font-weight" = "normal", padding = "3px 8px"),
                textsize = "15px",
                direction = "auto")) %>%
  addLegend(pal = pal, values = ~shape2@data$t2m.01.2015, opacity = 0.7,
            title = NULL, labFormat = labelFormat(suffix = '°C'),
            position = "bottomright") %>%
  addProviderTiles(providers$CartoDB.Positron)
```
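Printing the map object renders the interactive map; it can also be saved as a standalone HTML page with the htmlwidgets package (the file name here is just an example):

```r
map  # Print the widget to render the map

# Optionally save it as a self-contained HTML file
htmlwidgets::saveWidget(map, "temperature_map.html")
```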