Using Climate Reanalysis Data in R
The use of weather data in economic research has become increasingly common. When studying developing countries, data from ground stations is often unreliable and sparse, so reanalysis data is a useful alternative. In this example, I use local temperature and precipitation data from the ERA-Interim dataset, which is created using global atmospheric reanalysis by the European Centre for Medium-Range Weather Forecasts (ECMWF). Climate data is reported on a 0.125×0.125 degree grid (approximately 15km by 15km at the equator) at six-hour frequency from January 1, 2000 to December 31, 2015. The ERA-Interim reanalysis utilizes high-frequency historical observations from a variety of sources: weather stations, ships, aircraft, weather balloons, radiosondes, and satellites. The reanalysis data is preferable here because rainfall gauge data is sparse and low-quality in Africa.
The first thing I want to do is create a list of all of my netCDF files (one file for each year). I create this list so that I can later loop through it and perform the same operations on every file in the folder. I also read in my shapefile using the readOGR function from the rgdal package.
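To make the rest of the code self-contained, here is a minimal sketch of that setup. The weather-data folder matches the path used in the loop below, but the shapefile's dsn and layer names are assumptions.

library(raster)
library(rgdal)
library(magrittr)
library(leaflet)

# List the netCDF files (one per year) so we can loop over them later.
flist <- list.files("raw_data/weather_data", pattern = "\\.nc$")
# Read the district shapefile; the dsn and layer names are assumptions.
shape <- readOGR(dsn = "raw_data/shapefiles", layer = "districts")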
I want to create observations at the month level, so I write a function that takes in a netCDF data file, subsets the observations by month, and computes the average temperature for the month at each grid point.
month_average <- function(nc_input, var, output_folder) {
  ## This function takes an .nc file of ERA-Interim data, a variable of
  ## interest (temperature), and an output folder path. It averages the
  ## observed temperatures to an average monthly temperature and returns
  ## a list containing a RasterStack of monthly means and the year.
  var.nc <- brick(nc_input, varname = var, layer = "time")
  time <- as.POSIXct(substr(var.nc@data@names, start = 2, stop = 20),
                     format = "%Y.%m.%d.%H.%M.%S")
  months <- unique(format(time, "%m"))
  # Find the most common year in the file, because the file contains some
  # observations from the previous and next years.
  yr <- names(sort(table(format(time, "%Y")), decreasing = TRUE)[1])
  df <- data.frame(INDEX = 1:length(time), time = time)
  # Create an empty stack to hold the monthly averages.
  x <- stack()
  # Loop over months, adding one averaged layer per month to the stack.
  for (MONTH in months) {
    subset <- df[format(df$time, "%m") == MONTH, ]
    sub.var <- var.nc[[subset$INDEX]]
    print(paste("Executing Average for Month:", MONTH, yr))
    av.var <- calc(sub.var, fun = mean)
    names(av.var) <- paste0(var, ".", MONTH, ".", yr)
    x <- stack(x, av.var)
  }
  output <- list(x, yr)
  return(output)
}
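As a quick sanity check, the function can be run on a single file and its two return values unpacked (the file name here is hypothetical):

res <- month_average("raw_data/weather_data/era_interim_2015.nc", "t2m", out_folder)
monthly_stack <- res[[1]] # RasterStack with one averaged layer per month
yr <- res[[2]]            # the dominant year in the file, e.g. "2015"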
Next I want to get rid of the extra grid cells and match the raster to the area covered by my shapefile.
match_brick_to_shape <- function(brick_input) {
  ## This function takes a raster brick and matches the data to the
  ## coordinate system of my shapefile, then crops the data to include
  ## only the areas in the shapefile. Returns the cropped raster brick.
  brick_input %>%
    # Reproject the raster brick to the shapefile's coordinate system.
    projectRaster(crs = proj4string(shape), method = "ngb") %>%
    # Crop to match the extent of my shapefile.
    raster::crop(extent(shape))
}
Now that my functions are ready, I write a loop that goes through each file in my file list and runs both functions on the data. After running the functions, I compute spatial averages to aggregate the data to the district level in my shapefile and convert the temperature data from Kelvin to Celsius.
out <- data.frame()
for (file in flist) {
  y <- month_average(paste0("raw_data/weather_data/", file), "t2m", out_folder)
  y1 <- y[[1]]
  y2 <- match_brick_to_shape(y1)
  # Take means of the cells inside each polygon in the shapefile.
  # Set df = TRUE so that the output is a data frame, and weights = TRUE
  # to weight each cell by the fraction of it the polygon covers.
  z <- raster::extract(y2, shape, df = TRUE, fun = mean, na.rm = TRUE, weights = TRUE)
  z[, -1] <- z[, -1] - 273.15 # Convert Kelvin to Celsius (skip the ID column)
  # Combine data from all years
  if (nrow(out) == 0) {
    out <- z
  } else {
    out <- cbind.data.frame(out, z)
  }
}
Lastly, I create a new shapefile with the monthly temperature data. To do this, I write the output data from the previous loop to the @data attribute of the shapefile.
# Combine shapefile with average temperature data
shape2 <- shape
out <- out[, !duplicated(colnames(out))] # Remove duplicated ID columns
# Loop through each column, add it to the shapefile, then rename the new variable.
for (i in 1:length(out)) {
  shape2@data$out <- out[, i]
  names(shape2@data)[[ncol(shape2@data)]] <- colnames(out)[i]
}
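If you also want the augmented shapefile saved to disk, rgdal's writeOGR can write it out. This is a sketch; the dsn and layer names are assumptions.

# Write the shapefile with temperature data to disk (dsn/layer assumed).
writeOGR(shape2, dsn = "output", layer = "districts_temperature",
         driver = "ESRI Shapefile", overwrite_layer = TRUE)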
Now that the data is prepared, I want to map the average temperature in each district for a given year and month. I do this using the leaflet package to create an interactive map. I'm going to map the average temperature in each district for January 2015, so I use the variable t2m.01.2015. The following leaflet code will create the map at the top of the page, with labels that appear when you hover over a district.
labels <- sprintf(
  "<strong>%s, %s</strong><br/>%g °C",
  shape2@data$NAME_1, shape2@data$NAME_0, round(shape2@data$t2m.01.2015, 2)
) %>% lapply(htmltools::HTML)

pal <- colorNumeric(
  palette = "YlOrRd",
  domain = shape2@data$t2m.01.2015
)
map <- leaflet(shape2) %>%
  addPolygons(color = "#444444", weight = 1, smoothFactor = 0.5,
              opacity = 1.0, fillOpacity = 0.5,
              fillColor = ~pal(t2m.01.2015),
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = TRUE),
              label = labels,
              labelOptions = labelOptions(
                style = list("font-weight" = "normal", padding = "3px 8px"),
                textsize = "15px",
                direction = "auto")) %>%
  addLegend(pal = pal,
            values = ~t2m.01.2015, opacity = 0.7,
            title = NULL,
            labFormat = labelFormat(suffix = " °C"),
            position = "bottomright") %>%
  addProviderTiles(providers$CartoDB.Positron)
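Printing the map object renders the interactive map in the viewer. To embed it in a page, one option is to save it as a standalone HTML widget (the file name here is an assumption):

map # Render the interactive map
htmlwidgets::saveWidget(map, "temperature_map.html", selfcontained = TRUE)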