Prototyping a large-scale analysis of "weather events" in Canada

In past blog posts I have looked at weather “events” of or “blocks” of similar “extreme” weather (specifically cold snaps and heat waves), and how these weather events are described by the SSC2 (Sheridan 2002) weather types. My posts have focused on Toronto, but my research interest is broader. So I have developed some code to help automate some of the analysis that I would perform at the station level. I originally developed some of this code as a Shiny app, but that app needs an update, so this post is as much about debugging the code as it is about writing it!

First, we load the libraries that we will need for this analysis.

library(canadaHCDx)
library(claut)
library(dplyr)
library(lubridate)
library(plotly)
library(purrr)
library(readr)
library(stringr)
library(tibble)
library(tidyr)
library(XML)

Getting information about the available stations

Let’s first grab the list of SSC2 stations. We want to look at both local weather conditions and the weather type, so I will use the SSC2 stations as my “data points”, and find ECCC data from the same, or nearby stations.

First I will download the webpage that shows the available stations, pull the table from the HTML using the XML package, and then filter so that my list only contains the stations in Canada.

f <- tempfile(fileext = ".html")
url <- "http://sheridan.geog.kent.edu/ssc/stations.html"
download.file(url, f, quiet = TRUE)

readHTMLTable(f,stringsAsFactors = FALSE)[[1]] %>% 
  as_tibble() %>% 
  filter(grepl("Canada", Station)) %>% 
  mutate(Station = sub("\\s*, Canada", "", Station)) -> stns

stns
## # A tibble: 110 x 3
##    Station                             ID    Years                        
##    <chr>                               <chr> <chr>                        
##  1 Abbotsford, British Columbia        YXX   1977-2019                    
##  2 Alert, Nunavut                      YLT   1950-1963 1973-2018          
##  3 Bagotville, Quebec                  YBG   1951-2019                    
##  4 Baker Lake, Nunavut                 YBK   1946-1947 1955-2019          
##  5 Big Trout Lake, Ontario             YTL   1953-1991 1994-2019          
##  6 Brandon, Manitoba                   YBR   1977-2019                    
##  7 Calgary, Alberta                    YYC   1953-2019                    
##  8 Cambridge Bay, Nunavut              YCB   1950-1959 1973-2004 2006-2019
##  9 Cartwright, Newfoundland            YRF   1955-1996 1998-2019          
## 10 Charlottetown, Prince Edward Island YYG   1953-2019                    
## # … with 100 more rows

We want to know the years of available data, but the formatted list of ranges is not very easy to automate. Let’s get the minimum and maximum years as “start” and “end”.

str_extract_all(stns$Years, "[0-9]{4}") %>% 
  map_df(~tibble(Start = min(as.integer(.)),
                 End = max(as.integer(.)))) %>% 
  bind_cols(stns, .) -> stns

stns
## # A tibble: 110 x 5
##    Station                          ID    Years                      Start   End
##    <chr>                            <chr> <chr>                      <int> <int>
##  1 Abbotsford, British Columbia     YXX   1977-2019                   1977  2019
##  2 Alert, Nunavut                   YLT   1950-1963 1973-2018         1950  2018
##  3 Bagotville, Quebec               YBG   1951-2019                   1951  2019
##  4 Baker Lake, Nunavut              YBK   1946-1947 1955-2019         1946  2019
##  5 Big Trout Lake, Ontario          YTL   1953-1991 1994-2019         1953  2019
##  6 Brandon, Manitoba                YBR   1977-2019                   1977  2019
##  7 Calgary, Alberta                 YYC   1953-2019                   1953  2019
##  8 Cambridge Bay, Nunavut           YCB   1950-1959 1973-2004 2006-…  1950  2019
##  9 Cartwright, Newfoundland         YRF   1955-1996 1998-2019         1955  2019
## 10 Charlottetown, Prince Edward Is… YYG   1953-2019                   1953  2019
## # … with 100 more rows

The “Years” column is now almost redundant, but it still contains an extra point of interest that we have not captured in “Start” and “End”—the periods of missing data are apparent as the gaps between the last year of each range and the first year of the next range. Let’s use this logic to pull this information out, explicitly.

str_extract_all(stns$Years, "-[0-9]{4} [0-9]{4}-") %>% 
  map_df(~list(Gaps = paste(as.integer(str_sub(., 2, 5)) + 1L,
                            as.integer(str_sub(., 6, 10)) -1L,
                            sep = "-", collapse = ", "))) %>% 
  mutate(Gaps = str_replace_all(Gaps, "([0-9]{4})-\\1", "\\1")) %>% 
  bind_cols(select(stns, -Years), .) -> stns

stns
## # A tibble: 110 x 5
##    Station                             ID    Start   End Gaps       
##    <chr>                               <chr> <int> <int> <chr>      
##  1 Abbotsford, British Columbia        YXX    1977  2019 ""         
##  2 Alert, Nunavut                      YLT    1950  2018 "1964-1972"
##  3 Bagotville, Quebec                  YBG    1951  2019 ""         
##  4 Baker Lake, Nunavut                 YBK    1946  2019 "1948-1954"
##  5 Big Trout Lake, Ontario             YTL    1953  2019 "1992-1993"
##  6 Brandon, Manitoba                   YBR    1977  2019 ""         
##  7 Calgary, Alberta                    YYC    1953  2019 ""         
##  8 Cambridge Bay, Nunavut              YCB    1950  2019 "1960-1972"
##  9 Cartwright, Newfoundland            YRF    1955  2019 "1997"     
## 10 Charlottetown, Prince Edward Island YYG    1953  2019 ""         
## # … with 100 more rows

Now we have information on the temporal availability of data, but we are missing the spatial information. Let’s pull the information out of the Google Map at http://sheridan.geog.kent.edu/ssc.html

f <- tempfile(fileext = ".kml")
download.file("https://www.google.com/maps/d/kml?mid=1wHXUzwNjNvmnz2JW9XFr8AhqntWEaeFQ&lid=PlhtYW-UWmA&forcekml=1", f, quiet = TRUE)
coords <- rgdal::readOGR(f)
## OGR data source with driver: LIBKML 
## Source: "/tmp/RtmpIq1VJS/file113d783ee73d.kml", layer: "SSC stations 2018.csv"
## with 808 features
## It has 13 fields

Let’s see what we’re working with:

coords %>% head()
##          coordinates  Name              description timestamp begin  end
## 1   (3.22, 36.69, 0) 36.69   AAE: ALG<br>7.81: 3.22      <NA>  <NA> <NA>
## 2   (6.62, 36.28, 0) 36.28   AAE: CZL<br>7.81: 6.62      <NA>  <NA> <NA>
## 3  (-0.62, 35.62, 0) 35.62  AAE: ORN<br>7.81: -0.62      <NA>  <NA> <NA>
## 4 (-61.79, 17.14, 0) 17.14 AAE: ANU<br>7.81: -61.79      <NA>  <NA> <NA>
## 5  (-70.02, 12.5, 0) 12.50 AAE: AUA<br>7.81: -70.02      <NA>  <NA> <NA>
## 6  (14.19, 48.23, 0) 48.23  AAE: LNZ<br>7.81: 14.19      <NA>  <NA> <NA>
##   altitudeMode tessellate extrude visibility drawOrder icon AAE  X7_81
## 1         <NA>         -1       0         -1        NA <NA> ALG   3.22
## 2         <NA>         -1       0         -1        NA <NA> CZL   6.62
## 3         <NA>         -1       0         -1        NA <NA> ORN  -0.62
## 4         <NA>         -1       0         -1        NA <NA> ANU -61.79
## 5         <NA>         -1       0         -1        NA <NA> AUA -70.02
## 6         <NA>         -1       0         -1        NA <NA> LNZ  14.19

Extract the information that we are interested in.

coords <- bind_cols(
  attr(coords, "data") %>% mutate(ID = str_sub(description, 6, 8)) %>% select(ID),
  attr(coords, "coords") %>% as_tibble %>% select(Latitude = coords.x2, Longitude = coords.x1)
)

coords %>% head()
##    ID Latitude Longitude
## 1 ALG    36.69      3.22
## 2 CZL    36.28      6.62
## 3 ORN    35.62     -0.62
## 4 ANU    17.14    -61.79
## 5 AUA    12.50    -70.02
## 6 LNZ    48.23     14.19

Join to our stns table.

left_join(stns, coords, by = "ID") %>% 
  select(Station, ID, Latitude, Longitude, Start:Gaps) -> stns

stns
## # A tibble: 110 x 7
##    Station                        ID    Latitude Longitude Start   End Gaps     
##    <chr>                          <chr>    <dbl>     <dbl> <int> <int> <chr>    
##  1 Abbotsford, British Columbia   YXX       49.0    -122.   1977  2019 ""       
##  2 Alert, Nunavut                 YLT       82.5     -62.3  1950  2018 "1964-19…
##  3 Bagotville, Quebec             YBG       48.3     -71    1951  2019 ""       
##  4 Baker Lake, Nunavut            YBK       64.3     -96.1  1946  2019 "1948-19…
##  5 Big Trout Lake, Ontario        YTL       53.8     -89.9  1953  2019 "1992-19…
##  6 Brandon, Manitoba              YBR       49.9    -100.   1977  2019 ""       
##  7 Calgary, Alberta               YYC       51.1    -114.   1953  2019 ""       
##  8 Cambridge Bay, Nunavut         YCB       69.1    -105.   1950  2019 "1960-19…
##  9 Cartwright, Newfoundland       YRF       53.7     -57.0  1955  2019 "1997"   
## 10 Charlottetown, Prince Edward … YYG       46.3     -63.1  1953  2019 ""       
## # … with 100 more rows

Getting station data

Let’s look at how we would analyze data for a given station. True to form, I’ll stick to Toronto for the first portion of this blog post. This is the station called “Toronto, Ontario” in our table above.

First, we want to get the relevant data from our stns object for the Station.

get_stn_info <- function(station_in, stns) {
  filter(stns, Station == station_in)
}

stn_info <- get_stn_info("Toronto, Ontario", stns)

stn_info
## # A tibble: 1 x 7
##   Station          ID    Latitude Longitude Start   End Gaps     
##   <chr>            <chr>    <dbl>     <dbl> <int> <int> <chr>    
## 1 Toronto, Ontario YYZ       43.7     -79.6  1950  2019 1967-1970

Now we need to find the ECCC stations that are co-located (or nearby) to 43.68°N, 79.63°W.

get_eccc_data <- function(stn_info) {
  # Look for nearby stations using canadaHCDx
  eccc_stns <- filter(find_station(),
                      LatitudeDD == stn_info$Latitude & LongitudeDD == stn_info$Longitude)
  # If no station is co-located, find another within (an arbitrary) 2 km
  if (nrow(eccc_stns) == 0) {
      eccc_stns <- find_station(target = c(stn_info$latitude, stn_info$longitude), dist = 0:2)
  }
  
  # Return data using the CanadaHCD package.
  # Cache requires fork - remotes::install_github("ConorIA/canadaHCD@cache_external")
  hcd_hourly(eccc_stns$StationID,
             year = stn_info$Start:stn_info$End,
             month = c(1, 2, 6, 7, 8, 12), cache = TRUE, progress = FALSE)
}

The function above will return ECCC data for one or more nearby stations (co-located or within 2 km).

dat <- get_eccc_data(stn_info)

dat
## # A tibble: 612,336 x 15
##    Station StationID Longitude Latitude DateTime             Temp DewPointTemp
##    <chr>   <chr>         <dbl>    <dbl> <dttm>              <dbl>        <dbl>
##  1 TORONT… 6158731       -79.6     43.7 1950-01-01 00:00:00    NA           NA
##  2 TORONT… 6158731       -79.6     43.7 1950-01-01 01:00:00    NA           NA
##  3 TORONT… 6158731       -79.6     43.7 1950-01-01 02:00:00    NA           NA
##  4 TORONT… 6158731       -79.6     43.7 1950-01-01 03:00:00    NA           NA
##  5 TORONT… 6158731       -79.6     43.7 1950-01-01 04:00:00    NA           NA
##  6 TORONT… 6158731       -79.6     43.7 1950-01-01 05:00:00    NA           NA
##  7 TORONT… 6158731       -79.6     43.7 1950-01-01 06:00:00    NA           NA
##  8 TORONT… 6158731       -79.6     43.7 1950-01-01 07:00:00    NA           NA
##  9 TORONT… 6158731       -79.6     43.7 1950-01-01 08:00:00    NA           NA
## 10 TORONT… 6158731       -79.6     43.7 1950-01-01 09:00:00    NA           NA
## # … with 612,326 more rows, and 8 more variables: RelHumidity <int>,
## #   WindDir <int>, WindSpeed <int>, Visibility <dbl>, Pressure <dbl>,
## #   Humidex <int>, WindChill <int>, Weather <chr>

Now we need the corresponding weather types data, binned as I have done in previously: “warm” (DT, MT, MT+, MT++), “moderate” (DM, MM), and “cool” (DP, MP), plus “transitional”.

get_ssc_data <- function(stn_info) {
  f <- tempfile(fileext = ".dbdmt")
  url <- sprintf("http://sheridan.geog.kent.edu/ssc/files/%s.dbdmt", stn_info$ID)
  download.file(url, f, quiet = TRUE)
  air <- read_table(f, col_names = c("Station", "Date", "Air"),
                    col_types = cols(col_character(),
                                     col_date(format = "%Y%m%d"),
                                     col_factor(levels = c(1, 2, 3, 4, 5, 6, 7, 66, 67))))
  levels(air$Air) <- list(warm = c(3, 6, 66, 67), mod = c(1, 4), cool = c(2, 5), trans = 7)
  air %>% mutate(Air = forcats::fct_explicit_na(air$Air, na_level = "(none)"))
}
air <- get_ssc_data(stn_info)

air
## # A tibble: 25,202 x 3
##    Station Date       Air  
##    <chr>   <date>     <fct>
##  1 YYZ     1950-01-01 mod  
##  2 YYZ     1950-01-02 mod  
##  3 YYZ     1950-01-03 warm 
##  4 YYZ     1950-01-04 trans
##  5 YYZ     1950-01-05 cool 
##  6 YYZ     1950-01-06 cool 
##  7 YYZ     1950-01-07 trans
##  8 YYZ     1950-01-08 trans
##  9 YYZ     1950-01-09 cool 
## 10 YYZ     1950-01-10 trans
## # … with 25,192 more rows

Making daily data

As explained in a previous post, the use of 6AM UTC for the “cutoff” point for the dermatological day is not ideal. I will once again use the sun-based COWND proposed by Žaknić-Ćatović and Gough (2018).

I need a few helper functions to get me from A to B in this case.

A function to find the timezone of each station

get_tz <- function(lat, lon) {
  tz <- googleway::google_timezone(c(lat, lon), key = Sys.getenv("GOOGLE_API_KEY"))
  tz$rawOffset/3600
}

And functions to calculate windchill (wcl) and humidex (hdx).

wcl <- function(tair, wspeed) {
  wcl = 13.12 + 0.6215 * tair - 11.37 * wspeed^0.16 + 0.3965 * tair * wspeed^0.16
  wcl[tair >= 10 | wspeed < 4.8] <- NA
  round(wcl)
}

hdx <- function(tair, tdew) {
  tdew = tdew + 273.15
  e = 6.11 * exp(1)^(5417.7530  * ((1 / 273.16) - (1 / tdew)))
  h = 0.5555 * (e - 10)
  round(tair + h)
}

Now let’s create the function to create our data set(s).

I am going to include an additional note

make_eccc_daily <- function(stn_info, dat) {
  
  tz <- get_tz(stn_info$Latitude, stn_info$Longitude)
  
  filter(dat, month(DateTime) %in% c(1, 2, 12)) %>% 
    mutate(WindChill = wcl(Temp, WindSpeed)) %>% 
    group_by_at(vars(Station:Latitude)) %>%
    nest() %>%
    mutate(daily = map(data, ~solar_daily(select(., DateTime, Temp, WindChill),
                                          Latitude, Longitude, tz, na.rm = TRUE))) %>% 
    select(-data) %>%
    unnest(daily) %>% 
    ungroup() -> win_dly
  
  filter(dat, month(DateTime) %in% 6:8) %>% 
    mutate(Humidex = hdx(Temp, DewPointTemp)) %>% 
    group_by_at(vars(Station:Latitude)) %>%
    nest() %>%
    mutate(daily = map(data, ~solar_daily(select(., DateTime, Temp, Humidex),
                                          Latitude, Longitude, tz, na.rm = TRUE))) %>% 
    select(-data) %>%
    unnest(daily) %>% 
    ungroup() -> sum_dly
  
    list(win_dly = win_dly, sum_dly = sum_dly)
}
daily_dat <- make_eccc_daily(stn_info, dat)

Identifying “events”

Now we can identify our events. Here we will look at heatwaves and cold snaps. I will be working with the following definitions:

  • A cold snap is any two or more days with minimum temperatures below -15°C or windchill below -20°C.
  • A heat wave is any two or more days with maximum temperatures above 30°C and minimum temperatures above 20°C, or days when the humidex exceeds 40.
win_expression <- "MinTemp <= -15 | MinWindChill <= -20"
win_event_length <- 2

sum_expression <- "(MaxTemp >= 31 & MinTemp >= 20) | MaxHumidex >= 40"
sum_event_length <- 2
make_eccc_warns <- function(stn_info, daily_dat,
                            win_expression, win_event_length,
                            sum_expression, sum_event_length) {
  
  sum_warns <- daily_dat$sum_dly %>%
    mutate(Criteria = eval(parse(text = sum_expression))) %>% 
    group_by(Station, Year = year(Date)) %>%
    mutate(Start = as.Date(describe_snaps(rle(Criteria), sum_event_length,
                                          Date, min), origin = origin),
           End = as.Date(describe_snaps(rle(Criteria), sum_event_length, 
                                        Date, max), origin = origin),
           Length = describe_snaps(rle(Criteria), sum_event_length)) %>%
    filter(!is.na(Length)) %>%
    group_by(Station, Year, Start, End, Length) %>%
    summarize_all(mean) %>%
    select(-Criteria, -Date)
  
  win_warns <- daily_dat$win_dly %>%
    mutate(Criteria = eval(parse(text = win_expression))) %>% 
    mutate(Year = case_when(month(Date) == 12 ~ year(Date) + 1,
                            TRUE ~ year(Date))) %>%
    filter(Year != stn_info$Start) %>%
    group_by(Station, Year) %>%
    mutate(Start = as.Date(describe_snaps(rle(Criteria), win_event_length,
                                          Date, min), origin = origin),
           End = as.Date(describe_snaps(rle(Criteria), win_event_length,
                                        Date, max), origin = origin),
           Length = describe_snaps(rle(Criteria), win_event_length)) %>%
    filter(!is.na(Length)) %>%
    group_by(Station, Year, Start, End, Length) %>%
    summarize_all(mean) %>%
    select(-Criteria, -Date)
  
  list(win_warns = win_warns, sum_warns = sum_warns)
}
warns <- make_eccc_warns(stn_info, daily_dat,
                         win_expression, win_event_length,
                         sum_expression, sum_event_length)

Creating seasonal summaries

Let’s look at some seasonal summaries as well. Summarize the number of days with a particular air mass, by season.

make_air_sets <- function(stn_info, air, daily_dat) {
  
  sum_air <- left_join(daily_dat$sum_dly, select(air, -Station), by = "Date") %>%
    group_by(Station, Year = year(Date), Air) %>%
    count() %>%
    ungroup() %>%
    pivot_wider(names_from = "Air", values_from = "n", values_fill = list(n = 0L))
  
  win_air <- left_join(daily_dat$win_dly, select(air, -Station), by = "Date") %>%
    mutate(Year = case_when(month(Date) == 12 ~ year(Date) + 1,
                           TRUE ~ year(Date))) %>%
    filter(Year != stn_info$Start) %>%
    group_by(Station, Year, Air) %>%
    count() %>%
    ungroup() %>%
    pivot_wider(names_from = "Air", values_from = "n", values_fill = list(n = 0L))
  
  list(win_air = win_air, sum_air = sum_air)
}
airs <- make_air_sets(stn_info, air, daily_dat)

airs$sum_air
## # A tibble: 140 x 8
##    Station         Year  warm   mod  cool trans `(none)`  `NA`
##    <chr>          <dbl> <int> <int> <int> <int>    <int> <int>
##  1 TORONTO INTL A  1950     8    48    21    15        0     0
##  2 TORONTO INTL A  1951    16    48    15    13        0     0
##  3 TORONTO INTL A  1952    26    45     7    14        0     0
##  4 TORONTO INTL A  1953    22    44    17     8        1     0
##  5 TORONTO INTL A  1954    17    50    15    10        0     0
##  6 TORONTO INTL A  1955    43    34     9     6        0     0
##  7 TORONTO INTL A  1956    11    51    21     9        0     0
##  8 TORONTO INTL A  1957    18    43    18    13        0     0
##  9 TORONTO INTL A  1958    12    45    14    21        0     0
## 10 TORONTO INTL A  1959    35    37     8    12        0     0
## # … with 130 more rows

Summarize the characteristics of “events” during each season.

make_eccc_warns_seasonal <- function(warns) {

  summ_sum <- left_join(warns$sum_warns %>% group_by(Station, Year) %>% count(),
                        warns$sum_warns %>% group_by(Station, Year) %>%
                            summarize(AvgLen = mean(Length),
                                      AvgTmax = weighted.mean(MaxTemp, Length),
                                      AvgTmax = weighted.mean(MinTemp, Length),
                                      AvgTmean = weighted.mean(MeanTemp, Length),
                                      AvgMaxHumidex = weighted.mean(MaxHumidex, Length)),
                        by = c("Station", "Year"))
  
  summ_win <- left_join(warns$win_warns %>% group_by(Station, Year) %>% count(),
                        warns$win_warns %>% group_by(Station, Year) %>%
                          summarize(AvgLen = mean(Length),
                                    AvgTmax = weighted.mean(MaxTemp, Length),
                                    AvgTmax = weighted.mean(MinTemp, Length),
                                    AvgTmean = weighted.mean(MeanTemp, Length),
                                    AvgMinWindChill = weighted.mean(MinWindChill, Length)),
                        by = c("Station", "Year"))
  
  list(summ_win = summ_win, summ_sum = summ_sum)
}
warns_summary <- make_eccc_warns_seasonal(warns)

warns_summary$summ_win
## # A tibble: 65 x 7
## # Groups:   Station, Year [65]
##    Station                    Year     n AvgLen AvgTmax AvgTmean AvgMinWindChill
##    <chr>                     <dbl> <int>  <dbl>   <dbl>    <dbl>           <dbl>
##  1 TORONTO INTL A             2014     9   3.44   -15.7   -12.8            -25.1
##  2 TORONTO INTL A             2015     6   5.33   -15.6   -12.9            -24.9
##  3 TORONTO INTL A             2016     3   2.67   -15.2   -10.5            -25.1
##  4 TORONTO INTL A             2017     2   3      -12.1    -9.98           -20.7
##  5 TORONTO INTL A             2018     4   4.25   -16.5   -12.8            -25.7
##  6 TORONTO INTL A             2019     2   5      -17.1   -13.7            -27.2
##  7 TORONTO LESTER B. PEARSO…  1954     7   2.86   -15.9   -12.3            -24.2
##  8 TORONTO LESTER B. PEARSO…  1955     5   3      -15.5   -12.4            -23.9
##  9 TORONTO LESTER B. PEARSO…  1956     5   2.6    -14.7   -11.2            -22.7
## 10 TORONTO LESTER B. PEARSO…  1957     3   3.67   -16.7   -13.1            -25.2
## # … with 55 more rows

And finally a seasonal weather data set.

make_eccc_seasonal <- function(stn_info, daily_dat,
                               sum_expression, win_expression) {

  win_seas <- daily_dat$win_dly %>%
    mutate(Year = case_when(month(Date) == 12 ~ year(Date) + 1,
                            TRUE ~ year(Date)),
           Criteria = eval(parse(text = win_expression))) %>%
    filter(Year != stn_info$Start) %>%
    group_by(Station, Year) %>%
    summarize(MaxTemp = mean(MaxTemp, na.rm = TRUE),
              MinTemp = mean(MinTemp, na.rm = TRUE),
              MeanTemp = mean(MeanTemp, na.rm = TRUE),
              MinWindChill = mean(MinWindChill, na.rm = TRUE),
              ColdDays = sum(Criteria, na.rm = TRUE)) %>%
    mutate_if(is.numeric, ~ifelse(abs(.) == Inf,NA,.)) %>% 
    mutate(ColdDays = ifelse(is.na(MaxTemp) & is.na(MinTemp) & is.na(MeanTemp),
                             NA, ColdDays)) #FIXME: Need a more elegant solution to this
  
  sum_seas <- daily_dat$sum_dly %>%
    mutate(Criteria = eval(parse(text = sum_expression))) %>% 
    group_by(Station, Year = year(Date)) %>%
    summarize(MaxTemp = mean(MaxTemp, na.rm = TRUE),
              MinTemp = mean(MinTemp, na.rm = TRUE),
              MeanTemp = mean(MeanTemp, na.rm = TRUE),
              MaxHumidex = mean(MaxHumidex, na.rm = TRUE),
              HeatDays = sum(Criteria, na.rm = TRUE)) %>% 
    mutate_if(is.numeric, ~ifelse(abs(.) == Inf,NA,.)) %>% 
    mutate(HeatDays = ifelse(is.na(MaxTemp) & is.na(MinTemp) & is.na(MeanTemp),
                             NA, HeatDays)) #FIXME: Need a more elegant solution to this
  
  list(win_seas = win_seas, sum_seas = sum_seas)
}
seas_dat <- make_eccc_seasonal(stn_info, daily_dat,
                               sum_expression, win_expression)

seas_dat$sum_seas
## # A tibble: 140 x 7
## # Groups:   Station [2]
##    Station         Year MaxTemp MinTemp MeanTemp MaxHumidex HeatDays
##    <chr>          <dbl>   <dbl>   <dbl>    <dbl>      <dbl>    <int>
##  1 TORONTO INTL A  1950      NA      NA       NA         NA       NA
##  2 TORONTO INTL A  1951      NA      NA       NA         NA       NA
##  3 TORONTO INTL A  1952      NA      NA       NA         NA       NA
##  4 TORONTO INTL A  1953      NA      NA       NA         NA       NA
##  5 TORONTO INTL A  1954      NA      NA       NA         NA       NA
##  6 TORONTO INTL A  1955      NA      NA       NA         NA       NA
##  7 TORONTO INTL A  1956      NA      NA       NA         NA       NA
##  8 TORONTO INTL A  1957      NA      NA       NA         NA       NA
##  9 TORONTO INTL A  1958      NA      NA       NA         NA       NA
## 10 TORONTO INTL A  1959      NA      NA       NA         NA       NA
## # … with 130 more rows

Exploring the initial results

Now we can plot the initial results. Some of these plots will need to be tweaked, for instance, to remove months with too many missing values.

Winter “cold snaps”

Winter SSC classes

airs$win_air %>%
  pivot_longer(c(-Station, -Year), names_to = "air", values_to = "n") %>%
  plot_ly(x = ~Year, y = ~n, color = ~air) %>%
  add_lines() %>%
  layout(title = "Seasonal wintertime air masses",
         yaxis = list(title = "Number of instances of air mass type"))

Winter weather events

warns$win_warns %>% 
  plot_ly(x = ~Start, y = ~MinWindChill, color = ~MinTemp, size = ~Length,
          symbol = ~Station, hoverinfo = "text", 
          hovertext = paste0(warns$win_warns$Start, ": ",
                             warns$win_warns$Length, "days")) %>%
  plotly::add_markers() %>%
  layout(title = "Wintertime cold alerts",
         xaxis = list(title = "Start Date",
                      range = c(min(warns$win_warns$Start - 1000),
                                max(warns$win_warns$Start) + 1000)))

Seasonal winter weather event summary

seas_dat$win_seas %>% 
  select(-`ColdDays`) %>%
  pivot_longer(c(-Station, -Year), names_to = "Var", values_to = "value") %>%
  plot_ly(x = ~Year, y = ~value, color = ~Var, linetype = ~Station) %>%
  add_lines() %>%
  layout(yaxis = list(title = "Temperature (°C) / WindChill"),
         title = "Winter temperature and windchill")
warns_summary$summ_win %>% 
  plot_ly(x = ~Year, y = ~n, symbol = ~Station,
          color = ~AvgMinWindChill, size = ~AvgLen,
          hoverinfo = "text",
          hovertext = paste0(warns_summary$summ_win$Year, ": ",
                             warns_summary$summ_win$n,
                             " event(s) with mean duration of ",
                             round(warns_summary$summ_win$AvgLen, 1),
                             " days")) %>% plotly::add_markers() %>%
  layout(title = "Seasonal wintertime cold alert weather",
         yaxis = list(title = "Number of cold alerts"))

We can also perform a quick linear regression (though these data are likely to be junk until I’ve done a better job of dealing with missing values and discontinuous series that are split between “different” stations, like Toronto Pearson.)

Seasonal characteristics

fx <- function(x) {
  if (all(is.na(x$value))) {
    tibble(min = NA, max = NA, mean = NA)
  } else {
    summ <- summary(lm(value~Year, data = x))
    tibble(min = min(x$value, na.rm = TRUE),
           max = max(x$value, na.rm = TRUE),
           mean = mean(x$value, na.rm = TRUE),
           slope = coef(summ)[2], rsq = summ$r.squared, p = coef(summ)[8])
  }
}

seas_dat$win_seas %>% 
  pivot_longer(c(-Station, -Year), names_to = "var", values_to = "value") %>% 
  group_by(Station, var) %>% 
  nest() %>% 
  mutate(model = map(data, ~fx(.))) %>% 
  select(-data) %>% 
  unnest(model) %>% 
  knitr::kable("markdown", digits = 3, escape = FALSE)
Station var min max mean slope rsq p
TORONTO INTL A MaxTemp -4.702 1.935 -1.196 0.665 0.318 0.188
TORONTO INTL A MinTemp -8.628 -2.211 -5.060 0.747 0.343 0.167
TORONTO INTL A MeanTemp -6.691 -0.245 -3.203 0.709 0.338 0.171
TORONTO INTL A MinWindChill -15.578 -7.867 -11.731 1.163 0.635 0.107
TORONTO INTL A ColdDays 1.000 35.000 17.286 -4.536 0.537 0.061
TORONTO LESTER B. PEARSON INT’L A MaxTemp -5.908 2.216 -2.306 0.040 0.173 0.001
TORONTO LESTER B. PEARSON INT’L A MinTemp -10.331 -1.817 -6.447 0.053 0.241 0.000
TORONTO LESTER B. PEARSON INT’L A MeanTemp -8.069 0.072 -4.362 0.040 0.166 0.001
TORONTO LESTER B. PEARSON INT’L A MinWindChill -17.544 -7.077 -12.756 0.057 0.195 0.001
TORONTO LESTER B. PEARSON INT’L A ColdDays 2.000 39.000 18.934 -0.211 0.171 0.001

Characteristics of cold snaps

warns_summary$summ_win %>%
  filter(Year != year(Sys.Date())) %>%
  pivot_longer(c(-Station, -Year), names_to = "var", values_to = "value") %>%
  group_by(Station, var) %>% 
  nest() %>% 
  mutate(model = map(data, ~fx(.))) %>% 
  select(-data) %>% 
  unnest(model) %>% 
  knitr::kable("markdown", digits = 3, escape = FALSE)
Station var min max mean slope rsq p
TORONTO INTL A n 2.000 9.000 4.333 -1.200 0.675 0.045
TORONTO INTL A AvgLen 2.667 5.333 3.949 0.139 0.057 0.648
TORONTO INTL A AvgTmax -17.120 -12.067 -15.361 -0.181 0.037 0.715
TORONTO INTL A AvgTmean -13.682 -9.980 -12.122 -0.106 0.018 0.802
TORONTO INTL A AvgMinWindChill -27.200 -20.667 -24.778 -0.244 0.044 0.691
TORONTO LESTER B. PEARSON INT’L A n 1.000 12.000 4.949 -0.073 0.254 0.000
TORONTO LESTER B. PEARSON INT’L A AvgLen 2.000 4.800 3.135 -0.007 0.030 0.193
TORONTO LESTER B. PEARSON INT’L A AvgTmax -18.417 -11.800 -15.637 0.026 0.125 0.006
TORONTO LESTER B. PEARSON INT’L A AvgTmean -14.771 -8.119 -12.157 0.021 0.070 0.042
TORONTO LESTER B. PEARSON INT’L A AvgMinWindChill -28.833 -21.250 -24.527 0.014 0.032 0.172

Summer “heat waves”

Summer SSC classes

airs$sum_air %>%
  pivot_longer(c(-Station, -Year), names_to = "air", values_to = "n") %>%
  plot_ly(x = ~Year, y = ~n, color = ~air) %>%
  add_lines() %>%
  layout(title = "Seasonal summertime air masses",
         yaxis = list(title = "Number of instances of air mass type"))

Summer weather events

warns$sum_warns %>% 
  plot_ly(x = ~Start, y = ~MaxHumidex, color = ~MaxTemp, size = ~Length,
          symbol = ~Station, hoverinfo = "text", 
          hovertext = paste0(warns$sum_warns$Start, ": ",
                             warns$sum_warns$Length, "days")) %>%
  plotly::add_markers() %>%
  layout(title = "Summertime heat warnings",
         xaxis = list(title = "Start Date",
                      range = c(min(warns$sum_warns$Start - 1000),
                                max(warns$sum_warns$Start) + 1000)))

Seasonal summer weather event summary

seas_dat$sum_seas %>% 
  select(-`HeatDays`) %>%
  pivot_longer(c(-Station, -Year), names_to = "Var", values_to = "value") %>%
  plot_ly(x = ~Year, y = ~value, color = ~Var, linetype = ~Station) %>%
  add_lines() %>%
  layout(yaxis = list(title = "Temperature (°C) / Humidex"),
         title = "Summertime temperature and humidex")
warns_summary$summ_sum %>% 
  plot_ly(x = ~Year, y = ~n, symbol = ~Station,
          color = ~AvgMaxHumidex, size = ~AvgLen,
          hoverinfo = "text",
          hovertext = paste0(warns_summary$summ_sum$Year, ": ",
                             warns_summary$summ_sum$n,
                             " event(s) with mean duration of ",
                             round(warns_summary$summ_sum$AvgLen, 1),
                             " days")) %>% plotly::add_markers() %>%
  layout(title = "Seasonal summertime heat alert weather",
         yaxis = list(title = "Number of heat warnings"))

Once again, we can perform a junky linear regression.

Seasonal characteristics

seas_dat$sum_seas %>% 
  pivot_longer(c(-Station, -Year), names_to = "var", values_to = "value") %>% 
  group_by(Station, var) %>% 
  nest() %>% 
  mutate(model = map(data, ~fx(.))) %>% 
  select(-data) %>% 
  unnest(model) %>% 
  knitr::kable("markdown", digits = 3, escape = FALSE)
Station var min max mean slope rsq p
TORONTO INTL A MaxTemp 24.438 27.755 25.549 0.208 0.089 0.566
TORONTO INTL A MinTemp 17.568 19.825 18.611 0.262 0.241 0.323
TORONTO INTL A MeanTemp 20.137 22.739 21.181 0.138 0.092 0.508
TORONTO INTL A MaxHumidex 28.228 31.261 29.545 0.351 0.226 0.341
TORONTO INTL A HeatDays 1.000 20.000 7.714 0.464 0.021 0.756
TORONTO LESTER B. PEARSON INT’L A MaxTemp 22.128 27.671 24.955 0.012 0.028 0.197
TORONTO LESTER B. PEARSON INT’L A MinTemp 13.993 20.447 16.875 0.051 0.391 0.000
TORONTO LESTER B. PEARSON INT’L A MeanTemp 16.459 23.057 19.926 0.025 0.119 0.006
TORONTO LESTER B. PEARSON INT’L A MaxHumidex 25.174 32.783 28.919 0.023 0.071 0.039
TORONTO LESTER B. PEARSON INT’L A HeatDays 0.000 22.000 6.803 0.047 0.022 0.250

Characteristics of heat waves

warns_summary$summ_sum %>%
  filter(Year != year(Sys.Date())) %>%
  pivot_longer(c(-Station, -Year), names_to = "var", values_to = "value") %>%
  group_by(Station, var) %>% 
  nest() %>% 
  mutate(model = map(data, ~fx(.))) %>% 
  select(-data) %>% 
  unnest(model) %>% 
  knitr::kable("markdown", digits = 3, escape = FALSE)
Station var min max mean slope rsq p
TORONTO INTL A n 1.000 6.000 2.667 -0.057 0.004 0.906
TORONTO INTL A AvgLen 2.000 3.000 2.514 -0.130 0.426 0.160
TORONTO INTL A AvgTmax 23.353 25.433 24.043 -0.219 0.364 0.205
TORONTO INTL A AvgTmean 26.401 27.934 27.071 -0.066 0.077 0.595
TORONTO INTL A AvgMaxHumidex 37.600 43.000 39.856 0.250 0.054 0.659
TORONTO LESTER B. PEARSON INT’L A n 1.000 6.000 2.462 0.019 0.050 0.170
TORONTO LESTER B. PEARSON INT’L A AvgLen 2.000 4.333 2.703 -0.010 0.064 0.120
TORONTO LESTER B. PEARSON INT’L A AvgTmax 20.100 24.788 22.757 0.033 0.256 0.001
TORONTO LESTER B. PEARSON INT’L A AvgTmean 24.760 28.032 26.450 0.017 0.163 0.011
TORONTO LESTER B. PEARSON INT’L A AvgMaxHumidex 37.500 42.000 39.753 -0.004 0.005 0.683

Fin

The code included here is just the start of a method to perform an analysis of weather events across space and time. Future blog posts will look at other regions in Canada, and I will continue to iterate upon and improve this code.

Sheridan, Scott C. 2002. “The Redevelopment of a Weather-Type Classification Scheme for North America.” International Journal of Climatology 22 (1): 51–68. https://doi.org/10.1002/joc.709.

Žaknić-Ćatović, Ana, and William A. Gough. 2018. “A Comparison of Climatological Observing Windows and Their Impact on Detecting Daily Temperature Extrema.” Theoretical and Applied Climatology 132 (1-2): 41–54. https://doi.org/10.1007/S00704-017-2068-Y.

Related

comments powered by Disqus