Heat warning weather at YYZ

I have, somewhat by accident, become a bit of a cold weather expert for Toronto. My published research (Anderson and Gough 2017) and my blog explorations made me a natural choice to ask questions of when the weather got cold. So, when I got an email this week from UTSC communications to ask about heat in the city, I was a little taken aback. Was the sweltering weather that we saw at the start of July something that we can expect more of? I wasn’t sure, but as someone who is too frugal to pay for AC, this topic is of interest to me! So, I decided to take a look. Read on!

Definitions

To start this post, we need to set out some definitions. In Toronto, a heat warning is issued when the hot weather could pose a risk to people or pets. The city follows the Environment and Climate Change Canada (ECCC) thresholds. You can find the definitions for ECCC here, and for Toronto here. In short, heat warnings are issued by ECCC when two or more consecutive days have projected $T_{max}$ at or above 31°C and $T_{min}$ at or above 20°C, *or*, when the humidex is projected to reach or exceed 40 on two or more consecutive days. Toronto uses two classes of heat warnings: “normal” heat warnings when the conditions are forecast to last two days, or “extended” heat warnings, when the forecast conditions will be around for three or more days.

It is also important to define the humidex, a uniquely Canadian metric. According to the ECCC Meteorological Service of Canada glossary, the humidex is essentially a measure of how hot the weather “feels” when we factor in the humidity. The following equations summarize the humidex:

$$ \begin{align*} \mathrm{Humidex} &= T_{air} + h \\\
h &= 0.5555 \times (e - 10.0) \\\
e &= 6.11 \times \mathrm{exp}^{(5417.7530 \times (\frac{1}{273.16} - \frac{1}{T_{dew}} ))} \\\
\mathrm{exp} &= 2.71828 \end{align*} $$

In the above equations, $T_{dew}$ refers to the dew-point temperature, converted to Kelvin. While the glossary definition used 273.16 K as the conversion to 0°C, I assume that this is a typo, or an error. See this very detailed StackExchange Answer for some hints about where the number might have come from. I’ll use 273.15 K as my 0°C point (but it won’t make much of a difference). The $\mathrm{exp}$ in the above equation represents the Euler number, a special constant. The very large number is a rounded constant based on the properties of water and vapour. The humidex is expressed in integers, so we round the above to the nearest whole number.

We can wrap all this up into a nice R function

hdx <- function(tair, tdew) {
  tdew = tdew + 273.15
  e = 6.11 * exp(1)^(5417.7530  * ((1 / 273.16) - (1 / tdew)))
  h = 0.5555 * (e - 10)
  round(tair + h)
}

So, for a temperature of 30 °C and a dew-point temperature of 15 °C, we’d have a humidex of 34.

hdx(30, 15)

#> [1] 34

Obviously heat warnings are issued before a heat event, but in this case, my analysis is retrospective. As such, I’ll look at what I’ll call “heat warning weather", that is, past conditions that met the forecast conditions and would have merited a heat warning. This is what I’ll be referring to from now on. I can’t guarantee that heat warning were actually emitted for the events that I detect.

Preparing the analysis

I typically work with daily data, but daily data doesn’t include humidex values. We’ll have to use hourly data instead. First, I’ll load the libraries I’ll use throughout this post, and then I’ll do a search of stations with hourly data for the past ~30 years

Finding the station

find_station("Toronto", period = 1988:2018, type = "hourly")

#> # A tibble: 0 x 7
#> # … with 7 variables: Name <chr>, Province <chr>, StationID <dbl>,
#> #   LatitudeDD <dbl>, LongitudeDD <dbl>, HourlyFirstYr <dbl>,
#> #   HourlyLastYr <dbl>

Uhh ohh! There were no stations found that have 30 years of data. This is due to station code changes. I’ve accounted for this in the canadaHCDx search function with a special argument. Let’s search again with the “recodes” parameter set to TRUE.

find_station("Toronto", period = 1988:2018, type = "hourly", recodes = TRUE)

#> Note: In addition to the stations found, the following combinations may provide sufficient period data.
#> 
#> >> Combination 1 at coordinates 43.6 -79.4 
#> 
#> 5085  : TORONTO ISLAND A    (1957‒2006)
#> 5086  : TORONTO IS A (AUT)  (1994‒1994)
#> 30247 : TORONTO CITY CENTRE (2006‒2010)
#> 48549 : TORONTO CITY CENTRE (2009‒2020)
#> 
#> >> Combination 2 at coordinates 43.7 -79.6 
#> 
#> 5097  : TORONTO LESTER B. PEARSON INT'L A (1953‒2013)
#> 51459 : TORONTO INTL A                    (2013‒2020)
#> 
#> >> Combination 3 at coordinates 43.9 -79.4 
#> 
#> 4841  : TORONTO BUTTONVILLE A (1986‒2015)
#> 53678 : TORONTO BUTTONVILLE A (2015‒2019)
#> 54239 : TORONTO BUTTONVILLE A (2016‒2020)

#> # A tibble: 0 x 7
#> # … with 7 variables: Name <chr>, Province <chr>, StationID <dbl>,
#> #   LatitudeDD <dbl>, LongitudeDD <dbl>, HourlyFirstYr <dbl>,
#> #   HourlyLastYr <dbl>

It looks like we have three options. I’ll use the Pearson airport station for this analysis. First, I should check the years of available data.

find_station(name = c("TORONTO LESTER B. PEARSON INT'L A", "TORONTO INTL A"), type = "hourly") %>%
  select(-Province)

#> # A tibble: 2 x 6
#>   Name               StationID LatitudeDD LongitudeDD HourlyFirstYr HourlyLastYr
#>   <chr>                  <dbl>      <dbl>       <dbl>         <dbl>        <dbl>
#> 1 TORONTO INTL A         51459       43.7       -79.6          2013         2020
#> 2 TORONTO LESTER B.…      5097       43.7       -79.6          1953         2013

It would seem that in 2013, The “Toronto Lester B. Pearson Int’l A” station was renamed to “Toronto Intl A” and recoded. Let’s pull in data from these two stations for the summer months: June, July, and August (JJA).

tpe <- hcd_hourly(station = 5097, year = 1988:2013, month = 6:8)
tpe2 <- hcd_hourly(station = 51459, year = 2013:2018, month = 6:8)

Let’s see where we can merge these data. An important first step is to set the time zone of these data, which doesn’t happen by default in canadaHCD. The DateTime column of my data is set to UTC time. If I change the timezone, R automatically converts the time to the new timezone, i.e. midnight becomes 8 pm. I don’t want this. I could convert the time and then add four hours, but the force_tz() function in the lubridate package makes short work of this. Note that the ECCC data is always in the local standard time, even when we’re observing daylight savings time. (Thanks, Andrew for the reminder!)

tpe$DateTime <- force_tz(tpe$DateTime, tzone = "Etc/GMT-5")
tpe2$DateTime <- force_tz(tpe2$DateTime, tzone = "Etc/GMT-5")

Now to see about completeness. I’ll search for rows with less than five missing values. My use of five here is arbitrary. There are usually a couple of missing values in a row, such as WindChill, which doesn’t apply to the summer data, so my threshold needs to be above 2. Any day that is completely missing would have 10 missing values.

tpe[rowSums(is.na(tpe)) < 5,] %>% arrange(DateTime) %>% tail()

#> # A tibble: 6 x 12
#>   Station DateTime             Temp DewPointTemp RelHumidity WindDir WindSpeed
#>     <int> <dttm>              <dbl>        <dbl>       <int>   <int>     <int>
#> 1    5097 2013-06-13 08:00:00  15.6         15.2          97       1        17
#> 2    5097 2013-06-13 09:00:00  15.9         14.8          93       1        24
#> 3    5097 2013-06-13 10:00:00  17.4         15.3          87      36        22
#> 4    5097 2013-06-13 11:00:00  17.9         15            83       1        20
#> 5    5097 2013-06-13 12:00:00  20.9         15.6          72       4        19
#> 6    5097 2013-06-13 13:00:00  21.8         14.5          63       3        15
#> # … with 5 more variables: Visibility <dbl>, Pressure <dbl>, Humidex <int>,
#> #   WindChill <int>, Weather <chr>

Station 5097 collected data until June 13, 2013 at 1 pm.

tpe2[rowSums(is.na(tpe2)) < 6,] %>% arrange(DateTime) %>% head()

#> # A tibble: 6 x 12
#>   Station DateTime             Temp DewPointTemp RelHumidity WindDir WindSpeed
#>     <int> <dttm>              <dbl>        <dbl>       <int>   <int>     <int>
#> 1   51459 2013-06-11 09:00:00  17.8         16            89      34        21
#> 2   51459 2013-06-11 10:00:00  17.4         15.9          91      34        20
#> 3   51459 2013-06-11 11:00:00  19.5         16.2          81      33        22
#> 4   51459 2013-06-11 12:00:00  19.6         15.3          76      32        28
#> 5   51459 2013-06-11 13:00:00  21           14.7          67      33        23
#> 6   51459 2013-06-11 14:00:00  21.4         14.9          66      33        25
#> # … with 5 more variables: Visibility <dbl>, Pressure <dbl>, Humidex <int>,
#> #   WindChill <int>, Weather <chr>

Station 51459 collected data starting on June 11, 2013 at 9:00 am. Since we have overlapping data, we can have a quick look to see how the stations compare. I’ll look at the Temp column for this comparison.

val <- tibble(datetime = seq(from = as.POSIXct("2013-06-11 09:00:00", tz = "Etc/GMT-5"),
                             to = as.POSIXct("2013-06-13 13:00:00", tz = "Etc/GMT-5"),
                             by = 3600),
             tpe = filter(tpe, DateTime %in% datetime) %>% select(Temp) %>% unlist(),
             tpe2 = filter(tpe2, DateTime %in% datetime) %>% select(Temp) %>% unlist())
val %>% gather(key = "station", value = "Temp", -datetime) %>% 
  plot_ly(x = ~datetime, y = ~Temp, color = ~station) %>% add_markers()

summary(abs(val$tpe - val$tpe2))

#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   0.000   0.100   0.200   0.208   0.300   0.700

There seems to be a very minor cold bias at station 51459 overnight between June 12 and 13, but not enough to perform any significant bias correction. There is an average difference between points of just over two tenths of a degree Celsius.

What about the dew-point temperature?

val <- tibble(datetime = seq(from = as.POSIXct("2013-06-11 09:00:00", tz = "Etc/GMT-5"),
                             to = as.POSIXct("2013-06-13 13:00:00", tz = "Etc/GMT-5"),
                             by = 3600),
             tpe = filter(tpe, DateTime %in% datetime) %>% select(DewPointTemp) %>% unlist(),
             tpe2 = filter(tpe2, DateTime %in% datetime) %>% select(DewPointTemp) %>% unlist())
summary(abs(val$tpe - val$tpe2))

#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.10    0.80    0.90    1.04    1.20    3.10

There is actually a larger deviation between the DewPointTemp variable at the two stations, with differences of up to 3.1°C and an average of 1.0°C. This could influence the humidex results by a few points, but it is hard to predict, since the metric depends on the magnitude of the dew point temperature. I will leave the data as is. After all, I don’t want to end up fudging the numbers the wrong way.

For convenience, I’ll join the two station data sets between June 10 and 11.

tpe <- bind_rows(filter(tpe, DateTime < as.POSIXct("2013-06-12 00:00:00", tz = "Etc/GMT-5")),
                 filter(tpe2, DateTime >= as.POSIXct("2013-06-12 00:00:00", tz = "Etc/GMT-5"))) %>%
  arrange(DateTime)

As of the time of writing, I have fairly complete data up to July 20, 2018.

Calculating the humidex

Now for our humidex values.

Environment Canada seems to have calculated the humidex … sometimes. I can ensure that my equation is correct by performing a quick validation on their values.

val <- filter_at(tpe, vars(Temp, DewPointTemp, Humidex), all_vars(!is.na(.))) %>%
  mutate(hdx = hdx(Temp, DewPointTemp))
sum(abs(val$Humidex - val$hdx))

#> [1] 0

Good, it seems that there is no difference between my calculations and theirs. Let’s fill in the missing values.

tpe <- mutate(tpe, Humidex = hdx(Temp, DewPointTemp))

Aggregating to daily

To generate daily data, Environment Canada uses 6 am UTC as the “start” of a new meteorological day. This is standard across the country. For this analysis, that means that the overnight temperatures after midnight but before 1 am tomorrow morning will still officially be “tonight”. This method isn’t necessarily optimal. We can take a look at an examples.

Let’s first take a look at the late evening on June 28 and the early morning hours of June 29, 2018.

tpe %>% filter(DateTime >= as.POSIXct("2018-06-28 22:00:00", tz = "Etc/GMT-5") &
                 DateTime <= as.POSIXct("2018-06-29 08:00:00", tz = "Etc/GMT-5"))

#> # A tibble: 11 x 12
#>    Station DateTime             Temp DewPointTemp RelHumidity WindDir WindSpeed
#>      <int> <dttm>              <dbl>        <dbl>       <int>   <int>     <int>
#>  1   51459 2018-06-28 22:00:00  22.9         16.7          68      28        16
#>  2   51459 2018-06-28 23:00:00  22.6         17.2          71      27        16
#>  3   51459 2018-06-29 00:00:00  21.2         17.2          78      27        14
#>  4   51459 2018-06-29 01:00:00  19.9         16.8          82      25        10
#>  5   51459 2018-06-29 02:00:00  19.9         17.3          85      29        11
#>  6   51459 2018-06-29 03:00:00  20.1         17.8          86      29        14
#>  7   51459 2018-06-29 04:00:00  20.3         18            86      30        17
#>  8   51459 2018-06-29 05:00:00  19.4         16.5          83      36         2
#>  9   51459 2018-06-29 06:00:00  21.1         17.2          78      30         8
#> 10   51459 2018-06-29 07:00:00  23.5         17.5          69      25         5
#> 11   51459 2018-06-29 08:00:00  25.9         16.8          57      27         9
#> # … with 5 more variables: Visibility <dbl>, Pressure <dbl>, Humidex <dbl>,
#> #   WindChill <int>, Weather <chr>

As we can see, above, the temperatures overnight between June 28 and June 29 fell to as low as 19.4°C at 5 am. That means that we didn’t see weather that merited a heat alert.

Now on the other side, late evening on June 29 and the early morning on June 30, 2018.

tpe %>% filter(DateTime >= as.POSIXct("2018-06-29 22:00:00", tz = "Etc/GMT-5") &
                 DateTime <= as.POSIXct("2018-06-30 08:00:00", tz = "Etc/GMT-5"))

#> # A tibble: 11 x 12
#>    Station DateTime             Temp DewPointTemp RelHumidity WindDir WindSpeed
#>      <int> <dttm>              <dbl>        <dbl>       <int>   <int>     <int>
#>  1   51459 2018-06-29 22:00:00  26.1         18.1          60      21        12
#>  2   51459 2018-06-29 23:00:00  25.4         18.4          65      19        10
#>  3   51459 2018-06-30 00:00:00  24.5         18.6          69      19         7
#>  4   51459 2018-06-30 01:00:00  24           18.2          70      22        17
#>  5   51459 2018-06-30 02:00:00  23.7         18.1          70      20        15
#>  6   51459 2018-06-30 03:00:00  22.4         17.8          75      25        11
#>  7   51459 2018-06-30 04:00:00  22.2         18.5          79      24         8
#>  8   51459 2018-06-30 05:00:00  23.2         18.8          76      18         8
#>  9   51459 2018-06-30 06:00:00  23.6         19.3          77      22        13
#> 10   51459 2018-06-30 07:00:00  24.1         19.6          76      13         9
#> 11   51459 2018-06-30 08:00:00  25.5         20.8          75      19        10
#> # … with 5 more variables: Visibility <dbl>, Pressure <dbl>, Humidex <dbl>,
#> #   WindChill <int>, Weather <chr>

Temperatures overnight from June 29 into June 30 did not fall below 22.2°C. Thus, given that it got as hot as 31°C on June 29, and the temperature didn’t fall below 20°C, that day should be classed as heat warning weather. If we choose 1 am as the cutoff time, however, the cooler period overnight from June 28 to June 29 means that June 29 did not meet the heat warning criteria.

When a heat warning is issued, it is issued for forecast temperatures. ECCC doesn’t say, “It is going to be hot tonight but since it was cool at 1 am last night, you have nothing to worry about”. We, likewise, live chronologically. What do I care that it was cool in the wee hours of the previous night if I am suffering now? This is where the so-called “climatological observing window” (COW) becomes important. See this great article on the subject by Žaknić-Ćatović and Gough ( 2018).

Given that I am trying to detect days when the temperatures got hot, and didn’t drop below 20°C overnight, I’m going to use the sun-based COWND proposed by Žaknić-Ćatović and Gough ( 2018). I wrote a function for this in the claut package (See: solar_daily()). COWND defines each climatological day as one hour past sunrise until one hour past sunrise the following calendar day. This way, we can avoid double-counting late-evening cooling on both sides of 1 am, and thus, avoid having a day shaved off a heat event due to temperatures from the night before. I’ll create a data set of the maximum daily temperature ($T_{max}$), minimum daily temperature ($T_{min}$), average daily temperature ($T_{mean}$), and the peak daily humidex value. I’ll also add a Boolean column to flag days that met the criteria for a heat warning.

tpe_dly <- solar_daily(dat = select(tpe, DateTime, Temp, Humidex), lat = 43.68, lon = -79.63, tz = -5) %>%
  select(-MinHumidex, -MeanHumidex) %>%
  mutate(MeanTemp = round(MeanTemp, 1),
         Criteria = (MaxTemp >= 31 & MinTemp >= 20) | MaxHumidex >= 40) %>%
  filter(Date >= "1988-06-01" & Date <= "2018-07-19")

Let’s take a look at the weather since late June, for instance.

tpe_dly %>% filter(Date >= "2018-06-28")

#> # A tibble: 22 x 6
#>    Date       MaxHumidex MaxTemp MeanTemp MinTemp Criteria
#>    <date>          <dbl>   <dbl>    <dbl>   <dbl> <lgl>   
#>  1 2018-06-28         34    29.4     24      19.4 FALSE   
#>  2 2018-06-29         37    31       27      22.2 TRUE    
#>  3 2018-06-30         46    34.8     28.9    24   TRUE    
#>  4 2018-07-01         43    34.1     29.4    24.5 TRUE    
#>  5 2018-07-02         43    33.1     27      19.3 TRUE    
#>  6 2018-07-03         34    30.2     25.7    19.7 FALSE   
#>  7 2018-07-04         39    32.7     27.1    23.4 TRUE    
#>  8 2018-07-05         43    33.2     25.8    17.3 TRUE    
#>  9 2018-07-06         24    22.8     NA      NA   FALSE   
#> 10 2018-07-07         26    25.3     21.4    15.8 FALSE   
#> # … with 12 more rows

To begin, let’s take a look at how these four variables have changed in the last 30 years. I’ll use a simple linear regression here, to test the relationship between the aggregate annual metrics and year. Normally there are a few assumptions to check, but for a quick analysis, I’ll omit these.

annual_stats <- tpe_dly %>%
  group_by(Year = year(Date)) %>%
  summarize(MaxTemp = mean(MaxTemp, na.rm = TRUE),
            MinTemp = mean(MinTemp, na.rm = TRUE),
            MeanTemp = mean(MeanTemp, na.rm = TRUE),
            MaxHumidex = mean(MaxHumidex, na.rm = TRUE),
            HeatDays = sum(Criteria, na.rm = TRUE))

#> `summarise()` ungrouping output (override with `.groups` argument)


annual_stats %>% select(-`HeatDays`) %>% gather(key = Var, value = value, -Year) %>%
  plot_ly(x = ~Year, y = ~value, color = ~Var) %>% add_lines() %>%
  layout(yaxis = list(title = "Temperature (°C) / Humidex"),
         title = "Summer temperature and humidex in Toronto, 1988-2018 (YTD)")

dat <- annual_stats %>% filter(Year != 2018)
splits <- dat %>%
  gather(key = var, value = value, -Year) %>%
  split(., list(.$var))

splits %>%
  map_df(function(x) {
    summ <- summary(lm(value~Year, data = x))
    tibble(slope = coef(summ)[2], rsq = summ$r.squared, p = coef(summ)[8])
    }) %>%
  add_column(name = names(splits), .before = 1)

#> # A tibble: 5 x 4
#>   name        slope     rsq       p
#>   <chr>       <dbl>   <dbl>   <dbl>
#> 1 HeatDays   0.0549 0.00726 0.654  
#> 2 MaxHumidex 0.0361 0.0429  0.272  
#> 3 MaxTemp    0.0213 0.0218  0.436  
#> 4 MeanTemp   0.0428 0.110   0.0734 
#> 5 MinTemp    0.0738 0.310   0.00141

The linear regression results do not indicate that daytime high temperatures are increasing, nor are we seeing more days, year-over-year that meet the criteria for a heat warning. I did, however, detect a significant (at the 0.01 level) warming trend in the summer evening temperatures, and a respective increase (significant at the 0.1 level) in the mean temperature, which is probably capturing the signal in $T_{min}$. This is an interesting find, and is consistent with the changes in climate that are associated with urbanization. Which by the way, I *just* published on! See Anderson, Gough, and Mohsin ( 2018). This is consistent with the assertions made by Mohsin and Gough ( 2012), however I recall having heard that it was the 1970s that saw the fastest urbanization near Pearson. Certainly, over the last 20 years or so the airport has remained largely “developed”. There may be a little more to this.

Analyzing heat events

I established a methodology of analyzing short, non-uniform periods when I looked at cold snaps back in January. My methodology hasn’t changed much since the winter, but I have wrapped it up in a cleaner function and shipped it in the claut package. See: describe_snaps() for the code. So, let’s take a look at some summertime heat warnings.

tpe_warns <- tpe_dly %>% 
  group_by(Year = year(Date)) %>%
  mutate(Start = as.Date(describe_snaps(rle(Criteria), 2, Date, min), origin = origin),
         End = as.Date(describe_snaps(rle(Criteria), 2, Date, max), origin = origin),
         Length = describe_snaps(rle(Criteria), 2)) %>%
  filter(!is.na(Length)) %>%
  group_by(Year, Start, End, Length) %>%
  summarize_all(mean) %>%
  select(-Criteria, -Date)

tpe_warns

#> # A tibble: 68 x 8
#> # Groups:   Year, Start, End [68]
#>     Year Start      End        Length MaxHumidex MaxTemp MeanTemp MinTemp
#>    <dbl> <date>     <date>      <int>      <dbl>   <dbl>    <dbl>   <dbl>
#>  1  1988 1988-06-19 1988-06-20      2       37      32.7     25.5    18.1
#>  2  1988 1988-07-07 1988-08-19      4       39.8    36.0     28.8    21.1
#>  3  1988 1988-08-02 1988-08-05      4       42.5    32.5     27.6    23.6
#>  4  1988 1988-08-12 1988-08-14      3       42      33.2     27.7    23.3
#>  5  1991 1991-06-27 1991-06-28      2       38.5    32.8     27.6    22.0
#>  6  1991 1991-07-17 1991-07-20      4       39.2    32.9     27.2    21.6
#>  7  1993 1993-07-08 1993-07-09      2       41      31.8     26.2    21.2
#>  8  1993 1993-08-26 1993-08-27      2       40      32.5     26.4    20.8
#>  9  1994 1994-06-16 1994-06-18      3       41      33.5     27.5    21  
#> 10  1994 1994-07-05 1994-08-11      2       41      31.5     25.5    20.8
#> # … with 58 more rows

Since June 1988, there have been 68 weather events that merited a heat warning in Toronto. Of those, 27 should have been “extended warnings”. That leaves 41 events that brought exactly two-days of extra hot weather. The average length of heat wearning weather is 2.63 days. In the plot below, I show these heat warnings. The y-axis represents the average daily maximum humidex. The colour represents the average daily maximum temperature. The size of the points represents the length of the heat warning weather.

plot_ly(data = tpe_warns, x = ~Start, y = ~MaxHumidex, color = ~MaxTemp, size = ~Length,
        hoverinfo = "text", hovertext = paste("Length:", tpe_warns$Length,
                                              "days")) %>% add_markers() %>%
  layout(
    title = "Summertime heat warning weather in Toronto, 1988-2018 (YTD)",
    yaxis = list(title = "Average Daily Maximum Humidex"),
    xaxis = list(title = "Start Date",
                      range = c(min(tpe_warns$Start - 1000), max(tpe_warns$Start) + 1000)))

#> Warning: `line.width` does not currently support multiple values.

Heat warning rankings

Let’s look at some top five lists.

Longest heat events

tpe_warns %>% arrange(desc(Length)) %>% head(5)

#> # A tibble: 5 x 8
#> # Groups:   Year, Start, End [5]
#>    Year Start      End        Length MaxHumidex MaxTemp MeanTemp MinTemp
#>   <dbl> <date>     <date>      <int>      <dbl>   <dbl>    <dbl>   <dbl>
#> 1  1999 1999-07-22 1999-07-26      5       36.8    32.4     26.1    21.2
#> 2  2001 2001-08-05 2001-08-09      5       41      35.1     NA      NA  
#> 3  2005 2005-07-10 2005-07-14      5       39.2    33.5     28.2    22.9
#> 4  2010 2010-07-04 2010-07-08      5       41.6    32.7     28.0    23.4
#> 5  2016 2016-08-09 2016-08-13      5       39.6    32.6     27.3    22.9

Hottest heat events

tpe_warns %>% arrange(desc(MaxTemp)) %>% head(5)

#> # A tibble: 5 x 8
#> # Groups:   Year, Start, End [5]
#>    Year Start      End        Length MaxHumidex MaxTemp MeanTemp MinTemp
#>   <dbl> <date>     <date>      <int>      <dbl>   <dbl>    <dbl>   <dbl>
#> 1  2011 2011-07-20 2011-07-21      2       44.5    36.4     30.6    24.5
#> 2  1988 1988-07-07 1988-08-19      4       39.8    36.0     28.8    21.1
#> 3  2001 2001-08-05 2001-08-09      5       41      35.1     NA      NA  
#> 4  2011 2011-07-17 2011-07-18      2       41.5    34.6     28.4    23.6
#> 5  1995 1995-07-13 1995-08-29      3       44.3    34.4     27.7    21.7

Least relief

tpe_warns %>% arrange(desc(MinTemp)) %>% head(5)

#> # A tibble: 5 x 8
#> # Groups:   Year, Start, End [5]
#>    Year Start      End        Length MaxHumidex MaxTemp MeanTemp MinTemp
#>   <dbl> <date>     <date>      <int>      <dbl>   <dbl>    <dbl>   <dbl>
#> 1  2006 2006-07-31 2006-08-02      3       45.7    34.4     29.1    25.0
#> 2  2011 2011-07-20 2011-07-21      2       44.5    36.4     30.6    24.5
#> 3  2002 2002-07-31 2002-08-01      2       40      33.3     28.2    24.3
#> 4  2010 2010-08-30 2010-08-31      2       40.5    33.6     28.6    23.9
#> 5  2013 2013-07-16 2013-07-19      4       42.8    33.5     28.6    23.9

Highest Humidex

tpe_warns %>% arrange(desc(MaxHumidex)) %>% head(5)

#> # A tibble: 5 x 8
#> # Groups:   Year, Start, End [5]
#>    Year Start      End        Length MaxHumidex MaxTemp MeanTemp MinTemp
#>   <dbl> <date>     <date>      <int>      <dbl>   <dbl>    <dbl>   <dbl>
#> 1  2006 2006-07-31 2006-08-02      3       45.7    34.4     29.1    25.0
#> 2  2011 2011-07-20 2011-07-21      2       44.5    36.4     30.6    24.5
#> 3  1995 1995-07-13 1995-08-29      3       44.3    34.4     27.7    21.7
#> 4  2002 2002-07-28 2002-07-29      2       43.5    32.1     26.0    21.6
#> 5  1999 1999-07-04 1999-08-11      3       43.3    33.6     28.6    23

We can also summarize these events by year. For the temperature and humidex, I’ll weight the averages by the heat alert length.

summ <- left_join(tpe_warns %>% group_by(Year) %>% count(),
                  tpe_warns %>% group_by(Year) %>%
                    summarize(AvgLen = mean(Length),
                              AvgTmax = weighted.mean(MaxTemp, Length),
                              AvgTmax = weighted.mean(MinTemp, Length),
                              AvgTmean = weighted.mean(MeanTemp, Length),
                              AvgMaxHumidex = weighted.mean(MaxHumidex, Length)),
                  by = "Year")

#> `summarise()` ungrouping output (override with `.groups` argument)


print(summ, n = Inf)

#> # A tibble: 25 x 6
#> # Groups:   Year [25]
#>     Year     n AvgLen AvgTmax AvgTmean AvgMaxHumidex
#>    <dbl> <int>  <dbl>   <dbl>    <dbl>         <dbl>
#>  1  1988     4   3.25    21.9     27.7          40.7
#>  2  1991     2   3       21.8     27.3          39  
#>  3  1993     2   2       21.0     26.3          40.5
#>  4  1994     2   2.5     20.9     26.7          41  
#>  5  1995     3   2.67    21.4     26.8          40.4
#>  6  1997     1   2       21.4     26.7          39  
#>  7  1998     3   2       21.3     26.1          38.3
#>  8  1999     4   3.25    21.8     27.0          39.6
#>  9  2001     3   3       NA       NA            39.9
#> 10  2002     6   2.5     22.7     27.6          39.9
#> 11  2003     3   2.67    20.7     26.5          38.5
#> 12  2004     1   2       19.3     25.0          39  
#> 13  2005     4   2.75    22.2     27.7          40.1
#> 14  2006     3   2.67    23.0     28.1          41.9
#> 15  2007     3   2.67    22.0     27.6          40  
#> 16  2008     1   2       19.4     24.9          39.5
#> 17  2009     1   2       20.4     23.2          40.5
#> 18  2010     3   3       23.1     27.8          41.3
#> 19  2011     4   2       23.0     27.9          40.6
#> 20  2012     2   3       22.2     27.4          40.5
#> 21  2013     2   3       23.3     28            41.8
#> 22  2015     2   2       20.3     26.4          38.5
#> 23  2016     5   2.8     21.5     27.0          38.5
#> 24  2017     1   2       21.6     27.2          38.5
#> 25  2018     3   2.67    21.4     27.2          41.5

According to this summary, we have already seen heat warning weather three times in 2018, which is above the average of 2.72. If this weather keeps up, we might beat the record of 6 heat events in 2002. Let’s take a look on a plot. I have added an extra variable, n, so I’ll have to discard one of my climate variables. In the below plot, the y-axis represents the number of heat waves per year, the size of the point is the average length, and the colour is the average daily maximum humidex during heat warnings.

plot_ly(data = summ, x = ~Year, y = ~n, color = ~AvgMaxHumidex, size = ~AvgLen,
        hoverinfo = "text", hovertext = paste(summ$n,
                                              "event(s) with mean duration of",
                                              round(summ$AvgLen, 1),
                                              "days")) %>% add_markers() %>%
  layout(
    title = "Annual summertime heat warning weather in Toronto, 1988-2018 (YTD)",
    yaxis = list(title = "Number of heat warnings"))

#> Warning: `line.width` does not currently support multiple values.

To perform a fair trend detection, I need to filter off 2018, since it is not a complete year. In this regression analysis, I’ll weight the climate metrics by the number of events, so that the years with more events are weighted more strongly.

dat <- summ %>% filter(Year != 2018)
splits <- dat %>%
  gather(key = var, value = value, -Year) %>%
  split(., list(.$var))

splits %>%
  map_df(function(x) {
    fit <- if (unlist(x[2,1]) != "n") {
      lm(value~Year, data = x, weights = dat$n)
    } else {
      lm(value~Year, data = x)
    }
    summ <- summary(fit)
    tibble(slope = coef(summ)[2], rsq = summ$r.squared, p = coef(summ)[8])
    }) %>%
  add_column(name = names(splits), .before = 1)

#> # A tibble: 5 x 4
#>   name             slope     rsq     p
#>   <chr>            <dbl>   <dbl> <dbl>
#> 1 AvgLen        -0.0114  0.0470  0.309
#> 2 AvgMaxHumidex -0.0137  0.0116  0.616
#> 3 AvgTmax        0.0173  0.0236  0.484
#> 4 AvgTmean       0.00768 0.00543 0.738
#> 5 n              0.00540 0.00106 0.880

The linear regression did not detect any significant trends in the aggregate summertime heat warning weather. There are, a few negative slopes in the chart, but none of them are even close to being statistically significant. The $R^2$ values are also very low, suggesting that a linear model does not explain the variance in these numbers very well. Indeed, examining the charts above, it seems that we have a lot of variation in the number, duration, and magnitude of heat-events from one year to the next. They are common, but there seems to be more than a simple year-to-year change. It is also interesting to note that, somewhat by chance, I started my analysis on a period of intense heat in much of North America. Indeed, in 1988, we saw the second hottest period of heat warning weather that we’ve seen in the past 30 years. In addition, we saw some very strong events towards the middle of the series, in particular the mid 2000s. In the interim between the strong events, we have a lot of rather insignificant years. Summer 2017, for instance, was cool and damp, and saw widespread flooding, right after 2016, which saw the second highest number of heat events in the past 30 years. I am interested to follow this topic into the future, as we certainly seem to be seeing something since around the year 2000, but the signal hasn’t come out through the noise in this particular data set.

What about 2018?

If we aren’t seeing increasing trends in the last 30 years, and the warm events so far this year have been relatively short, what was all the fuss about this month? Was it just a case of our memory biasing more recent events? Do we Canadians just love to complain about the weather.1 Not exactly. I think we found the heatwave in July to be particularly unbearable because of the lack of relief. The first ECCC weather alert (which someone has, thankfully, archived here) came out on June 26, forecasting extreme weather from June 28 or June 29 through to July 2. Their forecast was spot on. On July 3, another alert was issued for extreme weather through to Thursday July 5. ECCC called it “the most significant heat event in the past few years”. Indeed, two of the three heat events so far this year were June 29 to July 2, and July 4 to July 5. July 3 saw temperatures above 30 degrees, but missed the $T_{max}$ threshold by 0.8°C and the $T_{min}$ threshold by just three tenths of a degree Celsius. We can tweak our criteria a little. Let’s say, “if the two days before were super hot, and the two days after were super hot, then it was damn hot!”

tpe_dly <- tpe_dly %>%
  mutate(Criteria = Criteria | 
           (lead(Criteria, 2) & lead(Criteria) & lag(Criteria) & lag(Criteria, 2)))

tpe_warns_lax <- tpe_dly %>% 
  group_by(Year = year(Date)) %>%
  mutate(Start = as.Date(describe_snaps(rle(Criteria), 2, Date, min),
                         origin = origin),
         End = as.Date(describe_snaps(rle(Criteria), 2, Date, max),
                         origin = origin),
         Length = describe_snaps(rle(Criteria), 2)) %>%
  filter(!is.na(Length)) %>%
  group_by(Year, Start, End, Length) %>%
  summarize_all(mean) %>%
  select(-Criteria, -Date)

Now, let’s take a look at the length of the top 3 events.

tpe_warns_lax %>% arrange(desc(Length)) %>% head(3)

#> # A tibble: 3 x 8
#> # Groups:   Year, Start, End [3]
#>    Year Start      End        Length MaxHumidex MaxTemp MeanTemp MinTemp
#>   <dbl> <date>     <date>      <int>      <dbl>   <dbl>    <dbl>   <dbl>
#> 1  2018 2018-06-29 2018-07-05      7       40.7    32.7     27.3    21.5
#> 2  1999 1999-07-22 1999-07-26      5       36.8    32.4     26.1    21.2
#> 3  2001 2001-08-05 2001-08-09      5       41      35.1     NA      NA

There you have it! The Canada Day 2018 heat event was the longest in the past 30 years, if we leave room for one brief moment of “relief”.

Origin of air during heat events

So, where does this hot weather come from? I think this goes without saying, but it certainly isn’t polar air in the city when we’re in the middle of a heat wave. Indeed, ECCC’s heat warnings often contain language like “a passing warm front” or, like the June 26 warning linked to above, “hot, humid air from the Gulf of Mexico”. So, is it always warm, tropical air that causes us to feel the heat? Let’s look at the SSC2 air masses by Sheridan ( 2002) that I’ve used in my previous posts.

I will bin these values, as I have done in the past, but I will split the tropical air into dry and moist bins, which were less important for winter weather, but are useful here.

  1. ‘cool’: SSC codes 2 (dry polar) and 5 (moist polar)
  2. ‘mod’: SSC codes 1 (dry moderate) and 4 (moist moderate)
  3. ‘dt’: SSC code 3 (dry tropical)
  4. ‘mt’: SSC codes 6 (moist tropical), 66 (moist tropical plus), and 67 (moist tropical double plus)
  5. ‘trans’: SCC code 7 (transition)
f <- tempfile()
download.file("http://sheridan.geog.kent.edu/ssc/files/YYZ.dbdmt", f, quiet = TRUE)
air <- read_table(f, col_names = c("Station", "Date", "Air"), col_types = cols(col_character(), col_date(format = "%Y%m%d"), col_factor(levels = c(1, 2, 3, 4, 5, 6, 7, 66, 67))))
levels(air$Air) <- list(cool = c(2, 5), mod = c(1, 4), dt = 3, mt = c(6, 66, 67), trans = "7")

I will add these air mass codes to the daily data set, and determine the proportion of each air mass type during heat warning weather.

tpe_air <- left_join(tpe_dly, select(air, -Station), by = "Date") %>%
  filter(Criteria) %>%
  group_by(Air) %>%
  count() %>%
  mutate(prop = n/sum(tpe_dly$Criteria, na.rm = TRUE)) %>%
  select(-n) %>%
  ungroup %>%
  spread(Air, prop, fill = 0)

tpe_air

#> # A tibble: 1 x 5
#>      mod    dt    mt  trans  `<NA>`
#>    <dbl> <dbl> <dbl>  <dbl>   <dbl>
#> 1 0.0120 0.474 0.458 0.0478 0.00797

It’s true! When we experience super hot weather in the city, it is due to the presence of a warm front of hot, tropical air over the city. Of all the summer days where the criteria for a heat warning were met, none showed air of polar origin, and only 1.2% showed continental, “moderate” air. Indeed, a full 93.2% of days that merited a heat warning showed air of tropical origin (47.4% dry tropical and 45.8% moist tropical). That’s right. Toronto, at 43°N latitude, is affected by air from the tropics! No need to visit the Caribbean to get that clima caribeño!

Outro

So, that’s that. A brief exploration of the kind of weather that provokes heat warnings in Toronto. So remember, next time you are sweating buckets and wishing the heat would just go away, head over to Toronto Island and pretend you’re in some tropical paradise. If that doesn’t work, wait a minute. It’ll be winter in no time!


This post was compiled on 2020-10-09 12:01:39. Since that time, there may have been changes to the packages that were used in this post. If you can no longer use this code, please notify the author in the comments below.

Packages Used in this post
sessioninfo::package_info(dependencies = "Depends")

#>  package      * version    date       lib source                             
#>  assertthat     0.2.1      2019-03-21 [1] RSPM (R 4.0.0)                     
#>  canadaHCD    * 0.0-2      2020-10-01 [1] Github (ConorIA/canadaHCD@e2f4d2b) 
#>  canadaHCDx   * 0.0.8      2020-10-01 [1] gitlab (ConorIA/canadaHCDx@0f99419)
#>  claut        * 0.1.10     2020-10-08 [1] local                              
#>  cli            2.0.2      2020-02-28 [1] RSPM (R 4.0.0)                     
#>  codetools      0.2-16     2018-12-24 [3] CRAN (R 4.0.2)                     
#>  colorspace     1.4-1      2019-03-18 [1] RSPM (R 4.0.0)                     
#>  crayon         1.3.4      2017-09-16 [1] RSPM (R 4.0.0)                     
#>  crosstalk      1.1.0.1    2020-03-13 [1] RSPM (R 4.0.0)                     
#>  curl           4.3        2019-12-02 [1] RSPM (R 4.0.0)                     
#>  data.table     1.13.0     2020-07-24 [1] RSPM (R 4.0.2)                     
#>  digest         0.6.25     2020-02-23 [1] RSPM (R 4.0.0)                     
#>  doParallel     1.0.15     2019-08-02 [1] RSPM (R 4.0.0)                     
#>  downlit        0.2.0      2020-09-25 [1] RSPM (R 4.0.2)                     
#>  dplyr        * 1.0.2      2020-08-18 [1] RSPM (R 4.0.2)                     
#>  ellipsis       0.3.1      2020-05-15 [1] RSPM (R 4.0.0)                     
#>  evaluate       0.14       2019-05-28 [1] RSPM (R 4.0.0)                     
#>  fansi          0.4.1      2020-01-08 [1] RSPM (R 4.0.0)                     
#>  farver         2.0.3      2020-01-16 [1] RSPM (R 4.0.0)                     
#>  foreach        1.5.0      2020-03-30 [1] RSPM (R 4.0.0)                     
#>  fs             1.5.0      2020-07-31 [1] RSPM (R 4.0.2)                     
#>  gdata          2.18.0     2017-06-06 [1] RSPM (R 4.0.0)                     
#>  generics       0.0.2      2018-11-29 [1] RSPM (R 4.0.0)                     
#>  geosphere      1.5-10     2019-05-26 [1] RSPM (R 4.0.0)                     
#>  ggplot2      * 3.3.2      2020-06-19 [1] RSPM (R 4.0.1)                     
#>  glue           1.4.2      2020-08-27 [1] RSPM (R 4.0.2)                     
#>  gtable         0.3.0      2019-03-25 [1] RSPM (R 4.0.0)                     
#>  gtools         3.8.2      2020-03-31 [1] RSPM (R 4.0.0)                     
#>  hms            0.5.3      2020-01-08 [1] RSPM (R 4.0.0)                     
#>  htmltools      0.5.0      2020-06-16 [1] RSPM (R 4.0.1)                     
#>  htmlwidgets    1.5.1      2019-10-08 [1] RSPM (R 4.0.0)                     
#>  httr           1.4.2      2020-07-20 [1] RSPM (R 4.0.2)                     
#>  hugodown       0.0.0.9000 2020-10-08 [1] Github (r-lib/hugodown@18911fc)    
#>  iterators      1.0.12     2019-07-26 [1] RSPM (R 4.0.0)                     
#>  jsonlite       1.7.1      2020-09-07 [1] RSPM (R 4.0.2)                     
#>  knitr          1.30       2020-09-22 [1] RSPM (R 4.0.2)                     
#>  lattice        0.20-41    2020-04-02 [1] RSPM (R 4.0.0)                     
#>  lazyeval       0.2.2      2019-03-15 [1] RSPM (R 4.0.0)                     
#>  lifecycle      0.2.0      2020-03-06 [1] RSPM (R 4.0.0)                     
#>  lpSolve        5.6.15     2020-01-24 [1] RSPM (R 4.0.0)                     
#>  lubridate    * 1.7.9      2020-06-08 [1] RSPM (R 4.0.2)                     
#>  magrittr       1.5        2014-11-22 [1] RSPM (R 4.0.0)                     
#>  munsell        0.5.0      2018-06-12 [1] RSPM (R 4.0.0)                     
#>  pillar         1.4.6      2020-07-10 [1] RSPM (R 4.0.2)                     
#>  pkgconfig      2.0.3      2019-09-22 [1] RSPM (R 4.0.0)                     
#>  plotly       * 4.9.2.1    2020-04-04 [1] RSPM (R 4.0.0)                     
#>  plyr           1.8.6      2020-03-03 [1] RSPM (R 4.0.2)                     
#>  purrr        * 0.3.4      2020-04-17 [1] RSPM (R 4.0.0)                     
#>  R6             2.4.1      2019-11-12 [1] RSPM (R 4.0.0)                     
#>  rappdirs       0.3.1      2016-03-28 [1] RSPM (R 4.0.0)                     
#>  RColorBrewer   1.1-2      2014-12-07 [1] RSPM (R 4.0.0)                     
#>  Rcpp           1.0.5      2020-07-06 [1] RSPM (R 4.0.2)                     
#>  readr          1.3.1      2018-12-21 [1] RSPM (R 4.0.2)                     
#>  reshape2       1.4.4      2020-04-09 [1] RSPM (R 4.0.2)                     
#>  rlang          0.4.7      2020-07-09 [1] RSPM (R 4.0.2)                     
#>  rmarkdown      2.3        2020-06-18 [1] RSPM (R 4.0.1)                     
#>  scales         1.1.1      2020-05-11 [1] RSPM (R 4.0.0)                     
#>  sessioninfo    1.1.1      2018-11-05 [1] RSPM (R 4.0.0)                     
#>  sp             1.4-2      2020-05-20 [1] RSPM (R 4.0.0)                     
#>  storr          1.2.1      2018-10-18 [1] RSPM (R 4.0.0)                     
#>  stringi        1.5.3      2020-09-09 [1] RSPM (R 4.0.2)                     
#>  stringr        1.4.0      2019-02-10 [1] RSPM (R 4.0.0)                     
#>  tibble       * 3.0.3      2020-07-10 [1] RSPM (R 4.0.2)                     
#>  tidyr        * 1.1.2      2020-08-27 [1] RSPM (R 4.0.2)                     
#>  tidyselect     1.1.0      2020-05-11 [1] RSPM (R 4.0.0)                     
#>  utf8           1.1.4      2018-05-24 [1] RSPM (R 4.0.0)                     
#>  vctrs          0.3.4      2020-08-29 [1] RSPM (R 4.0.2)                     
#>  viridisLite    0.3.0      2018-02-01 [1] RSPM (R 4.0.0)                     
#>  withr          2.3.0      2020-09-22 [1] RSPM (R 4.0.2)                     
#>  xfun           0.18       2020-09-29 [2] RSPM (R 4.0.2)                     
#>  yaml           2.2.1      2020-02-01 [1] RSPM (R 4.0.0)                     
#>  zoo            1.8-8      2020-05-02 [1] RSPM (R 4.0.0)                     
#> 
#> [1] /home/conor/Library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/local/lib/R/library

References

Anderson, Conor I., and William A. Gough. 2017. “Evolution of Winter Temperature in Toronto, Ontario, Canada: A Case Study of Winters 2013/14 and 2014/15.” Journal of Climate 30 (14): 5361–76. https://doi.org/10.1175/JCLI-D-16-0562.1.

Anderson, Conor I., William A. Gough, and Tanzina Mohsin. 2018. “Characterization of the Urban Heat Island at Toronto: Revisiting the Choice of Rural Sites Using a Measure of Day-to-Day Variation.” Urban Climate 25 (September): 187–95. https://doi.org/10.1016/J.UCLIM.2018.07.002.

Mohsin, Tanzina, and William A. Gough. 2012. “Characterization and Estimation of Urban Heat Island at Toronto: Impact of the Choice of Rural Sites.” Theoretical and Applied Climatology 108 (April): 105–17. https://doi.org/10.1007/s00704-011-0516-7.

Sheridan, Scott C. 2002. “The Redevelopment of a Weather-Type Classification Scheme for North America.” International Journal of Climatology 22 (1): 51–68. https://doi.org/10.1002/joc.709.

Žaknić-Ćatović, Ana, and William A. Gough. 2018. “A Comparison of Climatological Observing Windows and Their Impact on Detecting Daily Temperature Extrema.” Theoretical and Applied Climatology 132 (1-2): 41–54. https://doi.org/10.1007/S00704-017-2068-Y.


  1. Yes! Of course! ↩︎

Avatar
Conor I. Anderson, PhD
Alumnus, Climate Lab

Conor is a recent PhD graduate from the Department of Physical and Environmental Sciences at the University of Toronto Scarborough (UTSC).

Related