Prototyping a large-scale analysis of "weather events" in Canada
In past blog posts I have looked at weather “events”, or “blocks” of similar “extreme” weather (specifically cold snaps and heat waves), and at how these events are described by the SSC2 (Sheridan 2002) weather types. My posts have focused on Toronto, but my research interest is broader, so I have developed some code to help automate the analysis that I would perform at the station level. I originally developed some of this code as a Shiny app, but that app needs an update, so this post is as much about debugging the code as it is about writing it!
First, we load the libraries that we will need for this analysis.
library(canadaHCDx)
library(claut)
library(dplyr)
library(lubridate)
library(plotly)
library(purrr)
library(readr)
library(stringr)
library(tibble)
library(tidyr)
library(XML)
Getting information about the available stations
Let’s first grab the list of SSC2 stations. We want to look at both local weather conditions and the weather type, so I will use the SSC2 stations as my “data points” and find ECCC data from the same or nearby stations.
First, I will download the webpage that lists the available stations, pull the table from the HTML using the XML package, and then filter so that my list contains only the stations in Canada.
f <- tempfile(fileext = ".html")
url <- "http://sheridan.geog.kent.edu/ssc/stations.html"
download.file(url, f, quiet = TRUE)
readHTMLTable(f, stringsAsFactors = FALSE)[[1]] %>%
as_tibble() %>%
filter(grepl("Canada", Station)) %>%
mutate(Station = sub("\\s*, Canada", "", Station)) -> stns
stns
#> # A tibble: 110 x 3
#> Station ID Years
#> <chr> <chr> <chr>
#> 1 Abbotsford, British Columbia YXX 1977-2019
#> 2 Alert, Nunavut YLT 1950-1963 1973-2018
#> 3 Bagotville, Quebec YBG 1951-2019
#> 4 Baker Lake, Nunavut YBK 1946-1947 1955-2019
#> 5 Big Trout Lake, Ontario YTL 1953-1991 1994-2019
#> 6 Brandon, Manitoba YBR 1977-2019
#> 7 Calgary, Alberta YYC 1953-2019
#> 8 Cambridge Bay, Nunavut YCB 1950-1959 1973-2004 2006-2019
#> 9 Cartwright, Newfoundland YRF 1955-1996 1998-2019
#> 10 Charlottetown, Prince Edward Island YYG 1953-2019
#> # … with 100 more rows
We want to know the years of available data, but the formatted list of year ranges is not easy to parse automatically. Let’s get the minimum and maximum years as “Start” and “End”.
str_extract_all(stns$Years, "[0-9]{4}") %>%
map_df(~tibble(Start = min(as.integer(.)),
End = max(as.integer(.)))) %>%
bind_cols(stns, .) -> stns
stns
#> # A tibble: 110 x 5
#> Station ID Years Start End
#> <chr> <chr> <chr> <int> <int>
#> 1 Abbotsford, British Columbia YXX 1977-2019 1977 2019
#> 2 Alert, Nunavut YLT 1950-1963 1973-2018 1950 2018
#> 3 Bagotville, Quebec YBG 1951-2019 1951 2019
#> 4 Baker Lake, Nunavut YBK 1946-1947 1955-2019 1946 2019
#> 5 Big Trout Lake, Ontario YTL 1953-1991 1994-2019 1953 2019
#> 6 Brandon, Manitoba YBR 1977-2019 1977 2019
#> 7 Calgary, Alberta YYC 1953-2019 1953 2019
#> 8 Cambridge Bay, Nunavut YCB 1950-1959 1973-2004 2006-… 1950 2019
#> 9 Cartwright, Newfoundland YRF 1955-1996 1998-2019 1955 2019
#> 10 Charlottetown, Prince Edward Is… YYG 1953-2019 1953 2019
#> # … with 100 more rows
The “Years” column is now almost redundant, but it still contains one point of interest that we have not captured in “Start” and “End”: the periods of missing data, which appear as gaps between the last year of one range and the first year of the next. Let’s use this logic to pull that information out explicitly.
str_extract_all(stns$Years, "-[0-9]{4} [0-9]{4}-") %>%
map_df(~list(Gaps = paste(as.integer(str_sub(., 2, 5)) + 1L,
as.integer(str_sub(., 6, 10)) - 1L,
sep = "-", collapse = ", "))) %>%
mutate(Gaps = str_replace_all(Gaps, "([0-9]{4})-\\1", "\\1")) %>%
bind_cols(select(stns, -Years), .) -> stns
stns
#> # A tibble: 110 x 5
#> Station ID Start End Gaps
#> <chr> <chr> <int> <int> <chr>
#> 1 Abbotsford, British Columbia YXX 1977 2019 ""
#> 2 Alert, Nunavut YLT 1950 2018 "1964-1972"
#> 3 Bagotville, Quebec YBG 1951 2019 ""
#> 4 Baker Lake, Nunavut YBK 1946 2019 "1948-1954"
#> 5 Big Trout Lake, Ontario YTL 1953 2019 "1992-1993"
#> 6 Brandon, Manitoba YBR 1977 2019 ""
#> 7 Calgary, Alberta YYC 1953 2019 ""
#> 8 Cambridge Bay, Nunavut YCB 1950 2019 "1960-1972"
#> 9 Cartwright, Newfoundland YRF 1955 2019 "1997"
#> 10 Charlottetown, Prince Edward Island YYG 1953 2019 ""
#> # … with 100 more rows
Now we have information on the temporal availability of the data, but we are missing the spatial information. Let’s pull the coordinates out of the Google Map at http://sheridan.geog.kent.edu/ssc.html.
f <- tempfile(fileext = ".kml")
download.file("https://www.google.com/maps/d/kml?mid=1wHXUzwNjNvmnz2JW9XFr8AhqntWEaeFQ&lid=PlhtYW-UWmA&forcekml=1", f, quiet = TRUE)
coords <- rgdal::readOGR(f)
#> OGR data source with driver: LIBKML
#> Source: "/tmp/Rtmpa75RNS/file1763715ad1cc.kml", layer: "SSC stations 2018.csv"
#> with 808 features
#> It has 13 fields
Let’s see what we’re working with:
coords %>% head()
#> coordinates Name description timestamp begin end
#> 1 (3.22, 36.69, 0) 36.69 AAE: ALG<br>7.81: 3.22 <NA> <NA> <NA>
#> 2 (6.62, 36.28, 0) 36.28 AAE: CZL<br>7.81: 6.62 <NA> <NA> <NA>
#> 3 (-0.62, 35.62, 0) 35.62 AAE: ORN<br>7.81: -0.62 <NA> <NA> <NA>
#> 4 (-61.79, 17.14, 0) 17.14 AAE: ANU<br>7.81: -61.79 <NA> <NA> <NA>
#> 5 (-70.02, 12.5, 0) 12.50 AAE: AUA<br>7.81: -70.02 <NA> <NA> <NA>
#> 6 (14.19, 48.23, 0) 48.23 AAE: LNZ<br>7.81: 14.19 <NA> <NA> <NA>
#> altitudeMode tessellate extrude visibility drawOrder icon AAE X7_81
#> 1 <NA> -1 0 -1 NA <NA> ALG 3.22
#> 2 <NA> -1 0 -1 NA <NA> CZL 6.62
#> 3 <NA> -1 0 -1 NA <NA> ORN -0.62
#> 4 <NA> -1 0 -1 NA <NA> ANU -61.79
#> 5 <NA> -1 0 -1 NA <NA> AUA -70.02
#> 6 <NA> -1 0 -1 NA <NA> LNZ 14.19
Extract the information that we are interested in.
coords <- bind_cols(
attr(coords, "data") %>% mutate(ID = str_sub(description, 6, 8)) %>% select(ID),
attr(coords, "coords") %>% as_tibble %>% select(Latitude = coords.x2, Longitude = coords.x1)
)
coords %>% head()
#> ID Latitude Longitude
#> 1 ALG 36.69 3.22
#> 2 CZL 36.28 6.62
#> 3 ORN 35.62 -0.62
#> 4 ANU 17.14 -61.79
#> 5 AUA 12.50 -70.02
#> 6 LNZ 48.23 14.19
Join to our `stns` table.
left_join(stns, coords, by = "ID") %>%
select(Station, ID, Latitude, Longitude, Start:Gaps) -> stns
stns
#> # A tibble: 110 x 7
#> Station ID Latitude Longitude Start End Gaps
#> <chr> <chr> <dbl> <dbl> <int> <int> <chr>
#> 1 Abbotsford, British Columbia YXX 49.0 -122. 1977 2019 ""
#> 2 Alert, Nunavut YLT 82.5 -62.3 1950 2018 "1964-19…
#> 3 Bagotville, Quebec YBG 48.3 -71 1951 2019 ""
#> 4 Baker Lake, Nunavut YBK 64.3 -96.1 1946 2019 "1948-19…
#> 5 Big Trout Lake, Ontario YTL 53.8 -89.9 1953 2019 "1992-19…
#> 6 Brandon, Manitoba YBR 49.9 -100. 1977 2019 ""
#> 7 Calgary, Alberta YYC 51.1 -114. 1953 2019 ""
#> 8 Cambridge Bay, Nunavut YCB 69.1 -105. 1950 2019 "1960-19…
#> 9 Cartwright, Newfoundland YRF 53.7 -57.0 1955 2019 "1997"
#> 10 Charlottetown, Prince Edward … YYG 46.3 -63.1 1953 2019 ""
#> # … with 100 more rows
Getting station data
Let’s look at how we would analyze data for a given station. True to form, I’ll stick to Toronto for the first portion of this blog post. This is the station called “Toronto, Ontario” in our table above.
First, we want to get the relevant data from our `stns` object for the station.
get_stn_info <- function(station_in, stns) {
  # Return the metadata row matching the requested station name
  filter(stns, Station == station_in)
}
stn_info <- get_stn_info("Toronto, Ontario", stns)
stn_info
#> # A tibble: 1 x 7
#> Station ID Latitude Longitude Start End Gaps
#> <chr> <chr> <dbl> <dbl> <int> <int> <chr>
#> 1 Toronto, Ontario YYZ 43.7 -79.6 1950 2019 1967-1970
Now we need to find the ECCC stations that are co-located with (or near) 43.68°N, 79.63°W.
get_eccc_data <- function(stn_info) {
# Look for nearby stations using canadaHCDx
eccc_stns <- filter(find_station(),
LatitudeDD == stn_info$Latitude & LongitudeDD == stn_info$Longitude)
# If no station is co-located, find another within (an arbitrary) 2 km
if (nrow(eccc_stns) == 0) {
eccc_stns <- find_station(target = c(stn_info$Latitude, stn_info$Longitude), dist = 0:2)
}
# Return data using the CanadaHCD package.
# Cache requires fork - remotes::install_github("ConorIA/canadaHCD@cache_external")
hcd_hourly(eccc_stns$StationID,
year = stn_info$Start:stn_info$End,
month = c(1, 2, 6, 7, 8, 12), cache = TRUE, progress = FALSE)
}
The function above will return ECCC data for one or more nearby stations (co-located or within 2 km).
dat <- get_eccc_data(stn_info)
dat
#> # A tibble: 612,336 x 15
#> Station StationID Longitude Latitude DateTime Temp DewPointTemp
#> <chr> <chr> <dbl> <dbl> <dttm> <dbl> <dbl>
#> 1 TORONT… 6158731 -79.6 43.7 1950-01-01 00:00:00 NA NA
#> 2 TORONT… 6158731 -79.6 43.7 1950-01-01 01:00:00 NA NA
#> 3 TORONT… 6158731 -79.6 43.7 1950-01-01 02:00:00 NA NA
#> 4 TORONT… 6158731 -79.6 43.7 1950-01-01 03:00:00 NA NA
#> 5 TORONT… 6158731 -79.6 43.7 1950-01-01 04:00:00 NA NA
#> 6 TORONT… 6158731 -79.6 43.7 1950-01-01 05:00:00 NA NA
#> 7 TORONT… 6158731 -79.6 43.7 1950-01-01 06:00:00 NA NA
#> 8 TORONT… 6158731 -79.6 43.7 1950-01-01 07:00:00 NA NA
#> 9 TORONT… 6158731 -79.6 43.7 1950-01-01 08:00:00 NA NA
#> 10 TORONT… 6158731 -79.6 43.7 1950-01-01 09:00:00 NA NA
#> # … with 612,326 more rows, and 8 more variables: RelHumidity <int>,
#> # WindDir <int>, WindSpeed <int>, Visibility <dbl>, Pressure <dbl>,
#> # Humidex <int>, WindChill <int>, Weather <chr>
Now we need the corresponding weather-type data, binned as I have done previously: “warm” (DT, MT, MT+, MT++), “moderate” (DM, MM), and “cool” (DP, MP), plus “transitional”.
get_ssc_data <- function(stn_info) {
f <- tempfile(fileext = ".dbdmt")
url <- sprintf("http://sheridan.geog.kent.edu/ssc/files/%s.dbdmt", stn_info$ID)
download.file(url, f, quiet = TRUE)
air <- read_table(f, col_names = c("Station", "Date", "Air"),
col_types = cols(col_character(),
col_date(format = "%Y%m%d"),
col_factor(levels = c(1, 2, 3, 4, 5, 6, 7, 66, 67))))
levels(air$Air) <- list(warm = c(3, 6, 66, 67), mod = c(1, 4), cool = c(2, 5), trans = 7)
air %>% mutate(Air = forcats::fct_explicit_na(Air, na_level = "(none)"))
}
air <- get_ssc_data(stn_info)
air
#> # A tibble: 25,933 x 3
#> Station Date Air
#> <chr> <date> <fct>
#> 1 YYZ 1950-01-01 mod
#> 2 YYZ 1950-01-02 mod
#> 3 YYZ 1950-01-03 warm
#> 4 YYZ 1950-01-04 trans
#> 5 YYZ 1950-01-05 cool
#> 6 YYZ 1950-01-06 cool
#> 7 YYZ 1950-01-07 trans
#> 8 YYZ 1950-01-08 trans
#> 9 YYZ 1950-01-09 cool
#> 10 YYZ 1950-01-10 trans
#> # … with 25,923 more rows
Making daily data
As explained in a previous post, the use of 6 AM UTC as the “cutoff” point for the climatological day is not ideal. I will once again use the sun-based COWND proposed by Žaknić-Ćatović and Gough (2018).
I need a few helper functions to get me from A to B in this case.
A function to find the timezone of each station is implemented in the claut package, which in turn uses the lutz package.
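For illustration, here is roughly what such a lookup might look like if built directly on lutz (the name `get_tz_sketch` and its body are my own sketch; claut’s `get_tz` may differ):
get_tz_sketch <- function(lat, lon) {
  # "accurate" matches coordinates against time zone polygons (needs sf)
  lutz::tz_lookup_coords(lat, lon, method = "accurate", warn = FALSE)
}
get_tz_sketch(43.68, -79.63)
#> [1] "America/Toronto"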
I also need functions to calculate wind chill (`wcl`) and humidex (`hdx`).
wcl <- function(tair, wspeed) {
  # Environment Canada wind chill index: temperature in °C, wind speed in km/h;
  # only defined for temperatures below 10°C and winds of at least 4.8 km/h
  wcl <- 13.12 + 0.6215 * tair - 11.37 * wspeed^0.16 + 0.3965 * tair * wspeed^0.16
  wcl[tair >= 10 | wspeed < 4.8] <- NA
  round(wcl)
}
hdx <- function(tair, tdew) {
  # Humidex: temperature in °C, dew point in °C (converted to kelvin)
  tdew <- tdew + 273.15
  # Vapour pressure (hPa) from the dew point, then the humidity adjustment
  e <- 6.11 * exp(5417.7530 * ((1 / 273.16) - (1 / tdew)))
  h <- 0.5555 * (e - 10)
  round(tair + h)
}
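As a quick sanity check of both helpers (the expected values below are my own hand calculations, which agree with the published Environment Canada charts):
# -10°C with a 20 km/h wind should give a wind chill of about -18
wcl(-10, 20)
#> [1] -18
# 30°C with a 20°C dew point should give a humidex of about 38
hdx(30, 20)
#> [1] 38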
Now let’s write the function that will create our data set(s). One additional note: if more than one station reports data for a given hour, the function averages the values across stations.
make_eccc_daily <- function(stn_info, dat) {
tz <- get_tz(stn_info$Latitude, stn_info$Longitude)
dat %>% group_by(DateTime) %>%
summarize(StationID = paste(StationID[!is.na(Temp)], collapse = ", "),
Temp = mean(Temp, na.rm = TRUE),
DewPointTemp = mean(DewPointTemp, na.rm = TRUE),
WindSpeed = mean(WindSpeed, na.rm = TRUE),
.groups = "drop") -> dat
filter(dat, month(DateTime) %in% c(1, 2, 12)) %>%
mutate(WindChill = wcl(Temp, WindSpeed)) %>%
select(DateTime, Temp, WindChill) %>%
solar_daily(., stn_info$Latitude, stn_info$Longitude,
tz, na.rm = TRUE) -> win_dly
filter(dat, month(DateTime) %in% 6:8) %>%
mutate(Humidex = hdx(Temp, DewPointTemp)) %>%
select(DateTime, Temp, Humidex) %>%
solar_daily(., stn_info$Latitude, stn_info$Longitude,
tz, na.rm = TRUE) -> sum_dly
list(win_dly = win_dly, sum_dly = sum_dly)
}
daily_dat <- make_eccc_daily(stn_info, dat)
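The heavy lifting above happens in solar_daily() from the claut package. For readers without that package, the toy sketch below shows the core idea under my own simplifying assumptions (days run from one local sunrise to the next, using the suncalc package, with no handling of polar darkness or wholly missing days); it is not the claut implementation:
library(suncalc)

solar_day_extrema <- function(hourly, lat, lon, tz) {
  # One sunrise per calendar date spanned by the data
  sunrises <- sort(getSunlightTimes(date = unique(as.Date(hourly$DateTime, tz = tz)),
                                    lat = lat, lon = lon, keep = "sunrise",
                                    tz = tz)$sunrise)
  hourly %>%
    # Assign each hourly observation to a sunrise-to-sunrise interval
    mutate(SolarDay = findInterval(DateTime, sunrises)) %>%
    filter(SolarDay > 0) %>%
    group_by(SolarDay) %>%
    summarize(Date = as.Date(DateTime[1], tz = tz),
              MaxTemp = max(Temp, na.rm = TRUE),
              MinTemp = min(Temp, na.rm = TRUE),
              .groups = "drop")
}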
Identifying “events”
Now we can identify our events. Here we will look at heat waves and cold snaps. I will be working with the following definitions:
- A cold snap is any two or more days with minimum temperatures of -15°C or below, or a minimum wind chill of -20 or below.
- A heat wave is any two or more days with maximum temperatures of at least 31°C and minimum temperatures of at least 20°C, or with a maximum humidex of 40 or above.
win_expression <- "MinTemp <= -15 | MinWindChill <= -20"
win_event_length <- 2
sum_expression <- "(MaxTemp >= 31 & MinTemp >= 20) | MaxHumidex >= 40"
sum_event_length <- 2
make_eccc_warns <- function(stn_info, daily_dat,
win_expression, win_event_length,
sum_expression, sum_event_length) {
sum_warns <- daily_dat$sum_dly %>%
mutate(Criteria = eval(parse(text = sum_expression))) %>%
group_by(Year = year(Date)) %>%
mutate(Start = describe_snaps(rle(Criteria), sum_event_length, Date, min),
End = describe_snaps(rle(Criteria), sum_event_length, Date, max),
Length = describe_snaps(rle(Criteria), sum_event_length)) %>%
filter(!is.na(Length)) %>%
group_by(Year, Start, End, Length) %>%
summarize_all(mean) %>%
select(-Criteria, -Date)
win_warns <- daily_dat$win_dly %>%
mutate(Criteria = eval(parse(text = win_expression))) %>%
mutate(Year = case_when(month(Date) == 12 ~ year(Date) + 1,
TRUE ~ year(Date))) %>%
filter(Year != stn_info$Start) %>%
group_by(Year) %>%
mutate(Start = describe_snaps(rle(Criteria), win_event_length, Date, min),
End = describe_snaps(rle(Criteria), win_event_length, Date, max),
Length = describe_snaps(rle(Criteria), win_event_length)) %>%
filter(!is.na(Length)) %>%
group_by(Year, Start, End, Length) %>%
summarize_all(mean) %>%
select(-Criteria, -Date)
list(win_warns = win_warns, sum_warns = sum_warns)
}
warns <- make_eccc_warns(stn_info, daily_dat,
win_expression, win_event_length,
sum_expression, sum_event_length)
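The run detection itself is handled by describe_snaps() from claut. To make the rle() logic less opaque, here is a simplified toy version of the idea (my own reduction; the real function also maps the run statistics back onto the daily rows):
# Days meeting the criteria, as a logical vector
criteria <- c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
runs <- rle(criteria)
# Runs of TRUE lasting at least two days count as events
events <- runs$values & runs$lengths >= 2
runs$lengths[events]
#> [1] 3 2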
Creating seasonal summaries
Let’s look at some seasonal summaries as well. Summarize the number of days with a particular air mass, by season.
make_air_sets <- function(stn_info, air, daily_dat) {
sum_air <- left_join(daily_dat$sum_dly, select(air, -Station), by = "Date") %>%
group_by(Year = year(Date), Air) %>%
count() %>%
ungroup() %>%
pivot_wider(names_from = "Air", values_from = "n", values_fill = list(n = 0L))
win_air <- left_join(daily_dat$win_dly, select(air, -Station), by = "Date") %>%
mutate(Year = case_when(month(Date) == 12 ~ year(Date) + 1,
TRUE ~ year(Date))) %>%
filter(Year != stn_info$Start) %>%
group_by(Year, Air) %>%
count() %>%
ungroup() %>%
pivot_wider(names_from = "Air", values_from = "n", values_fill = list(n = 0L))
list(win_air = win_air, sum_air = sum_air)
}
airs <- make_air_sets(stn_info, air, daily_dat)
airs$sum_air
#> # A tibble: 70 x 6
#> Year warm mod cool trans `(none)`
#> <dbl> <int> <int> <int> <int> <int>
#> 1 1950 8 48 21 15 0
#> 2 1951 16 48 15 13 0
#> 3 1952 26 45 7 14 0
#> 4 1953 22 44 17 8 1
#> 5 1954 17 50 15 10 0
#> 6 1955 43 34 9 6 0
#> 7 1956 11 51 21 9 0
#> 8 1957 18 43 18 13 0
#> 9 1958 12 45 14 21 0
#> 10 1959 35 37 8 12 0
#> # … with 60 more rows
Summarize the characteristics of “events” during each season.
make_eccc_warns_seasonal <- function(warns) {
summ_sum <- left_join(warns$sum_warns %>% group_by(Year) %>% count(),
warns$sum_warns %>% group_by(Year) %>%
summarize(AvgLen = mean(Length),
AvgTmax = weighted.mean(MaxTemp, Length),
AvgTmin = weighted.mean(MinTemp, Length),
AvgTmean = weighted.mean(MeanTemp, Length),
AvgMaxHumidex = weighted.mean(MaxHumidex, Length)),
by = "Year")
summ_win <- left_join(warns$win_warns %>% group_by(Year) %>% count(),
warns$win_warns %>% group_by(Year) %>%
summarize(AvgLen = mean(Length),
AvgTmax = weighted.mean(MaxTemp, Length),
AvgTmin = weighted.mean(MinTemp, Length),
AvgTmean = weighted.mean(MeanTemp, Length),
AvgMinWindChill = weighted.mean(MinWindChill, Length)),
by = "Year")
list(summ_win = summ_win, summ_sum = summ_sum)
}
warns_summary <- make_eccc_warns_seasonal(warns)
warns_summary$summ_win
#> # A tibble: 67 x 6
#> # Groups: Year [67]
#> Year n AvgLen AvgTmin AvgTmean AvgMinWindChill
#> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1953 2 2 -13.5 -8.50 -22.2
#> 2 1954 7 2.71 -17.3 -12.3 -25.8
#> 3 1955 5 3.2 -16.5 -12.4 -25.1
#> 4 1956 7 2.86 -14.5 -10.3 -22.2
#> 5 1957 4 3.5 -16.9 -11.9 -25.4
#> 6 1958 5 4.6 -15.6 -11.8 -25.5
#> 7 1959 13 3.15 -16.0 -11.6 -24.4
#> 8 1960 4 2.25 -14.8 -10.6 -24
#> 9 1961 5 5.2 -17.7 -13.3 -26.3
#> 10 1962 11 2.73 -15.7 -11.3 -25.2
#> # … with 57 more rows
And finally, a seasonal weather data set.
make_eccc_seasonal <- function(stn_info, daily_dat,
sum_expression, win_expression) {
win_seas <- daily_dat$win_dly %>%
mutate(Year = case_when(month(Date) == 12 ~ year(Date) + 1,
TRUE ~ year(Date)),
Criteria = eval(parse(text = win_expression))) %>%
filter(Year != stn_info$Start) %>%
group_by(Year) %>%
summarize(MaxTemp = mean(MaxTemp, na.rm = TRUE),
MinTemp = mean(MinTemp, na.rm = TRUE),
MeanTemp = mean(MeanTemp, na.rm = TRUE),
MinWindChill = mean(MinWindChill, na.rm = TRUE),
ColdDays = sum(Criteria, na.rm = TRUE)) %>%
mutate_if(is.numeric, ~ifelse(abs(.) == Inf, NA, .)) %>%
mutate(ColdDays = ifelse(is.na(MaxTemp) & is.na(MinTemp) & is.na(MeanTemp),
NA, ColdDays)) #FIXME: Need a more elegant solution to this
sum_seas <- daily_dat$sum_dly %>%
mutate(Criteria = eval(parse(text = sum_expression))) %>%
group_by(Year = year(Date)) %>%
summarize(MaxTemp = mean(MaxTemp, na.rm = TRUE),
MinTemp = mean(MinTemp, na.rm = TRUE),
MeanTemp = mean(MeanTemp, na.rm = TRUE),
MaxHumidex = mean(MaxHumidex, na.rm = TRUE),
HeatDays = sum(Criteria, na.rm = TRUE)) %>%
mutate_if(is.numeric, ~ifelse(abs(.) == Inf, NA, .)) %>%
mutate(HeatDays = ifelse(is.na(MaxTemp) & is.na(MinTemp) & is.na(MeanTemp),
NA, HeatDays)) #FIXME: Need a more elegant solution to this
list(win_seas = win_seas, sum_seas = sum_seas)
}
seas_dat <- make_eccc_seasonal(stn_info, daily_dat,
sum_expression, win_expression)
seas_dat$sum_seas
#> # A tibble: 70 x 6
#> Year MaxTemp MinTemp MeanTemp MaxHumidex HeatDays
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1950 NA NA NA NA NA
#> 2 1951 NA NA NA NA NA
#> 3 1952 NA NA NA NA NA
#> 4 1953 25.8 15.0 20.3 30.3 10
#> 5 1954 25.0 14.3 19.5 28.8 3
#> 6 1955 27.7 16.6 22.1 32.4 17
#> 7 1956 23.9 14.6 19.0 27.8 3
#> 8 1957 25.4 14.6 19.8 29.1 9
#> 9 1958 24.7 13.3 19.1 27.5 1
#> 10 1959 27.4 16.3 21.7 31.6 13
#> # … with 60 more rows
Exploring the initial results
Now we can plot the initial results. Some of these plots will need to be tweaked, for instance, to remove months with too many missing values.
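One possible approach, sketched here with an arbitrary 10% threshold and a hypothetical helper of my own, would be to flag year-months where too many of the hourly temperature observations are missing:
# Hypothetical helper: identify year-months in the hourly data where more
# than `threshold` of the temperature observations are missing
flag_missing_months <- function(dat, threshold = 0.1) {
  dat %>%
    group_by(Year = year(DateTime), Month = month(DateTime)) %>%
    summarize(PropMissing = mean(is.na(Temp)), .groups = "drop") %>%
    filter(PropMissing > threshold)
}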
Winter “cold snaps”
Winter SSC classes
airs$win_air %>%
pivot_longer(-Year, names_to = "air", values_to = "n") %>%
plot_ly(x = ~Year, y = ~n, color = ~air) %>%
add_lines() %>%
layout(title = "Seasonal wintertime air masses",
yaxis = list(title = "Number of instances of air mass type"))
Winter weather events
warns$win_warns %>%
plot_ly(x = ~Start, y = ~MinWindChill, color = ~MinTemp, size = ~Length,
hoverinfo = "text",
hovertext = paste0(warns$win_warns$Start, ": ",
warns$win_warns$Length, " days")) %>%
plotly::add_markers() %>%
layout(title = "Wintertime cold alerts",
xaxis = list(title = "Start Date",
range = c(min(warns$win_warns$Start) - 1000,
max(warns$win_warns$Start) + 1000)))
Seasonal winter weather event summary
seas_dat$win_seas %>%
select(-`ColdDays`) %>%
pivot_longer(-Year, names_to = "Var", values_to = "value") %>%
plot_ly(x = ~Year, y = ~value, color = ~Var) %>%
add_lines() %>%
layout(yaxis = list(title = "Temperature (°C) / WindChill"),
title = "Winter temperature and windchill")
warns_summary$summ_win %>%
plot_ly(x = ~Year, y = ~n,
color = ~AvgMinWindChill, size = ~AvgLen,
hoverinfo = "text",
hovertext = paste0(warns_summary$summ_win$Year, ": ",
warns_summary$summ_win$n,
" event(s) with mean duration of ",
round(warns_summary$summ_win$AvgLen, 1),
" days")) %>% plotly::add_markers() %>%
layout(title = "Seasonal wintertime cold alert weather",
yaxis = list(title = "Number of cold alerts"))
We can also perform a quick linear regression (though these data are likely to be junk until I’ve done a better job of dealing with missing values and adding in the “zero-event” years).
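For the record, one way to add those “zero-event” years back in might be tidyr’s complete() (a sketch only; I have not tested it against the rest of the pipeline):
# Re-insert years with no events as n = 0 before fitting a trend; the other
# summary columns are left as NA for those years
warns_summary$summ_win %>%
  ungroup() %>%
  complete(Year = full_seq(Year, 1), fill = list(n = 0L))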
Seasonal characteristics
fx <- function(x) {
  # Summarize one variable: range, mean, and a linear trend against Year
  if (all(is.na(x$value))) {
    tibble(min = NA, max = NA, mean = NA)
  } else {
    summ <- summary(lm(value ~ Year, data = x))
    # coef(summ) is a matrix; element [2] is the Year slope, [8] its p-value
    tibble(min = min(x$value, na.rm = TRUE),
           max = max(x$value, na.rm = TRUE),
           mean = mean(x$value, na.rm = TRUE),
           slope = coef(summ)[2], rsq = summ$r.squared, p = coef(summ)[8])
  }
}
seas_dat$win_seas %>%
pivot_longer(-Year, names_to = "var", values_to = "value") %>%
group_by(var) %>%
nest() %>%
mutate(model = map(data, ~fx(.))) %>%
select(-data) %>%
unnest(model) %>%
knitr::kable("markdown", digits = 3, escape = FALSE)
var | min | max | mean | slope | rsq | p |
---|---|---|---|---|---|---|
MaxTemp | -5.339 | 2.738 | -1.617 | 0.032 | 0.131 | 0.002 |
MinTemp | -11.398 | -2.677 | -7.110 | 0.049 | 0.214 | 0.000 |
MeanTemp | -8.050 | 0.047 | -4.243 | 0.039 | 0.170 | 0.000 |
MinWindChill | -18.644 | -7.912 | -13.363 | 0.056 | 0.185 | 0.000 |
ColdDays | 1.000 | 44.000 | 22.191 | -0.192 | 0.138 | 0.002 |
Characteristics of cold snaps
warns_summary$summ_win %>%
filter(Year != year(Sys.Date())) %>%
pivot_longer(-Year, names_to = "var", values_to = "value") %>%
group_by(var) %>%
nest() %>%
mutate(model = map(data, ~fx(.))) %>%
select(-data) %>%
unnest(model) %>%
knitr::kable("markdown", digits = 3, escape = FALSE)
var | min | max | mean | slope | rsq | p |
---|---|---|---|---|---|---|
n | 1.00 | 13.000 | 5.657 | -0.057 | 0.171 | 0.001 |
AvgLen | 2.00 | 5.571 | 3.301 | 0.000 | 0.000 | 0.965 |
AvgTmin | -18.96 | -12.000 | -16.180 | 0.018 | 0.062 | 0.042 |
AvgTmean | -14.75 | -6.621 | -11.854 | 0.009 | 0.016 | 0.305 |
AvgMinWindChill | -29.00 | -21.000 | -25.082 | 0.009 | 0.014 | 0.338 |
Summer “heat waves”
Summer SSC classes
airs$sum_air %>%
pivot_longer(-Year, names_to = "air", values_to = "n") %>%
plot_ly(x = ~Year, y = ~n, color = ~air) %>%
add_lines() %>%
layout(title = "Seasonal summertime air masses",
yaxis = list(title = "Number of instances of air mass type"))
Summer weather events
warns$sum_warns %>%
plot_ly(x = ~Start, y = ~MaxHumidex, color = ~MaxTemp, size = ~Length,
hoverinfo = "text",
hovertext = paste0(warns$sum_warns$Start, ": ",
warns$sum_warns$Length, " days")) %>%
plotly::add_markers() %>%
layout(title = "Summertime heat warnings",
xaxis = list(title = "Start Date",
range = c(min(warns$sum_warns$Start) - 1000,
max(warns$sum_warns$Start) + 1000)))
Seasonal summer weather event summary
seas_dat$sum_seas %>%
select(-`HeatDays`) %>%
pivot_longer(-Year, names_to = "Var", values_to = "value") %>%
plot_ly(x = ~Year, y = ~value, color = ~Var) %>%
add_lines() %>%
layout(yaxis = list(title = "Temperature (°C) / Humidex"),
title = "Summertime temperature and humidex")
warns_summary$summ_sum %>%
plot_ly(x = ~Year, y = ~n,
color = ~AvgMaxHumidex, size = ~AvgLen,
hoverinfo = "text",
hovertext = paste0(warns_summary$summ_sum$Year, ": ",
warns_summary$summ_sum$n,
" event(s) with mean duration of ",
round(warns_summary$summ_sum$AvgLen, 1),
" days")) %>% plotly::add_markers() %>%
layout(title = "Seasonal summertime heat alert weather",
yaxis = list(title = "Number of heat warnings"))
Once again, we can perform a junky linear regression.
Seasonal characteristics
seas_dat$sum_seas %>%
pivot_longer(-Year, names_to = "var", values_to = "value") %>%
group_by(var) %>%
nest() %>%
mutate(model = map(data, ~fx(.))) %>%
select(-data) %>%
unnest(model) %>%
knitr::kable("markdown", digits = 3, escape = FALSE)
var | min | max | mean | slope | rsq | p |
---|---|---|---|---|---|---|
MaxTemp | 22.27 | 27.87 | 25.136 | 0.013 | 0.043 | 0.091 |
MinTemp | 12.02 | 18.66 | 15.234 | 0.053 | 0.486 | 0.000 |
MeanTemp | 17.61 | 23.08 | 20.119 | 0.033 | 0.262 | 0.000 |
MaxHumidex | 25.32 | 33.01 | 29.115 | 0.021 | 0.075 | 0.025 |
HeatDays | 0.00 | 21.00 | 6.313 | 0.060 | 0.053 | 0.060 |
Characteristics of heat waves
warns_summary$summ_sum %>%
filter(Year != year(Sys.Date())) %>%
pivot_longer(-Year, names_to = "var", values_to = "value") %>%
group_by(var) %>%
nest() %>%
mutate(model = map(data, ~fx(.))) %>%
select(-data) %>%
unnest(model) %>%
knitr::kable("markdown", digits = 3, escape = FALSE)
var | min | max | mean | slope | rsq | p |
---|---|---|---|---|---|---|
n | 1.00 | 6.000 | 2.311 | 0.012 | 0.032 | 0.239 |
AvgLen | 2.00 | 4.333 | 2.609 | -0.006 | 0.030 | 0.254 |
AvgTmin | 17.50 | 23.400 | 21.179 | 0.031 | 0.255 | 0.000 |
AvgTmean | 23.26 | 28.072 | 26.658 | 0.012 | 0.058 | 0.110 |
AvgMaxHumidex | 38.33 | 43.500 | 40.208 | -0.005 | 0.008 | 0.557 |
Fin
The code included here is just the start of a method to perform an analysis of weather events across space and time. Future blog posts will look at other regions in Canada, and I will continue to iterate upon and improve this code.
This post was compiled on 2020-10-09 10:50:57. Since that time, there may have been changes to the packages that were used in this post. If you can no longer use this code, please notify the author in the comments below.
Packages used in this post
sessioninfo::package_info(dependencies = "Depends")
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.0)
#> canadaHCD * 0.0-2 2020-10-01 [1] Github (ConorIA/canadaHCD@e2f4d2b)
#> canadaHCDx * 0.0.8 2020-10-01 [1] gitlab (ConorIA/canadaHCDx@0f99419)
#> class 7.3-17 2020-04-26 [3] CRAN (R 4.0.2)
#> classInt 0.4-3 2020-04-07 [1] RSPM (R 4.0.0)
#> claut * 0.1.10 2020-10-08 [1] local
#> cli 2.0.2 2020-02-28 [1] RSPM (R 4.0.0)
#> codetools 0.2-16 2018-12-24 [3] CRAN (R 4.0.2)
#> colorspace 1.4-1 2019-03-18 [1] RSPM (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0)
#> crosstalk 1.1.0.1 2020-03-13 [1] RSPM (R 4.0.0)
#> curl 4.3 2019-12-02 [1] RSPM (R 4.0.0)
#> data.table 1.13.0 2020-07-24 [1] RSPM (R 4.0.2)
#> DBI 1.1.0 2019-12-15 [1] RSPM (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0)
#> doParallel 1.0.15 2019-08-02 [1] RSPM (R 4.0.0)
#> downlit 0.2.0 2020-09-25 [1] RSPM (R 4.0.2)
#> dplyr * 1.0.2 2020-08-18 [1] RSPM (R 4.0.2)
#> e1071 1.7-3 2019-11-26 [1] RSPM (R 4.0.0)
#> ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] RSPM (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0)
#> farver 2.0.3 2020-01-16 [1] RSPM (R 4.0.0)
#> forcats 0.5.0 2020-03-01 [1] RSPM (R 4.0.0)
#> foreach 1.5.0 2020-03-30 [1] RSPM (R 4.0.0)
#> fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.2)
#> gdata 2.18.0 2017-06-06 [1] RSPM (R 4.0.0)
#> generics 0.0.2 2018-11-29 [1] RSPM (R 4.0.0)
#> geosphere 1.5-10 2019-05-26 [1] RSPM (R 4.0.0)
#> ggplot2 * 3.3.2 2020-06-19 [1] RSPM (R 4.0.1)
#> glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.2)
#> gtable 0.3.0 2019-03-25 [1] RSPM (R 4.0.0)
#> gtools 3.8.2 2020-03-31 [1] RSPM (R 4.0.0)
#> highr 0.8 2019-03-20 [1] RSPM (R 4.0.0)
#> hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0)
#> htmltools 0.5.0 2020-06-16 [1] RSPM (R 4.0.1)
#> htmlwidgets 1.5.1 2019-10-08 [1] RSPM (R 4.0.0)
#> httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.2)
#> hugodown 0.0.0.9000 2020-10-08 [1] Github (r-lib/hugodown@18911fc)
#> iterators 1.0.12 2019-07-26 [1] RSPM (R 4.0.0)
#> jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2)
#> KernSmooth 2.23-17 2020-04-26 [3] CRAN (R 4.0.2)
#> knitr 1.30 2020-09-22 [1] RSPM (R 4.0.2)
#> lattice 0.20-41 2020-04-02 [1] RSPM (R 4.0.0)
#> lazyeval 0.2.2 2019-03-15 [1] RSPM (R 4.0.0)
#> lifecycle 0.2.0 2020-03-06 [1] RSPM (R 4.0.0)
#> lpSolve 5.6.15 2020-01-24 [1] RSPM (R 4.0.0)
#> lubridate * 1.7.9 2020-06-08 [1] RSPM (R 4.0.2)
#> lutz 0.3.1 2019-07-19 [1] RSPM (R 4.0.2)
#> magrittr 1.5 2014-11-22 [1] RSPM (R 4.0.0)
#> munsell 0.5.0 2018-06-12 [1] RSPM (R 4.0.0)
#> pillar 1.4.6 2020-07-10 [1] RSPM (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.0)
#> plotly * 4.9.2.1 2020-04-04 [1] RSPM (R 4.0.0)
#> plyr 1.8.6 2020-03-03 [1] RSPM (R 4.0.2)
#> purrr * 0.3.4 2020-04-17 [1] RSPM (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0)
#> rappdirs 0.3.1 2016-03-28 [1] RSPM (R 4.0.0)
#> RColorBrewer 1.1-2 2014-12-07 [1] RSPM (R 4.0.0)
#> Rcpp 1.0.5 2020-07-06 [1] RSPM (R 4.0.2)
#> readr * 1.3.1 2018-12-21 [1] RSPM (R 4.0.2)
#> reshape2 1.4.4 2020-04-09 [1] RSPM (R 4.0.2)
#> rgdal 1.5-16 2020-08-07 [1] RSPM (R 4.0.2)
#> rlang 0.4.7 2020-07-09 [1] RSPM (R 4.0.2)
#> rmarkdown 2.3 2020-06-18 [1] RSPM (R 4.0.1)
#> scales 1.1.1 2020-05-11 [1] RSPM (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.0)
#> sf 0.9-6 2020-09-13 [1] RSPM (R 4.0.2)
#> sp 1.4-2 2020-05-20 [1] RSPM (R 4.0.0)
#> storr 1.2.1 2018-10-18 [1] RSPM (R 4.0.0)
#> stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.2)
#> stringr * 1.4.0 2019-02-10 [1] RSPM (R 4.0.0)
#> tibble * 3.0.3 2020-07-10 [1] RSPM (R 4.0.2)
#> tidyr * 1.1.2 2020-08-27 [1] RSPM (R 4.0.2)
#> tidyselect 1.1.0 2020-05-11 [1] RSPM (R 4.0.0)
#> units 0.6-7 2020-06-13 [1] RSPM (R 4.0.2)
#> utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.0)
#> vctrs 0.3.4 2020-08-29 [1] RSPM (R 4.0.2)
#> viridisLite 0.3.0 2018-02-01 [1] RSPM (R 4.0.0)
#> withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2)
#> xfun 0.18 2020-09-29 [2] RSPM (R 4.0.2)
#> XML * 3.99-0.5 2020-07-23 [1] RSPM (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.0)
#> zoo 1.8-8 2020-05-02 [1] RSPM (R 4.0.0)
#>
#> [1] /home/conor/Library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/local/lib/R/library
Sheridan, Scott C. 2002. “The Redevelopment of a Weather-Type Classification Scheme for North America.” International Journal of Climatology 22 (1): 51–68. https://doi.org/10.1002/joc.709.
Žaknić-Ćatović, Ana, and William A. Gough. 2018. “A Comparison of Climatological Observing Windows and Their Impact on Detecting Daily Temperature Extrema.” Theoretical and Applied Climatology 132 (1-2): 41–54. https://doi.org/10.1007/s00704-017-2068-y.