Choosing stations in Peru pt. 2, Hydrological Stations

This is a follow-up to my last post, where I looked at some relevant meteorological stations for a study in Peru. Today, I will look for hydrological stations (river gauges), that could inform my study of flooding in the area. Libraries first.

In my previous post, I found just two river stations within 20 km of my study site. These were both downstream, though, so I think I will have to open up my radius a little more. Let’s see the stations that fall within 100 km of San Rafael.

hstations <- station_search(config = "h", target = c(-7.023833, -76.465222), dist = 1:100)
hstations

#> # A tibble: 12 x 15
#> # Rowwise: 
#>    Station StationID Type  Configuration `Data Start` `Data End` `Period (Yr)`
#>    <chr>   <chr>     <chr> <chr>         <chr>        <chr>              <dbl>
#>  1 SAN CR… 221803    CON   H             1968         1980                  13
#>  2 PICOTA  230715    CON   H             2003         2020                  18
#>  3 BIAVO   221804    CON   H             1969         2020                  52
#>  4 SHAMBO… 221813    CON   H             2006         2020                  15
#>  5 HUAYAB… 230504    CON   H             2005         2020                  16
#>  6 DESAGU… 221807    CON   H             1972         1991                  20
#>  7 LAGUNA… 221806    CON   H             1972         2020                  49
#>  8 CAMPAN… 221821    CON   H             1998         2020                  23
#>  9 CUMBAZA 221801    CON   H             1968         2020                  53
#> 10 CHAZUTA 230601    CON   H             2003         2020                  18
#> 11 SHANAO  221802    CON   H             1965         2020                  56
#> 12 SHANAO  210006    SUT   H             2016         2020                   5
#> # … with 8 more variables: `Station Status` <chr>, Latitude <dbl>,
#> #   Longitude <dbl>, Altitude <int>, Region <chr>, Province <chr>,
#> #   District <chr>, Dist <dbl>

There are 12 stations within 100 km. Let’s see where these are located.

# This block won't run because it addAwesomeMarkers doesn't seem to work in hugodown
map_stations(hstations)

I can be pretty sure that I can discard Shamboyacu, Shanao, Cumbaza, and Laguna Sauce … these are on tributaries that are downstream of my study site. There are also a couple of closed stations that only have old data, so I can discard those as well. Chazuta isn’t near San Rafael, but I did do some interviews in that district, so I’ll keep the station for the time being.

hstations_filtered <- hstations %>%
  filter(!(Station %in% c("CUMBAZA", "LAGUNA SAUCE", "SHANAO", "SHAMBOYACU")) &
           `Station Status` != "closed")

Let’s see the stations that are left.

# This block won't run because it addAwesomeMarkers doesn't seem to work in hugodown
map_stations(hstations_filtered)


We can download that data using the senamhiR package.

dat <- senamhiR(hstations_filtered$StationID)

Data audit

The completeness of the data can be assessed, as in my last post, with quick_audit(), in the senamhiR package.

dat_audit <- lapply(dat, quick_audit, by = "year")

Station 1: PICOTA (230715)

dat_audit[[1]] %>% filter_at(2:ncol(.), any_vars(. < 100)) %>% print(n = 50)

#> # A tibble: 18 x 6
#>     Year `Nivel06 (m) pc… `Nivel10 (m) pc… `Nivel14 (m) pc… `Nivel18 (m) pc…
#>    <int>            <dbl>            <dbl>            <dbl>            <dbl>
#>  1  2003              0                0                0                0  
#>  2  2004              0                0                0                0  
#>  3  2005              0                0                0                0  
#>  4  2006              0                0                0                0  
#>  5  2007              0                0                0                0  
#>  6  2008              0                0                0                0  
#>  7  2009              0                0                0                0  
#>  8  2010              0                0                0                0  
#>  9  2011              0                0                0                0  
#> 10  2012              0                0                0                0  
#> 11  2013              0                0                0                0  
#> 12  2014              0                0                0                0  
#> 13  2015             75.3             75.3             75.3             75.3
#> 14  2016              0                0                0                0  
#> 15  2017              0                0                0                0  
#> 16  2018              0                0                0                0  
#> 17  2019              0                0                0                0  
#> 18  2020             83.6             83.6             83.6             83.6
#> # … with 1 more variable: `Caudal (m^3/s) pct NA` <dbl>

Good level data at Picota from 2003 to 2017, except for in 2015. The “flow” data is not useful though.

Station 2: BIAVO (221804)

dat_audit[[2]] %>% filter_at(2:ncol(.), any_vars(. < 100)) %>% print(n = 50)

#> # A tibble: 39 x 6
#>     Year `Nivel06 (m) pc… `Nivel10 (m) pc… `Nivel14 (m) pc… `Nivel18 (m) pc…
#>    <int>            <dbl>            <dbl>            <dbl>            <dbl>
#>  1  1969            0              100              100                0    
#>  2  1970           16.7             41.4             41.4             16.7  
#>  3  1971           16.7             16.7             16.7             16.7  
#>  4  1972            0                0                0                0    
#>  5  1973            0                0                0                0    
#>  6  1974            0                0                0                0    
#>  7  1975            2.74             2.74             2.74             2.74 
#>  8  1976           16.9             16.9             16.9             16.9  
#>  9  1977           67.9             67.9             67.9             67.9  
#> 10  1978            5.21             5.21             4.66             4.66 
#> 11  1979           21.1             21.1             21.1             21.1  
#> 12  1993           69.9             69.9             69.9             69.9  
#> 13  1994            0.548            0.548            0.548            0.548
#> 14  1995            0                0                0                0    
#> 15  1996            0                0                0                0    
#> 16  1997            0                0                0                0    
#> 17  1998            1.37             0                0                0    
#> 18  1999            0                0                0                0    
#> 19  2000            0                0                0                0    
#> 20  2001            0                0                0                0    
#> 21  2002            0                0                0                0    
#> 22  2003            0                0                0                0    
#> 23  2004            0                0                0                0    
#> 24  2005            0                0                0                0    
#> 25  2006            0                0                0                0    
#> 26  2007            0                0                0                0    
#> 27  2008            0                0                0                0    
#> 28  2009            0                0                0                0    
#> 29  2010            0                0                0                0    
#> 30  2011            0                0                0                0    
#> 31  2012            0                0                0                0    
#> 32  2013            0                0                0                0    
#> 33  2014            0                0                0                0    
#> 34  2015           75.3             75.3             75.3             75.3  
#> 35  2016            0                0                0                0    
#> 36  2017           25.2             25.2             25.2             25.2  
#> 37  2018           67.1             67.1             67.1             67.1  
#> 38  2019            0                0                0                0    
#> 39  2020           83.6             83.6             83.6             83.6  
#> # … with 1 more variable: `Caudal (m^3/s) pct NA` <dbl>

We have some data for Biavo in the 1970s, but the data becomes most useful in the mid 90s. Some data is missing in 2015 and 2017. We have both river level and discharge here.

Station 3: HUAYABAMBA (230504)

dat_audit[[3]] %>% filter_at(2:ncol(.), any_vars(. < 100)) %>% print(n = 50)

#> # A tibble: 16 x 6
#>     Year `Nivel06 (m) pc… `Nivel10 (m) pc… `Nivel14 (m) pc… `Nivel18 (m) pc…
#>    <int>            <dbl>            <dbl>            <dbl>            <dbl>
#>  1  2005             8.22             8.22             16.4             8.22
#>  2  2006            24.9             24.9              24.9            24.9 
#>  3  2007            16.2             16.2              16.2            16.2 
#>  4  2008             0                0                 0               0   
#>  5  2009             0                0                 0               0   
#>  6  2010             0                0                 0               0   
#>  7  2011             0                0                 0               0   
#>  8  2012             0                0                 0               0   
#>  9  2013             0                0                 0               0   
#> 10  2014             0                0                 0               0   
#> 11  2015            75.3             75.3              75.3            75.3 
#> 12  2016             0                0                 0               0   
#> 13  2017             0                0                 0               0   
#> 14  2018             0                0                 0               0   
#> 15  2019             0                0                 0               0   
#> 16  2020            83.6             83.6              83.6            83.6 
#> # … with 1 more variable: `Caudal (m^3/s) pct NA` <dbl>

Huayabamba is really only useful as of 2008 or so, but 2005 through 2007 may contain some useful data. River level is present. Not much discharge data.

Station 4: CAMPANILLA (221821)

dat_audit[[4]] %>% filter_at(2:ncol(.), any_vars(. < 100)) %>% print(n = 50)

#> # A tibble: 23 x 6
#>     Year `Nivel06 (m) pc… `Nivel10 (m) pc… `Nivel14 (m) pc… `Nivel18 (m) pc…
#>    <int>            <dbl>            <dbl>            <dbl>            <dbl>
#>  1  1998           41.4             41.4             41.4             41.4  
#>  2  1999            0                0                0                0    
#>  3  2000            0.273            0.273            0.273            0.273
#>  4  2001            0                0                0                0    
#>  5  2002            0                0                0                0    
#>  6  2003            0                0                0                0    
#>  7  2004            0                0                0                0    
#>  8  2005            0                0                0                0    
#>  9  2006            8.49             8.49             8.49             8.49 
#> 10  2007           16.2             16.2             16.2             16.2  
#> 11  2008            0                0                0                0    
#> 12  2009            0                0                0                0    
#> 13  2010            0                0                0                0    
#> 14  2011            0                0                0                0    
#> 15  2012            0                0                0                0    
#> 16  2013            0                0                0                0    
#> 17  2014           33.7             33.7             33.7             33.7  
#> 18  2015           75.3             75.3             75.3             75.3  
#> 19  2016           33.3             33.3             33.3             36.3  
#> 20  2017            0                0                0                0    
#> 21  2018            0                0                0                0    
#> 22  2019            0                0                0                0    
#> 23  2020           83.6             83.6             83.6             83.6  
#> # … with 1 more variable: `Caudal (m^3/s) pct NA` <dbl>

Campanilla has data from 1998 through 2017, but there is some patchiness that makes this station less useful to me. Again, very little discharge data.

Station 5: CHAZUTA (230601)

dat_audit[[5]] %>% filter_at(2:ncol(.), any_vars(. < 100)) %>% print(n = 50)

#> # A tibble: 18 x 6
#>     Year `Nivel06 (m) pc… `Nivel10 (m) pc… `Nivel14 (m) pc… `Nivel18 (m) pc…
#>    <int>            <dbl>            <dbl>            <dbl>            <dbl>
#>  1  2003            81.4             81.4             81.4             81.4 
#>  2  2004             0                0                0                0   
#>  3  2005            32.9             16.7             16.7             16.7 
#>  4  2006            24.9             24.9             24.9             24.9 
#>  5  2007            16.2             16.2             16.2             16.2 
#>  6  2008             0                0                0                0   
#>  7  2009             0                0                0                0   
#>  8  2010             0                0                0                0   
#>  9  2011             0                0                0                0   
#> 10  2012             0                0                0                0   
#> 11  2013             0                0                0                0   
#> 12  2014             0                0                0                0   
#> 13  2015            75.3             75.3             75.3             75.3 
#> 14  2016            75.1             75.1             75.1             75.1 
#> 15  2017             0                0                0                0   
#> 16  2018             0                0                0                0   
#> 17  2019             8.49             8.49             8.49             8.49
#> 18  2020            83.6             83.6             83.6             83.6 
#> # … with 1 more variable: `Caudal (m^3/s) pct NA` <dbl>

Last, and actually least, is Chazuta. Chazuta isn’t interesting for my analysis in San Rafael, but I did do some interviews in Chazuta, which will inform my wider research. The station is similar in total length to Picota, but has some years with much missing data. No discharge data is available.

Data cleaning

The Senamhi website states in a number of places that their data have not undergone quality control. This is apparent in the data, so my first step is data cleaning. I have written functions to clean up some common errors in temperature data in the senamhiR package, but I have yet to extend these functions to precipitation or river data. As such, I’ll have to do things manually for this post. Including all of the charts will get a bit cluttered, so I’ll run through just one example and then correct the rest.

As we can see, there are two outliers at Picota.

plot_data <- dat[[1]] %>% select(-matches("Caudal")) %>% gather(key = Time, value = Level, -Fecha)
plot_ly(data = plot_data, x = ~Fecha, y = ~Level, color = ~Time) %>% add_markers()

We can look at these two points within the context of nearby points. For instance, how about the low point for level at 6 pm (1.83 m). The river level three days before and after that point is shown below.

dat[[1]]$`Nivel18 (m)`[which.min(dat[[1]]$`Nivel18 (m)`) + -3:3]

#> [1] 16.20 15.98 15.79  1.83 16.52 16.53 16.88

I can’t be sure, but it looks like we have a decimal place shift. If I adjust this point to 18.3, the difference between that point and the average for 30 days forward and backward (16.42) is 1.9, which falls within 0.91 standard deviations of the daily values for the same period (2.08). I can be fairly sure, therefore, that 18.3 m is not an unrealistic level. I’ll go ahead and adjust that point to 18.3 m.

dat[[1]]$`Nivel18 (m)`[which.min(dat[[1]]$`Nivel18 (m)`)] <- 18.3

The same cannot be said for the low point for the level at 2 pm. That point (12.2 m) seems to be erroneous given the context of three days before and after, but isn’t likely to be a missing decimal place.

dat[[1]]$`Nivel14 (m)`[which.min(dat[[1]]$`Nivel14 (m)`) + -3:3]

#> [1] 17.05 16.88 17.46 12.20 17.60 17.34 17.20

I’ll have to set that point to missing.

dat[[1]]$`Nivel14 (m)`[which.min(dat[[1]]$`Nivel14 (m)`)] <- NA

Those two adjustments seem to take care of the data at Picota.

plot_data <- dat[[1]] %>% select(-matches("Caudal")) %>% gather(key = Time, value = Level, -Fecha)
plot_ly(data = plot_data, x = ~Fecha, y = ~Level, color = ~Time) %>% add_markers()

Now I can adjust the rest of the stations using a similar process, which I intend to automate in senamhiR in the future.

Biavo has six points that look to be decimal place errors. I am more certain about these, because they appear to lack precision, that is, all of the other measurements are measured to two decimal places, but these are only measured to one, meaning that 20.8 m was likely supposed to be 2.08 m.

dat[[2]]$`Nivel06 (m)`[c(11940, 13890)] <- 2.08 # was 20.8
dat[[2]]$`Nivel10 (m)`[10885] <- 2.08 # was 20.8
dat[[2]]$`Nivel10 (m)`[13460] <- 2.04 # was 20.4
dat[[2]]$`Nivel14 (m)`[13583] <- 2.05 # was 20.5
dat[[2]]$`Nivel14 (m)`[10008] <- 2.02 # was 20.2
dat[[2]]$`Nivel18 (m)`[13750] <- 2.04 # was 20.4

The other stations are cleaner.

dat[[3]]$`Nivel14 (m)`[2964] <- 10.12 # was 1012
dat[[4]]$`Nivel06 (m)`[4267] <- 9.52 # was 952
dat[[5]]$`Nivel10 (m)`[3940] <- 12.08 # was 1208

Questioning Biavo

I am unsure whether Biavo is interesting to me. The default OpenStreetMap tiles used in the above map do not show the stream at Biavo connecting to the Huallaga. However, it could be that the tiles are not very complete. Luckily, EOX IT Services GmbH have created a cloud-free satellite mapping layer and released it as open data. I have added an option to the senamhiR package to use this imagery in the map_stations() function. On the map below, one can pan from Biavo down to the Huallaga. Biavo appears to be a small first-order tributary to the Huallaga, with lots of agricultural land after the river gauge and before the Huallaga, as such, I don’t have a very good idea of whether Biavo will be a good proxy.

# This block won't run because it addAwesomeMarkers doesn't seem to work in hugodown
map_stations(hstations_filtered, type = "sentinel")


Let’s look at the relationship between flow at Biavo and flow at Picota. The Senamhi river data is collected at four-hour intervals from 6 am to 6 pm. I am more interested understanding extremes, so I’ll look at mean, maximum and minimum level on a given calendar day. I’ll just use the average level for this post.

dat <- collapse_pcd(dat)
dat <- dat %>%
  mutate(avg_lvl = rowMeans(select(dat, 3:6))) %>% rowwise() %>%
  mutate(max_lvl = max(`Nivel06 (m)`, `Nivel10 (m)`, `Nivel14 (m)`, `Nivel18 (m)`),
         min_lvl = min(`Nivel06 (m)`, `Nivel10 (m)`, `Nivel14 (m)`, `Nivel18 (m)`)) %>%
  select(-3:-7)

Now we can plot Biavo against Picota and perform a quick linear regression.

plot_data <- filter(dat, StationID %in% c("230715", "221804")) %>% select(Fecha, StationID, avg_lvl) %>% spread(StationID, avg_lvl)
plot_ly(data = plot_data, x = ~`230715`, y = ~`221804`) %>% add_markers() %>%
  layout(xaxis = list(title = "Picota"), yaxis = list(title = "Biavo"))