Accounting for missing data in monthly temperature series: Testing rule-of-thumb omission of months with missing values


The ‘3/5 rule’ is a commonly used rule of thumb for dealing with missing data when calculating monthly climate normals. The rule states that any month that is missing more than 3 consecutive daily values, or more than 5 daily values in total, should not be included in calculated monthly climate normals. We quantify the impact of missing data in a given year–month for between 1 and 25 missing values. In doing so, we describe the error that the ‘3/5 rule’ (and a related rule that we have dubbed the ‘4/10 rule’) permits. We tested the statistical robustness of these rules using observed temperature data from a temperate station and a tropical station. We show that, for observed data, the ‘3/5 rule’ permits an average of between 0.06 and 0.07 standard deviations of error in the calculated monthly mean (ε) when 3 consecutive or 5 random values are missing. For its part, the ‘4/10 rule’ permits a maximum ε of between 0.07 and 0.09 when 4 consecutive values are missing, or up to 0.10 when 10 random values are missing. The proportional impact of missing values was similar across variables. A correlation analysis shows that each additional missing value in a year–month of data increases ε by between 0.008 and 0.018, for up to 19 missing values. There is a significant relationship between the lag-1 autocorrelation of a year–month and ε. Simple linear interpolation can reduce ε when values are missing at random and the year–month exhibits lag-1 autocorrelation. Overall, we find that the application of any ‘rule of thumb’ should be based on the particular characteristics of the source data and the goals of the research project.
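As an illustration only, the rule described above can be sketched in a few lines of Python. This is not the authors' code. The function names (`longest_missing_run`, `passes_rule`, `epsilon`) are hypothetical, missing days are assumed to be encoded as `None`, and ε is assumed here to be the absolute difference between the gappy and complete monthly means, divided by the sample standard deviation of the complete daily values. The ‘4/10 rule’ thresholds are assumed by analogy with the ‘3/5 rule’.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def longest_missing_run(values: Sequence[Optional[float]]) -> int:
    """Length of the longest run of consecutive missing (None) daily values."""
    longest = current = 0
    for v in values:
        if v is None:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def passes_rule(values: Sequence[Optional[float]],
                max_consecutive: int = 3,
                max_total: int = 5) -> bool:
    """True if the month may be kept: no more than `max_consecutive`
    consecutive missing values and no more than `max_total` missing overall.
    Defaults give the '3/5 rule'; (4, 10) is assumed for the '4/10 rule'."""
    total_missing = sum(v is None for v in values)
    return (longest_missing_run(values) <= max_consecutive
            and total_missing <= max_total)


def epsilon(full: Sequence[float],
            with_gaps: Sequence[Optional[float]]) -> float:
    """Assumed definition of the error: |mean of observed values - mean of
    the complete month|, in units of the complete month's standard deviation."""
    observed = [v for v in with_gaps if v is not None]
    return abs(mean(observed) - mean(full)) / stdev(full)


# Synthetic 30-day month, used only to exercise the functions.
full_month = [float(day) for day in range(1, 31)]
gappy_month = [None, None, None] + full_month[3:]  # first 3 days missing

print(passes_rule(gappy_month))          # '3/5 rule'
print(passes_rule(gappy_month, 4, 10))   # assumed '4/10 rule'
print(epsilon(full_month, gappy_month))
```

In practice, ε for a real station would be estimated by repeatedly removing values from complete year–months, as the abstract describes, rather than from a single synthetic month.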

International Journal of Climatology
Conor I. Anderson, PhD
Alumnus, Climate Lab

Conor is a recent PhD graduate from the Department of Physical and Environmental Sciences at the University of Toronto Scarborough (UTSC).