January 5, 2019

# How does a running mean affect correlation?

In many climate science studies a running mean filter is used as a low-pass filter. Authors then usually claim to find some large scale (e.g. decadal) co-variability based on running mean-filtered time series. What's the problem with that?

To illustrate, I took two time series that are correlated by construction: if X is a given random variable, then Y is constructed via

$$Y_i = c\,X_i + \sqrt{1 - c^2}\;e_i$$

where c is the desired correlation and i is the index, usually representing time. e is Gaussian noise with mean 0 and standard deviation 1, drawn independently of X; the correlation of X and Y is then exactly c if X itself has unit variance. As you can see from the figure, it gets interesting once X has some autocorrelation. In that case the running mean averages out most of the noise e, i.e. the part of the variability in Y that does not contribute to the correlation with X, while the autocorrelated signal in X largely survives the filter. One should therefore expect a large increase in the correlation: a true correlation of 0.4 can easily exceed 0.6 once a running mean filter is applied.
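
The experiment can be reproduced with a few lines of NumPy. The following is only a minimal sketch, not the code behind the figure: it assumes X is an AR(1) process scaled to unit variance, and the lag-1 autocorrelation `phi`, the window length of 11, the series length and the seed are all illustrative choices of mine.

```python
import numpy as np


def ar1(n, phi, rng):
    """AR(1) series with lag-1 autocorrelation phi and unit variance (assumed model for X)."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    # Scale the innovations so that the stationary variance of x is 1.
    innov = rng.standard_normal(n) * np.sqrt(1.0 - phi**2)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + innov[i]
    return x


def running_mean(x, window):
    """Boxcar running mean; only positions with a full window are kept."""
    return np.convolve(x, np.ones(window) / window, mode="valid")


def experiment(c, phi, n=50_000, window=11, seed=0):
    """Return (raw correlation, correlation after the running mean)."""
    rng = np.random.default_rng(seed)
    x = ar1(n, phi, rng)
    e = rng.standard_normal(n)               # noise, independent of x
    y = c * x + np.sqrt(1.0 - c**2) * e      # correlation c by construction
    r_raw = np.corrcoef(x, y)[0, 1]
    r_filt = np.corrcoef(running_mean(x, window), running_mean(y, window))[0, 1]
    return r_raw, r_filt


# White-noise X (phi = 0) versus strongly autocorrelated X (phi = 0.9).
raw_white, filt_white = experiment(c=0.4, phi=0.0)
raw_red, filt_red = experiment(c=0.4, phi=0.9)
print(f"phi=0.0: raw r = {raw_white:.2f}, filtered r = {filt_white:.2f}")
print(f"phi=0.9: raw r = {raw_red:.2f}, filtered r = {filt_red:.2f}")
```

With white-noise X the filter damps signal and noise by the same factor, so the correlation should stay near 0.4. With autocorrelated X the noise is damped much more strongly than the signal, and the filtered correlation rises well above the true value.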