# How does a running mean affect correlation?

In many climate science studies, a running mean is used as a low-pass filter. Authors then often claim to find large-scale (e.g. decadal) co-variability based on the running-mean-filtered time series. What's the problem with that?


To illustrate the problem, I took two time series that have a prescribed correlation by construction: if *X* is a given standardized random variable (mean 0, standard deviation 1), then *Y* is constructed via

$$Y_i = c\,X_i + \sqrt{1-c^2}\,e_i$$

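where *c* is the desired correlation and *i* is the index, usually representing time. The noise term *e* is drawn independently at each time step from a standard normal (Gaussian) distribution with mean 0 and standard deviation 1, and it is independent of *X*.

As you can see from the figure, things get interesting once *X* has some autocorrelation. If *X* is itself white noise, the running mean damps the shared signal and the noise equally, and the correlation stays at *c*. If *X* is autocorrelated, however, the noise *e*, which is not responsible for the correlation between *X* and *Y*, is largely averaged out by the running mean, while the slowly varying part carried by *X* survives the filter. One should therefore expect a substantial increase in correlation: a true correlation of 0.4 can easily exceed 0.6 once a running mean filter is applied.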
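The size of the increase can be made explicit with a short derivation. It assumes that *X* is stationary with unit variance, that the *eᵢ* are i.i.d. standard normal and independent of *X*, and that the running mean uses an *N*-point window; the symbols *N*, the bars, and *v_N* are notation introduced here, not taken from the original analysis.

$$
\bar{X}_i = \frac{1}{N}\sum_{k=0}^{N-1} X_{i-k}, \qquad
\bar{Y}_i = c\,\bar{X}_i + \sqrt{1-c^2}\,\bar{e}_i, \qquad
v_N = \operatorname{Var}(\bar{X}_i), \qquad
\operatorname{Var}(\bar{e}_i) = \frac{1}{N}
$$

$$
\operatorname{corr}(\bar{X}_i, \bar{Y}_i)
  = \frac{c\,v_N}{\sqrt{v_N\left(c^2 v_N + \frac{1-c^2}{N}\right)}}
  = \frac{c}{\sqrt{c^2 + \frac{1-c^2}{N\,v_N}}}
$$

For white-noise *X*, *N v_N* = 1 and the filtered correlation equals *c*. Positive autocorrelation gives *N v_N* > 1, so the filtered correlation exceeds *c* and approaches 1 as *N v_N* grows. If, as an illustrative assumption, *X* is an AR(1) process with lag-one coefficient φ, then *N v_N* = 1 + 2 Σ_{k=1}^{N-1} (1 − k/N) φ^k; for *c* = 0.4, φ = 0.9 and *N* = 10 this gives a filtered correlation of about 0.76.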
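The effect can also be checked numerically. The following is a minimal sketch in plain Python (standard library only), not the code behind the figure. Because the post does not say how *X* was generated, it assumes *X* is a unit-variance AR(1) process; the coefficient `phi = 0.9`, the 10-point window, the seed, and all helper names (`ar1_series`, `correlated_partner`, `running_mean`, `pearson`, `theoretical_filtered_corr`) are hypothetical choices for illustration. It builds *Y* with the construction above, applies the same running mean to both series, and compares the sample correlations with the population value for this AR(1) setup.

```python
import math
import random


def ar1_series(n, phi, rng):
    """Hypothetical model for X: stationary AR(1) with unit variance."""
    x = [rng.gauss(0.0, 1.0)]  # start in the stationary distribution
    innov_sd = math.sqrt(1.0 - phi * phi)  # keeps Var(X_i) = 1
    for _ in range(n - 1):
        x.append(phi * x[-1] + innov_sd * rng.gauss(0.0, 1.0))
    return x


def correlated_partner(x, c, rng):
    """Y_i = c*X_i + sqrt(1 - c^2)*e_i, with e_i i.i.d. N(0, 1)."""
    s = math.sqrt(1.0 - c * c)
    return [c * xi + s * rng.gauss(0.0, 1.0) for xi in x]


def running_mean(z, window):
    """Trailing running mean; returns len(z) - window + 1 values."""
    acc = sum(z[:window])
    out = [acc / window]
    for i in range(window, len(z)):
        acc += z[i] - z[i - window]  # slide the window by one step
        out.append(acc / window)
    return out


def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def theoretical_filtered_corr(c, phi, window):
    """Population corr after the running mean, for AR(1) X:
    c / sqrt(c^2 + (1 - c^2) / (N * Var(running mean of X)))."""
    n_var = 1.0 + 2.0 * sum((1.0 - k / window) * phi ** k for k in range(1, window))
    return c / math.sqrt(c * c + (1.0 - c * c) / n_var)


rng = random.Random(42)
c, phi, window, n = 0.4, 0.9, 10, 200_000
x = ar1_series(n, phi, rng)
y = correlated_partner(x, c, rng)

raw = pearson(x, y)
filtered = pearson(running_mean(x, window), running_mean(y, window))
print(f"raw: {raw:.3f}  filtered: {filtered:.3f}  "
      f"theory: {theoretical_filtered_corr(c, phi, window):.3f}")
```

With these settings the raw correlation should come out close to 0.4 and the filtered one close to the theoretical value of roughly 0.76. Setting `phi = 0.0` (white-noise *X*) should leave the filtered correlation near 0.4. A trailing window is used only for simplicity; a centered running mean shifts both series by the same amount and gives the same correlation.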