## Before applying the metric-based methods

*Step 0: Preprocessing*

Preprocessing the original time series may include interpolation of missing points. Interpolation is often necessary because some metrics require equidistant data; however, it may introduce spurious correlations in the time series. Although it is common practice in time series analysis to apply some form of data transformation (for instance, a log or square-root transform for spiky data), one needs to be cautious with such choices for early-warning methods. The reason is that it is exactly such irregularities in the time series that we are interested in when estimating the metrics. Thus, alongside any transformations, it is always desirable to also perform the analysis on the original time series.
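As a minimal sketch of the interpolation step (the sample values and the gap at *t* = 3 are made up for illustration), missing points can be filled by linear interpolation onto an equidistant grid with NumPy:

```python
import numpy as np

# Hypothetical irregularly sampled series with a missing point at t = 3.
t = np.array([0.0, 1.0, 2.0, 4.0, 5.0, 6.0])
x = np.array([1.2, 1.1, 1.4, 1.3, 1.6, 1.5])

# Resample onto an equidistant grid; np.interp fills the gap by
# linear interpolation between the neighbouring observations.
t_even = np.arange(t.min(), t.max() + 1)
x_even = np.interp(t_even, t, x)
```

Keep in mind that the interpolated point is not an observation: as noted above, interpolation can itself inflate autocorrelation estimates.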

*Step 1: Detrending/Filtering*

Prior to estimating the metrics, one wants to be sure that there are no trends in the time series. Trends imply nonstationary conditions that may affect the metrics in unwanted ways. For example, the variance measured in a rolling window will rise simply because the window slides along an increasing trend, regardless of any change in the fluctuations themselves. There are different ways to remove trends. Linear or polynomial detrending and first-differencing are common approaches in the early-warnings literature. In some cases we may also detrend within the rolling window, although it has been shown that this makes only minor differences compared to detrending the whole time series. Another approach is to filter the original record by applying a kernel smoothing function (such as a Gaussian kernel) over the time series prior to the transition and subtracting it from the original record to obtain a residual time series. The idea behind the filtering is that we are interested in the pattern of fluctuations around equilibrium. As we have no direct knowledge of that equilibrium, we approximate it with some smooth function. In this process one needs to be careful about how much of the variation is filtered out. In general, we try not to overfit the data, yet still filter out the slower trends in the time series. A filter that is too wide will remove slow trends but may lead to spurious correlations, especially at the ends of the time series. A filter that is too narrow removes the short-term fluctuations whose characteristics are exactly what the metrics are supposed to quantify. In any case, it is recommended to always analyse the original time series also without any detrending or filtering.
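A possible sketch of the Gaussian filtering approach, using `scipy.ndimage.gaussian_filter1d`. The synthetic series (slow linear trend plus noise) and the bandwidth of 25 sampling steps are illustrative choices, not prescriptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(42)
t = np.arange(500)
# Synthetic record: a slow trend plus short-term fluctuations.
series = 0.01 * t + rng.normal(0.0, 1.0, size=t.size)

# Smooth with a Gaussian kernel (bandwidth = kernel standard deviation,
# in sampling steps) and subtract to obtain the residual time series.
bandwidth = 25
trend = gaussian_filter1d(series, sigma=bandwidth)
residuals = series - trend
```

Varying `bandwidth` lets one explore the over-/under-filtering trade-off described above; Step 4 below systematises this.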

*Step 2: Choosing the size of the rolling window*

The size of the rolling window is related to the timescales of the system (its response times). For systems with fast timescales short windows can be appropriate, whereas systems with slow timescales require longer rolling windows for the metrics to capture changes in the signature of the time series. Typically, we use rolling windows equal to half the length of the time series. Short rolling windows lead to irregular trends in the estimates of the metrics, whereas long rolling windows smooth out the trends. Also, the shorter the rolling window, the less accurate the estimate of the metric becomes. Again, just as with filtering, there is no golden rule for the right size of the rolling window. There is a trade-off between a window long enough to estimate the metrics reliably and short enough to yield a sufficient number of windows from which to derive a trend.
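The rolling-window estimation can be sketched as follows. The half-length window and the choice of variance and lag-1 autocorrelation as metrics follow the text; the white-noise input is only a placeholder for real residuals:

```python
import numpy as np

def rolling_indicators(residuals, window):
    """Variance and lag-1 autocorrelation estimated over a rolling window."""
    var, ac1 = [], []
    for i in range(len(residuals) - window + 1):
        w = residuals[i:i + window]
        var.append(np.var(w))
        # Lag-1 autocorrelation: correlation of the window with itself shifted by 1.
        ac1.append(np.corrcoef(w[:-1], w[1:])[0, 1])
    return np.array(var), np.array(ac1)

rng = np.random.default_rng(0)
x = rng.normal(size=400)
# Window equal to half the series length, as is common practice.
var, ac1 = rolling_indicators(x, window=len(x) // 2)
```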

## After applying the metric-based methods

*Step 3: Quantifying trends in the metrics*

Perhaps the most reliable way to determine the evolution of the metric estimates before the transition is visual inspection, especially when combined with knowledge of the particular system to interpret the observed changes in the metrics. Where such knowledge is limited, it may be desirable to have a formal estimate of the observed trend. To do this, we can use the nonparametric Kendall *τ* rank correlation coefficient to test against the null hypothesis of randomness for a sequence of measurements against time. Alternatively, we can use Spearman's *ρ* rank correlation or Pearson's correlation coefficient.
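For example, Kendall's *τ* can be computed with `scipy.stats.kendalltau`; the upward-drifting metric series below is synthetic, standing in for a rolling-window indicator:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)
# Illustrative indicator that drifts upward before a transition.
metric = np.linspace(0.2, 0.8, 100) + rng.normal(0.0, 0.05, 100)
time = np.arange(metric.size)

# tau close to +1 indicates a strong monotonic increase;
# p_value tests the null hypothesis of no association with time.
tau, p_value = kendalltau(time, metric)
```

Note that the p-value here assumes independent observations, which rolling-window estimates are not; Step 5 addresses this with null models.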

*Step 4: Sensitivity analysis*

The estimates of the metrics and their trends are obviously influenced by the choices of detrending/filtering and the size of the rolling window. A systematic sensitivity analysis is, therefore, necessary. A rule of thumb is to vary the size of the rolling window from a minimum to a maximum value and quantify the trends of the metrics. Similarly, if Gaussian filtering is used, the metrics should be estimated for a range of bandwidths of the kernel function. Contour plots of bandwidth versus size of rolling window, and histograms of the estimated trends, can show how sensitive the trends of the metrics are to the choice of detrending and the size of the rolling window.
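A sensitivity scan might look like the following sketch, which recomputes the Kendall *τ* trend of rolling variance over a small grid of (illustrative) bandwidths and window sizes; the resulting array is what one would display as a contour plot or summarise in a histogram:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.stats import kendalltau

def trend_for(series, bandwidth, window):
    """Kendall tau of rolling variance for one bandwidth/window choice."""
    residuals = series - gaussian_filter1d(series, sigma=bandwidth)
    n = len(residuals) - window + 1
    var = np.array([np.var(residuals[i:i + window]) for i in range(n)])
    return kendalltau(np.arange(n), var)[0]

rng = np.random.default_rng(2)
# Placeholder series: a random walk component plus noise.
series = np.cumsum(rng.normal(size=300)) * 0.05 + rng.normal(size=300)

bandwidths = [10, 25, 50]
windows = [75, 150, 225]
taus = np.array([[trend_for(series, b, w) for w in windows]
                 for b in bandwidths])
```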

*Step 5: Comparing trends to null models*

For most time series, it will be difficult to decide whether the observed trend in the metrics could be caused by pure chance. The significance of the trend statistic that one might calculate reflects only the number of points available for its estimation, and says nothing about the possibility of a false positive (type I error). In that case, one might want to test the likelihood that the trend arose by chance. Surrogate time series can help here. There are various ways to create a surrogate time series to serve as a null hypothesis, but there is no *cookbook recipe* for a specific one. The final choice depends on the particular data set and the knowledge of the system. Here we propose three different null models that may be of use.

a) We can bootstrap our data sets by reshuffling the order of the time series, picking data with replacement, to generate surrogate records with the same probability distribution (mean and variance) (null 1).
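A one-line sketch of null 1 with NumPy (the residual series here is a stand-in for the real data):

```python
import numpy as np

rng = np.random.default_rng(3)
residuals = rng.normal(0.0, 1.0, size=200)  # placeholder for real residuals

# Null 1: resample with replacement, destroying temporal order while
# preserving the marginal distribution (and hence mean and variance).
surrogate = rng.choice(residuals, size=residuals.size, replace=True)
```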

b) We can produce surrogate time series with the same autocorrelations and the same density distribution as the original, to test against the null hypothesis that our time series is a realisation of a Gaussian linear stochastic process (null 2). This can be done by generating data with the same Fourier spectrum and amplitudes as the original time series.
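One common way to build such surrogates is phase randomisation: keep the Fourier amplitudes and draw random phases, which preserves the power spectrum and therefore the linear autocorrelations. A sketch (the input series is a placeholder):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=256)  # stand-in for the detrended series

# Null 2: randomise Fourier phases while keeping the amplitudes.
spectrum = np.fft.rfft(x)
phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
phases[0] = 0.0    # keep the zero-frequency (mean) component real
phases[-1] = 0.0   # keep the Nyquist component real (even-length series)
surrogate = np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=x.size)
```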

c) We can generate surrogate sets with an AR(1) model to test against the null hypothesis that the time series is produced by the simplest stationary process with the same mean, variance and lag-1 autocorrelation as the original time series (null 3). The AR(1) model is of the form *x_{t+1} = α_1 x_t + α_0 + σε_t*, where *α_1 = A(1)*, *σ² = v(1 − α_1²)*, and *α_0 = μ(1 − α_1)*, with *v* the variance, *μ* the mean, and *A(1)* the autocorrelation at lag 1 of the original residual time series, and *σ* a scaling factor for the Gaussian random error *ε_t*.

After choosing the particular null model(s), we can generate 1,000 surrogate series and estimate the metric-based indicators as we did for the original time series. This procedure supplies 1,000 estimates of the trend for each metric, from which we can estimate the probability that the trend statistic of the original time series would be observed by chance. This probability is quantified as the fraction of the 1,000 surrogate series for which the trend statistic is equal to or higher than the estimate from the original time series.
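Putting the AR(1) null model and the probability estimate together, a sketch might look as follows. The input series is synthetic, and for brevity the trend statistic is computed on each series directly rather than on a rolling-window indicator as in Step 3:

```python
import numpy as np
from scipy.stats import kendalltau

def ar1_surrogate(x, rng):
    """Null 3: AR(1) surrogate matching mean, variance and lag-1 autocorrelation."""
    mu, v = x.mean(), x.var()
    a1 = np.corrcoef(x[:-1], x[1:])[0, 1]       # A(1), lag-1 autocorrelation
    a0 = mu * (1.0 - a1)                        # alpha_0 = mu * (1 - alpha_1)
    sigma = np.sqrt(v * (1.0 - a1 ** 2))        # sigma^2 = v * (1 - alpha_1^2)
    eps = rng.normal(size=x.size)
    s = np.empty(x.size)
    s[0] = mu
    for t in range(x.size - 1):
        s[t + 1] = a1 * s[t] + a0 + sigma * eps[t]
    return s

rng = np.random.default_rng(5)
original = np.cumsum(rng.normal(size=300)) * 0.02 + rng.normal(size=300)
time = np.arange(original.size)

tau_orig = kendalltau(time, original)[0]
taus = np.array([kendalltau(time, ar1_surrogate(original, rng))[0]
                 for _ in range(1000)])
# Fraction of surrogates whose trend statistic is at least as large
# as the one observed in the original series.
p_value = np.mean(taus >= tau_orig)
```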