Utils - Surrogate data

One of the most important steps in developing and using analyses is checking to what extent results can be explained by chance. This is commonly performed using surrogate data - using a mathematical procedure to generate simulated or synthetic data that shares as much statistical similarity as the real data as possible, but without the properties that the analysis method is supposed to detect. For example, if you are interested in seeing if some correlations are real, you might generate some data with the same power spectral density but without any genuine correlations, and then check how large the correlations that randomly occur are.

The ability to generate surrogate data is provided by osl_surrogate.m and the usage is extremely simple. First, we will load in some data

D = spm_eeg_load(fullfile(osldir,'example_data','roinets_example','subject_1'));
data = D(3:7,:)'; % First two channels aren't MEG

Here, our data matrix is just 3 channels, with rows corresponding to samples and columns corresponding to channels/signals. We can see the correlations in this data, and also have a look at the frequency spectrum

figure
imagesc(corr(data));
axis equal; axis tight;
colorbar
title('Data raw correlations')

figure
pwelch(data(:,1),[],[],[],D.fsample)

Now suppose we want to generate some surrogate data. osl_surrogate() provides three different methods:

phase_randomize generates surrogate data by taking the Fourier transform, randomizing the phase angle, and then transforming back. This exactly preserves the power spectral density
ar fits an independent univariate autoregressive model to each channel, and then feeds in white noise to generate surrogate data. This approximately preserves the power spectrum, and has no genuine correlations between channels nor any nonlinear features.
mar fits a single multivariate autoregressive model to the data, and then feeds in white noise to generate surrogate data. This approximately preserves the power spectral density and cross spectral density, which means that correlations will be present, but there will not be any nonlinear or cross-frequency interactions

Note that for 'ar' and 'mar'' it is necessary to specify the order of the autoregressive model as well

surrogate_pr = osl_surrogate(data,'phase_randomization');
surrogate_ar = osl_surrogate(data,'ar',4);
surrogate_mar = osl_surrogate(data,'mar',4);

We can then examine the correlations and power spectra in the data

figure
subplot(1,3,1)
imagesc(corr(surrogate_pr));
axis equal; axis tight;
colorbar
title('Phase randomized')
subplot(1,3,2)
imagesc(corr(surrogate_ar));
axis equal; axis tight;
colorbar
title('Univariate autoregressive')
subplot(1,3,3)
imagesc(corr(surrogate_mar));
axis equal; axis tight;
colorbar
title('Multivariate autoregressive')
set(gcf,'Position', [440   642   911   156])

figure
subplot(1,3,1)
pwelch(surrogate_pr(:,1),[],[],[],D.fsample)
colorbar
title('Phase randomized')
subplot(1,3,2)
pwelch(surrogate_ar(:,1),[],[],[],D.fsample)
colorbar
title('Univariate autoregressive')
subplot(1,3,3)
pwelch(surrogate_mar(:,1),[],[],[],D.fsample)
colorbar
title('Multivariate autoregressive')
set(gcf,'Position', [440   642   911   156])

The quality of the spectrum in the autoregressive models depends on the model order. In general, higher orders give better results, and the univariate model is more robust at fitting the spectrum of an individual channel than the multivariate model

figure
subplot(1,3,1)
pwelch(osl_surrogate(data(:,1),'ar',4),[],[],[],D.fsample)
colorbar
title('Univariate order=4')
subplot(1,3,2)
pwelch(osl_surrogate(data(:,1),'ar',10),[],[],[],D.fsample)
colorbar
title('Univariate order=10')
subplot(1,3,3)
pwelch(osl_surrogate(data(:,1),'ar',40),[],[],[],D.fsample)
colorbar
title('Univariate order=40')
set(gcf,'Position', [440   642   911   156])