I have a dataset of accelerometric data that is affected by electical spikes.
I\'m looking for a good method to filter out or reduce these spikes as need to calcula
I find that for the specific problem of one-point spikes (which arises in CCD detectors when a cosmic ray discharges a single CCD cell during the exposure) the following algorithm works very well:
N=length(y);
for i=[3:1:N-2]
# calculate the means of two nearest neighbours, and two next-nearest neighbours
y1=(y(i-1)+y(i+1))/2;
y2=(y(i-2)+y(i+2))/2;
# if those two means are close, but the current point is far off, it's a spike
if ( abs(y2-y(i)) > cutoff && abs(y1-y2) < cutoff)
z(i)=y2;
endif
endfor
Selecting the best strategy for a good cutoff selection is a separate issue; I tend to have it set to a fixed value based on the typical dark counts in the CCD. One can also use separate levels for what is "close" and what is "far off", like this:
if ( abs(y2-y(i)) > cutoff_far && abs(y1-y2) < cutoff_close )
One can also select a different criterion, like the difference between two means is X times smaller than the difference with the spike data:
if ( abs(y2-y(i)) > 10*abs(y1-y2) )
Peaks that are wider than a single-point spike survive this process unmolested.
An example of de-spiked Raman spectrum using a CCD detector
A moderator merged this question with this question - that's why it looks a little messy here. This answer considers additional issues in the second question!
The following is not an entirely clean solution, the code is adopted from my previous answer, but I added an exception for your case, so you don't need to delete values at the beginning and/or end of your data manually. It discards only these invalid values, that shouldn't cause problems.
Af = csvread(strcat('example_data_with_peak.txt'),5,0);
% Thresholds
Tl = -0.04;
To = 0.04;
% initialisation
peaks = find( Af < Tl | Af > To);
counter = 0;
while ~isempty(peaks)
peaks = find( Af < Tl | Af > To);
try
Af(peaks) = ( Af(peaks-1) + Af(peaks+1) ) / 2;
catch
if peaks(1) == 1
Af(1) = 0;
else
Af(end) = 0;
end
end
counter=counter+1;
end
figure(2);
plot(Af)
For determining the threshold you could use somethink like this, but it's also quite brute-force:
thresh = 15*mean(abs(findpeaks(Af)));
For others that may need this here's what I ended up using. Here's the data file data file link
Thanks goes to @thewaywewalk
Matlab filter electical spikes in accelerometric data
clear all, clc,clf,tic
aa=csvread(strcat('/tmp/example_data_with_peak.txt'),5,0); %will skip the first 5 rows that are text and zeros
figure(1);
plot(aa)
Af=aa;
% Thresholds
Tl = -mean(abs(aa))*10
To =mean(abs(aa))*10
% initialisation
[peaks_r,peaks_c] = find( Af < Tl | Af > To);
peaks = find( Af < Tl | Af > To);
counter = 0;
while ~isempty(peaks)
peaks = find( Af < Tl | Af > To);
try
Af(peaks) = ( Af(peaks-1) + Af(peaks+1) ) / 2;
catch
if peaks(1) == 1
Af(1) = 0;
else
Af(end) = 0;
end
end
counter=counter+1;
end
counter
figure(2);
plot(Af)
Here are the images of before and after.
I have used the file despiking from the matlab central file exchange with very good effect for similar problems, though I see you've tried that as well.
Another approach I've taken is to treat the spikes as statistical outliers and removed them using this function which uses Rosner's many outlier test. (NIST site is down for obvious reasons, so here is the Google cached version)
Edited to add: I was mistaken. My despiking algorithm did not come from the file exchange function I linked to above. It was actually pulled out of a journal article (the code is listed in the supplementary information to the paper, but they didn't publish the code to the file exchange). The paper was:
Practical Methods for Noise Removal: Applications to Spikes, Nonstationary Quasi-Periodic Noise, and Baseline Drift
Delphine Feuerstein , Kim H. Parker and Martyn G. Boutelle
Anal. Chem., 2009, 81 (12), pp 4987–4994
Since the copyright is held by the American Chemical Society and the authors, I can't copy the code here, but if you have access to a university library account, you can download a copy. If you don't, I left the link to the file exchange version, but I haven't used it so I can't vouch for its efficacy.
I would suggest some local smoothing. By defining thresholds and averaging all values below and above.
Af = data.example1;
% Thresholds
Tl = -0.6;
To = 0.6;
peaks = find( Af < Tl | Af > To);
Af(peaks) = ( Af(peaks-1) + Af(peaks+1) ) / 2;
The problem with this approach is that your outliners sometimes consist of up to 6 samples. So you need to smooth in multiple steps using a while loop:
Af = data.example1;
% Thresholds
Tl = -0.6;
To = 0.6;
% initialisation
peaks = find( Af < Tl | Af > To);
counter = 0;
while ~isempty(peaks)
peaks = find( Af < Tl | Af > To);
Af(peaks) = ( Af(peaks-1) + Af(peaks+1) ) / 2;
counter=counter+1;
end
after 6 iterations you get the following result: